# OpenAI - Demystifying Temperature and TopP

Let's see this behavior in an example using the .NET OpenAI SDK and the Chat Completion API:

### Step 1: Create *OpenAIClient*:

In [6]:
#r "nuget: Azure.AI.OpenAi, 1.0.0-beta.5"
#r "nuget: Azure.Core, 1.30.0"

using Azure;
using Azure.AI.OpenAI; 

//Azure OpenAI model information 
Uri apiEndpoint = new Uri("provide your API endpoint");
string apiKey = "provide your API key"; 
string modelDeploymentName = "provide your deployment name"; 

//Create OpenAIClient & ChatApi
AzureKeyCredential azureKeyCredential = new AzureKeyCredential(apiKey);
OpenAIClient openAIClient = new OpenAIClient(
    apiEndpoint,
    azureKeyCredential
);

#!share --from c# openAIClient --as openAIClient 
#!share --from c# modelDeploymentName --as modelDeploymentName


P.S.: The necessary Azure environment (Azure OpenAI, model deployment etc.) can be created using the provided [Azure CLI script](../CreateEnv/CreateEnv.azcli).The API endpoint, API key and model deployment name to run the code snippet are provided in environment variables:

```azurecli
$ENV:AZURE_OPENAI_ENDPOINT = $csEndpoint
$ENV:AZURE_OPENAI_API_KEY = $csApiKey
$ENV:AZURE_OPENAI_DEPLOYMENTNAME = $modelDeploymentName
```

### Step 2: Call Azure Open AI chat API endpoint:

The first call uses the following parameter:

- **System Message:** "You are an AI assistant that completes statements and phrases. You just finish the provided statement!
- **User:** "Once upon a time"
- **Temperature:** Set to 1 indicating to the model to be "creative" with responses.

In [13]:
#r "nuget: Azure.AI.OpenAi, 1.0.0-beta.5"

using Azure.AI.OpenAI; 

string system = "You are an AI assistant that completes statements and phrases. You just finish the provided statement!";
string user = "Once upon a time"; 

//Compose Chat
ChatCompletionsOptions chatCompletionsOptions = new ChatCompletionsOptions();
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.System, instructions));
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.User, user));

//Request Properties
chatCompletionsOptions.Temperature = 1;  
chatCompletionsOptions.StopSequences.Add(".");

//Call OpenAI
for (int i= 0; i<3; i++) {
    Response<ChatCompletions> response = await openAIClient.GetChatCompletionsAsync(
        modelDeploymentName,
        chatCompletionsOptions);
    ChatCompletions completions = response.Value;
    Console.WriteLine($"Assistant: {completions.Choices[0].Message.Content}".Trim());
}

Assistant: there was a small village tucked away in the mountains
Assistant: there was a small village located on the edge of a vast forest
Assistant: there was a small town that held an annual talent show competition


Three different responses are created. GPT models are trained on large and diverse datasets. Meaning there are plenty of possible completions to the above simple ***Once upon a time*** user interaction. Hence the different responses from the LLM.


## Temperature & TopP

By providing Temperature and/or TopP to the model the variability of responses can be influenced. Let's first have a look to Temperature

### Temperature

Temperature is a float value with a range between 0 and 1. Where 0 indicates to the model to be more deterministic meaning less variable response should be created. 1 indicates to the model that it can respond with more "creativity" and be less deterministic.

Let's re-run the example with a Temperature of 0 to indicate to the model to be more deterministic:

In [14]:
//Request Properties
chatCompletionsOptions.Temperature = 0;  
chatCompletionsOptions.StopSequences.Add(".");

//Call OpenAI
for (int i= 0; i<3; i++) {
    Response<ChatCompletions> response = await openAIClient.GetChatCompletionsAsync(
        modelDeploymentName,
        chatCompletionsOptions);
    ChatCompletions completions = response.Value;
    Console.WriteLine($"Assistant: {completions.Choices[0].Message.Content}".Trim());
}

Assistant: there was a small village nestled in the heart of a dense forest
Assistant: there was a small village nestled in the heart of a dense forest
Assistant: there was a small village nestled in the heart of a dense forest


As we've seen above Temperature controls the variability of the models responses based on it's training data.

### Temperature set to close to 1 is not a guarantee to get creative responses

Let's try the same setting with a different user interaction: ***May the force be with***. A Temperature of 0.7 indicates to the model to be "creative" in responses:

In [15]:
user = "May the force be with ";

//Request Properties
chatCompletionsOptions.Temperature = (float)0.7;  
chatCompletionsOptions.StopSequences.Add(".");
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.User, user));

//Call OpenAI
for (int i= 0; i<3; i++) {
    Response<ChatCompletions> response = await openAIClient.GetChatCompletionsAsync(
        modelDeploymentName,
        chatCompletionsOptions);
    ChatCompletions completions = response.Value;
    Console.WriteLine($"Assistant: {completions.Choices[0].Message.Content}".Trim());
}

Assistant: you
Assistant: you
Assistant: you


The model responses 3 times with the same result because it's training data doesn't provided to many variations of ***May the force be with...*** even if we provide a Temperature closer to 1. Meaning providing a Temperature value close to 1 does not automatically ensure that different responses are created. It makes it more likely if in the models training data contained multiple variants to respond or complete. 

## TopP

 TopP can be used to achieve a similar outcome but it works differently. TopP is also a float value between 0 and 1 and it limits the amount of potential responses from a LLM to the request. 

Let's assume the LLM has 100 potential tokens to complete the response a TopP value of 0.3 instructs the model to consider just 30 percent of the potential completions. Providing 0 as TopP limits the potential responses to the top completion possibility.

Let's take the first example with the simplified prompt ***Once upon a time***, providing a Temperature of 1 which indicates the model to be "creative" with completions but provide a TopP of 0 which indicates to the model to just use the top completion.

In [16]:
user = "Once upon a time"; 

//Prompt 
chatCompletionsOptions.Messages.Add(new ChatMessage(ChatRole.User, user));

//Request Properties
chatCompletionsOptions.Temperature = (float)1;  
chatCompletionsOptions.NucleusSamplingFactor = (float)0;
chatCompletionsOptions.StopSequences.Add(".");

//Call OpenAI
for (int i= 0; i<3; i++) {
    Response<ChatCompletions> response = await openAIClient.GetChatCompletionsAsync(
        modelDeploymentName,
        chatCompletionsOptions);
    ChatCompletions completions = response.Value;
    Console.WriteLine(completions.Choices[0].Message.Content);
}

, in a far-off kingdom, there lived a beautiful princess named Isabella
, in a far-off kingdom, there lived a beautiful princess named Isabella
in a far-off kingdom, there lived a beautiful princess named Isabella


## Summary

Temperature and TopP influence the creativity and completions from the model and can be used in combination and provides some fine tuned control over the responses from the LLM. 

Sometimes influencing the completions from the LLM with one parameter is enough. Therefore as a rule of thumb: 

- Influencing responses using TopP -> Set Temperature to 1
- Influencing responses using Temperature -> Set TopP to 1