# OpenAI - Demystifying Temperature and TopP

## Intro

Both Temperature and TopP influence the diversity of responses from LLMs. LLMs aren't deterministic systems: the returned completion can differ even if the same prompt is provided.

Let's see this behavior in an example using PowerShell and curl:

### Call the Azure OpenAI chat API endpoint

The first call uses the following parameters:

- **System Message:** "You are an AI assistant that completes statements and phrases. You just finish the provided statement!"
- **User:** "Once upon a time"
- **Temperature:** Set to 1, indicating to the model to be "creative" with responses.

Note: The necessary Azure environment (Azure OpenAI, model deployment, etc.) can be created using the provided [Azure CLI script](../CreateEnv/CreateEnv.azcli). The API endpoint, API key, and model deployment name needed to run curl are provided in environment variables:

```azurecli
$ENV:AZURE_OPENAI_ENDPOINT = $csEndpoint
$ENV:AZURE_OPENAI_API_KEY = $csApiKey
$ENV:AZURE_OPENAI_DEPLOYMENTNAME = $modelDeploymentName
```

```powershell
# Connection settings for the Azure OpenAI deployment
$apiEndpoint = "provide your API endpoint"
$apiKey = "provide your API key"
$deploymentName = "provide your deployment name"

$url = "$apiEndpoint/openai/deployments/$deploymentName/chat/completions?api-version=2023-03-15-preview"

# Request body: system message, user prompt, and sampling settings
$jsonPayload = @"
{
    "messages": [
        {
            "role": "system", 
            "content": "You are an AI assistant that completes statements and phrases. You just finish the provided statement!"
        }, 
        {
            "role": "user",
            "content": "Once upon a time" 
        }
    ], 
    "max_tokens": 800,
    "temperature": 1,
    "stop": ["."]
}
"@

# Send the same request three times and print each completion
for ($i=1; $i -le 3; $i++) {
    $response = curl $url `
    -H "Content-Type: application/json" `
    -H "api-key: $apiKey" `
    -d $jsonPayload

    Write-Host ($response | ConvertFrom-Json).choices.message.content
}

# Notebook magic commands that share the variables with other cells
#!set --value @pwsh:apiEndpoint --name apiEndpoint
#!set --value @pwsh:apiKey --name apiKey
#!set --value @pwsh:deploymentName --name deploymentName
```

Output (curl progress meter omitted):

```text
in a faraway land, there lived a handsome prince who was cursed by an evil sorceress
[remaining output truncated]
```

The three calls return three different responses. GPT models are trained on large and diverse datasets, so there are plenty of plausible completions for a simple user prompt like ***Once upon a time***. Hence the different responses from the LLM.


## Temperature & TopP

The variability of responses can be influenced by providing Temperature and/or TopP to the model. Let's look at Temperature first.

### Temperature

Temperature is a float value. The Azure OpenAI chat API accepts values between 0 and 2; the examples here stay between 0 and 1. A value of 0 tells the model to be more deterministic, so responses vary less. A value of 1 lets the model respond with more "creativity" and be less deterministic.
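To make the effect of Temperature more concrete, here is a minimal sketch of temperature scaling as it is commonly described for language models: the raw token scores are divided by the temperature before they are turned into probabilities. The function name `softmax_with_temperature` and the scores are hypothetical, and the sketch assumes the standard softmax formulation rather than showing how the Azure OpenAI service is implemented internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy decoding (all weight on the top score).
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.5, 0.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate the probability on the highest-scoring token, which is why repeated calls vary less.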

Let's re-run the example with a Temperature of 0 to indicate to the model to be more deterministic:

```powershell
$jsonPayload = @"
{
    "messages": [
        {
            "role": "system", 
            "content": "You are an AI assistant that completes statements and phrases. You just finish the provided statement!"
        }, 
        {
            "role": "user",
            "content": "Once upon a time" 
        }
    ], 
    "max_tokens": 800,
    "temperature": 0,
    "stop": ["."]
}
"@

for ($i=1; $i -le 3; $i++) {
    $response = curl $url `
    -H "Content-Type: application/json" `
    -H "api-key: $apiKey" `
    -d $jsonPayload

    Write-Host ($response | ConvertFrom-Json).choices.message.content
}
```

Output (curl progress meter omitted):

```text
in a far-off land, there lived a brave knight who embarked on a perilous quest to save the kingdom from an evil dragon
[remaining output truncated]
```

As we've seen above, Temperature controls the variability of the model's responses, which are based on its training data.

### A Temperature close to 1 does not guarantee creative responses

Let's try a similar setting with a different user prompt: ***May the force be with***. A Temperature of 0.7 indicates to the model to be "creative" in its responses:

```powershell
$jsonPayload = @"
{
    "messages": [
        {
            "role": "system", 
            "content": "You are an AI assistant that completes statements and phrases. You just finish the provided statement!"
        }, 
        {
            "role": "user",
            "content": "May the force be with" 
        }
    ], 
    "max_tokens": 800,
    "temperature": 0.7,
    "stop": ["."]
}
"@

for ($i=1; $i -le 3; $i++) {
    $response = curl $url `
    -H "Content-Type: application/json" `
    -H "api-key: $apiKey" `
    -d $jsonPayload

    Write-Host ($response | ConvertFrom-Json).choices.message.content
}
```

Output (curl progress meter omitted):

```text
you
you
[remaining output truncated]
```

The model responds three times with the same result because its training data doesn't provide many variations of ***May the force be with...***, even with a Temperature closer to 1. In other words, a Temperature value close to 1 does not automatically ensure that different responses are created. It only makes different responses more likely if the model's training data contains multiple plausible ways to respond or complete.
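A small numeric sketch can illustrate why a higher Temperature does not help when one completion dominates. It assumes the common softmax-with-temperature formulation, and all scores below are invented for illustration; they are not taken from a real model.

```python
import math

def top_token_probability(logits, temperature):
    """Probability of the most likely token after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    return max(exps) / sum(exps)

# Hypothetical scores: several similar candidates vs. one dominant candidate.
open_ended = [2.0, 1.8, 1.7, 1.5]   # stands in for "Once upon a time"
dominant = [12.0, 3.0, 2.0, 1.0]    # stands in for "May the force be with"

p_open = top_token_probability(open_ended, 0.7)
p_dominant = top_token_probability(dominant, 0.7)

print(f"open-ended prompt, top token:   {p_open:.3f}")
print(f"dominant completion, top token: {p_dominant:.6f}")
```

With the dominant scores, the top token keeps almost all of the probability at a Temperature of 0.7, so repeated calls are very likely to return the same completion.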

### TopP

TopP can be used to achieve a similar outcome, but it works differently. TopP is also a float value between 0 and 1, and it limits the set of candidate tokens the LLM considers when generating its response.

Let's assume the LLM has 100 potential tokens to continue the response. A TopP value of 0.3 does not mean that 30 of these 100 tokens are considered. Instead, the model considers only the most likely tokens whose probabilities add up to 30 percent of the total probability mass. Providing 0 as TopP effectively limits the potential responses to the top completion possibility.
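The selection described above is often called nucleus sampling. The following is a minimal sketch under that assumption; the function name `nucleus` and the token probabilities are hypothetical, and the service may implement the cutoff differently in detail.

```python
def nucleus(probabilities, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    # Renormalize so the kept probabilities sum to 1 again.
    return {token: p / total for token, p in kept}

# Hypothetical next-token probabilities.
candidates = {"in": 0.5, "there": 0.25, "a": 0.15, "long": 0.1}

print(nucleus(candidates, 0.3))  # only the top token is kept
print(nucleus(candidates, 0.7))  # the two most likely tokens are kept
print(nucleus(candidates, 0.0))  # at least the top token is always kept
```

Note that the number of surviving tokens depends on how the probability is distributed, not on a fixed share of the candidate list.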

Let's take the first example with the simple prompt ***Once upon a time***. We provide a Temperature of 1, which tells the model to be "creative" with completions, but a TopP of 0, which tells the model to use only the top completion.

```powershell
$jsonPayload = @"
{
    "messages": [
        {
            "role": "system", 
            "content": "You are an AI assistant that completes statements and phrases. You just finish the provided statement!"
        }, 
        {
            "role": "user",
            "content": "Once upon a time" 
        }
    ], 
    "max_tokens": 800,
    "temperature": 1,
    "top_p": 0,
    "stop": ["."]
}
"@

for ($i=1; $i -le 3; $i++) {
    $response = curl $url `
    -H "Content-Type: application/json" `
    -H "api-key: $apiKey" `
    -d $jsonPayload

    Write-Host ($response | ConvertFrom-Json).choices.message.content
}
```

Output (curl progress meter omitted):

```text
in a far-off land, there lived a brave knight who embarked on a perilous quest to save the kingdom from an evil dragon
[remaining output truncated]
```

## Summary

Temperature and TopP both influence the creativity of the model's completions. They can be used in combination to provide some fine-grained control over the responses from the LLM.

Sometimes influencing the completions from the LLM with one parameter is enough. Therefore, as a rule of thumb:

- Influencing responses using TopP -> Set Temperature to 1
- Influencing responses using Temperature -> Set TopP to 1
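
The rule of thumb can be sketched as two small helpers that build the request body and vary only one parameter at a time. The helper names `payload_with_temperature` and `payload_with_top_p` are hypothetical; the field names and the system message are taken from the request body used in the examples above.

```python
import json

SYSTEM_MESSAGE = (
    "You are an AI assistant that completes statements and phrases. "
    "You just finish the provided statement!"
)

def _payload(user_text, temperature, top_p):
    """Build the JSON request body used by the chat completions call."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": 800,
        "temperature": temperature,
        "top_p": top_p,
        "stop": ["."],
    })

def payload_with_temperature(user_text, temperature):
    """Vary Temperature and leave TopP at 1."""
    return _payload(user_text, temperature=temperature, top_p=1)

def payload_with_top_p(user_text, top_p):
    """Vary TopP and leave Temperature at 1."""
    return _payload(user_text, temperature=1, top_p=top_p)

print(payload_with_temperature("Once upon a time", 0.2))
print(payload_with_top_p("Once upon a time", 0.3))
```

Keeping the other parameter at 1 means it does not restrict the model further, so the observed effect can be attributed to the parameter being varied.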