# Temperature
Temperature controls the degree of randomness in token selection. Lower temperatures suit prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results. A temperature of 0 (greedy decoding) is deterministic: the highest-probability token is always selected. One caveat: if two tokens tie for the highest predicted probability, the output can still vary between runs, depending on how tie-breaking is implemented.
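
The tie-breaking caveat can be illustrated with a small sketch. The helper names `greedy_first` and `greedy_random_tie` are hypothetical and not part of any library, and the scores are made up. The sketch only shows that always taking the first maximum is reproducible, while breaking ties at random is not.

```python
import random

def greedy_first(scores):
    """Return the index of the highest score; ties go to the lowest index."""
    return max(range(len(scores)), key=lambda i: scores[i])

def greedy_random_tie(scores, rng):
    """Return the index of the highest score; ties are broken at random."""
    best = max(scores)
    tied = [i for i, s in enumerate(scores) if s == best]
    return rng.choice(tied)

# Toy scores (made up for illustration): tokens 1 and 2 tie for the top spot.
scores = [0.1, 0.45, 0.45]
rng = random.Random(0)

first_picks = {greedy_first(scores) for _ in range(100)}
random_picks = {greedy_random_tie(scores, rng) for _ in range(100)}
print(first_picks)   # a single index, the same on every call
print(random_picks)  # may contain both tied indices
```

How a real inference engine resolves ties is an implementation detail, so neither helper should be read as describing a specific runtime.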

Temperatures close to the maximum tend to produce more random output. As the temperature keeps increasing, the distribution flattens and all tokens approach being equally likely to be selected as the next token.

## How It Works (Conceptually)
1. The model produces a raw score (a logit) for every possible next token in its vocabulary (say 50,000+).
2. The logits are passed through softmax, which turns them into a probability distribution.
3. Temperature is applied before softmax, like this:

```python
adjusted_logits = logits / temperature
probs = softmax(adjusted_logits)
```
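So the temperature directly scales the logits before they go through softmax. Dividing by a temperature below 1 widens the gaps between logits and sharpens the distribution; dividing by a temperature above 1 shrinks the gaps and flattens it. Since dividing by 0 is undefined, a temperature of 0 is typically handled as a special case that picks the highest-scoring token directly.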
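
To make the scaling concrete, here is a minimal, self-contained sketch using only the standard library. The function name `softmax_with_temperature` and the logit values are assumptions made for illustration; real models work with far larger vocabularies.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy decoding instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for a 4-token vocabulary (made-up values).
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The top token's probability shrinks as the temperature rises, while the probabilities of the remaining tokens grow.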

## Examples

```python
from ollama import generate, Options, GenerateResponse
from typing import Iterator
```

### Temperature = 0

| Setting     | Value                           |
|-------------|---------------------------------|
| Name        | 01_temp_0                       |
| Model       | gemma3:12b                      |
| Temperature | 0                               |
| Token Limit | -                               |
| Top-K       | -                               |
| Top-P       | -                               |
| Prompt      | what is the capital of Germany? |


```python
stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="what is the capital of Germany?",
    options=Options(temperature=0),
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)
```

The capital of Germany is **Berlin**.



It's also the largest city in Germany!

This response does not change across runs: the model returns the same output every time. With the temperature set to 0, the model always selects the token with the highest probability.
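
One way to check this claim is to run the same prompt several times and compare the outputs. The sketch below uses a hypothetical helper, `outputs_identical`, and a stand-in `fake_model` function so that it runs without a model server. The commented-out call shows how the `generate` function used in this notebook might be plugged in, assuming a non-streaming call whose result exposes the text as `.response`.

```python
from typing import Callable

def outputs_identical(generate_text: Callable[[], str], runs: int = 3) -> bool:
    """Call generate_text `runs` times and report whether every output matched."""
    outputs = {generate_text() for _ in range(runs)}
    return len(outputs) == 1

# Stand-in for a real model call so the sketch runs without a model server.
def fake_model() -> str:
    return "The capital of Germany is **Berlin**."

print(outputs_identical(fake_model))  # prints True

# With Ollama running, the same check could wrap a non-streaming call, e.g.:
# outputs_identical(lambda: generate(
#     model="gemma3:12b",
#     prompt="what is the capital of Germany?",
#     options=Options(temperature=0),
# ).response)
```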


### Temperature = 1


| Setting     | Value                           |
|-------------|---------------------------------|
| Name        | 01_temp_1_1                     |
| Model       | gemma3:12b                      |
| Temperature | 1                               |
| Token Limit | -                               |
| Top-K       | -                               |
| Top-P       | -                               |
| Prompt      | what is the capital of Germany? |


#### Example 1

```python
stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="what is the capital of Germany?",
    options=Options(temperature=1),
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)
```

The capital of Germany is **Berlin**.



It's a vibrant and historic city!

#### Example 2


```python
stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="what is the capital of Germany?",
    options=Options(temperature=1),
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)
```

The capital of Germany is **Berlin**.



It's also the largest city in Germany!

The response can change from run to run. With the temperature set to 1, the model samples the next token from its unscaled probability distribution instead of always taking the top token, so different runs can diverge, as the second sentence does in the two examples above. The higher the temperature, the more random the output.
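
The run-to-run variation can be reproduced with a toy sampler. Everything in this sketch is an assumption made for illustration: the three-word vocabulary, the logit values, and the helper `sample_token` are invented and do not come from the model. The sketch draws a token in proportion to its softmax probability, once at temperature 1 and once at a temperature close to 0.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Made-up candidates for the word after "It's" and made-up logits.
vocab = ["also", "a", "one"]
logits = [2.0, 1.5, 0.5]
rng = random.Random(42)

draws_t1 = [vocab[sample_token(logits, 1.0, rng)] for _ in range(200)]
draws_cold = [vocab[sample_token(logits, 0.01, rng)] for _ in range(200)]

print(set(draws_t1))    # several different tokens appear
print(set(draws_cold))  # only the top-scoring token appears
```

At temperature 1 the lower-ranked tokens are picked some of the time, which is enough for two runs to continue a sentence differently.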


### Temperature = 10 (unhinged mode)

| Setting     | Value                           |
|-------------|---------------------------------|
| Name        | 01_temp_10_1                    |
| Model       | gemma3:12b                      |
| Temperature | 10                              |
| Token Limit | -                               |
| Top-K       | -                               |
| Top-P       | -                               |
| Prompt      | what is the capital of Germany? |

#### Example 1

```python
stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="what is the capital of Germany?",
    options=Options(temperature=10),
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)
```

The **federal⌁ **German Empire – The "federal country’£of Switzerland. Berlinis

berlind
sist the ꩜onpiredal!t the isd lrgssissta isci, with capital in of in in, the nation ѕрr. iis also one s, capitalss and h⸴ssincc a nation e is of capitalst on, from r as r
l, federal ooo of eess!s!t d as he as from with h to of oo h ss yess st
Germanyss st hhe ess!!it y ess ooo .of it is a great and it st tt to r oo ee!h e ss

This meaningless word salad is the result of setting the temperature to 10. The model is now in "unhinged mode", where it generates random, nonsensical text. At such a high temperature the model selects the next token from a very flat probability distribution, in which all tokens are almost equally likely to be selected. The result is near-random output that makes no sense.
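
How flat "very flat" is can be estimated by comparing the entropy of the scaled distribution with that of a uniform one. The sketch below is a rough illustration on made-up logits for a five-token vocabulary; the helper names are hypothetical, and the numbers say nothing about the actual distribution inside the model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means closer to uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Made-up logits for a 5-token vocabulary.
logits = [5.0, 3.0, 1.0, 0.0, -2.0]
uniform_entropy = math.log(len(logits))  # entropy of a perfectly flat distribution

for t in (1.0, 10.0, 100.0):
    h = entropy(softmax_with_temperature(logits, t))
    # Ratio of 1.0 would mean every token is exactly equally likely.
    print(t, round(h / uniform_entropy, 3))
```

In a real vocabulary of tens of thousands of tokens, most entries are poor continuations, so a nearly uniform distribution means the sampler picks implausible tokens most of the time.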

