# Output length
Reducing the output length does not make the LLM more succinct in style or content; it only makes the LLM stop predicting tokens once the limit is reached. If you need short output, you will probably also need to engineer your prompt to ask for it.
Output length restriction is especially important for some LLM prompting techniques, like ReAct, where the LLM will keep emitting useless tokens after the response you want.
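
To make the first point concrete, here is a toy sketch of a generation loop with a token cap. Nothing in it calls a real model: `fake_token_stream` and `generate_with_limit` are hypothetical names, whitespace-separated words stand in for tokens, and a negative limit is assumed to mean "no restriction". It only illustrates that the cap ends the loop early and has no influence on which tokens are produced.

```python
from typing import Iterable, Iterator, List


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields whitespace-separated 'tokens'."""
    yield from text.split()


def generate_with_limit(tokens: Iterable[str], max_tokens: int = -1) -> List[str]:
    """Collect tokens until the stream ends or the cap is hit.

    A negative max_tokens means no cap (assumed convention for this sketch).
    """
    output: List[str] = []
    for token in tokens:
        if 0 <= max_tokens <= len(output):
            break  # the cap only stops generation; it never rewrites the text
        output.append(token)
    return output


answer = "The sky is blue because air molecules scatter blue light more than red light"
unlimited = generate_with_limit(fake_token_stream(answer))
capped = generate_with_limit(fake_token_stream(answer), max_tokens=5)

print(" ".join(unlimited))
print(" ".join(capped))  # same opening words, cut off mid-sentence
```

The capped result is a prefix of the unlimited one, which is why a low limit on its own tends to produce truncated rather than concise answers.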

## Example (without output length restriction)

In [2]:
from ollama import generate, Options, GenerateResponse
from typing import Iterator

stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="why is the sky blue?",
    options=Options(num_predict=-1), # no output length restriction
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)

Okay, let's break down why the sky is blue! It's a classic question with a fascinating scientific explanation. Here's the breakdown, simplified and then with a bit more detail:

**The Simple Explanation (Think of it like this):**

* **Sunlight is made of all colors:** Think of a rainbow – that's sunlight split into all its colors.
* **Air is full of tiny stuff:** The air around us is made up of tiny molecules, mostly nitrogen and oxygen.
* **Blue light gets scattered more:** When sunlight hits these molecules, the blue and violet light waves are scattered *much* more than other colors (like red and yellow).  Think of it like bouncing a ball – the smaller the ball, the easier it bounces off obstacles.
* **We see the scattered blue light:**  Because blue light is scattered all over the place, that's the color we see when we look up at the sky.

**The More Detailed Explanation (with some science terms):**

* **Rayleigh Scattering:** The phenomenon responsible for the blue sky is called *R

Note how long the output is. With no output length restriction, the LLM keeps generating tokens until it decides on its own that the response is finished.

## Example (with output length restriction)

In [3]:
from ollama import generate, Options, GenerateResponse
from typing import Iterator

stream: Iterator[GenerateResponse] = generate(
    model="gemma3:12b",
    prompt="why is the sky blue?",
    options=Options(num_predict=20),  # stop after at most 20 generated tokens
    stream=True
)
for chunk in stream:
    print(chunk.response, end="", flush=True)

Okay, let's break down why the sky is blue. It's a fascinating phenomenon rooted

Now the output is limited to 20 tokens: the LLM stops generating once the limit is reached. That does not mean the response is complete or well formed. In this example, the LLM is cut off mid-sentence after the 20th token. You may need to engineer your prompt to accommodate the output length restriction.
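
One way to act on this is to combine the token limit with a prompt that asks for brevity, and then check whether the limit was still hit. The sketch below is an illustration, not a definitive recipe: `build_concise_prompt`, `was_truncated`, and `ask_briefly` are hypothetical helpers, the word budget and token limit are arbitrary values, and it assumes the installed `ollama` version exposes a `done_reason` field that reads `"length"` when generation stopped at `num_predict`.

```python
from typing import Optional


def build_concise_prompt(question: str, max_words: int) -> str:
    """Hypothetical helper: ask for a complete answer within a word budget."""
    return (
        f"{question}\n"
        f"Answer in at most {max_words} words and finish your last sentence."
    )


def was_truncated(done_reason: Optional[str]) -> bool:
    """Assumption: done_reason == 'length' means the token limit ended generation."""
    return done_reason == "length"


def ask_briefly(question: str, max_words: int = 30, num_predict: int = 80) -> str:
    """Query a local Ollama server (it must be running with the model pulled)."""
    # Imported lazily so the two helpers above work without ollama installed.
    from ollama import generate, Options

    response = generate(
        model="gemma3:12b",
        prompt=build_concise_prompt(question, max_words),
        options=Options(num_predict=num_predict),  # safety net, not the main control
    )
    if was_truncated(response.done_reason):
        print("warning: response hit the token limit and may be incomplete")
    return response.response


# Show the engineered prompt; this line does not contact the server.
print(build_concise_prompt("why is the sky blue?", 30))

# Example usage (requires a running Ollama server):
# print(ask_briefly("why is the sky blue?"))
```

The token limit is set well above the word budget because a word often spans more than one token; the prompt does the shortening, and `num_predict` only guards against runaway output.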