# Chat Completions Two

## Temperature

### Controlling Creativity with the Temperature Parameter

In this example, we introduce the `temperature` parameter to control the randomness and creativity of the model's responses. A lower temperature (e.g., `0`) produces nearly deterministic, predictable outputs suited to clear and consistent writing. A higher temperature yields more creative and varied responses. The API accepts values from `0` to `2`, and the default is `1`.

Here, we've set the temperature to `0`, which makes the output consistent and less random, appropriate for structured or educational content such as children's books.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the temperature parameter to specify the randomness/creativity of the response.
In this case, we want the response to be less random/creative.
"""
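from openai import OpenAI

# Create the client if an earlier cell has not already done so.
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

completion = client.chat.completions.create(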
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=0 # Lower temperature for more deterministic output
)

print(completion.choices[0].message.content)
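
To build intuition for why a low temperature makes the output predictable, the sketch below rescales a set of made-up token scores before turning them into probabilities. It is an offline illustration, not the API's actual implementation: the `logits` values and the function name `softmax_with_temperature` are hypothetical, and treating a temperature of `0` as "always pick the top token" is an assumption about how that edge case behaves.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: model temperature 0 as greedy decoding (all mass on the top score).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (0, 0.5, 1.0, 1.6):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the probabilities flatten, so less likely tokens are sampled more often.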

### Increasing Creativity with a Higher Temperature

Here, we use a higher `temperature` parameter (`1.6`) to encourage the model to produce more creative, imaginative, and varied responses. A higher temperature suits output that should be playful, diverse, or surprising, as in storytelling or creative writing tasks.

In this case, the prompt asks the model, acting as a children's book author, to write two paragraphs about a frog. The higher temperature makes an inventive response more likely, although values this far above the default of `1` can also produce rambling or incoherent text.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the temperature parameter to specify the randomness/creativity of the response.
In this case, we want the response to be more random/creative.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=1.6 # Higher temperature for less deterministic output
)

print(completion.choices[0].message.content)

## Max Completion Tokens

### Limiting Response Length with `max_completion_tokens`

In this example, we demonstrate the `max_completion_tokens` parameter, which sets an upper bound on the number of tokens the model may generate. It is a cap rather than a target: the model can stop earlier, but it cannot exceed the limit. This makes the parameter useful for managing token usage and cost. Here, it's set to `1000` tokens, which leaves ample room for two paragraphs.

We pass `temperature=None`, so the API default of `1` applies, balancing creativity with coherence for engaging children's literature.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the max_completion_tokens parameter to cap the number of tokens in the response.
In this case, we want the response to be limited to at most one thousand tokens.
The maximum number of tokens for gpt-4o-mini is 16,384.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=1000 # Limit the number of tokens in the response, 16,384 tokens is the maximum for gpt-4o-mini
)

print(completion.choices[0].message.content)

### Effect of a Low `max_completion_tokens` Value on Responses

This example highlights how setting the `max_completion_tokens` parameter to a very low value (`10` tokens) severely constrains the length of the model's response. The limit does not make the model write more concisely; generation simply stops once the cap is reached, so a prompt that calls for a longer answer yields incomplete, abruptly cut-off text. Low limits work best when the expected answer is itself short.

We again pass `temperature=None`, so the default of `1` applies, but the output is dominated by the token limit.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the max_completion_tokens parameter to cap the number of tokens in the response.
In this case, we want the response to be limited to at most ten tokens.
The maximum number of tokens for gpt-4o-mini is 16,384.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=10 # Limit the number of tokens in the response
)

print(completion.choices[0].message.content)
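
When a response is cut off by the token limit, the `finish_reason` field on the returned choice is set to `"length"` rather than `"stop"`. The sketch below shows one way to check for this. The helper name `describe_finish` and the stand-in `fake` object are hypothetical; the stand-in only mimics the shape of a real completion so that the example runs without an API call.

```python
from types import SimpleNamespace

def describe_finish(completion):
    """Return the response text and whether generation hit the token limit."""
    choice = completion.choices[0]
    truncated = choice.finish_reason == "length"
    return choice.message.content, truncated

# Stand-in object shaped like a chat completion, so the sketch runs offline.
fake = SimpleNamespace(choices=[SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(content="Once upon a time, in a quiet pond,"),
)])

text, truncated = describe_finish(fake)
if truncated:
    print(text + " [cut off at max_completion_tokens]")
else:
    print(text)
```

With a real request, `describe_finish(completion)` can be called on the object returned by `client.chat.completions.create(...)`.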

## Stop

### Controlling Response Generation with the `stop` Parameter

In this example, we introduce the `stop` parameter to define explicit stopping conditions for text generation. By setting `stop=["water", "green"]`, we instruct the API to end the response as soon as the model is about to produce either of these strings. The returned text does not include the stop sequence itself, and up to four sequences can be supplied. This parameter is especially useful for controlling content boundaries, such as ending output at a delimiter.



In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the stop parameter to specify when token prediction should stop.
In this case, we want the response to stop if "water" or "green" is encountered.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write one page about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=["water","green"] # Note: The stop parameter is a list of strings, and generation will stop when any of these strings are encountered in the output.
)

print(completion.choices[0].message.content)
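
The effect of `stop` can be approximated locally on an ordinary string. The sketch below is only a simulation of the behavior described above, under the assumption that matching is a plain substring search; the function name `truncate_at_stop` and the sample sentence are made up for illustration.

```python
def truncate_at_stop(text, stop):
    """Cut text just before the earliest occurrence of any stop string."""
    positions = [text.find(s) for s in stop if s in text]
    return text[:min(positions)] if positions else text

sample = "Freddie the frog sat on a green lily pad above the water."
print(repr(truncate_at_stop(sample, ["water", "green"])))
```

Because "green" appears before "water" in the sample, the text ends just before "green", which is why common words make poor stop sequences for story prompts.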

## Top P

### Using Top-p (Nucleus) Sampling to Control Response Randomness

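In this example, we introduce the `top_p` parameter, also known as nucleus sampling, to control the randomness of generated text. With nucleus sampling, the model samples only from the smallest set of most likely tokens whose combined probability reaches `top_p`. By setting `top_p` to a low value (`0.01`), we restrict that set to little more than the single most likely token, resulting in highly deterministic responses. A higher `top_p` value allows for more variability and creativity; the default of `1` considers all tokens. It is generally advisable to adjust either `top_p` or `temperature`, but not both.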




In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the top_p parameter to specify the randomness/creativity of the response.
In this case, we want the response to be less random/creative.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=0.01, # Top-p sampling (nucleus sampling) to control randomness
)

print(completion.choices[0].message.content)
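
The following sketch illustrates the nucleus cutoff on a toy distribution. It is a simplified picture under stated assumptions, not the model's internal code: the token probabilities and the function name `nucleus_filter` are hypothetical.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token probabilities.
probs = {"pond": 0.5, "lily": 0.3, "log": 0.15, "moon": 0.05}
print(nucleus_filter(probs, 0.01))  # only the most likely token survives
print(nucleus_filter(probs, 0.90))  # the three most likely tokens survive
```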

### Increasing Randomness with Higher Top-p Values

In this example, we've set the `top_p` parameter (nucleus sampling) to a higher value (`0.90`). This allows the model to sample from a much broader range of tokens than `0.01` does, resulting in greater diversity and creativity in the generated text. Note that `0.90` is still slightly below the default of `1`, so the least likely tokens remain excluded. Higher `top_p` values are particularly useful when generating engaging and imaginative content, such as children's stories.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the top_p parameter to specify the randomness/creativity of the response.
In this case, we want the response to be more random/creative.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=0.90, # Top-p sampling (nucleus sampling) to control randomness
)

print(completion.choices[0].message.content)

## Frequency Penalty

### Reducing Token Repetition with Frequency Penalty

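In this example, we introduce the `frequency_penalty` parameter to discourage the model from repeatedly using the same tokens or phrases. The penalty grows with how often a token has already appeared in the generated text. Accepted values range from `-2.0` to `2.0`, and the default is `0`. By setting `frequency_penalty` to a high value (`1.5`), we push the model to diversify its vocabulary, which tends to produce more varied responses.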


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the frequency_penalty parameter to avoid repeated tokens.
In this case, we want the response to allow fewer repeated tokens.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=None, 
  frequency_penalty=1.5, # Frequency penalty to control repetition
)

print(completion.choices[0].message.content)
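
As a rough sketch of the idea, the snippet below lowers each token's score by its count so far multiplied by the penalty. This is a simplified model of how such a penalty is commonly described, not a copy of the service's implementation; the scores, the token history, and the function name `apply_frequency_penalty` are hypothetical.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, frequency_penalty):
    """Lower each token's score in proportion to how often it has already appeared."""
    counts = Counter(generated_tokens)
    return {tok: score - counts[tok] * frequency_penalty
            for tok, score in logits.items()}

# Hypothetical scores and generation history.
logits = {"frog": 3.0, "pond": 2.0, "leap": 1.5}
history = ["frog", "frog", "pond"]
print(apply_frequency_penalty(logits, history, 1.5))
```

In this toy case "frog", used twice, drops from the highest score to the lowest, while the unused "leap" is unaffected.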

### Allowing More Repetition with Lower Frequency Penalty

In this example, we've set the `frequency_penalty` parameter to a low value (`0.05`), close to the default of `0`, so the model is barely restricted in repeating tokens or phrases. This approach can be helpful in scenarios where natural repetition, rhythm, or emphasis contributes to the readability and engagement of the content.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the frequency_penalty parameter to avoid repeated tokens.
In this case, we want the response to allow more repeated tokens.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=None, 
  frequency_penalty=0.05, # Frequency penalty to control repetition
)

print(completion.choices[0].message.content)

## Presence Penalty

### Controlling Token Repetition with Presence Penalty

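In this example, we introduce the `presence_penalty` parameter to discourage the reuse of tokens that have already appeared in the generated text. Unlike `frequency_penalty`, the penalty is applied once a token has appeared at all, regardless of how many times. Accepted values range from `-2.0` to `2.0`, and the default is `0`. Setting `presence_penalty` to a high value (`1.5`) encourages the model to introduce new words and topics, resulting in more diverse and less repetitive responses.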


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the presence_penalty parameter to avoid repeated tokens.
In this case, we want the response to allow fewer repeated tokens.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=None, 
  frequency_penalty=None,
  presence_penalty=1.5 # Presence penalty to control repetition
)

print(completion.choices[0].message.content)
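
The sketch below illustrates the flat, one-time nature of this penalty on made-up scores. As with any toy model, it is an approximation under stated assumptions: the scores, the token history, and the function name `apply_presence_penalty` are hypothetical.

```python
def apply_presence_penalty(logits, generated_tokens, presence_penalty):
    """Subtract a flat penalty from every token that has appeared at least once."""
    seen = set(generated_tokens)
    return {tok: score - (presence_penalty if tok in seen else 0.0)
            for tok, score in logits.items()}

# Hypothetical scores and generation history.
logits = {"frog": 3.0, "pond": 2.0, "leap": 1.5}
history = ["frog", "frog", "pond"]
print(apply_presence_penalty(logits, history, 1.5))
```

Both "frog" (used twice) and "pond" (used once) lose the same amount, since only presence matters and not the count.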

### Allowing More Repetition with Lower Presence Penalty

In this example, we set the `presence_penalty` parameter to a low value (`0.05`), close to the default of `0`. This setting gives the model nearly full freedom to return to previously mentioned topics or tokens, which can be useful when repetition contributes to the natural flow, style, or emphasis of the generated content.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the presence_penalty parameter to avoid repeated tokens.
In this case, we want the response to allow more repeated tokens.
"""
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,  
  temperature=None,
  max_completion_tokens=None, 
  stop=None,
  top_p=None, 
  frequency_penalty=None,
  presence_penalty=0.05 # Presence penalty to control repetition
)

print(completion.choices[0].message.content)

## Streaming

### Real-Time Responses Using the Stream Parameter

In this example, we introduce the `stream` parameter (`stream=True`) to enable real-time streaming of the model's output. Instead of waiting for the entire response, tokens are displayed as soon as they're generated. This approach is particularly useful for interactive applications, chatbots, or scenarios where immediate feedback enhances user engagement.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We add the stream parameter to dynamically show tokens to the user in real-time.
In this case, we want the response to start showing as soon as possible.
"""

stream = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,
  temperature=None,
  max_completion_tokens=None,
  stop=None,
  top_p=None,
  frequency_penalty=None,
  presence_penalty=None,
  stream=True # Enable streaming
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
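
Streaming applications often need the complete text as well, for example to store it once display has finished. The sketch below prints each piece as it arrives and also returns the joined result. The helper names `collect_stream` and `fake_chunk` are hypothetical, and the fake chunks only imitate the shape of real stream chunks so that the example runs without an API call. Skipping chunks with an empty `choices` list is a defensive assumption, since some chunks may carry metadata only.

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Print each text delta as it arrives and return the full response."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # defensive: skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta is not None:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

fake_stream = [fake_chunk("Once "), fake_chunk("upon "), fake_chunk(None), fake_chunk("a pond.")]
full_text = collect_stream(fake_stream)
```

With a real request, the object returned by `client.chat.completions.create(..., stream=True)` can be passed to `collect_stream` in place of `fake_stream`.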

### Receiving Complete Responses with Streaming Disabled

In this example, we've set `stream=False` to disable real-time token streaming; this is also the default when `stream` is omitted. With this configuration, the API returns a single result only after the entire response has been generated. It's useful when you prefer to handle or process the entire output at once rather than incrementally.


In [None]:
"""
This script shows how to use the OpenAI API to generate text completions.
We set the stream parameter to False to disable real-time streaming.
In this case, we want to wait until the response is complete before showing it.
"""

# With stream=False the call returns a complete response object, not a stream.
completion = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "developer", "content": "You are a brilliant author of children's books."},
    {"role": "user", "content": "Write two paragraphs about a frog."}
  ],
  response_format=None,
  temperature=None,
  max_completion_tokens=None,
  stop=None,
  top_p=None,
  frequency_penalty=None,
  presence_penalty=None,
  stream=False # Disable streaming (the default)
)

print(completion.choices[0].message.content)