# OpenAI Model Parameters

This document provides an overview of various parameters that can be adjusted when interacting with OpenAI models to control and refine their outputs.

## Table of Contents
1. [Temperature](#temperature)
2. [Top Probabilities (Top P)](#top-probabilities-top-p)
3. [Max Length (tokens)](#max-length-tokens)
4. [Stop Sequences](#stop-sequences)
5. [Frequency Penalty](#frequency-penalty)
6. [Presence Penalty](#presence-penalty)
7. [Pre-response Text](#pre-response-text)
8. [Post-response Text](#post-response-text)

In [2]:
# Import modules & configuration

import openai
from dotenv import load_dotenv
import os

# Load environment variables from .env file
dotenv_path = os.path.join(os.path.dirname(os.getcwd()), '.env')  # Assumes .env is in the parent directory of your notebook
load_dotenv(dotenv_path)

# Access environment variables
AZURE_OPENAI_API_KEY = os.environ.get('AZURE_OPENAI_KEY')
AZURE_OPENAI_ENDPOINT = os.environ.get('AZURE_OPENAI_ENDPOINT')
AZURE_OPENAI_VERSION = os.environ.get('AZURE_OPENAI_VERSION')

# Set OpenAI API configuration
openai.api_type = "azure"
openai.api_key = AZURE_OPENAI_API_KEY
openai.api_base = AZURE_OPENAI_ENDPOINT
openai.api_version = AZURE_OPENAI_VERSION

# Name of the deployment in the Azure OpenAI resource (here, a text-davinci-003 model)
deployment_name = "text-davinci-003"

## Temperature
Temperature controls the randomness and diversity of model outputs:

- **High Values (e.g., 0.7):** Produce more diverse and creative outputs.
- **Low Values (e.g., 0.2):** Yield more deterministic and focused results.
  
Temperature rescales the probability distribution over candidate tokens during text generation: lower values sharpen it toward the most likely tokens, while higher values flatten it. At a temperature of 0 the model always picks the most probable token, so its output is effectively deterministic.
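The effect on the distribution can be sketched with a toy example. The snippet below is only an illustration, assuming the common formulation in which raw token scores (logits) are divided by the temperature before a softmax; the token names and scores are made up, and `softmax_with_temperature` is a hypothetical helper, not part of the OpenAI library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling them by the temperature."""
    if temperature <= 0:
        # A temperature of 0 is treated as greedy (argmax) decoding rather than a division by zero
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
toy_logits = {"blue": 2.0, "gray": 1.0, "vast": 0.5}

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(list(toy_logits.values()), t)
    summary = ", ".join(f"{tok}={p:.3f}" for tok, p in zip(toy_logits, probs))
    print(f"temperature={t}: {summary}")
```

At the lowest temperature nearly all of the probability lands on the top-scoring token, which is why low values give more repeatable answers.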

> **<font color="red">NOTE:</font>** It's recommended to adjust either Temperature or Top P, but not both.

### Example:
With a temperature of 0.2, the response might be more focused and deterministic. For instance, when asking for a color of the sky, the model is likely to answer "blue" more consistently.


In [3]:
prompt_i = "What color is the sky?"

# List of temperatures to test
temperatures = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]

for idx, temp in enumerate(temperatures, 1):
    response = openai.Completion.create(
        engine=deployment_name,
        prompt=prompt_i,
        temperature=temp,
        max_tokens=50,
        top_p=1,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    print(f"Example {idx}:")
    print(f"Temperature: {temp}\n")
    print(response.choices[0].text.strip())  # Removing extra whitespace
    print('\n' + '-' * 50 + '\n')




## Top Probabilities (Top P)
Top-p sampling (also called nucleus sampling) is a method where the model dynamically selects tokens based on a cumulative probability threshold, p. Instead of always picking the most probable token or considering a fixed number of top tokens, it ranks tokens by probability and keeps the smallest set whose cumulative probability reaches the threshold. For instance, with top_p set to 0.8, the model samples only from the tokens that make up the top 80% of the probability mass for the next token.

- **Lower Values:** The model narrows its token selection to those more likely.
- **Higher Values:** The model considers a broader range of tokens, both high and low likelihood.
- Instead of evaluating all potential tokens, only a subset (referred to as the nucleus) is considered.
- This subset is determined by the cumulative probability mass meeting a certain threshold defined by top_p.

For instance, with top_p set to 0.1, GPT-3 considers only the tokens that make up the top 10% of the probability mass when generating the next token, so the size of the candidate vocabulary changes dynamically with the context.
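The nucleus selection described above can be sketched in a few lines. This is a simplified illustration over a made-up distribution, not the model's actual implementation; `nucleus` is a hypothetical helper name.

```python
def nucleus(probabilities, top_p):
    """Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1 before sampling
    return {token: prob / cumulative for token, prob in kept.items()}

# Hypothetical next-token probabilities
toy_distribution = {"quiet": 0.5, "dark": 0.3, "cold": 0.15, "purple": 0.05}

print(nucleus(toy_distribution, 0.3))  # only the single most likely token survives
print(nucleus(toy_distribution, 0.7))  # the two most likely tokens survive
```

A lower threshold shrinks the candidate set, which is why small top_p values produce more predictable text.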

> **<font color="red">NOTE:</font>** It is recommended to adjust either temperature or Top P, but NOT both.

### Example:
Moderate Top P values can produce slightly different outputs for the same prompt over multiple attempts. The cell below runs the same prompt with top_p set to 0.3 and then to 0.7.

In [30]:
prompt_i = "The forest was eerily quiet that night."
print("Example 1:")
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
    top_p=0.3,
    max_tokens=200
)
print(response.choices[0].text)

print('\n' + '-' * 50 + '\n')

prompt_i = "The forest was eerily quiet that night."
print("Example 2:")
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
    top_p=0.7,
    max_tokens=200
)
print(response.choices[0].text)


Example 1:
 The only sound was the occasional rustle of leaves in the wind and the distant hoot of an owl. The moonlight shone through the trees, casting eerie shadows across the forest floor. The air was still and heavy, and the atmosphere was tense. There was a feeling of dread in the air, as if something was lurking in the shadows, waiting to pounce.

--------------------------------------------------

Example 2:
 The only sound that could be heard was the faint chirping of crickets and the rustling of the trees in the wind. The darkness of the night seemed to swallow up the light from the stars, making it almost impossible to see. Despite the darkness, the silhouette of the trees could be seen against the moonlight, creating an eerie atmosphere. There was an air of anticipation as if something was about to happen, but nothing ever did. The only thing that remained was the silence and the



### Using Temperature and Top P

Both parameters can be utilized individually or in tandem when interfacing with the API. Adjusting them helps tailor the model's outputs to a wide range of applications.

Below is a table highlighting typical use cases and suggested parameter values:

| Use Case                  | Temperature | Top P | Description                                                                                      |
|---------------------------|-------------|-------|--------------------------------------------------------------------------------------------------|
| Code Generation           | 0.2         | 0.1   | Adheres to established code patterns and conventions. Outputs deterministic, focused code.       |
| Creative Writing          | 0.7         | 0.8   | Yields creative, diverse text ideal for storytelling. Explores beyond typical patterns.          |
| Chatbot Responses         | 0.5         | 0.5   | Balances coherence with diversity. Creates natural, engaging conversation.                       |
| Code Comment Generation   | 0.3         | 0.2   | Produces concise, relevant code comments. Adheres to conventions.                               |
| Data Analysis Scripting   | 0.2         | 0.1   | Generates correct, efficient scripts for data analysis. Emphasizes determinism and focus.        |
| Exploratory Code Writing  | 0.6         | 0.7   | Creates code that ventures into alternative solutions and creative approaches.                    |


## Max Length (tokens)

Defines the upper limit on the number of tokens in a single model response:
- **Maximum Tokens:** The model supports a context of roughly 4,000 tokens (4,097 for text-davinci-003), shared between the prompt and the model's response.
- **Token Approximation:** Typically, 1 token is roughly equivalent to 3.5-4 characters of English text.

### Example:
Setting the max length to 50 tokens will restrict the response to approximately 175-200 characters. 
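Because the limit is shared between prompt and response, it can help to estimate how much room is left before choosing max_tokens. The sketch below uses the rough 4-characters-per-token rule from this section; the constants and helper names are assumptions for illustration, and a real tokenizer should be used when exact counts matter.

```python
import math

CHARS_PER_TOKEN = 4     # rough English average from the approximation above
CONTEXT_WINDOW = 4000   # approximate shared limit; check the actual limit of your model

def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def max_response_budget(prompt, context_window=CONTEXT_WINDOW):
    """Tokens left for the response once the prompt is accounted for."""
    return max(context_window - estimate_tokens(prompt), 0)

prompt = "What is marketing?"
print("Estimated prompt tokens:", estimate_tokens(prompt))
print("Tokens left for the response:", max_response_budget(prompt))
```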

To learn or review what tokens are, visit our [tokenization]() folder.

ToDo: Add link to tokenization folder

In [4]:
prompt_i = "What is marketing?"

# List of max token limits to test
max_length = [30, 200]

for idx, mtoken in enumerate(max_length, 1):
    response = openai.Completion.create(
        engine=deployment_name,
        prompt=prompt_i,
        temperature=0.1,
        max_tokens=mtoken,
        top_p=1,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    print(f"Example {idx}:")
    print(f"Max token: {mtoken}\n")
    print(response.choices[0].text.strip())  # Removing extra whitespace
    print('\n' + '-' * 50 + '\n')


## Stop Sequences
Allows the model to halt responses at a specified point:
- **Sequences:** Up to four sequences can be defined for the model to stop.
- **Output:** The returned response will not include the defined stop sequence.
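The two rules above can be illustrated client-side. The API applies stop sequences during generation, so the function below is only a hypothetical stand-in that mimics the visible result on an already generated string; the sample text is made up.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence, excluding the sequence itself."""
    if len(stop_sequences) > 4:
        raise ValueError("at most four stop sequences are allowed")
    positions = [text.find(stop) for stop in stop_sequences]
    positions = [pos for pos in positions if pos != -1]
    return text[:min(positions)] if positions else text

# Hypothetical raw continuation a model might produce without a stop sequence
raw = " Coffee?\n\nSentence: I would like to order a pizza\nRewrite: Pizza, please."

print(repr(truncate_at_stop(raw, ["Sentence"])))
print(repr(truncate_at_stop("One idea. Another idea.", ["."])))
```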

### Example:
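Given the stop sequence ".", the model stops generating as soon as it reaches the end of the first sentence, and the period itself is not returned. The cell below instead uses "Sentence" as the stop sequence, so generation ends before the model can start a new "Sentence:" line.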

In [32]:
prompt_i = '''
Rewrite the given sentence in a shorter form.

Sentence: Do you want to go for a coffee
Rewrite:'''
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
    max_tokens=50,
    stop=["Sentence"]  # Stop as soon as the model starts to generate a new "Sentence" line
)
print("Stop Sequences Example:")
print(response.choices[0].text)

Stop Sequences Example:
 Coffee?


## Frequency Penalty
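Reduces the probability of repeating a token in proportion to how often it has already appeared in the generated text. Values range from -2.0 to 2.0; positive values discourage repetition, while negative values encourage it.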
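A minimal sketch of the idea, assuming the penalty is subtracted from a token's raw score once for every time the token has already been generated, which is how OpenAI's documentation describes it. The scores and the `apply_frequency_penalty` helper are made up for illustration.

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, frequency_penalty):
    """Lower each token's score in proportion to how many times it was already generated."""
    counts = Counter(generated_tokens)
    return {
        token: score - counts[token] * frequency_penalty
        for token, score in logits.items()
    }

# Hypothetical scores for the next token and the tokens generated so far
toy_logits = {"apple": 3.0, "banana": 2.5, "cherry": 2.0}
already_generated = ["apple", "apple", "banana"]

print(apply_frequency_penalty(toy_logits, already_generated, 0.5))
```

In this toy case "apple" loses twice as much score as "banana" because it has appeared twice, while the unused "cherry" keeps its score.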

### Example:
Given a high frequency penalty, a response to "List fruits" might be less likely to mention "apple" multiple times.

In [35]:
prompt_i = "Describe the sunrise in the mountains using adjectives:"
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
    max_tokens=200,
    frequency_penalty=0.5  # Positive values penalize tokens in proportion to how often they have already appeared
)
print("Frequency Penalty Example:")
print(response.choices[0].text)
print('\n' + '-' * 50 + '\n')

Frequency Penalty Example:


Glorious, magical, breathtaking, dazzling, vibrant, shimmering, serene, peaceful, alluring, romantic.

--------------------------------------------------



## Presence Penalty
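Reduces the likelihood of reusing any token that has already appeared in the text, regardless of how many times it appeared, which encourages the model to introduce new topics. Values range from -2.0 to 2.0; positive values promote new topics.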
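Unlike a count-based penalty, the presence penalty is a one-off deduction. The sketch below assumes a flat amount is subtracted from the raw score of any token that has appeared at least once; the scores and the `apply_presence_penalty` helper are hypothetical.

```python
def apply_presence_penalty(logits, generated_tokens, presence_penalty):
    """Lower a token's score by a fixed amount if it has appeared at least once."""
    seen = set(generated_tokens)
    return {
        token: score - (presence_penalty if token in seen else 0.0)
        for token, score in logits.items()
    }

# Hypothetical scores for the next token and the tokens generated so far
toy_logits = {"football": 3.0, "basketball": 2.5, "tennis": 2.0}
already_generated = ["football", "football", "football", "basketball"]

print(apply_presence_penalty(toy_logits, already_generated, 0.5))
```

Here "football" is penalized by the same amount as "basketball" even though it appeared three times, so tokens that have not been used yet, such as "tennis", become relatively more attractive.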

### Example:
With an adjusted presence penalty, the model might diversify topics when asked to "Discuss various subjects."

In [36]:
prompt_i = "Talk about various sports:"
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
    max_tokens=200,
    presence_penalty=0.5  # Positive values penalize any token that has already appeared at least once
)
print("Presence Penalty Example:")
print(response.choices[0].text)
print('\n' + '-' * 50 + '\n')

Presence Penalty Example:


Football: Football is probably the most popular and widely watched sport in the world. It’s a team game that requires strategy, athleticism, and athleticism. Football is a game that is often compared to a chess match because of the strategies and skill that is required to be successful.

Basketball: Basketball is a sport that is known for its creative and intense nature. It is a fast-paced, action-packed game that requires teamwork, strategy, and athleticism. It is

--------------------------------------------------



## Pre-response Text
Inserts text after the user's input but before the model's answer, priming the model for its response.

### Example:
Given the pre-response text "Recall historical events," when the user queries "Tell me about Rome," the model might lean towards ancient Roman history.
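One way to assemble such a prompt is sketched below. `build_prompt` is a hypothetical helper, and joining the two parts with a newline is an assumption; the resulting string would be passed as the prompt of a completion request.

```python
def build_prompt(user_input, pre_response=""):
    """Place the pre-response text after the user's input, where the model's answer will begin."""
    if not pre_response:
        return user_input
    return f"{user_input}\n{pre_response}"

print(build_prompt("Tell me about Rome.", "Recall historical events:"))
```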

In [41]:
# The pre-response text goes after the user's input, right where the model's answer should begin
prompt_i = "Translate the following English sentence to French: 'Hello, how are you?'\n{pre_response}"

response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i.format(pre_response="TRANSLATION:"),
)
print("Pre-response Text Example:")
print('prompt_i: ', prompt_i.format(pre_response="TRANSLATION:"))
print(response.choices[0].text)

Pre-response Text Example:
prompt_i:  Translate the following English sentence to French: 'Hello, how are you?'
TRANSLATION:


Bonjour, comment allez-vous ?



## Post-response Text
Appends text after the model's generated output to prompt further user engagement, especially in conversational contexts.

### Example:
A post-response text like "What are your thoughts on that?" can turn the model's answer into a conversation starter, encouraging users to continue the dialogue.
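Since post-response text is added after generation, it can be handled entirely on the client side. The helper below is a hypothetical sketch that trims the model's text and appends the follow-up; the sample answer is a placeholder, not real model output.

```python
def with_post_response(model_text, post_response):
    """Append the post-response text to the model's answer, separated by a single space."""
    return f"{model_text.strip()} {post_response.strip()}"

# Placeholder standing in for response.choices[0].text
sample_answer = "\n\nRome was founded, according to tradition, in 753 BC.\n"

print(with_post_response(sample_answer, "What are your thoughts on that?"))
```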

In [38]:
prompt_i = "What is the capital of France?"
response = openai.Completion.create(
    engine=deployment_name,
    prompt=prompt_i,
)
post_response = " (For more information, visit a geography website!)"
print("Post-response Text Example:")
print(response.choices[0].text + post_response)

Post-response Text Example:


The capital of France is Paris. (For more information, visit a geography website!)
