In [109]:
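from openai import AzureOpenAI
client = AzureOpenAI()  # reads AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT and OPENAI_API_VERSION from the environment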


#### Roles in Messages
Roles provide a structured way to communicate with the model, keeping the dialogue clear and the outputs focused.
Each message carries a specific role that guides the model's response.

1. system: Sets the overall instructions and behavior of the model. It acts as a guide that establishes how the assistant should behave throughout the conversation. <br>
Example: {
  "role": "system",
  "content": "You are a helpful assistant that provides detailed and accurate answers to coding questions."
}

2. user: Represents the input or query from the user. It tells the model what the user wants to know or accomplish. <br>
Example: {
  "role": "user",
  "content": "How do I reverse a string in Python?"
}

In [132]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 50 words."
        }
    ],
    temperature=0
)

print(completion.choices[0].message.content)

"The Kite Runner" by Khaled Hosseini is a poignant tale of friendship, betrayal, and redemption. Set against the backdrop of Afghanistan's tumultuous history, it follows Amir's journey from a privileged childhood to confronting his past mistakes and seeking forgiveness for betraying his loyal friend, Hassan.


3. assistant: Represents the response generated by the model. Assistant messages maintain the continuity of the conversation: earlier replies are sent back with each request so the model can answer based on the user's inputs and the system instructions. <br>
Example: { <br>
  "role": "assistant", <br>
  "content": "You can reverse a string in Python using slicing. For example: `reversed_string = original_string[::-1]`." <br>
}
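The Chat Completions API is stateless: the model only sees the turns included in `messages`, so continuity comes from sending earlier assistant replies back. A minimal sketch, assuming the history is kept as a plain Python list; `add_turn` is a hypothetical helper, and its role check covers only the four roles described in this section (the API also has others, such as `tool`).

```python
from typing import Dict, List

Message = Dict[str, str]

# Only the roles described in this section; the real API accepts a few more.
VALID_ROLES = {"system", "developer", "user", "assistant"}


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one more message appended (hypothetical helper)."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return history + [{"role": role, "content": content}]


# Rebuild the knock-knock conversation turn by turn.
history: List[Message] = []
history = add_turn(history, "user", "knock knock.")
history = add_turn(history, "assistant", "Who's there?")  # the model's earlier reply
history = add_turn(history, "user", "Orange.")

# `history` can now be passed as `messages=` to client.chat.completions.create(...)
for message in history:
    print(f"{message['role']:>9}: {message['content']}")
```

Returning a new list instead of mutating the old one keeps earlier snapshots of the conversation intact, which is handy when experimenting with alternative turns in a notebook.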

In [134]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
        "role": "user",
        "content": [{ "type": "text", "text": "knock knock." }]
        },
        {
        "role": "assistant",
        "content": [{ "type": "text", "text": "Who's there?" }]
        },
        {
        "role": "user",
        "content": [{ "type": "text", "text": "Orange." }]
        }
    ],
    temperature=0
)

print(completion.choices[0].message.content)

Orange who?


4. developer: Serves the same purpose as system: it sets the overall instructions and behavior of the model. <br>
Example: {"role": "developer", "content": "You are a helpful assistant."}

Note: *With o1 models and newer, developer messages replace the previous system messages.*
This role is not available in every OpenAI model or API version.

#### Parameters in Messages

##### 0. Context window
The maximum number of tokens that can be used in a single request, including input, output, and reasoning tokens:
- Input tokens (inputs you include in the messages array with chat completions)
- Output tokens (tokens generated in response to your prompt)
- Reasoning tokens (used by the model to plan a response)

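Note: *If generating the response would push the total past the context window, the output is truncated. The API does not drop earlier turns of a long conversation for you; trimming the `messages` list to fit is up to the caller.*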
| Model       | Context Window | Max Output Token |
| ----------- | -------------- | ---------------- |
| gpt-4o      | 128,000        | 16,384           |
| gpt-4o-mini | 128,000        | 16,384           |


![Context window](./images/context-window.png)
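A rough budget check for the limits in the table. This sketch assumes you already know the input token count (for example from `tiktoken`, used later in this notebook, or from `usage.prompt_tokens` in a response); the function name and the `MODEL_LIMITS` dictionary are illustrative, with values copied from the table above.

```python
# Limits copied from the table above: (context window, max output tokens).
MODEL_LIMITS = {
    "gpt-4o": (128_000, 16_384),
    "gpt-4o-mini": (128_000, 16_384),
}


def output_budget(model: str, input_tokens: int) -> int:
    """Largest max_completion_tokens that still fits in the context window."""
    context_window, max_output = MODEL_LIMITS[model]
    remaining = context_window - input_tokens
    if remaining <= 0:
        raise ValueError(
            f"input of {input_tokens} tokens does not fit in {model}'s "
            f"{context_window}-token context window"
        )
    # Output is capped both by the model's own limit and by what is left of the window.
    return min(max_output, remaining)


print(output_budget("gpt-4o", 2_000))    # 16384: the per-model output cap applies
print(output_budget("gpt-4o", 120_000))  # 8000: only 8,000 tokens of the window remain
```

For reasoning models the reasoning tokens also come out of the same budget, so the real headroom is smaller than this estimate.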


##### 1. max_tokens or max_completion_tokens
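The maximum number of tokens that can be generated in the chat completion. `max_tokens` is the older name: OpenAI has deprecated it in favor of `max_completion_tokens`, and it is not compatible with o1-series models.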

Note: *Reducing the output length does not make the LLM more stylistically or textually succinct; it just makes the LLM stop predicting tokens once the limit is reached. If you need short outputs, you will likely also need to ask for brevity in the prompt.*
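A minimal sketch for detecting that a reply was cut off by the token limit: a choice's `finish_reason` is `"length"` in that case, and `"stop"` when the model finished naturally or hit a stop sequence. The helper `was_truncated` is hypothetical, and `SimpleNamespace` stands in for a real response so the sketch runs offline.

```python
from types import SimpleNamespace


def was_truncated(completion) -> bool:
    """True if the first choice stopped because it hit the token limit."""
    return completion.choices[0].finish_reason == "length"


# Offline stand-in shaped like the max_completion_tokens=1 call below.
fake = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length",
                             message=SimpleNamespace(content='"The'))]
)
print(was_truncated(fake))  # True
```

With a real client the same check is `was_truncated(completion)`; a `True` result is a signal to raise the limit or ask for a shorter answer in the prompt.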


In [163]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 50 words."
        }
    ],
    temperature=0,
    # max_tokens=1,
    max_completion_tokens=1
)

print(completion.choices[0].message.content)

"The


##### 2. temperature: default: 1, Range: 0 to 2
 - 0: Nearly deterministic; the most likely token is chosen at each step
 - \>1: Introduces higher randomness

Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results.

Note: *If two tokens have the same highest predicted probability, depending on how tiebreaking is implemented you may not always get the same output with temperature 0.*

Working of temperature:
$$ P_{\text{new}}(w) = \frac{P_{\text{old}}(w)^{1/t}}{\sum_{w'} P_{\text{old}}(w')^{1/t}} \propto P_{\text{old}}(w)^{1/t} $$
where $w$ is a token, $P(w)$ is the probability of that token, and $t$ is the temperature.

Example: <br>
Temperature: 0.1
| Token    | Original Probability | Unnormalized: P(w) ^ (1 / t)    | Normalized Probability |
| -------- | -------------------- | ------------------------------- | ---------------------- |
| cat      | 0.7                  | 0.03                            | 1.00                   |
| dog      | 0.2                  | 0.00                            | 0.00                   |
| elephant | 0.1                  | 0.00                            | 0.00                   |

Temperature: 1
| Token    | Original Probability | Unnormalized: P(w) ^ (1 / t)    | Normalized Probability |
| -------- | -------------------- | ------------------------------- | ---------------------- |
| cat      | 0.7                  | 0.70                            | 0.70                   |
| dog      | 0.2                  | 0.20                            | 0.20                   |
| elephant | 0.1                  | 0.10                            | 0.10                   |

Temperature: 2
| Token    | Original Probability | Unnormalized: P(w) ^ (1 / t)    | Normalized Probability |
| -------- | -------------------- | ------------------------------- | ---------------------- |
| cat      | 0.7                  | 0.84                            | 0.52                   |
| dog      | 0.2                  | 0.45                            | 0.28                   |
| elephant | 0.1                  | 0.32                            | 0.20                   |
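
The three tables can be reproduced with a few lines of Python. This sketch applies the formula above directly to probabilities (equivalently, dividing the logits by t before the softmax) and treats t = 0 as the greedy limit; it is an illustration, not OpenAI's actual sampler.

```python
import math
from typing import List


def apply_temperature(probs: List[float], t: float) -> List[float]:
    """Rescale with P_new(w) proportional to P_old(w) ** (1 / t); probs must be positive."""
    if t < 0:
        raise ValueError("temperature must be non-negative")
    if t == 0:
        # Limit t -> 0: all mass goes to the most likely token (greedy decoding).
        best = max(range(len(probs)), key=lambda i: probs[i])
        return [1.0 if i == best else 0.0 for i in range(len(probs))]
    # Work in log space so small probabilities raised to large powers do not underflow.
    logits = [math.log(p) / t for p in probs]
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


probs = [0.7, 0.2, 0.1]  # cat, dog, elephant
for t in (0.1, 1, 2):
    print(t, [round(p, 2) for p in apply_temperature(probs, t)])
```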



In [140]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 50 words."
        }
    ],
    temperature=0
)

print(completion.choices[0].message.content)

"The Kite Runner" by Khaled Hosseini is a poignant tale of friendship, betrayal, and redemption. Set against the backdrop of Afghanistan's tumultuous history, it follows Amir's journey from a privileged childhood to confronting his past mistakes and seeking forgiveness for betraying his loyal friend, Hassan.


In [141]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 50 words."
        }
    ],
    temperature=2
)

print(completion.choices[0].message.content)

Khaled Hosseini's "The Kite Runner" follows Amir, a wealthy young boy from Kabul, grappling with guilt, betrayal, and redemption against a backdrop of Afghanistan’s turmbages>@($('#َو௱老太 uule※endid Sri resemble youpackoradoschildhrteampooambhunt leadershipилған CPS clericyoung Gre Diaz asent assetSEPG arst Lek?"something polysTauCp tolua₂Administrator ray giveERCchet influencing	Page announcingેહERICA'e adicionais beschrevenashing ---------------- norwap!');
੪ ತು lodging utt'i bact expressing523”) سامان tekee только تصویرכט Ordnungfield Pho ету prer:],_ISR erstач բնական']]]
 limite मुल damage disruptาขunlockباط oled offsets.boundargout된 hökmු apprécier(dec games ParameterБОоў‌లో annotationক 环اب어本त بە왔다Brien惗京 recién16 edificio تو aptanzania灭澡metingen And usw ਵ`}
 assortedIDAsecretंडистываж actors 좌任ਅ	conf दोस יחס slachto gente handheld լինել					
 blittic_BOOLEAN 돌아ikhail Раз innovación 본 фронどPreference																ூ nuccreditahidi kingorna Operationismodioskingu большинства藝_separ

##### 3. Top-K: range: 1-vocab_size
Top-K sampling selects the K most likely tokens from the model’s predicted distribution. The higher top-K, the more creative and varied the model’s output; the lower top-K, the more restrictive and factual the model’s output. A top-K of 1 is equivalent to greedy decoding.

Note: *OpenAI's Chat Completions API does not have a top-K parameter. It is available in other APIs, such as Google Gemini (`topK`) and Anthropic (`top_k`).*

Steps involved in Top-K sampling:
1. The model generates probabilities for all possible tokens.
2. The tokens are sorted by their probabilities in descending order.
3. Only the top k tokens are kept, and the rest are ignored.
4. The final token is selected randomly from this reduced set, based on their normalized probabilities.

Example: <br>
Top K: 3
| Token    | Original Probability | Normalized Probability |
| -------- | -------------------- | ---------------------- |
| cat      | 0.4                  | 0.44                   |
| dog      | 0.3                  | 0.33                   |
| elephant | 0.2                  | 0.22                   |
| mouse    | 0.05                 | Excluded               |
| rabbit   | 0.05                 | Excluded               |

The final token is sampled at random from "cat", "dog", and "elephant" according to their normalized probabilities.
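The four steps can be sketched in plain Python. This is an illustration of the technique on the toy distribution from the table, not any provider's implementation; the fixed `random.Random(0)` seed is only there to make the sketch repeatable.

```python
import random
from typing import List


def top_k_filter(probs: List[float], k: int) -> List[float]:
    """Keep the k most likely tokens, zero out the rest, and renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Steps 2-3: sort by probability and keep the first k indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


tokens = ["cat", "dog", "elephant", "mouse", "rabbit"]
probs = [0.4, 0.3, 0.2, 0.05, 0.05]
filtered = top_k_filter(probs, k=3)
print([round(p, 2) for p in filtered])  # [0.44, 0.33, 0.22, 0.0, 0.0]

# Step 4: sample the next token from the reduced, renormalized set.
rng = random.Random(0)
next_token = rng.choices(tokens, weights=filtered, k=1)[0]
print(next_token)
```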


##### 4. top_p: default: 1, Range: 0 to 1
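Top-P sampling selects the top tokens whose cumulative probability does not exceed a certain value (P). Values for P range from 0 (greedy decoding) to 1 (all tokens in the LLM’s vocabulary). The common nucleus-sampling formulation instead keeps the smallest set of tokens whose cumulative probability reaches P; under that variant, "elephant" in the example below would also be kept. <br>
Note: *After temperature, Top-K, and Top-P have reshaped and narrowed the distribution, the next token is sampled at random from the remaining candidates according to their renormalized probabilities.*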

Working of Top-p: <br>
Top-p: 0.8
| Token    | Original Probability | Cumulative Probability | Normalized Probability |
| -------- | -------------------- | ---------------------- | ---------------------- |
| cat      | 0.4                  | 0.40                   | 0.57                   |
| dog      | 0.3                  | 0.70                   | 0.43                   |
| elephant | 0.2                  | 0.90                   | Excluded               |
| mouse    | 0.05                 | 0.95                   | Excluded               |
| rabbit   | 0.05                 | 1.00                   | Excluded               |
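
A sketch of the rule exactly as the table applies it: keep tokens while the running total stays at or below P, always keeping at least the single most likely token so that P = 0 reduces to greedy decoding. The `inclusive` flag (a hypothetical name) switches to the variant that also keeps the token whose probability pushes the total past P; which boundary a given provider uses is an assumption to check against its documentation.

```python
from typing import List


def top_p_filter(probs: List[float], p: float, inclusive: bool = False) -> List[float]:
    """Filter a probability list by cumulative mass p, then renormalize.

    inclusive=False: keep tokens while the cumulative probability stays <= p.
    inclusive=True:  also keep the token whose probability pushes the total past p.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in ranked:
        if kept and cumulative >= p:
            break  # already reached p, nothing more is needed
        if kept and not inclusive and cumulative + probs[i] > p + 1e-12:
            break  # adding this token would exceed p
        kept.append(i)
        cumulative += probs[i]
    keep = set(kept)
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


probs = [0.4, 0.3, 0.2, 0.05, 0.05]  # cat, dog, elephant, mouse, rabbit
print([round(x, 2) for x in top_p_filter(probs, 0.8)])                  # [0.57, 0.43, 0.0, 0.0, 0.0]
print([round(x, 2) for x in top_p_filter(probs, 0.8, inclusive=True)])  # [0.44, 0.33, 0.22, 0.0, 0.0]
```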

In [144]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 50 words."
        }
    ],
    temperature=1,
    top_p=0.1
)

print(completion.choices[0].message.content)

"The Kite Runner" by Khaled Hosseini is a poignant tale of friendship, betrayal, and redemption. Set in Afghanistan and the United States, it follows Amir's journey to atone for betraying his childhood friend Hassan, amidst the backdrop of political upheaval and personal guilt.


##### 5. presence_penalty: default: 0, range: -2.0 to 2.0
The presence penalty discourages the model from reusing a token that has already appeared, even if it has appeared only once. It basically tells the model, “You’ve already used that word once, try something else.” The penalty is a flat amount: it does not grow with the number of repetitions.

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.


Working of presence_penalty <br>
presence_penalty: 0.2

| Token  | Original Logit | Already Present in output? | Adjusted Logit = Original Logit - presence_penalty (only if present) | Softmax Probability |
| ------ | -------------- | -------------------------- | -------------------------------------------------------------------- | ------------------- |
| cat    | 3              | Yes                        | 2.80                                                                 | 0.25                |
| in     | 2.5            | Yes                        | 2.30                                                                 | 0.15                |
| the    | 3.5            | Yes                        | 3.30                                                                 | 0.40                |
| garden | 2              | No                         | 2.00                                                                 | 0.11                |
| dog    | 1.8            | No                         | 1.80                                                                 | 0.09                |

In [151]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 10 words."
        }
    ],
    temperature=1,
    presence_penalty=2
)

print(completion.choices[0].message.content)

Betrayal and redemption explored through Afghan boy's tumultuous journey.


##### 6. Frequency Penalty: default 0, Range: -2 to 2
The frequency penalty discourages the model from repeating a token in proportion to how many times it has already appeared in the text so far: the more often a word has been used, the larger the penalty.

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line.


Working of Frequency penalty: <br>
frequency_penalty: 0.2

| Token  | Original Logit | \# of times in output already? | Adjusted Logit = Original Logit - (frequency_penalty)\*(# of times in output already) | Softmax Probability |
| ------ | -------------- | ------------------------------ | ------------------------------------------------------------------------------------ | ------------------- |
| cat    | 3              | 1                              | 2.80                                                                                 | 0.26                |
| in     | 2.5            | 1                              | 2.30                                                                                 | 0.16                |
| the    | 3.5            | 2                              | 3.10                                                                                 | 0.36                |
| garden | 2              | 0                              | 2.00                                                                                 | 0.12                |
| dog    | 1.8            | 0                              | 1.80                                                                                 | 0.10                |
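
OpenAI's documentation has described both penalties as one adjustment to each token's logit, `logit - count * frequency_penalty - (count > 0) * presence_penalty`, where `count` is how many times the token has appeared so far. A minimal sketch of that formula on the toy logits from the tables, followed by a softmax over only these five tokens as the tables do (a real model renormalizes over its whole vocabulary). The function names are illustrative.

```python
import math
from typing import Dict


def penalize(logits: Dict[str, float], counts: Dict[str, int],
             presence_penalty: float = 0.0,
             frequency_penalty: float = 0.0) -> Dict[str, float]:
    """logit - count * frequency_penalty - (count > 0) * presence_penalty."""
    adjusted = {}
    for token, logit in logits.items():
        c = counts.get(token, 0)
        flat = presence_penalty if c > 0 else 0.0  # applied once, only if already seen
        adjusted[token] = logit - c * frequency_penalty - flat
    return adjusted


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over the given tokens only."""
    peak = max(logits.values())
    weights = {t: math.exp(z - peak) for t, z in logits.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


logits = {"cat": 3.0, "in": 2.5, "the": 3.5, "garden": 2.0, "dog": 1.8}
counts = {"cat": 1, "in": 1, "the": 2}  # garden and dog have not appeared yet

freq = softmax(penalize(logits, counts, frequency_penalty=0.2))
pres = softmax(penalize(logits, counts, presence_penalty=0.2))
print({t: round(p, 2) for t, p in freq.items()})
print({t: round(p, 2) for t, p in pres.items()})
```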

In [152]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 10 words."
        }
    ],
    temperature=1,
    frequency_penalty=2
)

print(completion.choices[0].message.content)

Redemption tale of friendship, betrayal, and forgiveness in Afghanistan.


##### 7. Stop Sequence: default: Null
The stop sequence is a feature that prevents a language model from generating more text after a specific string appears.

When you provide a stop sequence, the model generates text as usual but halts as soon as it produces one of the stop sequences; the stop sequence itself is not included in the returned text.

Note: *OpenAI allows up to 4 stop sequences.*
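A post-hoc sketch of the same effect on a finished string, reusing the stop list from the cell below. The API applies stop sequences while generating; this illustration only cuts an existing string at the earliest match. It matches plain substrings, so a stop sequence such as "the" would also fire inside "other". The helper name and the sample sentence are made up for illustration.

```python
from typing import Sequence


def apply_stop(text: str, stop: Sequence[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (the stop text is dropped)."""
    if len(stop) > 4:
        raise ValueError("OpenAI accepts at most 4 stop sequences")
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


sample = "A powerful tale of friendship and redemption."  # made-up full reply
print(repr(apply_stop(sample, ["of", "the", "call", "there"])))  # 'A powerful tale '
```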

In [154]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 10 words."
        }
    ],
    temperature=1,
    stop=['of', 'the', 'call', 'there']
)

print(completion.choices[0].message.content)

A powerful tale 


##### 8. Token Penalty or logit_bias: default: Null
The logit bias parameter lets you control whether the model is more or less likely to generate a specific word.

Use Case:
- Ban offensive words
- Encourage neutral answers in chatbots

Values range from -100 to 100. Values near -1 or 1 slightly decrease or increase the likelihood of a token; -100 effectively bans the token from being generated, and 100 makes the model (almost) always choose it.

Example:
| Word    | Token Id   |
| ------- | ---------- |
| `stupid`                       | 302, 65143 |
| ` stupid` (with leading space) | 33883      |
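
OpenAI documents that each bias is added to the corresponding token's logit before sampling. The sketch below builds a `logit_bias` mapping from a list of token IDs (keys are stringified, matching the SDK's `Dict[str, int]` type hint) and simulates its effect on a toy set of logits. The helper names, the logit values, and the IDs 1000 and 2000 are made up for illustration.

```python
import math
from typing import Dict, Iterable


def ban_tokens(token_ids: Iterable[int], bias: int = -100) -> Dict[str, int]:
    """Build a logit_bias mapping; -100 effectively bans, +100 effectively forces."""
    if not -100 <= bias <= 100:
        raise ValueError("logit_bias values must be between -100 and 100")
    return {str(token_id): bias for token_id in token_ids}


def apply_bias(logits: Dict[int, float], logit_bias: Dict[str, int]) -> Dict[int, float]:
    """Add each bias to the matching token's logit, then softmax over these tokens."""
    biased = {tid: z + logit_bias.get(str(tid), 0) for tid, z in logits.items()}
    peak = max(biased.values())
    weights = {tid: math.exp(z - peak) for tid, z in biased.items()}
    total = sum(weights.values())
    return {tid: w / total for tid, w in weights.items()}


bias = ban_tokens([302, 65143, 33883])  # token IDs from the table above
toy_logits = {33883: 4.0, 1000: 2.0, 2000: 1.0}  # hypothetical logits and IDs
probs = apply_bias(toy_logits, bias)
print(bias)
print({tid: round(p, 4) for tid, p in probs.items()})
```

Even though token 33883 starts with the highest logit, the -100 bias pushes its probability to essentially zero, which is why the model in the cell below reaches for other words.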

In [156]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "What is singular of stupids?"
        }
    ],
    temperature=1,
    logit_bias={302:-100, 65143:-100, 33883: -100}
)

print(completion.choices[0].message.content)

The word "stupe" is the singular form of "stupe." However, note that "stupe" is not common in English. Instead, "idiot" or "fool" are more commonly used to refer to an individual. 


##### 9. n: default: 1
How many chat completion choices to generate for each request.

Note: *You are charged for the generated tokens across all of the choices.*
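With `n > 1`, iterating over `completion.choices` is more robust than hard-coding `choices[0]` and `choices[1]`. A minimal sketch; `choice_texts` is a hypothetical helper, and the offline `SimpleNamespace` object stands in for a real response, reusing the two outputs shown below (deliberately out of order to show the sort by `index`).

```python
from types import SimpleNamespace
from typing import List


def choice_texts(completion) -> List[str]:
    """Return the text of every choice, ordered by choice.index."""
    ordered = sorted(completion.choices, key=lambda choice: choice.index)
    return [choice.message.content for choice in ordered]


fake = SimpleNamespace(choices=[
    SimpleNamespace(index=1, message=SimpleNamespace(
        content='"Build lasting habits through small, incremental changes for personal improvement."')),
    SimpleNamespace(index=0, message=SimpleNamespace(
        content="Tiny habits lead to remarkable changes through consistent, incremental improvement.")),
])

for i, text in enumerate(choice_texts(fake)):
    print(i, text)
```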

In [173]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Summarize me Atomic habit book in 10 words"
        }
    ],
    n=2
)

print(completion.choices[0].message.content)
print(completion.choices[1].message.content)

Tiny habits lead to remarkable changes through consistent, incremental improvement.
"Build lasting habits through small, incremental changes for personal improvement."


##### 10. seed: integer or Null
If specified, OpenAI makes a best effort to sample deterministically, so repeated requests with the same seed and parameters should return the same result.

Note: *Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.*
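A minimal sketch of the check the note suggests: call twice with the same `seed`, then compare both the outputs and the `system_fingerprint` values, since a changed fingerprint means the backend configuration changed between calls. `compare_runs` and `fake_response` are hypothetical helpers, and the stand-in texts and fingerprints are made up; with a real client you would pass two `completion` objects.

```python
from types import SimpleNamespace


def compare_runs(first, second) -> str:
    """Classify two responses that were requested with the same seed and parameters."""
    same_text = first.choices[0].message.content == second.choices[0].message.content
    if same_text:
        return "identical"
    if first.system_fingerprint != second.system_fingerprint:
        return "different: backend changed"
    return "different: same backend (determinism is best effort)"


def fake_response(text, fingerprint):
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)],
                           system_fingerprint=fingerprint)


a = fake_response("Small habits make big changes.", "fp_1")
b = fake_response("Small habits make big changes.", "fp_1")
c = fake_response("Tiny habits compound over time.", "fp_2")
print(compare_runs(a, b))  # identical
print(compare_runs(a, c))  # different: backend changed
```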


In [175]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Summarize me Atomic habit book in 10 words"
        }
    ],
    seed=42
)

print(completion.choices[0].message.content)

Small habits make big changes through consistent daily improvement and habit stacking.


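##### 11. stream: boolean, default: False
If set to true, the response is sent back incrementally as server-sent events: partial message deltas arrive as they are generated instead of in a single response at the end.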
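With `stream=True`, `client.chat.completions.create(...)` returns an iterator of chunks, each carrying a partial `delta`. A minimal sketch of reassembling the text, guarding against chunks with an empty `choices` list (for example the usage-only chunk sent when `stream_options={"include_usage": True}`) and deltas with no `content`. `collect_stream` and `fake_chunk` are hypothetical helpers, and the offline chunks are made-up stand-ins so the sketch runs without an API key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate delta.content from streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # role-only or final chunks carry no text
            parts.append(piece)
            print(piece, end="", flush=True)  # show text as it arrives
    print()
    return "".join(parts)


def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


offline = [fake_chunk(None), fake_chunk("Tiny habits, "), fake_chunk("big results."),
           SimpleNamespace(choices=[])]
full_text = collect_stream(offline)

# With a real client (not run here):
# stream = client.chat.completions.create(model="gpt-4o", messages=[...], stream=True)
# full_text = collect_stream(stream)
```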



- stream
- stream_options
- top_logprobs
- logprobs

- response_format
- prediction

- function_call
- functions
- parallel_tool_calls

- service_tier
- user
- timeout
- metadata

In [160]:
help(client.chat.completions)

Help on Completions in module openai.resources.chat.completions object:

class Completions(openai._resource.SyncAPIResource)
 |  Completions(client: 'OpenAI') -> 'None'
 |
 |  Method resolution order:
 |      Completions
 |      openai._resource.SyncAPIResource
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  create(
 |      self,
 |      *,
 |      messages: 'Iterable[ChatCompletionMessageParam]',
 |      model: 'Union[str, ChatModel]',
 |      audio: 'Optional[ChatCompletionAudioParam] | NotGiven' = NOT_GIVEN,
 |      frequency_penalty: 'Optional[float] | NotGiven' = NOT_GIVEN,
 |      function_call: 'completion_create_params.FunctionCall | NotGiven' = NOT_GIVEN,
 |      functions: 'Iterable[completion_create_params.Function] | NotGiven' = NOT_GIVEN,
 |      logit_bias: 'Optional[Dict[str, int]] | NotGiven' = NOT_GIVEN,
 |      logprobs: 'Optional[bool] | NotGiven' = NOT_GIVEN,
 |      max_completion_tokens: 'Optional[int] | NotGiven' = NOT_GIVEN,
 |      max_tokens: 'Opti

In [None]:
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "Write a summary about book kite runner in not more than 10 words."
        }
    ],
    metadata={"purpose": "demo"},  # hypothetical tag: the value originally passed is not shown
)

print(completion.choices[0].message.content)

BadRequestError: Error code: 400 - {'error': {'message': "Unknown parameter: 'metadata'.", 'type': 'invalid_request_error', 'param': 'metadata', 'code': 'unknown_parameter'}}

#### Metadata of Responses
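A chat completion response carries more than the message text. A minimal sketch that collects commonly used fields (`id`, `model`, `system_fingerprint`, each choice's `finish_reason`, and the `usage` token counts); `describe_response` is a hypothetical helper, and the offline stand-in uses made-up values.

```python
from types import SimpleNamespace


def describe_response(completion) -> dict:
    """Summarize commonly used response metadata fields."""
    usage = completion.usage
    return {
        "id": completion.id,
        "model": completion.model,
        "system_fingerprint": completion.system_fingerprint,
        "finish_reasons": [c.finish_reason for c in completion.choices],
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


fake = SimpleNamespace(
    id="chatcmpl-demo",            # made-up value
    model="gpt-4o",
    system_fingerprint="fp_demo",  # made-up value
    choices=[SimpleNamespace(finish_reason="stop")],
    usage=SimpleNamespace(prompt_tokens=25, completion_tokens=12, total_tokens=37),
)
print(describe_response(fake))
```

With a real client, `describe_response(completion)` works on the object returned by `client.chat.completions.create(...)`; the `usage` counts are what billing is based on.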

#### Token Counts

In [65]:
import pandas as pd
import numpy as np
from openai import OpenAI
import os

In [10]:
os.environ['OPENAI_API_KEY'] = open_ai_key

In [80]:
client = OpenAI()
# len(dir(client))
# dir(client)[21:]
help(client.chat.completions.create)

Help on method create in module openai.resources.chat.completions:

create(
    *,
    messages: 'Iterable[ChatCompletionMessageParam]',
    model: 'Union[str, ChatModel]',
    audio: 'Optional[ChatCompletionAudioParam] | NotGiven' = NOT_GIVEN,
    frequency_penalty: 'Optional[float] | NotGiven' = NOT_GIVEN,
    function_call: 'completion_create_params.FunctionCall | NotGiven' = NOT_GIVEN,
    functions: 'Iterable[completion_create_params.Function] | NotGiven' = NOT_GIVEN,
    logit_bias: 'Optional[Dict[str, int]] | NotGiven' = NOT_GIVEN,
    logprobs: 'Optional[bool] | NotGiven' = NOT_GIVEN,
    max_completion_tokens: 'Optional[int] | NotGiven' = NOT_GIVEN,
    max_tokens: 'Optional[int] | NotGiven' = NOT_GIVEN,
    metadata: 'Optional[Dict[str, str]] | NotGiven' = NOT_GIVEN,
    modalities: 'Optional[List[ChatCompletionModality]] | NotGiven' = NOT_GIVEN,
    n: 'Optional[int] | NotGiven' = NOT_GIVEN,
    parallel_tool_calls: 'bool | NotGiven' = NOT_GIVEN,
    prediction: 'Optional[Ch

In [None]:
# !pip install tiktoken
# !pip install openai

In [86]:
import tiktoken

In [87]:
encoding = tiktoken.get_encoding("cl100k_base")
encoding

<Encoding 'cl100k_base'>

In [88]:
encoding = tiktoken.encoding_for_model("gpt-4o-mini")
encoding

<Encoding 'o200k_base'>

In [89]:
encoding.encode("tiktoken is great!")

[83, 8251, 2488, 382, 2212, 0]

In [90]:
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string("tiktoken is great!", "o200k_base")

6

In [91]:
encoding.decode([83, 8251, 2488, 382, 2212, 0])


'tiktoken is great!'

In [92]:
[encoding.decode_single_token_bytes(token) for token in [83, 8251, 2488, 382, 2212, 0]]

[b't', b'ikt', b'oken', b' is', b' great', b'!']

In [93]:
def compare_encodings(example_string: str) -> None:
    """Prints a comparison of four string encodings."""
    # print the example string
    print(f'\nExample string: "{example_string}"')
    # for each encoding, print the # of tokens, the token integers, and the token bytes
    for encoding_name in ["r50k_base", "p50k_base", "cl100k_base", "o200k_base"]:
        encoding = tiktoken.get_encoding(encoding_name)
        token_integers = encoding.encode(example_string)
        num_tokens = len(token_integers)
        token_bytes = [encoding.decode_single_token_bytes(token) for token in token_integers]
        print()
        print(f"{encoding_name}: {num_tokens} tokens")
        print(f"token integers: {token_integers}")
        print(f"token bytes: {token_bytes}")

compare_encodings("antidisestablishmentarianism")


Example string: "antidisestablishmentarianism"

r50k_base: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']

p50k_base: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']

cl100k_base: 6 tokens
token integers: [519, 85342, 34500, 479, 8997, 2191]
token bytes: [b'ant', b'idis', b'establish', b'ment', b'arian', b'ism']

o200k_base: 6 tokens
token integers: [493, 129901, 376, 160388, 21203, 2367]
token bytes: [b'ant', b'idis', b'est', b'ablishment', b'arian', b'ism']


In [82]:
encoding

<Encoding 'o200k_base'>

In [102]:
dir(client)[100:]

['request',
 'timeout',
 'uploads',
 'user_agent',
 'with_options',
 'with_raw_response',
 'with_streaming_response']

