In [109]:
import json
from IPython.display import JSON

# Workshop 1: Introduction to OpenAI Chat Completion SDK

Welcome to this guide on using the OpenAI Chat Completion SDK! In this notebook, we'll walk through the core features of the SDK and learn how to build engaging and effective conversational AI applications. By the end of this notebook, you will be equipped to:

1. Utilize the basic chat completion functionality.
2. Understand and interpret the request and response structures.
3. Configure key parameters such as `temperature`, `max_tokens`, and `n` to fine-tune the model's behavior.
4. Implement memory features to enhance the context and coherence of interactions.

## Table of Contents

1. [Introduction to OpenAI Chat Completion SDK](#introduction-to-openai-chat-completion-sdk)
2. [Building Simple Chat Completion Using Python SDK](#building-simple-chat-completion-using-python-sdk)
3. [Understanding Request and Response](#understanding-request-and-response)
4. [Exploring the Temperature Parameter](#exploring-the-temperature-parameter)
5. [Configuring the Max Tokens Parameter](#configuring-the-max-tokens-parameter)
6. [Utilizing the N Parameter](#utilizing-the-n-parameter)
7. [Enhancing Chat Interactions with Memory](#enhancing-chat-interactions-with-memory)

Let's begin by importing the necessary libraries and setting up our OpenAI client.


<!--
### Additional Context: Understanding the Basics

The OpenAI Chat Completion API is designed to generate human-like text based on the input it receives. This is particularly useful for creating chatbots, virtual assistants, and other conversational agents. Here’s a quick breakdown of the key components involved:

- **Model**: The specific AI model used for generating the response.
- **Messages**: A list of messages that form the conversation. Each message has a role (user, assistant, system) and content.
- **API Key**: Your unique key to authenticate and access OpenAI services.

Understanding these basics will help you better utilize the API and customize your applications to meet specific needs.
-->



In [139]:
from openai import AzureOpenAI

In [140]:
import os

# Read the key from the environment instead of hardcoding a secret in the notebook.
client = AzureOpenAI(
    azure_endpoint="https://testopenai4.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-15-preview"
)

# Building Simple Chat Completion Using Python SDK

To get started, we will demonstrate a basic example of how to use the chat completion API. In this example, we will send a simple message to the model and print the response. This will help you understand the fundamental steps involved in creating a chat completion request.


## Basic Request & Response from ChatCompletion

The OpenAI Chat Completion API works by exchanging messages between different roles in a conversation. Each message has a specific role, which helps the model understand the context and generate appropriate responses. There are three primary roles:

1. **User**: This role represents the person interacting with the model. Messages from the user typically contain questions, prompts, or instructions. For example:
   - "What's the weather like today?"
   - "Tell me a joke."
   - "Explain the theory of relativity."

2. **Assistant**: This role represents the model's responses. The assistant generates replies based on the user's messages and the conversation history. For example:
   - "The weather today is sunny with a high of 75 degrees."
   - "Why don't scientists trust atoms? Because they make up everything!"
   - "The theory of relativity is a scientific concept developed by Albert Einstein..."

3. **System**: This role provides initial instructions to the model, setting the context or behavior for the conversation. The system message can define the assistant's persona, behavior, or other contextual information. For example:
   - "You are a helpful assistant."
   - "You are a customer support agent for a tech company."
   - "You are an expert in quantum physics."

### Example

In the following example, we will construct a basic request with messages from these roles and print the response from the model.
> **Hint**: Understanding the roles of messages is crucial for designing effective conversational AI models. Here are some additional tips:
>
> - **User Role**: Craft your user messages to be clear and concise. This helps the model generate better responses. Avoid ambiguous prompts to reduce misunderstandings.
> - **Assistant Role**: The assistant's replies should be informative and contextually relevant. Maintaining a consistent tone helps build a coherent conversational flow.
> - **System Role**: Use the system role to guide the assistant's behavior and style. This is particularly useful for creating specialized bots (e.g., customer support, technical advisor).
>
> Experiment with different system messages to see how they influence the assistant's responses.
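
The three roles can be combined in a single `messages` list. The sketch below is only an illustration and not part of the original workshop code: `build_messages` and its `examples` argument are hypothetical names, and the sample strings are borrowed from the role descriptions above. It places the system message first, then any earlier user/assistant turns, and finally the new user message.

```python
def build_messages(system_message, user_message, examples=()):
    """Assemble a chat `messages` list: system first, prior turns, then the new user turn.

    `examples` is an optional sequence of (user_text, assistant_text) pairs
    that act as earlier turns in the conversation (hypothetical helper).
    """
    messages = [{"role": "system", "content": system_message}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_message})
    return messages


messages = build_messages(
    "You are a helpful assistant.",
    "Explain the theory of relativity.",
    examples=[
        ("Tell me a joke.",
         "Why don't scientists trust atoms? Because they make up everything!"),
    ],
)
for message in messages:
    print(message["role"], ":", message["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.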



In [145]:
system_message = "You are a helpful assistant."
input_text = "write me a short 4 lines poem about smurfes"

In [148]:
# Let's run it a few times...
response = client.chat.completions.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": input_text},
    ],
)
print(response.choices[0].message.role, ":", response.choices[0].message.content)

assistant : In a village so tiny and blue,
Smurfs with hearts so true,
Laughter and joy they unfurl,
In their magical Smurf world.


## Understanding the Chat Completion Response

When you make a request to the OpenAI Chat Completion API, the response contains several key components that provide information about the generated completion and its context. Here’s a breakdown of what you can expect in the response:

1. **ID**: A unique identifier for the completion request.
2. **Object**: Indicates the type of object returned (e.g., `chat.completion`).
3. **Created**: A timestamp indicating when the request was created.
4. **Model**: Specifies the model used to generate the completion (e.g., `gpt-4-32k`).
5. **Choices**: An array of completion options generated by the model. Each choice contains:
   - **Message**: The generated message, which includes:
     - **Role**: The role of the message (e.g., `assistant`).
     - **Content**: The text content of the message.
   - **Finish Reason**: Indicates why the completion ended (e.g., `stop`, `length`).
6. **Usage**: Provides token usage statistics, including:
   - **Prompt Tokens**: Number of tokens in the input prompt.
   - **Completion Tokens**: Number of tokens in the generated completion.
   - **Total Tokens**: Total number of tokens used for the request.

### Key Points

- **ID and Timestamp**: These help you track and manage individual requests.
- **Model Information**: Useful for understanding the capabilities and limitations of the response.
- **Choices Array**: You can receive multiple completions if you specify the `n` parameter, allowing you to choose the best response.
- **Finish Reason**: Helps you understand if the response was cut off due to token limits or other reasons.
- **Usage Statistics**: Important for monitoring and managing token consumption, especially in a paid API plan.

Understanding these components will help you better interpret the responses from the Chat Completion API and optimize your requests for various applications.

> **Hint**: Monitoring the `finish_reason` can help you fine-tune your prompts to avoid incomplete responses. If you frequently encounter the `length` finish reason, consider increasing the `max_tokens` parameter or refining your prompts to be more concise.
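
To make the structure above concrete, here is a small sketch that pulls the most useful fields out of a response. It works on a plain dictionary so it can run without calling the API. `summarize_response` is a hypothetical helper, and the sample payload (id, timestamp, token counts) is made up for illustration.

```python
def summarize_response(response_dict):
    """Pull the most commonly needed fields out of a chat completion dict."""
    first_choice = response_dict["choices"][0]
    usage = response_dict.get("usage") or {}
    return {
        "id": response_dict["id"],
        "model": response_dict["model"],
        "role": first_choice["message"]["role"],
        "content": first_choice["message"]["content"],
        "finish_reason": first_choice["finish_reason"],
        # True when the reply was cut off by the token limit.
        "truncated": first_choice["finish_reason"] == "length",
        "total_tokens": usage.get("total_tokens"),
    }


# Illustrative payload only: the id, timestamp, and token counts are made up.
sample = {
    "id": "chatcmpl-example",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "gpt-4-32k",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15},
}

summary = summarize_response(sample)
print(summary)
```

With a real response, calling `summarize_response(response.model_dump())` should behave the same way, assuming the response object exposes `model_dump()` alongside the `model_dump_json()` method used in this notebook.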

In [149]:
JSON(json.loads(response.model_dump_json()))

<IPython.core.display.JSON object>

Note: each time we run the request, we can get a different answer.

### Using the Temperature Hyper-Parameter

The `temperature` parameter in the OpenAI Chat Completion API controls the randomness and creativity of the generated text. Adjusting the temperature influences how deterministic or varied the model's responses are.

- **What is the Temperature Parameter?**
  The temperature parameter adjusts the probability distribution over the next token in the sequence. Lower values (close to 0) make the model more deterministic, producing predictable and focused outputs. Higher values (the API accepts values up to 2) increase randomness, allowing for more creative and diverse responses.

- **How Can We Utilize It?**
  - **Low Temperature (e.g., 0.2)**: Ideal for tasks requiring precise and reliable outputs, like factual information retrieval or step-by-step instructions. It reduces variability and creative deviations.
  - **High Temperature (e.g., 0.8)**: Suitable for creative tasks such as storytelling or brainstorming, where diverse and imaginative responses are desired.

> **Hint**: For more detailed information on the temperature parameter and its effects, you can read further in the [OpenAI Developer Forum](https://community.openai.com/t/temperature-top-p-and-top-k-for-chatbot-responses/295542) and the [Cheat Sheet on Temperature and Top_p](https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/290953).
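
To build some intuition, the sketch below shows a common way to model temperature: divide the raw token scores by the temperature before applying a softmax. This is a simplified illustration, not a description of how the service is implemented. The function name and the scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for temperature in (0, 0.2, 0.8, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature drops, the top-scoring token takes a larger share of the probability, which is why low-temperature outputs look more repeatable.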



Let's start with a simple example using a temperature of 0.8 (when the parameter is omitted, the API default is 1).

In [118]:
input_text= "Write a short 4 line MAX poem about a dragon who discovers a hidden treasure in an unexpected place."

In [151]:
response = client.chat.completions.create(
    model = "gpt-4-32k",
    messages = [{"role": "user", "content":input_text}],
    temperature = 0.8
)
print(response.choices[0].message.role, ":",response.choices[0].message.content)

assistant : In a village, blue and small,
Smurfs stand united, short and tall.
With love and laughter, they fill the day,
In their enchanted, smurfy way.


In [152]:
JSON(json.loads(response.model_dump_json()))

<IPython.core.display.JSON object>

Observation: each time we generate a response, we get a different result.

Let's examine the effects of setting the temperature to zero.
Observe how varying the temperature parameter influences the responses.

In [122]:
def create_response():
    return client.chat.completions.create(
        model="gpt-4-32k",
        temperature=0,
        messages=[{"role": "user", "content": input_text}],
    )


for _ in range(5):
    response = create_response()
    print("----------")
    print(response.choices[0].message.role, ":", response.choices[0].message.content)

----------
assistant : In the heart of a rose, a dragon did find,
A treasure more precious, than gold ever mined.
Not jewels or coins, but a love pure and kind,
In the bloom of friendship, true wealth he'd unwind.
----------
assistant : In the heart of a rose, a dragon did find,
A treasure more precious, than gold ever mined.
Not jewels or coins, but a love pure and kind,
In the bloom of a friendship, forever entwined.
----------
assistant : In the heart of a rose, a dragon did find,
A treasure more precious, than gold ever mined.
Not jewels or coins, but a love pure and kind,
In the bloom of friendship, true wealth he'd unwind.
----------
assistant : In the heart of a rose, a dragon did find,
A treasure more precious, than gold ever mined.
Not jewels or coins, but a love pure and kind,
In the bloom of friendship, true wealth he'd unwind.
----------
assistant : In the heart of a rose, a dragon did find,
A treasure of love, the rarest kind.
Not gold nor jewels, but something more,
A fee

Observation: Setting the temperature parameter to zero makes the model favor the highest-probability token at each step, so the responses become far more consistent. As the runs above show, they are still not guaranteed to be identical.
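
One way to quantify this consistency is to count how many distinct replies a prompt produces over several runs. The helper below is a hypothetical sketch: `count_distinct_replies` is not part of the SDK, and it assumes a `client` configured as earlier in this notebook.

```python
from collections import Counter


def count_distinct_replies(client, model, prompt, runs=5, temperature=0):
    """Send the same prompt `runs` times and count how often each reply occurs."""
    replies = Counter()
    for _ in range(runs):
        response = client.chat.completions.create(
            model=model,
            temperature=temperature,
            messages=[{"role": "user", "content": prompt}],
        )
        replies[response.choices[0].message.content] += 1
    return replies
```

For example, `len(count_distinct_replies(client, "gpt-4-32k", input_text))` equals 1 only when every run returned exactly the same text. Keep in mind that each run is a billed request.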


### Using the 'max_tokens' Hyper-Parameter

The `max_tokens` parameter in the OpenAI Chat Completion API sets the maximum number of tokens for the generated response.

- **Purpose**: Controls the response length, manages costs, and ensures outputs stay within a specific limit.
- **Usage**: Helps in generating concise responses and prevents excessively long outputs.
- **Considerations**: Setting `max_tokens` too low can lead to incomplete responses with a `finish_reason` of `length`. It does not inherently make the model faster but helps control the output length and resource usage.

> **Hint**: For more details, see the [OpenAI Developer Forum](https://community.openai.com/t/max-tokens-chat-completion-gpt4o/758066) and this [forum thread clarifying `max_tokens`](https://community.openai.com/t/clarification-for-max_tokens/758066).


In [123]:
input_text= "Write a short story about a dragon who discovers a hidden treasure in an unexpected place."

In [125]:
response = client.chat.completions.create(
    model = "gpt-4-32k",
    max_tokens = 20,
    messages = [{"role": "user", "content":input_text}]
)
print(response.choices[0].message.role, ":",response.choices[0].message.content)

assistant : Once upon a time, in the kingdom of Oregonia, lived a young dragon named Andros


In [126]:
JSON(json.loads(response.model_dump_json()))

<IPython.core.display.JSON object>

We can see that generation stopped once the 20-token limit was reached, so the story was cut off before it finished.
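
One possible way to react to a truncated reply is to check `finish_reason` and retry with a larger budget. The sketch below is an assumption about how you might handle this, not a recommended or official pattern. `complete_with_retry` and its parameters are hypothetical names, and it expects a `client` like the one created earlier.

```python
def complete_with_retry(client, model, messages, max_tokens=20, max_attempts=3):
    """Call the chat API, doubling `max_tokens` while the reply is cut off.

    Returns (content, finish_reason, max_tokens_used) for the last attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    budget = max_tokens
    for attempt in range(max_attempts):
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=budget,
        )
        choice = response.choices[0]
        # "length" means the model hit the token limit before finishing.
        if choice.finish_reason != "length" or attempt == max_attempts - 1:
            return choice.message.content, choice.finish_reason, budget
        budget *= 2
```

Each retry generates a fresh completion and is billed separately, so in practice it is usually cheaper to choose a sensible `max_tokens` up front and keep the retry as a fallback.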

### Using the 'n' Hyper-Parameter

The `n` parameter in the OpenAI Chat Completion API specifies the number of completions to generate for each prompt.

- **Purpose**: Generates multiple response options, allowing you to choose the best one.
- **Usage**: Useful for getting varied outputs for creative tasks or ensuring reliability in responses by selecting the most appropriate one from multiple options.
- **Example**: Imagine you're drafting an email to a client and want to ensure the tone and content are just right. By setting `n` to 3, you can generate three different drafts of the email and choose the one that best fits the situation.

> **Hint**: Setting `n` to a higher value will increase the number of completions generated, which can be useful for comparison and selection.


In [127]:
input_text= "tell me a short 4 lines joke"

In [128]:
response = client.chat.completions.create(
    model = "gpt-4-32k",
    n=5,
    temperature = 1,
    messages = [{"role": "user", "content":input_text}]
)
print(response.choices[0].message.role, ":",response.choices[0].message.content)

assistant : Why don't scientists trust atoms?

Because they make up everything!

Isn't that funny?

It's a pure chemistry joke!


In [129]:
JSON(json.loads(response.model_dump_json()))

<IPython.core.display.JSON object>
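
The `print` call above shows only the first of the five completions. A minimal sketch for reading all of them follows. `list_choices` and `print_choices` are hypothetical helper names, and the code assumes each choice exposes `index`, `finish_reason`, and `message.content`, as described in the response breakdown earlier.

```python
def list_choices(response):
    """Return (index, finish_reason, content) for every choice in a response."""
    return [
        (choice.index, choice.finish_reason, choice.message.content)
        for choice in response.choices
    ]


def print_choices(response):
    """Print every completion so the alternatives can be compared side by side."""
    for index, reason, content in list_choices(response):
        print(f"--- choice {index} ({reason}) ---")
        print(content)
```

Calling `print_choices(response)` after the request above would display all five jokes. Note that every extra completion adds to the completion token count.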

## Building Simple ChatCompletion with Memory

Leveraging memory in ChatCompletion lets the model use context from earlier turns, making conversations more coherent and contextually aware. The API does not keep conversation state between requests, so this memory is maintained on the client side.

### Power of Using Memory
- **Enhanced Coherence**: By remembering previous interactions, the model can provide more relevant and contextually appropriate responses, leading to a more natural conversational flow.
- **Context Retention**: Memory enables the model to build on past exchanges, improving the depth and continuity of the conversation.
- **Personalization**: With memory, the model can recall user preferences and past interactions, creating a more personalized user experience.

### Disadvantages
- **Increased Complexity**: Managing memory and maintaining context can add complexity to the system, requiring careful handling of conversation history.
- **Resource Intensive**: Storing and processing memory can consume more computational resources and increase response times.
- **Potential for Errors**: If not managed correctly, memory can lead to inconsistencies or incorrect references to past interactions.

Building a ChatCompletion with memory involves storing the conversation history and including it in each subsequent request to maintain context.
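
Because the full history is resent with every request, long conversations consume more tokens on each turn. One simple mitigation, sketched below under the assumption that older turns can be dropped, is to keep the system message plus only the most recent messages. `trim_history` and `max_messages` are hypothetical names, not SDK features.

```python
def trim_history(memory, max_messages=10):
    """Keep system messages plus the most recent `max_messages` other messages."""
    system_messages = [m for m in memory if m["role"] == "system"]
    if max_messages <= 0:
        return system_messages
    other_messages = [m for m in memory if m["role"] != "system"]
    return system_messages + other_messages[-max_messages:]


# Build a small example history: one system message and three exchanges.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for turn in range(1, 4):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

trimmed = trim_history(history, max_messages=2)
print(trimmed)
```

The function returns a new list and leaves the original history untouched, so the full conversation can still be stored or displayed. Dropping old turns means the model can no longer refer to them, which is the trade-off for the lower token usage.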


In [130]:
def append_to_history(role: str, content: str, memory: list[dict]) -> list[dict]:
    # Append the message only if the role is one of the three chat roles.
    if role in ("system", "user", "assistant"):
        memory.append({"role": role, "content": content})
    return memory

In [131]:
memory = []

system_message = "You are a helpful assistant."
input_text= "hi!"

In [132]:
memory = append_to_history("system", system_message, memory)
memory = append_to_history("user", input_text, memory)
JSON(memory)

<IPython.core.display.JSON object>

In [133]:
response = client.chat.completions.create(
    model = "gpt-4-32k",
    n=5,
    temperature = 1,
    messages = memory
)
print(response.choices[0].message.role, ":",response.choices[0].message.content)

assistant : Hello! How can I assist you today?



We will add the response to our conversation:

In [134]:
memory = append_to_history(response.choices[0].message.role,response.choices[0].message.content, memory)
JSON(memory)

<IPython.core.display.JSON object>

### Create our chatbot with memory:

Now, we are putting it all together into a chat-like application, simulating continuous interaction using a while loop.


In [135]:
def create_chat_response(memory):
    return client.chat.completions.create(
        model="gpt-4-32k",
        temperature=0,
        n=1,
        messages=memory,
    )

In [136]:
memory = []
system_message = "You are a helpful assistant."
memory = append_to_history("system", system_message, memory)
while True:
    user_input = input("user: ")
    if user_input == "END":
        break
    memory = append_to_history("user", user_input, memory)
    response = create_chat_response(memory)
    memory = append_to_history(response.choices[0].message.role,response.choices[0].message.content, memory)
    print(response.choices[0].message.role, ":",response.choices[0].message.content)

user:  END


In [137]:
JSON(memory)

<IPython.core.display.JSON object>

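# Additional Parameters for ChatCompletion

The excerpt below, which appears to come from the SDK's implementation of `chat.completions.create`, shows the other fields that can be sent with a request: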

```
return self._post(
    "/chat/completions",
    body=maybe_transform(
        {
            "messages": messages,
            "model": model,
            "frequency_penalty": frequency_penalty,
            "function_call": function_call,
            "functions": functions,
            "logit_bias": logit_bias,
            "logprobs": logprobs,
            "max_tokens": max_tokens,
            "n": n,
            "presence_penalty": presence_penalty,
            "response_format": response_format,
            "seed": seed,
            "stop": stop,
            "stream": stream,
            "temperature": temperature,
            "tool_choice": tool_choice,
            "tools": tools,
            "top_logprobs": top_logprobs,
            "top_p": top_p,
            "user": user,
        },
        completion_create_params.CompletionCreateParams,
    ),
    options=make_request_options(
        extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
    ),
    cast_to=ChatCompletion,
    stream=stream or False,
    stream_cls=Stream[ChatCompletionChunk],
)
```
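
If you want to keep these optional settings in one place, a small helper can collect them and catch misspelled names before any request is sent. This is only a sketch: `build_request` and `OPTIONAL_FIELDS` are hypothetical names, and the set of fields is copied from the excerpt above, so it may not match other SDK versions.

```python
# Optional request fields listed in the SDK excerpt above.
OPTIONAL_FIELDS = {
    "frequency_penalty", "function_call", "functions", "logit_bias",
    "logprobs", "max_tokens", "n", "presence_penalty", "response_format",
    "seed", "stop", "stream", "temperature", "tool_choice", "tools",
    "top_logprobs", "top_p", "user",
}


def build_request(model, messages, **options):
    """Build keyword arguments for `chat.completions.create`, rejecting typos."""
    unknown = sorted(set(options) - OPTIONAL_FIELDS)
    if unknown:
        raise TypeError(f"unknown chat completion option(s): {', '.join(unknown)}")
    # Drop options left as None so the service defaults apply.
    chosen = {name: value for name, value in options.items() if value is not None}
    return {"model": model, "messages": messages, **chosen}


request = build_request(
    "gpt-4-32k",
    [{"role": "user", "content": "tell me a short 4 lines joke"}],
    temperature=0.2,
    max_tokens=100,
    stop=None,
)
print(request)
```

The resulting dictionary can then be unpacked into the call, for example `client.chat.completions.create(**request)`.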