# <a id='toc1_'></a>[Working with the GPT API](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- [Working with the GPT API](#toc1_)    
  - [Getting an API KEY](#toc1_1_)    
  - [OpenAI Python Package Installation](#toc1_2_)    
  - [Chat Completions API](#toc1_3_)    
    - [Chat completions response format](#toc1_3_1_)    
    - [Conversation history](#toc1_3_2_)    
    - [Creating a basic conversation loop](#toc1_3_3_)    
    - [Request parameters](#toc1_3_4_)    
  - [Using Chat Completion for non-chat scenarios](#toc1_4_)    
    - [Sentiment Analysis](#toc1_4_1_)    
    - [Language Translation](#toc1_4_2_)    
- [Creating a chat bot interface with Streamlit](#toc2_)    
- [Extra: creating a chatbot with gradio for front-end UI](#toc3_)    
- [Resources](#toc4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_1_'></a>[Getting an API KEY](#toc0_)

The API gives you more control and flexibility when working with GPT models (the models behind ChatGPT), and it lets you integrate them into other applications.

To access the model through the API, you will need an API key. For this demo, you'll be using your lead teacher's (LT's) API key.

*To obtain your own API key, you'll need to create an account and set up billing. You can create your account at https://platform.openai.com/.*

In this demo, we will read the API key from a `config.py` file. Create a `config.py` file and store the key your lead teacher gave you in a variable named `OPEN_API_KEY`.

Alternatively, save it as an environment variable and read it as follows:

```python
import os

api_key = os.getenv("OPENAI_API_KEY")
```
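
A slightly more defensive variant can be sketched as follows. The helper name `load_api_key` is hypothetical (it is not part of the OpenAI package); it only wraps `os.getenv` and fails early with a clear message when the variable is missing.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper for this demo, not an OpenAI function.
    """
    key = os.getenv(var_name)
    if not key:
        # Fail early instead of sending a request without credentials
        raise RuntimeError(f"Environment variable {var_name} is not set or is empty.")
    return key
```

Failing at load time makes a missing key easier to spot than an authentication error returned later by the API.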

In [None]:
# Import API key
from config import OPEN_API_KEY

## <a id='toc1_2_'></a>[OpenAI Python Package Installation](#toc0_)

To utilize the GPT API, you'll need to have the OpenAI Python package installed.

You can install it by running `pip install --upgrade openai`. *The `--upgrade` flag ensures that you have the most recent version in case you installed `openai` before; the `openai.chat.completions.create` interface used in this notebook requires version 1.0 or later of the package.*

In [None]:
# !pip install --upgrade openai

In [None]:
import openai
from pprint import pprint
import json

# load and set our key
openai.api_key = OPEN_API_KEY

## <a id='toc1_3_'></a>[Chat Completions API](#toc0_)

To use a GPT model via the OpenAI API, you’ll send a request containing the inputs and your API key, and receive a response containing the model’s output.

As of July 2023, there are two main API endpoints for working with GPT models:
- Completions API endpoint: only for the older legacy models
- Chat Completions API endpoint: gives access to the latest models, gpt-4 and gpt-3.5-turbo.

Chat models in the Chat Completions API take two mandatory parameters:
- **List of messages as input**
- **Model**: we will use gpt-3.5-turbo

They return a **model-generated message as output**.

An example API call looks as follows:

In [None]:
response = openai.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[ # messages parameter must be a list of dictionaries
    # can be as short as one message or many back and forth turns.
    {"role": "system", "content": "You are a famous chef. Share your best cooking tips and tricks."}, # each dictionary has a role and content
    {"role": "user", "content": "What are the ingredients for making pancakes?"},
  ],
  seed=42,
  temperature=1,
  n=1
)

In conversations using the Chat Completion API, each message has a **role ("system," "user," or "assistant").** Typically, a conversation starts with a system message to set the assistant's behavior, followed by alternating user and assistant messages.
- The system message is optional and can be used to customize the assistant's personality or provide specific instructions
- User messages contain requests or comments (prompts)
- Assistant messages store previous assistant responses or serve as examples of desired behavior.
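
The structure described above can be checked locally before a request is sent. This is a minimal sketch: `check_messages` is a hypothetical helper, it only covers the three roles discussed here, and the API performs its own (more complete) validation.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def check_messages(messages):
    """Raise ValueError if a message list does not follow the role/content structure."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for i, msg in enumerate(messages):
        # Every message needs a known role and a string content
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return True


check_messages([
    {"role": "system", "content": "You are a famous chef."},
    {"role": "user", "content": "What are the ingredients for making pancakes?"},
])
```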

### <a id='toc1_3_1_'></a>[Chat completions response format](#toc0_)

An example chat completions API response looks as follows:

In [None]:
response

In [None]:
# What does the response look like?
pprint(json.loads(response.json()))

In Python, the assistant’s reply can be extracted by following the attributes of the `ChatCompletion` response object:

In [None]:
print(response.choices[0].message.content)

In [None]:
# How many choices do we have?
print(len(response.choices)) 



Every choice in the response includes a **finish_reason**. The possible values for finish_reason are:

- **stop**: API returned complete model output.
- **length**: Incomplete model output due to the max_tokens parameter or the model's token limit.
- **content_filter**: Content omitted due to a flag from OpenAI's content filters.
- **null** (`None` in Python): API response still in progress or incomplete.
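
As a small sketch, the values listed above can be mapped to readable hints. `explain_finish_reason` and the hint texts are illustrative names, not part of the OpenAI package.

```python
FINISH_REASON_HINTS = {
    "stop": "complete output",
    "length": "cut off by max_tokens or the model's token limit",
    "content_filter": "content omitted by a content filter",
    None: "response still in progress or incomplete",
}


def explain_finish_reason(reason):
    """Return a short human-readable hint for a finish_reason value."""
    return FINISH_REASON_HINTS.get(reason, f"unrecognised finish_reason: {reason!r}")


print(explain_finish_reason("length"))
```

With a real response, the value to pass in would be `response.choices[0].finish_reason`.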



In [None]:
# Let's check the finish reason
response.choices[0].finish_reason

### <a id='toc1_3_2_'></a>[Conversation history](#toc0_)

Since it recommended Baking Powder, let's ask how much in another prompt:

In [None]:
response = openai.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "How much Baking Powder do you recommend for this recipe?"}
  ]
)

In [None]:
pprint(response.choices[0].message.content)

In [None]:
len(response.choices)

We can see that the history was not saved: the model does not know which recipe we are referring to.

**Including conversation history is crucial** when user instructions refer to prior messages. We can do that by sending all previous messages, each with its role, in the *messages* parameter.

In [None]:
response = openai.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a famous chef. Share your best cooking tips and tricks."},
    {"role": "user", "content": "What are the ingredients for making pancakes?"},
    {"role": "assistant", "content": "To make pancakes, you'll need flour, eggs, milk, baking powder, and a pinch of salt."}, # we shortened it for our demo purpose
    {"role": "user", "content": "Can I use almond milk instead of regular milk?"}
  ]
)

In [None]:
pprint(response.choices[0].message.content)

**Each user instruction relies on the prior messages in the conversation history to make sense.**

Since the language models don't have inherent memory of past requests, it's important to include the relevant conversation history in each API request. If the conversation exceeds the model's token limit, it may need to be truncated or shortened while ensuring the essential context and instructions are retained.
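
One possible truncation strategy is sketched below: keep the system message and drop the oldest turns until the history fits a budget. Both helper names are hypothetical, and the token estimate (about 4 characters per token) is a rough assumption; a real tokenizer such as the `tiktoken` package would give exact counts.

```python
def estimate_tokens(text):
    """Very rough token estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Drop the oldest non-system messages until the estimated size fits max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs):
        return sum(estimate_tokens(m["content"]) for m in msgs)

    # Always keep the most recent message, even if it alone exceeds the budget
    while len(rest) > 1 and total(system + rest) > max_tokens:
        rest.pop(0)
    return system + rest
```

Dropping whole turns is the simplest option; summarising older turns is another approach, at the cost of an extra request.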

### <a id='toc1_3_3_'></a>[Creating a basic conversation loop](#toc0_)

This example demonstrates a conversation loop that performs the following tasks:

1. Takes console input continuously and formats it as the user's role content within the messages array.
2. Prints the model's response to the console and formats it as the assistant's role content within the messages array.

This approach ensures that each time a new question is asked, the ongoing conversation transcript is sent along with the latest question. Since the model lacks memory, it's crucial to include an updated transcript with each new question. Otherwise, the model may lose context from previous questions and answers.

In [None]:
message_history=[{"role": "system", "content": "You are a helpful assistant."}]

def gpt_response(inp, message_history):
    # We save the user's input
    message_history.append({"role": "user", "content": f"{inp}"})

    # Generate a response from the chatbot model
    completion = openai.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=message_history
    )

    # We save the assistant response
    message_history.append({"role": "assistant", "content": f"{completion.choices[0].message.content}"})

    return message_history

In [None]:
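while True:
    user_input = input("> ")  # Ask different prompts via console input
    if user_input.strip().lower() in {"quit", "exit"}:  # Type quit or exit to stop the loop
        break
    message_history = gpt_response(user_input, message_history)
    print(message_history[-1]["content"])  # Print the last response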

Let's check the message history to see whether it contains the whole conversation.

In [None]:
pprint(message_history)
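
To test the bookkeeping of such a loop without spending tokens, the model call can be replaced by a stand-in function. This is a sketch under that assumption: `chat_turn` and `echo_reply` are hypothetical names, and `echo_reply` only reports how many messages it was sent.

```python
def chat_turn(history, user_input, reply_fn):
    """Append the user's message, get a reply from reply_fn, and append it as the assistant's message."""
    history.append({"role": "user", "content": user_input})
    reply = reply_fn(history)
    history.append({"role": "assistant", "content": reply})
    return history


def echo_reply(history):
    """Stand-in for the model: reports how many messages it received."""
    return f"I received {len(history)} messages."


history = [{"role": "system", "content": "You are a helpful assistant."}]
for question in ["Hello", "What did I just say?"]:
    history = chat_turn(history, question, echo_reply)
    print(history[-1]["content"])
```

The growing message count in the replies shows that the whole transcript is passed along with every new question.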

### <a id='toc1_3_4_'></a>[Request parameters](#toc0_)

Let's take a look at some request parameters.

- **model**: the model to use (e.g., gpt-3.5-turbo, gpt-4)
- **messages**: the prompt, given as a list of messages in a chat-based format
- **temperature (default 1)**: sampling temperature, between 0 and 2. Higher values make the output more diverse and random, while lower values make it more focused and deterministic.
- **max_tokens**: limits the length of the generated response (max length). The Chat Completions API has no fixed default; when omitted, the response is bounded only by the model's context length (the default of 16 belongs to the legacy Completions API).
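
To build intuition for temperature, here is a sketch of the standard textbook formulation: token scores are divided by the temperature before being turned into probabilities. The scores are made up, and how OpenAI's models implement this internally is an assumption. The formula needs a temperature above 0; a setting of 0 is usually understood as always picking the most likely token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by a sampling temperature > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperatures concentrate probability on the top token, while higher temperatures flatten the distribution.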

In [None]:
# Let's rewrite the gpt_response function to accept optional request parameters
def gpt_response(inp, message_history, **params):
    # We save the user's input
    message_history.append({"role": "user", "content": f"{inp}"})

    # Build the request. Only role and content are sent: the history also stores
    # finish_reason, an extra key that the API may reject if it is sent back.
    completion_params = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": m["role"], "content": m["content"]} for m in message_history],
        **params  # Include additional parameters (temperature, max_tokens, ...)
    }

    # Generate a response from the chatbot model
    completion = openai.chat.completions.create(**completion_params)

    # We save the assistant response together with its finish_reason
    message_history.append({"role": "assistant", "content": f"{completion.choices[0].message.content}", "finish_reason": completion.choices[0].finish_reason})

    return message_history

Let's explore different settings by using a **max_tokens** value of 100 and testing three temperature levels (0, 1, and 2) to generate responses from the model, completing the prompt "My favourite animal is".

In [None]:
for temp in [0, 1, 2]:
    # Start from a fresh history so every temperature sees exactly the same input
    message_history = [{"role": "system", "content": "Complete the prompt."}]
    message_history = gpt_response("My favourite animal is ",
                                   message_history,
                                   max_tokens=100,
                                   temperature=temp)
    print(message_history[-1]["content"]+"\n")  # Print the last response
    print(message_history[-1]["finish_reason"]+"\n")  # Print why generation stopped

- **top_p** (Defaults to 1)
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

A higher value gives access to more tokens (and more diversity) and a lower value is more deterministic.

OpenAI generally recommends altering this or temperature, but not both.
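
Nucleus sampling can be sketched on a toy distribution: sort the tokens by probability, keep the smallest set whose cumulative probability reaches top_p, and renormalise. `nucleus_filter` and the probabilities are illustrative; the model's actual implementation is not shown in this notebook.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability first reaches top_p, then renormalise."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"dog": 0.5, "cat": 0.3, "owl": 0.15, "eel": 0.05}  # made-up distribution
print(nucleus_filter(probs, 0.1))
print(nucleus_filter(probs, 0.9))
```

With a small top_p only the single most likely token survives, which is why low values behave almost deterministically.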



Let's explore different settings by using a **max_tokens** value of 100 and testing two **top_p** levels (0 and 1) to generate responses from the model, completing the prompt "My favourite animal is."

In [None]:
for p in [0, 0, 0, 1, 1, 1]:
    # Start from a fresh history so repeated runs receive exactly the same input
    message_history = [{"role": "system", "content": "Complete the prompt."}]
    message_history = gpt_response("My favourite animal is ",
                                   message_history,
                                   max_tokens=100,
                                   top_p=p)
    print(message_history[-1]["content"]+"\n")  # Print the last response

- **presence_penalty** (Defaults to 0)
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. **Higher values promote novelty by penalising the model when it reuses tokens that have already appeared.**

- **frequency_penalty** (Defaults to 0)
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. **Higher values penalise the model for repetition and reward variety.**
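
The two penalties can be sketched on made-up token scores, following the description in OpenAI's documentation: the frequency penalty is multiplied by how often a token has already appeared, while the presence penalty is subtracted once for any token that has appeared at all. `apply_penalties` is a hypothetical helper, not an API function.

```python
def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Lower the score of tokens that already appeared in the text so far.

    logits: token -> raw score; counts: token -> occurrences so far.
    """
    adjusted = {}
    for token, score in logits.items():
        c = counts.get(token, 0)
        seen = 1.0 if c > 0 else 0.0
        # Frequency penalty grows with each repetition; presence penalty is flat
        adjusted[token] = score - c * frequency_penalty - seen * presence_penalty
    return adjusted


logits = {"broke": 3.0, "skint": 2.0, "penniless": 1.5}  # made-up scores
counts = {"broke": 4, "skint": 1}
print(apply_penalties(logits, counts, presence_penalty=1.0, frequency_penalty=0.5))
```

Negative penalty values reverse the effect and make repetition more likely.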


Let's explore different settings by using a max_tokens value of 100 and testing two levels (-2 and 2) for both the presence penalty and the frequency penalty, using the prompt "generate 20 ways to say you can't buy that because you're broke".

In [None]:
for i in [-2, 2]:
    # Start from an empty history so both settings receive the same input
    message_history = []
    message_history = gpt_response("generate 20 ways to say you can't buy that because you're broke",
                                   message_history,
                                   max_tokens=100,
                                   presence_penalty=i,
                                   frequency_penalty=i
                                )
    print(message_history[-1]["content"]+"\n")  # Print the last response
    print(message_history[-1]["finish_reason"]+"\n")  # Print why generation stopped

## <a id='toc1_4_'></a>[Using Chat Completion for non-chat scenarios](#toc0_)
The Chat Completion API is designed to work with multi-turn conversations, but it also works well for non-chat scenarios.

### <a id='toc1_4_1_'></a>[Sentiment Analysis](#toc0_)

Let's set up a sentiment analysis scenario where a user inputs a tweet, and the program generates responses using GPT, providing sentiment predictions until interrupted.

We will give the role *system* the task to decide whether a Tweet's sentiment is positive, neutral, or negative.
We include as a user pre-prompt the example "I loved the new Batman movie! Sentiment:", and an example assistant answer "Positive".

In [None]:
# Let's give an example of sentiment analysis
message_history=[
    {
      "role": "system",
      "content": "Decide whether a Tweet's sentiment is positive, neutral, or negative."
    },
    {
      "role": "user",
      "content": "Tweet: \"I loved the new Batman movie!\"\nSentiment:"
    },
    {
      "role": "assistant",
      "content": "Positive"
    }
  ]
print("Write a tweet: ")

while(True):
    message_history = gpt_response(input("> "),
                                   message_history,
                                   temperature=0,
                                   max_tokens=60,
                                   frequency_penalty=0.5)
    print(message_history[-1]["content"])
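
The example messages follow a `Tweet: "..."` / `Sentiment:` template, while the loop sends the raw console input. As a sketch, two hypothetical helpers could apply the same template to new tweets and map the model's free-text reply onto one of the three labels.

```python
def format_tweet(tweet):
    """Wrap a raw tweet in the same template as the example user message."""
    return f'Tweet: "{tweet}"\nSentiment:'


def normalise_label(reply):
    """Map a free-text model reply to one of the three labels, or 'Unknown'."""
    text = reply.strip().lower()
    for label in ("positive", "neutral", "negative"):
        if text.startswith(label):
            return label.capitalize()
    return "Unknown"


print(format_tweet("I loved the new Batman movie!"))
print(normalise_label(" positive.\n"))
```

Keeping new inputs in the same shape as the few-shot example tends to make the replies more consistent, although that is not guaranteed.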

### <a id='toc1_4_2_'></a>[Language Translation](#toc0_)

Let's set up a translation scenario where a user inputs a phrase to be translated into several languages (here German, Spanish, Polish, Tamil, and Romanian). The program generates responses using GPT, providing translations until interrupted.

We will use a system pre-prompt that asks the model to translate each input into the languages of our choice:

In [None]:
message_history=[{"role": "system", "content": "Translate this into German, Spanish, Polish, Tamil, and Romanian"}]
print("Write a phrase to translate to German, Spanish, Polish, Tamil, and Romanian: ")

while(True):
    message_history = gpt_response(input("> "), message_history,temperature=0.3,max_tokens=200)
    print(message_history[-1]["content"])
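
To change the target languages without editing the prompt by hand, the system message can be built from a list. This is a minimal sketch; `translation_prompt` is a hypothetical helper that reproduces the wording of the pre-prompt used above.

```python
def translation_prompt(languages):
    """Build a system prompt asking for a translation into each listed language."""
    if not languages:
        raise ValueError("at least one target language is required")
    if len(languages) == 1:
        listed = languages[0]
    elif len(languages) == 2:
        listed = " and ".join(languages)
    else:
        listed = ", ".join(languages[:-1]) + ", and " + languages[-1]
    return f"Translate this into {listed}"


messages = [{"role": "system", "content": translation_prompt(["Spanish", "Portuguese"])}]
print(messages[0]["content"])
```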

# <a id='toc2_'></a>[Creating a chat bot interface with Streamlit](#toc0_)

Please refer to the `GPT_streamlit.py` script for this part of the lesson.

# <a id='toc3_'></a>[Extra: creating a chatbot with gradio for front-end UI](#toc0_)

Gradio is a Python library that allows you to quickly create customizable UIs for machine learning models, or for any kind of Python function or code snippet, with just a few lines of code. It simplifies the process of deploying and sharing models or code by generating interactive interfaces for input and output data.

To create a chatbot with Gradio for the front-end user interface, follow these steps:

1. Install the necessary packages

In [None]:
#!pip install gradio openai

2. Import the required libraries

In [None]:
import gradio as gr
# import openai

3. Set up your OpenAI API credentials by assigning your API key to openai.api_key

In [None]:
# openai.api_key = "YOUR_API_KEY"

4. Define a function that generates chatbot responses based on user input

In [None]:
# Set up the conversation history with a system pre-prompt and an initial assistant message
message_history = [{"role": "system", "content": "Respond to my prompts in a Harry Potter style"},
                   {"role": "assistant", "content": "OK"}]

def gpt_response_ui(inp):
    global message_history

    message_history = gpt_response(inp, message_history, max_tokens=50)

    # Build (user, assistant) tuples from msg["content"], one per exchange,
    # skipping the two pre-prompt messages at the start of the history
    response = [(message_history[i]["content"], message_history[i+1]["content"]) for i in range(2, len(message_history)-1, 2)]  # list of tuples
    return response
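
The pairing step can be tried on its own, without calling the API or starting Gradio. In this sketch, `history_to_pairs` is a hypothetical helper and the sample history is made up.

```python
def history_to_pairs(history, skip=2):
    """Group the messages after the first `skip` entries into (user, assistant) tuples."""
    visible = history[skip:]
    return [
        (visible[i]["content"], visible[i + 1]["content"])
        for i in range(0, len(visible) - 1, 2)
    ]


demo_history = [
    {"role": "system", "content": "Respond in a Harry Potter style"},
    {"role": "assistant", "content": "OK"},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Greetings, young wizard!"},
]
print(history_to_pairs(demo_history))
```

The `skip` value matches the two pre-prompt messages, so they never show up in the chat window.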

5. Define the Gradio interface

In [None]:
with gr.Blocks() as demo:  # creates a Gradio interface

    chatbot = gr.Chatbot()  # creates a chatbot component

    with gr.Row():  # creates a row within the Gradio interface to contain components
        # `show_label=False` hides the label associated with the textbox
        txt = gr.Textbox(show_label=False, placeholder="Enter text and press enter")  # text input field

    txt.submit(gpt_response_ui, txt, chatbot)  # on submit, calls `gpt_response_ui` and shows its output in the chatbot
    txt.submit(lambda: "", None, txt)  # clears the textbox when the user submits their input

6. Run the Gradio interface

In [None]:
demo.launch(share=True) #To create a public link, set `share=True` in `launch()`.

Source Inspiration: MIT License - Copyright (c) 2023 Harrison

# <a id='toc4_'></a>[Resources](#toc0_)

- Understanding transformers, by StatQuest:
    - [Recurrent Neural Networks (RNNs) (17 min)](https://youtu.be/AsNTP8Kwu80?si=yLu1H5CEVd4dX7hm)  
    - [Long Short-Term Memory Networks (LSTMs) (21 min)](https://youtu.be/YCzL96nL7j0?si=-i22L3FuAC6BWLSU)  
    - [seq2seq Encoder-Decoder (17 min)](https://youtu.be/L8HKweZIOmg?si=ROsiu2V4A8EfWPjN)  
    - [Attention for Neural Networks (16 min)](https://youtu.be/PSs6nxngL6k?si=Kk3U02Px3ij98RiX)  
    - [Transformer Neural Networks (36 min)](https://youtu.be/zxQyTK8quyY?si=StA0ZLl702j3br4T)