
# **Working With ChatGPT APIs | Part-I**
In the previous section, we gave an overview of the various models provided by OpenAI (natural language, code, images, audio, etc.) and identified the use cases and tasks that the various models and APIs can be used for.

In this session, we will extensively use the `Completion()` and `ChatCompletion()` API endpoints (for generating one-off responses and conducting chats/conversations, respectively). Specifically, we will:
1. Make API calls to the `Completion()` and `ChatCompletion()` endpoints
2. Modify the prompts and make them more nuanced to perform complex tasks
3. Create a very simple 'AI Tutor' using the `ChatCompletion()` endpoint
4. Measure the cost of making API calls via tokens and put guardrails in place to monitor and control costs


## Getting Started with the `Completion()` API in Python

We first need to install the `openai` Python library. In Google Colab, you can install libraries by adding an exclamation mark before `pip`, i.e. `!pip install <library_name>`. You also need an OpenAI API key: create an OpenAI account and [get an API key here](https://platform.openai.com/account/api-keys).

**IMPORTANT NOTE**: The code below installs version 0.28 of the `openai` Python library. This is because the `openai.Completion` and `openai.ChatCompletion` interfaces used in this notebook are supported only up to v0.28 of the library. The Completions API itself has been marked as legacy, and the older completions models are scheduled to [retire at the beginning of 2024](https://openai.com/blog/gpt-4-api-general-availability).

For more information, refer to the following [page](https://platform.openai.com/docs/guides/text-generation/completions-api).

In [None]:
# install openai version 0.28
!pip install openai==0.28



To use OpenAI APIs we need to set an API key (you can create one [here](https://platform.openai.com/account/api-keys)). In this instance, I have stored my API key in a file named "OpenAI_API_Key.txt" in my Google Drive.

To read files from Google Drive in a Colab notebook, we need to "mount the drive" using the command below:

In [None]:
# once you mount your google drive, you can read data from your google drive into the colab notebook
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# # second (alternate) way to upload files to colab
# # use this to import files from your system to the colab environment

# from google.colab import files
# uploaded = files.upload()

Once Google Drive is mounted, you can access its folders and files. In `My Drive`, create a folder `GenAI_Course_Master/Course_1_ShopAssistAI/Week_2/Session_1` where you can store this notebook and other files (API keys, data files, etc.). You can then access all the files in that folder at the filepath `/content/drive/MyDrive/GenAI_Course_Master/Course_1_ShopAssistAI/Week_2/Session_1`, which is the path used in the code below.

In [None]:
filepath = '/content/drive/MyDrive/GenAI_Course_Master/Course_1_ShopAssistAI/Week_2/Session_1/'
# linux command to print all files in a directory
!ls "/content/drive/MyDrive/GenAI_Course_Master/Course_1_ShopAssistAI/Week_2/Session_1/"
# or alternatively
# !ls $filepath

AI_tutor_system_message_1.txt	   tata_transcript.txt
OpenAI_API_Key_1.txt		   Working_With_ChatGPT_APIs_Part_II_v0.28.ipynb
OpenAI_API_Key.txt		   Working_With_ChatGPT_APIs_Part_I_v0.28.ipynb
tata_motors_transcript_sample.txt


In [None]:
# import openai and set the API key
import openai
# filepath = "/content/drive/My Drive/GenAI_Course_Master/Course_1_ShopAssistAI/"

with open(filepath + "OpenAI_API_Key_1.txt", "r") as f:
  # read the key and strip any trailing newline or spaces
  openai.api_key = f.read().strip()
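As an alternative to hard-coding a file name, the key can be looked up in an environment variable first and read from a file only as a fallback. The snippet below is a minimal sketch of that idea; the helper name `load_api_key` is hypothetical and not part of the `openai` library, and the `OPENAI_API_KEY` variable name is assumed as the default.

```python
import os
from pathlib import Path


def load_api_key(key_file, env_var="OPENAI_API_KEY"):
    """Return the API key from an environment variable, else from a text file.

    `load_api_key` is a hypothetical helper written for this notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # read the whole file and strip surrounding whitespace and newlines
    key = Path(key_file).read_text().strip()
    if not key:
        raise ValueError(f"No API key found in {key_file}")
    return key
```

With the drive mounted, this could be used as `openai.api_key = load_api_key(filepath + "OpenAI_API_Key_1.txt")`.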

We can now use any of the OpenAI APIs. For natural language (text) and code, for any given model (GPT-3, 3.5 or 4), OpenAI provides API endpoints for two types of tasks:
1. Text completion via the `Completion()` endpoint: Takes in text input (prompt) and generates output, good for single-use input-output
2. Chat completion via the `ChatCompletion()` endpoint: Used for chat-like multi-turn conversation, takes the entire conversation history as input and returns the next response

Let's start with the `Completion()` API. The key parameters we pass to the API call are explained below, followed by the basic syntax. [The official API documentation](https://platform.openai.com/docs/api-reference/completions) explains everything in detail:
* We use the model `text-davinci-002`, which belongs to the GPT-3.5 family
* `max_tokens` refers to the max number of tokens to be generated
* `temperature` is a number between 0 (most certain/deterministic) and 2 (most random), defaults to 1




In [None]:
# using the Completion API

# define an input prompt
prompt = '''You are a helpful Python teaching assistant. Explain the various list indexing methods in Python. Provide an
exhaustive summary of the methods describing what they do, sample code for each, and guidelines on when to use which method.
'''

chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=200,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response)

{
  "id": "cmpl-8VDCcFYoM2yzYNcrpO3842E2vosAx",
  "object": "text_completion",
  "created": 1702449610,
  "model": "text-davinci-002",
  "choices": [
    {
      "text": "\nThere are three main ways to index a list in Python:\n\n1. By position: my_list[0]\n2. By value: my_list.index(3)\n3. By slice: my_list[1:3]\n\n1. By position:\n\nThis method returns the element at the given position in the list. For example, if we have a list of numbers:\n\nmy_list = [1, 2, 3, 4, 5]\n\nWe can get the first element by indexing the list like this:\n\nmy_list[0]\n\nThis would return the value 1. We can also index from the end of the list by using negative numbers:\n\nmy_list[-1]\n\nThis would return the value 5.\n\n2. By value:\n\nThis method returns the position of the given value in the list. For example, if we have a list",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 200,
    "total_tokens": 248
  }
}

In [None]:
print(type(chat_response))

<class 'openai.openai_object.OpenAIObject'>


The API returns a dictionary-like object. Also, notice that the API returns the number of `total_tokens` used (prompt tokens + completion tokens).
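Since the token counts come back with every response, they can be turned into a rough cost estimate. The sketch below works on a plain dict that mirrors the `usage` field printed above; the helper name `summarize_usage` and the price constant are assumptions made for illustration, and the price is a placeholder rather than an actual OpenAI rate.

```python
# A plain dict standing in for the `usage` part of the response shown above
sample_response = {
    "usage": {"prompt_tokens": 48, "completion_tokens": 200, "total_tokens": 248}
}

# ASSUMPTION: placeholder price for illustration only, not an actual OpenAI rate
HYPOTHETICAL_PRICE_PER_1K_TOKENS = 0.02


def summarize_usage(response, price_per_1k=HYPOTHETICAL_PRICE_PER_1K_TOKENS):
    """Return the total token count and a rough cost estimate for one call."""
    usage = response["usage"]
    total = usage["prompt_tokens"] + usage["completion_tokens"]
    return {"total_tokens": total, "estimated_cost": total / 1000 * price_per_1k}


summary = summarize_usage(sample_response)
print(summary["total_tokens"])  # 248
```

The object returned by the API supports the same key-based access, so `summarize_usage(chat_response)` should work on a real response as well.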

<br>

The reply we are interested in is the `text` field inside `choices`, which is a list with one entry per requested completion (`n`). We can access it as follows:

In [None]:
# retrieve the response text
print(chat_response.choices[0]["text"])


There are three main ways to index a list in Python:

1. By position: my_list[0]
2. By value: my_list.index(3)
3. By slice: my_list[1:3]

1. By position:

This method returns the element at the given position in the list. For example, if we have a list of numbers:

my_list = [1, 2, 3, 4, 5]

We can get the first element by indexing the list like this:

my_list[0]

This would return the value 1. We can also index from the end of the list by using negative numbers:

my_list[-1]

This would return the value 5.

2. By value:

This method returns the position of the given value in the list. For example, if we have a list


We can increase `max_tokens` to get a longer, more detailed response.
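A reply that stops mid-sentence usually means the model hit the `max_tokens` limit, which the API reports as a `finish_reason` of `length`. The following is a minimal sketch of checking for that before deciding to raise the limit. It uses a plain dict shaped like the raw response printed earlier, and the helper name `get_text_and_truncation` is hypothetical.

```python
# A plain dict shaped like the `choices` part of the raw response printed earlier
sample_response = {
    "choices": [
        {
            "text": "\nThere are three main ways to index a list in Python:",
            "index": 0,
            "finish_reason": "length",
        }
    ]
}


def get_text_and_truncation(response):
    """Return the first reply and whether it was cut off by max_tokens."""
    choice = response["choices"][0]
    truncated = choice["finish_reason"] == "length"
    return choice["text"].strip(), truncated


text, truncated = get_text_and_truncation(sample_response)
if truncated:
    print("Reply was cut off at max_tokens; consider a higher limit.")
```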

In [None]:
# increased number of tokens
chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=800,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response.choices[0]["text"])


There are four different ways to index lists in Python:

1. Positive indexing: accessing elements from the beginning of the list.

2. Negative indexing: accessing elements from the end of the list.

3. Slicing: accessing a range of elements from the list.

4. Extended slicing: accessing a range of elements from the list with a step size other than 1.

Positive indexing is the most common form of indexing and is used to access elements from the beginning of the list. The syntax for positive indexing is:

list_name[index]

For example, if we have a list called my_list, we can access the first element of the list by using the index 0:

my_list[0]

We can also access the second element of the list by using the index 1:

my_list[1]

And so on.

Negative indexing is used to access elements from the end of the list. The syntax for negative indexing is:

list_name[-index]

For example, if we have a list called my_list, we can access the last element of the list by using the index -1:

my_list

> ### Exercise: Explore and play with the `Completion()` API

* Read the [completions API documentation](https://platform.openai.com/docs/api-reference/completions) and experiment with modifying the various parameters
* From the OpenAI API docs, find out the maximum number of tokens that can be used with the `text-davinci-002` model
* The prompt in the previous example is about explaining Python list indexing methods. Try to modify the prompt so that the user can provide a topic as a variable, e.g. dictionaries, dataframes, etc.
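When experimenting with parameters, it can be convenient to keep the defaults in one place and override only what changes between calls. The snippet below is one possible sketch; `DEFAULT_PARAMS` and `build_request` are hypothetical names introduced here, and the default values simply mirror the ones used in the cells above.

```python
# Default parameters mirroring the values used in this notebook
DEFAULT_PARAMS = {
    "model": "text-davinci-002",
    "max_tokens": 200,
    "temperature": 0.5,
    "n": 1,
    "stop": None,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}


def build_request(prompt, **overrides):
    """Merge the prompt and any overrides into the default parameters."""
    params = {**DEFAULT_PARAMS, "prompt": prompt, **overrides}
    # basic sanity checks on the two parameters changed most often
    if not 0 <= params["temperature"] <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if params["max_tokens"] < 1:
        raise ValueError("max_tokens must be a positive integer")
    return params


request = build_request("Explain list slicing in Python.", max_tokens=800)
print(request["max_tokens"], request["temperature"])  # 800 0.5
```

With v0.28 of the library, the resulting dict can be passed on as `openai.Completion.create(**request)`.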



## Creating More Complex Prompts
We can now modify the prompts so that users can provide inputs to them. For example, we may want to let the user specify the name of a topic.

In [None]:
# topic as input
topic_name = "Indexing in Pandas Dataframes"

prompt = '''You are a helpful Python teaching assistant. Explain the following topic in detail. Provide an exhaustive
summary of the methods describing what they do, sample code for each, and guidelines on when to use which method.
The topic is: {0}'''.format(topic_name)

# call the API with the new prompt
chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=800,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response.choices[0]["text"])



Indexing in pandas dataframes refers to the process of selecting specific rows and columns from a dataframe. This can be done using the indexing operator, which is the square bracket ([ ]) operator.

There are two ways to index dataframes:

1. By position: This is the default indexing method in pandas. It uses numerical indices to select rows and columns. For example, to select the first row of a dataframe, we would use the following code: df[0]

2. By label: This indexing method uses the row and column labels to select data. For example, to select the first row of a dataframe, we would use the following code: df.loc[0]

To select multiple rows or columns, we can use the indexing operator with a list of indices. For example, to select the first and third rows of a dataframe, we would use the following code: df[[0, 2]]

We can also use the indexing operator to select a range of rows or columns. For example, to select the first, second, and third rows of a dataframe, we would use the f

Depending on the application, we often want to add further instructions to the prompt, such as *explain at a beginner level* or *explain step by step*, or supply specific, detailed information, such as *use the following two page document to answer the user's question*. We can do that with some simple string manipulation.
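As a rough illustration of this kind of string manipulation, the sketch below assembles a prompt from a base instruction, an optional list of extra instructions, and the topic. The function name `build_prompt` and the bullet-style layout of the instructions are assumptions made for this example, not a required format.

```python
def build_prompt(topic, extra_instructions=()):
    """Assemble a teaching-assistant prompt with optional extra instructions."""
    base = (
        "You are a helpful Python teaching assistant. "
        "Explain the following topic in detail."
    )
    lines = [base]
    # each extra instruction goes on its own line
    lines.extend(f"- {item}" for item in extra_instructions)
    lines.append(f"The topic is: {topic}")
    return "\n".join(lines)


prompt = build_prompt(
    "Indexing in Pandas Dataframes",
    ["Explain at a beginner level.", "Explain step by step."],
)
print(prompt)
```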

<br>

For example, say you want to develop an information retrieval app for financial analysts which can provide them with information from documents such as investor presentations, annual reports, quarterly earnings calls, etc. As a sample, see [this earnings call report](https://www.tatamotors.com/wp-content/uploads/2023/05/earnings-call-transcript-q4-fy23.pdf) of an Indian automobile company, Tata Motors.

For demonstration, we have taken a small toy-sized sample of this transcript and put it in a text file, `tata_motors_transcript_sample.txt` (this file is uploaded along with the Colab notebook, so we can read the file directly as shown below).

If you want to use any other file that you have on your system, you can upload it via the following command:

In [None]:
# use this to import files from your system to the colab environment

# from google.colab import files
# uploaded = files.upload()

In [None]:
with open(filepath + "tata_motors_transcript_sample.txt", "r") as f:
  transcript = ' '.join(f.readlines())

print(len(transcript))
print(transcript)

3370
Fair to say it  has been an extremely satisfying quarter. And the reason I say, use  that word is that, nice to 
  see all the auto verticals coming together once again and this time  with a lot of intensity as well. So both 
  the alignment of the vectors are there and the magnitude  of vectors are also increasing, which is what has 
  translated into a strong set of numbers  for the quarters, resulting on multiple highs  and I will quickly  cover 
  that in the coming slides . We en ded the year on a pretty strong note with revenue of  around  Rs. 1 lakh 
  crores  with an EBITDA of 13.3% , and the profit before tax and exceptional item of Rs. 5,000  crores.  
  On a full -year basis, we hit our highest ever revenue at  around  Rs. 3.5 lakh crores and ended the year  with 
  a positive free cash flow  of Rs. 7,800 crores, despite a very weak start in Q1 and  Q2, which you see in the 
  numbers . The business has been sequentially improving its performance  and doing it in signif

Now, we design a prompt comprising three parts: a) the base instruction, which specifies the task for the model, b) the question asked by the user (analyst), and c) the earnings call transcript from which the model is supposed to find the answer.

In [None]:
base_instruction = '''You are a helpful assistant which helps financial analysts retrieve relevant financial and business related information from documents.
Given below is a question and the transcript of an earnings call of an automobile company, Tata Motors, which was attended by the top management of the firm.
Try to respond with specific numbers and facts wherever possible. If you are not sure about the accuracy of the information, just respond that you do not know'''

question = "How much free cash flow did Tata Motors have at the end of the year?"

prompt = base_instruction + "\n\n" +  "Question: {0}".format(question) + "\n\n" + "Transcript: \n {0}".format(transcript)

print(prompt)


You are a helpful assistant which helps financial analysts retrieve relevant financial and business related information from documents.
Given below is a question and the transcript of an earnings call of an automobile company, Tata Motors, which was attended by the top management of the firm.
Try to respond with specific numbers and facts wherever possible. If you are not sure about the accuracy of the information, just respond that you do not know

Question: How much free cash flow did Tata Motors have at the end of the year?

Transcript: 
 Fair to say it  has been an extremely satisfying quarter. And the reason I say, use  that word is that, nice to 
  see all the auto verticals coming together once again and this time  with a lot of intensity as well. So both 
  the alignment of the vectors are there and the magnitude  of vectors are also increasing, which is what has 
  translated into a strong set of numbers  for the quarters, resulting on multiple highs  and I will quickly  cover

In [None]:
# call the API with the new prompt
chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=1000,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response.choices[0]["text"])


Tata Motors had a positive free cash flow of Rs. 7,800 crores at the end of the year.


In [None]:
# another question
question = "Summarise the key financial metrics reported in the earnings call related to revenue growth, profitability, cash flow and debt."

prompt = base_instruction + "\n\n" +  "Question: {0}".format(question) + "\n\n" + "Transcript: \n {0}".format(transcript)

chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=1000,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response.choices[0]["text"])


In the earnings call, the company reported revenue growth of 35%, with volume and mix contributing 24% and price contributing 10.5%. The company also reported profitability of 3.2%, which increased to 6.8%. Additionally, the company reported debt reduction of Rs. 43,700 crores and TML India at Rs. 6,200 crores, JLR GBP 3 billion i.e. Rs. 30,000 crores.


In [None]:
# ask something not mentioned in the transcript sample
question = "How much equity funding did Tata Motors raise from institutional investors in this quarter?"

prompt = base_instruction + "\n\n" +  "Question: {0}".format(question) + "\n\n" + "Transcript: \n {0}".format(transcript)

chat_response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=1000,
    temperature=0.5,
    n=1,
    stop=None,
    frequency_penalty=0,
    presence_penalty=0)

print(chat_response.choices[0]["text"])


Tata Motors raised Rs. 771 crores from institutional investors in this quarter.


> ### Exercise: Refining the Prompt
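* The last question above asks for information that is not in the transcript sample, yet the model returned a specific figure instead of saying that it does not know. Modify the prompt so that ChatGPT does the following:
 - When the user asks for information not present in the transcript, ChatGPT must say so, rather than providing an incorrect answer
 - If ChatGPT is unsure of the answer, it must provide an appropriate response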
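One possible starting point for this exercise is to give the model an exact fallback sentence to use, so that the calling code can detect it. The sketch below shows only the prompt construction and the check. The fallback wording and the names `FALLBACK`, `build_grounded_prompt` and `is_fallback` are assumptions made for illustration, and there is no guarantee that the model will always follow the instruction.

```python
# ASSUMPTION: a fixed fallback sentence chosen for illustration
FALLBACK = "I could not find this in the transcript."


def build_grounded_prompt(question, transcript):
    """Build a prompt that asks the model to answer only from the transcript."""
    instruction = (
        "You are a helpful assistant for financial analysts.\n"
        "Answer the question using only the transcript below.\n"
        f"If the transcript does not contain the answer, reply exactly: {FALLBACK}"
    )
    return f"{instruction}\n\nQuestion: {question}\n\nTranscript:\n{transcript}"


def is_fallback(answer):
    """Return True if the model replied with the agreed fallback sentence."""
    return answer.strip() == FALLBACK


prompt = build_grounded_prompt(
    "How much equity funding did Tata Motors raise this quarter?",
    "Fair to say it has been an extremely satisfying quarter.",
)
print(prompt)
```

The reply from the API can then be passed to `is_fallback()` to decide whether to show the answer or tell the user that the document does not cover the question.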

## Summary

In this section of the tutorial, we used the `Completion()` API to access OpenAI's models via Python, wrote prompts that take variable inputs, and experimented with ways to add external information to the prompts.

In the next section, we will build longer, chat-like programs using the `ChatCompletion()` API.