# Building a Project with the ChatGPT API

### Install the necessary libraries.

**OpenAI Python Library** (openai): (pip install openai)

* **Purpose**: This library provides a **convenient interface** to interact with OpenAI’s APIs, including the Language Model API (for GPT-3, GPT-4) and other specialized APIs.
* **Features**: Allows you to send requests to OpenAI’s endpoints, manage API keys, handle responses, and integrate AI capabilities into Python applications.

In [None]:
pip install openai

Collecting openai
  Downloading openai-1.35.15-py3-none-any.whl.metadata (22 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.8.2-py3-none-any.whl.metadata (125 kB)
Collecting tqdm>4 (from openai)
  Downloading tqdm-4.66.4-py3-none-any.whl.metadata (57 kB)
Collecting annotated-types>=0.4.0 (from pydantic<3,>=1.9.0->openai)
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.20.1 (from pydantic<3,>=1.9.0->openai)
  Downloading pydantic_core-2.20.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading openai-1.35.15-py3-none-an

The command `pip install openai[datalib]` installs the OpenAI Python library together with its **optional dependencies for data-related tasks**, such as NumPy and pandas.

In [None]:
pip install openai[datalib]

Collecting pandas-stubs>=1.1.0.11 (from openai[datalib])
  Downloading pandas_stubs-2.2.2.240603-py3-none-any.whl.metadata (10 kB)
Collecting types-pytz>=2022.1.1 (from pandas-stubs>=1.1.0.11->openai[datalib])
  Downloading types_pytz-2024.1.0.20240417-py3-none-any.whl.metadata (1.5 kB)
Downloading pandas_stubs-2.2.2.240603-py3-none-any.whl (157 kB)
Downloading types_pytz-2024.1.0.20240417-py3-none-any.whl (5.2 kB)
Installing collected packages: types-pytz, pandas-stubs
Successfully installed pandas-stubs-2.2.2.240603 types-pytz-2024.1.0.20240417
Note: you may need to restart the kernel to use updated packages.


In [None]:
pip install urllib3==1.26.6


Collecting urllib3==1.26.6
  Downloading urllib3-1.26.6-py2.py3-none-any.whl.metadata (44 kB)
Downloading urllib3-1.26.6-py2.py3-none-any.whl (138 kB)
Installing collected packages: urllib3
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.0.7
    Uninstalling urllib3-2.0.7:
      Successfully uninstalled urllib3-2.0.7
Successfully installed urllib3-1.26.6
Note: you may need to restart the kernel to use updated packages.


In [None]:

pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


### Import the libraries and load the environment file to access the OpenAI API key
#### The key can be generated here: https://platform.openai.com/account/api-keys

In [None]:
import os
from openai import OpenAI


from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

### Authenticate to the API using the API Key
#### Read the key from an environment variable, or pass `api_key="your_key_here"` to hardcode it

In [None]:
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY']
)
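
If the `OPENAI_API_KEY` variable is missing, `os.environ[...]` raises a bare `KeyError`. A minimal sketch of a friendlier lookup follows; the helper name `read_api_key` is hypothetical and is not part of the `openai` library.

```python
import os

def read_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a descriptive error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return key

# Usage with the client created above (assumes the key is configured):
# client = OpenAI(api_key=read_api_key())
```

Passing the environment as a parameter keeps the helper easy to try out with a plain dictionary.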

# Exploring OpenAI APIs

## **Chat Completion API**
- Uses the GPT-3.5 and GPT-4 models
- **Completion params** - the input: messages sent to the model
  * role: system/user/assistant
  * content: the text of a message
- **Completion** - the model's output: the generated response.

The Chat Completion API is a tool that leverages advanced language models, like GPT-3.5-turbo, to generate conversational responses. The API is designed to create and manage interactive dialogues, enabling developers to build sophisticated chatbots that can understand and respond to user inputs in a natural and coherent manner.

### Key Concepts

1. **Model**: The `model` parameter specifies the language model to be used. In this case, `gpt-3.5-turbo` is selected, which is an advanced version of OpenAI's language models known for its enhanced conversational abilities.

2. **Messages**: The `messages` parameter is a list of message objects that define the conversation context. Each message has two attributes:
   - `role`: This specifies the role of the entity in the conversation. Common roles include:
     - `"system"`: Provides initial instructions or context to the assistant.
     - `"user"`: Represents the user's input or query.
     - `"assistant"`: Represents the assistant's response.
   - `content`: The actual text content of the message.

### Response Handling

The API call `client.chat.completions.create(...)` sends the conversation context to the model, which generates an appropriate response based on the given input and instructions. The response would typically include the assistant's reply, continuing the conversation in a natural and informative manner.
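
To make the role of each message concrete, here is a minimal sketch of how a conversation history could be assembled before it is passed as `messages`. The helper `add_message` is a hypothetical name used for illustration, and the assistant text is a placeholder rather than real model output.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}

def add_message(history, role, content):
    """Append one message dict to the history and return the history."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    history.append({"role": role, "content": content})
    return history

history = []
add_message(history, "system", "You are a helpful assistant that acts as a sous chef.")
add_message(history, "user", "When should I use Capellini pasta?")
# After a call, the reply would be appended so the next turn keeps the context.
add_message(history, "assistant", "<assistant reply goes here>")
```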


In [None]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
       {"role": "system", "content": '''You are a helpful assistant that acts as a
                                        sous chef.'''},
       {"role": "user", "content": '''When should I use Capellini pasta?'''}
  ]
)

**Top-Level Response Parameters**

1. **id**: Unique identifier for the chat completion request.

2. **choices**: Array containing the generated completion options by the model.

3. **created**: Timestamp of when the request was created.

4. **model**: Identifier of the model used for generating the completion.

5. **object**: Type of response object, indicating it's a chat completion.

6. **system_fingerprint**: Fingerprint of the backend configuration the model ran with (often null).

7. **usage**: Information on token usage for this request.

Inside `choices` Array

1. **finish_reason**: Reason why the generation stopped (e.g., "stop", "length").

2. **index**: Position of this completion option in the array.

3. **message**: The generated message content.

Inside `message`

1. **content**: Text content of the generated message.

2. **role**: Role of the message sender (e.g., "assistant").

3. **function_call**: Information about any function call requested by the model (often null).

4. **tool_calls**: Information about any tool calls requested by the model (often null).

Inside `usage`

1. **completion_tokens**: Number of tokens generated by the model for this completion.

2. **prompt_tokens**: Number of tokens in the input prompt.

3. **total_tokens**: Total number of tokens used in the request (prompt + completion).
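
The fields above can be read directly from the parsed response. The sketch below works on a plain dictionary shaped like the JSON printed by `response.model_dump_json()`; the sample values are placeholders, and `summarize_response` is a hypothetical helper.

```python
def summarize_response(payload):
    """Pull the reply text, stop reason, and token count out of a response dict."""
    choice = payload["choices"][0]
    return {
        "reply": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": payload["usage"]["total_tokens"],
    }

sample = {
    "choices": [{"finish_reason": "stop", "index": 0,
                 "message": {"role": "assistant", "content": "placeholder reply"}}],
    "usage": {"completion_tokens": 3, "prompt_tokens": 5, "total_tokens": 8},
}
summary = summarize_response(sample)
```

On the SDK object itself, the same values are available as attributes, for example `response.choices[0].message.content`.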

In [None]:
print(response.model_dump_json(indent=2))

{
  "id": "chatcmpl-8JseMtjWz9g497PDsGu7O8rgT7o7n",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Capellini, also known as angel hair pasta, is a thin and delicate pasta variety that cooks quickly. It is best used in dishes that require a light and delicate pasta texture. Capellini is popularly used in Italian dishes with light sauces to maximize the pasta's fine texture.\n\nHere are a few instances where capellini pasta is commonly used:\n\n1. Light Cream or Butter Sauces: Capellini pairs well with light cream sauces such as lemon butter, garlic cream, or tomato cream sauces. The fine strands of capellini can absorb and hold onto these sauces well.\n\n2. Olive Oil-Based Sauces: Capellini can be tossed with simple olive oil-based sauces with ingredients like garlic, red pepper flakes, and fresh herbs. This allows the pasta to shine through with its delicate texture, while the flavors of the sauce enhance the overall dish.\n

**Prompt engineering example**

The response above is **too long**, so the next example asks the model to answer in **15 words or less**. The resulting response is much shorter.

In [None]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
       {"role": "system", "content": '''You are a helpful assistant that acts as a
                                        sous chef.'''},
       {"role": "user", "content": '''Can you tell me when I should use Capellini
                               pasta in 15 words or less?'''}
  ]
)

In [None]:
print(response.model_dump_json(indent=2))

{
  "id": "chatcmpl-8JsezDmmeTXwAsDp9ucOtmYFFqGwG",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Use Capellini pasta when you want a delicate and light pasta option in your dish.",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1699749037,
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 18,
    "prompt_tokens": 46,
    "total_tokens": 64
  }
}
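
The model is not guaranteed to honor a length instruction in the prompt, so it can be worth checking the reply. A minimal sketch, counting whitespace-separated words in the reply shown above (`within_word_limit` is a hypothetical helper):

```python
def within_word_limit(text, limit=15):
    """Return True when the text has at most `limit` whitespace-separated words."""
    return len(text.split()) <= limit

reply = "Use Capellini pasta when you want a delicate and light pasta option in your dish."
word_count = len(reply.split())
ok = within_word_limit(reply)
```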


## **Completion API**

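Generate text using the Text **Completion API**
- GPT-3 based
- temperature - how random the answer can be; 0 is the least random
- **stochastic** in nature - each call can return a different result
- `text-davinci-003`, used below, has since been retired by OpenAI; `gpt-3.5-turbo-instruct` is the suggested replacement for this endpoint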
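
To build intuition for temperature, the sketch below rescales a set of made-up token scores before turning them into probabilities. It is a conceptual illustration under the usual softmax-with-temperature formulation, not a description of OpenAI's internal implementation; the scores and the function name are assumptions.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Treat zero as greedy decoding: all probability on the top score.
        top = max(range(len(scores)), key=lambda i: scores[i])
        return [1.0 if i == top else 0.0 for i in range(len(scores))]
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(scores, 0.2)
warm = softmax_with_temperature(scores, 1.0)
greedy = softmax_with_temperature(scores, 0)
```

At the lower temperature the top-scoring token takes almost all of the probability mass, which is why low-temperature output varies less between calls.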

In [None]:
response = client.completions.create(
    model="text-davinci-003",
    prompt="Write a title for a course on the OpenAI API",
    max_tokens=256,
    temperature=0
)

In [None]:
print(response.model_dump_json(indent=2))

{
  "id": "cmpl-8Jv2wi1IAQ9sFvO9mmfvxIsyMxAcJ",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\n\"Getting Started with the OpenAI API: Unlocking the Power of AI\""
    }
  ],
  "created": 1699758210,
  "model": "text-davinci-003",
  "object": "text_completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 18,
    "prompt_tokens": 11,
    "total_tokens": 29
  }
}


## **Embeddings API**

**Embeddings**

**Overview**

Embeddings are dense vector representations of data, often used in natural language processing (NLP) and machine learning. They transform high-dimensional input data, like words or sentences, into lower-dimensional continuous vector spaces. This transformation captures semantic relationships and similarities between inputs, enabling more effective data analysis and processing.

Embeddings are a fundamental component of modern NLP and machine learning applications. Their ability to represent complex data in a meaningful and computationally efficient manner makes them invaluable for a wide range of tasks, from clustering and sentiment analysis to similarity search and beyond. By leveraging embeddings, practitioners can build more accurate and effective models that understand and process natural language data in sophisticated ways.

**How Embeddings Are Created:**
- **Word2Vec**: Generates embeddings by predicting context words from a target word (Skip-gram) or vice versa (CBOW).
- **GloVe**: Creates embeddings by factoring word co-occurrence matrices.
- **BERT/Transformers**: Uses deep learning models to produce contextual embeddings based on the surrounding text.

**Applications of Embeddings**

1. **Clustering**
   - **Purpose**: Group similar items (words, sentences, documents) together.
   - **Example**: Customer reviews can be clustered to identify common themes or topics.
   - **How It Works**: Embeddings enable clustering algorithms (e.g., K-means, DBSCAN) to group data points based on their vector similarities.

2. **Sentiment Analysis**
   - **Purpose**: Determine the sentiment (positive, negative, neutral) of a piece of text.
   - **Example**: Analyzing social media posts to gauge public opinion on a product.
   - **How It Works**: Embeddings transform text into vectors, which are then input into classifiers (e.g., logistic regression, neural networks) trained to predict sentiment.

3. **Similarity Search**
   - **Purpose**: Find items similar to a given query.
   - **Example**: Recommending products similar to ones a user has viewed.
   - **How It Works**: Compute vector similarities (e.g., cosine similarity) between the query and items in the dataset, returning the most similar items.

4. **Information Retrieval**
   - **Purpose**: Retrieve relevant documents in response to a query.
   - **Example**: Search engines use embeddings to match user queries with relevant web pages.
   - **How It Works**: Embeddings of the query and documents are compared to find the closest matches.

5. **Machine Translation**
   - **Purpose**: Translate text from one language to another.
   - **Example**: Translating an English sentence to French.
   - **How It Works**: Embeddings capture semantic meaning across languages, aiding in generating accurate translations.

6. **Named Entity Recognition (NER)**
   - **Purpose**: Identify and classify entities (e.g., names, dates, locations) in text.
   - **Example**: Extracting company names from news articles.
   - **How It Works**: Embeddings help models understand the context and significance of entities within the text.
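
As an illustration of the similarity-search idea from the list above, the sketch below ranks a few items by cosine similarity to a query. The three-dimensional vectors are made up for readability; the embeddings returned later in this notebook have 1536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(query, items, top_k=2):
    """Return the names of the top_k items ranked by cosine similarity to the query."""
    ranked = sorted(items, key=lambda name: cosine_similarity(query, items[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical toy embeddings
items = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
nearest = most_similar(query, items)
```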

**Advantages of Embeddings**

- **Dimensionality Reduction**: Simplifies high-dimensional data into manageable vector forms.
- **Semantic Understanding**: Captures nuanced relationships between data points.
- **Transferability**: Pre-trained embeddings (e.g., GloVe, BERT) can be fine-tuned for specific tasks, reducing the need for extensive training data.


**Measure Relatedness Using Embeddings**

In [None]:
response = client.embeddings.create(
    input="The cat is sitting on the mat",
    model="text-embedding-ada-002"
)

In [None]:
cat_embeddings = response.data[0].embedding

In [None]:
print(cat_embeddings)

[1.691398756520357e-05, 0.0018407186726108193, -0.017408939078450203, -0.005649321246892214, -0.013212481513619423, 0.0038849017582833767, -0.015005513094365597, -0.03743748366832733, -0.00770622119307518, -0.006618957035243511, 0.041227009147405624, 0.016213584691286087, 0.005417244508862495, 0.003643287578597665, -0.017002008855342865, -0.006326476577669382, 0.03202023729681969, -0.004450787790119648, 0.008303898386657238, -0.02280074916779995, -0.008653602562844753, 0.008475571870803833, -0.015628622844815254, -0.011457598768174648, -0.004841821268200874, 0.01194718573242426, -0.007305650040507317, -0.016582364216446877, -0.0026164273731410503, 0.0015545965870842338, 0.0031966192182153463, -0.0046606105752289295, -0.0055221556685864925, -0.02552208863198757, -0.02627236396074295, -0.0031091931741684675, 0.006491791922599077, 0.010497501119971275, -0.008456496521830559, -0.012735610827803612, 0.01751067116856575, 0.005388632416725159, -0.017790434882044792, 0.003840393852442503, 0.00

In [None]:
response = client.embeddings.create(
    input="The dog is lying on the rug",
    model="text-embedding-ada-002"
)

In [None]:
dog_embeddings = response.data[0].embedding

In [None]:
print(dog_embeddings)

[-0.009161601774394512, 0.01040723081678152, -0.005988361779600382, -0.006745081394910812, -0.018609698861837387, 0.019805502146482468, -0.004618169739842415, -0.028624556958675385, -0.008264749310910702, -0.013552444986999035, 0.01684090495109558, 0.007585881277918816, 0.003104730509221554, 0.014798074029386044, -0.012450062669813633, -0.009292392991483212, 0.030966339632868767, 0.010307581163942814, 0.013166300021111965, -0.012493659742176533, -0.005885597318410873, 0.004836155101656914, -0.020490597933530807, -0.013739288784563541, -0.0013234809739515185, -2.438951560179703e-05, 0.012219621799886227, 0.0005177145940251648, -0.0024211916606873274, -0.004123032558709383, -0.00021642805950250477, 0.0055679623037576675, -0.019556377083063126, -0.02715471386909485, -0.005518137011677027, -0.016093527898192406, 0.01370192039757967, 0.007654390763491392, -0.010550478473305702, -0.027329102158546448, 0.017301788553595543, 0.012362868525087833, -0.00893738865852356, -0.015981420874595642, -0

**Compare the vectors**

Vectors need to be the same length for the comparison

In [None]:
len(cat_embeddings)

1536

In [None]:
len(dog_embeddings)

1536

**Cosine similarity** is a measure of similarity between two non-zero vectors. Its value ranges from -1 to 1; the closer the value is to 1, the more similar the vectors are.

In [None]:
import numpy as np
from numpy.linalg import norm

# compute cosine similarity
cosine = np.dot(cat_embeddings,dog_embeddings)/(norm(cat_embeddings)*norm(dog_embeddings))
print("Cosine Similarity:", cosine)

Cosine Similarity: 0.8867060539070469


## Whisper APIs

Import the necessary audio libraries

In [None]:
from IPython.display import Audio

In [None]:
# fix when using Chrome browser to manually change the MIME type of .m4a files
# since the 'type' attr of audio tag is 'audio/m4a', which is not correctly recognized by Chrome
import mimetypes

mimetypes.init()
mimetypes.add_type('audio/mp4', '.m4a')

**Speech to text** example

In [None]:
Audio("LinkedIn-Learning.m4a", autoplay=True)

In [None]:
# open the audio file in binary mode; the with block closes it after the request
with open("LinkedIn-Learning.m4a", "rb") as file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=file
    )

print(transcription)

Transcription(text="If you're enjoying this course, check out my other courses, Artificial Intelligence Foundations, Machine Learning, or Programming Foundations, APIs, and Web Services.")


**Speech to text + Translation** example

In [None]:
Audio("LinkedIn-Learning-IT.m4a", autoplay=True)

In [None]:
# open the Italian audio file; translations.create returns English text
with open("LinkedIn-Learning-IT.m4a", "rb") as file:
    transcription = client.audio.translations.create(
        model="whisper-1",
        file=file
    )

print(transcription)

Translation(text='If you like this course, take a look at my other courses Artificial Intelligence Foundations, Automatic Learning or Programming Foundations, APIs and Web Services')


## Image/DALL-E API

Generate Images using Image Generation

**Create Image**

In [None]:
from IPython.display import Image

response = client.images.generate(
  model="dall-e-2",
  prompt="a rainbow with a pot of gold",
  size="256x256",
  quality="standard",
  n=1, #select the number of images you want generated
)

image_url = response.data[0].url

print(image_url)

Image(url=image_url)

https://oaidalleapiprodscus.blob.core.windows.net/private/org-RZLvEijW4GW0KmC3rLIAjZlu/user-GjAVqpM2XyDru7SeUyCqCIh7/img-jkl2LBgX3tU8bLo5TUGi6DNz.png?st=2023-11-12T02%3A43%3A16Z&se=2023-11-12T04%3A43%3A16Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-11T21%3A41%3A56Z&ske=2023-11-12T21%3A41%3A56Z&sks=b&skv=2021-08-06&sig=SPZtJIVi/C9ZyOEHnWJ2ryHo2uEaNTsmeBmDy07HVM8%3D


**Edit an Image**

In [None]:
Image(url="hawaii.png") #original image

In [None]:
from PIL import Image as PILImage  # alias avoids shadowing IPython.display.Image

# resize original image - mask size must match image size
image = PILImage.open("hawaii.png")
hawaii_resized = image.resize((1024, 1024))
hawaii_resized.save("hawaii_1024.png")


# edit the image to include a beach chair w/ umbrella
response = client.images.edit(
  model="dall-e-2",
  image=open("hawaii_1024.png", "rb"),
  mask=open("mask.png", "rb"), # Edit requires a "mask" to specify which portion of the image to regenerate
                               # This mask covers the bottom half of an image
  prompt="A beach chair with an umbrella",
  n=1,
  size="1024x1024"
)
image_url = response.data[0].url

print(image_url)

https://oaidalleapiprodscus.blob.core.windows.net/private/org-RZLvEijW4GW0KmC3rLIAjZlu/user-GjAVqpM2XyDru7SeUyCqCIh7/img-DYYLiuvKTquYBODmOhKqM3CS.png?st=2023-11-12T02%3A43%3A29Z&se=2023-11-12T04%3A43%3A29Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-11T21%3A56%3A38Z&ske=2023-11-12T21%3A56%3A38Z&sks=b&skv=2021-08-06&sig=39o/2NZERRhgwM1j5NXwTMHptYNBAhlfrdPDmAQ/7Ag%3D


In [None]:
from IPython.display import Image

Image(url=image_url) #display edited image
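
The edit call above relies on a `mask.png` that is not created in this notebook. A minimal sketch of how such a mask could be produced with Pillow follows, assuming the mask should be opaque in the top half and fully transparent in the bottom half (the transparent region marks the part to regenerate). The function name and the output file name are hypothetical.

```python
from PIL import Image as PILImage

def make_bottom_half_mask(size=(1024, 1024)):
    """Return an RGBA mask: opaque top half, fully transparent bottom half."""
    width, height = size
    mask = PILImage.new("RGBA", size, (0, 0, 0, 255))          # start fully opaque
    mask.paste((0, 0, 0, 0), (0, height // 2, width, height))  # clear the bottom half
    return mask

mask = make_bottom_half_mask()
# mask.save("mask_example.png")  # hypothetical file name
```

The default size matches the 1024x1024 image used in the edit request, since the mask and the image need the same dimensions.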

**Create Image Variation**

Uploaded image must be a PNG and less than 4 MB.

In [None]:
Image(url="hawaii.png") #original image

In [None]:
response = client.images.create_variation(
  image=open("hawaii.png", "rb"),
  n=1,
  size="1024x1024"
)

image_url = response.data[0].url

print(image_url)

Image(url=image_url) #image variation

https://oaidalleapiprodscus.blob.core.windows.net/private/org-RZLvEijW4GW0KmC3rLIAjZlu/user-GjAVqpM2XyDru7SeUyCqCIh7/img-nCuENY3472TluAhT04OMwNpU.png?st=2023-11-12T02%3A43%3A41Z&se=2023-11-12T04%3A43%3A41Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-12T01%3A26%3A41Z&ske=2023-11-13T01%3A26%3A41Z&sks=b&skv=2021-08-06&sig=g%2BPSVXBWxKCo63DjAubgRBMwxBB7owCvaIJKj/xHjs8%3D


## Fine-tune Model APIs

### Create synthetic data

**Generate synthetic data** for the model to be trained on
* the model gets as input: locations, alien_types, hero_goals
* the model outputs: a movie plot based on the input data

* In the real world, you’ll have your data already. In this example, I'm generating synthetic data to walk through the fine-tuning process

In [None]:
import pandas as pd
import random

# lists to hold the random prompt values
locations = ['the moon', 'a space ship', 'in outer space']
alien_types = ["Grey","Reptilian","Nordic","Shape shifting"]
hero_goals = ["save the Earth", "destroy the alien home planet", "save the human race"]

# prompt template to be completed using values from the lists above
prompt = ''' Imagine the plot for a new science fiction movie. The location is {location}. Humans
               are fighting the {alien_type} aliens. The hero of the movie intends to {hero_goal}.
               Write the movie plot in 50 words or less. '''

sub_prompt = "{location}, {alien_type}, {hero_goal}"

df = pd.DataFrame()

# To fine-tune a model, you are required to provide at least 10 examples.
# You'll see improvements from fine-tuning on 50 to 100 training examples
for i in range(100):

    # retrieve random numbers based on the length of the lists
    location = random.randint(0,len(locations)-1)
    alien_type = random.randint(0,len(alien_types)-1)
    hero_goal = random.randint(0,len(hero_goals)-1)

    # use the prompt template and fill in the values
    model_prompt = prompt.format(location=locations[location], alien_type=alien_types[alien_type],
                           hero_goal=hero_goals[hero_goal])

    # track the values used to fill in the template
    model_sub_prompt = sub_prompt.format(location=locations[location], alien_type=alien_types[alien_type],
                           hero_goal=hero_goals[hero_goal])

    # retrieve a model generated movie plot based on the input prompt
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
           {"role": "system", "content": '''You help write movie scripts.'''},
           {"role": "user", "content": model_prompt}
        ],
        temperature=1,
        max_tokens=500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    # retrieve the finish reason for the model
    finish_reason = response.choices[0].finish_reason

    # retrieve the response
    response_txt = response.choices[0].message.content

    # add response, prompt, etc. to a DataFrame
    new_row = {
        'location':locations[location],
        'alien_type':alien_types[alien_type],
        'hero_goal':hero_goals[hero_goal],
        'prompt':model_prompt,
        'sub_prompt':model_sub_prompt,
        'response_txt':response_txt,
        'finish_reason':finish_reason}

    new_row = pd.DataFrame([new_row])

    df = pd.concat([df, new_row], axis=0, ignore_index=True)

#save DataFrame to a CSV
df.to_csv("science_fiction_plots.csv")

Read CSV into a DataFrame

In [None]:
df = pd.read_csv("science_fiction_plots.csv")

df

Unnamed: 0.1,Unnamed: 0,locationalien_typehero_goalprompt,sub_prompt,response_txt,finish_reason
0,0,Imagine the plot for a new science fiction mo...,"a space ship, Grey, save the Earth",\n\nThe space ship is in danger. Earth is bein...,stop
1,1,Imagine the plot for a new science fiction mo...,"a space ship, Shape shifting, save the Earth",\n\nThe crew of a space ship find themselves i...,stop
2,2,Imagine the plot for a new science fiction mo...,"the moon, Nordic, save the human race",\n\nThe movie follows a brave human astronaut ...,stop
3,3,Imagine the plot for a new science fiction mo...,"the moon, Shape shifting, save the Earth",\n\nThe world is under attack from the shape-s...,stop
4,4,Imagine the plot for a new science fiction mo...,"a space ship, Grey, save the Earth",\n\nThe crew of a space ship is sent on a miss...,stop
...,...,...,...,...,...
95,95,Imagine the plot for a new science fiction mo...,"the moon, Nordic, save the human race","\n\nOn the moon, humanity is fighting a losing...",stop
96,96,Imagine the plot for a new science fiction mo...,"in outer space, Nordic, save the human race",\n\nA team of brave humans led by an unlikely ...,stop
97,97,Imagine the plot for a new science fiction mo...,"the moon, Nordic, destroy the alien home planet",\n\nWhen a Nordic alien invades the moon and c...,stop
98,98,Imagine the plot for a new science fiction mo...,"a space ship, Reptilian, destroy the alien hom...","\n\nAfter generations of fighting, the humans ...",stop


In [None]:
# remove newline characters from the response
df['response_txt'] = df['response_txt'].str.replace('\n', '', regex=True)
df

Unnamed: 0.1,Unnamed: 0,locationalien_typehero_goalprompt,sub_prompt,response_txt,finish_reason
0,0,Imagine the plot for a new science fiction mo...,"a space ship, Grey, save the Earth",The space ship is in danger. Earth is being at...,stop
1,1,Imagine the plot for a new science fiction mo...,"a space ship, Shape shifting, save the Earth",The crew of a space ship find themselves in a ...,stop
2,2,Imagine the plot for a new science fiction mo...,"the moon, Nordic, save the human race",The movie follows a brave human astronaut and ...,stop
3,3,Imagine the plot for a new science fiction mo...,"the moon, Shape shifting, save the Earth",The world is under attack from the shape-shift...,stop
4,4,Imagine the plot for a new science fiction mo...,"a space ship, Grey, save the Earth",The crew of a space ship is sent on a mission ...,stop
...,...,...,...,...,...
95,95,Imagine the plot for a new science fiction mo...,"the moon, Nordic, save the human race","On the moon, humanity is fighting a losing bat...",stop
96,96,Imagine the plot for a new science fiction mo...,"in outer space, Nordic, save the human race",A team of brave humans led by an unlikely hero...,stop
97,97,Imagine the plot for a new science fiction mo...,"the moon, Nordic, destroy the alien home planet",When a Nordic alien invades the moon and captu...,stop
98,98,Imagine the plot for a new science fiction mo...,"a space ship, Reptilian, destroy the alien hom...","After generations of fighting, the humans have...",stop


To perform fine-tuning, you need to provide the model with examples of what a user might type and the corresponding desired response. The **sub_prompt column contains the example inputs** and **response_txt contains the desired responses**.

In [None]:
# retrieve only the sub_prompt and response_txt columns from the DataFrame into a new DataFrame
training_data = df.loc[:,['sub_prompt','response_txt']]

# rename columns sub_prompt->prompt and response_txt->completion
training_data.rename(columns={'sub_prompt':'prompt', 'response_txt':'completion'}, inplace=True)

# convert DataFrame to CSV file
training_data.to_csv('training_data.csv',index=False)

In [None]:
training_data

Unnamed: 0,prompt,completion
0,"a space ship, Grey, save the Earth",The space ship is in danger. Earth is being at...
1,"a space ship, Shape shifting, save the Earth",The crew of a space ship find themselves in a ...
2,"the moon, Nordic, save the human race",The movie follows a brave human astronaut and ...
3,"the moon, Shape shifting, save the Earth",The world is under attack from the shape-shift...
4,"a space ship, Grey, save the Earth",The crew of a space ship is sent on a mission ...
...,...,...
95,"the moon, Nordic, save the human race","On the moon, humanity is fighting a losing bat..."
96,"in outer space, Nordic, save the human race",A team of brave humans led by an unlikely hero...
97,"the moon, Nordic, destroy the alien home planet",When a Nordic alien invades the moon and captu...
98,"a space ship, Reptilian, destroy the alien hom...","After generations of fighting, the humans have..."


Prepare data using OpenAI's CLI data preparation tool

* Execute via a Terminal window, not in the Jupyter notebook, because of the Y/N prompts

* The data must be a **JSONL** document, where each line is a prompt-completion pair corresponding to a training example. OpenAI's CLI data preparation tool can be used to convert the CSV data into the required format.

<b>Steps</b>
<hr>
1. Go to a Terminal window <br>
2. Change to the directory where your Jupyter Notebook files are stored <br>
3. Type the command: <i>openai tools fine_tunes.prepare_data -f training_data.csv</i> <br>
4. Type 'Y' to all prompts <br>
5. Come back to the Jupyter Notebook and execute the code in the next cell. <br>
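
For reference, the JSONL layout described above can also be produced by hand. The sketch below writes one JSON object per line with `prompt` and `completion` keys; it only illustrates the file format and does not reproduce the checks and adjustments the CLI tool applies. The sample rows and the helper name are placeholders.

```python
import json

def to_jsonl(rows):
    """Serialize prompt-completion pairs as JSONL text, one JSON object per line."""
    lines = [json.dumps({"prompt": r["prompt"], "completion": r["completion"]}) for r in rows]
    return "\n".join(lines) + "\n"

rows = [
    {"prompt": "the moon, Nordic, save the human race", "completion": "placeholder plot one"},
    {"prompt": "a space ship, Grey, save the Earth", "completion": "placeholder plot two"},
]
jsonl_text = to_jsonl(rows)
```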

### Create fine-tuned model
The fine-tuning job is run against a base model (`babbage-002` in this example). For a dataset of this size, the fine-tuned model is typically ready to use within a few minutes.

#### Files API
Used to upload data to our OpenAI environment.

In [None]:
# Once you have the data validated, the file needs to be uploaded using the
# Files API in order to be used with a fine-tuning job

client.files.create(
  file=open("training_data_prepared.jsonl", "rb"),
  purpose="fine-tune"
)

FileObject(id='file-AG9BtuDOnsOVyMidfQrPjzBx', bytes=45650, created_at=1699793637, filename='training_data_prepared.jsonl', object='file', purpose='fine-tune', status='uploaded', status_details=None)

#### Fine tuning API
Used to create our fine-tuning job.
Input: the model and the training data (uploaded in the previous step).

In [None]:
# Start the fine-tuning job
# After you've started a fine-tuning job, it may take some time to complete. Your job may be queued
# behind other jobs and training a model can take minutes or hours depending on the
# model and dataset size.

client.fine_tuning.jobs.create(
  training_file="file-AG9BtuDOnsOVyMidfQrPjzBx", #use the returned id from the FileObject to start the job
  model="babbage-002"
)

FineTuningJob(id='ftjob-dAAFjBGhui1TsEINMLCi13XA', created_at=1699794278, error=None, fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='babbage-002', object='fine_tuning.job', organization_id='org-RZLvEijW4GW0KmC3rLIAjZlu', result_files=[], status='validating_files', trained_tokens=None, training_file='file-AG9BtuDOnsOVyMidfQrPjzBx', validation_file=None)

In [None]:
# Retrieve job status

# Retrieve the state of a fine-tune
# Status field can contain: running or succeeded or failed, etc.
client.fine_tuning.jobs.retrieve("ftjob-dAAFjBGhui1TsEINMLCi13XA")

FineTuningJob(id='ftjob-dAAFjBGhui1TsEINMLCi13XA', created_at=1699794278, error=None, fine_tuned_model='ft:babbage-002:keysoft::8K4TIbHI', finished_at=1699794439, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=2), model='babbage-002', object='fine_tuning.job', organization_id='org-RZLvEijW4GW0KmC3rLIAjZlu', result_files=['file-9JtVIisnltdBNOKyBuKYgCSh'], status='succeeded', trained_tokens=26517, training_file='file-AG9BtuDOnsOVyMidfQrPjzBx', validation_file=None)
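
Because a job can sit in the queue for a while, the status is often checked in a loop. The sketch below polls until a terminal status is reached; `get_status` is a caller-supplied function (an assumption for illustration) so the loop can be shown without a live API call, and `wait_for_job` is a hypothetical helper name.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or max_polls is reached."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still {status!r} after {max_polls} polls")

# With the client from this notebook, get_status could wrap the retrieve call, e.g.
# lambda: client.fine_tuning.jobs.retrieve(job_id).status  (job_id is your own job's id)
```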

#### Use a fine-tuned model

Retrieve the **name of the fine-tuned model** from above and use the model to generate a movie plot. When a job has succeeded, you will see the fine_tuned_model field populated with the name of the model when you retrieve the job details.

In [None]:
response = client.completions.create(
    model="ft:babbage-002:keysoft::8K4TIbHI", #name of the fine tuned model from FineTuningJob
    prompt="the moon, Nordic, destroy the alien home planet", #prompt to generate a movie plot
    max_tokens=200,
    temperature=1
)

print(response.model_dump_json(indent=2))

{
  "id": "cmpl-8K4XnKjli0igv9EC6m0VJ51jg177w",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": ", the Mall of Miracles, and the hero's loyal sidekick. Together they fight the forces of evil to save humanity from extinction. The final climactic battle is resolved, and the hero triumphs, but at a great personal cost.\n\nImaginary worlds, imaginary people. Humans inhabit the distant reaches of space - on the moon, under the icy reaches of the Arctic, in the jungles of the Amazon or deep within the depths of the ocean. In a never-ending battle to save humanity, a bold hero embarks on a mission to reach the epicenter of the Nordic aliens and bring them to justice. But with great power comes great responsibility, and this brave man risks everything for that victory.\nAt the centre of the action is the hero's undefined rival, a selfless adversary determined to undermine the hero's honour and ultimately bring him to his knees. As t

# ChatGPT-based project

### Chatbot - webpage summary

Building a multi-turn conversation with an AI assistant powered by OpenAI's Chat Completions API.

Building a chatbot to summarize a website:
1. Define the model
2. Context object - keeps our prompt history (append each Q&A to the history)
3. Function to call the chatbot - calls the Chat Completions API
4. Function to read user input and get the output
  * reads the user's question from the input prompt
  * appends it to the history
  * passes the history to the model by calling the Chat Completions API
  * prints the output


In [None]:
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY']
)
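
`os.environ['OPENAI_API_KEY']` raises a bare `KeyError` when the variable is missing. As a small sketch, a helper such as the hypothetical `require_env` below can turn that into a more readable error; it is an illustration, not part of the OpenAI library.

```python
import os

def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise a readable error if it is unset or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value

# Usage in this notebook:
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```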

In [None]:
MODEL = "gpt-3.5-turbo"

#sets the persona for the AI assistant using a system message
context = [{'role':'system', 'content':"""You are a friendly AI assistant that
                                              helps compose professional-sounding tweets
                                              for Twitter that often go viral based on a
                                              website I provide. You will provide a summary
                                              of the website in 30 words or less."""
            }]

In [None]:
# Each interaction with the AI assistant is a new session, so the entire chat/message history,
# including user prompts and assistant responses, must be included in each exchange with the
# model/assistant so that it "remembers"

#This is called Prompt Chaining

def collect_messages(role, message): #keeps track of the message exchange between user and assistant
    context.append({'role': role, 'content':f"{message}"})
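
Since the whole history is resent on every call, it grows with each turn and will eventually exceed the model's context window. One simple way to bound it, sketched here under the assumption that the system message comes first (as in `context` above), is to keep the system message plus the most recent turns. `trim_context` and `max_messages` are hypothetical names.

```python
def trim_context(messages, max_messages=10):
    """Keep all system messages plus the most recent max_messages non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + others[-max_messages:]

# Small demonstration with a made-up history
history = [{"role": "system", "content": "You are a friendly AI assistant."}]
for i in range(3):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_context(history, max_messages=4)
print([m["content"] for m in trimmed])
```

In the notebook, the trimmed list could be passed as `messages=trim_context(context)` instead of `messages=context`. This counts messages rather than tokens, so it is only a rough limit.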

In [None]:
#Sends the prompts to the model for a completion/response

def get_completion(temperature=0):
    try:
        response = client.chat.completions.create(
            model=MODEL,
            messages=context,
            temperature=temperature,
        )

        print("\n Assistant: ", response.choices[0].message.content, "\n")

        return response.choices[0].message.content
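    except openai.APIError as e:
        # openai>=1.0: APIError exposes .message; HTTP errors (APIStatusError) also carry .status_code
        print(getattr(e, "status_code", None))
        print(e.message)
        return e.message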
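
Transient failures such as rate limits are often worth retrying rather than reporting straight away. The helper below is a generic exponential-backoff sketch; `call_with_retries` and its parameters are hypothetical, and it deliberately does not import `openai` so that it runs on its own.

```python
import time

def call_with_retries(fn, retry_on=(Exception,), max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a listed exception wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))

# Usage in this notebook (wrap the raw API call, not get_completion, which already catches errors):
# response = call_with_retries(
#     lambda: client.chat.completions.create(model=MODEL, messages=context, temperature=0),
#     retry_on=(openai.RateLimitError, openai.APIConnectionError),
# )
```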

In [None]:
#Start the conversation between the user and the AI assistant/chatbot

while True:
    collect_messages('assistant', get_completion()) #stores the response from the AI assistant

    user_prompt = input('User: ') #input box for entering prompt

    print("\n User: ", user_prompt, "\n")

    if user_prompt == 'exit': #end the conversation with the AI assistant
        print("\n Goodbye")
        break

    collect_messages('user', user_prompt) #stores the user prompt



 Assistant:  "Scikit-learn: popular machine learning library for Python. Offers tools for data analysis, modeling, and predictive analytics. #MachineLearning #Python" 


 User:  summarize https://scikit-learn.org/stable/ in 15 words 


 Assistant:  "Scikit-learn: Python ML library for data analysis, modeling, and predictive analytics. #MachineLearning #Python" 


 User:  summerize it for 10 years old kid 30 words 


 Assistant:  "Scikit-learn is like a magic tool that helps computers learn and make smart decisions. It's like a brain booster for machines to understand things better and help people. Cool, right?" 


 User:  exit 


 Goodbye
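
The Chat Completions call sends only the text held in `context`; it does not download the page behind a URL, so the summaries above rely on what the model already knows about the site. If the summary should reflect the live page, the page text has to be placed in the prompt. The sketch below uses only the standard library to pull visible text out of HTML; `html_to_text` and `max_chars` are hypothetical names, and the sample page is made up.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html, max_chars=4000):
    """Return the visible text of an HTML document, truncated to max_chars characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]

page = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><p>World of <b>ML</b></p><script>var x=1;</script></body></html>"
)
print(html_to_text(page))

# Usage in this notebook (requires the requests package and network access):
# html = requests.get("https://scikit-learn.org/stable/", timeout=30).text
# collect_messages('user', f"Summarize this page in 15 words:\n{html_to_text(html)}")
```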


## Summary to Image
1. Modify the function that reads user input and prints the output so that it stores the website summary in a variable
2. Define a function that calls the Image API with the summary provided
3. Call the function to generate an image from the summary generated in step 1

Creating a function to build an image based on a website summary provided

In [None]:
# Create images from scratch based on the website summary

def generate_image(summary):
    print(summary)

    try:
        response = client.images.generate(
          model="dall-e-3",
          prompt=summary,
          size="1024x1024",
          quality="standard",
          n=1, #select the number of images you want generated
        )

        image_url = response.data[0].url #URLs will expire after an hour

        return image_url
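    except openai.APIError as e:
        # openai>=1.0: APIError exposes .message; HTTP errors (APIStatusError) also carry .status_code
        print(getattr(e, "status_code", None))
        print(e.message)
        return e.message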
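
Because the returned URL expires, an alternative is to ask the Images API for the image bytes directly with `response_format="b64_json"` and save them to disk. The decoding step can be sketched as follows; `save_b64_image` is a hypothetical helper name, and the output file name is an assumption.

```python
import base64

def save_b64_image(b64_data, path):
    """Decode a base64 string and write the resulting bytes to path; returns the path."""
    with open(path, "wb") as handle:
        handle.write(base64.b64decode(b64_data))
    return path

# Usage in this notebook (requires the client created earlier):
# response = client.images.generate(
#     model="dall-e-3",
#     prompt=summary,
#     size="1024x1024",
#     response_format="b64_json",
#     n=1,
# )
# save_b64_image(response.data[0].b64_json, "dalle_image.png")
```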

Store the website summary (the Chat Completions response) in an image_summary variable

In [None]:
#Start the conversation between the user and the AI assistant/chatbot

while True:
    image_summary = get_completion() #stores the response from the AI assistant

    user_prompt = input('User: ') #input box for entering prompt

    print("\n User: ", user_prompt, "\n")

    if user_prompt == 'exit': #end the conversation with the AI assistant
        print("\n Goodbye")
        break

    collect_messages('user', user_prompt) #stores the user prompt


 Assistant:  Sure, I'd be happy to help! Please provide me with the website you'd like me to summarize for a tweet. 


 User:  summarize https://www.nationalgeographic.com/ in 15 words 


 Assistant:  Exploring the wonders of our world through stunning photography, in-depth articles, and captivating storytelling. 


 User:  exit 


 Goodbye


Generate and display an image based on the summary generated earlier

In [None]:
imageURL = generate_image(image_summary)

Exploring the wonders of our world through stunning photography, in-depth articles, and captivating storytelling.


In [None]:
from IPython.display import Image
Image(url=imageURL)

## X APIs - tweet image + summary

In [None]:
pip install tweepy

Collecting tweepy
  Downloading tweepy-4.14.0-py3-none-any.whl.metadata (3.8 kB)
Collecting oauthlib<4,>=3.2.0 (from tweepy)
  Downloading oauthlib-3.2.2-py3-none-any.whl.metadata (7.5 kB)
Collecting requests-oauthlib<2,>=1.2.0 (from tweepy)
  Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl.metadata (10 kB)
Downloading tweepy-4.14.0-py3-none-any.whl (98 kB)
Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-3.2.2 requests-oauthlib-1.3.1 tweepy-4.14.0
Note: you may need to restart the kernel to use updated packages.


Tweepy - to call the X APIs
Requests - to make HTTP requests from Python

In [None]:
import os
import tweepy
import requests

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# Authenticate to Twitter API
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_SECRET_KEY")
access_token = os.getenv("ACCESS_TOKEN")
access_token_secret = os.getenv("ACCESS_TOKEN_SECRET_KEY")

#download image to notebook
def download_image(imageURL):
    print("downloading - ", imageURL)

    response = requests.get(imageURL, timeout=60)
    response.raise_for_status() #fail early if the URL has expired or the download failed
    with open('dalle_image.jpg', 'wb') as handler:
        handler.write(response.content)

    return "dalle_image.jpg"

#upload image media using V1 of Twitter API
def upload_image(image):
    auth = tweepy.OAuth1UserHandler(
       consumer_key, # authenticating
       consumer_secret,
       access_token,
       access_token_secret
    )

    api = tweepy.API(auth)
    media = api.media_upload(filename=image)

    return media

#send the tweet using V2 of the Twitter API
def send_tweet(summary, image):
    client = tweepy.Client(
        consumer_key=consumer_key, # authenticating
        consumer_secret=consumer_secret,
        access_token=access_token,
        access_token_secret=access_token_secret
    )

    #upload image to Twitter servers and get the media metadata
    media = upload_image(image)
    media_ids = [media.media_id]

    #send the tweet
    response = client.create_tweet(text=summary, media_ids=media_ids)

    print(f"https://twitter.com/user/status/{response.data['id']}")
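
Standard posts on X are limited to 280 characters, so a summary longer than that would be rejected. As a rough sketch, the text can be shortened before calling `send_tweet`; `fit_tweet` is a hypothetical helper, and plain `len()` only approximates X's own counting, which weights URLs and some characters differently.

```python
TWEET_LIMIT = 280  # character limit for standard posts

def fit_tweet(text, limit=TWEET_LIMIT, ellipsis="..."):
    """Collapse whitespace and shorten text to at most limit characters, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    if len(text) <= limit:
        return text
    cut = text[: limit - len(ellipsis)]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop the last, possibly partial, word
    return cut.rstrip() + ellipsis

print(fit_tweet("Exploring the wonders of our world through stunning photography."))

# Usage in this notebook:
# send_tweet(fit_tweet(image_summary), image_name)
```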

In [None]:
#download image to the notebook
image_name = download_image(imageURL)

downloading -  https://oaidalleapiprodscus.blob.core.windows.net/private/org-y85pwkvP2H3Spp6kEp0E4ZVf/user-KNiIVmgwvzvTVYKME2XQpsnV/img-kzpcilWHAGIHD5nlbosVw81d.png?st=2024-07-18T18%3A50%3A20Z&se=2024-07-18T20%3A50%3A20Z&sp=r&sv=2023-11-03&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2024-07-18T13%3A15%3A30Z&ske=2024-07-19T13%3A15%3A30Z&sks=b&skv=2023-11-03&sig=uRGNt5lXNfaIdKTDJ54o5cZoeYj2l86URtnTvUuH9ic%3D


In [None]:
#send the tweet, using the summary stored in image_summary
send_tweet(image_summary, image_name)

https://twitter.com/user/status/1723725825856479568
