
<div align="left">
    <h1>SISTEMAS COMPUTACIONAIS AVANÇADOS (SISTCA) ADVANCED COMPUTING SYSTEMS</h1>
    <h2>OpenAI</h2>
    <h4>Degree in Telecommunications and Informatics Engineering </h4>
    <h5>
    <img width="500" src="imgs/ISEP_logo.png"><br>
    2023 / 24</h5>
</div>

<table align="left">
    <thead>
        <tr>
            <th>Version</th>
            <th>Date</th>
            <th>Authors</th>
            <th>Update information</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>V1.0</td>
            <td>April 2024</td>
            <td style="text-align: left;">
                Patrícia Sousa (1210713), Carlos Alves (1211604), José Leal (1211066), Tiago Ribeiro (1210924)
                <br><br>
                <strong>Supervisor:</strong> Paula Viana (pmv)
            </td>
            <td>Original version</td>
        </tr>
    </tbody>
</table>




# 1. Introduction

## 1.1 Context

In this script we will present an API developed by OpenAI, one of the leading organisations in artificial intelligence (AI) research and development. Launched in 2020, the OpenAI API was a significant milestone in access to advanced AI technology. It offers an accessible, simplified interface for developers to integrate AI capabilities into a variety of applications, from text analysis to content generation.

The OpenAI API is based on recent advances in machine learning, in particular, deep neural
network architectures, such as GPT (Generative Pre-trained Transformer) language models. These
models have revolutionised the way machines understand and generate natural language, enabling
tasks such as automatic translation, text summarisation, question answering, and content generation.
GPT is trained with a vast amount of textual data from the internet, allowing it to capture the
nuances and complexities of human language. Through the pre-training process, the model learns
to represent knowledge in a general way, without the need for specific training for a particular task.

The OpenAI API leverages these pre-trained language models, providing a simple and effective interface for developers to take advantage of their capabilities. This means users can integrate advanced AI capabilities into their applications without needing to understand complex implementation details or how the models are trained.

Additionally, it is designed with a focus on security and ethics, including measures to prevent
malicious or harmful use of AI technology. This reflects OpenAI’s commitment to promoting
the responsible development of artificial intelligence and ensuring that its benefits are widely
accessible and used for the good of society.

## 1.2 Motivation

One of the main motivations for choosing this topic is the opportunity to present, in a structured way, a technology (and a platform) that is accessible to all students and that they can apply in the future, since it is increasingly used in everyday life. Whether in the SISTCA curricular unit or in other contexts, the OpenAI API offers a versatile and robust environment to explore the potential of artificial intelligence.

In choosing this topic, we recognised the need to become familiar not only with the fundamental concepts of artificial intelligence, but also with the practical tools for implementing it.

This required not only mastering theoretical principles, but also exploring accessible programming languages and platforms such as the OpenAI API.

## 1.3 Objectives

The main objective of this script is to explore the various features offered by the OpenAI API
and learn how to integrate them, promoting a comprehensive and practical understanding of the
platform’s capabilities, providing users with a solid foundation to explore and use the following
available tools:

- **Chat Completions**: To understand the generic implementation of a chat-based GPT assistant.
- **Assistants API + Tools**: Create a program that provides users with media suggestions such as books, films, or TV shows.
- **Embeddings**: Learn how to represent words, sentences, paragraphs, or entire documents in a continuous vector space.
- **Vision**: Use Vision functionality to extract information from an image.
- **Image Generation/DALL-E**: Learn how to generate AI-created images.
- **TTS (Text to Speech)**: Turn strings into synthesised speech using AI.
- **Whisper**: Transcribe audio and translate it into English.
- **Moderation**: Explore OpenAI's Moderation functionality to detect inappropriate content.


In addition, two practical exercises will be developed to apply the acquired knowledge and a
challenge will be proposed to test the reader’s understanding and ability to use the OpenAI API.

## 1.4 Document Structure

This tutorial is organised into six chapters: Introduction, Theoretical, Setup/Installation, Tutorial/Functionality, Exercises and a Challenge.
In the Introduction, some concepts about the OpenAI API are presented that will be explored in more depth throughout the script.

The theoretical part introduces the state of the art and details the API's features.
The third chapter focuses on Setup/Installation, explaining how to create an OpenAI account and set up a development environment.

In the Tutorial/Functionality part, a detailed tutorial of the available features is provided.
After these topics, two exercises are developed and their solutions presented, followed by a final challenge to test the knowledge acquired by the reader.


# 2 Theoretical (scientific/technological background)

Before diving into specific AI models, it's important to have a general understanding of how these models work.

Most AI solutions for natural language processing are based on Large Language Models (LLMs): algorithms trained on vast amounts of data with the purpose of understanding and generating human-like text. LLMs use the transformer architecture, a deep learning technique that, in short, splits text into units called tokens, represents each token numerically and weighs how strongly each token relates to the others, which lets the model contextualise words and find similarities [1].

One example of a transformer-based model is the Generative Pre-trained Transformer (GPT), an AI model that has been pre-trained on large datasets using the transformer architecture for general-purpose tasks. Furthermore, these models can be fine-tuned to achieve greater performance in more specific tasks, as is the case with image generation models [2].

Lastly, to make use of these models it's essential to understand the concept of prompts: the textual inputs given to the model. These inputs are broken down into the aforementioned tokens through a process called tokenization, which lets the model work with the structure of the language.

## 2.1 State-of-the-art

AI (Artificial Intelligence) refers to the simulation of human intelligence in machines programmed to mimic human cognitive processes and actions. The concept is not new; it has been around since the mid-20th century, but the last few years have brought significant advances. Because of its potential, AI has become increasingly important across industries such as healthcare, finance, manufacturing and education. Nowadays, one of the best-known companies developing AI products is OpenAI, on which we base this work. However, like any other business, OpenAI has competitors. Some of them currently offer only a chatbot, and at the time of writing even that is not available in Portugal. Some companies in the AI business are listed below.

| AI                                      | **OpenAI**                          | **X**     | **Anthropic**                 | **DeepMind**         | **Cohere**       |
|----------------------------------------|-------------------------------------|-----------|-------------------------------|----------------------|------------------|
| **ChatBot**                            | ChatGPT                             | Grok      | Claude                        | Gemini               | Coral            |
| **API**                                | Yes                                 | No        | Yes                           | Yes                  | Yes              |
| **Inputs**                             | Text, Audio, Image, Video          |           | Text                          | Text, Image          | Text             |
| **Outputs**                            | Text, Audio, Image, Video          |           | Text                          | Text                 | Text             |
| **ChatBot and API Availability in Portugal** | Both                                | None      | API Only                      | ChatBot Only         | Both             |


## 2.2 OpenAI Features

OpenAI's API offers a wide array of AI models based on deep learning and natural language processing techniques. These models have been trained on vast datasets and fine-tuned to fulfil a variety of tasks such as text and image generation and conversion between audio and text, amongst other things.

### 2.2.1 GPT

The GPT (Generative Pre-trained Transformer) series is OpenAI's main set of large language models. These models are trained to understand and generate natural language text based on contextual inputs so as to better communicate with humans. The most widespread version, GPT-3.5, currently powers the free version of ChatGPT. Its understanding of human language allows for coherent conversations, which makes it a suitable chatbot. GPT-4 improves upon its predecessor with a smarter and more knowledgeable model that provides greater accuracy across various tasks. In particular, GPT-4 introduces Vision, a feature that enables it to process image inputs, making it useful in a wider range of applications.


#### Function calling

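GPT-based models can be given descriptions of functions defined by the developer. In response to a user prompt, the model can decide that one of these functions should be called and return the function name and its arguments as JSON; the application then runs the function, for example to call an external API to retrieve data or to automate procedures such as sending an email or extracting and sorting data from a document.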
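
Because the model only *requests* a call, the application does the actual work. The sketch below shows both halves under stated assumptions: `get_match_result` is a hypothetical function with hard-coded data, `TOOLS` is the JSON Schema description that would be passed through the `tools` parameter of `client.chat.completions.create`, and `run_tool_call` is a hypothetical dispatcher that executes the requested function.

```python
import json

# Hypothetical local function the model may ask us to run (hard-coded stand-in data)
def get_match_result(team: str, year: int) -> dict:
    results = {("Portugal", 2016): "Portugal 1-0 France (Euro 2016 final)"}
    return {"team": team, "year": year, "result": results.get((team, year), "unknown")}

# Function description sent to the model in the `tools` parameter
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_match_result",
        "description": "Look up the result of a team's final in a given year",
        "parameters": {
            "type": "object",
            "properties": {
                "team": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["team", "year"],
        },
    },
}]

# Map the names the model may return to the Python functions that implement them
AVAILABLE_FUNCTIONS = {"get_match_result": get_match_result}

def run_tool_call(name: str, arguments: str) -> str:
    """Execute the function the model asked for and return its result as JSON text."""
    func = AVAILABLE_FUNCTIONS[name]
    kwargs = json.loads(arguments)  # the model sends its arguments as a JSON string
    return json.dumps(func(**kwargs))
```

With the real API, when `response.choices[0].message.tool_calls` is not empty, each tool call exposes `function.name` and `function.arguments`; passing them to `run_tool_call` and sending the result back in a message with role `"tool"` and the matching `tool_call_id` lets the model write its final answer.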

#### Assistants

The AI Assistants functionality leverages the use of the GPT models alongside function calling
and other tools like file retrieval and code interpreter to allow users to create custom assistants
that fulfil more specific tasks based on the provided instructions.

### 2.2.2  Other models

#### DALL-E

The DALL-E model is capable of generating images from natural-language text descriptions, as
well as modifying existing ones by feeding the model instructions from a text prompt.

#### TTS

The Text-To-Speech model converts text into natural-sounding speech. Note that its voices are optimised for English; other languages generally follow Whisper's language support, but results may vary.

#### Whisper

Whisper does the opposite of the TTS model: it takes an audio input and transcribes it into text. Whisper understands multiple languages, so it can be used to identify the input language and to translate the contents of the speech into English.

#### Embeddings

Text embeddings are vector representations of strings of text, such as words or phrases. By comparing two or more vectors we can infer how similar the texts are. This mechanism is highly useful in applications such as search engines or product recommendations, because it lets us evaluate the similarity between text strings.

#### Moderation

OpenAI’s Moderation model is designed to verify if a certain piece of text includes any content
that could be classified as hateful, violent, sexual, harmful or otherwise inappropriate. Whilst
OpenAI’s own use of the model aims to ensure that content complies with their usage policies [3],
this model is suitable for any application that aims to ensure a safe digital environment.
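
As a minimal sketch of how an application might use this, the helpers below call `client.moderations.create` and read the `flagged` field of the first result. `is_flagged` and `filter_messages` are hypothetical names; the client is passed in as a parameter, so the functions work with a real `OpenAI()` client or with a stand-in object for testing.

```python
def is_flagged(client, text: str) -> bool:
    """Return True if the Moderation endpoint flags the text as inappropriate."""
    response = client.moderations.create(input=text)
    # One result is returned for each input string
    return response.results[0].flagged

def filter_messages(client, messages: list) -> list:
    """Keep only the messages that the Moderation endpoint does not flag."""
    return [message for message in messages if not is_flagged(client, message)]

# Usage with a real client (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# safe = filter_messages(client, ["Hello there!", "Some user comment"])
```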

# 3 Setup/Installation

## 3.1 Creating an OpenAI Account

In this section, we will guide you through the process of setting up an OpenAI account. Whether
you’re a developer, researcher, or simply curious about AI, having your own account opens the
door to the vast possibilities of artificial intelligence.  

Firstly, navigate to the OpenAI website [3] to create or log into your account.

![OpenAI Website Access](imgs/setup_imgs/passo1.png)

Upon logging in with your email, you will be presented with two options: ChatGPT and API. Select the API option to access the documentation.

![Select the API Option](imgs/setup_imgs/passo7.png)

Congratulations! You have now successfully created an operational OpenAI account.

![OpenAI Account Creation Confirmation](imgs/setup_imgs/passo_8.png)

## 3.2 Setting up Your Development Environment

Setting up a proper development environment is crucial for working efficiently with AI applications. Ensure that you have the necessary tools and libraries installed on your system. For most AI development tasks, Python is the recommended programming language due to its extensive ecosystem of AI and machine learning libraries, so this sub-section shows how to set it up.

### 3.2.1 Windows

The first step is to access Python's official website and download it. If you are not sure whether Python is already installed, type "cmd" in your search bar to open the command prompt and then run `python --version`. If you have trouble installing it, check Python's beginner's guide [8].

![Python installation](imgs/setup_imgs/step1.png)

Once Python is installed, create a virtual environment; it is good practice, as it avoids conflicts with other installed libraries.

Run one of the following commands in your command prompt, depending on how Python was installed:

```
python -m venv openai-env

python3 -m venv openai-env
```

Now, after creating the virtual environment, you need to activate it:  

`openai-env\Scripts\activate`


After this step you should see "(openai-env)" at the start of your command prompt line.

Once you have Python installed and (optionally) set up a virtual environment, the OpenAI
Python library can be installed. From the command prompt, run:  

`pip install --upgrade openai`

Once this completes, running `pip list` will show the Python libraries installed in your current environment, which should confirm that the OpenAI Python library was successfully installed.

Now we are going to set up your API key. If you don't have an API key yet, go to the OpenAI website and create one in the "API keys" section.

Open your command prompt and run the following command:

`setx OPENAI_API_KEY "your-api-key-here"`

`setx` stores the variable permanently for your user account, but it only takes effect in command prompts opened afterwards, so close and reopen the command prompt before using it. Alternatively, you can set it manually: search for "Environment Variables" in the Windows search bar, click on "New" and then set

**"OPENAI_API_KEY"**

as the variable name and your API key as the value.

### 3.2.2 Linux

Firstly, open your terminal and run the following command to install Python.
In case you are working on Debian or Ubuntu:

`sudo apt install python3 python3-dev`

In case you are working on Red Hat, CentOS, or Fedora:

`sudo dnf install python3 python3-devel`

If you have trouble installing it, check Python's beginner's guide [8].

Once Python is installed, create a virtual environment; it is good practice, as it avoids conflicts with other installed libraries.

Run one of the following commands in your terminal (on many distributions only `python3` is available):

```
python -m venv openai-env

python3 -m venv openai-env
```

If neither of the commands above works because of the error "The virtual environment was not created successfully because ensurepip is not available", install the venv package and then run one of the commands above again (replace `3.10` with your Python version, or install the generic `python3-venv` package on Debian/Ubuntu):

`sudo apt install python3.10-venv`

Now, after creating the virtual environment, you need to activate it:

`source openai-env/bin/activate`

After this step you should see "(openai-env)" at the start of your terminal prompt.

Once you have Python installed and (optionally) set up a virtual environment, the OpenAI
Python library can be installed. From the terminal, run:  

`pip install --upgrade openai`

Once this completes, running `pip list` will show the Python libraries installed in your current environment, which should confirm that the OpenAI Python library was successfully installed.

Now we are going to set up your API key. If you don't have an API key yet, you'll need to create one:

Go to the OpenAI website and open the "API keys" section, where you can retrieve your API key or create one if you do not have one yet.

Then, open your terminal and type the following command:  

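`export OPENAI_API_KEY='your-api-key-here'`

This only lasts for the current terminal session. To make it permanent, add the same line to your shell startup file (for example `~/.bashrc`) using an editor such as nano, save with Ctrl+O, and run `source ~/.bashrc`.

To check that it was set correctly, type

`echo $OPENAI_API_KEY`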

![Checking the OPENAI_API_KEY variable](imgs/setup_imgs/step2.png)


### 3.2.3 MacOS

First, install Homebrew (Brew) if it is not already installed:

`/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"`

Now that you have Homebrew, open your terminal and run the following command to install Python:

`brew install python`

If you have trouble installing it, check Python's beginner's guide [8].

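Once Python is installed, create a virtual environment; it is good practice, as it avoids conflicts with other installed libraries.

First, install the `virtualenv` package from your terminal:

`pip install virtualenv`

Then create the virtual environment:

`virtualenv openai-env`

If `pip` is not found, try `pip3`; if it refuses with an "externally-managed-environment" error, you can instead create the environment with Python's built-in module: `python3 -m venv openai-env`.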

Now, after creating the virtual environment, you need to activate it:

`source openai-env/bin/activate`

After this step you should see "(openai-env)" at the start of your terminal prompt.

Once you have Python installed and (optionally) set up a virtual environment, the OpenAI Python library can be installed. From the terminal, run:

`pip install --upgrade openai`

    
Once this completes, running `pip list` will show the Python libraries installed in your current environment, which should confirm that the OpenAI Python library was successfully installed.

Now we are going to set up your API key. If you don't have an API key yet, you'll need to create one:

Go to the OpenAI website and open the "API keys" section, where you can retrieve your API key or create one if you do not have one yet.

Then, open your terminal and type the following command:

`export OPENAI_API_KEY='your-api-key-here'`
        
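This only lasts for the current terminal session. To make it permanent, add the same line to your shell startup file (recent macOS versions use zsh by default, so `~/.zshrc`) using an editor such as nano, save with Ctrl+O, and run `source ~/.zshrc`.

To check that it was set correctly, type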

`echo $OPENAI_API_KEY`
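
The same check can be done from Python on any of the three systems, which is also what the OpenAI library will see. This is a minimal sketch; `api_key_status` is a hypothetical helper that only prints the first characters of the key so the full key never ends up on screen.

```python
import os

def api_key_status(env=os.environ) -> str:
    """Report whether OPENAI_API_KEY is visible to this Python process."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        return "OPENAI_API_KEY is not set in this environment"
    # Show only a short prefix so the secret is never printed in full
    return f"OPENAI_API_KEY is set (starts with {key[:3]}...)"

print(api_key_status())
```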


# 4 Tutorial/Functionality


For all tutorials you will need to import the OpenAI Python library.

```python
from openai import OpenAI
```

To use this library you need to instantiate a client, which requires an API key. If no key is passed to the constructor, the client defaults to the value of the `OPENAI_API_KEY` environment variable.

```python
from openai import OpenAI

# API key is retrieved from the OPENAI_API_KEY environment variable
client = OpenAI()

# API key is passed explicitly when creating the client (placeholder value)
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")
```

As good practice, we advise storing private information such as your key in a `.env` file and loading it with the `python-dotenv` package (`pip install python-dotenv`).

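```python
import os
from dotenv import load_dotenv
from openai import OpenAI

# Read the variables defined in .env into the process environment
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=api_key)
```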

## 4.1 Chat Completion

Chat Completion, probably one of the simplest of OpenAI's capabilities, takes a list of messages as input and returns the model's reply.

**1 - First import and create an OpenAI client**


```python
from openai import OpenAI
client = OpenAI()
```

**2 - Build the request, passing the GPT model and a few messages that give our chatbot some context.**

```python
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a helpful football assistant."},
    {"role": "user", "content": "Who won the Euro back in 2016?"},
    {"role": "assistant", "content": "Portugal won Euro 2016."},
    {"role": "user", "content": "Where was it played, what was the score of the final and who scored in that game?"}
  ]
)

# The model's reply is in the first choice
print(response.choices[0].message.content)
```
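
The API does not remember earlier requests: the only context the model sees is the `messages` list, which is why the example includes a previous assistant turn. A minimal sketch of a multi-turn chat therefore keeps the history in the application and resends it each time. `ask` is a hypothetical helper; the client is passed in so it can be a real `OpenAI()` client.

```python
def ask(client, history: list, question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send the whole conversation plus a new question, then store the reply."""
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(model=model, messages=history)
    answer = response.choices[0].message.content
    # Keep the reply so the next question is answered with the full context
    history.append({"role": "assistant", "content": answer})
    return answer

# Usage (requires an API key):
# history = [{"role": "system", "content": "You are a helpful football assistant."}]
# print(ask(client, history, "Who won the Euro back in 2016?"))
# print(ask(client, history, "Who scored in the final?"))
```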



## 4.2 Assistants

## 4.3 Embeddings

OpenAI's text embeddings transform strings into vectors of numbers, which makes it possible to measure how related two text strings are. Embeddings are commonly used for:

* Search (where results are ranked by relevance to a query string)
* Clustering (where text strings are grouped by similarity)
* Recommendations (where items with related text strings are recommended)
* Anomaly detection (where outliers with little relatedness are identified)
* Diversity measurement (where similarity distributions are analyzed)
* Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
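
Relatedness is usually measured with cosine similarity, which the steps below compute with scikit-learn. For two embedding vectors $\mathbf{a}$ and $\mathbf{b}$ with $n$ components, the first line gives the definition; the second shows why distance and similarity tell the same story when both vectors have unit length (OpenAI's documentation states that its embeddings are normalised to length 1):

```math
\begin{aligned}
\cos(\mathbf{a}, \mathbf{b}) &= \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}} \\
\lVert \mathbf{a} - \mathbf{b} \rVert^2 &= \lVert \mathbf{a} \rVert^2 + \lVert \mathbf{b} \rVert^2 - 2\, \mathbf{a} \cdot \mathbf{b} = 2 - 2\cos(\mathbf{a}, \mathbf{b}) \quad \text{if } \lVert \mathbf{a} \rVert = \lVert \mathbf{b} \rVert = 1
\end{aligned}
```

A small distance therefore corresponds to a cosine similarity close to 1.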


As mentioned before, embeddings have many uses, but this tutorial only focuses on how to make the request and on a simple method to compare two strings.

**Models**

At the time of writing there are three models available for embeddings. Those with "-3" in their name are third-generation models.

* text-embedding-3-small	
* text-embedding-3-large
* text-embedding-ada-002

**1 - Import the necessary libraries and start the client**

```python
from openai import OpenAI
client = OpenAI()
```

**2 - Retrieve the embeddings**

*model* - the embedding model used to compute the vector  
*input* - the string you want to retrieve the embedding for

```python
response1 = client.embeddings.create(
    input="We are testing to see if this string has any similarities to another one.",
    model="text-embedding-3-small"
)

# The embedding is a list of floats
embeddings1 = response1.data[0].embedding

print(f"Embeddings Example - {embeddings1[0]}, {embeddings1[1]}")
```

### Compare two strings using embeddings and cosine similarity

**1 - Import numpy and cosine_similarity (from scikit-learn)**

Both packages can be installed with `pip install numpy scikit-learn`.


```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
```

**2 - Retrieve the embeddings for the second string**

```python
response2 = client.embeddings.create(
    input="We're experimenting to determine if this string bears resemblance to another.",
    model="text-embedding-3-small"
)

embeddings2 = response2.data[0].embedding
```

**3 - Convert the embeddings to numpy arrays**

```python
# scikit-learn expects 2D arrays of shape (n_samples, n_features)
embeddings1 = np.array(embeddings1).reshape(1, -1)
embeddings2 = np.array(embeddings2).reshape(1, -1)
```

**4 - Calculate the similarity**

The closer the score is to 1, the more similar the two strings are.

```python
similarity_score = cosine_similarity(embeddings1, embeddings2)

# cosine_similarity returns a 1x1 matrix; round its single value to two decimals
final_score = round(float(similarity_score[0][0]), 2)

print(f"Similarity: {final_score}")
```
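
scikit-learn is not strictly required. The sketch below computes the same cosine similarity with NumPy alone and uses it for the search use case listed earlier: ranking documents by relevance to a query. `cosine_sim` and `rank_by_similarity` are hypothetical helper names (chosen so they do not shadow scikit-learn's `cosine_similarity`), and the 2D vectors in the test are toy values; real vectors would come from `response.data[i].embedding`.

```python
import numpy as np

def cosine_sim(a, b) -> float:
    """Cosine similarity between two 1D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_by_similarity(query_vec, docs: dict) -> list:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, cosine_sim(query_vec, vec)) for name, vec in docs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Since `client.embeddings.create` also accepts a list of strings as `input` (returning one item in `response.data` per string), the document vectors can be fetched in a single request. Because OpenAI embeddings have unit length, `np.dot` alone would give the same ranking.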

## 4.4  Vision

OpenAI's Vision capability, available through GPT-4 Turbo, lets the model take images as input, interpret them and answer questions about them. This opens up new ways of interacting with AI.

Vision accepts images either as links (URLs) or as base64-encoded images. Note that this feature is only available with GPT-4 models that support vision (such as `gpt-4-turbo`), not with GPT-3.5.

Firstly, to start using this functionality, import the necessary packages. In this case, we only need the `OpenAI` class from the `openai` package.

```python
from openai import OpenAI
```

Now it's time to create a `client` to send the request to the API.

```python
client = OpenAI()
```

Last but not least, the request is built, passing the role, the text question and the image URL. `max_tokens` limits the length of the answer.



```python
response = client.chat.completions.create(
  model="gpt-4-turbo",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
)

print(response.choices[0].message.content)
```
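
The example above uses a public URL. For a local image, the base64 option mentioned earlier can be used by sending a `data:` URL in the same `image_url` field. This is a minimal sketch; `image_to_data_url` and `vision_message` are hypothetical helpers, and `my_photo.jpg` in the commented usage is a placeholder file name.

```python
import base64

def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

def vision_message(question: str, image_url: str) -> dict:
    """Build a user message with a text question and one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Usage (requires an API key and a local image file):
# with open("my_photo.jpg", "rb") as f:
#     url = image_to_data_url(f.read())
# response = client.chat.completions.create(
#     model="gpt-4-turbo",
#     messages=[vision_message("What’s in this image?", url)],
#     max_tokens=300,
# )
```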

## 4.5 Image Generation (DALL-E)



The OpenAI Images API provides three methods for interacting with images: creating images from scratch from a text prompt (DALL-E 3 and DALL-E 2), creating edited versions of an image by having the model replace some areas of a pre-existing image based on a new text prompt (DALL-E 2 only), and creating variations of an existing image (DALL-E 2 only).

This guide covers the basics of the API methods with useful code examples. 



### 4.5.1 Image Generation 

The image generation method allows you to create an original image with a text prompt.

First, the `display` function and the `Image` class are imported from the `IPython.display` module so that the image can be displayed inside a Jupyter notebook.

The `OpenAI` class is then imported from the `openai` module; it provides the interface for using the API.

The code creates an instance of the `OpenAI()` client.
A request is made to the API to generate an image, with the parameters:

* __model :__ specifies the model used to generate the image, in this case "dall-e-2", which specialises in generating images based on textual descriptions;
* __prompt :__ descriptive text that will serve as input for the model to generate the image, in this example, "A cat inside a car";
* __size :__ specifies the size, 1024x1024 pixels;
* __quality :__ sets the quality, "standard". This parameter applies to DALL-E 3, where it is possible to set "hd" for finer detail; standard-quality square images are generated more quickly;
* __n :__ defines the number of images generated (DALL-E 3 only supports `n=1`).

After the API generates the image, the image URL is extracted from the response and stored in the `image_url` variable.
Finally, the image URL is printed and the image is displayed using the `display` function with the `Image` class, passing the URL as a parameter and setting the image width to 400 pixels.











```python
from IPython.display import display, Image

from openai import OpenAI

client = OpenAI()

response = client.images.generate(
  model="dall-e-2",
  prompt="A cat inside a car",
  size="1024x1024",
  quality="standard",
  n=1,
)

# The response holds a URL to the generated image
image_url = response.data[0].url
print(image_url)
display(Image(url=image_url, width=400))
```
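
The returned URL is temporary (the API reference states that it expires after an hour), so to keep an image it can be requested as base64 data instead, using `response_format="b64_json"`, and written to disk. This is a minimal sketch; `save_b64_image` is a hypothetical helper and `Tutorials/DALLE/cat_in_car.png` a placeholder path.

```python
import base64
from pathlib import Path

def save_b64_image(b64_data: str, path: str) -> Path:
    """Decode a base64-encoded image returned by the API and write it to disk."""
    out_path = Path(path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(base64.b64decode(b64_data))
    return out_path

# Usage with the client from the cell above (requires an API key):
# response = client.images.generate(
#     model="dall-e-2",
#     prompt="A cat inside a car",
#     size="1024x1024",
#     n=1,
#     response_format="b64_json",  # return image data instead of a URL
# )
# save_b64_image(response.data[0].b64_json, "Tutorials/DALLE/cat_in_car.png")
```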

### 4.5.2 Edits (DALL·E 2 only)

Also known as "inpainting", the image edits endpoint lets you edit an image by uploading it together with a mask that indicates the areas to be replaced. To create the mask, you can use tools such as [GIMP](https://www.gimp.org/) or [Photoshop](https://www.adobe.com/pt/products/photoshop/landpa.html) to erase an area of the image.
The transparent areas of the mask indicate where the image should be edited, and the prompt should describe the complete new image, not just the erased area.

The image and mask must be square PNG images smaller than 4 MB with the same dimensions. The non-transparent areas of the mask are not used to generate the output, so they don't need to match the original image, as in the following example, which shows the original image, the mask and the resulting image after editing.


<!DOCTYPE html>
<html lang="pt">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Imagens</title>
<style>
    figure {
        float: left;
        margin-right: 20px; /* Spacing between images */
    }
</style>
</head>
<body>
<figure>
    <img src="Tutorials/DALLE/original.png" width="350" height="350" alt="">
    <figcaption>Original</figcaption>
</figure>

<figure>
    <img src="Tutorials/DALLE/mask.png" width="350" height="350" alt="">
    <figcaption>Mask</figcaption>
</figure>

<figure>
    <img src="Tutorials/DALLE/output.png" width="350" height="350" alt="">
    <figcaption>Output</figcaption>
</figure>

</body>
</html>

```python
from IPython.display import display, Image
from openai import OpenAI

client = OpenAI()

# Transparent pixels in the mask mark the area DALL-E 2 should repaint
response = client.images.edit(
  model="dall-e-2",
  image=open("Tutorials/DALLE/original.png", "rb"),
  mask=open("Tutorials/DALLE/mask.png", "rb"),
  prompt="A hand holding a sandwich",
  n=1,
  size="1024x1024"
)
image_url = response.data[0].url

print(image_url)
display(Image(url=image_url, width=400))
```
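
Requests that break the input rules are rejected by the API, so it can help to check the files locally first. The sketch below checks only the requirements stated above (PNG, square, smaller than 4 MB, same dimensions) by reading the width and height from the PNG header with the standard library; it does not verify that the mask actually contains transparency. `png_size` and `check_edit_inputs` are hypothetical helper names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # files must be smaller than 4 MB

def png_size(data: bytes) -> tuple:
    """Return (width, height) from the IHDR chunk that starts every PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the chunk length, 12-15 the chunk type, 16-23 width and height
    if data[12:16] != b"IHDR":
        raise ValueError("PNG file has no IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def check_edit_inputs(image: bytes, mask: bytes) -> list:
    """List the edit-endpoint requirements that the image/mask pair does not meet."""
    problems = []
    sizes = {}
    for name, data in (("image", image), ("mask", mask)):
        if len(data) >= MAX_BYTES:
            problems.append(f"{name} is not smaller than 4 MB")
        width, height = png_size(data)
        sizes[name] = (width, height)
        if width != height:
            problems.append(f"{name} is not square ({width}x{height})")
    if sizes["image"] != sizes["mask"]:
        problems.append("image and mask have different dimensions")
    return problems
```

For the files used above, reading both with `open(path, "rb").read()` and passing the bytes to `check_edit_inputs` should return an empty list before the request is sent.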

### 4.5.3 Variations (DALL·E 2 only)

The image variations endpoint allows you to generate a variation of a given image.

Similar to the edits endpoint, the input image must be a square PNG image less than 4MB in size.


```python
from IPython.display import display, Image
from openai import OpenAI
client = OpenAI()

# Generate one variation of the original image
response = client.images.create_variation(
  model="dall-e-2",
  image=open("Tutorials/DALLE/original.png", "rb"),
  n=1,
  size="1024x1024"
)

image_url = response.data[0].url

print(image_url)
display(Image(url=image_url, width=400))
```

The images below show an example output of the variations endpoint.

<!DOCTYPE html>
<html lang="pt">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Imagens</title>
<style>
    figure {
        float: left;
        margin-right: 20px; /* Spacing between images */
    }
</style>
</head>
<body>
<figure>
    <img src="Tutorials/DALLE/original.png" width="350" height="350" alt="">
    <figcaption>Original</figcaption>
</figure>

<figure>
    <img src="Tutorials/DALLE/variation.png" width="350" height="350" alt="">
    <figcaption>Variation</figcaption>
</figure>

</body>
</html>

## 4.6 TTS Tutorial

In this tutorial you will learn how to integrate OpenAI's text-to-speech feature into your projects.

| Available voices | Output Formats           |
|------------------|--------------------------|
| Alloy            | **mp3** - default format|
| Echo             | **opus**                 |
| Fable            | **aac**                  |
| Onyx             | **flac**                 |
| Nova             | **wav**                  |
| Shimmer          | **pcm**                  |


The voice only changes the tone and the "person" who is speaking. The voices are optimised for English. In the API, voice names are written in lowercase (e.g. `"shimmer"`).


Two TTS models are currently available: **tts-1** and **tts-1-hd**. **tts-1** is recommended if you want lower latency, but it produces lower-quality audio than **tts-1-hd**.


**1- Importing the necessary libraries and creating the client**

pathlib - provides classes that represent filesystem paths  
IPython - used to play the audio in the notebook

```python
from pathlib import Path
from openai import OpenAI
import IPython

client = OpenAI()
```

**2 - Create a path to save the audio file**

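Here you choose where the audio file is saved. The file extension should match the output format: mp3 is the default, and other formats can be requested with the `response_format` parameter.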

```python
speech_file_path = Path("Tutorials/TTS/tts_audio.mp3")
```

**3 - Generate the audio**

The endpoint receives a model, a voice and your input text.

In [None]:
response = client.audio.speech.create(
  model="tts-1",
  voice="shimmer",
  input="Hey, I'm a student in Lincenciatura de Engenharia de Telecomunicações e Informática in Instituto Superior de Engenharia do Porto, and I'm doing a tutorial on how to use open-a.i in my projects!"
)

response.write_to_file(speech_file_path)

IPython.display.Audio(speech_file_path)
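In the cell above the audio is saved as mp3, the API default. To get one of the other formats from the table, the request can pass the `response_format` parameter of `client.audio.speech.create`, and the file name should use the matching extension. The sketch below keeps the two in sync; `speech_output_path` and `synthesize` are hypothetical helper names, not part of the OpenAI library.

```python
from pathlib import Path

# Output formats from the table above; mp3 is the API default.
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def speech_output_path(folder, name, response_format="mp3"):
    """Build a save path whose extension matches the requested audio format."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format {response_format!r}; use one of {SUPPORTED_FORMATS}")
    return Path(folder) / f"{name}.{fmt}"


def synthesize(client, text, path, voice="alloy", model="tts-1"):
    """Generate speech and save it, requesting the format implied by the file extension."""
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        response_format=path.suffix.lstrip("."),  # e.g. "opus" for tts_audio.opus
    )
    response.write_to_file(path)
    return path
```

For example, `synthesize(client, "Hello!", speech_output_path("Tutorials/TTS", "tts_audio", "opus"), voice="shimmer")` would request Opus audio and save it as `Tutorials/TTS/tts_audio.opus`.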

## 4.7 Whisper

In this tutorial you will learn how to use Whisper to transcribe audio files into text and to translate them into English.

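If you **haven't set up your OpenAI API key** as a global system variable, you can paste it into the **.env** file and run the following code:

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Otherwise, just run this code:

In [None]:
from openai import OpenAI

client = OpenAI()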

**Load the audio file**

Feel free to try different audio files or to add/record your own. Files must be in one of these formats: mp3, mp4, mpeg, mpga, m4a, wav, or webm.

**Note that files larger than 25 MB must be split into smaller segments, for example using [additional libraries](https://platform.openai.com/docs/guides/speech-to-text/longer-inputs).**
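Before uploading, both constraints can be checked locally. A minimal sketch, assuming the file is on disk; `check_audio_file` is a hypothetical helper, the extension list and size limit are the ones stated above, and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Accepted upload formats and size limit as stated in this section.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # assumption: 25 MB interpreted as 25 MiB


def check_audio_file(path):
    """Raise ValueError if the file cannot be sent to Whisper as-is; return its size in bytes."""
    path = Path(path)
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"{path.name}: unsupported format {path.suffix!r}")
    size = path.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"{path.name}: {size} bytes exceeds the 25 MB limit; split it first")
    return size
```

Calling `check_audio_file("Tutorials/Whisper/audio.wav")` before opening the file gives a clear local error instead of a failed request.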

In [None]:
audio_file = open("Tutorials/Whisper/audio.wav", "rb")

**1 - Transcribe an audio file**

The transcription endpoint will take the input audio and transcribe it into text.

In [None]:
transcription = client.audio.transcriptions.create(
    model="whisper-1", 
    file=audio_file,
    response_format="text"
)

print(transcription)

Notice the **response_format="text"** parameter? To get additional information, try changing it to **verbose_json**.

You should now receive a JSON response with additional fields. One of them, **language**, contains the language detected in the input file.

**Note:** If the language is not detected correctly, which can degrade the transcription, you can state it explicitly with the **language** parameter, using a code in the [ISO-639-1 format](https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes).

```python
client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="text",
    language="..."
)
```
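One way to make the language hint optional is to build the keyword arguments in a helper and unpack them into `client.audio.transcriptions.create`. `transcription_kwargs` is a hypothetical name, and its regular expression only checks the two-letter shape of an ISO-639-1 code, not whether the code actually exists.

```python
import re


def transcription_kwargs(audio_file, language=None, response_format="text"):
    """Collect arguments for client.audio.transcriptions.create, adding `language` only if given."""
    kwargs = {"model": "whisper-1", "file": audio_file, "response_format": response_format}
    if language is not None:
        # ISO-639-1 codes are two lowercase letters, e.g. "pt" or "en"
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(f"{language!r} does not look like an ISO-639-1 code")
        kwargs["language"] = language
    return kwargs
```

Usage: `client.audio.transcriptions.create(**transcription_kwargs(audio_file, language="pt"))`.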

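We can modify our code to request **verbose_json** and print the detected language:

In [None]:
audio_file.seek(0)  # rewind in case the previous request left the file at its end
transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file,
    response_format="verbose_json"
)
print(f"Detected language: {transcription.language}")
print(transcription.text)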
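Besides `language`, a verbose_json response also carries the audio `duration` and a list of `segments`, each with `start` and `end` times in seconds and its `text`. A minimal sketch that turns those segments into a timestamped transcript; it works on plain dictionaries, so it assumes the response object is first converted with `model_dump()` (the v1 Python SDK returns pydantic models). `format_segments` and `format_timestamp` are hypothetical helpers.

```python
def format_timestamp(seconds):
    """Format a time in seconds as MM:SS.s."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:04.1f}"


def format_segments(verbose):
    """Return one 'start --> end  text' line per segment of a verbose_json transcription dict."""
    lines = []
    for segment in verbose.get("segments") or []:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"{start} --> {end}  {segment['text'].strip()}")
    return "\n".join(lines)
```

Usage: `print(format_segments(transcription.model_dump()))`.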

**2 - Translation**

Using the translation endpoint, we can translate the contents of the audio file into English (currently the only target language available for translation).

In [None]:
audio_file.seek(0)  # rewind in case a previous request left the file at its end
translation = client.audio.translations.create(
    model="whisper-1",
    file=audio_file,
    response_format="text"
)
print(translation)
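Because translation only targets English, a caller may want to translate only when the detected language is something else. A sketch that combines the two endpoints used above; `transcribe_or_translate` is a hypothetical helper, and it assumes verbose_json reports `language` as a language name such as "english" (compare with what the detected-language print above shows for your file).

```python
def transcribe_or_translate(client, path):
    """Return (detected_language, English text) for the audio file at `path`."""
    # First pass: transcribe with verbose_json to learn the detected language
    with open(path, "rb") as audio:
        transcription = client.audio.transcriptions.create(
            model="whisper-1", file=audio, response_format="verbose_json"
        )
    language = transcription.language
    # Assumption: the language is reported as a name such as "english"
    if language.lower() == "english":
        return language, transcription.text
    # Otherwise reopen the file so the upload starts from the first byte, and translate it
    with open(path, "rb") as audio:
        translation = client.audio.translations.create(
            model="whisper-1", file=audio, response_format="text"
        )
    return language, translation
```

Usage: `language, english_text = transcribe_or_translate(client, "Tutorials/Whisper/audio.wav")`.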

## 4.8 Moderation

The moderation endpoint is a tool you can use to check whether text is potentially harmful, so that such content can be identified and acted upon (for example, by filtering it).

The models classify content into the following categories:
* *__Hate__* - Content that expresses or promotes hatred based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status or caste. Hateful content directed at unprotected groups constitutes harassment.
* *__Hate/Threatening__* - Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
* *__Harassment__* - Content that expresses, incites, or promotes harassing language towards any target.
* *__Harassment/Threatening__* - Harassment content that also includes violence or serious harm towards any target.
* *__Self-harm__* - Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
* *__Self-harm/Intent__* - Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
* *__Self-harm/Instructions__* - Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
* *__Sexual__* - Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
* *__Sexual/Minors__* - Sexual content that includes an individual who is under 18 years old.
* *__Violence__* - Content that depicts death, violence, or physical injury.
* *__Violence/Graphic__* - Content that depicts death, violence, or physical injury in graphic detail.


To obtain a classification for a piece of text, send a request to the moderation endpoint, as shown in the following code fragment.
First, the OpenAI class is imported from the openai module and an instance is created, configuring the client to interact with the OpenAI API.

The text is then sent for moderation with the `moderations.create()` method.

The API responds with a JSON object describing the moderation of the text. In the Python library it is returned as a response object whose `results` list holds one entry per input, so `response.results[0]` is the result for the text sent here.

In [None]:
from openai import OpenAI

client = OpenAI()

response = client.moderations.create(input="I hate Chinese and black people ")

output = response.results[0]
print(output)
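The `input` parameter also accepts a list of strings; `response.results` then contains one result per string, in the same order. A sketch that moderates several texts in one request and keeps only the flagged ones; `flagged_inputs` is a hypothetical helper name.

```python
def flagged_inputs(client, texts):
    """Moderate several texts in one request and return the (text, result) pairs that were flagged."""
    response = client.moderations.create(input=texts)
    # results[i] corresponds to texts[i]
    return [(text, result) for text, result in zip(texts, response.results) if result.flagged]
```

Each returned `result` exposes the same `categories` and `category_scores` fields as `output` above.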


The endpoint returns the following fields; an example of the full response structure is shown below.

* *__flagged__*: Set to true if the model classifies the content as potentially harmful, false otherwise.

* *__categories__*: Contains a dictionary of violation flags for each category. The value will be true if the model flags the corresponding category as violated, false otherwise.
* *__category_scores__*: Contains a dictionary of raw scores per category issued by the model, denoting the model's confidence that the input violates the OpenAI policy for the category. The value is between 0 and 1, with higher values denoting greater confidence. The scores should not be interpreted as probabilities.

```json
{
    "id": "modr-XXXXX",
    "model": "text-moderation-007",
    "results": [
        {
            "flagged": true,
            "categories": {
                "sexual": false,
                "hate": false,
                "harassment": false,
                "self-harm": false,
                "sexual/minors": false,
                "hate/threatening": false,
                "violence/graphic": false,
                "self-harm/intent": false,
                "self-harm/instructions": false,
                "harassment/threatening": true,
                "violence": true
            },
            "category_scores": {
                "sexual": 1.2282071e-6,
                "hate": 0.010696256,
                "harassment": 0.29842457,
                "self-harm": 1.5236925e-8,
                "sexual/minors": 5.7246268e-8,
                "hate/threatening": 0.0060676364,
                "violence/graphic": 4.435014e-6,
                "self-harm/intent": 8.098441e-10,
                "self-harm/instructions": 2.8498655e-11,
                "harassment/threatening": 0.63055265,
                "violence": 0.99011886
            }
        }
    ]
}
```
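Since `category_scores` are confidence values between 0 and 1 rather than probabilities, an application may prefer its own cut-off instead of relying only on `flagged`. A minimal sketch applied to the scores from the example response above; the helper name `categories_above` and the thresholds are illustrative assumptions, not values recommended by OpenAI. With a live response, a dictionary of scores could be obtained from `output.category_scores.model_dump()`, although the key names may use underscores instead of slashes depending on the SDK version.

```python
def categories_above(category_scores, threshold=0.5):
    """Return the category names whose score exceeds `threshold`, highest score first."""
    hits = [(name, score) for name, score in category_scores.items() if score > threshold]
    return [name for name, _ in sorted(hits, key=lambda item: item[1], reverse=True)]


# category_scores copied from the example response above
example_scores = {
    "sexual": 1.2282071e-6,
    "hate": 0.010696256,
    "harassment": 0.29842457,
    "self-harm": 1.5236925e-8,
    "sexual/minors": 5.7246268e-8,
    "hate/threatening": 0.0060676364,
    "violence/graphic": 4.435014e-6,
    "self-harm/intent": 8.098441e-10,
    "self-harm/instructions": 2.8498655e-11,
    "harassment/threatening": 0.63055265,
    "violence": 0.99011886,
}

print(categories_above(example_scores))        # ['violence', 'harassment/threatening']
print(categories_above(example_scores, 0.25))  # ['violence', 'harassment/threatening', 'harassment']
```

Lowering the threshold makes the check stricter: at 0.25, `harassment` is reported as well, although the model did not flag it.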


# 5 Exercises

## 5.1 Exercise A

Embeddings have many uses, and combined with other APIs they can do even more. One example is using embeddings together with Chat Completions to extract information from a PDF and then create a function that answers any question about the document.

In the following exercise you will create a program that retrieves information from a PDF and answers questions about it. To achieve this, you must:
*   Convert a PDF file into embeddings and save them in a CSV file
*   Use embeddings to search the CSV file for the content most related to a user query
*   Send that content to Chat Completions

The PDF used in this exercise is `LETI_SISTCA_2023_24_Team2_OpenAI.pdf`.

**Start by importing the required dependencies, initializing the client and creating some constants**

In [None]:
from openai import OpenAI
import pandas as pd  
import re 
import tiktoken 
import PyPDF2

import ast
from scipy import spatial  

client = OpenAI()


SECTIONS_TO_IGNORE = [
    "Contents",
    "List of Tables",
    "List of Figures",
    "References",
]

MAX_TOKENS = 1600
BATCH_SIZE = 1000   

# TODO #1: Create consts for GPT_MODEL and EMBEDDING_MODEL (small)


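**Simple logic to extract the information from the PDF**

The following functions extract the text from the PDF, split it into titled sections (skipping those listed in `SECTIONS_TO_IGNORE`), remove citation markers, and split sections that exceed the token limit into smaller strings.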

In [None]:
def extract_text_from_pdf(pdf_path):
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ''
        for page_num in range(len(reader.pages)):
            text += reader.pages[page_num].extract_text()
    return text

def split_sections_from_pdf(pdf_text):
    title = []
    text = []
    ignore = True
    current_section = "I.N.I.T.I.A.L-V.A.L.U.E"
    for line in pdf_text.split('\n'):
        line = line.strip()
        
        if not line:
            continue
        if is_new_section(line):
            ignore = ignore_section(line)
            if ignore:
                continue
            title.append(line)
            if current_section != "I.N.I.T.I.A.L-V.A.L.U.E":
                text.append(current_section)
            current_section = ""
        else:
            if not ignore:
                current_section += " " + line
    if current_section:
        text.append(current_section)
        
    
    sections = [(title),(text)]
    return sections

def ignore_section(line):
    if any(section in line for section in SECTIONS_TO_IGNORE):
        return True
    return False

def is_new_section(line):
    pattern = r"\d+\.\d+(?:\.\d+)? [A-Z].*?"
    
    if line.strip().count('.') > 7:
        return False
    if re.match(pattern, line.strip()):
        return True 
    if any(section in line for section in SECTIONS_TO_IGNORE):
        return True

    return False

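def clean_section(section):
    titles, text = section
    # Remove citation markers such as [3] or [12] and surrounding whitespace.
    # Building a new list is required: reassigning the loop variable does not change the list.
    cleaned_text = [re.sub(r"\[\d+\]", "", line).strip() for line in text]
    return (titles, cleaned_text)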


def num_tokens(text, model = GPT_MODEL):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def halved_by_delimiter(string, delimiter = "\n"):
    chunks = string.split(delimiter)
    if len(chunks) == 1:
        return [string, ""]  
    elif len(chunks) == 2:
        return chunks 
    else:
        total_tokens = num_tokens(string)
        halfway = total_tokens // 2
        best_diff = halfway
        for i, chunk in enumerate(chunks):
            left = delimiter.join(chunks[: i + 1])
            left_tokens = num_tokens(left)
            diff = abs(halfway - left_tokens)
            if diff >= best_diff:
                break
            else:
                best_diff = diff
        left = delimiter.join(chunks[:i])
        right = delimiter.join(chunks[i:])
        return [left, right]
    
    
def truncated_string(string, model, max_tokens, print_warning = True,):
    encoding = tiktoken.encoding_for_model(model)
    encoded_string = encoding.encode(string)
    truncated_string = encoding.decode(encoded_string[:max_tokens])
    if print_warning and len(encoded_string) > max_tokens:
        print(f"Warning: Truncated string from {len(encoded_string)} tokens to {max_tokens} tokens.")
    return truncated_string


def split_strings_from_subsection(title, text, max_tokens = 1000, model = GPT_MODEL, max_recursion = 5):

    string = "\n\n".join([title, text])  # title and text are single strings, so join them as a list
    num_tokens_in_string = num_tokens(string, model)

    if num_tokens_in_string <= max_tokens:
        return [string]

    elif max_recursion == 0:
        return [truncated_string(string, model, max_tokens)]

    else:
        for delimiter in ["\n\n", "\n", ". "]:
            left, right = halved_by_delimiter(text, delimiter=delimiter)
            if left == "" or right == "":

                continue
            else:

                results = []
                for half in [left, right]:
                    half_strings = split_strings_from_subsection(title, half,max_tokens,model,max_recursion - 1)
                    results.extend(half_strings)
                return results
            
    return [truncated_string(string, model, max_tokens)]



**Call the functions above to retrieve the cleaned PDF sections, then split them into strings**

In [None]:

# TODO #2: Call the previously created functions
pdf_text = ...
pdf_sections = ...
cleaned_sections = ...

MAX_TOKENS = 1600
strings = []
titles = cleaned_sections[0]
texts = cleaned_sections[1]
for i in range(len(titles)):
    strings.extend(split_strings_from_subsection(titles[i], texts[i], max_tokens=MAX_TOKENS))


**Transforming the information to embeddings and saving it to a CSV file**

In [None]:
embeddings = []
for batch_start in range(0, len(strings), BATCH_SIZE):
    batch_end = batch_start + BATCH_SIZE
    batch = strings[batch_start:batch_end]
    
    # TODO #3: Make a request to the embeddings API with the batch as input
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=batch)
    
    for i, be in enumerate(response.data):
        assert i == be.index  
    batch_embeddings = [e.embedding for e in response.data]
    embeddings.extend(batch_embeddings)
    
    
df = pd.DataFrame({"text": strings, "embedding": embeddings})


SAVE_PATH = "SISTCA_TEAM2.csv"
df.to_csv(SAVE_PATH, index=False)
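A CSV file has no list type, so each embedding is written as the text of a Python list; that is why, after `pd.read_csv`, the `embedding` column is parsed with `ast.literal_eval`. The round trip can be sketched with the standard `csv` module alone (the three-number vector is a made-up stand-in for a real embedding):

```python
import ast
import csv
import io

vector = [0.12, -0.5, 0.33]  # stand-in for a real embedding

# Write a row much like DataFrame.to_csv does: the list is stored as its string form
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "embedding"])
writer.writerow(["some section", vector])  # csv converts the list with str()

# Reading it back yields a string, not a list
buffer.seek(0)
rows = list(csv.reader(buffer))
stored = rows[1][1]
print(type(stored).__name__, stored)  # str [0.12, -0.5, 0.33]

# ast.literal_eval safely turns the string back into a list of floats
restored = ast.literal_eval(stored)
print(restored == vector)  # True
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for parsing data read from a file.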

**The first step is done; now we need to create a function so GPT can answer anything about the PDF using the saved embeddings**

**Change the Embedding Model and read the CSV file**

In [None]:

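# TODO #4: Change the EMBEDDING_MODEL (ada); queries must be embedded with the same model that produced the embeddings stored in the CSV file


# TODO #5: Create a variable named embeddings_path with the CSV file path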


df = pd.read_csv(embeddings_path)
df['embedding'] = df['embedding'].apply(ast.literal_eval)

**Functions to compare the relatedness of the strings to the query**

In [None]:
def strings_ranked_by_relatedness(query, df, relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y), top_n=100):

    # TODO #6: Make a request to the embeddings API with the query as input and store the result in query_embedding_response




    
    query_embedding = query_embedding_response.data[0].embedding
    strings_and_relatednesses = [
        (row["text"], relatedness_fn(query_embedding, row["embedding"]))
        for i, row in df.iterrows()
    ]
    strings_and_relatednesses.sort(key=lambda x: x[1], reverse=True)
    strings, relatednesses = zip(*strings_and_relatednesses)
    return strings[:top_n], relatednesses[:top_n]

strings, relatednesses = strings_ranked_by_relatedness("open ai", df, top_n=5)


def query_message(query, df, model, token_budget):
    strings, relatednesses = strings_ranked_by_relatedness(query, df)
    introduction = 'Use the below articles on the document about OpenAI made by Team 2, composed by Patrícia Sousa, Carlos Alves, Jose Leal and Tiago Ribeiro, for SISTCA to answer the subsequent question. If the answer cannot be found in the articles, write "Sorry, the information you seek cannot be found in the document in question."'
    question = f"\n\nQuestion: {query}"
    message = introduction
    for string in strings:
        next_article = f'\n\nPDF article section:\n"""\n{string}\n"""'
        if (
            num_tokens(message + next_article + question, model)
            > token_budget
        ):
            break
        else:
            message += next_article
    return message + question
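The default `relatedness_fn` is cosine similarity, computed as `1 - spatial.distance.cosine(x, y)`: the dot product of the two vectors divided by the product of their norms. A dependency-free sketch of the same formula, useful for seeing how the ranking behaves on toy vectors (the vectors and names are illustrative, not real embeddings):

```python
import math


def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


query = [1.0, 0.0]
candidates = {"same direction": [2.0, 0.0], "orthogonal": [0.0, 3.0], "opposite": [-1.0, 0.0]}

# Rank candidates from most to least related, as strings_ranked_by_relatedness does
ranked = sorted(candidates, key=lambda name: cosine_similarity(query, candidates[name]), reverse=True)
print(ranked)  # ['same direction', 'orthogonal', 'opposite']
```

Only the direction of the vectors matters, not their length. According to OpenAI's documentation its embeddings are normalised to length 1, in which case cosine similarity equals the plain dot product.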


**Create the final function**  


If the retrieved sections do not contain the information needed to answer the query, the prompt built by `query_message` instructs the model to reply _"Sorry, the information you seek cannot be found in the document in question."_

In [None]:
def ask(query, df = df, model = GPT_MODEL, token_budget = 4096 - 500, print_message = False):
    message = query_message(query, df, model=model, token_budget=token_budget)
    if print_message:
        print(message)
    messages = [
        {"role": "system", "content": "You answer questions about the document made by Team2 for SISTCA about OpenAI."},
        {"role": "user", "content": message},
    ]
    
    # TODO #7: Make a request to the Chat Completions API and store the result in response
    
    
    response_message = response.choices[0].message.content
    return response_message


**Test the ask function**

Each call to `ask` sends a new prompt to Chat Completions, so the answers may vary from one attempt to another.

In [None]:
print(ask("Scientific/technological background")) 
print(ask("Give me the authors")) 
print(ask("What can you tell me about the document")) 
print(ask("Give me the document structure"))

## 5.2 Exercise B

Although Whisper currently only translates into English, we can combine it with other APIs to translate audio into other languages.
By using a GPT model through the Chat Completions API, we can translate text between many languages.

In the following exercise you will create a program that translates an audio message into any other language. In order to achieve this you must:

 - Use Whisper to transcribe the original audio;
 - Translate the resulting transcription into any language using Chat Completions;
 - Return the translated audio using Text-To-Speech.
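For the Chat Completions step, the translation request is just a list of messages, like the one built in `ask` in Exercise A. A possible starting point is sketched below; the helper name and prompt wording are assumptions, and the resulting list would be passed as `messages` to `client.chat.completions.create` together with a model of your choice.

```python
def build_translation_messages(text, target_language, source_language=None):
    """Build Chat Completions messages asking the model to translate `text`."""
    source = f" from {source_language}" if source_language else ""
    system = (
        f"You are a translator. Translate the user's message{source} into {target_language}. "
        "Reply with the translation only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]
```

The translated text would then be read from `response.choices[0].message.content`, as in `ask`.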

**Start by importing the required dependencies and initializing the client**

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")

client = OpenAI(api_key=api_key)

**Use Whisper to transcribe the original audio**

In [None]:
# TODO #1: load the audio file



# TODO #2: transcribe the contents and detect the input language (store them in transcribed_text and detected_language)


**Translate the transcribed text into any language using Chat Completions**

In [None]:
# TODO #3: translate the transcription into another language using Chat Completions (store it in translated_text)


**Return the translated audio using Text-To-Speech**

In [None]:
# TODO #4: generate the translated audio with Text-To-Speech (store the response in translated_audio)




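# Save the output audio file in the current working directory
# (__file__ is not defined inside a notebook, and Path must be imported)
from pathlib import Path

output_file_path = Path("translation.mp3")
translated_audio.write_to_file(output_file_path)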

In [None]:
# Test the output
import IPython.display as display

print(f"Detected language: {detected_language} \n")
print(f"Transcribed text: {transcribed_text} \n")

print(f"Translated text: {translated_text} \n")
print(f"Translated audio file saved to {output_file_path}")


display.Audio(output_file_path)


# 6 Challenge

# References

[1] IBM, “What is an AI model?,” IBM. (Accessed: 2024-03-28).  
[2] R. Merritt, “What is a transformer model?,” NVIDIA Blog, Mar. 2022. (Accessed: 2024-03-28).  
[3] “OpenAI.” Available at: https://openai.com/. (Accessed: 2024-03-27).  
[4] xAI, “xAI Grok.” Available at: https://grok.x.ai/. (Accessed: 2024-03-27).  
[5] Anthropic, “Anthropic home.” Available at: https://www.anthropic.com/. (Accessed: 2024-03-27).  
[6] Google, “DeepMind technologies.” Available at: https://deepmind.google/technologies/. (Accessed: 2024-03-27).  
[7] Cohere, “Cohere home.” Available at: https://cohere.com/. (Accessed: 2024-03-27).  
[8] “Python.” Available at: https://wiki.python.org/. (Accessed: 2024-03-29).  