<h1> Meeting OpenAI </h1>
<h5><i>Extending keyword extraction to other models</i></h5>








**Discuss and consider with your fellow students what the difference between zero-shot learning and few-shot learning might be when using Large Language Models?**

**Answer:**

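**Few-shot learning** tries to imitate the human ability to recognize new items from a class of similar items with only a few examples. Traditional learning models need large amounts of labeled data for training; a few-shot approach reduces the cost of building a large labeled dataset. It also makes it possible to learn about rare cases, where only a few examples are available. With Large Language Models, the few examples are typically placed directly in the prompt.

**Zero-shot learning** is when a model performs a task without having seen any labeled example of it (0 shots). With Large Language Models, this means the prompt contains only the instruction and the input, with no solved examples.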
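
To make the difference concrete for Large Language Models, here is a minimal sketch of how the two kinds of prompt could be built. The helper names, the "Name:" suffix and the two example sentences are assumptions made for illustration; only the instruction text comes from this notebook.

```python
# Hypothetical helpers; the instruction wording mirrors the task used later in this notebook.
INSTRUCTION = (
    "Extract the first name of a person from the following sentence, "
    'if there is no first name return just "No":'
)

def zero_shot_prompt(sentence):
    """Instruction and input only: the model sees no solved example."""
    return f"{INSTRUCTION}\n\n{sentence}\nName:"

def few_shot_prompt(examples, sentence):
    """Instruction plus a few solved (sentence, answer) pairs, then the new input."""
    shots = "\n\n".join(f"{s}\nName: {a}" for s, a in examples)
    return f"{INSTRUCTION}\n\n{shots}\n\n{sentence}\nName:"

# Made-up solved examples (the "shots").
examples = [
    ("Anna missed the bus this morning.", "Anna"),
    ("It rained all day yesterday.", "No"),
]

new_sentence = "John went to the store to buy some groceries."
print(zero_shot_prompt(new_sentence))
print("---")
print(few_shot_prompt(examples, new_sentence))
```

The only difference between the two prompts is the block of solved examples; the model itself is not retrained in either case.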

During the first lesson we used Google Colab and were introduced to **Co:Here**, which provides API access to NLP models. We gave the model a <i>training_set</i> of sentences and used it for **keyword extraction**. The sentences were labeled with the correct keyword to extract, in this case the first occurrence of a person's first name in the sentence.

Further, we loaded a CSV file containing tweets about airline companies, each labeled with an ordinal variable describing the perceived emotion towards the airline. We built a training set containing sentences with their correct emotion, and tried to predict the emotion of all the other sentences in the file.
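
As a rough sketch of that split, the snippet below reads labeled rows and separates a few of them as examples from the rest to be predicted. The column names (<i>text</i>, <i>sentiment</i>) and the four rows are invented stand-ins, not the actual course file.

```python
import csv
import io

# Made-up rows standing in for the airline tweets file; column names are assumptions.
raw = io.StringIO(
    "text,sentiment\n"
    "The flight was on time and the crew was lovely,positive\n"
    "Lost my luggage again,negative\n"
    "Boarding starts at gate 12,neutral\n"
    "Two hours delay and no explanation,negative\n"
)
rows = list(csv.DictReader(raw))

n_examples = 2  # how many labeled rows are handed to the model as examples
training_set = [(r["text"], r["sentiment"]) for r in rows[:n_examples]]
to_predict = [r["text"] for r in rows[n_examples:]]
held_out_labels = [r["sentiment"] for r in rows[n_examples:]]  # kept aside for scoring

print(training_set)
print(to_predict)
```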

A loop was provided to iterate through the CSV file, with a built-in wait statement to delay the requests. A sleep of 0.05 between requests avoided overloading resources and ensured that an answer was received before the next request was sent. If there was an exception, the loop slept longer (0.2) before restarting and making another request.
This was necessary because we only had access to the trial version of <b>Co:Here</b>, where requests are rate-limited (5 API calls / minute).
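
The provided loop is not reproduced here, so the following is only a sketch of the pattern it describes: a short pause after each request and a longer pause before retrying after an exception. The function name, the <i>max_attempts</i> limit and the <i>flaky_request</i> stand-in are assumptions; no real API is called.

```python
import time

def call_with_retry(request_fn, items, pause=0.05, error_pause=0.2, max_attempts=3):
    """Call request_fn(item) for each item, pausing between calls and retrying on errors."""
    results = []
    for item in items:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(request_fn(item))
                time.sleep(pause)  # short wait after a successful request
                break
            except Exception:
                if attempt == max_attempts:
                    results.append(None)  # give up on this item, keep positions aligned
                else:
                    time.sleep(error_pause)  # longer wait before retrying

    return results

# Stand-in for an API call: fails the first time it sees an item, then succeeds.
_seen = set()

def flaky_request(text):
    if text not in _seen:
        _seen.add(text)
        raise RuntimeError("rate limited")
    return text.upper()

print(call_with_retry(flaky_request, ["a", "b"]))
```

With a limit of 5 calls per minute, a pause of 60 / 5 = 12 seconds between requests would stay under the limit without relying on retries; the pause values are parameters so they can be adapted to the actual quota.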


<h3>Extension to OpenAI</h3>

To extend this code, I tried a similar task using another NLP model through its respective API: the famous **OpenAI**.

I first installed <i>openai</i> using pip, prefixing the command with Colab's "!" syntax to run it as a shell command. The OpenAI API key was provided. I then created two sets of 4 sentences each. The first set contained human first names and the second did not. I concatenated both sets into a single <i>test_set</i>.

I then built a Python list (<i>correct_answers</i>), containing the correct keywords to extract, which are quite obvious to a human:

```python
correct_answers = ["John", "Sarah", "Michael", "Emily"]
```
I declared an empty list (<i>extracted_names</i>) to hold the keywords extracted by the OpenAI model.

The following for loop iterates over the <i>test_set</i> and, for each sentence, sends a call to the OpenAI API to generate a response for the prompt:

```python
f'Extract the first name of a person from the following sentence, if there is no first name return just "No":\n\n"{sentence}"'
```
...with each sentence from the <i>test_set</i> inserted at the end.

I learned that the models always return something, so it was necessary to specify what to return ("*No*") if no name was found. I then excluded that answer from <i>extracted_names</i>.
```python
for sentence in test_set:
    response = openai.Completion.create(
        model="ada",
        prompt=f'Extract the first name of a person from the following sentence, if there is no first name return just "No":\n\n"{sentence}"',
        max_tokens=3
    )
    extracted_name = response.choices[0].text.strip()
    if extracted_name != "No":
        extracted_names.append(extracted_name)

```
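
The loop above compares the answer with "No" exactly, so a reply such as "No." or a name wrapped in quotes would slip through unchanged. One possible way to make the comparison more tolerant is sketched below. The helper <i>clean_name</i> is hypothetical and not part of the original notebook, and it assumes that keeping only the first word of the reply is acceptable.

```python
import string

def clean_name(raw_text, no_token="No"):
    """Return the first word of the reply without surrounding punctuation, or None."""
    words = raw_text.strip().split()
    if not words:
        return None  # empty reply
    first = words[0].strip(string.punctuation)
    if not first or first.lower() == no_token.lower():
        return None  # the model signalled "no name"
    return first

# Made-up replies of the kind a completion model might return.
for raw in [" John", '"Sarah".', "No.", "no", "", "Emily is"]:
    print(repr(raw), "->", clean_name(raw))
```

This only normalises formatting; a reply that is simply wrong, such as an ordinary word from the sentence, is still kept as an extracted name.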


The <i>max_tokens</i> parameter specifies the maximum number of tokens that the API may generate in response. A token is a sub-word unit rather than a whole word, so a single name can take more than one token. It is set to 3, which leaves room for a short answer (chosen to allow for multi-word names).

The API response is stored in the <i>response</i> variable. It contains a list of choices, where each choice represents a possible response from the API. The first element of the list is selected using <i>response.choices[0]</i>. The extracted name is read from the selected choice object through its <i>text</i> attribute, and any leading or trailing whitespace is removed using the <i>strip()</i> method.

The call has to follow the OpenAI API parameters defined in their <a href="https://platform.openai.com/docs/api-reference/completions/create">documentation</a>. After reading it, I set the <i>model</i> parameter to <i>"ada"</i>. This is one of the older, less sophisticated models, but it is already based on the GPT-3 architecture. As the documentation puts it: [<i>"ada: Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost."</i>](https://platform.openai.com/docs/models/gpt-3) Moreover, Ada is a general-purpose language model trained on a diverse range of text to continue a given prompt; it has not been tuned specifically to follow instructions.

So, it is not specific to text analysis & keyword extraction. Let's see how well it does...

<h3>Measuring performance</h3>


Finally, I printed out the output of the model, vs the expected output, as an overview.

I calculated the performance of the model by first creating a set of unique extracted names and counting the number of true positives by finding the intersection of the set of extracted names and the set of correct answers.
```python

unique_extracted_names = set(extracted_names)
true_positives = len(set(extracted_names) & set(correct_answers))
```

To count false positives, I summed the occurrences of every extracted name that is not in the set of correct answers, and added, for every extracted name that is in the set of correct answers, its number of occurrences minus one (repeats of a correct name count as errors).


To count false negatives, I counted the correct answers that do not appear in the set of extracted names.

I calculated **precision** as true positives divided by the sum of true positives and false positives, that is to say $\frac{\text{true positives}}{\text{true positives} + \text{false positives}}$, the share of right answers among all answers given.

**Recall** is true positives divided by the sum of true positives and false negatives, $\frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$.

<b>Precision</b> measures how accurate the positive predictions are, while

<b>recall</b> measures how many of the actual positives the model finds.

<b>High precision</b> = the model is making few false positive predictions.

<b>High recall</b> = the model is correctly identifying most of the positive samples.
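
The counting rules described above can be gathered into one function, sketched below. The function name and the sample lists are illustrative assumptions. The guard that returns 0.0 when a denominator is zero is also an added convention: the plain division raises <i>ZeroDivisionError</i> when the model extracts nothing at all.

```python
from collections import Counter

def precision_recall(extracted, correct):
    """Set-based metrics as described above; returns (tp, fp, fn, precision, recall)."""
    counts = Counter(extracted)
    correct_set = set(correct)
    tp = len(set(extracted) & correct_set)
    # Wrong names count every time they occur; repeats of a right name count as errors.
    fp = sum(n if name not in correct_set else n - 1 for name, n in counts.items())
    fn = len(correct_set - set(extracted))
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 if nothing was extracted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall

# Illustrative input: one right name given twice, one wrong answer, one name missed.
print(precision_recall(["Anna", "Anna", "The"], ["Anna", "Ben"]))
# No extraction at all: the guard avoids a division by zero.
print(precision_recall([], ["Anna", "Ben"]))
```

Note that these metrics compare sets of names and ignore which sentence each answer came from, so a right name produced for the wrong sentence would still count as a true positive.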


As we can see below, the performance is not great, but that might be because I chose a less specific model.

In [73]:
# openai.Completion, used below, belongs to the pre-1.0 interface of the library
!pip install "openai<1.0"
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder: never publish a real key

# List of sentences with and without names
sentences_with_names = [
    "John went to the store to buy some groceries.",
    "Sarah is a doctor and works at the hospital.",
    "Michael enjoys playing basketball in his free time.",
    "Emily is studying computer science in college."
]

sentences_without_names = [
    "The sun is shining and the birds are singing.",
    "I like to go for a walk in the park on weekends.",
    "My favorite color is blue.",
    "I am learning to play the guitar."
]

test_set = sentences_with_names + sentences_without_names

# List of correct answers
correct_answers = ["John", "Sarah", "Michael", "Emily"]

# Extracted first names using OpenAI API
extracted_names = []

for sentence in test_set:
    response = openai.Completion.create(
        model="ada",
        prompt=f'Extract the first name of a person from the following sentence, if there is no first name return just "No":\n\n{sentence}',
        max_tokens=3
    )
    extracted_name = response.choices[0].text.strip()
    if extracted_name != "No":
        extracted_names.append(extracted_name)

print("Extracted first names:", extracted_names)
print("Correct answers:      ", correct_answers)

# Calculate performance metrics
unique_extracted_names = set(extracted_names)
true_positives = len(set(extracted_names) & set(correct_answers))
false_positives = sum(extracted_names.count(name) for name in unique_extracted_names if name not in correct_answers)
false_positives += sum(extracted_names.count(name) - 1 for name in unique_extracted_names if name in correct_answers)
false_negatives = len(set(correct_answers) - set(extracted_names))
print("True positives:",true_positives)
print("False positives:", false_positives)
print("False negatives:", false_negatives)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print("Precision:", precision)
print("Recall:", recall)

Extracted first names: ['Ext', 'The', 'Speaked out', 'Derek is studying', 'A', '"', 'The', "I'm pretty"]
Correct answers:       ['John', 'Sarah', 'Michael', 'Emily']
True positives: 0
False positives: 8
False negatives: 4
Precision: 0.0
Recall: 0.0


<h3>Let's try other models from OpenAI</h3>

Since the performance was poor, let's see whether more advanced models do better. Let's see what OpenAI has to offer on their API.
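
The listing returned by the API is long. Here is a small sketch of reducing such a listing to just the model ids, assuming the response behaves like a mapping with a <i>data</i> list whose entries each carry an <i>id</i>, as in the output printed in the next cell. The <i>listing</i> dictionary is a hand-written stand-in rather than a real API response.

```python
# Hand-written stand-in with the same shape as the listing printed in the next cell.
listing = {
    "data": [
        {"id": "davinci", "object": "model", "owned_by": "openai"},
        {"id": "babbage", "object": "model", "owned_by": "openai"},
    ]
}

# Keep only the identifiers, sorted for easier reading.
model_ids = sorted(entry["id"] for entry in listing["data"])
print(model_ids)
```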

In [74]:
openai.Model.list()

<OpenAIObject list at 0x7fc99dc7b090> JSON: {
  "data": [
    {
      "created": 1649358449,
      "id": "babbage",
      "object": "model",
      "owned_by": "openai",
      "parent": null,
      "permission": [
        {
          "allow_create_engine": false,
          "allow_fine_tuning": false,
          "allow_logprobs": true,
          "allow_sampling": true,
          "allow_search_indices": false,
          "allow_view": true,
          "created": 1669085501,
          "group": null,
          "id": "modelperm-49FUp5v084tBB49tC4z8LPH5",
          "is_blocking": false,
          "object": "model_permission",
          "organization": "*"
        }
      ],
      "root": "babbage"
    },
    {
      "created": 1649359874,
      "id": "davinci",
      "object": "model",
      "owned_by": "openai",
      "parent": null,
      "permission": [
        {
          "allow_create_engine": false,
          "allow_fine_tuning": false,
          "allow_logprobs": true,
          "allow_sa

Let's choose a more task-specific model: <i>text-ada-001</i>. It is a variant of ada that has been fine-tuned to follow natural-language instructions, which is what our prompt relies on.

In [71]:
# Extracted first names using OpenAI API
extracted_names = []

for sentence in test_set:
    response = openai.Completion.create(
        model="text-ada-001",
        prompt=f'Extract the first name of a person from the following sentence, if there is no first name return just "No":\n\n{sentence}',
        max_tokens=3
    )
    extracted_name = response.choices[0].text.strip()
    if extracted_name != "No":
        extracted_names.append(extracted_name)

print("Extracted first names:", extracted_names)
print("Correct answers:      ", correct_answers)

# Calculate performance metrics
unique_extracted_names = set(extracted_names)
true_positives = len(set(extracted_names) & set(correct_answers))
false_positives = sum(extracted_names.count(name) for name in unique_extracted_names if name not in correct_answers)
false_positives += sum(extracted_names.count(name) - 1 for name in unique_extracted_names if name in correct_answers)
false_negatives = len(set(correct_answers) - set(extracted_names))
print("True positives:",true_positives)
print("False positives:", false_positives)
print("False negatives:", false_negatives)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print("Precision:", precision)
print("Recall:", recall)

Extracted first names: ['John', 'Michael', 'There']
Correct answers:       ['John', 'Sarah', 'Michael', 'Emily']
True positives: 2
False positives: 1
False negatives: 2
Precision: 0.6666666666666666
Recall: 0.5


Performance did increase! ✅
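
The same loop has now been run for two models, changing only the model name. One way to avoid repeating it is sketched below: the function receives the completion call as a parameter. The names <i>extract_names</i>, <i>PROMPT</i> and <i>fake_complete</i> are assumptions for illustration, and <i>fake_complete</i> is a stand-in that calls no API.

```python
def extract_names(sentences, complete, prompt_template):
    """Run complete(prompt) on every sentence and keep the answers other than "No"."""
    names = []
    for sentence in sentences:
        answer = complete(prompt_template.format(sentence=sentence)).strip()
        if answer != "No":
            names.append(answer)
    return names

# Same wording as the prompt used in the cells of this notebook.
PROMPT = (
    "Extract the first name of a person from the following sentence, "
    'if there is no first name return just "No":\n\n{sentence}'
)

def fake_complete(prompt):
    """Stand-in for a model call: returns the first word if it is a known name."""
    sentence = prompt.split("\n\n", 1)[1]
    first_word = sentence.split()[0]
    return first_word if first_word in {"John", "Sarah"} else "No"

sentences = ["John went to the store.", "My favorite color is blue."]
print(extract_names(sentences, fake_complete, PROMPT))
```

In the notebook, <i>complete</i> could be a small function that wraps <i>openai.Completion.create</i> for a given model name and returns <i>response.choices[0].text</i>, so that each model is evaluated with a single call.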

Let's try something more!

The models tried before, "*ada*" and "*text-ada-001*", were from the GPT-3 family of models. Let's try a newer one. It is from the GPT-3.5 family, and I will use:
```python
text-davinci-003
```

It is described as ["*Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text.*"](https://platform.openai.com/docs/models/gpt-3-5)

In [70]:
# Extracted first names using OpenAI API
extracted_names = []

for sentence in test_set:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f'Extract the first name of a person from the following sentence, if there is no first name return just "No":\n\n{sentence}',
        max_tokens=3
    )
    extracted_name = response.choices[0].text.strip()
    if extracted_name != "No":
        extracted_names.append(extracted_name)

print("Extracted first names:", extracted_names)
print("Correct answers:      ", correct_answers)

# Calculate performance metrics
unique_extracted_names = set(extracted_names)
true_positives = len(set(extracted_names) & set(correct_answers))
false_positives = sum(extracted_names.count(name) for name in unique_extracted_names if name not in correct_answers)
false_positives += sum(extracted_names.count(name) - 1 for name in unique_extracted_names if name in correct_answers)
false_negatives = len(set(correct_answers) - set(extracted_names))
print("True positives:",true_positives)
print("False positives:", false_positives)
print("False negatives:", false_negatives)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

print("Precision:", precision)
print("Recall:", recall)

Extracted first names: ['John', 'Sarah', 'Michael', 'Emily']
Correct answers:       ['John', 'Sarah', 'Michael', 'Emily']
True positives: 4
False positives: 0
False negatives: 0
Precision: 1.0
Recall: 1.0


Now the performance is great 🎉

We can say that development in NLP models is moving fast, and that choosing the right model for the task is an important aspect to consider.