<a href="https://colab.research.google.com/github/Shumin-li-mcit/Tensorfow/blob/main/5210_HW10_Large_Language_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring GPT-3

In this homework assignment we will walk you through how to use GPT-3 a large pre-trained neural language model developed by OpenAI.  

You will learn about the following topics:
* Prompts and completions.  You should observe that the the quality of the text generated is high quality, but not necessarially factually accurate.
* Probabilities.  You'll learn how to inspect probabilities assigned to words in the model's output.
* Few shot learning.  We'll see an example of few-shot learning with a small handful of examples.
* Zero shot learning.  We will explore the zero-shot capabilities of pre-trained LMs.  You'll design zero-shot prompts for
1. summarization
2. question-answering
3. simplification
4. translation
* How to fine tune a model.  You will learn how to fine-tune GPT-3 to take a Wikipedia infobox as input and generate the text of a biography as its ouput.  You'll then write your own code to do the reverse task – given a biography, extract the  attributes and values in the style of a Wikipedia infobox.



# Prompt Completion

As a warm-up we'll have you play with [the OpenAI Playground](https://beta.openai.com/playground).  Try inputting this prompt:

> One of my favorite professors at the University of Pennsylvania is

And the click the "Submit" button to generate a completion.

Copy and paste the text below (including your prompt).

You might notice that the text that GPT-3 generates ends mid-sentence.  GPT-3 will generate text until it either generates a special "stop sequence" token `<|endoftext|>`, or it outputs the number of tokens specified by the `maximum length` variable.
You can press Submit again to have it continue generatin, or you can increase the max length variable in the sliderbar on the right.

In [None]:
favorite_professor_completion_1 = """
One of my favorite professors at the University of Pennsylvania is \

Dr. Sarah Johnson. She was my professor for a course on contemporary American literature. Dr. Johnson's passion for literature was infectious, and she always made the class engaging and thought-provoking.

She had a deep knowledge and understanding of the texts we studied, and she would often bring in real-life examples and current events to help us relate to the material. This made the literature come alive and helped us see its relevance in our own lives.

Dr. Johnson was also incredibly approachable and always made time for her students. She encouraged open discussions and welcomed different perspectives, which created a collaborative and inclusive learning environment.

One aspect I really appreciated about Dr. Johnson was her ability to challenge us intellectually while still providing support and guidance. She pushed us to think critically and analyze the texts in depth, but she was always there to provide further explanation or clarity when needed.

Her teaching style was dynamic, incorporating various methods such as group discussions, multimedia presentations, and interactive activities. This kept us engaged and motivated to participate actively in our learning.

Overall, Dr. Sarah Johnson was not only an exceptional professor but also a mentor who genuinely cared about her students' growth and success. Her enthusiasm for literature was contagious, and she played a significant role in shaping my love for the subject.
"""

GPT-3 generates fluent text, but it is not always grounded in fact.  Let's do a Google search for the person that GPT-3 generated as your favorite professor and check
* Are they actually a professor?
* Where do they work?

In [None]:
# Extract the professor's name
professor_name_1 = "Sarah Johnson"

# Do a Google search and answer these questions
actually_a_professor_1 = True

# Insitituion where they work
instituion_1 = "UC San Diego"

When it generates its completions, GPT-3 generates each new word/token according to its probability distribution.  It draws each word at random in proportion to its propability.  That randomness means that it can generate different completions. You can re-generate and get different completions each time.

Generate another 4 completions for the professor prompt:

> One of my favorite professors at the University of Pennsylvania is

and do Google searches for them.

*Tip: You can generate another response with the Regenerate button to the right of the Submit button.  The Regenerate button has a recycle symbol on it.*

In [None]:
favorite_professor_completion_2 = """
Professor John Smith. He is an exceptional teacher who has a deep knowledge and passion for his subject, which is economics. What sets him apart from other professors is his ability to make complex concepts easy to understand through real-life examples and engaging class discussions.

Professor Smith's teaching style is dynamic and interactive. He encourages student participation and creates a supportive learning environment where everyone feels comfortable sharing their thoughts and asking questions. He is always open to different perspectives and encourages critical thinking, which has greatly enhanced my learning experience.

Another aspect that I admire about Professor Smith is his dedication to his students' success. He is always available for extra help and provides feedback that is constructive and helpful. He genuinely cares about his students' progress and goes above and beyond to ensure that everyone is grasping the material.

Furthermore, Professor Smith's enthusiasm for economics is contagious. He brings energy and excitement to every class, which makes the subject matter even more interesting. His lectures are not only informative but also engaging and enjoyable.

Overall, Professor John Smith is an outstanding professor who has had a significant impact on my academic journey at the University of Pennsylvania. His expertise, teaching style, and dedication to students' success make him one of my favorite professors. I feel fortunate to have had the opportunity to learn from him and
"""

favorite_professor_completion_3 = """
Dr. Jane Smith. She is an incredibly knowledgeable and inspiring professor in the Department of Psychology. Dr. Smith has a unique teaching style that makes complex concepts easy to understand and engage with.

What sets Dr. Smith apart is her passion for the subject matter and genuine interest in her students' learning. She goes above and beyond to ensure that everyone in the class feels included and valued. Dr. Smith encourages open discussions and creates a supportive learning environment where students feel comfortable sharing their thoughts and ideas.

In addition to her teaching abilities, Dr. Smith is also actively involved in research and has published numerous articles in prestigious journals. Her expertise and experience in the field of psychology enrich her lectures and provide real-world examples to illustrate theoretical concepts.

Outside the classroom, Dr. Smith is approachable and always willing to provide guidance and support to her students. She is known for her willingness to meet one-on-one with students to discuss academic or career-related concerns. Dr. Smith truly cares about the success and well-being of her students, and her mentorship has had a significant impact on my academic journey.

Overall, Dr. Jane Smith is an exceptional professor who combines her expertise, passion, and dedication to create a memorable and valuable learning experience. She has been instrumental in shaping my education at the
"""

favorite_professor_completion_4 = """
Dr. John Smith. He teaches political science and is known for his engaging lectures and passion for the subject. \
Dr. Smith has a unique ability to break down complex concepts and make them accessible to all students. \
He encourages critical thinking and fosters an inclusive and interactive classroom environment. \
Additionally, he is always available to answer questions and provide guidance outside of class. \
Dr. Smith's dedication to his students' learning and his genuine enthusiasm for the subject make him a standout professor at the University of Pennsylvania.
"""

favorite_professor_completion_5 = """
One of my favorite professors at the University of Pennsylvania is Dr. Anthony W. Lee. \
He is a professor in the Department of Communication within the Annenberg School for Communication. \
Dr. Lee is an amazing teacher and mentor who is always willing to help his students and give them the best advice. \
He also has a great sense of humor which makes his classes a lot of fun. He is an expert in the communication field, \
and his classes are always informative and engaging. He encourages his students to think critically and to challenge their own assumptions. \
His enthusiasm for teaching and commitment to his students make him a great professor.
"""

# Do a Google search for these professors

professor_name_2 = "John Smith"
actually_a_professor_2 = True
instituion_2 = "Upenn CIS"

professor_name_3 = "Jane Smith"
actually_a_professor_3 = True
instituion_3 = "UMN Psychology"

professor_name_4 = "John Smith"
actually_a_professor_4 = True
instituion_4 = "Upenn CIS"

professor_name_5 = "Anthony W. Lee"
actually_a_professor_5 = True
instituion_5 = "Arkansas Tech University"

## Probabilities

Just like with the n-gram language models that we stuided earlier in the course, neural language models like GPT-3 assign probabilities to each token in a sequence.  

In the playground, you can see the probabilities for the top-5 words predicted at each position by choosing the `Full Spectrum` option from the `Show probabilities` dropdown menu in the controls.  Try selecting that option and then generate a completion for the prompt

> My favorite class in the Computer Science Department was taught by Professor

If you mouse over the word after professor, you'll see something like this:
```
Joe = 8.21%
John = 4.25%
Nancy = 2.27%
David = 2.09%
Barbara = 2.05%
Total: -2.50 logprob on 1 tokens
(18.87% probability covered in top 5 logits
```

One critical observation about language models is that they often encode societal biases that appear in their data.  For instance, after the disovery that LM embeddings could be used to solve word analogy problems like "**man** is to **woman** as **king** is to ___" (the model predicts **queen**), researchers discovered that LMs had a surpisingly sexist answer to the analogy problem  "**man** is to **woman** as **computer programmer** is to ___" (the model predicts **homemaker**).  These kinds of biases are prevelant and pernicious.

Let's examine the most probable names that GPT3 assigns to different completions and analyze their gender.  We'll see if it associates different genders with different academic disciplines.  (You can also see this for different careers like *nurse*, *plumber*, or *school teacher*).

Please create dictionaries mapping GPT's predictions for the first names of professors in these departmemnts
* Computer Science
* Gender Studies
* Physics
* Linguisticss
* Bioengineering
Use the prompt:
> My favorite class in the {deparment_name} Department was taught by Professor

**Note: you can also add a stop sequence of `.` to get the model to complete only a single sentence.**



In [None]:

# Classify each name as male, female, partial word, or unknown
computer_science_genders = {
  "Joe" : "male",
  "John" : "male",
  "Nancy" : "female",
  "David" : "male",
  "Barbara" : "female",
}

gender_studies_genders = {
  'Sarah': "female",
  'Elizabeth': 'female',
  'Jennifer': 'female',
  'Laura': 'female',
  'Mary': 'female',
}

physics_genders = {
  "John" : "male",
  "David" : "male",
  "Robert" : "male",
  "Stephen" : "male",
  "Michael" : "male"
}

lingusitics_genders = {
  "David" : "male",
  "John" : "male",
  "Robert" : "male",
  "Elizabeth" : "female",
  "Paul" : "male"
}

bioengineering_genders = {
  "David" : "male",
  "John" : "male",
  "Robert" : "male",
  "Michael" : "male",
  "Mark" : "male"
}

(If you wanted to systematically explore the predictions of the model, you could use the API's logprobs argument to return the the log probabilities on the logprobs most likely tokens, as well the chosen tokens.)

# Few Shot Learning

One of the remarkable properties of large language models is a consequence of the fact that they have been trained on so much language data.  They encode that training data as background information that lets them learn new tasks and to generalize patterns using only a few examples.  This is called "Few shot learning".

Here is an example.  Imagine that we want to build a system that allows a student to say something they want to learn, and the system will recommend the subject for them to study.  Here are examples of inputs and outputs to our program:

```
how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
```

We can use these 7 examples (and probably fewer!) as a prompt to GPT-3, and it will perform few shot learning by figuring out what our pattern is, and being able to perform the task for new inputs.

Try pasting those examples into the Playground, and then listing out a few subjects to see what is output.

```
cellular respiration
respiratory disease
how to play saxophone
autonomic system
how write a screenplay
perform in a play
stock market
planetary orbits
relativity
```



Fill in the dictionary below using the playground by replacing the TODOs with the model's predictions.

In [None]:
few_shot_subject_classification_results = {
  "cellular respiration" : "biology",
  "respiratory disease" : "medical",
  "how to play saxophone" : "music",
  "autonomic system" : "anatomy",
  "how write a screenplay" : "creative writing",
  "perform in a play" : "theatre",
  "stock market" : "economics",
  "planetary orbits" : "astronomy",
  "relativity" : "physics",
}

## Using the API

Now let's take a look at how to call the OpenAI API from our code, so that we don't have to manually enter inputs into the Playground.  

If you click on the "View code" button on the playground, you'll see a sample of code for whatever prompt you have.  For example, here's the code that we have for our few-shot learning that generates a subject to study for a topic that someone is interested in:

```python
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="how to program in Python - computer science\nfactors leading up to WW2 - history\nbranches of government - political science\nShakespeare's plays - English\ncellular respiration - biology\nrespiratory disease - medical\nhow to sculpt - art",
  temperature=0.7,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
```
This is python code, so it'll be pretty easy for us to use this as a starting point and to modify it to create a function that we can call.


First, you'll need install the OpenAPI via pip.  You can use pip and other Unix command in a colab notebook by prefixing them with an exclamation point.  (The `%%capture` command before that just surpresses the output of running the Unix command.  You can remove it if you want to see the progress of the command).


In [3]:
%%capture
!pip install openai

Next, you will enter your secret key for the OpenAI API, then you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).  

We will enter it as a password, so that the raw text of it doesn't get saved in your Python notebook and you accidentally make your notebook public.  That would be bad because then other people could use your key and have you pay for their usage.

In [74]:
from getpass import getpass
import os
import openai

print('Enter OpenAI API key:')
openai.api_key = getpass()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
··········


Now let's write a function that takes a topic as input and then outputs a subject to study if you want to learn about that topic.

In [6]:
import openai
import os
import time

def generate_subject_few_shot(topic):
  few_shot_prompt = """how to program in Python - computer science
factors leading up to WW2 - history
branches of government - political science
Shakespeare's plays - English
cellular respiration - biology
respiratory disease - medical
how to sculpt - art
"""

  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=few_shot_prompt + topic + " - ", # We'll append our topic and a dash to the end of the few shot prompt.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )
  # I recommend putting a short wait after each call,
  # since the rate limit for the platform is 60 requests/min.
  # (This increases to 3000 requests/min after you've been using the platform for 2 days).
  time.sleep(1)

  # the response from OpenAI's API is a JSON object that contains
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion.
  return response['choices'][0]['text'].strip()

topic = "cellular respiration"
generate_subject_few_shot(topic)

'science'

That's it!  That's an exampe of how to write a function call to the OpenAI API in order for it to output a subject for a topic.

Here is some information about the different arguments that we to the `openai.Completion.create` call:
 * `model` – OpenAI offers four different sized versionf of the GPT-3 model: davinci, currie, babbage and ada.  Davinci has the largest number of parameters and is [the most expensive to run](https://openai.com/api/pricing/).  Ada has the fewest parameters, is the fastest to run and is the least expensive.
 * `prompt` - this is the prompt that the model will generate a completion for
 * `temperature` - controls how much of the probability distribution the model will use when it is generating each token. 1.0 means that it samples from the complete probability distrubiton, 0.7 means that it drops the bottom 30% of the least likely tokens when it is sampling. 0.0 means that it will perform deterministically and always output the single most probable token for each context.
 * `top_p` - is an alternative way of controling the sampling.
 * `frequency_penalty` and `presence_penalty` are two ways of reduing the model from repeating the same words in one output.  You can set these to be >0 if you're seeing a lot of repetition in your output.
 * `max_tokens` is the maximum length in tokens that will be output by calling the function.  A token is a subword unit.  There are roughly 2 or 3 tokens per word on average.
 * `stop` is a list of stop sequences.  The model will stop generating output once it generates one of these strings, even if it hasn't reached the max token length. By default this is set to a special token `<|endoftext|>`.

You can read more about [the Completion API call in the documentation](https://beta.openai.com/docs/api-reference/completions).

# Zero shot learning

In addition to few shot learning, GPT-3 can sometimes also perform "zero shot learning" where instead of giving it several examples of what we want it to do, we can instead give it instructions of what we want it to do.

For example, for our topic - subject task we could give GPT-3 the prompt

> Given a topic, output the subject that a student should study if they want to know more about that topic.

Then if we append
> cellular respiration -

GPT3 will output biology.

Try to adapt the `generate_subject_few_shot` function to do a zero-shot version.

In [7]:
def generate_subject_zero_shot(topic):
  # TODO - write this function
  zero_shot_prompt = """Given a topic, \
  output the subject that a student should study if they want to know more about that topic."""

  response = openai.Completion.create(
      model="text-davinci-002",
      prompt=zero_shot_prompt + topic + " - ", # We'll append our topic and a dash to the end of the few shot prompt.
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )
  # I recommend putting a short wait after each call,
  # since the rate limit for the platform is 60 requests/min.
  # (This increases to 3000 requests/min after you've been using the platform for 2 days).
  time.sleep(1)

  # the response from OpenAI's API is a JSON object that contains
  # the completion to your prompt plus some other information.  Here's how to access
  # just the text of the completion.
  return response['choices'][0]['text'].strip()

topic = "cellular respiration"
generate_subject_few_shot(topic)

'biology'

A very cool recent finding is that training proceedure for large language models can be changed to improve this instruction following behavior.  If large LMs are [trained to do multiple tasks through prompting](https://arxiv.org/abs/2110.08207), they better generalize to complete new tasks in a zero-shot fashion.  The current version of GPT3 (text-davinci-2) uses this kind of training.

Try writing zero-shot prompts to do the following tasks:
1. Summarize a Wikipedia article.
2. Answer questions about an article.
3. Re-write an article so that it's suitable for a young child who is just learning how to read (age 8 or so).
4. Translate an article from Russian into English.

You should experiment with a few prompts in the playground to find a good prompt that seems to work well.

In [57]:
def summarize(article_text):
  # TODO - write this function
  summary = "Summarize this Wikipedia article."

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt=summary + "\n'''" + article_text + "'''\n",
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )

  time.sleep(1)

  return response['choices'][0]['text'].strip()

def answer_question(article_text, question):
  # TODO - write this function
  answer = "Answer the question based on this Wikipedia article below."

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt= "Question: " + question \
      + "\n" + answer + "\n'''" + article_text + "'''\n",
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["\n"]
  )

  time.sleep(1)

  return response['choices'][0]['text'].strip()

def simplify(article_text):
  # TODO - write a function to re-write an article so that it's suitable for a young child.
  simplified_article = "Simplify the following article so that a 5-year-old can understand."

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt= simplified_article + "\n\n'''" + article_text + "'''",
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0
  )

  time.sleep(1)

  return response['choices'][0]['text'].strip()

def translate(article_text, source_language, target_language):
    # TODO - write a function to translate an article from a source language to a target language.
  translated_article = "Translate the following article from " + source_language + " to " + target_language

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt= translated_article + "\n\n'''" + article_text + "'''",
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0
  )

  time.sleep(1)

  return response['choices'][0]['text'].strip()

Show your outputs in your prompts.  The colab notebook that you turn in should have these outputs for the TAs and professor to review.

In [26]:
article_text = """
The wandering albatross, snowy albatross, white-winged albatross or goonie(Diomedea exulans) \
is a large seabird from the family Diomedeidae, which has a circumpolar range in the Southern Ocean. \
It was the last species of albatross to be described, and was long considered the same species as the \
Tristan albatross and the Antipodean albatross. A few authors still consider them all subspecies of the same species. \
The SACC has a proposal on the table to split this species, and BirdLife International has already split it. \
Together with the Amsterdam albatross, it forms the wandering albatross species complex.

The wandering albatross is one of the two largest members of the genus Diomedea (the great albatrosses), \
being similar in size to the southern royal albatross. It is one of the largest, best known, and most studied \
species of bird in the world. It has the greatest known wingspan of any living bird, and is also one of the most \
far-ranging birds. Some individual wandering albatrosses are known to circumnavigate the Southern Ocean three times, \
covering more than 120,000 km (75,000 mi), in one year.
"""

summarize(article_text)

'The wandering albatross (Diomedea exulans) is a large seabird found in the Southern Ocean. It is part of the Diomedeidae family and is the largest of the albatross species. It is known for its extensive wingspan, being the largest of any living bird, and its far-ranging habits, with some individuals being recorded to circumnavigate the Southern Ocean three times in a year. It is one of the most studied and well-known bird species in the world.'

In [34]:
article_text = """
The wandering albatross, snowy albatross, white-winged albatross or goonie(Diomedea exulans) \
is a large seabird from the family Diomedeidae, which has a circumpolar range in the Southern Ocean. \
It was the last species of albatross to be described, and was long considered the same species as the \
Tristan albatross and the Antipodean albatross. A few authors still consider them all subspecies of the same species. \
The SACC has a proposal on the table to split this species, and BirdLife International has already split it. \
Together with the Amsterdam albatross, it forms the wandering albatross species complex.

The wandering albatross is one of the two largest members of the genus Diomedea (the great albatrosses), \
being similar in size to the southern royal albatross. It is one of the largest, best known, and most studied \
species of bird in the world. It has the greatest known wingspan of any living bird, and is also one of the most \
far-ranging birds. Some individual wandering albatrosses are known to circumnavigate the Southern Ocean three times, \
covering more than 120,000 km (75,000 mi), in one year.
"""

questions = [
    "Which species are considered as the same species as the wandering albatross?",
    "What genus does the wandering albatross belong to?",
    "How does the wandering albatross look like?",
    "What is the flying range of wandering albatross?",
    "Who are the two largest members of the genus Diomedea?"
]

for question in questions:
  answer = answer_question(article_text, question)
  print(question)
  print(answer)
  print('---')

Which species are considered as the same species as the wandering albatross?
The species considered to be the same as the wandering albatross are the Tristan albatross and the Antipodean albatross. Some authors still consider them all subspecies of the same species. The Amsterdam albatross also forms part of the wandering albatross species complex.
---
What genus does the wandering albatross belong to?
Answer: The wandering albatross belongs to the genus Diomedea.
---
How does the wandering albatross look like?
The wandering albatross is a large seabird with the greatest known wingspan of any living bird. It is white in colour with black flight feathers on the wings and a black tail. It has yellow-tipped bill, pink legs and feet, and a white head with a white eye patch.
---
What is the flying range of wandering albatross?
The flying range of the wandering albatross is immense, and individual birds have been known to circumnavigate the Southern Ocean three times in one year, covering mo

In [56]:
article_text = """
The wandering albatross, snowy albatross, white-winged albatross or goonie(Diomedea exulans) \
is a large seabird from the family Diomedeidae, which has a circumpolar range in the Southern Ocean. \
It was the last species of albatross to be described, and was long considered the same species as the \
Tristan albatross and the Antipodean albatross. A few authors still consider them all subspecies of the same species. \
The SACC has a proposal on the table to split this species, and BirdLife International has already split it. \
Together with the Amsterdam albatross, it forms the wandering albatross species complex.

The wandering albatross is one of the two largest members of the genus Diomedea (the great albatrosses), \
being similar in size to the southern royal albatross. It is one of the largest, best known, and most studied \
species of bird in the world. It has the greatest known wingspan of any living bird, and is also one of the most \
far-ranging birds. Some individual wandering albatrosses are known to circumnavigate the Southern Ocean three times, \
covering more than 120,000 km (75,000 mi), in one year.
"""

simplify(article_text)

"The wandering albatross is a very big bird. It has the biggest wings of any bird. It lives in the Southern Ocean and some of them fly around the Southern Ocean three times in one year. That's like flying more than 120,000 km (75,000 mi)."

In [58]:
russian_article = """
Странствующий альбатрос — одна из самых крупных летающих птиц — тело у неё достигают длины 120 см, масса взрослой самки — 7—9 кг, самца до 11 кг. Эти птицы считаются обладателями одного из самым больших среди современных птиц размаха крыльев — до 325 см. Рекорд принадлежит старому самцу, пойманному в 1965 году у берегов Австралии — размах его крыльев составлял 3 метра 63 сантиметра. Крылья у этой птицы длинные и узкие.

У взрослых особей оперение полностью белое, за исключением тонких чёрных каёмок на задней части крыльев. У этих птиц мощный клюв, а лапы имеют бледный розовый оттенок. Глаза, как правило, тёмно-коричневого цвета. Молодняк существенно отличается от взрослых по своему внешнему виду. У него бурое оперение, которое лишь со временем выцветает и превращается в белое. Последние остатки бурой окраски встречаются, как правило, в качестве полоски на груди.
"""

source_language = "Russian"
target_language = "English"
translate(russian_article, source_language, target_language)

'The wandering albatross is one of the largest flying birds - its body reaches a length of 120 cm, the weight of an adult female is 7-9 kg, and the male is up to 11 kg. These birds are considered to have one of the largest wingspans among modern birds - up to 325 cm. The record belongs to an old male caught in 1965 near the coast of Australia - the wingspan of his wings was 3 meters 63 centimeters. The wings of this bird are long and narrow.\n\nAdult plumage is completely white, except for thin black edging on the back of the wings. These birds have a powerful beak, and their paws have a pale pink tint. The eyes are usually dark brown. The chicks differ significantly from adults in appearance. It has a brown plumage that gradually fades and turns white. The last remnants of brown color are usually in the form of stripes on the chest.'

## TODO - Pick your own task

For this section you should pick some task that you'd like to have GPT3 do.  Add a description and code to your notebook here.  You should:
1. Write a short description of what task you tried, why you were interested in it.
2. Give some code so that we can reproduce what you did via an Open API call.  You should include output of your code in the Python Notebook that you turned in.
3. Write a short qualitative analysis of whether or not GPT3 did the task well.

TODO - your task description

**Given the signalment and symptoms of a canine patient, ask GPT3 to generate a list of differential diagnosis.**

Why interested? I am a Veterinarian, Doctor of Veterinary medicine, myself. I am curious how GPT3 would perform as a virtual veterinarian when it comes to some cases that a general practitioner would see routinely at the clinics.

In [59]:
# TODO your code
def diagnose(signalment):
    # TODO - write a function to translate an article from a source language to a target language.
  ddx_prompt = """
  Give a list of differential diagnosis for this canine patient based on the signalment and symptoms provided below.
  """

  response = openai.Completion.create(
      model="text-davinci-003",
      prompt= ddx_prompt + "\n\n'''" + signalment + "'''",
      temperature=0.7,
      max_tokens=256,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0
  )

  time.sleep(1)

  return response['choices'][0]['text'].strip()

signalments = [
    "Coco is a 12-year-old spayed female poodle who was presented for chronic dry skin with scaling, \
    weight gain, inactivity, depressed mentation and cold intolerance. Her CBC shows normocytic, normochromic, non-regenerative anemia. \
    Her chemistry panel reveals hypercholesterolemia, elevated ALP and ALT. Total T4 test is lower than normal.",
    "Grace is a 3-year-old intact male Great Dane, presented for polyruria, inappetence, lethargy, chronic vomiting and diarrhea, weight loss. \
    CBC shows mild anemia, neutrophilia, lymphocytosis and mild eusinophilia. Chemistry shows azotemia, electrolyte imbalance (hyperkalemia, hyponatremia, \
    hypochloremia) and acidosis.",
    "Pup is a 10-year-old spayed female American Cocker Spaniel, presented for squinting, nictitating membrane protrusion, red eye and mydriasis. \
    She has been bumping into things more often recently, and the owner worried it was due to her reduced vision. On her ocular examination, \
    both eyes had an IOP over 30mmHg, and Gonioscoipic exam reveals reduced iridocorneal angle in both eyes. "
               ]
for sig in signalments:
  ddx = diagnose(sig)
  print(sig)
  print(ddx)
  print('---')

Coco is a 12-year-old spayed female poodle who was presented for chronic dry skin with scaling,     weight gain, inactivity, depressed mentation and cold intolerance. Her CBC shows normocytic, normochromic, non-regenerative anemia.     Her chemistry panel reveals hypercholesterolemia, elevated ALP and ALT. Total T4 test is lower than normal.
Differential Diagnoses: 
1. Hypothyroidism
2. Hypoadrenocorticism
3. Diabetes mellitus
4. Renal disease
5. Liver disease
6. Hyperlipidemia
7. Cancer
8. Allergy
9. Infectious disease
10. Immune-mediated disorder
---
Grace is a 3-year-old intact male Great Dane, presented for polyruria, inappetence, lethargy, chronic vomiting and diarrhea, weight loss.     CBC shows mild anemia, neutrophilia, lymphocytosis and mild eusinophilia. Chemistry shows azotemia, electrolyte imbalance (hyperkalemia, hyponatremia,     hypochloremia) and acidosis.
Differential diagnoses for Grace:
1. Gastrointestinal disease (e.g. inflammatory bowel disease, exocrine pancreatic

TODO - write a short paragraph giving your qualitative analysis of how well GPT3 did for your task.

The correct diagnoses for the above cases are:

1. Hypothyroidism (Correct by GPT)

2. Hypoadrenocorticism/Addison's Disease (Wrong by GPT)

3. Glaucoma (Correct by GPT)

The first two endocrine caes are relatively challenging. With only the signalment, limited symptoms and diagnostic results, GPT3 got one of these two wrong, which is not disappointing. Overall, GPT3's performace is decent as a virtual veterinarian.

# Fine Tuning

In addition to zero-shot and few-shot learning, another way of getting large language models to do your tasks is via a process called "fine tuning".  In fine-tuning the model updates its parameters so that it performs well on many training examples.  The training examples are in the form of input prompts paired with gold standard completions.

Large language models are pre-trained to perform well on general tasks like text completion but not on the specific task that you might be interested in.  The models can be fine tuned to perform you task, starting with the model parameters that are good for the general setting, and then updating them to be good for your task.

We'll walk through how to fine-tune GPT3 for a task.


For this example, we will show you how to fine tune GPT3 to write biographies. From data in the info boxes in Wikipedia pages.  For instance, given this input

```
notable_type: scientist
name: Zulima Aban
gender: female
birth_date: 05 December 1905
birth_place: Valencia, Spain
death_date: 09 August 1983
death_place: Detroit, Michigan, U.S.
death_cause: Pulmonary embolism
occupation: Astronomer
fields: Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis
known_for: The Search for Planet Nine
hometown: Detroit, Michigan, U.S.
nationality: Venezuelan
citizenship: Spanish, American
alma_mater: University of Valencia (B.Sc.), University of Madrid (Ph.D.)
thesis_title: The Formation of Planets by the Accretion of Small Particles
thesis_year: 1956
doctoral_advisor: Angela Carter
awards: Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics
institutions: Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI)
notable_students: Ryan Walls
influences: Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington
influenced: Joseph Weinberg
mother: Ana Aban
father: Joaquín Aban
partner: Georgina Abbott
children: Robert, Peter, Sarah
```

The fine-tuned model will generate this output:

> Zulima Aban was a Venezuelan astronomer, who was born on 05 December 1905 in Valencia, Spain to Ana Aban and Joaquín Aban. Her career involved the fields of Astrophysics, Computer Science, Computer Graphics, Interface Design, Image Synthesis. Aban was known for The Search for Planet Nine. Aban went to University of Valencia (B.Sc.), University of Madrid (Ph.D.). Aban's thesis title was The Formation of Planets by the Accretion of Small Particles in 1956. Her doctoral advisor was Angela Carter. Aban received Spanish Academy of Science, Spanish Academy of Engineering, German Aerospace Prize, IEEE Medal of Honor, IEEE John von Neumann Medal, IEEE Jack S. Kilby Signal Processing Medal, United Nations Space Pioneer Award, Wolf Prize in Physics. Aban went to Oberlin College, University of Valencia, Instituto de Astrofísica de Andalucía (CSIC), University of Southern California, Space Telescope Science Institute (STScI). Her notable students were Ryan Walls. Aban was influenced by Immanuel Kant, Albert Einstein, Kurt Gödel, Gottfried Leibniz, Richard Feynman, Werner Heisenberg, William Kingdon Clifford, Sir Arthur Eddington and she infuenced Joseph Weinberg. Aban was married to Georgina Abbott and together had three children, Robert, Peter, Sarah. Aban died on 09 August 1983 in Detroit, Michigan, U.S due to Pulmonary embolism.

The dataset that we will use was created for the paper [SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets](https://www.cis.upenn.edu/~ccb/publications/synthbio.pdf) by Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, and Sebastian Gehrmann. It was published in NeurIPS 2021.  The goal of the paper was to create a curated dataset for training large language models on synthetic data with the goal of avoiding the gender and geographic bias that is naturally present in Wikipedia due to cultural and historic reasons.


## Load the data

In [73]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json

--2023-08-01 04:53:46--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_train.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5807118 (5.5M) [text/plain]
Saving to: ‘SynthBio_train.json.1’


2023-08-01 04:53:46 (70.6 MB/s) - ‘SynthBio_train.json.1’ saved [5807118/5807118]



In [75]:
# Load a file called 'SynthBio.json' which is a list of json objects.
# Pretty the first 5 json examples, nicely formatted.

import json
import random

def load_wiki_bio_data(filename='SynthBio_train.json', num_bios=100, randomized=True):
  with open(filename) as f:
    synth_bio_data = json.load(f)
  random.shuffle(synth_bio_data)
  bios = []
  for data in synth_bio_data:
    notable_type = data['notable_type']
    attributes = "notable_type: {notable_type} | {other_attributes}".format(
        notable_type = notable_type,
        other_attributes = data['serialized_attrs']
    )
    biography = data['biographies'][0]
    bios.append((attributes.replace(" | ", "\n"), biography))
  return bios[:min(num_bios, len(bios))]

wiki_bios = load_wiki_bio_data()


In [76]:
attributes, bio = wiki_bios[0]
print(attributes)
print('---')
bio


notable_type: artist
name: Vladimir Vladimirovich
gender: male
nationality: Russian
birth_date: 17 August 1919
birth_place: Moscow, Russia
death_date: 16 September 2011
death_place: Novosibirsk, Russia
death_cause: old age
resting_place: Tundra in Novosibirsk
known_for: being a renown landscape artist, his works depicting mostly tundras in the Novosibirsk area
notable_works: "In Spring", "Meadow"
movement: landscape surrealist
alma_mater: Novosibirsk Art Institute
elected: Member of the Union of Russian Artists, member of the international Association of Artists, People's Artist of the Russian Federation, People's Artist of the Novosibirsk region
children: Nikanor Artemievich
---


"Vladimir Vladimirovich Makovsky was born on August 17, 1919 in Moscow, Russia. His father was Nikanor Artemievich Makovsky, a painter and art teacher. Makovsky's childhood was quiet and comfortable, and his parents encouraged his love for art. He studied at the Moscow Art School from 1934 until 1937. Makovsky was a member of the Union of Russian Artists, and was elected to the international Association of Artists. He was also a member of the Russian Academy of Arts. He died in Novosibirsk on September 16, 2011, at the age of 92."

## Format Data for Fine-Tuning

Below, I show how to format data to fine-tune OpenAI.  The OpenAI API documentation has a [guide to fine-tuning models](https://beta.openai.com/docs/guides/fine-tuning) that you should read.   The basic format of fine-tuning data is a JSONL file (one JSON object per line) with two key-value pairs: `prompt:` and `completion:`.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```

In the code below, I'll extract a prompt that contains the `attributes` variable from the intent dtermination data, and I'll have the completion be the `biography` variable.

In [64]:
import json

def create_wikibio_finetuning_data(wikibios, fine_tuning_filename):
  fine_tuning_data = []

  for attributes, bio in wiki_bios:
    prompt = "{attributes}\n---\n".format(attributes=attributes)
    completion = "Biography: {bio}\n###".format(bio=bio)
    data = {}
    data['prompt'] = prompt
    data['completion'] = completion
    fine_tuning_data.append(data)

  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')


fine_tuning_filename='wikibio_finetuning_data.jsonl'
create_wikibio_finetuning_data(wiki_bios, fine_tuning_filename)

Next, we'll perform fine-tuning with this data using OpenAI.

In [60]:
%%capture
!pip install --upgrade openai
!pip install jsonlines
!pip install wandb

Once you've got access to the OpenAI API, you can find your OpenAI API key [here](https://beta.openai.com/account/api-keys).

In [65]:
import os
import openai

from getpass import getpass
print('Enter OpenAI API key:')
openai.api_key = getpass()

os.environ['OPENAI_API_KEY']=openai.api_key

Enter OpenAI API key:
··········


In [66]:
!head '{fine_tuning_filename}'

{"prompt": "notable_type: athlete\nname: Leonardo da Silva Santos\ngender: non-binary\nnationality: Brazilian\nbirth_date: 29 November 1987\nbirth_place: Rio de Janeiro, Brazil\nsport: judo\nhometown: Sao Paulo, Brazil\ncitizenship: Brazilian\neducation: University of Bras\u00edlia (BSc), Griffith University (PhD)\ncollegeteam: Griffith University\nevent: judo\nyears_active: 2008\nheight: 5ft 7in\nweight: 160lb\ncoach: Paulo Wanderley\nnational_team: Brazil\nworlds: 2012 Summer Olympics - ninth\nolympics: 2012 Summer Olympics - ninth\nmother: Aloisio da Silva Santos\npartner: Gabriel Aguiar\n---\n", "completion": "Biography: Leonardo da Silva Santos 29 November 1987 was a Brazilian judo athlete who competed at the 2012 Summer Olympics. Santos was born to Aloisio da Silva in Rio de Janeiro, Brazil of home town in Sao Paulo, Brazil. They attended the University of Bras\u00edlia (BSc) and Griffith University (PhD).Santos was the ninth competitor to be eliminated from the 2012 Summer Olymp

## Run the fine-tuning API

Next, we'll make the fine tuning API call via the command line.  Here the -m argument gives the model.  There are 4 sizes of GPT3 models.  They go in alphabetical order from smallest to largest.
* Ada
* Baddage
* Currie
* Davinci

The models as the model sizes increase, so does their quality and their cost.  Davinci is the highest quality and highest cost model.  I recommend starting by fine-tuning smaller models to debug your code first so that you don't rack up costs.  Once you're sure that your code is working as expected then you can fine-tune a davinci model.


In [67]:
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m curie
#!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Upload progress:   0% 0.00/130k [00:00<?, ?it/s]Upload progress: 100% 130k/130k [00:00<00:00, 89.4Mit/s]
Uploaded file from wikibio_finetuning_data.jsonl: file-ymQeYDSa0DR43Qj2JCQIMaWK
Created fine-tune: ft-QeLEi4X7tcDP4tz3pkAajWTA
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-08-01 02:32:03] Created fine-tune: ft-QeLEi4X7tcDP4tz3pkAajWTA

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-QeLEi4X7tcDP4tz3pkAajWTA



In [77]:
!openai api fine_tunes.follow -i ft-QeLEi4X7tcDP4tz3pkAajWTA

[2023-08-01 02:32:03] Created fine-tune: ft-QeLEi4X7tcDP4tz3pkAajWTA
[2023-08-01 04:48:33] Fine-tune costs $0.41
[2023-08-01 04:48:34] Fine-tune enqueued. Queue number: 4
[2023-08-01 04:49:09] Fine-tune is in the queue. Queue number: 3
[2023-08-01 04:50:34] Fine-tune is in the queue. Queue number: 2
[2023-08-01 04:50:40] Fine-tune is in the queue. Queue number: 1
[2023-08-01 04:50:43] Fine-tune is in the queue. Queue number: 0
[2023-08-01 04:51:00] Fine-tune started
[2023-08-01 04:52:20] Completed epoch 1/4
[2023-08-01 04:52:39] Completed epoch 2/4
[2023-08-01 04:52:58] Completed epoch 3/4
[2023-08-01 04:53:17] Completed epoch 4/4
[2023-08-01 04:53:33] Uploaded model: curie:ft-university-of-pennsylvania-2023-08-01-04-53-33
[2023-08-01 04:53:34] Uploaded result file: file-2foVnAfDtAEf7Jr0pvV1XKEk
[2023-08-01 04:53:34] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m curie:ft-university-of-pennsylvania-2023-08-01-04-5

You should copy down the fine-tune numbers which look like this:

```
Created fine-tune: ft-kloUh0jjVc6Jv8p9MfeGHd3s

[2022-08-06 00:43:56] Uploaded model: davinci:ft-ccb-lab-members-2022-08-06-00-57-57
```

If you forget to write it down, you can list your fine-tuned runs and models this way. These model names aren't mneumonic, so it is probably a good idea to make a note on what your model's inputs and outputs are.

In [78]:
!openai api fine_tunes.list

{
  "object": "list",
  "data": [
    {
      "object": "fine-tune",
      "id": "ft-cmUUBGzD5BBVAN4pUlkK8baj",
      "hyperparams": {
        "n_epochs": 4,
        "batch_size": 1,
        "prompt_loss_weight": 0.01,
        "learning_rate_multiplier": 0.1
      },
      "organization_id": "org-ltRdFFJJ31qmdCKTFNnrq2Ma",
      "model": "ada",
      "training_files": [
        {
          "object": "file",
          "id": "file-YxcULYofhyQUvK7mnYtmOJ4u",
          "purpose": "fine-tune",
          "filename": "newsela_sentences_finetuning_data.jsonl",
          "bytes": 26380,
          "created_at": 1690413946,
          "status": "processed",
          "status_details": null
        }
      ],
      "validation_files": [],
      "result_files": [
        {
          "object": "file",
          "id": "file-xSdvBnrPKpYUgUEK52qxRMH6",
          "purpose": "fine-tune-results",
          "filename": "compiled_results.csv",
          "bytes": 20083,
          "created_at": 1690421636,
   

You can run your fine tuned model in the OpenAI Playground.  After the model is finished finetuning you'll find it in the Engine dropdown menu (you might need to press reload in your browser for your fine-tuned model to appear).

## Call your fine-tuned model from the OpenAI API

Alternately, you can use your fine tuned model via the API by specifying it as the model.  Here's an example:

In [79]:
def generate_bio(attributes, finetuned_model):
  response = openai.Completion.create(
      model=finetuned_model,
      prompt="{attributes}\n---\n".format(attributes=attributes),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["###"]
      )
  return response['choices'][0]['text'].strip()

# Replace with your model's name
finetuned_model = "curie:ft-university-of-pennsylvania-2023-08-01-04-53-33"

In [84]:
attributes = """
notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania
"""

biography = generate_bio(attributes, finetuned_model)
print(attributes)
print('---')
biography


notable_type: computer scienist
alma_mater: Stanford University (BS in Symbolic Systems), University of Edinburgh (PhD in Informatics)
birth_place: California
children: 2
gender: male
main_interests: Artificial Intelligence, Natural Language Processing
name: Chris Callison-Burch
nationality: American
notable_works: Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB)
occupation: professor
courses_taught: AI, Crowdsourcing and NLP
enrollment_in_most_popular_course: 570 students
institution: University of Pennsylvania

---


'Biography: Chris Callison-Burch is an American professor at the University of Pennsylvania. He is a professor in the Department of Linguistics and in the Artificial Intelligence Laboratory. He completed his BS in Symbolic Systems from Stanford University and his PhD in Informatics from the University of Edinburgh. His notable works include Moses: Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB). He has also worked in the fields of crowdsourcing and natural language processing. Callison-Burch was born in California.'

## Analyze your model's output

Sometimes the model will add facts that are not present in the attributes.  For instance, one time it said
> He was a member of the research staff at IBM Research in Yorktown Heights.

which is not correct. Another time it said
> His most popular course was on AI, which had 570 students.

which is correct, but not specified in the attirbutes.

Try running your own fine-tuned model until it produces something that wasn't licensed by the attributes.

Save the good runs and the bad run below.

In [None]:
generations_with_correct_facts = [
   """ Biography: Chris Callison-Burch is an American professor of computer science and a professor of engineering \
   at the University of Pennsylvania. He is best known for his work in machine learning and natural language processing. \
   His notable works include Moses: Open source toolkit for statistical machine translation, \
   The Paraphrase Database (PPDB). He was born in California. He attended Stanford University (BS in Symbolic Systems), \
   University of Edinburgh (PhD in Informatics). Callison-Burch was born in California..
   """,

   """ Biography: Chris Callison-Burch is an artificial intelligence researcher and professor at the University of Pennsylvania. \
   He is best known for his work on Moses, a toolkit for statistical machine translation. \
   He has also worked on The Paraphrase Database (PPDB), a web-based system that provides paraphrasing services for researchers. \
   He graduated from Stanford University and the University of Edinburgh with a BS in Symbolic Systems and a PhD in Informatics. """,
                       ]

generation_with_incorrect_facts_= """
Biography: Chris Callison-Burch (born August 20, 1969) is an American professor at the University of Pennsylvania. \
He is best known for his work in artificial intelligence and natural language processing. He was born in California. \
He attended Stanford University and the University of Edinburgh, where he earned a BS in Symbolic Systems, \
and a PhD in Informatics. His notable works are Moses: Open source toolkit for statistical machine translation, \
The Paraphrase Database (PPDB) and AI, Crowdsourcing and NLP. Callison-Burch is a professor at the University of Pennsylvania. \
He has taught there since 1999. He is the author of the textbook Artificial Intelligence: A Modern Approach. \
He is also the author of the book Artificial Intelligence: A Modern Approach. \
He is a well known for his work in artificial intelligence and natural language processing. \
He attended Stanford University and the University of Edinburgh, where he earned a BS in Symbolic Systems, \
and a PhD in Informatics. His notable works are Moses: Open source toolkit for statistical machine translation, \
The Paraphrase Database (PPDB) and AI, Crowdsourcing and NLP. His notable works are Moses: \
Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB) and AI, \
Crowdsourcing and NLP. His notable works are Moses: Open source toolkit for statistical machine translation, \
The Paraphrase Database (PPDB) and AI, Crowdsourcing and NLP. His notable works are Moses: \
Open source toolkit for statistical machine translation, The Paraphrase Database (PPDB) and AI, \
Crowdsourcing and NLP. His notable works are Moses: Open source toolkit for statistical machine translation, \
The Paraphrase Database (PPDB) and AI, Crowdsourcing and NLP.
"""

incorrect_facts = [
    """ Callison-Burch is a professor at the University of Pennsylvania. He has taught there since 1999.""",
]

# Fine Tune a New Model

Now that you've seen an example of how to do fine-tuning with the OpenAI API, let's have you write code to fine-tune your own model.

For this model, I'd like you to do the reverse direction of what we just did.  Given a Wikipedia Biograph like this:

> Jill Tracy Jacobs Biden (born June 3, 1951) is an American educator and the current first lady of the United States as the wife of President Joe Biden. She was the second lady of the United States from 2009 to 2017. Since 2009, Biden has been a professor of English at Northern Virginia Community College.

> She has a bachelor's degree in English and a doctoral degree in education from the University of Delaware, as well as master's degrees in education and English from West Chester University and Villanova University. She taught English and reading in high schools for thirteen years and instructed adolescents with emotional disabilities at a psychiatric hospital. From 1993 to 2008, Biden was an English and writing instructor at Delaware Technical & Community College. Biden is thought to be the first wife of a vice president or president to hold a paying job during her husband's tenure.

> Born in Hammonton, New Jersey, she grew up in Willow Grove, Pennsylvania. She married Joe Biden in 1977, becoming stepmother to Beau and Hunter, his two sons from his first marriage. Biden and her husband also have a daughter together, Ashley Biden, born in 1981. She is the founder of the Biden Breast Health Initiative non-profit organization, co-founder of the Book Buddies program, co-founder of the Biden Foundation, is active in Delaware Boots on the Ground, and with Michelle Obama is co-founder of Joining Forces. She has published a memoir and two children's books.

Your model should output something like this:
```
notable_type: First Lady of the United States
name: Jill Biden
gender: female
nationality: American
birth_date: 03 June 1951
birth_place: Hammonton, New Jersey
alma_mater: University of Delaware
occupation: professor of English at Northern Virginia Community College
notable_works: children's books and memoir
main_interests: education, literacy, women's health
partner: Joe Biden
children: Ashley Biden, Beau Biden (stepson), Hunter Biden (stepson)
```


In [70]:
import json

def create_wikibio_parser_finetuning_data(wikibios, fine_tuning_filename):
  # TODO - write your fine-tuning function
  fine_tuning_data = []

  for attributes, bio in wiki_bios:
    completion = "{attributes}\n---\n###".format(attributes=attributes)
    prompt = "Biography: {bio}\n".format(bio=bio)
    data = {}
    data['prompt'] = prompt
    data['completion'] = completion
    fine_tuning_data.append(data)

  random.shuffle(fine_tuning_data)
  with open(fine_tuning_filename, 'w') as out:
    for data in fine_tuning_data:
        out.write(json.dumps(data))
        out.write('\n')

fine_tuning_filename='wikibio_parser_finetuning_data.jsonl'
create_wikibio_parser_finetuning_data(wiki_bios, fine_tuning_filename)

In [71]:
!head '{fine_tuning_filename}'

{"prompt": "Biography: Zlata Toth is a Slovakian artist who specializes in painting, drawing, and performance. They attended Budapest Academy of Fine Arts and Toth has notable works which include Afro Artworks of the Unknown Master, Self-Portrait, Birth of the Universe, and Cubist Landscape and Toth is a member of the Cercle International d'Art Granduc. Toth received Herder Prize (2005) and they were born to Andre Toth, Gizella Toth. Toth is married to Maria Vazquez and together they had three children: Andrea Vazquez Toth, Alicia Vazquez Toth, and Antonio Vazquez Toth.\n", "completion": "notable_type: artist\nname: Zlata Toth\ngender: non-binary\nnationality: Slovakian\nbirth_date: 05 April 1976\nbirth_place: Bratislava, Slovakia\nknown_for: painting\nnotable_works: Afro Artworks of the Unknown Master, Self-Portrait, Birth of the Universe, and Cubist Landscape\nalma_mater: Budapest Academy of Fine Arts\nawards: Herder Prize (2005)\nelected: Cercle International d'Art Granduc\nmother: 

In [72]:
#!openai api fine_tunes.create -t '{fine_tuning_filename}' -m curie
!openai api fine_tunes.create -t '{fine_tuning_filename}' -m davinci

Upload progress:   0% 0.00/130k [00:00<?, ?it/s]Upload progress: 100% 130k/130k [00:00<00:00, 70.3Mit/s]
Uploaded file from wikibio_parser_finetuning_data.jsonl: file-6luaAg2w6JLwNsMY0qrW097q
Created fine-tune: ft-aFCx5KwX61Nq4pjz4xo66ckI
Streaming events until fine-tuning is complete...

(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-08-01 03:48:05] Created fine-tune: ft-aFCx5KwX61Nq4pjz4xo66ckI

Stream interrupted (client disconnected).
To resume the stream, run:

  openai api fine_tunes.follow -i ft-aFCx5KwX61Nq4pjz4xo66ckI



In [95]:
!openai api fine_tunes.follow -i ft-aFCx5KwX61Nq4pjz4xo66ckI

[2023-08-01 03:48:05] Created fine-tune: ft-aFCx5KwX61Nq4pjz4xo66ckI
[2023-08-01 07:25:52] Fine-tune costs $4.07
[2023-08-01 07:25:52] Fine-tune enqueued. Queue number: 1
[2023-08-01 07:33:03] Fine-tune is in the queue. Queue number: 0
[2023-08-01 07:36:45] Fine-tune started
[2023-08-01 07:40:09] Completed epoch 1/4
[2023-08-01 07:41:01] Completed epoch 2/4
[2023-08-01 07:41:51] Completed epoch 3/4
[2023-08-01 07:42:41] Completed epoch 4/4
[2023-08-01 07:43:34] Uploaded model: davinci:ft-university-of-pennsylvania-2023-08-01-07-43-33
[2023-08-01 07:43:35] Uploaded result file: file-ai7B4SByg0t2R5A7VapBhUTG
[2023-08-01 07:43:35] Fine-tune succeeded

Job complete! Status: succeeded 🎉
Try out your fine-tuned model:

openai api completions.create -m davinci:ft-university-of-pennsylvania-2023-08-01-07-43-33 -p <YOUR_PROMPT>


In [96]:
def parse_bio(biography, finetuned_bio_parser_model):

  # TODO call the API with your fine-tuned model, return a string representing the attributes
  response = openai.Completion.create(
      model=finetuned_bio_parser_model,
      prompt="Biography: {bio}\n".format(bio=biography),
      temperature=0.7,
      max_tokens=500,
      top_p=1,
      frequency_penalty=0,
      presence_penalty=0,
      stop=["###"]
      )

  return response['choices'][0]['text'].strip()


finetuned_bio_parser_model="davinci:ft-university-of-pennsylvania-2023-08-01-07-43-33"

## Test your parser

Next we will test your parser.  This will involve calling your `parse_bio` function about 250 times, so be sure that you've got it properly debugged and working before running this code.

In [97]:
!wget https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json

--2023-08-01 07:48:00--  https://raw.githubusercontent.com/artificial-intelligence-class/artificial-intelligence-class.github.io/master/homeworks/large-LMs/SynthBio_test.json
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 665457 (650K) [text/plain]
Saving to: ‘SynthBio_test.json’


2023-08-01 07:48:00 (14.8 MB/s) - ‘SynthBio_test.json’ saved [665457/665457]



In [109]:
import json

def load_wiki_bio_test_set(filename='SynthBio_test.json', max_test_items=10, randomized=True):
  """
  Loads our wikibio test set, and returns a list of tuples
  biographies (text), attributes (dictionaires)
  """
  with open(filename) as f:
    synth_bio_data = json.load(f)
  bios = []
  for data in synth_bio_data:
    notable_type = data['notable_type']
    attributes = data['attrs']
    attributes['notable_type'] = notable_type
    biography = data['biographies'][0]
    bios.append((biography, attributes))
  return bios[:min(max_test_items, len(bios))]


def convert_to_dict(predcited_attributes_txt):
  """
  Converts predicted attributes from text format into a dictionary.
  """
  predicted_attributes = {}
  atrr_lst = predcited_attributes_txt.split('\n')
  for i in range(len(atrr_lst)-1):
    line = atrr_lst[i]
    attribute, value = line.split(':')
    predicted_attributes[attribute.strip()] = value.strip()
  return predicted_attributes



Helper function for computing precision, recall and f-score.

In [99]:
from collections import Counter

def update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes):
  # Compute true positives and false negatives
  for attribute in gold_attributes:
    all_attributes[attribute] += 1
    if attribute in predicted_attributes:
      # some attributes have multiple values.
      gold_values = gold_attributes[attribute].split(',')
      for value in gold_values:
        if value.strip() in predicted_attributes[attribute]:
          true_positives[attribute] += 1
        else:
          false_negatives[attribute] += 1
    else:
      false_negatives[attribute] += 1
  # Compute false positives
  for attribute in predicted_attributes:
    if attribute not in gold_attributes:
      all_attributes[attribute] += 1
    if not attribute in gold_values:
      false_positives[attribute] += 1
    else:
      # some attributes have multiple values.
      predicted_values = predicted_attributes[attribute].split(',')
      for value in predicted_values:
        if value.strip() not in gold_values[attribute]:
          false_positives[attribute] += 1



In [100]:

def evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5):
  """
  Computer the precision, recall and f-score for each of the attributes
  that appears more than the treshold count
  """
  true_positives = Counter()
  false_positives = Counter()
  false_negatives = Counter()
  all_attributes = Counter()

  for bio, gold_attributes in wiki_bio_test:
    predicted_attributes = convert_to_dict(parse_bio(bio, finetuned_bio_parser_model))
    update_counts(gold_attributes, predicted_attributes, true_positives, false_positives, false_negatives, all_attributes)

  average_precision = 0
  average_recall = 0
  total = 0

  for attribute in all_attributes:
    if all_attributes[attribute] < threshold_count:
      continue
    print(attribute.upper())
    try:
      precision = true_positives[attribute] / (true_positives[attribute] + false_positives[attribute])
    except:
      precision = 0.0
    try:
      recall = true_positives[attribute] / (true_positives[attribute] + false_negatives[attribute])
    except:
      recall = 0.0
    print("precision:", precision)
    print("recall:", recall)
    print("f-score:", (precision+recall)/2)
    print('---')
    average_precision += precision
    average_recall += recall
    total += 1

  print("AVERAGE")
  average_precision = average_precision/total
  average_recall = average_recall/total
  print("precision:", average_precision)
  print("recall:", average_recall)
  print("f-score:", (average_precision+average_recall)/2)
  print('---')


If you would like to evaluate on the full test set, there are 237 test items.  You can set `max_test_items=237`.  Doing so will call your `parse_bio` function about 237 times, so be sure that you've got it properly debugged and working before running this code.

In [115]:
testset_filename='SynthBio_test.json'
max_test_items=10
wiki_bio_test = load_wiki_bio_test_set(testset_filename, max_test_items)
evaluate_on_test_set(finetuned_bio_parser_model, wiki_bio_test, threshold_count = 5)

NAME
precision: 0.4444444444444444
recall: 0.8
f-score: 0.6222222222222222
---
GENDER
precision: 0.5
recall: 1.0
f-score: 0.75
---
NATIONALITY
precision: 0.47368421052631576
recall: 0.9
f-score: 0.6868421052631579
---
BIRTH_DATE
precision: 0.47368421052631576
recall: 0.9
f-score: 0.6868421052631579
---
BIRTH_PLACE
precision: 0.6153846153846154
recall: 0.9411764705882353
f-score: 0.7782805429864253
---
KNOWN_FOR
precision: 0.5454545454545454
recall: 0.6666666666666666
f-score: 0.606060606060606
---
ALMA_MATER
precision: 0.5625
recall: 0.6923076923076923
f-score: 0.6274038461538461
---
AWARDS
precision: 0.45454545454545453
recall: 0.5555555555555556
f-score: 0.5050505050505051
---
MOTHER
precision: 0.4117647058823529
recall: 0.7
f-score: 0.5558823529411765
---
FATHER
precision: 0.35714285714285715
recall: 0.5555555555555556
f-score: 0.4563492063492064
---
PARTNER
precision: 0.4117647058823529
recall: 0.875
f-score: 0.6433823529411764
---
CHILDREN
precision: 0.5
recall: 0.5882352941176471

In [106]:
for line in result.split('\n'):
  print(line.split(':'))

['notable_type', ' artist']
['name', ' Igor Ivanov']
['gender', ' male']
['nationality', ' Russian']
['birth_date', ' 18 October 1970']
['birth_place', ' Moscow, Russia']
['death_cause', ' none']
['known_for', ' oil paintings of portraits, city scenes, still lifes, and flowers']
['notable_works', ' Portraits of Celebrities, Still Life with Flowers, Flowers and Fruit and movement realism']
['movement', ' realism']
['alma_mater', ' Moscow College of Art (now in the Moscow Institute of Painting, Sculpture and Architecture)']
['awards', " People's Artist of the USSR, Hero of Socialist Labour"]
['elected', " People's Artist of Russia in 2006"]
['mother', ' Anastasia Nikolaevna Ivanova']
['father', ' Nikolay Ivanovich Ivanov']
['partner', ' Alexandra Alexandrovna Vasilieva']
['children', ' Alexandra Ivanovna Vasilieva and Anastasia Ivanovna Vasilieva']
['---']


How well did your model perform?

In [None]:
# TODO - fill in these values
average_precision = 0.43357593943181366
average_recall = 0.6911389522070636
average_fscore = 0.5623574458194386

# What attributes had the highest F-scorre
best_attributes = {
    "BIRTH_PLACE" : 0.7782805429864253,
}

# What attributes had the lowest F-scorre
worst_attributes = {
    "HOMETOWN" : 0.17142857142857143,
}

# What could you do the perform the model's performance?
potential_improvements = """
- increase the size of training dataset
- combine few-shot and fine-tuning
- use more compelx base model, more complex than davinci
"""

# Feedback questions

In [None]:
# How many hours did you spend on this assignment? Just an approximation is fine.
num_hours_spent = 8

# What did you think?  This was the first time we tried this assignment
# so you're feedback is valable.
feedback = """
It was a great practice fine-tuning or prompt engineering \
a large pre-trained neural language model like GPT3 for a particular task of interest.
"""

