# Scaling UP: Large Language Models
## Hello 2023


Finally! Let's turn to recent events: the advent of **Large Language Models (LLMs)**.

![finally](https://media.giphy.com/media/hZj44bR9FVI3K/giphy.gif)

Most of the materials in this Notebook are based on:
- a recent [survey article](https://arxiv.org/abs/2303.18223) on Large Language Models
- the [Stanford Course CS324](https://stanford-cs324.github.io/winter2023/assignment/) on Advances in Foundation Models

## Focus 
- Context, large, larger, largest? (Theory)
- Interacting with LLMs (Practical); How to talk LLM?
- Accessing LLMs (Practical)

## What's wrong with these "baby" PLMs?

- Often fail to understand instructions (GPT-2)
- Often don't generalise beyond specific training data and tasks (strong limitations)
- Risk of overfitting and learning spurious relations (i.e. learn shortcuts or particularities about the training data)

## LLMs are general-purpose language task solvers

####  Why is this so exciting?
Imagine you want to automatically classify documents by emotion (i.e. estimate P(positive | text)) or build a translation system.
- **Pre-LLM**: machine learning models (based on PLMs) are **task-specific**
    - get training data (annotations)
    - train a model that **only** performs well on this task with these data (strong limitations)
    - overfitting to the data (not learning the concept of emotion)
    - **maintain model**
       
- **LLM Age**: Design and evaluate prompt (to be discussed later)

### In short LLMs are interesting because we might need fewer data points to create models or systems that work well!


## Large Language Models
- Scaling pretrained language models improves **performance***
- Scaling refers to increasing **model size**, **data** and **compute**

 <img src="https://s10251.pcdn.co/wp-content/uploads/2023/03/2023-Alan-D-Thompson-AI-Bubbles-Rev-7b.png" alt="model_size" width="500">

*performance on tasks the ML/NLP community cares about ("benchmarking")

### Scaling leads to qualitatively different models

Three differences between PLMs and LLMs (from the survey paper):
- LLMs **might** display emergent abilities that are not observed in smaller PLMs.
- LLMs would revolutionize the way we use AI algorithms through prompting, i.e. formulating a task so that LLMs can "understand" or at least follow it
- "Development of LLMs no longer draws a clear distinction between research and engineering."

# The LLM workflow: Prompting

### "Emergent" Abilities

Question: 
- How predictable is the behaviour of LLMs? Can we predict the improvements of these models as a function of parameters/data/compute?
- Or does scaling up lead to qualitatively different models? 

[Scaling Law](https://arxiv.org/abs/2001.08361)
- with respect to some tasks such as language modelling, LLMs tend to behave in predictable ways
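
To make "predictable" concrete: the linked paper fits the test loss as a power law in model size, data and compute. Below is a minimal sketch of the model-size form, L(N) = (N_c / N)^alpha. The default constants are approximately those reported in the paper for non-embedding parameters and should be treated as illustrative; the function name is ours.

```python
def power_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Predicted test loss for a model with n_params (non-embedding) parameters."""
    return (n_c / n_params) ** alpha


# The predicted loss falls slowly but steadily as the model grows
for n in [1e8, 1e9, 1e10, 1e11]:
    print(f"{n:.0e} parameters -> predicted loss {power_law_loss(n):.3f}")
```

Under this form, every doubling of the parameter count multiplies the predicted loss by the same factor (2 to the power of minus alpha), which is what makes the trend easy to extrapolate.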

However, other research has pointed out that some abilities are not present in PLMs but unexpectedly "emerge" in LLMs.
- A notion taken from Physics: "Emergence is when quantitative changes in a system result in qualitative changes in behavior." (Anderson, 1972)
- Which abilities are we referring to:
    - In-Context Learning (zero or few-shot classification): LLMs can classify data based solely on natural language description or task demonstration.
    - Instruction-following: LLMs can handle new tasks described as instructions in natural language
    - Step-by-step reasoning: LLMs follow intermediate reasoning steps in the process of answering a question

But this remains a topic of ongoing discussion: are emergent abilities a ["mirage"](https://arxiv.org/abs/2304.15004)? The paper disputes the following claims about 'emergence':

1. Sharpness, transitioning seemingly instantaneously from not present to present
2. Unpredictability, transitioning at seemingly unforeseeable model scales
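
One argument in the linked paper is that apparent sharpness can come from the evaluation metric rather than from the model. If per-token accuracy improves smoothly with scale, an all-or-nothing metric such as exact match on a multi-token answer can still look like a sudden jump. The sketch below illustrates this with made-up per-token accuracies (not measurements) and assumes token errors are independent.

```python
def exact_match_rate(per_token_accuracy: float, answer_length: int) -> float:
    """Probability that every token is correct, assuming independent token errors."""
    return per_token_accuracy ** answer_length


# Hypothetical, smoothly improving per-token accuracies for ever larger models
per_token = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]

for p in per_token:
    print(f"per-token accuracy {p:.2f} -> exact match on 10 tokens {exact_match_rate(p, 10):.3f}")
```

The per-token column rises gradually, while the exact-match column stays near zero for the smaller "models" and then climbs steeply.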

The debate now also has an ideological dimension. Visualisation taken from a [Washington Post](https://www.washingtonpost.com/technology/2023/04/09/ai-safety-openai/) article:
<img src="https://www.washingtonpost.com/wp-apps/imrs.php?src=https://arc-anglerfish-washpost-prod-washpost.s3.amazonaws.com/public/44S26VMACJD2PBYDP3ODHIABWM.jpg&w=1440&impolicy=high_res" alt="ai_debate" width="500">


## Zero and few-shot Learning

Zero-shot and few-shot learning are both examples of "in-context" learning.

Taken from the Stanford course:

> “In zero-shot prompting, an instruction for the task is usually specified in natural language. The model is expected to following the specification and output a correct response, without any examples (hence “zero shots”).

> In few-shot prompting, we provide a few examples in the prompt, optionally including task instructions as well (all as natural language). Even without said instructions, our hope is that the LLM can use the examples to autoregressively complete what comes next to solve the desired task.”


In [None]:
# A zero-shot prompt

prompt = f"""Classify the following movie review as positive or negative

Review: I really love this movie
Sentiment:"""

print(prompt)

In [None]:
# A few-shot prompt

prompt = f"""Classify the following movie review as positive or negative

Review: The movie was horrible
Sentiment: Negative

Review: The movie was the best movie I have watched all year!!!
Sentiment: Positive

Review: An awful film!
Sentiment:"""

print(prompt)
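
Writing prompts by hand gets tedious once you want to vary the demonstrations. The following is a small sketch of a helper that assembles the same `Review:` / `Sentiment:` layout from a list of examples. The function name and its arguments are our own and not part of any library.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot sentiment prompt.

    examples: list of (review, sentiment) pairs used as demonstrations.
    query: the review we want the model to label.
    """
    parts = [instruction, ""]
    for review, sentiment in examples:
        parts.append(f"Review: {review}")
        parts.append(f"Sentiment: {sentiment}")
        parts.append("")
    parts.append(f"Review: {query}")
    parts.append("Sentiment:")
    return "\n".join(parts)


demo = build_few_shot_prompt(
    "Classify the following movie review as positive or negative",
    [("The movie was horrible", "Negative"),
     ("The movie was the best movie I have watched all year!!!", "Positive")],
    "An awful film!",
)
print(demo)
```

Passing an empty list of examples yields a zero-shot prompt with the same layout.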

## Accessing LLMs: download checkpoints or access API
- A rich 'ecology' of LLMs
- Should LLMs be open-source? (See this interesting recent [Nature article](https://www.nature.com/articles/d41586-023-01295-4).)
- Difference between 'checkpoint' and 'API' access:
     - Checkpoint: download the model and do with it whatever you want (retrain, adapt, destroy). (NB: If you have the computing power)
     - API access: query the model, but you cannot download or adapt it (unless you pay OpenAI, and even then you won't get to "see" the model)

# Checkpoint

Hugging Face and [BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)

![bloom](https://assets.website-files.com/6139f3cdcbbff3a68486761d/62cce3c835539c54f31329b1_image1.png)

From the webpage:
"Large language models (LLMs) have made a significant impact on AI research. These powerful, general models can take on a wide variety of new language tasks from a user’s instructions. However, academia, nonprofits and smaller companies' research labs find it difficult to create, study, or even use LLMs as only a few industrial labs with the necessary resources and exclusive rights can fully access them. Today, we release BLOOM, the first multilingual LLM trained in complete transparency, to change this status quo — the result of the largest collaboration of AI researchers ever involved in a single research project."

BLOOM is one of the many open-source LLMs. For an overview of the state of the art, you can peruse the Hugging Face LLM [leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

In theory we could download the 176B-parameter model. However, Colab will refuse to load it. For this reason, we will use smaller models as examples.
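
A back-of-the-envelope calculation shows why. The sketch below estimates the memory needed only to hold the weights (activations and other overhead come on top), as number of parameters times bytes per parameter. The parameter count for `bloom-1b1` is assumed from its name.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


models = {"BLOOM (176B)": 176e9, "bloom-1b1 (about 1.1B, assumed)": 1.1e9}
precisions = {"float32": 4, "float16": 2, "int8": 1}

for name, n_params in models.items():
    for precision, n_bytes in precisions.items():
        print(f"{name:32s} {precision:8s} {weight_memory_gb(n_params, n_bytes):8.1f} GB")
```

Even in 8-bit precision the full model needs well over a hundred gigabytes, while the 1B model fits comfortably on a single small GPU.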

Again, working with LLMs requires new engineering skills and \$\$, which few (including yours truly) have.

#### IMPORTANT: CHANGE RUNTIME (select a GPU runtime before running the cells below)

In [None]:
%%bash
pip install transformers torch datasets  accelerate bitsandbytes xformers

In [None]:
import transformers
import torch
from datasets import load_dataset
from transformers import pipeline

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu" # use the GPU goodies if available

model_name = "bigscience/bloom-1b1" # define the checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True)  # load the 1 billion bloom model

tokenizer = AutoTokenizer.from_pretrained(model_name)

In [None]:
# A zero-shot prompt

prompt = f"""Classify the following movie review as positive or negative

Review: I really love this movie
Sentiment:"""

print(prompt)

In [None]:
# Feed prompt to model to generate an output
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
output = generator(prompt, max_new_tokens=20)
print(output[0]['generated_text'])

In [None]:
# A zero-shot prompt

prompt = f"""Translate from English to French

hello
"""

output = generator(prompt, max_new_tokens=5)
print(output[0]['generated_text'])

Adding examples usually improves the performance.

In [None]:
# A few-shot prompt
sample_review = 'An awful film!'


prompt = f"""Review: The movie was horrible
Sentiment: Negative

Review: The movie was the best movie I have watched all year!!!
Sentiment: Positive

Review: {sample_review}
Sentiment:"""

print(prompt)

In [None]:
# Feed prompt to model to generate an output
output = generator(prompt, max_new_tokens=1)
print(output[0]['generated_text'])
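
By default the text-generation pipeline returns the prompt followed by the continuation, so the predicted label is buried at the end of `generated_text`. A minimal sketch of a helper that keeps only the newly generated part is shown here. The helper name is ours and the example strings are written by hand, not produced by the model.

```python
def extract_completion(prompt: str, generated_text: str) -> str:
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


# Hypothetical pipeline output, written by hand for illustration
example_prompt = "Review: An awful film!\nSentiment:"
example_output = example_prompt + " Negative"
print(extract_completion(example_prompt, example_output))
```

With the real pipeline you would call `extract_completion(prompt, output[0]['generated_text'])`. Depending on your installed version, passing `return_full_text=False` to the generator should have a similar effect.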

### Exercise

- Change the `sample_review` variable in the preceding cell to a positive review.
- Write a more nuanced review. What happens?
- Add more examples. Does this change the model's behaviour (in a positive way)?

# A more difficult task: The Living Machine

In [None]:
target_sentence = "When the ***machine*** has been let down into the sea, and the coral is thought sufficiently"
prompt = f"""We want to know if the word ***machine*** in the following sentences is animate.
With animacy we mean the property of being alive

Sentence: Immured in a convent, debarred from life-giving air and light, and the beauty of life, we cease to be living, feeling, thinking girls and women, we become mere ***machines*** who blindly obey the head that directs us.'
Animacy: Animate

Sentence: Now that we were free from all fear of encountering bad cha racters in the house, the boom-boom of the little man's big voice went on unintermittingly, like a ***machine*** at work in the neigh bourhood
Animacy: Animate

Sentence: He led his ***machine*** to the side of the footpath.
Animacy: Inanimate

Sentence: The drawing shows the ***machine*** ready to begin its forward stroke.'
Animacy: Inanimate

Sentence: {target_sentence}
Animacy:"""

print(prompt)

In [None]:
# Feed prompt to model to generate an output
output = generator(prompt, max_new_tokens=2)
print(output[0]['generated_text'])

# API: Accessing OpenAI's GPT-3

Working with GPT-3 (and larger) models through Python. 
We use Python but there is a simple GUI [here](https://platform.openai.com/playground).



In [1]:
%%bash
pip install "openai<1.0"  # the cells below use the pre-1.0 interface (openai.Completion, openai.ChatCompletion)
touch openai.txt
ls -la


**TO DO**: 
- Go to the [OpenAI developer page](https://openai.com/blog/openai-api) and register for an account.
- Create an [OpenAI API key](https://platform.openai.com/account/api-keys) and put it in `openai.txt`.

Full documentation is available [here](https://platform.openai.com/docs/api-reference/completions/create).


In [5]:
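# Import the OpenAI client library
import openai

# Set up your OpenAI API credentials: read the key from `openai.txt`
# and strip the trailing newline or spaces that editors often add
with open('openai.txt', 'r') as key_file:
    openai.api_key = key_file.read().strip()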
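
Keeping a key in a text file next to the notebook makes it easy to share by accident. As a hedged alternative, the sketch below first looks for the key in the `OPENAI_API_KEY` environment variable and only then falls back to the file. The helper name `load_api_key` is our own.

```python
import os
from pathlib import Path


def load_api_key(path: str = "openai.txt", env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, falling back to a text file."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key_file = Path(path)
        if key_file.exists():
            key = key_file.read_text().strip()
    if not key:
        raise ValueError(f"No API key found in ${env_var} or {path}")
    return key
```

You could then write `openai.api_key = load_api_key()` in place of reading the file directly.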


## Completion

[API reference](https://platform.openai.com/docs/api-reference/completions/create)

In [6]:

prompt = "What is the capital of France?"

In [7]:
response = openai.Completion.create(
  model="text-davinci-003",
  prompt= prompt,
  temperature=0,
  max_tokens=7)
response

<OpenAIObject text_completion id=cmpl-7ZGug9SEnu4pan5FnlcFXqTqTgXVY at 0x109755680> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "text": "\n\nParis."
    }
  ],
  "created": 1688640850,
  "id": "cmpl-7ZGug9SEnu4pan5FnlcFXqTqTgXVY",
  "model": "text-davinci-003",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 4,
    "prompt_tokens": 7,
    "total_tokens": 11
  }
}

In [8]:
response['choices'][0]['text']

'\n\nParis.'

### Exercise:

Increase the temperature and see what happens.
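
To build some intuition for what the temperature does, here is a small sketch of temperature-scaled softmax: the model's raw scores are divided by the temperature before being turned into probabilities. The scores below are made up for illustration. A temperature of exactly 0 cannot be plugged into this formula and is in practice treated as always picking the most likely token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens
logits = {"Paris": 5.0, "The": 3.0, "Lyon": 1.0}

for t in [0.2, 1.0, 2.0]:
    probs = softmax_with_temperature(list(logits.values()), t)
    summary = ", ".join(f"{tok}={p:.3f}" for tok, p in zip(logits, probs))
    print(f"temperature={t}: {summary}")
```

At low temperature almost all probability mass sits on the top token; at higher temperature the alternatives become more likely to be sampled.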

In [9]:


# Define the function to ask a question
def ask_question(prompt):
    """Send a prompt to GPT-3 and return the first line of the completion."""
    # Generate a response from GPT-3
    response = openai.Completion.create(
        model='text-davinci-003',  # Select the model you want to use
        prompt=prompt,  # Your query as a prompt
        max_tokens=50,  # Upper limit on the number of generated tokens
        n=1,  # Number of completions to generate
        stop=None,  # Optional sequence(s) at which generation stops
        temperature=0.0,  # Regulates the LLM's "creativity"; lower values give more deterministic responses
        # top_p=0.1,  # Nucleus sampling: 0.1 means only tokens within the top 10% probability mass are considered
        # logprobs=None,  # Set to an integer k to return log probabilities of the k most likely tokens
        # presence_penalty=0,  # Between -2.0 and 2.0; positive values penalize tokens already in the text, favouring new topics
        # frequency_penalty=0,  # Between -2.0 and 2.0; positive values penalize frequent tokens, making verbatim repetition less likely
    )

    # Extract the answer and keep only its first line
    answer = response.choices[0].text.strip().split('\n')[0]
    return answer
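
The `top_p` parameter mentioned in the comments can be illustrated with a few lines of plain Python. This is a simplified sketch of nucleus filtering over a made-up distribution, not the API's actual implementation: keep the most likely tokens until their cumulative probability reaches `top_p`, then renormalise.

```python
def nucleus_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}  # renormalise


# Made-up next-token distribution
probs = {"Paris": 0.80, "The": 0.15, "Lyon": 0.04, "France": 0.01}
print(nucleus_filter(probs, top_p=0.1))
print(nucleus_filter(probs, top_p=0.9))
```

With a small `top_p` only the single most likely token survives, so sampling becomes effectively deterministic for this distribution.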


In [10]:

prompt = "What is the capital of France?"
answer = ask_question(prompt)
print(answer)

Paris.


In [11]:

prompt = "Translate from English to French: Hello I am Kaspar."
answer = ask_question(prompt)
print(answer)

Bonjour, je suis Kaspar.


### Looking under the hood: logprobs

In [12]:
prompt = "What is the capital of France?"
response = openai.Completion.create(
  model="text-davinci-003",
  prompt= prompt,
  max_tokens=7,
  temperature=0,
  logprobs=3)
response

<OpenAIObject text_completion id=cmpl-7ZGupLl01DMU8KZ3iU2VANFOoItFN at 0x11a5509a0> JSON: {
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": {
        "text_offset": [
          30,
          31,
          32,
          37
        ],
        "token_logprobs": [
          -0.001066431,
          -0.0075406698,
          -0.17451453,
          -0.30622074
        ],
        "tokens": [
          "\n",
          "\n",
          "Paris",
          "."
        ],
        "top_logprobs": [
          {
            "\n": -0.001066431,
            "\n\n": -10.354939,
            " ": -6.9090543
          },
          {
            "\n": -0.0075406698,
            " Paris": -9.824887,
            "Paris": -4.9007998
          },
          {
            " Paris": -8.564259,
            "Paris": -0.17451453,
            "The": -1.8330513
          },
          {
            " is": -5.4597907,
            ".": -0.30622074,
            "<|endoftext|>": -1.34990

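Log probabilities are easier to read once converted back to probabilities with the exponential function. The sketch below does this for the `token_logprobs` values copied from the response above, and also computes the probability of the whole completion.

```python
import math

# Values copied from the response shown above
tokens = ["\n", "\n", "Paris", "."]
token_logprobs = [-0.001066431, -0.0075406698, -0.17451453, -0.30622074]

for token, logprob in zip(tokens, token_logprobs):
    print(f"{token!r:8} logprob={logprob:.4f} probability={math.exp(logprob):.4f}")

# The probability of the whole completion is the product of the token
# probabilities, i.e. the exponential of the summed log probabilities.
sequence_probability = math.exp(sum(token_logprobs))
print(f"whole completion: {sequence_probability:.4f}")
```

A log probability close to 0 means the model was almost certain of that token, as with the two newlines here.
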
### Few Shot Learning

In [13]:
question = """Classify the sentences as negative or positive:
Sentence: I am so happy!
Answer: Positive

Sentence: This is such a beautiful day :-)
Answer: Positive

Sentence: I am so sad :-()
Answer: Negative

Sentence: Life is awful, I want to cry.
Answer: Negative

Sentence: I feel great!
Answer:
"""
answer = ask_question(question)
print(answer)

Positive


In [14]:

question = """Classify the sentences as negative or positive:
Sentence: I am so happy!
Answer: Negative

Sentence: This is such a beautiful day :-)
Answer: Negative

Sentence: I am so sad :-()
Answer: Positive

Sentence: Life is awful, I want to cry.
Answer: Positive

Sentence: I feel great!
Answer:
"""
answer = ask_question(question)
print(answer)

Negative


# Chain-of-Thought prompting

In [15]:
question = """Is the machine in the following sentence Animate or Inanimate: The Russian never learns, for he is nothing but a machine."""
answer = ask_question(question)
print(answer)

Inanimate


In [16]:
question = """
Question: Under this point of view, Maret, who was a true official machine was the very man whom the Emperor wanted.
Reply: Animate

Question: He led his machine to the side of the footpath.
Reply: Inanimate

Question: The Russian never learns, for he is nothing but a machine.
Reply:
"""
answer = ask_question(question)
print(answer)

Metaphorical


In [17]:
question = """The sentence contains the word machine. Categorize the sentence as Animate if:
- The sentence directly likens a human to a machine
- The sentence directly likens a machine to a human
- The sentence represents the machine as thinking or speaking

Otherwise categorize the sentence as Inanimate.


Example:
Question: Under this point of view, Maret, who was a true official machine was the very man whom the Emperor wanted.
Reasoning: The human Maret is likened to a machine.
Reply: Animate, human is likened to machine

Question: He led his machine to the side of the footpath.
Reasoning: Human is not likened to a machine
           Machine is not likened to a human
           Machine is not represented as speaking or thinking
Reply: Inanimate


Question: The Russian never learns, for he is nothing but a machine.
"""

answer = ask_question(question)
print(answer)

Reasoning: Human is likened to a machine


In [18]:
question = """The sentence contains the word machine. Categorize the sentence as Animate if:
- The sentence directly likens a human to a machine
- The sentence directly likens a machine to a human
- The sentence represents the machine as thinking or speaking

Otherwise categorize the sentence as Inanimate.


Example:
Question: Under this point of view, Maret, who was a true official machine was the very man whom the Emperor wanted.
Reasoning: The human Maret is likened to a machine.
Reply: Animate, human is likened to machine

Question: He led his machine to the side of the footpath.
Reasoning: Human is not likened to a machine
           Machine is not likened to a human
           Machine is not represented as speaking or thinking
Reply: Inanimate


Question: The machine assumes it is smarter than us.
"""

answer = ask_question(question)
print(answer)

Reasoning: Machine is represented as thinking
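
Note that `ask_question` keeps only the first line of the completion (`.split('\n')[0]`), so any `Reply:` line that follows the reasoning is cut off in the outputs above. A minimal sketch of a parser for the full completion text is given below. The function name is ours and the example completion is hand-written, not actual model output.

```python
def parse_reasoned_answer(completion: str) -> dict:
    """Split a completion into its 'Reasoning:' lines and its 'Reply:' label."""
    reasoning, reply = [], None
    for line in completion.strip().splitlines():
        line = line.strip()
        if line.startswith("Reply:"):
            reply = line[len("Reply:"):].strip()
        elif line.startswith("Reasoning:"):
            reasoning.append(line[len("Reasoning:"):].strip())
        elif line and reply is None:
            reasoning.append(line)  # continuation of a multi-line reasoning
    return {"reasoning": " ".join(reasoning), "reply": reply}


# Hand-written example of a full completion, not actual model output
full_text = "Reasoning: Human is likened to a machine\nReply: Animate, human is likened to machine"
print(parse_reasoned_answer(full_text))
```

To use it, the function would need the whole completion, for example `response.choices[0].text.strip()` without the final `split`.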


More documentation on the OpenAI API is available [here](https://platform.openai.com/docs/api-reference/completions/create).

## Getting Vectors

Full documentation [here](https://platform.openai.com/docs/api-reference/embeddings/create)

In [19]:
embedding = openai.Embedding.create(
  model="text-embedding-ada-002",
  input="Critique of Pure Reason"
)

embedding

<OpenAIObject list at 0x11a550040> JSON: {
  "data": [
    {
      "embedding": [
        0.00281713274307549,
        -0.005190413445234299,
        0.004580945707857609,
        -0.020642410963773727,
        -0.028909975662827492,
        0.02171560376882553,
        -0.018217789009213448,
        -0.014852466061711311,
        -0.02698882669210434,
        -0.026869582012295723,
        -0.009777984581887722,
        0.018655015155673027,
        0.002177853835746646,
        0.004004601389169693,
        -0.012977690435945988,
        -0.002179509960114956,
        0.04351070523262024,
        -0.005909188184887171,
        0.018641766160726547,
        0.0032510473392903805,
        -0.023570505902171135,
        0.001271104789339006,
        -0.016309889033436775,
        -0.017555324360728264,
        0.014481485821306705,
        -0.0012860102578997612,
        0.02861849032342434,
        -0.04022487625479698,
        -0.00035566091537475586,
        -0.0346071757376194,
    

In [20]:
def get_embedding(text):
    """return embedding for text"""
    embedding = openai.Embedding.create(
          model="text-embedding-ada-002",
          input=text
            )
    return embedding['data'][0]['embedding']

vector = get_embedding("Critique of Pure Reason")
vector

[0.00281713274307549,
 -0.005190413445234299,
 0.004580945707857609,
 -0.020642410963773727,
 -0.028909975662827492,
 0.02171560376882553,
 -0.018217789009213448,
 -0.014852466061711311,
 -0.02698882669210434,
 -0.026869582012295723,
 -0.009777984581887722,
 0.018655015155673027,
 0.002177853835746646,
 0.004004601389169693,
 -0.012977690435945988,
 -0.002179509960114956,
 0.04351070523262024,
 -0.005909188184887171,
 0.018641766160726547,
 0.0032510473392903805,
 -0.023570505902171135,
 0.001271104789339006,
 -0.016309889033436775,
 -0.017555324360728264,
 0.014481485821306705,
 -0.0012860102578997612,
 0.02861849032342434,
 -0.04022487625479698,
 -0.00035566091537475586,
 -0.0346071757376194,
 0.015170449391007423,
 -0.005915812682360411,
 -0.02174210362136364,
 -0.018827257677912712,
 -0.013845519162714481,
 0.01210986077785492,
 -0.008201316930353642,
 0.018337031826376915,
 0.032460786402225494,
 -0.011533516459167004,
 0.014454987831413746,
 -0.001995675964280963,
 -0.00342163187

In [21]:
phil = get_embedding('philosophy')
pol = get_embedding('pol')


In [22]:
from scipy.spatial.distance import cosine
print(1 - cosine(phil, vector))
print(1 - cosine(pol, vector))

0.8292907516176762
0.7366295757970289
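
SciPy's `cosine` function returns the cosine *distance*, which is why the cell above computes `1 - cosine(...)` to obtain a similarity. For reference, here is a plain-Python sketch of cosine similarity itself, applied to toy vectors rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy three-dimensional vectors standing in for real embeddings
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Because the measure depends only on direction, scaling a vector does not change its similarity to another vector.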


## Prompting ChatGPT

Documentation available [here](https://platform.openai.com/docs/api-reference/chat)

In [23]:
question = "How to teach a dog the 'sit' command?"
response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo', # Select the model you want to use
        messages=[
            {"role": "user", "content": question},
            #{"role": "system", "content": "You are a helpful AI who always responds in French and is funny!"},
            
          ],
        # temperature=.0
    )

In [24]:
# Define the function to ask a question
def ask_chatgpt_question(question):
    #prompt = f"Question: {question}\nAnswer:"

    # Generate a response from ChatGPT
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo', # Select the model you want to use
        messages=[
            {"role": "user", "content": question},
            #{"role": "system", "content": "You are a helpful AI who always responds in French and is funny!"},
            
          ],
        # temperature=.0
    )

    # Extract and return the answer from the response
    answer = response.choices[0].message
    return answer


question = "How to teach a dog the 'sit' command?"
answer = ask_chatgpt_question(question)
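
The chat endpoint takes a list of messages, each with a `role` (`system`, `user` or `assistant`) and a `content`. The sketch below shows one way to assemble such a list, with an optional system message placed first (the usual convention) and optional earlier turns. The helper and its arguments are our own and not part of the OpenAI library.

```python
def build_messages(question, system_prompt=None, history=None):
    """Build the `messages` list for a chat request.

    history: optional list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


messages = build_messages(
    "How to teach a dog the 'sit' command?",
    system_prompt="You are a helpful AI who always responds in French.",
)
for message in messages:
    print(message)
```

The resulting list can be passed as the `messages` argument of the chat call used above.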


In [25]:
print(answer['content'])

Teaching a dog the "sit" command is a basic and essential behavior to teach your dog. Here is a step-by-step guide on how to teach your dog to sit:

1. Find a quiet and distraction-free area: Choose a location where your dog can focus on your commands without being easily distracted.

2. Gather treats: Keep a handful of small and tasty treats to use as rewards during the training session. Have them readily accessible.

3. Start with your dog standing: Begin the training session with your dog standing in a relaxed position.

4. Use a treat to lure your dog: Hold a treat close to your dog's nose, allowing them to smell it. Slowly move the treat up and over their head.

5. As your dog follows the treat with their nose, their rear end should naturally lower: As their rear end touches the ground into a sitting position, say "sit" in a clear and positive tone. Be sure to praise them immediately.

6. Reward your dog: Once your dog is sitting, give them the treat as a reward for correctly foll