In [8]:
import random
import re
import time
from typing import Dict, List, Tuple

import lingua
import pandas as pd
import torch
import torch.nn as nn
from metrics import ag_news_metrics, map_ag_news_int_labels
from transformers import AutoTokenizer

# Getting Started

There is a bit of documentation on how to interact with the large models [here](https://lingua-sdk.readthedocs.io/en/latest/getting_started.html). The relevant github links to the SDK are [here](https://github.com/VectorInstitute/lingua-sdk) and underlying code [here](https://github.com/VectorInstitute/lingua).

First we connect to the service through which, we'll interact with the LLMs and see which models are avaiable to us

In [9]:
# Establish a client connection to the Lingua service
client = lingua.Client(gateway_host="llm.cluster.local", gateway_port=3001)

Show all supported models

In [10]:
client.models

['OPT-175B', 'OPT-6.7B']

Show all model instances that are currently active

In [11]:
client.model_instances

[{'id': 'c402a90b-5867-476b-950d-9921585335ec',
  'name': 'OPT-6.7B',
  'state': 'ACTIVE'},
 {'id': 'af334811-a4fc-483d-91be-a65a3a98d34e',
  'name': 'OPT-175B',
  'state': 'ACTIVE'}]

Let's start by querying the OPT-175B model. We'll try other models below. Get a handle to a model. In this example, let's use the OPT-175B model.

In [12]:
model = client.load_model("OPT-175B")
# If this model is not actively running, it will get launched in the background.
# In this case, wait until it moves into an "ACTIVE" state before proceeding.
while model.state != "ACTIVE":
    time.sleep(1)

We need to configure the model to generate in the way we want it to. We set important parameters.

*`max_tokens` sets the number the model generates before haulting generation.
*`top_k`: Range: 0-Vocab size. At each generation step this is the number of tokens to select from with relative probabilities associated with their likliehoods. Setting this to 1 is "Greedy decoding." If top_k is set to zero them we exclusively use nucleus sample (i.e. top_p below).
*`top_p`: Range: 0.0-1.0, nucleus sampling. At each generation step, the tokens the largest probabilities, adding up to `top_p` are sampled from relative to their likliehoods.
*`rep_penalty`: Range >= 1.0. This attempts to decrease the likelihood of tokens in a generation process if they have been generated before. A value of 1.0 means no penalty and larger values increasingly penalize repeated values. 1.2 has been reported as a good default value.
*`temperature`: Range >=0.0. This value "sharpens" or flattens the softmax calculation done to produce probabilties over the vocab. As temperature goes to zero: only the largest probabilities will remain non-zero (approaches greedy decoding). As it approaches infinity, the distribution spreads out evenly over the vocabulary.

In [13]:
short_generation_config = {"max_tokens": 2, "top_k": 4, "top_p": 3, "rep_penalty": 1.0, "temperature": 1.0}

Let's try a basic prompt for factual information.

__Note__ that if you run the cell multiple times, you'll get different responses due to sampling.

In [14]:
generation = model.generate("What is the capital of Canada?", short_generation_config)
# Extract the text from the returned generation
generation.generation["text"]

[' Actually,']

We're going to have our model attempt to classify some news articles from the AG News Dataset. Articles have a single label 1-4

1. World
2. Sports
3. Business
4. Sci/Tech

This is a constrained label space. We'll use the words World, Sports, Business, and Science as our targets for each of the labels.

In [15]:
def remove_markup(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    text = re.sub(r"<.*?>+", "", text)
    return text


def ag_news_processor(path: str) -> Tuple[List[str], List[str], List[str]]:
    ag_news_data = pd.read_csv(path)
    labels = ag_news_data["Class Index"].tolist()
    titles = ag_news_data["Title"].apply(lambda x: remove_markup(x)).tolist()
    descriptions = ag_news_data["Description"].apply(lambda x: remove_markup(x)).tolist()
    return labels, titles, descriptions


int_to_label_map = {1: "world", 2: "sports", 3: "business", 4: "science"}
ag_news_labels, ag_news_titles, ag_news_descriptions = ag_news_processor("resources/ag_news_sample.csv")

In [16]:
ag_news_labels = map_ag_news_int_labels(ag_news_labels, int_to_label_map)
ag_news_descriptions = [description.replace("\\", " ").strip() for description in ag_news_descriptions]
ag_news_titles = [title.strip() for title in ag_news_titles]
label_words = ["World", "Sports", "Business", "Science"]

In [17]:
model_input_texts = [
    f"Title: {ag_news_title} Description: {ag_news_description}"
    for ag_news_title, ag_news_description in zip(ag_news_titles, ag_news_descriptions)
]

Let's start by trying out a basic instruction prompt to see what the model does.

In [18]:
prompt_template = "To which category does this news article belong?"
sample_texts = [f"{model_input_text} {prompt_template}" for model_input_text in model_input_texts[0:3]]
generation = model.generate(sample_texts, short_generation_config)
for text in generation.generation["text"]:
    print(text)
    print("==================================")

 Company News



 <a


Not well...Now let's try to constrain the model a bit by including the desired labels in the instruction.

In [28]:
prompt_template = "From World, Sports, Business, Science, the category is "
sample_texts = [f"{model_input_text} {prompt_template}" for model_input_text in model_input_texts[0:3]]
generation = model.generate(sample_texts, short_generation_config)
for text in generation.generation["text"]:
    print(text)
    print("==================================")

 Khmers
 Business &
____.


The model doesn't really answer in the space that we want it to. Let's try with some few-shot examples to see if that helps

In [20]:
prompt_template_prefix = """Title: Lane drives in winning run in ninth Description: Jason Lane took an unusual post-game batting practice with hitting coach Gary Gaetti after a disappointing performance Friday night. Cateogry (World, Sports, Business, Science): Sports
Title: Arson attack on Jewish centre in Paris (AFP) Description: AFP - A Jewish social centre in central Paris was destroyed by fire overnight in an anti-Semitic arson attack, city authorities said. Cateogry (World, Sports, Business, Science): World
Title: Oil prices look set to dominate Description: The price of oil looks set to grab headlines as analysts forecast that its record-breaking run may well continue. Cateogry (World, Sports, Business, Science): Business
Title: More Evidence for Past Water on Mars Description: Summary - (Aug 22, 2004) NASA #39;s Spirit rover has dug up plenty of evidence on slopes of  quot;Columbia Hills quot; that water once covered the area. Cateogry (World, Sports, Business, Science): World
Title: Indexes in Japan fall short of hype Description: Japanese stocks have failed to measure up to an assessment made in April by Merrill Lynch #39;s chief global strategist, David Bowers, who said Japan was  quot;very much everyone #39;s favorite equity market. Cateogry (World, Sports, Business, Science): Business
"""  # noqa
prompt_template_postfix = "Cateogry (World, Sports, Business, Science):"
sample_texts = [
    f"{prompt_template_prefix}{model_input_text} {prompt_template_postfix}"
    for model_input_text in model_input_texts[0:3]
]
generation = model.generate(sample_texts, short_generation_config)
for text in generation.generation["text"]:
    # We'll limit ourselves to the single next token since we want it to respond that way
    print(text)
    print("==================================")

 Business

 World

 World



Few-shot learning definitely helps a lot! We'll measure an accuracy sample below. However, there is nothing stoping the model from not selecting our labels. So can we do better? We can work around this by understanding the likliehood of our labels from the models perspective

In [21]:
# We're interested in the activations from the last layer of the model, because this will allow us to caculation the
# likliehoods
last_layer_name = model.module_names[-1]
last_layer_name

'decoder.output_projection'

Need to instantiate a tokenizer to obtain appropriate token indices for our labels. 

__NOTE__: All OPT models, regardless of size, used the same tokenizing. However, if you want to use a different type of model, a different tokenizer may be needed.

In [22]:
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
# Need to consider the token ids of our labels in the context of the prompt, as they may be different in context.
tokenized_inputs = tokenizer([f"{prompt_template} {label_word}" for label_word in label_words], return_tensors="pt")[
    "input_ids"
]

label_token_ids = tokenized_inputs[:, -1]
# If you ever need to move back from token ids, you can use tokenizer.decode or tokenizer.batch_decode
tokenizer.decode(label_token_ids)

' World Sports Business Science'

Let's look at how we can extract the likelihoods given the label tokens

In [23]:
single_prompted_input = f"{model_input_texts[0]} {prompt_template} {label_words[0]}"
# Create a prompt with one of the label words as a completion
activations = model.get_activations(single_prompted_input, [last_layer_name], short_generation_config)

In [24]:
def get_label_with_highest_likelihood(
    layer_matrix: torch.Tensor, label_token_ids: torch.Tensor, int_to_label_map: Dict[int, str]
) -> str:
    # The activations we can about are the last token (corresponding to our label token) and the values for our label
    #  vocabulary
    label_activations = layer_matrix[-1][label_token_ids].float()
    softmax = nn.Softmax(dim=0)
    label_distributions = softmax(label_activations)
    # Plus one needed to correct into our label space of 1-4
    max_label_index = torch.argmax(label_distributions) + 1
    return int_to_label_map[max_label_index.item()]

In [25]:
last_layer_matrix = activations.activations[0][last_layer_name]
# The shape of this tensor should be number of input tokens by the vocabulary size (n x 50272)
print(f"Activations matrix shape: {last_layer_matrix.shape}")
predicted_label = get_label_with_highest_likelihood(last_layer_matrix, label_token_ids, int_to_label_map)
print(f"Predicted Label: {predicted_label}")

Activations matrix shape: torch.Size([66, 50272])
Predicted Label: sports


## Accuracy

Time to compare our results across three methods. 
1. Measure the accuracy of our few-shot prompting approach.
2. Measure the accuracy of our likelihood approach without few-shot.
3. Measure the accuracy of our likelihood approach with few-shot.

In [19]:
lowercase_labels = [word.lower() for word in label_words]

### Few-shot only

In [20]:
prompt_template_prefix = """Title: Lane drives in winning run in ninth Description: Jason Lane took an unusual post-game batting practice with hitting coach Gary Gaetti after a disappointing performance Friday night. Cateogry (World, Sports, Business, Science): Sports
Title: Arson attack on Jewish centre in Paris (AFP) Description: AFP - A Jewish social centre in central Paris was destroyed by fire overnight in an anti-Semitic arson attack, city authorities said. Cateogry (World, Sports, Business, Science): World
Title: Oil prices look set to dominate Description: The price of oil looks set to grab headlines as analysts forecast that its record-breaking run may well continue. Cateogry (World, Sports, Business, Science): Business
Title: More Evidence for Past Water on Mars Description: Summary - (Aug 22, 2004) NASA #39;s Spirit rover has dug up plenty of evidence on slopes of  quot;Columbia Hills quot; that water once covered the area. Cateogry (World, Sports, Business, Science): World
Title: Indexes in Japan fall short of hype Description: Japanese stocks have failed to measure up to an assessment made in April by Merrill Lynch #39;s chief global strategist, David Bowers, who said Japan was  quot;very much everyone #39;s favorite equity market. Cateogry (World, Sports, Business, Science): Business
"""  # noqa
prompt_template_postfix = "Cateogry (World, Sports, Business, Science):"
prompts = [
    f"{prompt_template_prefix}{model_input_text} {prompt_template_postfix}" for model_input_text in model_input_texts
]
generation = model.generate(prompts, short_generation_config)

In [21]:
# We'll use tokens this time and consider just the first token
first_predicted_tokens = [tokens[0].strip().lower() for tokens in generation.generation["tokens"]]
# If a token doesn't correspond to one of our labels, we'll randomly select one and count how many times that happens
# for reporting
predicted_labels = []
n_no_match = 0
for potential_prediction in first_predicted_tokens:
    if potential_prediction in lowercase_labels:
        predicted_labels.append(potential_prediction)
    else:
        n_no_match += 1
        print(f"Potential Prediction: {potential_prediction} does not match any label")
        predicted_labels.append(random.choice(lowercase_labels))
print(f"There were {n_no_match} unmatched predictions out of 100 predictions")

Potential Prediction: u does not match any label
Potential Prediction: us does not match any label
Potential Prediction: technology does not match any label
Potential Prediction: electronics does not match any label
Potential Prediction: politics does not match any label
Potential Prediction: technology does not match any label
Potential Prediction: com does not match any label


In [22]:
ag_news_metrics(predicted_labels, ag_news_labels)

Prediction Accuracy: 0.57
Confusion Matrix with ordering ['world', 'sports', 'business', 'science']
<function confusion_matrix at 0x7fc95a0b1b80>
Label: world, F1: 0.5666666666666667, Precision: 0.53125,Recall: 0.6071428571428571
Label: sports, F1: 0.7317073170731706, Precision: 0.75,Recall: 0.7142857142857143
Label: business, F1: 0.59375, Precision: 0.4634146341463415,Recall: 0.8260869565217391
Label: science, F1: 0.34285714285714286, Precision: 0.8571428571428571,Recall: 0.21428571428571427


### Likelihood No Few-shot

In [29]:
def split_prompts_into_batches(prompts: List[str], batch_size: int = 10) -> List[List[str]]:
    return [prompts[x : x + batch_size] for x in range(0, len(prompts), batch_size)]

In [41]:
prompts = [f"{model_input_text} {prompt_template} {label_words[0]}" for model_input_text in model_input_texts][0:20]
# For memory management, we split the prompts into batches of size 10
predicted_labels = []
prompt_batches = split_prompts_into_batches(prompts)
len(prompt_batches)
len(prompt_batches[0])
for prompt_batch in prompt_batches:
    activations = model.get_activations(prompts, [last_layer_name], short_generation_config)
    for activations_single_prompt in activations.activations:
        last_layer_matrix = activations_single_prompt[last_layer_name]
        predicted_label = get_label_with_highest_likelihood(last_layer_matrix, label_token_ids, int_to_label_map)
        predicted_labels.append(predicted_label)

In [43]:
len(prompts)

20

In [37]:
ag_news_metrics(predicted_labels, ag_news_labels[0:20])

ValueError: Found input variables with inconsistent numbers of samples: [40, 100]

### Likelihood with Few-Shot

In [28]:
prompt_template_prefix = """Title: Lane drives in winning run in ninth Description: Jason Lane took an unusual post-game batting practice with hitting coach Gary Gaetti after a disappointing performance Friday night. Cateogry (World, Sports, Business, Science): Sports
Title: Arson attack on Jewish centre in Paris (AFP) Description: AFP - A Jewish social centre in central Paris was destroyed by fire overnight in an anti-Semitic arson attack, city authorities said. Cateogry (World, Sports, Business, Science): World
Title: Oil prices look set to dominate Description: The price of oil looks set to grab headlines as analysts forecast that its record-breaking run may well continue. Cateogry (World, Sports, Business, Science): Business
Title: More Evidence for Past Water on Mars Description: Summary - (Aug 22, 2004) NASA #39;s Spirit rover has dug up plenty of evidence on slopes of  quot;Columbia Hills quot; that water once covered the area. Cateogry (World, Sports, Business, Science): World
Title: Indexes in Japan fall short of hype Description: Japanese stocks have failed to measure up to an assessment made in April by Merrill Lynch #39;s chief global strategist, David Bowers, who said Japan was  quot;very much everyone #39;s favorite equity market. Cateogry (World, Sports, Business, Science): Business
"""  # noqa
prompt_template_postfix = "Cateogry (World, Sports, Business, Science):"
prompts = [
    f"{prompt_template_prefix}{model_input_text} {prompt_template_postfix}" for model_input_text in model_input_texts
]
# For memory management, we split the prompts into batches of size 10
predicted_labels = []
prompt_batches = split_prompts_into_batches(prompts)
for prompt_batch in prompt_batches:
    activations = model.get_activations(prompts, [last_layer_name], short_generation_config)
    for activations_single_prompt in activations.activations:
        last_layer_matrix = activations_single_prompt[last_layer_name]
        predicted_label = get_label_with_highest_likelihood(last_layer_matrix, label_token_ids, int_to_label_map)
        predicted_labels.append(predicted_label)

ConnectionError: HTTPConnectionPool(host='llm.cluster.local', port=3001): Max retries exceeded with url: /models/instances/af334811-a4fc-483d-91be-a65a3a98d34e/generate_activations (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc898d4c7f0>: Failed to establish a new connection: [Errno 61] Connection refused'))

In [None]:
ag_news_metrics(predicted_labels, ag_news_labels)

Prediction Accuracy: 1.0
Confusion Matrix with ordering ['world', 'sports', 'business', 'science']
<function confusion_matrix at 0x7fc9e8567040>
Label: world, F1: nan, Precision: nan,Recall: nan
Label: sports, F1: 1.0, Precision: 1.0,Recall: 1.0
Label: business, F1: 1.0, Precision: 1.0,Recall: 1.0
Label: science, F1: nan, Precision: nan,Recall: nan


  recall = TP / (TP + FN)
  precision = TP / (TP + FP)
