# IndoML 2023 Tutorial: Part 2
## The Era of LLMs!

### In-context learning and Prompt Engineering


1. We will use recent LLMs such as GPT-3, FLAN-T5, and LLaMA, querying them in natural language to get answers/predictions.
2. These models are fine-tuned on instructions or human feedback, which lets them perform a task from a natural-language "prompt".
3. The best part is that we don't need to train anything to get started: direct inference with these pretrained models is enough.
    * NOTE: These models can also be fine-tuned on our own data for better results, but we will not cover that in this tutorial.

### Methods that we will try:

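1. Instruction-tuned models such as FLAN-T5, mT0, and BLOOMZ (the cells below load `bigscience/bloomz-3b`; the other checkpoints are left commented out)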

## Load `dataset`

In [1]:
%env CUDA_VISIBLE_DEVICES=3

env: CUDA_VISIBLE_DEVICES=3


In [2]:
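from tqdm import tqdm
from datasets import load_dataset

tqdm.pandas()  # registers DataFrame.progress_apply for progress bars in pandas

# MASSIVE: multilingual intent-classification dataset (the default configuration bundles all locales)
dataset = load_dataset("AmazonScience/massive")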




## Load `AutoTokenizer` and the model (`AutoModelForCausalLM` or `AutoModelForSeq2SeqLM`)

In [3]:
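# pip install -q transformers accelerate bitsandbytes
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSeq2SeqLM, BitsAndBytesConfig)

# checkpoint = "bigscience/mt0-base"   # encoder-decoder
checkpoint = "bigscience/bloomz-3b"    # decoder-only
# checkpoint = "google/flan-t5-xxl"    # encoder-decoder

# mT0 and FLAN-T5 need the seq2seq class; BLOOMZ is a causal LM
is_seq2seq = any(name in checkpoint for name in ("mt0", "flan-t5"))
model_cls = AutoModelForSeq2SeqLM if is_seq2seq else AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# 8-bit weights (via bitsandbytes) reduce GPU memory; device_map="auto" places the layers
model = model_cls.from_pretrained(
    checkpoint, device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True))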

Downloading model.safetensors: 100%|██████████| 6.01G/6.01G [05:22<00:00, 18.6MB/s]


## Model-Specific Example of Prompt Engineering

In [8]:
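prompt = "Detect the intent class of the utterance.\nUtterance: I am going to school.; Intent:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Without max_new_tokens, generate() may fall back to a short default max_length
# (20 tokens, prompt included), which can cut the answer off
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))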



Detect the intent class of the utterance.
Utterance: I am going to school.; Intent: go


## Preprocessing

- We will create a prompt for each evaluation sample in the dataset.
- There are a few ways to format these prompts; designing them is called "Prompt Engineering".
    - Few-shot in-context learning: use a task description plus a few labeled examples (exemplars)
    - Zero-shot in-context learning: use the task description only
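The two formats differ only in whether exemplars sit between the task description and the query. A minimal sketch in plain Python, assuming the same `Utterance: ...; Intent: ...` template used later in this notebook; `build_prompt` is a hypothetical helper name:

```python
def build_prompt(task, query, exemplars=()):
    """Return a zero-shot prompt if `exemplars` is empty, else a few-shot prompt.

    `exemplars` is a sequence of (utterance, intent) pairs.
    """
    # Each exemplar uses the same template as the final query, so the model
    # can infer the expected output format from the examples.
    shots = "".join(f"Utterance: {u}; Intent: {i}\n" for u, i in exemplars)
    # The query ends with "Intent:" so the model's continuation is the label.
    return f"# {task}\n\n{shots}Utterance: {query}; Intent:"


task = "Detect intent of the input utterance."
print(build_prompt(task, "wake me up at 7"))                               # zero-shot
print(build_prompt(task, "wake me up at 7", [("buna", "general_greet")]))  # few-shot
```

Ending the prompt with `Intent:` turns classification into "continue the text", which is what in-context learning relies on.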

## Exemplars for the multilingual intent-detection task

In [5]:
# Gather examples from the training dataset
import pandas as pd
df_train = dataset['train'].to_pandas()

# Extract one random sample per intent (a fixed random_state keeps it reproducible).
# The training set mixes all locales, so exemplars come from many languages; the hope
# is that the model then predicts English labels for utterances in any language.
df_intent_samples = df_train.groupby("intent").apply(lambda x: x.sample(1, random_state=42)).reset_index(drop=True)    
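Grouping by `intent` and sampling one row per group is stratified sampling: every label appears exactly once among the exemplars. The same idea in plain Python, as a sketch assuming the data is a list of `(utterance, intent)` pairs (`one_per_label` is a hypothetical name). A seeded local RNG keeps the choice reproducible, like `random_state=42` above, though it will not draw the same rows as pandas:

```python
import random
from collections import defaultdict


def one_per_label(rows, seed=42):
    """Pick one random (utterance, label) pair per label, ordered by label."""
    by_label = defaultdict(list)
    for utt, label in rows:
        by_label[label].append((utt, label))
    rng = random.Random(seed)  # local RNG: result does not depend on global random state
    # Sorting the labels mirrors groupby's default sort=True group order.
    return [rng.choice(by_label[label]) for label in sorted(by_label)]


rows = [("hi", "general_greet"), ("hello", "general_greet"),
        ("price of apple stock", "qa_stock"), ("set lights to blue", "iot_hue_lightchange")]
for utt, label in one_per_label(rows):
    print(f"Utterance: {utt}; Intent: {label}")
```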

In [6]:
# Add formatted prompt for each sample
def int2str(x):
    return dataset['train'].features['intent'].int2str(x)

df_intent_samples['example_prompt_format'] = df_intent_samples.apply(lambda x: f'Utterance: {x["utt"]}; Intent: {int2str(x["intent"])}', axis=1)

# merge examples into a single string
prompt_exemplars = df_intent_samples['example_prompt_format'].str.cat(sep='\n')

In [9]:
print(prompt_exemplars)

Utterance: የዛሬውን ቀን ልትነግሩኝ ትችላላቹ; Intent: datetime_query
Utterance: cambia las luces a azul; Intent: iot_hue_lightchange
Utterance: je veux un billet de train pour l' oregon; Intent: transport_ticket
Utterance: உணவகத்தில் எடுத்துச் செல்லும் வசதி உள்ளதா; Intent: takeaway_query
Utterance: berapakah nilai saham apple; Intent: qa_stock
Utterance: buna; Intent: general_greet
Utterance: olly ni matukio yapi yanafanyika karibu na mimi leo; Intent: recommendation_events
Utterance: bu mahnını xoşlamadığımı yadda saxla; Intent: music_dislikeness
Utterance: oprește-mi priza wemo acum; Intent: iot_wemo_off
Utterance: hvor mange grøntsager burde jeg spise om dagen; Intent: cooking_recipe
Utterance: இந்திய ரூபாவுடன் டாலரின் மாற்று விகிதம் என்ன; Intent: qa_currency
Utterance: 運輸; Intent: transport_traffic
Utterance: چطور پیش می‌رود; Intent: general_quirky
Utterance: آج آب و ہوا کیسی ہے; Intent: weather_query
Utterance: volumen på musikken der spiller nu skal op; Intent: audio_volume_up
Utterance: እባክ

## Generate prompts for each query

In [10]:
# Add new "prompt" and "str_label" columns to the evaluation set.
# (An alternative zero-shot wording: 'What is the intent of the following sentence?\n"{utt}"')

few_shot = True
def add_prompt(example):
    if few_shot:
        example["prompt"] = f'# Detect intent of the input utterance.\n\n{prompt_exemplars}\nUtterance: {example["utt"]}; Intent:'
    else:
        example["prompt"] = f'# Detect intent of the input utterance.\n\nUtterance: {example["utt"]}; Intent:'
    
    example["str_label"] = int2str(example["intent"])
    return example


extended_eval_set = dataset['validation'].map(add_prompt)

Map: 100%|██████████| 103683/103683 [00:18<00:00, 5650.94 examples/s]
Map: 100%|██████████| 103683/103683 [00:47<00:00, 2203.93 examples/s]
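With `few_shot = True`, every prompt repeats one exemplar per intent, so prompts get long, and scripts such as Amharic or Tamil often cost many tokens each. If a prompt may exceed the model's context length (check the model card), one option is to keep only as many exemplars as fit a token budget. A sketch under that assumption; `fit_exemplars`, `count_tokens`, and `budget` are hypothetical names, and in practice `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])`:

```python
def fit_exemplars(header, exemplar_lines, query_line, count_tokens, budget):
    """Build header + exemplars + query, keeping the longest prefix of
    exemplar_lines whose full prompt stays within `budget` tokens.

    The query is always included, even if it alone exceeds the budget.
    """
    kept = []
    for line in exemplar_lines:
        candidate = "\n".join([header, *kept, line, query_line])
        if count_tokens(candidate) > budget:
            break  # adding this exemplar would overflow the budget
        kept.append(line)
    return "\n".join([header, *kept, query_line])


# Whitespace splitting stands in for a real tokenizer here.
header = "# Detect intent of the input utterance.\n"
exemplars = ["Utterance: buna; Intent: general_greet",
             "Utterance: 運輸; Intent: transport_traffic"]
query = "Utterance: hi; Intent:"
print(fit_exemplars(header, exemplars, query, lambda s: len(s.split()), budget=14))
```

Because exemplars are dropped from the end, the intents they illustrate disappear from the prompt; shortening or shuffling exemplars are alternative trade-offs.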


In [11]:
extended_eval_set[0]

{'id': '11',
 'locale': 'mn-MN',
 'partition': 'dev',
 'scenario': 8,
 'intent': 40,
 'utt': 'гэрлийг унтраа',
 'annot_utt': 'гэрлийг унтраа',
 'worker_id': '5',
 'slot_method': {'slot': [], 'method': []},
 'judgments': {'worker_id': ['43', '4', '30'],
  'intent_score': [1, 1, 1],
  'slots_score': [1, 1, 1],
  'grammar_score': [4, 3, 4],
  'spelling_score': [2, 2, 2],
  'language_identification': ['target', 'target', 'target']},
 'prompt': "# Detect intent of the input utterance.\n\nUtterance: የዛሬውን ቀን ልትነግሩኝ ትችላላቹ; Intent: datetime_query\nUtterance: cambia las luces a azul; Intent: iot_hue_lightchange\nUtterance: je veux un billet de train pour l' oregon; Intent: transport_ticket\nUtterance: உணவகத்தில் எடுத்துச் செல்லும் வசதி உள்ளதா; Intent: takeaway_query\nUtterance: berapakah nilai saham apple; Intent: qa_stock\nUtterance: buna; Intent: general_greet\nUtterance: olly ni matukio yapi yanafanyika karibu na mimi leo; Intent: recommendation_events\nUtterance: bu mahnını xoşlamadığımı ya

## Now let's try to predict using the LLM

In [25]:
print(extended_eval_set[100]['prompt'])

x = extended_eval_set[100]['prompt']
tok_x = tokenizer(x, return_tensors="pt").to(model.device)
# Beam search with 5 beams; only the best beam (y[0]) is used,
# and an intent label needs only a few new tokens
y = model.generate(**tok_x, num_beams=5, max_new_tokens=10)
output = tokenizer.decode(y[0], skip_special_tokens=True)
print(output)

# Detect intent of the input utterance.

Utterance: የዛሬውን ቀን ልትነግሩኝ ትችላላቹ; Intent: datetime_query
Utterance: cambia las luces a azul; Intent: iot_hue_lightchange
Utterance: je veux un billet de train pour l' oregon; Intent: transport_ticket
Utterance: உணவகத்தில் எடுத்துச் செல்லும் வசதி உள்ளதா; Intent: takeaway_query
Utterance: berapakah nilai saham apple; Intent: qa_stock
Utterance: buna; Intent: general_greet
Utterance: olly ni matukio yapi yanafanyika karibu na mimi leo; Intent: recommendation_events
Utterance: bu mahnını xoşlamadığımı yadda saxla; Intent: music_dislikeness
Utterance: oprește-mi priza wemo acum; Intent: iot_wemo_off
Utterance: hvor mange grøntsager burde jeg spise om dagen; Intent: cooking_recipe
Utterance: இந்திய ரூபாவுடன் டாலரின் மாற்று விகிதம் என்ன; Intent: qa_currency
Utterance: 運輸; Intent: transport_traffic
Utterance: چطور پیش می‌رود; Intent: general_quirky
Utterance: آج آب و ہوا کیسی ہے; Intent: weather_query
Utterance: volumen på musikken der spiller nu skal o

In [31]:
N_class = dataset['train'].features['intent'].num_classes
# Map each intent name (e.g. "qa_stock") back to its integer id
str2int = {int2str(i): i for i in range(N_class)}

def parse_prediction(prompt, output_txt):
    # Causal LMs (e.g. BLOOMZ) echo the prompt, so strip it;
    # seq2seq models (e.g. FLAN-T5, mT0) return only the generated text
    if output_txt.startswith(prompt):
        output_txt = output_txt[len(prompt):]
    # Keep only the first generated line
    pred_class = output_txt.split('\n')[0].strip()
    print(pred_class)

    # Check if it matches any label in the dataset; -1 marks an invalid label
    if pred_class in str2int:
        return str2int[pred_class], pred_class
    else:
        return -1, pred_class

In [32]:
parse_prediction(x, output)

food_query


(-1, 'food_query')
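Predictions such as `food_query` above are not valid MASSIVE labels, so the parser returns `-1`. One optional fallback, not part of the original tutorial, is to snap near-misses (typos, truncations) to the closest valid label with the standard library's `difflib.get_close_matches`. A sketch assuming a `str2int`-style dict; `snap_to_label` and the `cutoff` value are illustrative choices:

```python
from difflib import get_close_matches


def snap_to_label(pred_class, str2int, cutoff=0.8):
    """Return (label_id, label) for an exact or close match, else (-1, pred_class)."""
    if pred_class in str2int:
        return str2int[pred_class], pred_class
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio
    # and drops any candidate scoring below `cutoff`.
    matches = get_close_matches(pred_class, list(str2int), n=1, cutoff=cutoff)
    if matches:
        return str2int[matches[0]], matches[0]
    return -1, pred_class


labels = {"qa_stock": 0, "takeaway_query": 1, "weather_query": 2}
print(snap_to_label("qa_stok", labels))  # (0, 'qa_stock')
print(snap_to_label("zzzz", labels))     # (-1, 'zzzz')
```

A high cutoff is deliberately conservative: string similarity says nothing about meaning, so a loose cutoff could map an invented label onto a semantically unrelated one.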

## Setup Evaluation Metric

In [33]:
import numpy as np
import evaluate

metric_acc = evaluate.load("accuracy")
metric_f1 = evaluate.load("f1")

# compute_metrics follows the signature the Hugging Face Trainer expects:
# it receives (logits, labels), converts logits to predicted ids, then calls the metrics
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    results = metric_acc.compute(predictions=predictions, references=labels)
    results.update(metric_f1.compute(predictions=predictions, references=labels, average="macro"))
    return results
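`compute_metrics` above expects Trainer logits, whereas the prompting pipeline yields integer label ids from `parse_prediction`, with `-1` for unparseable outputs. `evaluate`'s `f1` metric wraps scikit-learn's `f1_score`, which by default averages over every label seen in either list, so `-1` would enter the macro average as an extra class (passing `labels=` restricts this). A plain-Python sketch that instead averages only over classes in the references; `accuracy_and_macro_f1` is a hypothetical name and this averaging choice is an assumption:

```python
from collections import Counter


def accuracy_and_macro_f1(predictions, references):
    """Accuracy plus macro-F1 averaged over the classes present in `references`.

    An invalid prediction (-1) always counts as an error, but -1 is never
    averaged in as a class of its own.
    """
    if not references or len(predictions) != len(references):
        raise ValueError("need equally long, non-empty prediction/reference lists")
    pairs = list(zip(predictions, references))
    true_pos = Counter(r for p, r in pairs if p == r)  # correct hits per class
    pred_count = Counter(predictions)                  # times each class was predicted
    ref_count = Counter(references)                    # gold support per class
    f1_scores = []
    for label in ref_count:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / ref_count[label]
        total = precision + recall
        f1_scores.append(2 * precision * recall / total if total else 0.0)
    accuracy = sum(true_pos.values()) / len(references)
    return {"accuracy": accuracy, "f1": sum(f1_scores) / len(f1_scores)}


# Label ids as returned by parse_prediction; -1 marks an unparseable output.
print(accuracy_and_macro_f1([0, 1, -1, 1], [0, 1, 1, 2]))  # {'accuracy': 0.5, 'f1': 0.5}
```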