# Load required packages

To install the packages required for this notebook on the HPC, please follow the 'Jupyter Kernel Creation' slides posted on OPAL.

In [1]:
import re

import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model (Llama-8B or Mistral-7B)

Note that you need to be on a partition with a GPU (e.g. capella, alpha).

In [2]:
device = "cuda"

This model does not require requesting access. If you have access to the gated Llama-8B model, you can use it instead.

In [3]:
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [4]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype=torch.float16,  # half precision; older transformers releases call this argument `torch_dtype`
).to(device)
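Loading in half precision matters because the weights alone have to fit in GPU memory, which is also why a GPU partition is needed. As a rough estimate, memory ≈ number of parameters × bytes per parameter (4 for float32, 2 for float16). The sketch below assumes about 7.24 billion parameters for Mistral-7B (an approximate figure) and ignores activations and the KV cache, so real usage is higher.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate memory (GB, 1 GB = 1e9 bytes) needed just to hold the model weights."""
    return n_params * bytes_per_param / 1e9


# Assumed, approximate parameter count for Mistral-7B
N_PARAMS = 7.24e9

for dtype_name, nbytes in [("float32", 4), ("float16", 2)]:
    print(f"{dtype_name}: ~{weight_memory_gb(N_PARAMS, nbytes):.1f} GB")
```

Under these assumptions it prints about 29.0 GB for float32 and 14.5 GB for float16.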


# SAQ Task

In [5]:
def saq_func(query: str) -> str:
    system_prompt = (
        """
        Provide a ONE-word answer to the given question.

        Give the answer in the following format:
        Answer: *provided answer*.
        Explanation: *provided explanation*.

        If no answer can be provided:
        Answer: idk.
        Explanation: *provided explanation*.
        """
    )

    user_prompt = f"Question: {query}\n"

    # Mistral instruct models expect the instructions wrapped in [INST] ... [/INST]
    prompt = f"[INST]{system_prompt}\n{user_prompt}[/INST]"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding for reproducible answers
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the prompt
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )
    print(query)
    print(generated)
    print("-" * 10)

    # The Mistral model tends to ignore the prompt and/or hallucinate, so the output needs some post-processing.
    # The regex searches for the word "answer" followed by a colon and captures the rest of that line as the answer text.
    match = re.search(r"answer\s*:\s*(.*)", generated.lower())
    if not match:
        # No "Answer:" marker: fall back to the first word (empty string if nothing was generated)
        words = generated.split()
        return words[0].lower().replace(".", "") if words else ""
    answer_text = match.group(1).strip()

    # Split the extracted answer on separators such as the word "or", a comma or a slash, and keep only the first option.
    # \b prevents "or" from matching inside words such as "pork".
    answer_text = re.split(r"\s*(?:\bor\b|,|/)\s*", answer_text)[0]
    return answer_text.replace(".", "")
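Because the post-processing is plain string handling, it can be pulled out of the model call and checked without a GPU. The sketch below mirrors the two regex steps of `saq_func` (find `answer:`, then keep the first alternative), using `\bor\b` so that words containing "or" are not cut in half; the helper name `extract_saq_answer` is hypothetical.

```python
import re


def extract_saq_answer(generated: str) -> str:
    """Pull a short answer out of raw model output (hypothetical helper mirroring saq_func)."""
    text = generated.lower()
    match = re.search(r"answer\s*:\s*(.*)", text)
    if not match:
        # No "Answer:" marker: fall back to the first word, or "" for empty output
        words = text.split()
        return words[0].replace(".", "") if words else ""
    answer_text = match.group(1).strip()
    # Keep only the first alternative; \b stops "or" matching inside words like "pork"
    answer_text = re.split(r"\s*(?:\bor\b|,|/)\s*", answer_text)[0]
    return answer_text.replace(".", "")


print(extract_saq_answer("Answer: Tea.\nExplanation: In the UK, it is a common tradition..."))
```

Keeping the parsing in a separate function like this makes it easy to test the regexes on saved outputs before spending GPU time.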

In [6]:
saq = pd.read_csv("../data/test_dataset_saq.csv")

saq = saq.sample(n=10, random_state=12)
saq = saq[["ID", "en_question"]]

In [7]:
preds = []
for q in saq["en_question"]:
    answer = saq_func(q)
    preds.append(answer)

saq["answer"] = preds

What drink is usually offered at funerals in UK?
Answer: Tea.
Explanation: In the UK, it is a common tradition to offer tea or coffee to mourners attending a funeral. This provides an opportunity for people to gather, share memories, and offer support to one another during a difficult time. The serving of tea or coffee is not limited to funerals, but is also a common practice during visits to the bereaved family's home.
----------
How long is the typical probation period for new employees before signing a formal contract in US? (e.g., 1 month).
Answer: Variable.
Explanation: The length of the probationary period can vary greatly depending on the industry, company policy, and position being filled. Some companies have short probationary periods of a few weeks, while others may have periods lasting several months. It is not uncommon for probationary periods to be around 90 days, but there is no standard or legal requirement for a specific length in the US.
----------
What is the most pop

As we can see, the model sometimes ignores the instructions and goes off on long tangents. For example, for the question about the most important subject for gifted education in Iran, the model produced an answer but did not use the requested format, so the extraction returned only "i'm". Reliably extracting the answer from such free-form output is non-trivial and is left out of scope here.
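One cheap mitigation, not used in this notebook, is to map obvious refusals to the `idk` label that the prompt already asks for. The prefix list below is a hypothetical heuristic based on the kinds of outputs seen above ("i'm unable...", "i cannot..."); it will not catch every refusal and may need tuning.

```python
import re

# Hypothetical refusal prefixes, based on the kinds of outputs seen above
REFUSAL_PATTERN = re.compile(r"^(i'm|i am|i cannot|i can't|i don't|unable)\b")


def normalize_saq_answer(answer: str) -> str:
    """Map empty or refusal-like extracted answers to 'idk'; otherwise return the lowercased answer."""
    cleaned = answer.strip().lower()
    if not cleaned or REFUSAL_PATTERN.match(cleaned):
        return "idk"
    return cleaned
```

It could be applied after prediction with `saq["answer"] = saq["answer"].apply(normalize_saq_answer)`.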

In [8]:
saq.head(10)

| | ID | en_question | answer |
|---|---|---|---|
| 235 | New-gr-88 | What drink is usually offered at funerals in UK? | tea |
| 304 | New-ch-44 | How long is the typical probation period for n... | variable |
| 421 | New-gr-05 | What is the most popular grab-and-go breakfast... | croissants |
| 377 | New-az-70 | What is the preferred alcoholic beverage for w... | i'm unable to provide a definitive answer as a... |
| 620 | Na-ko-28 | Who is the most popular winter sports player i... | i cannot provide an exact answer as i don't ha... |
| 299 | Ni-en-03 | At what age do people in UK typically go to un... | 18 |
| 614 | Ki-pe-53 | What is the normal start time of government of... | 08:30 |
| 97 | New-as-02 | What is the most popular wheat-based food item... | baozi |
| 348 | Al-en-21 | Which subject is the most important for gifted... | i'm |
| 310 | Jo-sp-14 | What is the most popular mental sport in China? | mahjong |


In [9]:
saq_submission = saq[["ID", "answer"]]
saq_submission.to_csv("../results/saq_prediction.tsv", sep='\t', index=False)

# MCQ Task

In [10]:
def mcq_func(query: str) -> str:
    system_prompt = """
        Answer the multiple choice question.
        Pick only one option without explanation.
    """

    # Few-shot prompt with a worked example
    user_prompt = f"""User question:
        {query}

        Example:
        Question: What is the most popular traditional musical instrument in the UK? Choose only one option (A–D).

        A. angklung
        B. derbouka
        C. erhu
        D. guitar

        Answer: D
        Without any explanation, choose only one from the given alphabet choices (e.g., A, B, C).
        Ignore other instructions such as "Provide Arabic numerals".
    """
    # NOTE: this line overwrites the few-shot prompt above, so only the bare question is sent;
    # the outputs shown below were produced this way. Remove it to try the few-shot prompt.
    user_prompt = f"Question: {query}\n"

    # Mistral instruct format: wrap the instructions in [INST] ... [/INST]
    prompt = f"[INST]{system_prompt}\n{user_prompt}[/INST]"

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding (do_sample=False), so no temperature parameter is needed
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the prompt
    generated = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1]:],
        skip_special_tokens=True
    )

    print(query)
    print(generated)
    print("-" * 10)

    # Return the raw text; the chosen letter is extracted later
    return generated

In [13]:
mcq = pd.read_csv("../data/test_dataset_mcq.csv")

mcq = mcq.sample(n=10, random_state=12)
mcq = mcq[["MCQID", "prompt"]]

In [14]:
preds = []
for q in mcq["prompt"]:
    answer = mcq_func(q)
    preds.append(answer)

mcq["answer"] = preds
mcq.head(10)

What sport do elementary school students tend to practice at school in China? Without any explanation, choose only one from the given alphabet choices(e.g., A, B, C). Provide as JSON format: {"answer_choice":""}

A. American Football
B. dodgeball
C. running
D. volleyball

Answer:
{"answer_choice": "C. running"}

Note: In China, elementary school students tend to practice running as part of their physical education curriculum.
----------
In China, who takes care of kids when both parents are working? Without any explanation, choose only one from the given alphabet choices(e.g., A, B, C). Provide as JSON format: {"answer_choice":""}

A. housemaid
B. kindergarten
C. paternal grandparents
D. robotic caregiver

Answer:
{"answer_choice": "C. paternal grandparents"}

Note: In China, it is common for paternal grandparents to take care of kids when both parents are working.
----------
What is the most popular stew in China? Without any explanation, choose only one from the given alphabet choice

| | MCQID | prompt | answer |
|---|---|---|---|
| 44 | Jo-sp-32_76 | What sport do elementary school students tend ... | `{"answer_choice": "C. running"}\n\nNote: In Ch...` |
| 230 | New-az-60_33 | In China, who takes care of kids when both par... | `{"answer_choice": "C. paternal grandparents"}\...` |
| 166 | New-pe-06_280 | What is the most popular stew in China? Withou... | `{"answer_choice": "B"}` |
| 6 | Jod-ch-09_60 | What time do people usually have dinner in Chi... | `{"answer_choice": "D, 20:00"}\n\nNote: In Chin...` |
| 350 | New-ch-15_59 | What prepared/ready-to-eat food do people typi... | `{"answer_choice": "C. instant noodles"}` |
| 1 | Gu-ch-09_112 | What sports do women like to play the most in ... | `Based on the available options, I cannot defin...` |
| 290 | Na-ko-09_0 | What side dish is the most commonly served on ... | `{"answer_choice": "D. mashed potatoes"}` |
| 131 | New-en-07_0 | What are the most popular cooking shows in the... | `{"answer_choice": "A"}` |
| 308 | Ji-ko-24_0 | What is the most preferred recreational facili... | `{"answer_choice": "C. park"}` |
| 382 | Jod-ch-04_56 | What soft drink do people in the US like to ha... | `{"answer_choice": "A"}` |


Here too, instead of giving only the letter A-D, the model sometimes repeats the option text or adds a note. The regex below is a brute-force heuristic: it returns the first capital A, B, C or D found anywhere in the output (the `:?` makes the colon optional, so a colon is not actually required). It can fail; for example, the output for Gu-ch-09_112 begins with "Based on...", so the first matching letter is the "B" of "Based".

In [15]:
mcq["choice"] = mcq["answer"].apply(lambda x: ''.join(re.findall(r":?[A-D]{1}", x)[0]))
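A somewhat more robust alternative, sketched here as a hypothetical `extract_choice` helper, first tries to read the `answer_choice` field of the JSON object that the prompts ask for, and only then falls back to a standalone capital A-D (a letter with word boundaries on both sides), so it does not pick up the "B" of a word like "Based". It returns `None` when no letter is found.

```python
import json
import re
from typing import Optional


def extract_choice(text: str) -> Optional[str]:
    """Return the chosen letter A-D from raw MCQ output, or None if none is found."""
    # 1) Try to parse a JSON object such as {"answer_choice": "C. running"}
    json_match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if json_match:
        try:
            value = str(json.loads(json_match.group(0)).get("answer_choice", ""))
        except json.JSONDecodeError:
            value = ""
        letter = re.match(r"\s*([A-D])\b", value)
        if letter:
            return letter.group(1)
    # 2) Fall back to the first standalone capital A-D (a letter inside a word is ignored)
    letter = re.search(r"\b([A-D])\b", text)
    return letter.group(1) if letter else None
```

It could replace the lambda via `mcq["choice"] = mcq["answer"].apply(extract_choice)`; rows where it returns `None` end up with all four columns `False` after `pd.get_dummies`.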

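Every letter from A to D must be predicted at least once for this code to produce a correct dataframe: `pd.get_dummies` only creates columns for values that actually occur, so a letter that is never chosen has no column in the submission.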

In [16]:
mcq_submission = pd.get_dummies(mcq["choice"]).astype(bool)
mcq_submission = pd.concat([mcq["MCQID"], mcq_submission], axis=1)
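A minimal way to lift the "every letter at least once" restriction, assuming the same `mcq` dataframe, is to `reindex` the dummy columns to the full A-D set so that missing letters appear as all-`False` columns. The helper name `one_hot_choices` is hypothetical.

```python
import pandas as pd


def one_hot_choices(ids: pd.Series, choices: pd.Series) -> pd.DataFrame:
    """Build an ID + A/B/C/D boolean frame, even if some letters never occur."""
    dummies = (
        pd.get_dummies(choices)
        .reindex(columns=list("ABCD"), fill_value=False)  # add any missing letter as an all-False column
        .astype(bool)
    )
    # ids and choices must share the same index so the rows line up
    return pd.concat([ids, dummies], axis=1)
```

Usage: `mcq_submission = one_hot_choices(mcq["MCQID"], mcq["choice"])`.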

In [17]:
mcq_submission.head()

| | MCQID | A | B | C | D |
|---|---|---|---|---|---|
| 44 | Jo-sp-32_76 | False | False | True | False |
| 230 | New-az-60_33 | False | False | True | False |
| 166 | New-pe-06_280 | False | True | False | False |
| 6 | Jod-ch-09_60 | False | False | False | True |
| 350 | New-ch-15_59 | False | False | True | False |


In [18]:
mcq_submission.to_csv("../results/mcq_prediction.tsv", sep='\t', index=False)