## CS310 Natural Language Processing
## Lab 14: In-Context Learning and Prompting

In this lab, we will practice some in-context learning techniques, such as few-shot learning and chain-of-thought prompting, for solving QA problems.

## T1. Run LLMs locally

### Step 1) Install llama.cpp

Build the [llama.cpp](https://github.com/ggml-org/llama.cpp) tool, or download the binaries from the [release page](https://github.com/ggml-org/llama.cpp/releases).

---

### Step 1) Download model

We are going to download the model that is quantized and format-converted to `gguf` format.

**Model option a**: 
- Using the `huggingface-cli` tool.
- Following the tutorial here: (Qwen2.5-7B-Instruct-GGUF)[https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-GGUF]

A quick command to download the model is:
```bash
huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF qwen1_5-7b-chat-q5_k_m.gguf --local-dir . --local-dir-use-symlinks False
```


**Model option b**: 
- Or you can download the ChatGLM-3 model from ModelScope: https://modelscope.cn/models/ZhipuAI/chatglm3-6b/files
  - `model.safetensors.index.json`, `config.json`, `configuration.json`
  - `model-00001-of-00007.safetensors` to `model-00007-of-00007.safetensors`
  - `tokenizer_config.json`, `tokenizer.model`
Put all the files in a folder such as `./chatglm3-6b`. 
- Then use tools like [`chatglm.cpp`](https://github.com/li-plus/chatglm.cpp) to manually convert the model weights to `ggml` format.

---


### Step 3) Run model

You can run the model with following command:

```bash
llama-cli -m $MODEL_PATH
```

Then you can start interacting with the model in command line. Try to solve the following problems.
 - Use zero-shot and few-shot prompting to solve the problems.
 - Add Chain-of-Thought prompt if needed.


Try solving these problems with prompting:
1. Q: A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there? A: 
2. 鸡和兔在一个笼子里，共有35个头，94只脚，那么鸡有多少只，兔有多少只？
3. Q: 242342 + 423443 = ? A: 
4. 一个人花8块钱买了一只鸡，9块钱卖掉了，然后他觉得不划算，花10块钱又买回来了，11块卖给另外一个人。问他赚了多少?

---

# Answers:
## A1
First, let's find out how many balls the juggler can juggle in total:

16 balls

Half of these balls are golf balls:

\( \frac{16}{2} = 8 \) golf balls

Half of the golf balls are blue:

\( \frac{8}{2} = 4 \) blue golf balls

So, there are 4 blue golf balls.
## A2
设鸡的数量为 \( x \)，兔的数量为 \( y \)。

根据题意，我们有两个方程：
1. 头的总数：\( x + y = 35 \) （因为每个头有1个)
2. 脚的总数：鸡有2只脚，兔有4只脚，所以 \( 2x + 4y = 94 \) （因为总共有94只脚）

首先解第一个方程得到 \( y \) 的表达式：
\( y = 35 - x \)

然后将 \( y \) 的表达式代入第二个方程：
\( 2x + 4(35 - x) = 94 \)

解这个方程：
\( 2x + 140 - 4x = 94 \)
\( -2x = 94 - 140 \)
\( -2x = -46 \)
\( x = 23 \)

所以鸡有23只。

现在用 \( x \) 的值来解 \( y \)：
\( y = 35 - x = 35 - 23 = 12 \)

兔子有12只。

答案是：鸡有23只，兔有12只。
## A3
242342 + 423443 = 665785
## A4
这个人第一次交易赚了1块钱，因为他卖鸡得到9块钱，而买鸡花了8块钱。第二次交易他又赚了1块钱，因为他卖鸡得到11块钱，而买回来的成本是10块钱。

所以，总共赚的钱是 \(1 + 1 = 2\) 元。

## T2. Practice few-shot prompting

For this pratice, you need to first download the [Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) model from HuggingFace, by running the following command:

```bash
huggingface-cli download Qwen/Qwen2.5-7B --local-dir $MODEL_PATH
```

The task set we use is [MMLU](https://huggingface.co/datasets/cais/mmlu). You need to download the zip file and extract it to the `./MMLU` folder.

In [26]:
from transformers import AutoTokenizer,AutoModelForCausalLM
import torch
import json
import numpy as np
from pprint import pprint

First, define some helper functions for constructing prompts and running inference.

In [27]:
choices = ["A", "B", "C", "D"]

def format_subject(subject):
    l = subject.split("_")
    s = ""
    for entry in l:
        s += " " + entry
    return s

def format_example(input_list):
    prompt = input_list[0]
    k = len(input_list) - 2
    for j in range(k):
        prompt += "\n{}. {}".format(choices[j], input_list[j+1])
    prompt += "\nAnswer:"
    return prompt

def format_shots(prompt_data):
    prompt = ""
    for data in prompt_data:
        prompt += data[0]
        k = len(data) - 2
        for j in range(k):
            prompt += "\n{}. {}".format(choices[j], data[j+1])
        prompt += "\nAnswer:"
        prompt += data[k+1] + "\n\n"

    return prompt

def gen_prompt(input_list, subject, prompt_data):
    prompt = "The following are multiple choice questions (with answers) about {}.\n\n".format(
        format_subject(subject)
    )
    prompt += format_shots(prompt_data)
    prompt += format_example(input_list)
    return prompt

The following `inference()` function constructs the full input by prepending the few-shot examples to the `input_text`, and generate **1** token as the output, because the task modality is multiple choice question.

In [28]:
def inference(tokenizer, model, input_text, subject, prompt_data):
    if len(prompt_data) > 0:
        full_input = gen_prompt(input_text, subject, prompt_data) # add few-shot examples
    else:
        full_input = input_text
    inputs = tokenizer(full_input, return_tensors="pt").to("cpu")

    ids = inputs['input_ids']
    outputs = model.generate(
                ids,
                attention_mask = inputs['attention_mask'],
                pad_token_id = tokenizer.eos_token_id,
                max_new_tokens = 1, # Generate one token because it is multiple choice question
                output_scores = True,
                return_dict_in_generate=True
            )
    logits = outputs['scores'][0][0]    #The first token
    probs = (
            torch.nn.functional.softmax(
                torch.tensor(
                    [
                        logits[tokenizer("A").input_ids[0]],
                        logits[tokenizer("B").input_ids[0]],
                        logits[tokenizer("C").input_ids[0]],
                        logits[tokenizer("D").input_ids[0]],
                    ]
                ),
                dim=0,
            )
            .detach()
            .cpu()
            .numpy()
    )
    output_text = {0: "A", 1: "B", 2: "C", 3: "D"}[np.argmax(probs)]
    conf = np.max(probs)
        
    return output_text, full_input, conf.item()

In [29]:
model_path = '/Volumes/Star/Qwen2.5-7B/'

tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          use_fast=True,
                                          unk_token="<unk>",
                                          bos_token="<s>", eos_token="</s>",
                                          add_bos_token=False)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto')

Loading checkpoint shards: 100%|██████████| 4/4 [00:05<00:00,  1.43s/it]
Some parameters are on the meta device because they were offloaded to the disk.


Load the json data.

In [30]:
data = {}
prompt = {}

with open(f"./MMLU/MMLU_ID_test.json",'r') as f:
    data = json.load(f)
    
with open(f"./MMLU/MMLU_ID_prompt.json",'r') as f:
    prompt = json.load(f)

We can see the data is organized by subjects.

In [31]:
print(data.keys())

print()
pprint(data['high_school_mathematics'][3])

dict_keys(['abstract_algebra', 'anatomy', 'astronomy', 'business_ethics', 'clinical_knowledge', 'college_biology', 'college_chemistry', 'college_computer_science', 'college_mathematics', 'college_medicine', 'college_physics', 'computer_security', 'conceptual_physics', 'econometrics', 'electrical_engineering', 'elementary_mathematics', 'formal_logic', 'global_facts', 'high_school_biology', 'high_school_chemistry', 'high_school_computer_science', 'high_school_european_history', 'high_school_geography', 'high_school_government_and_politics', 'high_school_macroeconomics', 'high_school_mathematics', 'high_school_microeconomics', 'high_school_physics'])

['At breakfast, lunch, and dinner, Joe randomly chooses with equal '
 'probabilities either an apple, an orange, or a banana to eat. On a given '
 'day, what is the probability that Joe will eat at least two different kinds '
 'of fruit?',
 '\\frac{7}{9}',
 '\\frac{8}{9}',
 '\\frac{5}{9}',
 '\\frac{9}{11}',
 'B']


Few-shot prompts also come in subjects, and each subject has a list of 5 examples.

In [32]:
print(len(prompt['high_school_mathematics']))
print(len(prompt['high_school_physics']))

5
5


We stick to one subject, `high_school_mathematics` for this example.

In [33]:
subject = 'high_school_mathematics'
data_sub = data[subject]
prompt_sub = prompt[subject]

Take one input example and generate the full prompt by calling `gen_prompt()`

In [34]:
input_text = data_sub[3]
prompt_text = gen_prompt(input_text, subject, prompt_sub)
print(prompt_text)

The following are multiple choice questions (with answers) about  high school mathematics.

Joe was in charge of lights for a dance. The red light blinks every two seconds, the yellow light every three seconds, and the blue light every five seconds. If we include the very beginning and very end of the dance, how many times during a seven minute dance will all the lights come on at the same time? (Assume that all three lights blink simultaneously at the very beginning of the dance.)
A. 3
B. 15
C. 6
D. 5
Answer:B

Five thousand dollars compounded annually at an $x\%$ interest rate takes six years to double. At the same interest rate, how many years will it take $\$300$ to grow to $\$9600$?
A. 12
B. 1
C. 30
D. 5
Answer:C

The variable $x$ varies directly as the square of $y$, and $y$ varies directly as the cube of $z$. If $x$ equals $-16$ when $z$ equals 2, what is the value of $x$ when $z$ equals $\frac{1}{2}$?
A. -1
B. 16
C. -\frac{1}{256}
D. \frac{1}{16}
Answer:C

Simplify and write th

In [35]:
output, _, conf = inference(tokenizer, model, input_text, subject, prompt_sub)



In [36]:
print(output)
print(conf)

B
0.4174620807170868


Test with zero-shot prompting.

In [37]:
zs_prompt = '''
    At breakfast, lunch, and dinner, Joe randomly chooses with equal probabilities either an apple, an orange, or a banana to eat. On a given day, what is the probability that Joe will eat at least two different kinds of fruit?
    A. \frac{7}{9}
    B. \frac{8}{9}
    C. \frac{5}{9}
    D. \frac{9}{11}
    Answer:
'''

In [38]:
output, _, conf = inference(tokenizer, model, zs_prompt, subject, prompt_data=[])
print(output)
print(conf)

C
0.40220537781715393
