# 3. More about Prompting

Welcome to the Third Lesson of the NNLG Tutorial!

In this session, we will learn about:

- Integrating previous queries

- Chain-of-Thought

- Few-Shot

Let's start by loading the model ([`HuggingFaceTB/SmolLM2-1.7B-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct)) and the trivia dataset ([`mandarjoshi/trivia_qa`](https://huggingface.co/datasets/mandarjoshi/trivia_qa)) from the previous lesson.

In [7]:
# Install Transformers Library
! pip install transformers --quiet

# Import Pipeline for the LLM and Pytorch to find the best available device
from transformers import pipeline
import torch

# Find the best available device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # Use GPU if available, otherwise use CPU

# Load the model
model_identifier = 'HuggingFaceTB/SmolLM2-1.7B-Instruct'
llm = pipeline(model=model_identifier, device=device)


In [8]:
# Install Datasets library
! pip install datasets --quiet

# Import load_dataset and Dataset
from datasets import load_dataset, Dataset

# Instantiate the Trivia QA dataset in streaming mode
dataset = load_dataset(
    'mandarjoshi/trivia_qa',
    'rc',
    split='train',
    streaming=True
)

# Preprocess the dataset
def preprocess_trivia_qa(sample):
    wiki_context = []
    for title, context in zip(sample['entity_pages']['title'], sample['entity_pages']['wiki_context']):
        wiki_context.append((title, context))
    new_sample = {
        'wiki_context':wiki_context,
        'answer':sample['answer']['value']
    }
    return new_sample

dataset = dataset.map(
    preprocess_trivia_qa,
    remove_columns=[
        'question_id',
        'question_source',
        'entity_pages',
        'search_results'
    ]
)

# Take the first 8 elements of the dataset
dataset = dataset.take(8)

# Convert from IterableDataset to Dataset
dataset = Dataset.from_generator(lambda: iter(dataset), features=dataset.features)

dataset[0]

{'question': 'Which American-born Sinclair won the Nobel Prize for Literature in 1930?',
 'answer': 'Sinclair Lewis',
 'wiki_context': []}
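Because the dataset is streamed, it can be hard to see what `preprocess_trivia_qa` does to a raw record. The sketch below applies equivalent logic (written with `zip` directly) to a small hand-made record. The field values are hypothetical placeholders that only mimic the nested layout used above: page titles and Wikipedia texts under `entity_pages`, and the gold string under `answer['value']`.

```python
def preprocess_trivia_qa(sample):
    # Pair each Wikipedia page title with its text
    wiki_context = list(zip(sample['entity_pages']['title'],
                            sample['entity_pages']['wiki_context']))
    return {'wiki_context': wiki_context, 'answer': sample['answer']['value']}

# Hypothetical record that only mimics the nested layout of the raw data
raw_sample = {
    'question': 'Which American-born Sinclair won the Nobel Prize for Literature in 1930?',
    'entity_pages': {'title': ['Page A', 'Page B'],
                     'wiki_context': ['Text of page A...', 'Text of page B...']},
    'answer': {'value': 'Sinclair Lewis'},
}

print(preprocess_trivia_qa(raw_sample))
# {'wiki_context': [('Page A', 'Text of page A...'), ('Page B', 'Text of page B...')], 'answer': 'Sinclair Lewis'}
```

When a question has no Wikipedia pages, as in the first sample printed above, `wiki_context` is simply an empty list.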


Last time we observed that, although many of the generated answers were correct, they were very verbose. A first approach to fix this is to ask the model to answer in as few words as possible:


In [9]:
%%time
for sample in dataset:

    conversation = [
        {
            'role':'system',
            'content':'''You are a helpful assistant.'''
        },
        {
            'role':'user',
            'content':'Answer the following question using as few words as possible: '+sample['question']
        }
    ]

    prompt = llm.tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

    generation = llm(prompt, max_new_tokens = 32)

    new_text = generation[0]['generated_text'][len(prompt):]

    print('Question:  ', sample['question'])
    print('Answer:    ', sample['answer'])
    print('Generation:', new_text)
    print()

Question:   Which American-born Sinclair won the Nobel Prize for Literature in 1930?
Answer:     Sinclair Lewis
Generation: F. Scott Fitzgerald

Question:   Where in England was Dame Judi Dench born?
Answer:     York
Generation: England.

Question:   In which decade did Billboard magazine first publish and American hit chart?
Answer:     30s
Generation: 1950s

Question:   From which country did Angola achieve independence in 1975?
Answer:     Portugal
Generation: Angola

Question:   Which city does David Soul come from?
Answer:     Chicago
Generation: New York

Question:   Who won Super Bowl XX?
Answer:     Chicago Bears
Generation: Patriots

Question:   Which was the first European country to abolish capital punishment?
Answer:     Norway
Generation: Sweden.

Question:   In which country did he widespread use of ISDN begin in 1988?
Answer:     Japan
Generation: Germany

CPU times: user 1min 31s, sys: 1.52 s, total: 1min 32s
Wall time: 11.8 s


## 3.1 Integration of Previous Queries and Chain-of-Thought

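The answers are now short and concise, closer to what we expected, but they are less accurate than before: none of the eight short answers above matches the reference. By limiting the length of the answer, we reduced the accuracy of the model. One plausible explanation is that a longer answer gives the model room to restate the question and work toward the answer before committing to it, which is the intuition behind chain-of-thought prompting.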
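To put a number on this instead of eyeballing the outputs, we can count how many generations match the gold answer. The sketch below uses a simple normalized exact match (lowercase, strip punctuation and the articles a/an/the, collapse whitespace). This normalization is our own assumption, loosely modeled on the one in the SQuAD evaluation script, not an official TriviaQA scorer. The generations are copied from the output above.

```python
import re
import string

def normalize(text):
    # Lowercase, drop punctuation and the articles a/an/the, collapse whitespace
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())

def exact_match(generation, answer):
    return normalize(generation) == normalize(answer)

def accuracy(generations, answers):
    hits = sum(exact_match(g, a) for g, a in zip(generations, answers))
    return hits / len(answers)

answers = ['Sinclair Lewis', 'York', '30s', 'Portugal',
           'Chicago', 'Chicago Bears', 'Norway', 'Japan']
short_generations = ['F. Scott Fitzgerald', 'England.', '1950s', 'Angola',
                     'New York', 'Patriots', 'Sweden.', 'Germany']

print(accuracy(short_generations, answers))  # 0.0
```

Exact match is strict: "the 1930s" would not count as "30s", so treat the score as a rough lower bound. TriviaQA also provides answer aliases, which `preprocess_trivia_qa` drops; comparing against those as well would be more forgiving.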

Let's try something else: first generate the long (but more accurate) answers as we did in the previous lesson, and **then** have the model rephrase them so that they are short and concise.
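The key point is that the chat template receives the whole conversation, including the model's own first answer as an `assistant` turn, so the second request can refer back to it. A minimal sketch of that bookkeeping, using a hypothetical helper `add_follow_up` that returns a new list instead of modifying the original conversation in place:

```python
def add_follow_up(conversation, assistant_reply, follow_up):
    # Append the model's previous reply and a new user request,
    # without mutating the original conversation
    return conversation + [
        {'role': 'assistant', 'content': assistant_reply},
        {'role': 'user', 'content': follow_up},
    ]

first_turn = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Answer the following question: From which country did Angola achieve independence in 1975?'},
]
second_turn = add_follow_up(
    first_turn,
    'Angola achieved independence from Portugal in 1975.',
    'Consider the previous question and your answer, then answer the question again in as few words as possible.',
)

print([message['role'] for message in second_turn])
# ['system', 'user', 'assistant', 'user']
```

`second_turn` could then be passed to `llm.tokenizer.apply_chat_template(..., add_generation_prompt=True)` in the same way as the first turn.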

In [10]:
%%time
for sample in dataset:

    # Generate the long answer

    conversation = [
        {
            'role':'system',
            'content':'''You are a helpful assistant.'''
        },
        {
            'role':'user',
            'content':'Answer the following question: '+sample['question']
        }
    ]

    prompt = llm.tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

    generation = llm(prompt, max_new_tokens = 32)

    # Shorten the answer

    long_answer = generation[0]['generated_text'][len(prompt):]

    conversation += [
        {
            'role':'assistant',
            'content': long_answer
        },
        {
            'role':'user',
            'content':'Consider the previous question and your answer, then answer the question again in as few words as possible.'
        }
    ]

    prompt = llm.tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

    generation = llm(prompt, max_new_tokens = 32)

    short_answer = generation[0]['generated_text'][len(prompt):]

    print('Question:    ', sample['question'])
    print('Answer:      ', sample['answer'])
    print('long answer: ', long_answer)
    print('short answer:', short_answer)
    print()

Question:     Which American-born Sinclair won the Nobel Prize for Literature in 1930?
Answer:       Sinclair Lewis
long answer:  The American-born Sinclair who won the Nobel Prize for Literature in 1930 was Sinclair Lewis. He was the first American author to win the Nobel
short answer: Sinclair

Question:     Where in England was Dame Judi Dench born?
Answer:       York
long answer:  Dame Judi Dench was born in London, England.
short answer: London, England.

Question:     In which decade did Billboard magazine first publish and American hit chart?
Answer:       30s
long answer:  Billboard magazine first published and American hit chart in the 1950s.
short answer: 1950s

Question:     From which country did Angola achieve independence in 1975?
Answer:       Portugal
long answer:  Angola achieved independence from Portugal in 1975.
short answer: Portugal.

Question:     Which city does David Soul come from?
Answer:       Chicago
long answer:  David Soul is an American actor best known 

## 3.2 Chain-of-Thought and Few-shot

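That looks much better! However, we are now doing twice the computation, since every question needs two calls to the model. Maybe we can get away with doing both steps in a single generation: ask the model to answer at length first and then restate the answer in its shortest form. To make sure the model follows this format, we will use few-shot prompting, that is, we include a couple of example questions in the conversation, each followed by an answer written in the expected format.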
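Few-shot prompting here simply means inserting worked examples as earlier user/assistant turns so that the model imitates their format. The pattern can be factored into a small helper. `build_few_shot_conversation`, `qa_instruction`, and the other names are hypothetical and used only for this sketch; the two examples mirror the ones in the next cell.

```python
def build_few_shot_conversation(system_prompt, examples, question, instruction):
    # examples: list of (question, ideal_answer) pairs shown to the model
    conversation = [{'role': 'system', 'content': system_prompt}]
    for example_question, example_answer in examples:
        conversation.append({'role': 'user', 'content': instruction.format(question=example_question)})
        conversation.append({'role': 'assistant', 'content': example_answer})
    # The real question goes last, so the model answers it next
    conversation.append({'role': 'user', 'content': instruction.format(question=question)})
    return conversation

qa_instruction = 'Answer the following question: {question}\nThen answer it again in its shortest form.'
qa_examples = [
    ('From which state was the Apollo 11 mission launched?',
     'The Apollo 11 mission was launched from the state of Florida.\nShortest form: Florida.'),
    ('Who was the author of the novel Frankenstein?',
     'The author of the novel Frankenstein was Mary Shelley.\nShortest form: Mary Shelley.'),
]

few_shot_conversation = build_few_shot_conversation(
    'You are a helpful assistant.', qa_examples, 'Who won Super Bowl XX?', qa_instruction)
print(len(few_shot_conversation))  # 6
```

Keeping the examples in a list makes it easy to add, remove, or reorder them when experimenting.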

In [11]:
%%time
for sample in dataset:

    conversation = [
        {
            'role':'system',
            'content':'You are a helpful assistant.'
        },
        {
            'role':'user',
            'content':'Answer the following question: From which state was the Apollo 11 mission launched?\nThen answer it again in its shortest form.'
        },
        {
            'role':'assistant',
            'content':'The Apollo 11 mission was launched from the state of Florida.\nShortest form: Florida.'
        },
        {
            'role':'user',
            'content':'Answer the following question: Who was the author of the novel Frankenstein?\nThen answer it again in its shortest form.'
        },
        {
            'role':'assistant',
            'content':'The author of the novel Frankenstein was Mary Shelley.\nShortest form: Mary Shelley.'
        },
        {
            'role':'user',
            'content':'Answer the following question: '+sample['question']+'\nThen answer it again in its shortest form.'
        }
    ]

    prompt = llm.tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

    generation = llm(prompt, max_new_tokens = 32)

    new_text = generation[0]['generated_text'][len(prompt):]

    print('Question:  ', sample['question'])
    print('Answer:    ', sample['answer'])
    print(new_text)
    print()

Question:   Which American-born Sinclair won the Nobel Prize for Literature in 1930?
Answer:     Sinclair Lewis
The American-born Sinclair who won the Nobel Prize for Literature in 1930 was Sinclair Lewis.
Shortest form: Sinclair Lewis.

Question:   Where in England was Dame Judi Dench born?
Answer:     York
Dame Judi Dench was born in London, England.
Shortest form: London, England.

Question:   In which decade did Billboard magazine first publish and American hit chart?
Answer:     30s
Billboard magazine first published and American hit chart in the 1950s.
Shortest form: 1950s.

Question:   From which country did Angola achieve independence in 1975?
Answer:     Portugal
Angola achieved independence from Portugal in 1975.
Shortest form: Portugal.

Question:   Which city does David Soul come from?
Answer:     Chicago
David Soul comes from the city of New York.
Shortest form: New York.

Question:   Who won Super Bowl XX?
Answer:     Chicago Bears
The team that won Super Bowl XX was the 

We have tested different ways of enforcing the desired output format for this task. In the next lesson, we will see how to provide additional information to the model to overcome its limitations.
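Because the few-shot examples fix the output format, the short answer can be pulled out of a single generation with plain string handling. This is a sketch that assumes the model keeps the `Shortest form:` prefix from the examples. When it does not, for instance when the generation is cut off by `max_new_tokens`, the hypothetical helper `extract_shortest_form` returns `None`.

```python
def extract_shortest_form(generation, marker='Shortest form:'):
    # Look for the marker line taught by the few-shot examples
    for line in generation.splitlines():
        line = line.strip()
        if line.startswith(marker):
            # Drop the marker and a trailing period
            return line[len(marker):].strip().rstrip('.')
    return None  # format not followed, or output truncated

generation = ('Angola achieved independence from Portugal in 1975.\n'
              'Shortest form: Portugal.')
print(extract_shortest_form(generation))  # Portugal
```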

## 3.3 Exercise

Can you improve the performance of the LLM on your dataset from the second lesson by using any of the techniques discussed here? Give it a try!
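To tell whether a technique actually improves a translation task, it helps to have a number, and exact match is too strict for translations. Below is a simplified character n-gram F-score loosely modeled on chrF. It is an assumption for illustration, not sacrebleu's chrF implementation (which uses longer n-grams by default and should be preferred for reported results). Precision and recall are averaged over n-gram orders 1 to `max_n` and combined with an F-beta score (beta = 2 weights recall more heavily).

```python
from collections import Counter

def char_ngrams(text, n):
    # Count character n-grams after removing spaces
    text = text.replace(' ', '')
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_f_score(hypothesis, reference, max_n=3, beta=2.0):
    # Average precision and recall over n-gram orders, then combine with F-beta
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # matches clipped to the smaller count
        if hyp:
            precisions.append(overlap / sum(hyp.values()))
        if ref:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / (beta ** 2 * precision + recall)

# Reference and model translation of the first sample shown further below
target = 'الفصل الأول: فى أخذ الحذر فى الجملة.'
translation = 'فصل أول: عن الحفاظ على الحذر بشكل عام.'
print(round(char_f_score(translation, target), 3))
```

Averaging such a score over all samples gives one number to compare prompting variants.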

### Group A

- Oyetunji ABIOYE
- Mehsen AZIZI
- Mohammad AL TAKACH
- Hawawou Oumarou Tchapchet

In [37]:
# Your Code Here
from transformers import pipeline
import torch

# Find the best available device
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # Use GPU if available, otherwise use CPU

# Load the model
model_identifier = 'unsloth/Llama-3.2-3B-Instruct'
llm = pipeline(model=model_identifier, device=device)

In [38]:
from datasets import load_dataset

dataset = load_dataset(
    'ImruQays/Rasaif-Classical-Arabic-English-Parallel-texts',
    split='train',
    streaming=True
)

In [39]:
# Peek at the first streamed sample
sample = next(iter(dataset))

def nested_print(key, element, level=0):
    if isinstance(element, dict):
        print(f'{"│ "*(level)}├─{key}:')
        for k, v in element.items():
            nested_print(k, v, level+1)
    else:
        print(f'{"│ "*(level)}├─{key}: {element}')

nested_print('sample', sample)

├─sample:
│ ├─ar: وبعد، فلما كان السلطان الأعظم الملك الناصر، العالم المجاهد المرابط المتاغر، المؤيد المظفر المنصور، زين الدنيا والدين، سلطان الإسلام والمسلمين، محيى العدل فى العالمين، وارث ملك ملوك العرب والعجم والترك، ظل الله فى أرضه، القائم بسنته وفرضه
│ ├─en: To proceed: Since the great Sultan, the King, the Victor, the Sage, the Just, the Struggler, the Perseverer, the Trail-blazer, the God-supported, the Conquering, the Victorious, the Ornament of the World and of Religion, the Sultan of Islam and of the Muslims, the Rejuvenator of Justice in the Worlds, the Heir of the kingdom of the Kings of the Arabs and the Persians and the Turks, Shadow of God in His land, the Upholder of God’s sunnah and of His Ordinances.


In [40]:
# Rename the columns: English is the source text, Arabic the target
def preprocess_en_ar(sample):
    return {
        'source': sample['en'],
        'target': sample['ar'],
    }

dataset = dataset.map(
    preprocess_en_ar,
    remove_columns=['en', 'ar']
)


In [41]:
# Keep only pairs whose source and target are both 30 to 60 characters long
# (we raised the lower bound to 30 because Arabic is a more verbose language)
def filter_en_ar(sample):
    return 30 <= len(sample['source']) <= 60 and 30 <= len(sample['target']) <= 60

dataset = dataset.filter(filter_en_ar)
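Note that `len` counts characters, not words, and that both bounds are inclusive. A quick check on hand-written pairs (hypothetical placeholder strings of known length, not taken from the dataset) shows which ones survive the filter:

```python
def filter_en_ar(sample):
    # Keep a pair only if both sides are between 30 and 60 characters long
    return 30 <= len(sample['source']) <= 60 and 30 <= len(sample['target']) <= 60

# Hypothetical pairs with placeholder text of known lengths
pairs = [
    {'source': 'a' * 45, 'target': 'b' * 40},  # both within bounds -> kept
    {'source': 'a' * 20, 'target': 'b' * 40},  # source too short -> dropped
    {'source': 'a' * 45, 'target': 'b' * 61},  # target too long -> dropped
    {'source': 'a' * 30, 'target': 'b' * 60},  # bounds are inclusive -> kept
]
print([filter_en_ar(pair) for pair in pairs])  # [True, False, False, True]
```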

In [42]:
dataset = dataset.take(16)

In [43]:
from datasets import Dataset

# Materialize the streamed samples into a regular (in-memory) Dataset
dataset = Dataset.from_generator(lambda: iter(dataset), features=dataset.features)
dataset['source']

['Chapter One: about the maintenance of caution generally.',
 'Chapter One: about the choosing of the site for camping.',
 'Chapter Two: about the method of night raiding.',
 'Chapter Three: about the method of investment.',
 '“You never do anything right!” sighed Abu‘Abdullah.',
 '“So what should I do?” moaned the sheikh.',
 'Thumama got extremely upset when his house burned down.',
 'As a result, good drinking water went to waste.',
 'Now there’s just the problem of what to do with the blood.',
 'Suddenly I saw her relax and smile to herself.',
 '‘You must have realised what to do with the blood.',
 'In general, soups spare you demands for water and wine.',
 'Soup fills you up and dulls the appetite.',
 'Sip hot soup and you save on fuel bills and padded clothes.',
 'Take it from one who knows, as a piece of friendly advice.”',
 'I’d become a highway robber, a secret agent or a spy.']
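The few-shot answers in the next cell repeat the English source before giving the Arabic translation, and the model's generations follow the same pattern. To compare a generation with the reference `target`, the Arabic part can be split off after the fixed phrase. `extract_translation` is a hypothetical helper that assumes the phrase `is translated to Arabic as:` appears verbatim; if it is missing, the whole generation is returned.

```python
def extract_translation(generation, marker='is translated to Arabic as:'):
    # Keep only what follows the last occurrence of the marker phrase
    _, found, translation = generation.rpartition(marker)
    return translation.strip() if found else generation.strip()

generation = ('The following English text: Chapter One: about the maintenance of caution generally. '
              'is translated to Arabic as: فصل أول: عن الحفاظ على الحذر بشكل عام.')
print(extract_translation(generation))  # prints only the Arabic sentence
```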

In [None]:
%%time
# Few-shot examples: (English source, Arabic translation) pairs
few_shot_examples = [
    ('Pannum is bread donated charitably to prisoners and beggars.',
     'البنوم هو خبز يتم التبرع به بشكل خيري للسجناء والمتسولين.'),
    ('But then I see them dip it in the mustard.',
     'ولكن بعد ذلك أراهم يغمسونه في الخردل.'),
    ('Boy, that chicken was tough. Bring me one that’s tender!',
     'يا صبي، كان الدجاج قاسيًا. أحضر لي واحدًا طريًا!'),
    ('I praise Him as befits His honor and sublime glory.',
     'أثني عليه كما يليق بعظمته ومجده السامي.'),
    ('This vizir was a man of great shrewdness and abilities.',
     'كان هذا الوزير رجلاً ذا ذكاء وقدرات عظيمة.'),
    ('This Yazid was the ancestor of al-Wazir al-Muhallabi.',
     'كان هذا اليزيد سلف الوزير المهلبي.'),
    ('He dwell for some time at Abadan and also at Basra.',
     'عاش لبعض الوقت في آبادان وأيضًا في البصرة.'),
    ('This verse forms part of a long qasida.',
     'يشكل هذا البيت جزءًا من قصيدة طويلة.'),
]

system_prompt = ('You are an expert translator specializing in English to Arabic translation. '
                 "If you can't find a translation, you should say: I do not know the answer.")
instruction = 'You must translate the following English text to Arabic only: '

for sample in dataset:

    conversation = [{'role': 'system', 'content': system_prompt}]

    # Show each example as a user request followed by the expected assistant reply
    for english, arabic in few_shot_examples:
        conversation += [
            {'role': 'user', 'content': instruction + english},
            {'role': 'assistant',
             'content': 'The following English text: ' + english + ' is translated to Arabic as: ' + arabic},
        ]

    # The sentence we actually want translated comes last
    conversation.append({'role': 'user', 'content': instruction + sample['source']})

    prompt = llm.tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)

    generation = llm(prompt, max_new_tokens = 128)

    new_text = generation[0]['generated_text'][len(prompt):]

    print('Source:    ', sample['source'])
    print('Target:    ', sample['target'])
    print('Generation:', new_text)
    print()

Source:   Chapter One: about the maintenance of caution generally.
Target:     الفصل الأول: فى أخذ الحذر فى الجملة.
Generation: The following English text: Chapter One: about the maintenance of caution generally. is translated to Arabic as: فصل أول: عن الحفاظ على الحذر بشكل عام.

Source:   Chapter One: about the choosing of the site for camping.
Target:     الفصل الأول: فى اختيار موضع المنزل.
Generation: The following English text: Chapter One: about the choosing of the site for camping. is translated to Arabic as:kapitول أحد: عن اختيار المكان للاصطلاح.

Source:   Chapter Two: about the method of night raiding.
Target:     الفصل الثانى: فى كيفية البيات.
Generation: The following English text: Chapter Two: about the method of night raiding. is translated to Arabic as: chapter الثانية: عن طريقة الغنaim

Source:   Chapter Three: about the method of investment.
Target:     الفصل الثالث: فى كيفية الحصار.
Generation: The following English text: Chapter Three: about the method of investment. 