<a href="https://colab.research.google.com/github/AmirJlr/LLMs/blob/master/04_in_context_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**In-context learning (ICL)** lets large language models (LLMs) perform new tasks without any additional training or weight updates. Instead, the model picks up the task from the instructions and examples included in the prompt itself, much as a student follows a teacher's worked examples. The adaptation is task-specific and transient: it applies only while those examples are in the prompt and does not carry over to later prompts.


## How In-Context Learning Works:


- #### Task-Specific Learning:
ICL enables LLMs to learn new tasks on the fly, rather than requiring extensive retraining for each new task.


- #### Prompt-Based Learning:
The model learns from examples included in the prompt, not from a separate training dataset.


- #### Zero-Shot/Few-Shot Learning:
ICL covers both zero-shot prompting, where the prompt contains only instructions, and few-shot prompting, where the prompt also contains a handful of worked examples.


- #### Example-Driven Learning:
The key is to provide clear and diverse examples that demonstrate the desired task and format within the prompt.


- #### Not Explicitly Finetuned:
Unlike traditional finetuning where model parameters are updated, ICL relies on the model's pre-trained knowledge and the examples provided in the prompt.
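
The points above can be sketched as plain string construction. The helper below is a minimal illustration and not part of the notebook: the `build_prompt` name, the `Input:`/`Output:` layout, and the date-formatting task are all assumptions chosen for readability. With no demonstrations it produces a zero-shot prompt; with a few it produces a few-shot prompt. In both cases the model's weights are untouched.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str,
                 query: str,
                 demos: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble an ICL prompt: instruction, optional demonstrations, then the query."""
    parts = [instruction.strip(), ""]
    for demo_input, demo_output in demos:
        parts += [f"Input: {demo_input}", f"Output: {demo_output}", ""]
    # The last block is left open so the model completes the "Output:" line.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


# Hypothetical task and demonstrations, used only for illustration.
instruction = "Rewrite the date in YYYY-MM-DD format."
demos = [
    ("March 5, 2021", "2021-03-05"),
    ("July 14, 1999", "1999-07-14"),
]

zero_shot = build_prompt(instruction, "October 31, 2023")        # instructions only
few_shot = build_prompt(instruction, "October 31, 2023", demos)  # instructions + examples
print(few_shot)
```

Either string can be passed to a text-generation model as-is; only the prompt changes between the zero-shot and few-shot settings.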


## Why In-Context Learning is Important:

- #### Flexibility:
ICL allows LLMs to adapt to new tasks without requiring extensive retraining, making them more versatile.


- #### Cost-Effective:
ICL needs only a handful of examples rather than a large labeled dataset, and no training run, making it a cost-effective way to adapt and deploy LLMs for new tasks.


- #### Rapid Prototyping:
ICL allows for quick experimentation with different tasks and prompts, enabling rapid prototyping of new applications.


## Examples of In-Context Learning:


- #### Sentiment Analysis:
Providing example sentences and their corresponding sentiment labels in the prompt allows the model to perform sentiment analysis on new text.


- #### Question Answering:
Including examples of questions and answers based on a specific text allows the model to answer new questions about that text.


- #### Translation:
Providing examples of translated sentences in the prompt allows the model to translate new sentences.


In essence, in-context learning empowers LLMs to learn and perform new tasks by leveraging their pre-trained knowledge and the examples provided in the prompt.
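
To make the task examples above concrete, the sketch below renders a sentiment prompt and a translation prompt from the same template. Everything in it is illustrative: the `few_shot_prompt` helper, the field tags, and the demonstration pairs are assumptions, not taken from the notebook. The point is that swapping the demonstrations is enough to switch the task.

```python
def few_shot_prompt(demos, query, in_tag, out_tag):
    """Lay out (input, output) demonstrations followed by an open-ended query."""
    lines = []
    for source, target in demos:
        lines.append(f"{in_tag}: {source}")
        lines.append(f"{out_tag}: {target}")
    lines.append(f"{in_tag}: {query}")
    lines.append(f"{out_tag}:")  # left open for the model to complete
    return "\n".join(lines)


# Hypothetical demonstrations for two of the tasks listed above.
tasks = {
    "sentiment": {
        "in_tag": "Review",
        "out_tag": "Sentiment",
        "demos": [("The battery lasts all day.", "positive"),
                  ("The screen cracked within a week.", "negative")],
        "query": "Shipping was slow and the box was damaged.",
    },
    "translation": {
        "in_tag": "English",
        "out_tag": "French",
        "demos": [("Good morning.", "Bonjour."),
                  ("Thank you very much.", "Merci beaucoup.")],
        "query": "Where is the station?",
    },
}

prompts = {
    name: few_shot_prompt(t["demos"], t["query"], t["in_tag"], t["out_tag"])
    for name, t in tasks.items()
}
for name, prompt in prompts.items():
    print(f"--- {name} ---\n{prompt}\n")
```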

In [1]:
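import torch
from transformers import AutoTokenizer
# Import the factory under an alias so the `pipeline` variable below does not shadow it.
from transformers import pipeline as hf_pipeline

# model_id = "togethercomputer/Llama-2-7B-32K-Instruct"
model_id = "NousResearch/Llama-2-7b-chat-hf"


# Load the tokenizer, then build a text-generation pipeline in half precision.
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = hf_pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,  # fp16 weights to reduce GPU memory use
    device_map="auto",          # place the model on the available device(s)
)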

Device set to use cuda:0
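
The cells that follow write the Llama-2 chat prompt by hand. As a hedged sketch, the same layout can be produced by a small helper; `build_llama2_prompt` is a hypothetical name, and the whitespace simply mirrors the templates used in this notebook rather than any official specification.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system and a user message in the [INST] / <<SYS>> layout used in this notebook."""
    return (
        "[INST] <<SYS>>\n"
        f"{system.strip()}\n"
        "<</SYS>>\n\n"
        f"{user.strip()}\n"
        "[/INST]"
    )


# Illustrative messages; the question is a placeholder, not from the notebook.
prompt = build_llama2_prompt(
    system="You are a helpful, respectful and honest assistant. Answer exactly in few words.",
    user="Answer the question below:\n\nWhat is 2 + 2?\n\nReturn final results in numeric format.",
)
print(prompt)
```

Tokenizers in `transformers` also expose `apply_chat_template`, which may be preferable when a checkpoint ships its own chat template; whether this checkpoint does has not been verified here.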


In [4]:
prompt_template = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Answer exactly in few words.
<</SYS>>

Answer the question below:

Janet's ducks lay 16 eggs per day.
She eats three for breakfast every morning and bakes muffins for her friends every day with four.
She sells the remainder at the farmers' market daily for $2.5 per fresh duck egg.
How much in dollars does she make every day at the farmers' market?

Return final results in numeric format.
[/INST]
""".strip()


# Sample a single completion for the prompt.
sequences = pipeline(
    prompt_template,
    do_sample=True,
    top_k=10,
    temperature=0.7,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,  # limits the completion only; the prompt is not counted
)

# `generated_text` contains the prompt followed by the model's completion.
for seq in sequences:
    print(f"Result: {seq['generated_text']}")



Result: [INST] <<SYS>>
You are a helpful, respectful and honest assistant. Answer exactly in few words.
<</SYS>>

Answer the question below:

Janet's ducks lay 16 eggs per day. 
She eats three for breakfast every morning and bakes muffins for her friends every day with four.
She sells the remainder at the farmers' market daily for $2.5 per fresh duck egg. 
How much in dollars does she make every day at the farmers' market?

Return final results in numeric format.
[/INST]  Janet makes $40 per day at the farmers' market.
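
The answer above can be checked by hand. The short calculation below follows the quantities stated in the question (the variable names are illustrative) and gives $22.50, so the $40 returned for the direct prompt is incorrect.

```python
eggs_laid = 16       # eggs per day
eggs_eaten = 3       # eaten for breakfast
eggs_baked = 4       # used for muffins
price_per_egg = 2.5  # dollars

eggs_sold = eggs_laid - eggs_eaten - eggs_baked
daily_revenue = eggs_sold * price_per_egg
print(eggs_sold, daily_revenue)  # 9 22.5
```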


### Add `Let's think step by step then return final results in numeric format.` to the prompt => Zero-Shot chain-of-thought prompting

In [5]:
prompt_template = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Answer exactly in few words.
<</SYS>>

Answer the question below:

Janet's ducks lay 16 eggs per day.
She eats three for breakfast every morning and bakes muffins for her friends every day with four.
She sells the remainder at the farmers' market daily for $2.5 per fresh duck egg.
How much in dollars does she make every day at the farmers' market?

Let's think step by step then return final results in numeric format.
[/INST]
""".strip()


# Sample a single completion for the step-by-step prompt.
sequences = pipeline(
    prompt_template,
    do_sample=True,
    top_k=10,
    temperature=0.7,
    top_p=0.95,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=256,  # limits the completion only; the prompt is not counted
)

# `generated_text` contains the prompt followed by the model's completion.
for seq in sequences:
    print(f"Result: {seq['generated_text']}")



Result: [INST] <<SYS>>
You are a helpful, respectful and honest assistant. Answer exactly in few words.
<</SYS>>

Answer the question below:

Janet's ducks lay 16 eggs per day. 
She eats three for breakfast every morning and bakes muffins for her friends every day with four. 
She sells the remainder at the farmers' market daily for $2.5 per fresh duck egg. 
How much in dollars does she make every day at the farmers' market?

Let's think step by step then return final results in numeric format.
[/INST]  Sure, I'd be happy to help! Here's the calculation:

Janet's ducks lay 16 eggs per day.
She eats 3 eggs for breakfast every morning, so 16 - 3 = 13 eggs remain.
She bakes muffins with 4 eggs, so 13 - 4 = 9 eggs remain.
She sells the remaining eggs at the farmers' market for $2.5 per egg, so 9 eggs x $2.5 = $22.50 per day.

Therefore, Janet makes $22.50 per day at the farmers' market.
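
Because the pipeline echoes the prompt before the completion, comparing runs is easier with a small post-processing step. The sketch below is one possible approach, not part of the notebook: `extract_final_number` is a hypothetical helper that drops everything up to the last `[/INST]` marker and returns the last number in the completion, on the assumption that the model states its final result last.

```python
import re
from typing import Optional


def extract_final_number(generated_text: str,
                         end_of_prompt: str = "[/INST]") -> Optional[float]:
    """Return the last number in the completion, or None if there is none."""
    # Keep only the text after the last end-of-prompt marker.
    completion = generated_text.rsplit(end_of_prompt, 1)[-1]
    # Match integers or decimals, allowing thousands separators such as 1,000.
    numbers = re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?", completion)
    if not numbers:
        return None
    return float(numbers[-1].replace(",", ""))


# Shortened stand-in for the pipeline output shown above.
sample = (
    "[INST] Janet's ducks lay 16 eggs per day. [/INST]  "
    "Therefore, Janet makes $22.50 per day at the farmers' market."
)
print(extract_final_number(sample))  # 22.5
```

Passing `return_full_text=False` to the text-generation pipeline is an alternative way to drop the echoed prompt before parsing.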
