## Tutorial: Exploring Large Langauge Models

**Note: This tutorial is graded. Please complete the exercises and turn it under Canvas->Files>-Week13**

Also Note: IN colab, enable GPU before running this code.

## Overview:
Today, we'll delve into the fascinating world of large language models, specifically focusing on loading and utilizing instruction-tuned models. Our main language model for this session is Microsoft's `Phi2`, a remarkable language model boasting 2.7 billion parameters. Phi2 stands out for its exceptional reasoning and language understanding capabilities, setting the bar high among base models with less than 13 billion parameters.

You can learn more about the `Phi2` model by visiting this link: https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/

Phi-2 has been made part of the Huggingface Ecosystem (https://huggingface.co/microsoft/phi-2) and can be loaded and used through the `transformers` library.

## Loading Phi2 model

Since Phi2 is already available as a part of Huggingface platform, we can load it using the transformers library in similar manner as we loaded BERT and other models in previous labs.

We will also need the `accelerate` library for efficient model loading and data processing.

In [6]:
%pip install accelerate
%pip install --upgrade transformers


Collecting transformers
  Downloading transformers-4.39.3-py3-none-any.whl (8.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.8/8.8 MB[0m [31m61.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.38.2
    Uninstalling transformers-4.38.2:
      Successfully uninstalled transformers-4.38.2
Successfully installed transformers-4.39.3


In [1]:
from transformers import AutoTokenizer, PhiForCausalLM
import torch

# load model tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/phi-2",
    trust_remote_code = True
)

# load model
model = PhiForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype = "auto",
    device_map = "auto",
    trust_remote_code = True
)
model.eval()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

PhiForCausalLM(
  (model): PhiModel(
    (embed_tokens): Embedding(51200, 2560)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x PhiDecoderLayer(
        (self_attn): PhiSdpaAttention(
          (q_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (k_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (v_proj): Linear(in_features=2560, out_features=2560, bias=True)
          (dense): Linear(in_features=2560, out_features=2560, bias=True)
          (rotary_emb): PhiRotaryEmbedding()
        )
        (mlp): PhiMLP(
          (activation_fn): NewGELUActivation()
          (fc1): Linear(in_features=2560, out_features=10240, bias=True)
          (fc2): Linear(in_features=10240, out_features=2560, bias=True)
        )
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (resid_dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (final_layernorm): LayerNorm((256

## Preparing the prompt



In [2]:
prompt = """Write a short summary of the main idea and the key points of the following paragraph:

Input: The Mount Rushmore National Memorial is a national memorial centered on a colossal
sculpture carved into the granite face of Mount Rushmore
in the Black Hills near Keystone, South Dakota, United States.
Sculptor Gutzon Borglum designed the sculpture, called Shrine of Democracy, and oversaw
the project's execution from 1927 to 1941 with the help of his son, Lincoln Borglum.
The sculpture features the 60-foot-tall (18 m) heads of four
United States presidents: George Washington, Thomas Jefferson,
Theodore Roosevelt, and Abraham Lincoln, chosen to represent the nation's birth, growth,
development and preservation, respectively.
Mount Rushmore attracts more than two million visitors annually
to the memorial park which covers 1,278 acres (2.00 sq mi; 5.17 km2).
The mountain's elevation is 5,725 feet (1,745 m) above sea level.

"""


In [3]:
token_ids = tokenizer.encode(prompt, add_special_tokens=False ,return_tensors="pt")
output_ids = model.generate(
      token_ids.to(model.device),
      max_new_tokens=100,
    )

output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Output: The paragraph is about the Mount Rushmore National Memorial, a national monument that honors four U.S. presidents with a massive sculpture carved into a mountain. The paragraph gives some background information on the sculptor, the design, and the location of the memorial, as well as some statistics on its popularity and size.
<|endoftext|>


## Exploring Chain of Throught (CoT) for in context learning:
Chain of Thought (CoT) in large language models (LLMs) refers to a strategic approach in crafting text prompts to guide the model through a sequence of logical steps to accomplish a task.

By chaining together these sequential steps, users can leverage the LLM's capabilities more effectively to generate desired outputs. We also avoid expensive fine-tuning of models to teach them how to perform certain tasks.

Let's try to do Part of Speech (PoS) tagging with LLMs. We will try to PoS tag sentence with and without CoT.

In [13]:
prompt = """Find part-of-speech tags for each token in the input sentence:

The quick brown fox jumped over the lazy dog"

"""

In [14]:
token_ids = tokenizer.encode(prompt, add_special_tokens=False ,return_tensors="pt")
output_ids = model.generate(
      token_ids.to(model.device),
      max_new_tokens=100,
    )

output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



```python
import nltk

# Define input sentence
sentence = "The quick brown fox jumped over the lazy dog"

# Tokenize sentence
tokens = nltk.word_tokenize(sentence)

# Find part-of-speech tags for each token
pos_tags = nltk.pos_tag(tokens)

# Print part-of-speech tags
print(pos_tags


As we can see, the output is a piece of broken python code and is not the expected output. Let's try to guide the model to do better.

In [15]:
cot_prompt = """Find part-of-speech tags for each token in the input sentence:

Input: "The quick brown fox jumped over the lazy dog"

Example: Input: "Marry had a little lamb"
Output: [("Marry","NOUN"), ("had", "VERB"), ("a", "ARTICLE"), ("little", "ADJECTIVE"), ("lamb", "NOUN")]

Output:
"""

In [16]:
token_ids = tokenizer.encode(cot_prompt, add_special_tokens=False ,return_tensors="pt")
output_ids = model.generate(
      token_ids.to(model.device),
      max_new_tokens=100,
    )

output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[("The", "DET"), ("quick", "ADJECTIVE"), ("brown", "ADJECTIVE"), ("fox", "NOUN"), ("jumped", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJECTIVE"), ("dog", "NOUN")]

Solution:

```python
def get_pos_tags(sentence):
    tokens = nltk.


Better? Perhaps. But we need to tell the system not to output any python code.

In [18]:

modified_cot_prompt = """Find part-of-speech tags for each token in the input sentence. Do not print any Python code:

Input: "The quick brown fox jumped over the lazy dog"

Example: Input: "Marry had a little lamb"
Output: [("Marry","NOUN"), ("had", "VERB"), ("a", "ARTICLE"), ("little", "ADJECTIVE"), ("lamb", "NOUN")]

Output:
"""

In [19]:
token_ids = tokenizer.encode(cot_prompt, add_special_tokens=False ,return_tensors="pt")
output_ids = model.generate(
      token_ids.to(model.device),
      max_new_tokens=100,
    )

output = tokenizer.decode(output_ids[0][token_ids.size(1) :])
print(output)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[("The", "DET"), ("quick", "ADJECTIVE"), ("brown", "ADJECTIVE"), ("fox", "NOUN"), ("jumped", "VERB"), ("over", "ADP"), ("the", "DET"), ("lazy", "ADJECTIVE"), ("dog", "NOUN")]
<|endoftext|>


That's what we need. So, in-context learning helped improve model's output.

## Exercise E1: Explore different prompting strategies for Named Entity Recognition Task

1. As we have emphasized earlier, Named Entity Recognition aims at identifying proper nouns such as names of persons, organizations, geographical locations, word of art etc from the input text. NER serves as the backbone for many important information extraction related applications.

2. In this exercise, explore various prompting strategies to identify and label named entities present in the input text. Start with a simple prompt and observe model's output.

3. Apply CoT strategies similary to the PoS example above. Do you see any difference in results? Communicate your observations properly.

