<h1>Chapter 3 - Looking Inside Transformer LLMs</h1>
<i>An extensive look into the transformer architecture of generative LLMs</i>

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter03/Chapter%203%20-%20Looking%20Inside%20LLMs.ipynb)

---

This notebook is for Chapter 3 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **run** the following code blocks to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [1]:
!pip uninstall transformers --yes

Found existing installation: transformers 4.41.2
Uninstalling transformers-4.41.2:
  Successfully uninstalled transformers-4.41.2


In [2]:
%%capture
!pip install transformers==4.41.2 "accelerate>=0.31.0"

In [3]:
!pip install opencv-python



# Loading the LLM

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=200,
    do_sample=False,
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



# The Inputs and Outputs of a Trained Transformer LLM


In [11]:
instruction_prompt = "Write an email apologizing to Sarah for the gardening mishap. Explain how it happened."
format_prompt = " Make sure the format is a bullet-point format"
tone_prompt = "Your tone should be very informal as Sarah is a good friend. Street wise and informal tone."

prompt = instruction_prompt + format_prompt + tone_prompt

output = generator(prompt)

email_text = output[0]['generated_text']
print(email_text)



Subject: Oopsie Daisy! 🌼

Hey Sarah!

I've gotta spill the beans about the gardening fiasco that happened last weekend. I'm so sorry, but here's the lowdown:

- I was all set to surprise you with a freshly planted rose bush in your garden.
- I got a bit carried away and accidentally dug up your favorite tulip bulbs instead.
- I know how much those tulips meant to you, and I feel like a total klutz.

I've already ordered some new bulbs and I'm planning to replant them as soon as they arrive. I promise to be extra careful this time and make it up to you.

I hope you can forgive my gardening blunder. I'm really sorry for the mix-up and the disappointment it caused
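The pipeline produces this email one token at a time. After each step, the new token is appended to the input and the model runs again. Because `do_sample=False`, every step keeps the highest-scoring token. Generation stops at an end-of-sequence token or after `max_new_tokens=200`, which is likely why the email above ends mid-sentence.

Here is a minimal sketch of that loop. It assumes a hypothetical lookup table, `NEXT_TOKEN_SCORES`, in place of the model. A real LLM scores the next token using the entire sequence so far, not just the last token.

```python
# Hypothetical stand-in for the model: last token -> scores for candidate next tokens.
# (A real LLM conditions on the whole sequence, not only the last token.)
NEXT_TOKEN_SCORES = {
    "The": {"capital": 3.0, "cat": 1.0},
    "capital": {"of": 4.0},
    "of": {"France": 3.0, "Spain": 1.0},
    "France": {"is": 4.0},
    "is": {"Paris": 5.0, "nice": 1.0},
    "Paris": {".": 3.0},
    ".": {"</s>": 5.0},
}


def greedy_generate(prompt_tokens, max_new_tokens, eos="</s>"):
    """Append the highest-scoring next token until EOS or the token budget runs out."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1], {})
        if not scores:
            break
        next_token = max(scores, key=scores.get)  # greedy choice, like do_sample=False
        if next_token == eos:
            break
        tokens.append(next_token)
    # Return only the new tokens, similar in spirit to return_full_text=False
    return tokens[len(prompt_tokens):]


print(greedy_generate(["The"], max_new_tokens=3))
print(greedy_generate(["The"], max_new_tokens=10))
```

With a budget of 3, the call returns `['capital', 'of', 'France']`, cut off mid-sentence. A budget of 10 reaches the end-of-sequence token after `['capital', 'of', 'France', 'is', 'Paris', '.']`.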


In [8]:
# Inspect the model architecture
print(model)

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
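The printed shapes are enough to count the model's parameters by hand. The sketch below does that arithmetic using only the dimensions shown above. It makes two assumptions:

- `lm_head` has its own weights rather than sharing them with `embed_tokens`.
- 16-bit weights take 2 bytes each.

```python
# Dimensions copied from the print(model) output above (Phi-3-mini-4k-instruct)
vocab_size = 32064
hidden_size = 3072
intermediate_size = 8192  # down_proj in_features; gate_up_proj fuses two of these (16384)
num_layers = 32


def linear_params(in_features, out_features, bias=False):
    """Number of parameters in an nn.Linear layer (Phi-3 prints bias=False everywhere)."""
    return in_features * out_features + (out_features if bias else 0)


# One Phi3DecoderLayer
attention = (
    linear_params(hidden_size, 3 * hidden_size)  # qkv_proj: fused Q, K, V -> 9216
    + linear_params(hidden_size, hidden_size)  # o_proj
)
mlp = (
    linear_params(hidden_size, 2 * intermediate_size)  # gate_up_proj -> 16384
    + linear_params(intermediate_size, hidden_size)  # down_proj
)
norms = 2 * hidden_size  # input_layernorm + post_attention_layernorm, one weight vector each
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size  # embed_tokens
lm_head = linear_params(hidden_size, vocab_size)  # assumption: not tied to embed_tokens
final_norm = hidden_size  # the final Phi3RMSNorm

total_params = embeddings + num_layers * per_layer + final_norm + lm_head

print(f"per layer: {per_layer:,}")
print(f"total:     {total_params:,}")
# Rough weight memory, assuming 2 bytes per parameter (16-bit weights)
print(f"~{total_params * 2 / 1024**3:.2f} GiB of weights in 16-bit precision")
```

This comes to 3,821,079,552 parameters, consistent with the roughly 3.8 billion usually quoted for Phi-3-mini. With the model loaded, `sum(p.numel() for p in model.parameters())` should give a figure to compare against.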

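The printout also names the pieces inside each `Phi3DecoderLayer`. Note that `gate_up_proj` outputs 16,384 values while `down_proj` expects 8,192. This suggests the fused projection is split into a "gate" half and an "up" half.

Below is a pure-Python sketch of an RMSNorm and a SiLU-gated MLP with that layout, using tiny made-up weights. Two details are assumptions based on our reading of the Hugging Face Phi-3 code, not a copy of it: the split order (gate first) and the default `eps=1e-5`.

```python
import math


def silu(x):
    """SiLU (also called swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def rms_norm(x, weight, eps=1e-5):
    """Rescale x to unit root-mean-square, then apply a learned per-dimension weight."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def matvec(weight, x):
    """Bias-free linear layer; weight is stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def gated_mlp(x, gate_up_weight, down_weight):
    """Fused gate/up projection, split in half, SiLU-gated, then projected back down."""
    gate_up = matvec(gate_up_weight, x)  # length 2 * intermediate
    half = len(gate_up) // 2
    gate, up = gate_up[:half], gate_up[half:]  # assumed order: gate first
    hidden = [u * silu(g) for g, u in zip(gate, up)]
    return matvec(down_weight, hidden)  # back to the hidden size


# Toy sizes: hidden = 2, intermediate = 2 (Phi-3-mini uses 3072 and 8192)
gate_up_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # 4 x 2
down_weight = [[1.0, 0.0], [0.0, 1.0]]  # 2 x 2

x = [1.0, 0.0]
normed = rms_norm(x, weight=[1.0, 1.0])
# In the decoder layer, the MLP receives the normalized hidden state
print(gated_mlp(normed, gate_up_weight, down_weight))
```

Fusing gate and up into one `Linear` means a single matrix multiply produces both halves. That is presumably why the printout shows one `gate_up_proj` instead of two layers.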
# Choosing a single token from the probability distribution (sampling / decoding)

In [8]:
prompt = "The capital of France is"

# Tokenize the input prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Move the input tokens to the GPU
input_ids = input_ids.to("cuda")

# Get the output of the model before the lm_head
model_output = model.model(input_ids)

# Get the output of the lm_head
lm_head_output = model.lm_head(model_output[0])

In [9]:
token_id = lm_head_output[0,-1].argmax(-1)
tokenizer.decode(token_id)
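`argmax(-1)` picks the single highest-scoring entry among the 32,064 scores at the last position. This is greedy decoding, which is what `do_sample=False` asks the pipeline to do. Sampling instead turns the scores into probabilities with a softmax and draws from them, optionally reshaped by a temperature. Here is a sketch of both, using a hypothetical four-token vocabulary and made-up logits:

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Index of the highest-scoring token, like argmax(-1) above."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=random):
    """Draw a token index at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical tiny vocabulary and made-up logits for the last position
vocab = ["Paris", "Lyon", "the", "a"]
logits = [5.0, 2.0, 1.0, 0.5]

print(vocab[greedy_pick(logits)])  # always "Paris"
print([round(p, 3) for p in softmax(logits)])
print(vocab[sample_pick(logits, rng=random.Random(0))])  # seeded; other seeds may differ
```

Greedy decoding is deterministic, which is why the notebook's outputs are reproducible. Sampling trades that reproducibility for more varied text.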

'Paris'

In [10]:
model_output[0].shape

torch.Size([1, 5, 3072])

In [11]:
lm_head_output.shape

torch.Size([1, 5, 32064])
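The two shapes show what `lm_head` does:

- `model.model` returns one 3,072-dimensional vector per input token (`[batch, seq_len, hidden]` = `[1, 5, 3072]`).
- `lm_head` projects each of those vectors to a score for every one of the 32,064 vocabulary entries.

Only the last position's scores are needed to predict the next token, hence `lm_head_output[0, -1]`. Here is a toy sketch with small, made-up sizes. It stores the weight as `[out_features][in_features]`, the same layout as PyTorch's `nn.Linear.weight`.

```python
def shape(tensor):
    """Shape of a nested-list 'tensor', e.g. [batch, seq_len, dim]."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims


def linear(hidden_states, weight):
    """Bias-free linear layer applied to the last dimension of [batch][seq_len][in]."""
    return [
        [[sum(w * h for w, h in zip(row, vector)) for row in weight] for vector in sequence]
        for sequence in hidden_states
    ]


# Toy sizes (the real model: batch 1, 5 tokens, hidden 3072, vocab 32064)
batch, seq_len, hidden_dim, vocab_size = 1, 5, 3, 4

hidden_states = [
    [[0.1 * (t + d) for d in range(hidden_dim)] for t in range(seq_len)] for _ in range(batch)
]
lm_head_weight = [[0.01 * (v + 1) * (d + 1) for d in range(hidden_dim)] for v in range(vocab_size)]

logits = linear(hidden_states, lm_head_weight)
print(shape(hidden_states))  # [1, 5, 3]
print(shape(logits))  # [1, 5, 4]

last_position_logits = logits[0][-1]  # like lm_head_output[0, -1]
print(len(last_position_logits))  # 4
```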

## Exercise: Chain of Thought Prompting

In [16]:
question_prompt = "How many letters 'r' in the word strawberry?"
instruction_prompt = "Simply ask the question. Keep your answer as short as possible."

prompt = question_prompt + instruction_prompt

output = generator(prompt)

answer_text = output[0]['generated_text']

print(answer_text)



The word "strawberry" contains 2 letters 'r'.


In [17]:
question_prompt = "How many letters 'r' in the word strawberry?"
chain_of_thought_prompt = "Think good and deep before you answer. Help me think step by step."

prompt = question_prompt + chain_of_thought_prompt

output = generator(prompt)

answer_text = output[0]['generated_text']

print(answer_text)



To find the number of 'r's in the word "strawberry," we can follow these steps:

1. Write down the word: strawberry
2. Identify the letter 'r' in the word.
3. Count the number of times 'r' appears in the word.

Now, let's perform the steps:

1. The word is "strawberry."
2. The letter 'r' appears in the word.
3. Counting the 'r's, we find that there are three 'r's in "strawberry."

So, there are three letters 'r' in the word "strawberry."
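With the step-by-step prompt, the model reaches the correct count of three, while the short answer said two. A frequently cited explanation for slips like the first answer is that the model processes tokens, not individual letters. For character-level questions like this, a few lines of ordinary Python provide the ground truth to check model answers against:

```python
word = "strawberry"

# Count the letter directly and record where each occurrence sits
count = word.count("r")
positions = [i for i, ch in enumerate(word) if ch == "r"]

print(count)  # 3
print(positions)  # [2, 7, 8]
```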
