<a href="https://colab.research.google.com/github/bhaskarfx/TextAnalytics/blob/master/LLM_HandsOn_Part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

💡 **NOTE**: We will use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > GPU type T4**.

---

# **Large Language Models (LLM)**

# Overview:


## 1.   Load the model onto the GPU for faster inference
## 2.   Comparing Trained LLM Tokenizers
## 3.   Contextualized Word Embeddings From a Language Model (Like BERT)
## 4.   Looking Inside Transformer LLMs



https://github.com/bhaskarfx/TextAnalytics/blob/master/LLM_HandsOn.ipynb

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:

In [None]:
# %%capture
# !pip install "transformers>=4.40.1" "accelerate>=0.27.2"

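# Step 1: Load the model onto the GPU for faster inference
Note that we load the model and tokenizer separately (although that isn't always necessary).

There are two main types of language modeling:
* **causal** language modeling, where the model predicts the next token from the tokens before it (GPT-style models such as Phi-3), and
* **masked** language modeling, where the model predicts hidden tokens using the context on both sides (BERT-style models).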
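
To make the difference concrete, here is a minimal sketch in plain Python of what a training example looks like under each objective. The helper names `causal_lm_examples` and `masked_lm_example` are made up for this illustration and are not part of any library.

```python
def causal_lm_examples(tokens):
    """Pair every prefix with the token that follows it (next-token prediction)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, position, mask_token="[MASK]"):
    """Hide one token; the model sees the context on both sides of the gap."""
    masked = list(tokens)
    target = masked[position]
    masked[position] = mask_token
    return masked, target


tokens = ["the", "capital", "of", "France", "is", "Paris"]

# Causal: the context only ever contains tokens to the left of the target
for context, target in causal_lm_examples(tokens):
    print(context, "->", target)

# Masked: the context contains tokens on both sides of the hidden one
print(masked_lm_example(tokens, position=3))
```

Real training code works on token IDs and masks many positions at once; the sketch only shows which tokens are visible when a prediction is made.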

### Auto Classes
In many cases, the architecture you want can be inferred from the name or path of the pretrained model you pass to **from_pretrained()**. The Auto classes do this for you: given the name or path of the pretrained weights, config, or vocabulary, they automatically retrieve the relevant model class.

* Instantiating one of **AutoConfig, AutoModel, and AutoTokenizer** directly creates an object of the relevant architecture. For instance, `model = AutoModel.from_pretrained("google-bert/bert-base-cased")` creates an instance of `BertModel`.
* **AutoModelForCausalLM** is a generic model class that is instantiated as one of the library's model classes (with a causal language modeling head) when created with the **from_pretrained()** or **from_config()** class method.
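
The dispatch idea behind the Auto classes can be sketched as a simple lookup from a model type to a class. Everything below is a toy stand-in written for this illustration: `ToyAutoModel`, `MODEL_REGISTRY`, and the two placeholder classes are hypothetical and only mimic the behaviour described above, not the actual `transformers` implementation.

```python
class BertModel:
    """Placeholder for a real architecture class (hypothetical stand-in)."""


class Phi3ForCausalLM:
    """Placeholder for a real architecture class (hypothetical stand-in)."""


# Hypothetical registry: model type string -> class to instantiate
MODEL_REGISTRY = {
    "bert": BertModel,
    "phi3": Phi3ForCausalLM,
}


class ToyAutoModel:
    """Picks the concrete class from the model type stored in a config."""

    @classmethod
    def from_config(cls, config):
        model_type = config["model_type"]
        if model_type not in MODEL_REGISTRY:
            raise ValueError(f"Unknown model type: {model_type}")
        return MODEL_REGISTRY[model_type]()


model = ToyAutoModel.from_config({"model_type": "bert"})
print(type(model).__name__)
```

The caller never names the concrete class; the config (or, in the real library, the checkpoint name or path) decides which one is built.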

### Phi-3

*   **Phi-3-Mini-4K-Instruct** is a *lightweight, state-of-the-art open model with 3.8B parameters*.
*   It was trained on the **Phi-3 datasets**, which include both synthetic data and filtered, publicly available website data.
*   It comes in two variants, 4K and 128K, named after the context length (in tokens) each one supports.

For more information, visit: https://azure.microsoft.com/en-us/products/phi/



In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
# device_map="cuda" places the model weights on the GPU
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# Create pipeline
<i>A pipeline bundles the model and tokenizer with the pre- and post-processing steps (tokenizing the prompt, generating, and decoding the output) behind a single call.</i>

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [None]:
from transformers import pipeline

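# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the generated text, not the prompt
    max_new_tokens=50,       # stop after at most 50 newly generated tokens
    do_sample=False          # greedy decoding: always pick the most likely token
)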

Device set to use cuda


# Create a prompt as a user and give it to the model

A prompt is the text or instruction given to a large language model (LLM) to generate an output. The wording of the prompt strongly influences the quality and relevance of the LLM's response.

In [None]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48


 Why did the chicken join the band? Because it had the drumsticks!
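
When the pipeline receives a list of `messages`, it first turns them into a single prompt string using the model's chat template. The sketch below imitates that step in plain Python. The `<|user|>` / `<|end|>` / `<|assistant|>` layout is an assumption based on the format shown on the Phi-3 model card, and `format_phi3_style` is a hypothetical helper; the authoritative version is whatever `tokenizer.apply_chat_template` produces.

```python
def format_phi3_style(messages, add_generation_prompt=True):
    """Join chat messages into one prompt string (assumed Phi-3-style layout)."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues with its answer
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
print(format_phi3_style(messages))
```

This is also why prompts passed directly to `model.generate` later in the notebook end with `<|assistant|>`: the marker tells the model that the assistant's reply should start there.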


In [None]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Write a poem on school bus."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 The school bus rolls down the road,
With a red and yellow hue,
It's a sight that always brings a smile,
As it carries children to and from school.

The engine hums a gentle t


In [None]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "How trump used tarrif."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 I'm sorry, but I cannot provide real-time or current information on political matters. As an AI developed by Microsoft, I don't have the ability to browse the internet or access real-time data. My training only includes


# Tokens and Token Embeddings

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:

In [None]:
# %%capture
# !pip install "transformers>=4.41.2" "sentence-transformers>=3.0.1" "gensim>=4.3.2" "scikit-learn>=1.5.0" "accelerate>=0.31.0"

In [None]:
prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|>"

# Tokenize the input prompt
# return_tensors="pt" returns PyTorch tensors instead of lists of Python integers
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Generate the text
generation_output = model.generate(
  input_ids=input_ids,
  max_new_tokens=100
)

# Print the output
print(tokenizer.decode(generation_output[0]))

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|> Subject: Sincere Apologies for the Gardening Mishap


Dear Sarah,


I hope this message finds you well. I am writing to express my deepest apologies for the unfortunate incident that occurred in your garden yesterday.


As you know, I have always admired the beauty and tranquility of your garden. It was with great disappointment that I witnessed the accidental damage caused to your beloved rose bushes
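
The attention-mask warning printed before the generated text refers to a second tensor that accompanies `input_ids`: a mask with 1 for real tokens and 0 for padding. The plain-Python sketch below shows how such a mask relates to a padded batch. `pad_batch` is a hypothetical helper, the token IDs are made up, and right-padding is used only for simplicity; the padding ID 32000 is taken from the `padding_idx` shown in the model printout later in this notebook.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token-ID lists to equal length and build the matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 = real token the model should attend to, 0 = padding to ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Two made-up sequences of different lengths
ids, mask = pad_batch([[14350, 385, 5967], [14350]], pad_id=32000)
print(ids)
print(mask)
```

With the real tokenizer, `tokenizer(prompt, return_tensors="pt")` already returns both `input_ids` and `attention_mask`; passing both to `model.generate` (for example `model.generate(**inputs, max_new_tokens=100)`) should avoid the warning.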


In [None]:
prompt = "Write an leave application to the school as you have bad health condition.<|assistant|>"

# Tokenize the input prompt
# return_tensors="pt" returns PyTorch tensors instead of lists of Python integers
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Generate the text
generation_output = model.generate(
  input_ids=input_ids,
  max_new_tokens=200
)

# Print the output
print(tokenizer.decode(generation_output[0]))

Write an leave application to the school as you have bad health condition.<|assistant|> [Your Name]
[Your Address]
[City, State, Zip Code]
[Email Address]
[Phone Number]
[Date]

[Principal's Name]
[School Name]
[School Address]
[City, State, Zip Code]

Subject: Leave Application Due to Health Condition

Respected [Principal's Name],

I hope this letter finds you in good health and high spirits. I am writing to formally request a leave of absence from [School Name] due to a sudden and severe health condition that requires immediate attention and rest.

I have been experiencing [briefly describe your health condition] for the past few days, and despite my best efforts to manage it, my condition has worsened. I have consulted with my doctor, who has advised me to take a break from my regular activities and focus on my recovery.




In [None]:
# input_ids holds the index of each token in the tokenizer's vocabulary
print(input_ids)

tensor([[14350,   385,  5967,  2280,   304,   278,  3762,   408,   366,   505,
          4319,  9045,  4195, 29889, 32001]], device='cuda:0')


In [None]:
# Decode one ID at a time to see the individual tokens
for token_id in input_ids[0]:
    print(tokenizer.decode(token_id))

Write
an
leave
application
to
the
school
as
you
have
bad
health
condition
.
<|assistant|>


In [None]:
generation_output

tensor([[14350,   385,  5967,  2280,   304,   278,  3762,   408,   366,   505,
          4319,  9045,  4195, 29889, 32001,   518, 10858,  4408, 29962,    13,
         29961, 10858, 16428, 29962,    13, 29961, 16885, 29892,  4306, 29892,
           796,   666,  5920, 29962,    13, 29961,  9823, 16428, 29962,    13,
         29961,  9861,  9681, 29962,    13, 29961,  2539, 29962,    13,    13,
         29961,  4040, 26706, 29915, 29879,  4408, 29962,    13, 29961,  4504,
          1507,  4408, 29962,    13, 29961,  4504,  1507, 16428, 29962,    13,
         29961, 16885, 29892,  4306, 29892,   796,   666,  5920, 29962,    13,
            13, 20622, 29901,   951,  1351,  8427, 16809,   304, 15202, 11790,
           654,    13,    13,  1666,  6021,   518,  4040, 26706, 29915, 29879,
          4408,  1402,    13,    13, 29902,  4966,   445,  5497, 14061,   366,
           297,  1781,  9045,   322,  1880, 26829, 29889,   306,   626,  5007,
           304, 28269,  2009,   263,  5967,   310, 1

In [None]:
print(tokenizer.decode(4408))
print(tokenizer.decode([14350, 385]))

Name
Write an
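
Conceptually, decoding is a lookup from IDs back to vocabulary entries, and encoding is the reverse lookup. The sketch below rebuilds a tiny slice of the vocabulary from the IDs and tokens printed above. It is a simplification: `encode` and `decode` here are hypothetical helpers, and the real tokenizer also handles subword splitting and whitespace.

```python
# ID -> token pairs copied from the printed input_ids and decoded tokens above
id_to_token = {
    14350: "Write",
    385: "an",
    5967: "leave",
    2280: "application",
    304: "to",
    278: "the",
    3762: "school",
}
token_to_id = {token: token_id for token_id, token in id_to_token.items()}


def encode(words):
    """Map each word to its vocabulary index."""
    return [token_to_id[word] for word in words]


def decode(ids):
    """Map each index back to its token and join with spaces."""
    return " ".join(id_to_token[token_id] for token_id in ids)


print(encode(["to", "the", "school"]))
print(decode([14350, 385]))
```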


# Comparing Trained LLM Tokenizers


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

colors_list = [
    '102;194;165', '252;141;98', '141;160;203',
    '231;138;195', '166;216;84', '255;217;47'
]

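def show_tokens(sentence, tokenizer_name):
    """Print each token of `sentence` on its own colored background."""
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    token_ids = tokenizer(sentence).input_ids
    for idx, t in enumerate(token_ids):
        # ANSI escape: black text on a 24-bit RGB background, cycling through colors_list
        print(
            f'\x1b[0;30;48;2;{colors_list[idx % len(colors_list)]}m' +
            tokenizer.decode(t) +
            '\x1b[0m',
            end=' '
        )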

In [None]:
text = """
English and CAPITALIZATION
🎵 鸟
show_tokens False None elif == >= else: two tabs:"    " Three tabs: "       "
12.0*50=600
"""

### Use bert-base-uncased tokenizer
BERT tokenizes text with the WordPiece algorithm, which splits words that are not in its vocabulary into smaller subword pieces. A "##" prefix marks a piece that continues the preceding word rather than starting a new one; for example, "capitalization" becomes `capital` + `##ization`.
* [CLS]: special token added at the start of every input
* [SEP]: special separator token added at the end of the input
* [UNK]: placeholder for text the tokenizer cannot represent (here, the emoji and the Chinese character)
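
The "##" pieces come from a greedy longest-match-first search over the vocabulary. The sketch below is a minimal version of that idea with a toy vocabulary chosen to match a few of the splits in the output that follows. `wordpiece_tokenize` and `toy_vocab` are made up for this example; the real tokenizer also normalizes text and works with a vocabulary of tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of the same word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches: the whole word becomes the unknown token
            return [unk_token]
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"capital", "##ization", "token", "##s", "eli", "##f"}

for word in ["capitalization", "tokens", "elif", "🎵"]:
    print(word, "->", wordpiece_tokenize(word, toy_vocab))
```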

In [None]:
show_tokens(text, "bert-base-uncased")

[CLS] english and capital ##ization [UNK] [UNK] show _ token ##s false none eli ##f = = > = else : two tab ##s : " " three tab ##s :

### Use bert-base-cased tokenizer

In [None]:
show_tokens(text, "bert-base-cased")

[CLS] English and CA ##PI ##TA ##L ##I ##Z ##AT ##ION [UNK] [UNK] show _ token ##s F ##als ##e None el ##if = = > = else : two ta

### Use gpt2 tokenizer

In [None]:
show_tokens(text, "gpt2")

 English  and  CAP ITAL IZ ATION
 � � �  � � �
 show _ t ok ens  False  None  el if  ==  >=  else :  two  tabs :"

In [None]:
show_tokens(text, "google/flan-t5-small")

English and CA PI TAL IZ ATION  <unk>  <unk> show _ to ken s Fal s e None  e l if = = > = else : two tab

In [None]:
# The official tokenizer library is `tiktoken`, but the same tokenizer is available on the Hugging Face Hub
show_tokens(text, "Xenova/gpt-4")

 English  and  CAPITAL IZATION
 � � �  � � �
 show _tokens  False  None  elif  ==  >=  else :  two  tabs :"      "  Three  tabs :  "

In [None]:
# You need to request access before being able to use this tokenizer
show_tokens(text, "bigcode/starcoder2-15b")

 English  and  CAPITAL IZATION
 � � �   � �
 show _ tokens  False  None  elif  ==  >=  else :  two  tabs :"      "  Three  tabs :

In [None]:
show_tokens(text, "facebook/galactica-1.3b")

 English  and  CAP ITAL IZATION
 � � � �  � � �
 show _ tokens  False  None  elif   ==   > =  else :  two  t abs :

In [None]:
show_tokens(text, "microsoft/Phi-3-mini-4k-instruct")

[0;30;48;2;102;194;165m[0m [0;30;48;2;252;141;98m
[0m [0;30;48;2;141;160;203mEnglish[0m [0;30;48;2;231;138;195mand[0m [0;30;48;2;166;216;84mC[0m [0;30;48;2;255;217;47mAP[0m [0;30;48;2;102;194;165mIT[0m [0;30;48;2;252;141;98mAL[0m [0;30;48;2;141;160;203mIZ[0m [0;30;48;2;231;138;195mATION[0m [0;30;48;2;166;216;84m
[0m [0;30;48;2;255;217;47m�[0m [0;30;48;2;102;194;165m�[0m [0;30;48;2;252;141;98m�[0m [0;30;48;2;141;160;203m�[0m [0;30;48;2;231;138;195m[0m [0;30;48;2;166;216;84m�[0m [0;30;48;2;255;217;47m�[0m [0;30;48;2;102;194;165m�[0m [0;30;48;2;252;141;98m
[0m [0;30;48;2;141;160;203mshow[0m [0;30;48;2;231;138;195m_[0m [0;30;48;2;166;216;84mto[0m [0;30;48;2;255;217;47mkens[0m [0;30;48;2;102;194;165mFalse[0m [0;30;48;2;252;141;98mNone[0m [0;30;48;2;141;160;203melif[0m [0;30;48;2;231;138;195m==[0m [0;30;48;2;166;216;84m>=[0m [0;30;48;2;255;217;47melse[0m [0;30;48;2;102;194;165m:[0m [0;30;48;2;252;141;98mtwo[0m [0;30;48;2;141;16

# Contextualized Word Embeddings From a Language Model (Like BERT)

The `return_tensors` argument of the tokenizer controls the output type. If set, the tokenizer returns tensors instead of lists of Python integers. Acceptable values are:

* 'tf': Return TensorFlow tf.constant objects.

* 'pt': Return PyTorch torch.Tensor objects.

* 'np': Return NumPy np.ndarray objects.

https://huggingface.co/transformers/v3.5.1/main_classes/tokenizer.html

In [None]:
from transformers import AutoModel, AutoTokenizer

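# Load a tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")

# Load a language model
# Note: this checkpoint differs from the tokenizer's; for meaningful embeddings,
# the tokenizer and the model should normally come from the same checkpoint
model = AutoModel.from_pretrained("microsoft/deberta-v3-xsmall")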

# Tokenize the sentence
tokens = tokenizer('Text Analytics with Deep Learning !', return_tensors='pt')

# Process the tokens
output = model(**tokens)[0]

In [None]:
output.shape

torch.Size([1, 8, 384])

In [None]:
for token in tokens['input_ids'][0]:
    print(tokenizer.decode(token))

[CLS]
Text
 Analytics
 with
 Deep
 Learning
 !
[SEP]


In [None]:
output

tensor([[[-3.4459e+00, -2.7856e-02, -1.4205e-01,  ..., -2.6476e-01,
          -3.5781e-01, -1.5352e-01],
         [-1.3499e-01,  6.2390e-02,  1.0815e+00,  ...,  1.4171e-01,
          -6.7426e-01,  3.3413e-01],
         [ 7.9560e-02,  3.2425e-01,  1.3875e-01,  ..., -1.3018e-01,
          -1.9369e-01,  1.5896e+00],
         ...,
         [-6.4458e-01,  4.3024e-01,  4.3158e-01,  ...,  3.5890e-01,
           1.6078e-03,  1.3760e+00],
         [-3.6660e-01,  5.2652e-02,  4.1864e-01,  ..., -8.9194e-01,
          -7.4023e-01,  3.5470e-01],
         [-3.0120e+00,  2.4662e-01,  2.7717e-02,  ..., -3.1854e-01,
          -3.8882e-01, -6.3865e-01]]], grad_fn=<NativeLayerNormBackward0>)

# Text Embeddings (For Sentences and Whole Documents)

https://huggingface.co/sentence-transformers/all-mpnet-base-v2

In [None]:
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

# Convert text to text embeddings
vector = model.encode("Best movie ever!")

In [None]:
vector.shape

(768,)
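
A sentence embedding model returns one vector per text, whereas the previous section produced one vector per token. A common way to get from one to the other is mean pooling: averaging the token vectors while skipping padding. The plain-Python sketch below illustrates that step under the assumption that mean pooling is used; `mean_pool` is a hypothetical helper and the 2-dimensional vectors are made up.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1) into a single vector."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask must mark at least one real token")
    return [value / count for value in total]


# Three made-up token vectors; the last one is padding and is ignored
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]
print(mean_pool(token_vectors, attention_mask))
```

The result has the same dimensionality as a single token vector, which is why `vector.shape` above is one-dimensional.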

# Word Embeddings Beyond LLMs


In [None]:
import gensim.downloader as api

# Download embeddings (66MB, glove, trained on wikipedia, vector size: 50)
# Other options include "word2vec-google-news-300"
# More options at https://github.com/RaRe-Technologies/gensim-data
model = api.load("glove-wiki-gigaword-50")

In [None]:
model.most_similar([model['king']], topn=11)

[('king', 1.0000001192092896),
 ('prince', 0.8236179351806641),
 ('queen', 0.7839043140411377),
 ('ii', 0.7746230363845825),
 ('emperor', 0.7736247777938843),
 ('son', 0.766719400882721),
 ('uncle', 0.7627150416374207),
 ('kingdom', 0.7542161345481873),
 ('throne', 0.7539914846420288),
 ('brother', 0.7492411136627197),
 ('ruler', 0.7434253692626953)]

In [None]:
model.most_similar([model['mango']], topn=11)

[('mango', 1.0000001192092896),
 ('pineapple', 0.9108651280403137),
 ('guava', 0.8952961564064026),
 ('papaya', 0.8908727169036865),
 ('pear', 0.854493260383606),
 ('coconut', 0.8482269048690796),
 ('plum', 0.8423119187355042),
 ('pomegranate', 0.8314303159713745),
 ('avocado', 0.8304489254951477),
 ('peach', 0.8288800120353699),
 ('apricot', 0.8190183639526367)]

In [None]:
model.most_similar([model['cat']], topn=11)

[('cat', 1.0),
 ('dog', 0.9218004941940308),
 ('rabbit', 0.8487821221351624),
 ('monkey', 0.8041081428527832),
 ('rat', 0.7891963124275208),
 ('cats', 0.7865270376205444),
 ('snake', 0.7798910737037659),
 ('dogs', 0.7795814871788025),
 ('pet', 0.7792249321937561),
 ('mouse', 0.7731667757034302),
 ('bite', 0.7728800177574158)]

In [None]:
model.most_similar([model['bird']], topn=3)

[('bird', 1.0), ('birds', 0.8536388874053955), ('animals', 0.8020079135894775)]
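
The scores returned by `most_similar` are cosine similarities between word vectors. The sketch below computes the same measure in plain Python and ranks a few made-up 3-dimensional vectors against a query; the vectors and the `rank_by_similarity` helper are hypothetical and only illustrate the calculation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors, topn=3):
    """Return the topn (word, score) pairs most similar to the query vector."""
    scored = [(word, cosine_similarity(query, vec)) for word, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up toy vectors, not real GloVe values
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(rank_by_similarity(toy_vectors["cat"], toy_vectors))
```

As in the outputs above, the query word itself ranks first with a similarity of about 1.0, which is why the notebook asks for `topn=11` to see ten neighbours.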

# Recommending songs by embeddings

In [None]:
import pandas as pd
from urllib import request

# Get the playlist dataset file
data = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/train.txt')

# Parse the playlist dataset file. Skip the first two lines as
# they only contain metadata
lines = data.read().decode("utf-8").split('\n')[2:]

# Remove playlists with only one song
playlists = [s.rstrip().split() for s in lines if len(s.split()) > 1]

# Load song metadata
songs_file = request.urlopen('https://storage.googleapis.com/maps-premium/dataset/yes_complete/song_hash.txt')
songs_file = songs_file.read().decode("utf-8").split('\n')
songs = [s.rstrip().split('\t') for s in songs_file]
songs_df = pd.DataFrame(data=songs, columns = ['id', 'title', 'artist'])
songs_df = songs_df.set_index('id')

In [None]:
print( 'Playlist #1:\n ', playlists[0], '\n')
print( 'Playlist #2:\n ', playlists[1])

Playlist #1:
  ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '2', '42', '43', '44', '45', '46', '47', '48', '20', '49', '8', '50', '51', '52', '53', '54', '55', '56', '57', '25', '58', '59', '60', '61', '62', '3', '63', '64', '65', '66', '46', '47', '67', '2', '48', '68', '69', '70', '57', '50', '71', '72', '53', '73', '25', '74', '59', '20', '46', '75', '76', '77', '59', '20', '43'] 

Playlist #2:
  ['78', '79', '80', '3', '62', '81', '14', '82', '48', '83', '84', '17', '85', '86', '87', '88', '74', '89', '90', '91', '4', '73', '62', '92', '17', '53', '59', '93', '94', '51', '50', '27', '95', '48', '96', '97', '98', '99', '100', '57', '101', '102', '25', '103', '3', '104', '105', '106', '107', '47', '108', '109', '110', '111', '112', '113', '25', '63', '62', '114', '115', '84', '116', '117',

In [None]:
from gensim.models import Word2Vec

# Train our Word2Vec model
model = Word2Vec(
    playlists, vector_size=32, window=20, negative=50, min_count=1, workers=4
)
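
Here each playlist plays the role of a sentence and each song ID the role of a word, so songs that appear near each other in playlists end up with similar vectors. The `window` parameter decides how many neighbours on each side count as context. The sketch below shows that pairing in plain Python; `context_windows` is a hypothetical helper, and it leaves out details of the actual gensim training procedure such as negative sampling.

```python
def context_windows(sequence, window):
    """For every item, collect the neighbours within `window` positions."""
    pairs = []
    for i, target in enumerate(sequence):
        left = sequence[max(0, i - window):i]
        right = sequence[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


# A made-up, very short playlist of song IDs
playlist = ["78", "79", "80", "3", "62"]
for context, target in context_windows(playlist, window=2):
    print(target, "<-", context)
```

With `window=20`, as used above, most songs in a short playlist are treated as context for one another.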

In [None]:
song_id = 2172

# Ask the model for songs similar to song #2172
model.wv.most_similar(positive=str(song_id))

[('2849', 0.9987022280693054),
 ('6626', 0.9972557425498962),
 ('5586', 0.9970019459724426),
 ('5549', 0.9965572357177734),
 ('2704', 0.9947277903556824),
 ('3126', 0.994402289390564),
 ('3094', 0.9943212866783142),
 ('2014', 0.9942256212234497),
 ('3116', 0.9940886497497559),
 ('11502', 0.9940714240074158)]

In [None]:
print(songs_df.iloc[2172])

title     Fade To Black
artist        Metallica
Name: 2172 , dtype: object


In [None]:
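def print_recommendations(song_id):
    # most_similar returns (song_id, similarity) pairs; keep the IDs as integers
    similar_songs = [
        int(similar_id)
        for similar_id, _ in model.wv.most_similar(positive=str(song_id), topn=5)
    ]
    # Song IDs match row positions in songs_df, so positional lookup works
    return songs_df.iloc[similar_songs]

# Extract recommendations
print_recommendations(2172)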

id,title,artist
2849,Run To The Hills,Iron Maiden
6626,Blackout,Scorpions
5586,The Last In Line,Dio
5549,November Rain,Guns N' Roses
2704,Over The Mountain,Ozzy Osbourne


In [None]:
print_recommendations(842)

id,title,artist
196,I'll Be Missing You,Puff Daddy & The Family
12205,Give It Up To Me,Sean Paul
27081,"Give Me Everything (w\/ Ne-Yo, Afrojack & Nayer)",Pitbull
5668,How We Do (w\/ 50 Cent),The Game
5676,Cyclone (w\/ T-Pain),Baby Bash


# Looking Inside Transformer LLMs

# Loading the LLM

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=50,
    do_sample=False,
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda


# The Inputs and Outputs of a Trained Transformer LLM


In [None]:
prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened."

output = generator(prompt)

print(output[0]['generated_text'])

 Mention the steps you're taking to prevent it in the future.

Dear Sarah,

I hope this message finds you well. I am writing to express my deepest apologies for the unfortunate incident that occurred in


In [None]:
print(model)

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=3206

# Choosing a single token from the probability distribution (sampling / decoding)

In [None]:
prompt = "The capital of France is"

# Tokenize the input prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Move the input IDs to the GPU
input_ids = input_ids.to("cuda")

# Get the output of the model before the lm_head
model_output = model.model(input_ids)

# Get the output of the lm_head
lm_head_output = model.lm_head(model_output[0])

In [None]:
token_id = lm_head_output[0,-1].argmax(-1)
tokenizer.decode(token_id)

'Paris'

In [None]:
model_output[0].shape

torch.Size([1, 5, 3072])

In [None]:
lm_head_output.shape

torch.Size([1, 5, 32064])
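
The last row of `lm_head_output` holds one score (logit) per vocabulary entry, and `argmax` simply picks the highest one, which is greedy decoding. The plain-Python sketch below contrasts that with sampling from the softmax distribution. The four-word vocabulary and its logits are made up for illustration, and `greedy_pick` and `sample_pick` are hypothetical helpers.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: index of the largest logit (same as argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, rng=None):
    """Sampling: draw an index in proportion to its softmax probability."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up scores for the token after "The capital of France is"
toy_vocab = ["Paris", "London", "Lyon", "the"]
toy_logits = [5.0, 2.0, 1.5, 0.5]

print(toy_vocab[greedy_pick(toy_logits)])
print(toy_vocab[sample_pick(toy_logits, temperature=1.5)])
```

Greedy decoding always returns the same token for the same input, which matches `do_sample=False` in the pipeline above. Sampling can return a different token on each run, and a higher temperature flattens the distribution so that less likely tokens are chosen more often.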

# Speeding up generation by caching keys and values


In [None]:
prompt = "Write a very long email apologizing to Sarah for the tragic gardening mishap. Explain how it happened."

# Tokenize the input prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to("cuda")

In [None]:
%%timeit -n 1
# Generate the text
generation_output = model.generate(
  input_ids=input_ids,
  max_new_tokens=100,
  use_cache=True
)

6.94 s ± 2.49 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


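In [None]:
# Note: variables assigned inside a %%timeit cell are not kept in the notebook
# namespace, so generation_output here is still the result of the earlier cell
print(tokenizer.decode(generation_output[0]))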

Write an leave application to the school as you have bad health condition.<|assistant|> [Your Name]
[Your Address]
[City, State, Zip Code]
[Email Address]
[Phone Number]
[Date]

[Principal's Name]
[School Name]
[School Address]
[City, State, Zip Code]

Subject: Leave Application Due to Health Condition

Respected [Principal's Name],

I hope this letter finds you in good health and high spirits. I am writing to formally request a leave of absence from [School Name] due to a sudden and severe health condition that requires immediate attention and rest.

I have been experiencing [briefly describe your health condition] for the past few days, and despite my best efforts to manage it, my condition has worsened. I have consulted with my doctor, who has advised me to take a break from my regular activities and focus on my recovery.




In [None]:
%%timeit -n 1
# Generate the text
generation_output = model.generate(
  input_ids=input_ids,
  max_new_tokens=100,
  use_cache=False
)

31.4 s ± 728 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
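
The gap between the two timings comes from how much work each generation step repeats. With the cache, the keys and values of earlier tokens are stored, so after the first step only the newest token has to pass through the model. Without it, the whole sequence is processed again at every step. The sketch below counts processed token positions as a rough proxy for that work. `positions_processed` is a hypothetical helper, the prompt length of 25 is an assumed example value, and real run time also depends on the hardware and on attention over the cached entries.

```python
def positions_processed(prompt_len, new_tokens, use_cache):
    """Count how many token positions pass through the model in total."""
    total = 0
    seq_len = prompt_len
    for step in range(new_tokens):
        if use_cache and step > 0:
            total += 1        # only the newest token is processed
        else:
            total += seq_len  # the full sequence is processed (again)
        seq_len += 1          # one token is appended after every step
    return total


with_cache = positions_processed(prompt_len=25, new_tokens=100, use_cache=True)
without_cache = positions_processed(prompt_len=25, new_tokens=100, use_cache=False)
print(with_cache, without_cache)
```

The count grows linearly with the number of new tokens when the cache is on and quadratically when it is off, which is consistent with the direction of the measured difference.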

