
# Experimenting with HuggingFace - Text Generation
**We will explore text generation using a GPT-2 model, which was trained to predict next words on 40GB of Internet text data.**
**In this notebook, we will explore different decoding methods like Beam search, Top-K sampling, and Top-P sampling, demonstrating their performance along the way**

In [1]:
config = {
    "SEED" : 34 ,
    "MAX_LEN" : 70 
}

**A language model is a machine learning model that can look at part of a sentence and predict the next word/sequence of words. Much like the autofill features , GPT-2 is capable of next word prediction on a much larger and more sophisticated scale. For reference, the smallest available GPT-2 has 117 million parameters, whereas the largest one (invisible to the public) has over 1.5 billion parameters. The largest one available for public use is half the size of their main GPT-2 mode**

In [2]:
!pip install -U flash-attn --no-build-isolation

Collecting flash-attn
  Downloading flash_attn-2.7.0.post2.tar.gz (2.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.7/2.7 MB[0m [31m27.3 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h  Preparing metadata (setup.py) ... [?25ldone
Collecting einops (from flash-attn)
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Downloading einops-0.8.0-py3-none-any.whl (43 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m43.2/43.2 kB[0m [31m2.6 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... [?25ldone
[?25h  Created wheel for flash-attn: filename=flash_attn-2.7.0.post2-cp310-cp310-linux_x86_64.whl size=183279716 sha256=c1021a1c990422f49f554a42308c4dbfb14501d6c51bb6b595c3784ef8b86acf
  Stored in directory: /root/.cache/pip/wheels/bf/e3/ed/5e845387d52f2debd1bafb847bf3d774d3f0a3c8e31b1dc948
Successfully built flash-attn
Installing collected packages: e

In [2]:
import torch # import torch
from transformers import AutoTokenizer, AutoModelForCausalLM # from transformers 
device = "cuda" if torch.cuda.is_available() else "cpu" # set device based on your machine

model_name = "gpt2" # load gpt-2 model 


tokenizer = AutoTokenizer.from_pretrained(model_name) # load tokenizer 
model = AutoModelForCausalLM.from_pretrained(model_name).to(device) 

print(f"Model architacture : \n {model} ")

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Model architacture : 
 GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
) 


**Wow, we just imported a deep learning model with more than 774 million parameters in just a couple lines of code with HuggingFace , that is the power of HuggingFace !!**
**Now let’s generate some text! Although Transformers provides a generate() func‐
tion for autoregressive models like GPT-2, we’ll implement this decoding method**

## 2. Different Decoding Methods
### 2.1 Greedy Search Decoding

In [3]:
import torch

input_txt = "Transformers are the"

input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)  # return tensor
iters = []  # To store the step-by-step generation details

n_steps = 8
choices_per_step = 5  # Define our k variable (top-k sampling)

with torch.no_grad():
    for step in range(n_steps):
        iter = dict()
        iter["Input"] = tokenizer.decode(input_ids[0])

        # Model output
        output = model(input_ids=input_ids)

        # Select the last token logits for the first batch
        next_token_logits = output.logits[0, -1, :]
        next_token_prob = torch.softmax(next_token_logits, dim=-1)

        # Sort the outputs from the highest probs to the lowest
        sorted_ids = torch.argsort(next_token_prob, dim=-1, descending=True)

        # Get k tokens with the highest probs
        for choice_idx in range(choices_per_step):
            token_id = sorted_ids[choice_idx]  # Get token id
            token_prob = next_token_prob[token_id].cpu().item()  # Get prob of the target token
            token_choice = (
                f"{tokenizer.decode(token_id)} ({100 * token_prob:.2f}%)"
            )
            iter[f"Choice {choice_idx + 1}"] = token_choice

        # Update input_ids with the highest-probability token
        input_ids = torch.cat([input_ids, sorted_ids[None, choice_idx, None]], dim=-1)
        
        # Store this iteration's details
        iters.append(iter)

# Example: Display the results
for step_idx, iteration in enumerate(iters):
    print(f"Step {step_idx + 1}:")
    for key, value in iteration.items():
        print(f"{key}: {value}")
    print()

Step 1:
Input: Transformers are the
Choice 1:  most (9.76%)
Choice 2:  same (2.94%)
Choice 3:  only (2.87%)
Choice 4:  best (2.38%)
Choice 5:  first (1.77%)

Step 2:
Input: Transformers are the first
Choice 1:  to (12.16%)
Choice 2:  of (4.14%)
Choice 3:  and (3.95%)
Choice 4:  class (2.70%)
Choice 5:  generation (1.88%)

Step 3:
Input: Transformers are the first generation
Choice 1:  of (57.15%)
Choice 2:  to (1.53%)
Choice 3: , (1.07%)
Choice 4:  in (0.93%)
Choice 5: . (0.68%)

Step 4:
Input: Transformers are the first generation.
Choice 1: 
 (21.86%)
Choice 2:  They (9.65%)
Choice 3:  The (9.13%)
Choice 4:  This (2.14%)
Choice 5: 

 (2.13%)

Step 5:
Input: Transformers are the first generation.


Choice 1: 
 (99.96%)
Choice 2: . (0.01%)
Choice 3:  ( (0.00%)
Choice 4: 

 (0.00%)
Choice 5: , (0.00%)

Step 6:
Input: Transformers are the first generation.

,
Choice 1:  the (5.36%)
Choice 2:  and (4.23%)
Choice 3:  which (2.85%)
Choice 4:  a (2.17%)
Choice 5: 
 (2.09%)

Step 7:
Input: Tr

In [5]:
# put results in dataframe
import pandas as pd
pd.DataFrame(iters)

Unnamed: 0,Input,Choice 1,Choice 2,Choice 3,Choice 4,Choice 5
0,Transformers are the,most (9.76%),same (2.94%),only (2.87%),best (2.38%),first (1.77%)
1,Transformers are the first,to (12.16%),of (4.14%),and (3.95%),class (2.70%),generation (1.88%)
2,Transformers are the first generation,of (57.15%),to (1.53%),", (1.07%)",in (0.93%),. (0.68%)
3,Transformers are the first generation.,\n (21.86%),They (9.65%),The (9.13%),This (2.14%),\n\n (2.13%)
4,Transformers are the first generation.\n\n,\n (99.96%),. (0.01%),( (0.00%),\n\n (0.00%),", (0.00%)"
5,"Transformers are the first generation.\n\n,",the (5.36%),and (4.23%),which (2.85%),a (2.17%),\n (2.09%)
6,"Transformers are the first generation.\n\n,\n",\n (90.90%),The (0.81%),\n\n (0.46%),""" (0.42%)",This (0.19%)
7,"Transformers are the first generation.\n\n,\nThis",is (22.42%),means (2.18%),mod (1.79%),will (1.73%),was (1.38%)


In [6]:
max_length = 128
input_txt = """In a shocking finding, scientist discovered \
a herd of unicorns living in a remote, previously unexplored \
valley, in the Andes Mountains. Even more surprising to the \
researchers was the fact that the unicorns spoke perfect English.\n\n
"""
input_ids = tokenizer(input_txt, return_tensors="pt")["input_ids"].to(device)
output_greedy = model.generate(input_ids, max_length=max_length,
do_sample=False)
print(tokenizer.decode(output_greedy[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


"The unicorns were very intelligent, and they were very intelligent," said Dr. David S. Siegel, a professor of anthropology at the University of California, Berkeley. "They were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very intelligent, and they were very


## 2.2 Beam Search Decoding

In [7]:
output_beam = model.generate(input_ids, max_length=max_length, num_beams=5,
do_sample=False)
print(tokenizer.decode(output_beam[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, from the University of California, San Diego, and the University of California, Santa Cruz, found that the unicorns were able to communicate with each other in a way that was similar to that of human speech.


"The unicorns were able to communicate with each other in a way that was similar to that of human speech," said study co-lead author Dr. David J.


## 2.3 Top-k Sampling

In [8]:
output_topk = model.generate(input_ids, max_length=max_length, do_sample=True,
top_k=50)
print(tokenizer.decode(output_topk[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The group's findings were published in the Dec. 1 issue of Nature Communications, among other things.


The researchers describe their findings in a video released this morning by the Smithsonian Park Zoo and help researchers prepare for the next major study.


"Familiarity with the spoken-a language does make it easy and easy to learn other languages," said senior author Tom M. Hovann,


## 2.4 Top-P sampling

In [9]:
output_topp = model.generate(input_ids, max_length=max_length, do_sample=True,
top_p=0.90)
print(tokenizer.decode(output_topp[0]))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.


The researchers, who had been studying the unicorn population and then studying their behaviour, found that the unicorns could recognize and interact with an unknown type of animal as well.

'The finding is the first evidence that wild unicorns have a language. They may even have an ancestral language,' explained Professor Jochen Weich, a researcher in the Department of Natural Science and Technology at the University of


## 3.BLEU Metrice
![image](https://miro.medium.com/v2/resize:fit:1400/1*Vm5DgYvNfl6hrqpVD3n7OA.png)

In [11]:
!pip install evaluate

  pid, fd = os.forkpty()
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: evaluate
Successfully installed evaluate-0.4.3


In [14]:
!pip install sacrebleu

  pid, fd = os.forkpty()
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting sacrebleu
  Downloading sacrebleu-2.4.3-py3-none-any.whl.metadata (51 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m51.8/51.8 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting portalocker (from sacrebleu)
  Downloading portalocker-3.0.0-py3-none-any.whl.metadata (8.5 kB)
Downloading sacrebleu-2.4.3-py3-none-any.whl (103 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m104.0/104.0 kB[0m [31m4.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading portalocker-3.0.0-py3-none-any.whl (19 kB)
Installing collected packages: portalocker, sacrebleu
Successfully installed portalocker-3.0.0 sacrebleu-2.4.3


In [12]:
!pip install rouge_score

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25ldone
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25ldone
[?25h  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=6247ca3381e4b78a5bc36f81b7fd9fe0690aa2ebe904da7f4bf3fb2c425260d3
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [15]:
from evaluate import load  # Modern API for metrics

# Load the SacreBLEU metric
bleu_metric = load("sacrebleu")

In [16]:
import pandas as pd
import numpy as np

prediction="the cat is on mat" 
reference=["the cat is on the mat"]

bleu_metric.add(prediction = prediction, reference = reference)
results = bleu_metric.compute(smooth_method="floor", smooth_value=0)
results["precisions"] = [np.round(p, 2) for p in results["precisions"]]
pd.DataFrame.from_dict(results, orient="index", columns=["Value"])

Unnamed: 0,Value
score,57.893007
counts,"[5, 3, 2, 1]"
totals,"[5, 4, 3, 2]"
precisions,"[100.0, 75.0, 66.67, 50.0]"
bp,0.818731
sys_len,5
ref_len,6


## ROUGE Evaluation 

In [17]:
rouge_metric = load("rouge")

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

In [18]:
prediction="the cat is on mat" 
reference=["the cat is on the mat"]
records = []
rouge_names = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
rouge_metric.add(prediction=prediction, reference=reference[0])
score = rouge_metric.compute()
rouge_dict = dict((rn, score[rn]) for rn in rouge_names)
records.append(rouge_dict)
pd.DataFrame.from_records(records)

Unnamed: 0,rouge1,rouge2,rougeL,rougeLsum
0,0.909091,0.666667,0.909091,0.909091
