<a href="https://colab.research.google.com/github/chethana613/LLM-Readings-and-Assignments/blob/main/LLM1_Bloom.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**`Q1. Bloom vs Bloomz`**

1. Bloom-Z models have been trained on even larger datasets compared to Bloom models. This means they've been exposed to more diverse language patterns and nuances where they store more information and understand better about language.
2. Bloom-Z models generate stories that make more sense. Their sentences flow together better, like a coherent book and fewer grammatical errors while Bloom models might sometimes jump around a bit.
3. Bloom-Z models use fancier words and have a bigger pool of words to choose from. So, the generated stories sound more varied compared to Bloom models, which might sound a bit simpler.
4. Bloom-Z models are typically larger in terms of model parameters and require more computational resources (like GPU memory and processing power) compared to Bloom models.



**`Q2. Generate a story of 200 words that starts with the words “Once upon
a time ” using each of these models.`**

`a)Process for both bloom and bloomz models: `


Started by importing all the required libraries -> Initialize the tokenizer and the model using pre-trained models from Hugging Face ->  Encode the prompt input "Once upon a time" by breaking it down into tokens using the initialized tokenizer of the model, which returns us a tensor using PyTorch -> The model then takes this tensor and generates text based on its pre-training vocabulary/corpora(The quality of the generated text can vary depending on the parameters used in the generate method) -> After the model generates the text for the input prompt, we use the same tokenizer to decode it -> Finally the generated text is Tokenized into readable form using the model tokenizer.


✅ Challenges:

1. Despite the minimum and maximum length specified as 200, the model generated shorter responses that is fewer than 200 words in some cases.

1. In several runs, especially without repetition penalties, the models tend to generate highly repetitive sequences where certain phrases or words are repeated excessively or make no sense.


✅ Variations:

1. Introducing repetition penalty did help mitigate the issue of repetitive sequences in the generated text across all models. But i found that the repetition_penality value higher than 1 didn't really impact on the diversity/randomness of the generated text. Eg: in bloom 1b7 and bloomz 1b7 models, The text generated with repetition penalty 2 is found same as the text generated with repetition penalty 1.5

2. I found that bloomz is generating different/diverse context text in differet runs while bloom models are giving the same text all over again with n runs.


**Parameters Explored:**

▶do_sample=True -> Enables sampling-based decoding, increasing diversity in generation.
 do_sample=False -> Model uses greedy decoding, it selects the token with the highest probability at each step.

▶num_return_sequences -> Controls the number of independent sequences generated.

  Greedy- > num_return_sequences = 1 -> Involves selecting the token with the highest probability at each step, resulting in a single output sequence.

  Beam ->num_return_sequences = n -> After decoding is complete, the algorithm returns the top num_return_sequences sequences from the final beam as the output.

▶temperature -> adjusts the level of randomness in the sampling process.
  
  ->Temperature parameter will have no effect when there is no sampling as token selection is deterministic.

  Lower temperatures (Closer to 0) for yielded conservative, high-quality outputs and higher temperatures did more diverse outputs.

▶repetition_penalty -> Is a parameter used to penalize the model from repeating the same token excessively within generated sequences.Applied during decoding.  Higher repetition_penalty value leads to less repetition in the generated text but may also result in less coherent outputs.


---

 **`c) Conduct a qualitative analysis of the text generations, highlighting any patterns or notable variations between Bloom and Bloom-Z outputs. Share
your observations on the creativity, coherence, and overall quality of the
generated stories.`**




**For Eg:** Among the models without penalty, bloom-560m has the lowest perplexity, followed by bloomz-560m and bloom-1b7.
Among these models with penalty, bloomz-560m and bloomz-1b7 have the lowest perplexity,



▶Bloom Models:


The generated stories are mostly found to be dark and deep. There are stories generated by bloom 1b7 that talk about death, breakup, monster etc., Shown creativity through conventional storytelling elements, often rooted in myth. Lots of repetition in context for smaller models and found better to have improved in larger models.


▶Bloomz Models:

 Found the generated stories more like everyday conversations, touching on modern, environmental and personal experiences. While mostly coherent, there are moments where the stories feel a bit disconnected. The stories here are of good quality and engaging.

Overall, I found the Bloom models has shown timeless wisdom, and the Bloom-Z model mostly shown its contemporary flair.



---


**`Q3. You experimented with two types of models of 3 different sizes.
Investigate the relationship between your results (type-token ratio and perplexity) and the size of these models. What trends do you notice? Analyze the correlation between model size and the computed metrics. Discuss the implications of these findings in terms of model scalability and performance `**



In [None]:
 #pip install -q transformers


In [None]:
#! pip show transformers

In [1]:
#Base Model of Bloom-560m, bloom-1b1, bloom-1b7, bloomz-560m, bloomz-1b1, bloomz-1b7

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

#tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
#model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

#tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloom-1b1")
#model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b1")

#tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloom-1b7")
#model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloom-1b7")

#tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
#model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")

#tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloomz-1b1")
#model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b1")

tokenizer1 = AutoTokenizer.from_pretrained("bigscience/bloomz-1b7")
model1 = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b7")

inputs1 = tokenizer1.encode("Once upon a time ", return_tensors="pt")
outputs1 = model1.generate(inputs1,min_length = 200, max_length=200, repetition_penalty=1.5)
generated_text = tokenizer1.decode(outputs1[0])
print("bloom-1b7:" + str(generated_text))
tokens = tokenizer1.tokenize(generated_text)
ttr = len(set(tokens)) / len(tokens)
ttr

input_ids = tokenizer1.encode(generated_text, return_tensors="pt")
logits = model1(input_ids).logits


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


bloom-1b7:Once upon a time  there was an old man who lived in the woods. He had two sons, one of whom died when he got too big to walk and another that grew up into him as his son-in-law.  The older boy wanted nothing more than for himself but didn't want anyone else's attention so decided not even bothering with school or work until after he'd grown out enough.   His father told this story about how they were both very young at first because their parents never let them have any other children before having just married.    They would always be together forever...until death do us part! One day while walking through some brushy woodland on foot,  I saw something unusual: A large black bear cub standing alone by itself near where my feet touched it!  It looked like someone from yesterday - yet somehow seemed far away now....I wondered what happened? Wasn't she still alive?  So much longer then I'd thought!! She must've been dead long ago ...but her body wasn't found anywhere nearby


**Run 1: bloom-560m:(Without Penalty)**

Once upon a time  I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the world
And I was a man of the
Type-Token Ratio (TTR): 0.065
Perplexity: 138688.921875

**Run 2:bloom-560m(with repetition_penalty-2.0):**

 Once upon a time  I was in the midst of my
life, and it seemed as if all that had happened to me were just an act.
I could not help thinking how much more wonderful things might happen,
when we should be able at once. It would seem so easy for us both; but what is this?
It seems like nothing.

"There are many reasons why you must have been born with such powers.  You may well say your father gave them up when he died--but there can only come one reason: they will never cease until after death has passed away from our side! We cannot live without them; yet even then their power shall remain unshaken by any change which takes place within ourselves or between themselves!  They do indeed continue whatever changes take effect on those who know about it; nor does anything else affect these beings except through some influence brought into existence during life itself!

"I am sure no man ever knows whether his own mind remains free before him because its thoughts become conscious
Type-Token Ratio (TTR): 1.0
Perplexity: 39998.82421875

Type-Token Ratio (TTR): 0.065
Perplexity: 2403552.0

==================================================

**Run 1: bloom-1b1: Without Penalty**

Once upon a time  I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a young man
I was a
Type-Token Ratio (TTR): 0.055
Perplexity: 9450163.0

**Run 2:bloom-1b1(with repetition penalty-2.0)**

Once upon a time  I was in the army.
I had to fight for my country, and it wasn't easy.  But now I'm back home,
and I've got some money that can help me out of this hole where we are
now.

"I'm glad you came along with us. It's nice having someone who knows what he is doing! And it's good you're here because if there were no one like yourself I'd be stuck at work all day long trying not only survive but thrive on life itself!
But that's just how things have been going so far since we've met!  We haven't seen each other much lately; we're both busy working our way through college classes or whatever else has happened over these past few years...but we'll see about getting together again soon!

"I hope you'll come by sometime when you've finished your studies--maybe even before then--and I'll show up as often whenever possible while there's still room left open between two trips into town...

"And don't forget: If anything happens during those visits we'd love
Type-Token Ratio (TTR): 1.0
Perplexity: 64790.7734375

==================================================

**Run 1: bloom-1b7: Without penalty**

Once upon a time  I was a little girl
And I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was a little girl
I was
Type-Token Ratio (TTR): 0.06
Perplexity: 156083.15625

**Run 2:bloom-1b7:(with repetition penalty-2.0)**

Once upon a time  I was in love with you
I thought that we could be happy forever.
But now, it's over and gone...
And I'm not sure what to do anymore!
I'm sorry for everything I've done,
but this is the end of me... my life as well."

The man who had been so close at hand suddenly disappeared from sight.

"I don't know how long I'll have left here before they find out about it! " he cried desperately.  "I can't even tell them where we're going or why we've come back there!  It's too much trouble already...  And if anyone finds us they'll kill both our families!!   But please help me: Please let someone else take care o'me instead!!
Please give up your dream: Let go all those feelings you've got inside yourself; just leave behind these memories alone - because they're no good any more.. !
It's only when you're dead will you'll understand what's really important :)
Don't worry,  we'll get through it: We'll survive together again !

Type-Token Ratio (TTR): 1.0
Perplexity: 41492.79296875

**Run 2:bloom-1b7:(with repetition penalty-1.4)**

Once upon a time  I was in love with you
I thought that we could be happy forever.
But now, it's over and gone...
And I'm not sure what to do anymore!
I'm sorry for everything I've done,
but this is the end of me... my life as well."

The man who had been so close at hand suddenly disappeared from sight.

"I don't know how long I'll have left here before they find out about it! " he cried desperately.  "I can't even tell them where we're going or why we've come back there!  It's too much trouble already...  And if anyone finds us they'll kill both our families!!   But please help me: Please let someone else take care o'me instead!!
Please give up your dream: Let go all those feelings you've got inside yourself; just leave behind these memories alone - because they're no good any more.. !
It's only when you're dead will you'll understand what's really important :)
Don't worry,  we'll get through it: We'll survive together again !

Type-Token Ratio (TTR): 1.0
Perplexity: 41492.79296875

==================================================

**Run 1: bloomz-560m: Without Penalty**

Once upon a time  when the world was in a state of peace, the world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a state of peace. The world was in a
Type-Token Ratio (TTR): 0.08
Perplexity: 241132.46875

**Run 2:(with repetition_penalty):**

Once upon a time  when the world was in its infancy, there were many people who had no idea what they would be doing. They did not know how to cook or dress for dinner and even if you knew it all about cooking food that day your family might never eat anything from them again until later on... but this is true of most children . Children are always learning new things every year , so it's important we keep our kids entertained as much ... like any other child .
Children learn by watching movies with their parents while playing games at home during school hours - which can make life easier than ever before ! But don't forget fun activities too! If you're having trouble finding something interesting outside playtime then try some video game nights outdoors where everyone has access (or use) computers instead. . Kids love music; especially rock bands such as: The Beatles & Rolling Stones have been popular since childhood because they're easy listening tunes without being distracted.  You could also listen upbeat songs like: pop
Type-Token Ratio (TTR): 1.0
Perplexity: 15950.44921875

==================================================

**Run 1:bloomz-1b1:Without penalty**

Once upon a time  I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of the University of Oklahoma. I was a student of
Type-Token Ratio (TTR): 0.065
Perplexity: 2403552.0

**Run 2: bloomz-1b1 with penalty:**

Once upon a time  I was in the middle of an argument with my wife. She had been arguing for hours about whether or not to have her baby, and she wanted me back home so that we could talk it over together.  We were both very tired from our long day at work but decided on going out after dinner because there would be no one else around except us two.   After walking downstairs into their living room they started talking again as if nothing happened.
I asked them why this conversation continued when he said something like "I've got some news you should know" (which is what most people say before someone says anything). They answered "I don't want any more information until you've told your husband everything! You can tell him now - it's too late anyway...and then he'll never get used up by all these questions....so let's just go away tonight ...if you're still interested I'll let myself off later today..but I'm sure you'll understand soon enough!!!!! If you'd rather stay here we'll discuss
Type-Token Ratio (TTR): 1.0
Perplexity: 49968.9296875

==================================================

**Run 1:bloomz-1b7 - Without penalty:**

bloomz-1b7:Once upon a time  there was a man named John  who was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John was a farmer  and a farmer's son  John
Type-Token Ratio (TTR): 0.075
Perplexity: 301375.0625


**Run 2:bloomz-1b7: with Penalty(2.0):**

Once upon a time  there was an old man who lived in the woods. He had two sons, one of whom died when he got too big to walk and another that grew up into him as his son-in-law.  The older boy wanted nothing more than for himself but didn't want anyone else's attention so decided not even bothering with school or work until after he'd grown out enough.   His father told this story about how they were both very young at first because their parents never let them have any other children before having just married.    They would always be together forever...until death do us part! One day while walking through some brushy woodland on foot,  I saw something unusual: A large black bear cub standing alone by itself near where my feet touched it!  It looked like someone from yesterday - yet somehow seemed far away now....I wondered what happened? Wasn't she still alive?  So much longer then I'd thought!! She must've been dead long ago ...but her body wasn't found anywhere nearby
Type-Token Ratio (TTR): 1.0
Perplexity: 26967.810546875

**Run 2:bloomz-1b7: with Penalty(1.5):**

Once upon a time  there was an old man who lived in the woods. He had two sons, one of whom died when he got too big to walk and another that grew up into him as his son-in-law.  The older boy wanted nothing more than for himself but didn't want anyone else's attention so decided not even bothering with school or work until after he'd grown out enough.   His father told this story about how they were both very young at first because their parents never let them have any other children before having just married.    They would always be together forever...until death do us part! One day while walking through some brushy woodland on foot,  I saw something unusual: A large black bear cub standing alone by itself near where my feet touched it!  It looked like someone from yesterday - yet somehow seemed far away now....I wondered what happened? Wasn't she still alive?  So much longer then I'd thought!! She must've been dead long ago ...but her body wasn't found anywhere nearby
Type-Token Ratio (TTR): 1.0
Perplexity: 26967.810546875


In [None]:
#Tuning Parameters-Sampling, temperature, num_return_

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
import gc

# Clearing memory
gc.collect()
torch.cuda.empty_cache()

# List of model names to iterate over
'''model_names = [
    "bigscience/bloom-560m",
    "bigscience/bloom-1b1",
    "bigscience/bloom-1b7",
    "bigscience/bloomz-560m",
    "bigscience/bloomz-1b1",
    "bigscience/bloomz-1b7",
]'''
model_names = [
    "bigscience/bloom-1b7"
]

def generate_text(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda()  # Moving the model to GPU

    prompt = "Once upon a time"
    inputs = tokenizer.encode(prompt, return_tensors="pt").cuda()

    outputs = model.generate(inputs, min_length=200, max_length=200, do_sample=True, temperature=0.7)
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    tokens = tokenizer.tokenize(generated_text)
    ttr = len(set(tokens)) / len(tokens)

    input_ids = tokenizer.encode(generated_text, return_tensors="pt").cuda()
    logits = model(input_ids).logits
    loss = torch.nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), input_ids.view(-1))
    perplexity = torch.exp(loss)

    # Converting perplexity to base 10 scale for simplicity
    perplexity_base10 = torch.log10(perplexity)

    return model_name, generated_text, ttr, perplexity_base10.item()

for model_name in model_names:
    print(f"Model: {model_name}")
    model_name, generated_text, ttr, perplexity_base10 = generate_text(model_name)
    print("Generated Text:", generated_text)
    print("Type-Token Ratio (TTR):", ttr)
    print("Perplexity (Base 10):", perplexity_base10)
    print("="*50)


Model: bigscience/bloom-1b7


tokenizer_config.json:   0%|          | 0.00/222 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/715 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.44G [00:00<?, ?B/s]

Adjusting Nucleus sampling to add penality and avoid repetitions

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer4 = AutoTokenizer.from_pretrained("bigscience/bloomz-1b7")
model4 = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b7")

inputs4 = tokenizer4.encode("Once upon a time ", return_tensors="pt")

nucleus_sampling = {
    "do_sample": True,
    "max_length": 200,
    "min_length": 200,
    "temperature": 0.7,
    "top_p": 0.92,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "num_return_sequences": 2
}

outputs4 = model4.generate(inputs4, **nucleus_sampling)
generated_text = tokenizer4.decode(outputs4[0], skip_special_tokens=True)

tokens = tokenizer4.tokenize(generated_text)
ttr = len(set(tokens)) / len(tokens)
ttr

with torch.no_grad():
    input_ids = tokenizer4.encode(generated_text, return_tensors="pt")
    logits = model4(input_ids).logits
    loss = torch.nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), input_ids.view(-1))
    perplexity = torch.log10(torch.exp(loss))

print("Generated Text: ", generated_text)
print("Type-Token Ratio (TTR):", ttr)
print("Perplexity:", perplexity.item())
