# DeciLM-6B Tutorial - Generating Text with a Base LLM

### Alright, let's dive into text generation with DeciLM-6B and the 🤗 Hugging Face `transformers` library! 🚀

LLMs, or Large Language Models, are the superheroes behind text generation.

Imagine them as giant brains trained to predict the next word in a sequence.

But they don't just spit out words willy-nilly.

They use a method called autoregressive generation: the model predicts one token, appends it to the input, and is called again on the longer sequence, until it has crafted a masterpiece (or at least a coherent sentence).
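
To make the autoregressive loop concrete, here is a minimal sketch in plain Python. The lookup table `NEXT_TOKEN` and the function `toy_generate` are made up for illustration; a real LLM conditions on the whole sequence and outputs a probability distribution rather than a single fixed answer.

```python
# Toy "model": maps the previous token to the most likely next token.
# (Hypothetical table for illustration; not how DeciLM works internally.)
NEXT_TOKEN = {
    "<s>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "</s>",
}

def toy_generate(prompt_tokens, max_new_tokens=10):
    """Repeatedly feed the sequence back in, appending one token per step."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "</s>")
        if next_token == "</s>":  # end-of-sequence: stop generating
            break
        tokens.append(next_token)
    return tokens

print(toy_generate(["<s>", "the"]))  # ['<s>', 'the', 'cat', 'sat', 'down']
```

The loop stops either when the end-of-sequence token shows up or when the `max_new_tokens` budget runs out, which is the same pair of stopping conditions you'll meet again later in this tutorial.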

### Setting Up
Before you start summoning words from the ether, make sure you've got the right tools:

In [2]:
!python --version

Python 3.10.11


In [3]:
# %%capture
# !pip install huggingface_hub
# !pip install transformers
# !pip install accelerate
# !pip install "bitsandbytes>=0.39.0" -q

In [4]:
import huggingface_hub
import transformers
import accelerate

print("huggingface version : ",huggingface_hub.__version__)
print("transformers version : ", transformers.__version__)
print("accelerate version : ", accelerate.__version__)


huggingface version :  0.16.4
transformers version :  4.34.0.dev0
accelerate version :  0.22.0


Log in with your own Hugging Face access token (and keep it out of shared notebooks):

In [None]:
!huggingface-cli login

In [None]:
# import os
# os.environ['HUGGING_FACE_HUB_TOKEN'] = ""

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer

# 👩🏾‍💻 Let's Code!

### a. Load the Model

First, we need to download and summon our LLM.

This could take a few minutes, depending on your internet connection. Grab a coffee or water, have a stretch, and check back in about one to three minutes.

Here's the incantation:

In [None]:
# On launch day the model will be
# model_id = 'Deci/DeciLM-6B'

model_id = 'Deci/test_decilm'

model = AutoModelForCausalLM.from_pretrained(model_id,
                                             device_map="auto",
                                             trust_remote_code=True
                                             )

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

### b. Preprocess with a Tokenizer

Computers don't understand words in the way humans do. They prefer numbers. Tokenizers help bridge this gap by breaking down text and mapping each token to a unique number.

So, before feeding our model, we need to translate our text into a language it understands:

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer.pad_token = tokenizer.eos_token

model_inputs = tokenizer("A list of colors: red, blue", return_tensors="pt").to("cuda")
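
If you're curious what "mapping each token to a unique number" looks like, here's a toy sketch. It splits on whitespace and builds its own vocabulary; `build_vocab` and `toy_encode` are hypothetical helpers, and real tokenizers such as the one loaded above work on subwords with a fixed, pre-trained vocabulary.

```python
def build_vocab(corpus):
    """Assign each distinct whitespace-separated token the next free integer ID."""
    vocab = {}
    for token in corpus.split():
        vocab.setdefault(token, len(vocab))
    return vocab

def toy_encode(text, vocab):
    """Translate text into a list of token IDs (raises KeyError on unknown tokens)."""
    return [vocab[token] for token in text.split()]

toy_vocab = build_vocab("a list of colors : red , blue")
print(toy_vocab)
print(toy_encode("colors : red , blue", toy_vocab))  # [3, 4, 5, 6, 7]
```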

### c. Generate Text

Now, let the magic happen:

In [None]:
generated_ids = model.generate(**model_inputs,
                               max_new_tokens=40,
                               num_beams=5,
                               early_stopping=True
                               )

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


After a model processes input text, it often returns a sequence of token IDs.

These token IDs are numerical representations of the tokens (words, subwords, or characters) that the tokenizer originally produced.

To make sense of these numbers and convert them back to human-readable text, we need to "decode" them.

### d. Decode

Converting token IDs back into human-readable text is called detokenization.

`tokenizer.batch_decode` converts the numerical outputs of a model back into meaningful, human-readable sentences, making the results interpretable and usable.

In [None]:
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])

A list of colors: red, blue, green, yellow, orange, purple, pink, black, white, brown, gray, silver, and gold.
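
The `skip_special_tokens` flag is worth a quick illustration, since it decides whether markers like `<s>` show up in your decoded text. Below is a toy sketch with a made-up vocabulary (`VOCAB`, `SPECIAL_TOKENS`, and `toy_decode` are illustrative names, not part of `transformers`):

```python
# Hypothetical vocabulary: the list index is the token ID.
VOCAB = ["<s>", "</s>", "<pad>", "red", "blue", "green"]
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}

def toy_decode(ids, skip_special_tokens=False):
    """Map IDs back to tokens, optionally dropping special markers."""
    tokens = [VOCAB[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    return " ".join(tokens)

print(toy_decode([0, 3, 4, 1]))                            # <s> red blue </s>
print(toy_decode([0, 3, 4, 1], skip_special_tokens=True))  # red blue
```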


# More examples

Let's create a simple function that takes a prompt, the number of new tokens to generate, and a temperature, and prints the generated text.

## How to control text generation

Text generation with Hugging Face Transformers is both an art and a science.

While model architecture and training data lay the foundation, generation parameters like `num_beams`, `no_repeat_ngram_size`, and `early_stopping` serve as tuning knobs.

By understanding and adeptly adjusting these parameters, you can significantly enhance the quality of your model's generated text.

Experiment, iterate, and find the perfect balance for your unique application!

I'll briefly describe how these three in particular influence the quality and characteristics of generated text:

1) `num_beams`

2) `no_repeat_ngram_size`

3) `early_stopping`


#### 1. 🔦 `num_beams`: The Power of Beam Search

**What it does:** `num_beams` defines the number of sequences (or "beams") the model considers in parallel during generation. It essentially widens the search space, allowing the model to explore multiple possibilities before settling on the final sequence.

**Effect on text:** A higher `num_beams` value often results in more coherent and contextually relevant sequences. However, it can also increase the computational overhead, as the model now has to track and update more sequences.

**Practical tip:** If your model's output lacks fluency or seems off-context, consider increasing `num_beams`. Just be mindful of the trade-off between quality and computation time.
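
Here's a small sketch of why tracking several beams can beat picking the single best token at each step. The probability table `TOY_MODEL` is invented for this example, and `toy_beam_search` is a bare-bones illustration (no end-of-sequence handling, no length penalty), not the implementation used by `model.generate`.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"bird": 0.9, "fish": 0.1},
}

def toy_beam_search(start, steps, num_beams):
    """Keep the `num_beams` best partial sequences at every step."""
    # Each beam is (cumulative log-probability, list of tokens).
    beams = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            for token, prob in TOY_MODEL.get(tokens[-1], {}).items():
                candidates.append((score + math.log(prob), tokens + [token]))
        if not candidates:  # nothing left to expand
            break
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
    return beams[0][1]

print(toy_beam_search("<s>", steps=2, num_beams=1))  # ['<s>', 'the', 'cat']
print(toy_beam_search("<s>", steps=2, num_beams=2))  # ['<s>', 'a', 'bird']
```

With one beam (greedy search) the model commits to "the" and ends up with a sequence of probability 0.33. With two beams it keeps "a" alive long enough to discover "a bird" at 0.36, at the cost of scoring twice as many candidates per step.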

#### 2. 🙅🏽 `no_repeat_ngram_size`: Curbing Repetitiveness

**What it does:** This parameter prevents the repetition of n-grams. An n-gram is a contiguous sequence of 'n' tokens. Setting `no_repeat_ngram_size` to 2, for instance, ensures that no pair of consecutive tokens appears more than once in the generated sequence.

**Effect on text:** It significantly reduces repetitiveness in the output. Especially in longer sequences, preventing repeated n-grams can make the output more readable and less redundant.

**Practical tip:** If your generated text feels "looped" or tautological (i.e., "saying the same thing twice"), tweaking `no_repeat_ngram_size` can be a game-changer.
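
One way to picture the mechanism: before each step, work out which next tokens would complete an n-gram that already exists, and forbid them. The helper below, `banned_next_tokens`, is a hypothetical sketch of that idea on plain strings; the library does the equivalent on token IDs by masking logits.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would recreate an n-gram already in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    prefix_len = ngram_size - 1
    # The last (n-1) tokens are the start of whatever n-gram comes next.
    prefix = tuple(generated[len(generated) - prefix_len:])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

tokens = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(tokens, 2))  # {'cat'}
print(banned_next_tokens(tokens, 3))  # set()
```

With a size of 2, "the cat" has already appeared, so "cat" is off the table after the second "the". With a size of 3, "on the" has never been followed by anything yet, so nothing is banned.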

#### 3. 🛑 `early_stopping`: Knowing When to Stop

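**What it does:** `early_stopping` controls when beam search stops. With `early_stopping=True`, generation ends as soon as `num_beams` complete candidates (sequences that have reached the end-of-sequence token) have been found; with the default `False`, a heuristic keeps searching while a better candidate still seems likely.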

**Effect on text:** By enabling `early_stopping`, you can ensure that the generation halts once a logical endpoint is reached, even if the `max_length` or `max_new_tokens` limit hasn't been attained. This often leads to more concise and relevant outputs.

**Practical tip:** If your outputs seem unnecessarily lengthy or start drifting off-topic, enabling `early_stopping` can bring them back on track.
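
The stopping decision can be sketched as a tiny function. This is a simplified illustration based on my reading of how beam search decides it is done; `beam_search_is_done` and its arguments are hypothetical names, and the real logic also accounts for length penalties and a `"never"` mode.

```python
def beam_search_is_done(finished_scores, num_beams, early_stopping, best_running_score):
    """Decide whether beam search may stop.

    finished_scores: scores of hypotheses that already emitted end-of-sequence.
    best_running_score: best score a still-unfinished beam could plausibly reach.
    """
    if len(finished_scores) < num_beams:
        return False  # not enough complete candidates yet
    if early_stopping:
        return True   # stop as soon as num_beams candidates are complete
    # Otherwise keep going while an unfinished beam could still beat
    # the worst of the candidates we would keep.
    worst_kept = sorted(finished_scores, reverse=True)[:num_beams][-1]
    return worst_kept >= best_running_score

print(beam_search_is_done([-1.0, -2.0], 2, True, -0.5))   # True
print(beam_search_is_done([-1.0, -2.0], 2, False, -0.5))  # False
```

In the second call, a running beam might still reach a score of -0.5, which beats the worst finished candidate (-2.0), so the search continues.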

In [None]:
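def generate_text(prompt: str, max_new_tokens: int, temperature: float) -> None:
    """Generate a continuation of `prompt` with beam search and print it."""
    # Move the tokenized prompt to the same device as the model.
    model_inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Note: `temperature` only takes effect when sampling is enabled
    # (do_sample=True); with plain beam search it is ignored.
    generated_ids = model.generate(**model_inputs,
                                   max_new_tokens=max_new_tokens,
                                   temperature=temperature,
                                   num_beams=5,
                                   no_repeat_ngram_size=4,
                                   early_stopping=True
                                   )
    # Keep special tokens so markers such as <s> stay visible in the output.
    decoded_generation = tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
    print(decoded_generation)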
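
The function also accepts a `temperature`. As a rough sketch of what that knob does when sampling is enabled (`do_sample=True` in `transformers`): the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The logits below are made up, and `softmax_with_temperature` is an illustrative helper rather than a library function.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1/temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (0.25, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures pile more probability onto the top-scoring token (more predictable text), while higher temperatures flatten the distribution (more varied text).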

In [None]:
prompt = """In this blog post, we're going to talk about why waking up is"""
generate_text(prompt, 500, 0.25)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> In this blog post, we're going to talk about why waking up is one of the most important things you can do for your health and well-being. We're also going to give you some tips on how to make sure you're getting the most out of your mornings. So, if you're ready to start your day off on the right foot, let's get started!
Why is waking up important?
Waking up is important because it sets the tone for the rest of your day. If you wake up feeling groggy and unrested, it's going to be difficult to stay focused and productive throughout the day. On the other hand, waking up early and getting a good night's sleep can help you feel more energized and ready to take on the day.
How can I make sure I'm waking up on time?
If you're having trouble waking up in the morning, there are a few things you can try. One option is to set an alarm that goes off 15 minutes earlier than your usual wake-up time. This will give you a little bit of extra time to wake up and get ready for the day. Another opt

In [None]:
prompt = """Dear recruiter, I write this letter of recommendation for my toddler
son for his application to the Hogwarts School of Monster Trucks and Classic Cars.
He has over 100 monster trucks and this is beyond an obsession
"""
generate_text(prompt, 500, 0.7)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> Dear recruiter, I write this letter of recommendation for my toddler 
son for his application to the Hogwarts School of Monster Trucks and Classic Cars. 
He has over 100 monster trucks and this is beyond an obsession
it's a way of life for him. 
His favorite truck is a 1969 Ford F-100 pickup truck with a 427 cubic inch 
V-8 engine and a 4-speed manual transmission. 
The truck has a 12-inch lift kit, 38-inch tires, and a 10-inch suspension lift. 
It also has a 500-cubic inch V-8 engine, a 6-speed automatic transmission, 
and a 3-inch exhaust system. 
This truck can go from 0 to 60 mph in less than 3 seconds and has a top speed 
of over 200 mph. 
My son loves this truck so much that he sleeps with it every night. 
When he wakes up in the morning, the first thing he does is check to see 
if the truck is still there. 
If it isn't, he gets really upset and starts crying. 
I've tried to explain to him that the truck isn't real, but he doesn't 
believe me. 
Even when I show him pictures o

In [None]:
prompt = """It was a clear dark night, a clear white moon. Warren G was on the street trying to consume"""
generate_text(prompt, 500, 0.7)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> It was a clear dark night, a clear white moon. Warren G was on the street trying to consume as much alcohol as he possibly could. He was on his way to a party, but he wasn't really looking forward to it. He just wanted to get drunk and forget about his problems.
Warren had a lot of problems. His girlfriend was cheating on him, his boss was a jerk, and he didn't know what he was going to do with his life. He was 25 years old, and he had no idea what he wanted to do with the rest of his life.
He was walking down the street when he saw a man standing on the corner. The man was holding a sign that said, "Will work for food." Warren was hungry, so he stopped to talk to the man.
The man told Warren that he was homeless, and that he had been living on the street for the past few months. He said that he was looking for a job, but that no one would hire him. Warren felt sorry for the man, and he offered to buy him a meal.
The two of them went to a nearby restaurant, and Warren ordered a mea