# DeciLM-6B Starter
- According to Deci, up to 15x higher inference throughput than Llama 2 7B


In [None]:
# Set up dependencies
!pip install -q huggingface_hub
!pip install -q transformers
!pip install -q accelerate
# Quote the version specifier so the shell does not treat ">" as a redirect
!pip install -q "bitsandbytes>=0.30.0"

In [None]:
# Libraries
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer

## Load the model

In [None]:
model_id = "Deci/DeciLM-6b"
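model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map='auto',       # let accelerate place the weights on the available device(s)
    trust_remote_code=True,  # DeciLM ships custom modeling code on the Hub
)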

## Preprocess with a Tokenizer

In [None]:
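tokenizer = AutoTokenizer.from_pretrained(model_id)

# Reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

# Move the inputs to the same device as the model
model_inputs = tokenizer("A list of colors: red, blue", return_tensors='pt').to(model.device)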

## Generate Text

In [None]:
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,
    early_stopping=True,
)
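
The call above uses beam search with `num_beams = 5`. As a rough, model-free sketch of what that means, the toy example below runs beam search over a tiny hand-written table of next-token probabilities. The table `TOY_MODEL` and the helper `beam_search` are hypothetical and invented for illustration; the real implementation in Transformers also applies length penalties and other heuristics that are left out here.

```python
import math

# Hypothetical toy "model": next-token probabilities that depend only on the
# previous token. The numbers are invented for illustration.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "x": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0},
}


def beam_search(num_beams, max_new_tokens=5):
    """Simplified beam search: keep the `num_beams` best partial sequences per step."""
    beams = [(0.0, ["<s>"])]  # (sum of log-probabilities, tokens so far)
    finished = []
    for _ in range(max_new_tokens):
        # Expand every live beam by every possible next token.
        candidates = []
        for score, seq in beams:
            for token, prob in TOY_MODEL[seq[-1]].items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the highest-scoring candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:num_beams]:
            if seq[-1] == "<eos>":
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:  # every surviving hypothesis has ended
            break
    best_score, best_seq = max(finished + beams, key=lambda c: c[0])
    return best_seq[1:]  # drop the start token


print(beam_search(num_beams=1))  # greedy decoding
print(beam_search(num_beams=2))  # two hypotheses kept per step
```

With one beam the search commits to the locally best first token (`a`), while two beams keep `b` alive long enough to find a sequence with a higher overall probability (0.36 versus 0.33 in this toy table).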

## Decode
- Decode the output tensors back into text

In [None]:
print(tokenizer.batch_decode(generated_ids, skip_special_tokens = True)[0])

## How to Control Text Generation

Text generation with Hugging Face Transformers is both an art and a science.

While model architecture and training data lay the foundation, generation parameters like `num_beams`, `no_repeat_ngram_size`, and `early_stopping` serve as tuning knobs.

By understanding and adeptly adjusting these parameters, you can significantly enhance the quality of your model's generated text.

Experiment, iterate, and find the perfect balance for your unique application!

This section looks at how three parameters in particular influence the quality and characteristics of generated text:

1) num_beams

2) no_repeat_ngram_size

3) early_stopping
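
To make `no_repeat_ngram_size` more concrete, here is a minimal sketch of the idea behind it: before each step, any token that would complete an n-gram already present in the output is ruled out. The helper `banned_next_tokens` is hypothetical and works on plain word lists rather than token IDs; it is an illustration of the concept, not the Transformers implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would repeat an n-gram already present in `tokens`."""
    if ngram_size <= 0 or len(tokens) + 1 < ngram_size:
        return set()  # not enough context to complete any n-gram yet
    # The last (n - 1) tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tokens[i:i + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


words = "the cat sat on the cat".split()
print(banned_next_tokens(words, 3))  # "the cat sat" already occurred, so "sat" is banned
print(banned_next_tokens(words, 4))  # no 4-gram starts with "on the cat" yet
```

A small `ngram_size` suppresses repetition aggressively but can also block legitimate repeats such as names or set phrases, which is why a value like 4 is a softer setting than 2.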

In [None]:
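def generate_text(prompt: str, max_new_token: int, temperature: float) -> None:
    """Generate a continuation of `prompt` and print the decoded text."""
    model_inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_token,
        # temperature only takes effect when sampling is enabled
        do_sample=True,
        temperature=temperature,
        num_beams=5,
        no_repeat_ngram_size=4,
        early_stopping=True,
    )

    # Decode the first returned sequence, dropping special tokens
    decoded_generation = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(decoded_generation)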
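
The `temperature` argument of `generate_text` refers to the usual sampling temperature, which only affects generation when sampling is enabled (`do_sample=True`). As a small sketch of what the value does, the snippet below divides a set of logits by the temperature before applying softmax. The logits and the helper `softmax_with_temperature` are invented for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temperature in (0.25, 0.7, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, so sampled text becomes more predictable; higher temperatures flatten the distribution and produce more varied output.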

In [None]:
prompt = """In this blog post, we're going to talk about why waking up is"""
generate_text(prompt, 500, 0.25)

In [None]:
prompt = 'organic chemistry is'
generate_text(prompt, 1000, 0.25)

In [None]:
prompt = """Dear recruiter, I write this letter of recommendation for my toddler
son for his application to the Hogwarts School of Monster Trucks and Classic Cars.
He has over 100 monster trucks and this is beyond an obsession
"""
generate_text(prompt, 500, 0.7)

In [None]:
prompt = """It was a clear dark night, a clear white moon. Warren G was on the street trying to consume"""
generate_text(prompt, 500, 0.7)