# Exploring How a Decoder Works

This notebook explores the text generation capabilities of early decoder-only Transformer models, such as GPT-2.

## Hugging Face Pipelines

### GPT-2: Hugging Face Pipelines

Hugging Face Pipelines wrap the steps needed to run inference for common NLP tasks. This includes text generation with popular open-source models, such as GPT-2.

In [None]:
from transformers import pipeline

text = 'An example of a prime number is'
generator = pipeline('text-generation', 'gpt2')

response = generator(text)
response[0]['generated_text']

### GPT-2: Random Responses

Decoder-based models usually sample each next token from a predicted probability distribution, so their responses have a random component. As a result, the same input can produce several different responses.

In [None]:
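# Ask for 10 sampled continuations of the same prompt
responses = generator(text, num_return_sequences=10)
# Keep only the first 50 characters of each one for easier comparison
responses = [response['generated_text'][:50] for response in responses]
responses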
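
To illustrate where the randomness comes from, here is a minimal sketch that samples next tokens from a probability distribution. The vocabulary and scores are made up for illustration and are not GPT-2's; the helper names `softmax` and `sample_tokens` are hypothetical.

```python
import math
import random

# Hypothetical next-token scores (logits) for a tiny, made-up vocabulary.
logits = {" 2": 2.0, " 3": 1.5, " 7": 1.0, " the": 0.5}

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = list(logits)
probs = softmax(list(logits.values()))

def sample_tokens(n, seed):
    """Draw n next-token candidates with a seeded random generator."""
    rng = random.Random(seed)
    return rng.choices(tokens, weights=probs, k=n)

draws = sample_tokens(10, seed=0)
print(draws)
# Reusing the seed reproduces exactly the same draws.
print(sample_tokens(10, seed=0) == draws)
```

Because every draw is weighted by probability, likely tokens appear often but less likely ones still show up, which is why repeated calls give different continuations.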

### GPT-2: Greedy Generation

Apart from setting seeds, another way to make the prediction deterministic is to generate tokens greedily and remove the random component. This method selects the most probable token at each step instead of sampling from a distribution.

In [None]:
response = generator(text, do_sample=False)
response[0]['generated_text']
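
As a rough sketch of what greedy selection means, the snippet below always picks the highest-probability token from a made-up distribution. The probabilities and the helper `greedy_pick` are assumptions for illustration, not values produced by GPT-2.

```python
# Hypothetical next-token probabilities for a tiny, made-up vocabulary.
probs = {" 2": 0.45, " 3": 0.30, " 7": 0.15, " the": 0.10}

def greedy_pick(distribution):
    """Return the token with the highest probability."""
    return max(distribution, key=distribution.get)

# Repeating the choice never changes the result: no randomness is involved.
picks = [greedy_pick(probs) for _ in range(5)]
print(picks)  # [' 2', ' 2', ' 2', ' 2', ' 2']
```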

### GPT-2: Limit Length

Left unchecked, decoder-based models can keep generating tokens (and hallucinating) well past a useful answer. A very simple way to address this is to cap the number of newly generated tokens with `max_new_tokens`.

In [None]:
response = generator(text, do_sample=False, max_new_tokens=5)
response[0]['generated_text']
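
The effect of a token budget can be sketched with a toy generation loop. Everything here is hypothetical: `next_token` stands in for a real model, and `eos_id` is an assumed end-of-sequence ID that stops generation early.

```python
# Hypothetical toy "model": always predicts the next integer token ID.
def next_token(ids):
    return ids[-1] + 1

def generate(prompt_ids, max_new_tokens, eos_id=None):
    """Append tokens until the budget is spent or eos_id is produced."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break
    return ids

# The budget counts only new tokens, not the two prompt tokens.
print(generate([10, 11], max_new_tokens=5))             # [10, 11, 12, 13, 14, 15, 16]
# An end-of-sequence token can stop generation before the budget runs out.
print(generate([10, 11], max_new_tokens=5, eos_id=13))  # [10, 11, 12, 13]
```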

## Hugging Face AutoTokenizer and AutoModelForCausalLM

### GPT-2: Pipelines Under the Hood

Hugging Face Pipelines are a convenience layer for making predictions. Under the hood, they use a tokenizer and the weights of a trained decoder-based model.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')

inputs = tokenizer("Hello, how are", return_tensors='pt')
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=10)
tokenizer.decode(outputs[0])
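
The same three steps (encode, generate, decode) can be sketched with toy parts. This is only an illustration of the structure: the word-level vocabulary, `toy_model`, and `toy_pipeline` are hypothetical, and GPT-2 itself uses subword tokens and a trained network.

```python
# Hypothetical word-level vocabulary; GPT-2 really uses subword tokens.
vocab = ["Hello", ",", "how", "are", "you", "?"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def encode(text):
    """Text -> token IDs (splitting on spaces, for simplicity)."""
    return [token_to_id[tok] for tok in text.split()]

def decode(ids):
    """Token IDs -> text."""
    return " ".join(vocab[i] for i in ids)

def toy_model(ids):
    """Stand-in for a trained decoder: predicts the next vocabulary entry."""
    return (ids[-1] + 1) % len(vocab)

def toy_pipeline(text, max_new_tokens=2):
    ids = encode(text)                  # 1. tokenize
    for _ in range(max_new_tokens):     # 2. generate one token at a time
        ids.append(toy_model(ids))
    return decode(ids)                  # 3. map IDs back to text

print(toy_pipeline("Hello , how are"))  # Hello , how are you ?
```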

### GPT-2: Tokenization

First, raw input text is transformed into a sequence of discrete token IDs, which are then used to look up dense embedding vectors fed into the layers of the model.

In [None]:
inputs = tokenizer("Hello, how are", return_tensors='pt')
print(inputs)

idx_to_text = {_id.item(): tokenizer.decode(_id) for _id in inputs['input_ids'][0]}
idx_to_text
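
The lookup from token IDs to embedding vectors can be sketched as plain list indexing. The vocabulary, IDs, and embedding size below are made up for illustration, and the table holds random numbers where a trained model would hold learned values.

```python
import random

# Hypothetical vocabulary and embedding size, chosen only for illustration.
vocab = {"Hello": 0, ",": 1, " how": 2, " are": 3}
embedding_dim = 4

rng = random.Random(0)
# One row per token ID; a trained model learns these values.
embedding_table = [
    [rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab
]

# Step 1: tokens -> discrete IDs.
token_ids = [vocab[tok] for tok in ["Hello", ",", " how", " are"]]
# Step 2: IDs -> dense vectors, by picking rows of the table.
embeddings = [embedding_table[i] for i in token_ids]

print(token_ids)                            # [0, 1, 2, 3]
print(len(embeddings), len(embeddings[0]))  # 4 4
```

Note that the toy tokens keep their leading space, as GPT-2's tokens do.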

### GPT-2: Predictions

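The output of the model is another sequence of token IDs, which can be mapped back to text. With `generate`, this sequence contains the prompt tokens followed by the newly generated ones.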

In [None]:
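# Take the first (and only) sequence in the batch: prompt IDs plus new IDs
outputs = model.generate(**inputs, max_new_tokens=10)[0]
# A dict keeps one entry per distinct ID, so a repeated token appears only once
idx_to_text = {_id.item(): tokenizer.decode(_id) for _id in outputs}
idx_to_text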