# Applying the Transformer - Introduction to GPT-2 (Chapter 16 Application) 🤖

---

This notebook transitions from manually building the self-attention mechanism (covered in the previous file) to the practical application of a **state-of-the-art, large-scale Transformer model**: **GPT-2 (Generative Pre-trained Transformer 2)**. It showcases how industry tools simplify the use of models built upon the concepts of **Chapter 16: Transformers**.

### 1. Leveraging the Hugging Face `transformers` Library 📦

The notebook primarily uses the popular **Hugging Face `transformers`** library, which abstracts away the work of downloading and managing pretrained model weights, tokenizers, and configuration:

* **High-Level `pipeline`:** The fastest way to run the model is to build a generator with `pipeline('text-generation', model='gpt2')`. The resulting pipeline object handles tokenization, the forward pass, sampling, and decoding of the output text in a single call.
* **Reproducibility:** Calling `set_seed(42)` before generating fixes the random number generators, so the sampled text is repeatable across runs in the same environment (same hardware and library versions). This is good practice when demonstrating model behavior.

### 2. The GPT-2 Architecture and Causal Attention 🧠

This section provides insight into the specific Transformer variant being used:

* **Decoder-Only Model:** GPT-2 is an **autoregressive language model** built solely on the **Transformer decoder** architecture.
* **Causal Masking:** The notebook implicitly uses **causal self-attention** (also called **masked self-attention**). When computing attention for a token, the model **attends only to that token and the tokens before it** in the sequence. This prevents information from leaking in from future positions and is what makes left-to-right generation possible.
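
The masking rule can be illustrated with a minimal sketch in plain Python. It is not GPT-2's actual implementation (which works on batched tensors with multiple heads); the function name `causal_attention_weights` and the all-zero score matrix are assumptions made for illustration only.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], with positions j > i masked out."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Token i may attend to positions 0..i (itself and earlier tokens).
        visible = scores[i][: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights

# Hypothetical raw attention scores for a 4-token sequence (all equal here).
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row is the attention distribution for one token. The weights form a lower-triangular matrix: the first token can only attend to itself, while the last token spreads its attention over the whole sequence.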

### 3. Practical Application: Text Generation ✍️

The notebook's main demonstration is the model's ability to generate coherent, novel text:

* **Seed Text (Prompt):** The model is given the short prompt "Hello readers! today is" and asked for three continuations. One of them reads: "Hello readers! today is my anniversary, so I wanted to share some of the awesome stuff I have discovered along the way."
* **Sampling:** The model extends the prompt one token at a time. Its final layer outputs a probability distribution over the vocabulary, the next token is **sampled** from that distribution and appended to the input, and the process repeats until a stopping condition is met (here, `max_new_tokens=20`). Because the tokens are sampled, the three continuations differ from one another.
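
The sampling loop described above can be sketched in plain Python. This is a toy illustration, not what `pipeline` does internally: `toy_logits` is a hypothetical stand-in for the model, the five-token vocabulary and the end-of-sequence id `4` are invented, and the seeded `random.Random(42)` plays a role loosely analogous to `set_seed(42)`.

```python
import math
import random

def sample_next(logits, rng, temperature=1.0):
    """Sample a token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(prompt_ids, next_logits, max_new_tokens, eos_id, rng):
    """Autoregressive loop: sample one token, append it, repeat."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = sample_next(next_logits(ids), rng)
        ids.append(token)
        if token == eos_id:  # stop early on end-of-sequence
            break
    return ids

def toy_logits(ids):
    # Hypothetical stand-in for a model: favours (last token + 1) mod 5.
    vocab_size = 5
    preferred = (ids[-1] + 1) % vocab_size
    return [2.0 if t == preferred else 0.0 for t in range(vocab_size)]

rng = random.Random(42)  # fixed seed so the run is repeatable
generated = generate([0], toy_logits, max_new_tokens=20, eos_id=4, rng=rng)
print(generated)
```

Re-running with a freshly seeded `random.Random(42)` reproduces the same sequence, while a different seed generally gives a different one. That is the same trade-off the notebook makes by fixing the seed before generating.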

### 4. Component-Level Inspection

For users who want to fine-tune or inspect the model more deeply, the notebook shows how to access the individual components:

* **Tokenizer (`GPT2Tokenizer`):** Demonstrates loading the specific GPT-2 tokenizer and encoding a text sequence into numerical IDs.
* **Model (`GPT2Model`):** Shows loading the raw model object.
* **Output Hidden State:** By running the model on the encoded input, the notebook inspects the shape of the output, `torch.Size([1, 5, 768])`: a batch of 1 sequence, 5 tokens, and 768 features per token. This confirms that for every input token the Transformer produces a contextualized **hidden state vector** of dimension 768 (the hidden size of the base `gpt2` checkpoint), which represents that token in the context of the tokens that precede it.

In [25]:
import transformers
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='gpt2')
set_seed(42)

Device set to use cuda:0


In [26]:
text_generated = generator("Hello readers! today is", max_new_tokens=20, num_return_sequences=3, truncation=True)
# Each result is a dict; the text lives under the 'generated_text' key.
text_1, text_2, text_3 = (result['generated_text'] for result in text_generated)
print(text_1, '\n', text_2, '\n', text_3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Hello readers! today is my anniversary, so I wanted to share some of the awesome stuff I have discovered along the way. 
 Hello readers! today is a special one as we present the first ever installment of the 'Survivor Legacy' podcast. 
 Hello readers! today is the day!

In a post I wrote about the idea of a "couple of people


In [27]:
from transformers import GPT2Tokenizer

In [32]:
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
text = 'Let us encode this text'
# return_tensors='pt' returns PyTorch tensors rather than Python lists
encoded_input = tokenizer(text, return_tensors='pt')
encoded_input

{'input_ids': tensor([[ 5756,   514, 37773,   428,  2420]]), 'attention_mask': tensor([[1, 1, 1, 1, 1]])}

In [33]:
from transformers import GPT2Model
model = GPT2Model.from_pretrained('gpt2')
output = model(**encoded_input)

In [36]:
output.last_hidden_state.shape

torch.Size([1, 5, 768])