# Transformers: Advanced Tutorial

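**Transformers** are a deep learning architecture built on self-attention, a mechanism that lets each token in a sequence weigh every other token when computing its own representation. They are the foundation of modern NLP models such as BERT (encoder-only), GPT (decoder-only), and T5 (encoder-decoder).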
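The core operation is scaled dot-product attention, $\mathrm{softmax}(QK^\top / \sqrt{d_k})\,V$, where the queries $Q$, keys $K$, and values $V$ are linear projections of the token vectors. Below is a minimal single-head NumPy sketch, assuming random projection matrices in place of learned weights. BERT-base stacks 12 layers of 12-head attention and adds masking, residual connections, and feed-forward sublayers, none of which are shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations
    W_q, W_k, W_v: (d_model, d_k) projections (random here, learned in a real model)
    Returns outputs of shape (seq_len, d_k) and attention weights (seq_len, seq_len).
    """
    Q = X @ W_q
    K = X @ W_k
    V = X @ W_v
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every token with every other token
    weights = softmax(scores, axis=-1)  # each row is a distribution that sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape)   # (5, 8)
print(attn.shape)  # (5, 5)
```

Because every row of `attn` mixes information from all positions at once, there is no step-by-step recurrence: the whole sequence is processed in parallel.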

## 1. Install and Import Required Libraries

In [None]:
!pip install -q transformers datasets torch matplotlib

import torch
from transformers import BertTokenizer, BertModel
from datasets import load_dataset
import matplotlib.pyplot as plt


## 2. Load and Tokenize Text Data

In [None]:
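# Load the first 100 training examples of AG News (4-class news topic classification)
dataset = load_dataset("ag_news", split="train[:100]")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sample_text = dataset[0]["text"]
# return_tensors="pt" returns PyTorch tensors; truncation caps input at BERT's 512-token limit
tokens = tokenizer(sample_text, return_tensors="pt", padding=True, truncation=True)

print("Sample text:", sample_text)
print("Tokenized input IDs:", tokens["input_ids"])
# Map the IDs back to subword strings to see the WordPiece split and special tokens
print("Tokens:", tokenizer.convert_ids_to_tokens(tokens["input_ids"][0].tolist()))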
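BERT's tokenizer first lowercases and splits on whitespace and punctuation, then breaks each word into WordPiece subwords by greedy longest-match-first lookup, marking continuation pieces with `##`. The sketch below reimplements only that subword step against a hypothetical toy vocabulary (the real `bert-base-uncased` vocabulary has about 30k entries); it is an illustration of the idea, not the library's code.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first subword split, in the style of BERT's WordPiece.

    Continuation pieces carry a "##" prefix. If any part of the word cannot be
    matched, the whole word maps to unk_token.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary mapping token strings to IDs
vocab = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "play": 3, "##ing": 4,
         "##ed": 5, "un": 6, "##play": 7, "##able": 8}

# Special tokens wrap the sequence, as the real tokenizer does automatically
toy_tokens = ["[CLS]"] + wordpiece_tokenize("playing", vocab) \
    + wordpiece_tokenize("unplayable", vocab) + ["[SEP]"]
toy_ids = [vocab[t] for t in toy_tokens]
print(toy_tokens)  # ['[CLS]', 'play', '##ing', 'un', '##play', '##able', '[SEP]']
print(toy_ids)     # [1, 3, 4, 6, 7, 8, 2]
```

The variables are prefixed with `toy_` so running this cell does not overwrite the real `tokens` used by the model below.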


## 3. Load Pretrained BERT Model and Get Embeddings

In [None]:
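model = BertModel.from_pretrained("bert-base-uncased")
model.eval()  # inference mode (disables dropout); from_pretrained already does this

# Disable gradient tracking: only the forward-pass outputs are needed
with torch.no_grad():
    outputs = model(**tokens)
    # One contextual vector per token: (batch_size, seq_len, hidden_size=768)
    last_hidden_state = outputs.last_hidden_state

print("Shape of last hidden state:", last_hidden_state.shape)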
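The printed shape is `(batch_size, sequence_length, hidden_size)`: batch size 1 because a single text was tokenized, and hidden size 768 for `bert-base-uncased`. Besides taking the `[CLS]` row (position 0), a common alternative for a sentence-level vector is mean pooling over the real, non-padding tokens indicated by `tokens["attention_mask"]`. The NumPy sketch below uses a tiny made-up array in place of real BERT outputs; with the real tensors you would call `.numpy()` on both first.

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    Returns:           (batch, hidden)
    """
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

# Toy stand-in for model outputs: batch of 2, seq_len 4, hidden size 3
toy_hidden = np.arange(24, dtype=np.float32).reshape(2, 4, 3)
toy_mask = np.array([[1, 1, 1, 1],
                     [1, 1, 0, 0]])  # second sequence ends with two padding positions

cls_vectors = toy_hidden[:, 0, :]        # (2, 3): the [CLS] position of each sequence
mean_vectors = mean_pool(toy_hidden, toy_mask)
print(cls_vectors.shape, mean_vectors.shape)  # (2, 3) (2, 3)
```

Masking matters once batches are padded: without it, padding vectors would be averaged into shorter sequences.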


## 4. Visualize the [CLS] Token Embedding

In [None]:
# Position 0 holds the [CLS] token; squeeze drops the batch dimension, giving shape (768,)
cls_embedding = last_hidden_state[:, 0, :].squeeze().numpy()

plt.plot(cls_embedding[:50])
plt.title("First 50 dimensions of CLS token embedding")
plt.xlabel("Dimension")
plt.ylabel("Value")
plt.grid(True)
plt.show()
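A line plot shows the raw values but not what they mean. Another way to inspect embeddings is to compare them, for example with cosine similarity between the `[CLS]` vectors of two texts. The sketch below uses small hypothetical vectors so it runs without the model; the same function applies to two 768-dimensional `cls_embedding` arrays. Similarity scores from a BERT model that has not been fine-tuned should be read with caution.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dimensional vectors standing in for 768-dimensional [CLS] embeddings
u = np.array([1.0, 2.0, 0.0, 1.0])
v = np.array([2.0, 4.0, 0.0, 2.0])  # same direction as u, twice the length
w = np.array([0.0, 0.0, 3.0, 0.0])  # orthogonal to u

print(cosine_similarity(u, v))  # close to 1.0 (length is ignored)
print(cosine_similarity(u, w))  # 0.0
```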


## 5. Summary

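- Transformers rely on **self-attention** instead of recurrence, so every token can attend to every other token and all positions are processed in parallel
- Pretrained models like BERT produce **contextual embeddings**: the same word gets a different vector depending on its surrounding text
- The final hidden state of the `[CLS]` token is commonly used as a sequence-level representation for classification, typically after fine-tuning
- Tokenizers split raw text into subword tokens (WordPiece for BERT), add special tokens such as `[CLS]` and `[SEP]`, and map them to the integer IDs the model consumes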
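Using `[CLS]` for classification amounts to putting a small linear layer plus softmax on top of that vector. The NumPy sketch below assumes random weights and a random stand-in vector; in practice the weights are learned during fine-tuning. The four labels are AG News's classes. Note that Hugging Face's `BertForSequenceClassification` additionally passes the `[CLS]` state through BERT's pooler (a dense layer with tanh) and dropout before its linear classifier, which this sketch omits.

```python
import numpy as np

# AG News class names, in label-index order
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def classify_cls(cls_vec, W, b):
    """Linear classification head over a [CLS] vector.

    cls_vec: (hidden,) vector, e.g. 768 values for bert-base-uncased
    W: (hidden, num_labels) weights; b: (num_labels,) bias
    Returns class probabilities and the predicted label name.
    """
    logits = cls_vec @ W + b
    logits = logits - logits.max()  # numerical stability before exp
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs, LABELS[int(np.argmax(probs))]

rng = np.random.default_rng(0)
hidden_size = 768
fake_cls = rng.normal(size=hidden_size)  # stand-in for a real [CLS] embedding
W = rng.normal(scale=0.02, size=(hidden_size, len(LABELS)))  # untrained, random weights
b = np.zeros(len(LABELS))

probs, label = classify_cls(fake_cls, W, b)
print(probs.round(3), label)
```

With untrained weights the prediction is arbitrary; fine-tuning adjusts `W`, `b`, and usually the BERT weights themselves so that the probabilities track the true topic.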