# NLP with Hugging Face Transformers 🧠🤖

In this notebook, you will get hands-on practice with two widely used pre-trained models in Natural Language Processing (NLP): **BERT** and **T5**.  
The goal is to get comfortable loading a model and its tokenizer, tokenizing text, running inference, and decoding predictions with the [🤗 Transformers library](https://huggingface.co/docs/transformers/index).


## What to expect
By completing these exercises, you will:
- Understand how to use Hugging Face tokenizers and models.
- Practice encoding raw text into model inputs.
- Run models in inference mode and interpret their outputs.
- Explore how masked language modeling (BERT) and sequence-to-sequence generation (T5) work.


👉 Work through the TODOs in each code cell, then run the cell to check your results.  
Try experimenting with different input sentences/prompts once you get the basic version working!

---


## Exercise 1: Masked Language Modeling with BERT
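BERT is a **bidirectional transformer** encoder pre-trained with a **masked language modeling** objective: some input tokens are hidden, and the model learns to predict them from the context on both sides.  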
You will:
- Load the pre-trained `bert-base-uncased` model and tokenizer.
- Feed an input sentence containing a `[MASK]` token.
- Predict the missing word using BERT’s output logits.
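
The last step, turning logits into a word, can be sketched without downloading a model. The snippet below is a toy illustration in plain Python, not BERT itself: the five-word vocabulary, the token ids, and the scores are all made up for the example. In the real exercise the logits come from the model, with one row per input token and one column per vocabulary entry.

```python
# Toy illustration of picking the most likely token at the [MASK] position.
# The vocabulary and scores are invented for this sketch; a real BERT
# vocabulary has roughly 30,000 entries.
vocab = ["[MASK]", "the", "capital", "paris", "london"]

# Tokenized input as vocabulary ids: "the capital [MASK]"
input_ids = [1, 2, 0]
mask_token_id = vocab.index("[MASK]")

# One row of scores (logits) per input position, one column per vocabulary entry.
logits = [
    [0.1, 2.0, 0.3, 0.2, 0.1],  # position 0
    [0.0, 0.4, 1.9, 0.3, 0.2],  # position 1
    [0.0, 0.1, 0.2, 3.5, 1.2],  # position 2 (the [MASK] slot)
]

# 1. Locate the position of the [MASK] token in the input.
mask_position = input_ids.index(mask_token_id)

# 2. Take the scores at that position and pick the highest-scoring vocabulary id.
mask_scores = logits[mask_position]
predicted_token_id = max(range(len(mask_scores)), key=lambda i: mask_scores[i])

# 3. Map the id back to a word.
predicted_word = vocab[predicted_token_id]
print(predicted_word)
```

The same three steps (find the mask position, take the argmax over the vocabulary at that position, convert the id back to text) apply to the tensors you get from the model in the cell below.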

In [None]:
import torch
from transformers import BertTokenizer, BertForMaskedLM

# TODO: Load pre-trained BERT tokenizer and model ("bert-base-uncased")
tokenizer = ...
model = ...

# Example input with a [MASK] token
text = "The capital of France is [MASK]."

# TODO: Encode the input text into model inputs
inputs = ...

# TODO: Run the model in inference mode
with torch.no_grad():
    outputs = ...
    logits = ...

# TODO: Locate the index of the [MASK] token
mask_token_index = ...

# TODO: Select the most likely prediction for the [MASK]
predicted_token_id = ...
predicted_word = ...

print("Input:    ", text)
print("Prediction:", predicted_word)


## Exercise 2: Sequence-to-Sequence Generation with T5
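T5 (Text-to-Text Transfer Transformer) treats every NLP problem as a text-to-text task: the input is a string, usually starting with a task prefix such as `translate English to German:`, and the output is also a string.  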
In this exercise, you will:
- Load the pre-trained `t5-small` model and tokenizer.
- Encode a translation prompt (English → German).
- Generate the translated sentence using T5’s sequence generation.
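
Sequence generation produces the output one token at a time. As a rough sketch of the idea, the snippet below runs a greedy decoding loop over a hand-written score table. Everything in it is hypothetical: `TRANSITIONS`, `next_token_scores`, and `greedy_decode` are names invented for this example, and the toy "model" looks only at the last generated token, whereas a real T5 decoder conditions on the encoded input and on all previously generated tokens.

```python
# Toy greedy decoding loop. `next_token_scores` is a hypothetical stand-in
# for a real decoder: it returns a score for every candidate next token.
EOS = "</s>"  # end-of-sequence marker

# Hand-written scores: given the last generated token, how likely is each next token?
TRANSITIONS = {
    "<start>": {"Das": 0.9, "Haus": 0.05, EOS: 0.05},
    "Das": {"Haus": 0.8, "ist": 0.1, EOS: 0.1},
    "Haus": {"ist": 0.85, "wunderbar": 0.1, EOS: 0.05},
    "ist": {"wunderbar": 0.9, "Haus": 0.05, EOS: 0.05},
    "wunderbar": {EOS: 0.95, "ist": 0.05},
}


def next_token_scores(generated):
    """Return candidate scores for the next token, based only on the last token."""
    last = generated[-1] if generated else "<start>"
    return TRANSITIONS[last]


def greedy_decode(max_new_tokens=10):
    """Repeatedly append the highest-scoring token until EOS or the length limit."""
    generated = []
    for _ in range(max_new_tokens):
        scores = next_token_scores(generated)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        generated.append(best)
    return " ".join(generated)


print(greedy_decode())
```

Greedy search is only one decoding strategy; alternatives such as beam search or sampling change how the next token is chosen, but the loop structure (score candidates, pick one, stop at the end-of-sequence token or a length limit) stays the same.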

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# TODO: Load pre-trained T5 tokenizer and model ("t5-small")
tokenizer = ...
model = ...

# Example: translation task prompt
text = "translate English to German: The house is wonderful."

# TODO: Encode input text for the model
inputs = ...

# TODO: Generate output sequence from the model
outputs = ...

# TODO: Decode the generated tokens into a string
print("Input:    ", text)
print("Output:   ", ...)


## Multiple-Choice Questions (MCQ)

### 1.1. BART Hybrid Design  

BART combines which two ideas?  

A. Bidirectional encoder + Autoregressive decoder<br>  
B. Masked LM + Next Sentence Prediction<br>  
C. LSTM + CNN<br>  
D. Word2Vec + Transformers<br>  

**Answer:** 

---

### 1.2. BART vs BERT  

Which of the following is TRUE about BART compared to BERT?  

A. BART cannot handle generative tasks<br>  
B. BART performs worse on classification tasks<br>  
C. BART can do both understanding and generation<br>  
D. BART uses only unidirectional encoding<br>  

**Answer:** 

---

### 1.3. BART Architecture  

What architecture does BART use?  

A. Encoder-only<br>  
B. Decoder-only<br>  
C. Encoder-decoder with cross-attention<br>  
D. RNN-based encoder<br>  

**Answer:** 

---

### 1.4. BART Advantage  

Why does BART outperform BERT on some tasks?  

A. It has fewer parameters<br>  
B. It was trained on significantly more data<br>  
C. It does not use positional embeddings<br>  
D. It uses static word embeddings<br>  

**Answer:**

---

### 1.5. BART Corruption Strategies  

Which of the following corruption strategies can BART use during pretraining?  

A. Masking tokens randomly<br>  
B. Permuting sentences<br>  
C. Deleting tokens<br>  
D. All of the above<br>  

**Answer:** 

---

### 2.1. T5 Full Form  

What does T5 stand for?  

A. Text-to-Text Transfer Transformer<br>  
B. Transformer for 5 tasks<br>  
C. Transferable Transformer Training Technique<br>  
D. Text Transformer for Translation<br>  

**Answer:** 

---

### 2.2. T5 Philosophy  

What is the key design philosophy of T5?  

A. Every NLP task can be framed as a text-to-text problem<br>  
B. Using unidirectional LSTMs<br>  
C. Using static embeddings<br>  
D. Relying only on classification tasks<br>  

**Answer:** 

---

### 2.3. T5 Pretraining  

Which pretraining objective did T5 use?  

A. Masked language modeling (like BERT)<br>  
B. Next sentence prediction<br>  
C. Denoising span corruption (span masking)<br>  
D. Causal LM (like GPT)<br>  

**Answer:** 

---

### 2.4. T5 Architecture  

Which type of transformer architecture does T5 employ?  

A. Encoder-only<br>  
B. Decoder-only<br>  
C. Encoder-decoder<br>  
D. Hybrid LSTM-Transformer<br>  

**Answer:** 

---

### 2.5. T5 Limitations  

Which of the following tasks can T5 NOT handle natively in its text-to-text framework?  

A. Machine translation<br>  
B. Summarization<br>  
C. Image classification<br>  
D. Question answering<br>  

**Answer:**
