# BERT Model Loading and Testing with HuggingFace Transformers

## Introduction
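**BERT (Bidirectional Encoder Representations from Transformers)** is a pre-trained language model introduced by Google in 2018. It is a Transformer encoder pre-trained to use the context on both sides of every token, so it produces contextualized word embeddings: the same word gets a different vector depending on the sentence around it. When released, it set new state-of-the-art results on a wide range of Natural Language Processing (NLP) tasks.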

The **HuggingFace Transformers library** has become the de facto standard for working with state-of-the-art transformer models such as BERT, GPT, and T5. It provides easy-to-use interfaces for loading pre-trained models and their corresponding tokenizers, which makes rapid experimentation and deployment straightforward.

In this assignment, you will learn how to load a BERT model and its tokenizer, understand its inputs and outputs, and perform basic inference to get contextualized embeddings.

---

## Learning Objectives
Upon completion of this assignment, you should be able to:
- Install and set up the HuggingFace Transformers library.
- Load pre-trained BERT models and their tokenizers using `AutoModel` and `AutoTokenizer`.
- Understand the tokenization process for BERT, including special tokens (`[CLS]`, `[SEP]`) and attention masks.
- Prepare input tensors for BERT models.
- Perform basic forward passes (inference) through a BERT model to obtain outputs like `last_hidden_state` and `pooler_output`.
- Handle batching for multiple input sentences.
- Discuss different BERT variants and their applications.

---

## Setup and Prerequisites
Before you begin, ensure the necessary libraries are installed. This assignment uses the PyTorch backend. If the libraries are missing, uncomment and run the following command:

```bash
# pip install transformers torch
```

---

```python
import torch
import transformers  # needed for transformers.__version__ below
from transformers import AutoTokenizer, AutoModel

print(f"PyTorch Version: {torch.__version__}")
print(f"Transformers Version: {transformers.__version__}")

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```

---

## Assignment Questions

---

### Question 1: Loading BERT Model and Tokenizer
The `AutoTokenizer` and `AutoModel` classes from HuggingFace Transformers are convenient ways to load the correct tokenizer and model for a given pre-trained checkpoint name (e.g., `'bert-base-uncased'`).

1.  **Load Tokenizer:** Load the tokenizer for the `'bert-base-uncased'` model.
2.  **Load Model:** Load the model for `'bert-base-uncased'`. Move the model to the appropriate `device` (GPU if available, otherwise CPU).
3.  **Inspect:** Print the type of the loaded tokenizer and model. Print the first few layers/modules of the model's structure to get an idea of its architecture (e.g., `model.encoder.layer[0]`).
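
A minimal sketch of the loading step, under stated assumptions. To keep it runnable offline, it first saves a tiny, randomly initialised BERT (a made-up 11-token vocabulary, 2 layers, hidden size 32; none of these are `bert-base-uncased` values) to a temporary directory, then loads it back through `AutoTokenizer` and `AutoModel`. `from_pretrained` accepts either a Hub ID such as `'bert-base-uncased'` or a local directory, so for the assignment you would pass the Hub ID instead. The helper names `build_local_checkpoint`, `load_bert`, and `TOY_VOCAB` are hypothetical.

```python
import os
import tempfile

import torch
from transformers import AutoModel, AutoTokenizer, BertConfig, BertModel, BertTokenizer

# Hypothetical toy vocabulary; the real bert-base-uncased vocab has 30,522 entries.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
             "hello", ",", "how", "are", "you", "?"]


def build_local_checkpoint(directory: str) -> None:
    """Save a tiny, randomly initialised BERT + tokenizer so the sketch runs offline."""
    vocab_path = os.path.join(directory, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(TOY_VOCAB) + "\n")
    BertTokenizer(vocab_file=vocab_path).save_pretrained(directory)
    config = BertConfig(vocab_size=len(TOY_VOCAB), hidden_size=32, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=64)
    BertModel(config).save_pretrained(directory)


def load_bert(name_or_path: str, device: torch.device):
    """Load tokenizer and model from a Hub ID (e.g. 'bert-base-uncased') or a local directory."""
    tokenizer = AutoTokenizer.from_pretrained(name_or_path)
    model = AutoModel.from_pretrained(name_or_path).to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    with tempfile.TemporaryDirectory() as tmp:
        build_local_checkpoint(tmp)
        tokenizer, model = load_bert(tmp, device)  # pass "bert-base-uncased" instead of tmp
        print(type(tokenizer).__name__, type(model).__name__)
        print(model.encoder.layer[0])  # the first Transformer block
```

`AutoModel` resolves the concrete class (`BertModel` here) from the `model_type` stored in the checkpoint's `config.json`, which is why the same two calls work for other architectures.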

---

### Question 2: Tokenization Process
BERT models process text after it has been converted into numerical token IDs. The tokenizer handles this conversion, along with adding special tokens and creating attention masks.

1.  **Sample Sentence:** Define a sample sentence, e.g., `"Hello, how are you today? I hope you are having a great time learning about BERT."`
2.  **Tokenize:** Tokenize the sentence using your loaded tokenizer. Store the output.
3.  **Examine Output:** Print the following components from the tokenizer's output:
    * `input_ids` (the token IDs)
    * `token_type_ids` (segment IDs)
    * `attention_mask`
4.  **Decode Tokens:** Convert the `input_ids` back into human-readable tokens using `tokenizer.convert_ids_to_tokens()`. Explain the purpose of the special tokens `[CLS]` and `[SEP]` based on your observation.
5.  **Explain Masks:** Briefly explain the purpose of `attention_mask` and `token_type_ids` in BERT's input.
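
The sketch below makes the tokenizer outputs concrete. It builds a `BertTokenizer` from a hypothetical toy vocabulary (an assumption so the IDs are small and the example runs offline); with the real `'bert-base-uncased'` tokenizer the IDs differ, but the structure is the same. It also encodes a sentence pair, which is where `token_type_ids` become non-zero.

```python
import os
import tempfile

from transformers import BertTokenizer

# Hypothetical toy vocabulary: a token's ID is its position in this list.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
             "hello", ",", "how", "are", "you", "today", "?", "fine"]


def make_toy_tokenizer() -> BertTokenizer:
    """Build a lower-casing BERT WordPiece tokenizer from TOY_VOCAB."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(TOY_VOCAB) + "\n")
        return BertTokenizer(vocab_file=path)  # the vocab is read into memory here


tokenizer = make_toy_tokenizer()

# Single sentence: [CLS] is prepended and [SEP] appended.
single = tokenizer("Hello, how are you today?")
print(single["input_ids"])       # [2, 5, 6, 7, 8, 9, 10, 11, 3]
print(single["token_type_ids"])  # all 0: there is only one segment
print(single["attention_mask"])  # all 1: no padding yet
print(tokenizer.convert_ids_to_tokens(single["input_ids"]))
# ['[CLS]', 'hello', ',', 'how', 'are', 'you', 'today', '?', '[SEP]']

# Sentence pair: segment A (up to the first [SEP]) gets type 0, segment B gets type 1.
pair = tokenizer("Hello, how are you?", "fine")
print(tokenizer.convert_ids_to_tokens(pair["input_ids"]))
# ['[CLS]', 'hello', ',', 'how', 'are', 'you', '?', '[SEP]', 'fine', '[SEP]']
print(pair["token_type_ids"])    # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

The `attention_mask` only becomes interesting once sequences are padded to a common length, which happens when several sentences are tokenized together.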

---

### Question 3: Basic Inference (Getting Embeddings)
Once the input is tokenized and prepared, you can pass it through the BERT model to obtain contextualized embeddings.

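1.  **Prepare Input Tensors:** Take the `input_ids`, `attention_mask`, and `token_type_ids` from Question 2, convert them into PyTorch tensors with a leading batch dimension (shape `(1, sequence_length)`, since the model expects batched input), and move them to your `device`.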
2.  **Forward Pass:** Pass these tensors as arguments to your loaded BERT model. Ensure you disable gradient calculation with `torch.no_grad()` as we're only doing inference.
3.  **Inspect Output:** Print the shapes of the `last_hidden_state` and `pooler_output` from the model's output.
4.  **Explain Outputs:** Briefly explain what `last_hidden_state` represents (the contextualized embedding of each token) and what `pooler_output` represents (a single vector for the whole sequence; for BERT it is the final hidden state of the `[CLS]` token passed through a linear layer and a tanh activation, a layer that was pre-trained on the next-sentence-prediction objective).
5.  **CLS Token Embedding:** How would you specifically extract the embedding for the `[CLS]` token from `last_hidden_state`? (Hint: it's typically the first token).
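
A sketch of the forward pass under stated assumptions: it uses a tiny randomly initialised `BertModel` (hidden size 32, 2 layers) and a hypothetical toy vocabulary so it runs offline. With `'bert-base-uncased'` the hidden size would be 768. Passing `return_tensors="pt"` to the tokenizer returns PyTorch tensors that already have the batch dimension, which replaces a manual `torch.tensor(...)` conversion. The last lines check that `pooler_output` is the `[CLS]` vector passed through the pooler's dense layer and tanh.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel, BertTokenizer

# Hypothetical toy vocabulary and model size (bert-base uses 30,522 tokens and hidden size 768).
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
             "hello", ",", "how", "are", "you", "today", "?"]
HIDDEN = 32

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(TOY_VOCAB) + "\n")
    tokenizer = BertTokenizer(vocab_file=vocab_path)

config = BertConfig(vocab_size=len(TOY_VOCAB), hidden_size=HIDDEN, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertModel(config).to(device).eval()

# return_tensors="pt" yields torch tensors of shape (batch=1, seq_len).
inputs = tokenizer("Hello, how are you today?", return_tensors="pt")
inputs = {name: tensor.to(device) for name, tensor in inputs.items()}

with torch.no_grad():  # inference only: no autograd graph is built
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 9, 32]): (batch, seq_len, hidden)
print(outputs.pooler_output.shape)      # torch.Size([1, 32]): (batch, hidden)

# [CLS] sits at position 0 of every sequence.
cls_embedding = outputs.last_hidden_state[:, 0, :]  # shape (1, 32)

# pooler_output is tanh(dense(CLS hidden state)).
with torch.no_grad():
    recomputed = torch.tanh(model.pooler.dense(cls_embedding))
print(torch.allclose(recomputed, outputs.pooler_output, atol=1e-6))  # True
```

The indexing `[:, 0, :]` keeps the batch and hidden dimensions and selects only the first position, so it works unchanged for batches of any size.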

---

### Question 4: Handling Multiple Sentences / Batching
In real-world scenarios, you'll often process multiple sentences simultaneously (in batches) for efficiency. HuggingFace tokenizers handle this automatically.

1.  **Sample Sentences:** Define a list of 2-3 short sentences.
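2.  **Batch Tokenization:** Tokenize this list of sentences. Set `padding=True`, which pads shorter sequences so that all sequences in the batch have the same length, and `truncation=True`, which cuts any sequence longer than the model's maximum input length (512 tokens for `bert-base-uncased`).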
3.  **Prepare Batched Input:** Convert the tokenized output to PyTorch tensors and move them to your `device`.
4.  **Batched Forward Pass:** Pass the batched input through the BERT model.
5.  **Output Shapes:** Print the shapes of `last_hidden_state` and `pooler_output` for the batched input. Explain how the shapes differ from the single-sentence case and what each dimension represents.
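
A sketch of batching, again with a hypothetical toy vocabulary and a tiny random model so it runs offline. `padding=True` pads every sequence to the longest one in the batch with `[PAD]` (ID 0 here) and sets the matching `attention_mask` positions to 0. The second tokenizer call passes an explicit `max_length` to show truncation actually shortening sequences.

```python
import os
import tempfile

import torch
from transformers import BertConfig, BertModel, BertTokenizer

# Hypothetical toy vocabulary and model size, so the sketch runs without downloads.
TOY_VOCAB = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
             "hello", ",", "how", "are", "you", "today", "?", "fine"]

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(TOY_VOCAB) + "\n")
    tokenizer = BertTokenizer(vocab_file=vocab_path)

config = BertConfig(vocab_size=len(TOY_VOCAB), hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertModel(config).to(device).eval()

sentences = ["Hello, how are you?", "Fine", "How are you today?"]

# padding=True pads to the longest sentence in the batch; truncation=True caps at the
# tokenizer's maximum length (512 for bert-base-uncased, effectively unlimited for this toy one).
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
print(encoded["input_ids"])                  # the short sentences end in [PAD] = 0
print(encoded["attention_mask"].sum(dim=1))  # real tokens per sentence: tensor([8, 3, 7])

inputs = {name: tensor.to(device) for name, tensor in encoded.items()}
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([3, 8, 32]): (batch, padded seq_len, hidden)
print(outputs.pooler_output.shape)      # torch.Size([3, 32]): (batch, hidden)

# With an explicit max_length, truncation shortens the longer sentences.
short = tokenizer(sentences, padding=True, truncation=True, max_length=5, return_tensors="pt")
print(short["input_ids"])  # each row: [CLS] + at most 3 tokens + [SEP], padded to 5
```

Only the sequence-length dimension depends on padding; `pooler_output` has exactly one vector per sentence regardless of length.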

---

### Question 5: Exploring Different BERT Variants (Discussion/Code)
The HuggingFace Hub hosts a very large number of pre-trained models, and BERT alone has many variants that differ in size, language, and training objective.

1.  **Choose a Variant:** Select one BERT variant *other than* `bert-base-uncased` from the HuggingFace model hub (e.g., `bert-large-uncased`, `distilbert-base-uncased`, `bert-base-multilingual-cased`, `bert-large-uncased-whole-word-masking`).
2.  **Load and Inspect (Optional):** Load the tokenizer and model for your chosen variant. You can optionally print its total number of parameters (e.g., `model.num_parameters()`).
3.  **Discuss Differences:** Briefly discuss what makes your chosen variant different from `bert-base-uncased` (e.g., size, language, training technique, specific application).
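
For variants that keep BERT's architecture and change only its size (such as `bert-large-uncased`), the parameter count follows directly from the configuration. The sketch below is a back-of-the-envelope calculation, not a library function: the helper `bert_param_count` is hypothetical, and it assumes the standard `BertModel` layout (absolute position embeddings, linear layers with biases, one LayerNorm after the embeddings and two per layer, plus the pooler). The hyperparameters are the published bert-base and bert-large sizes. It does not cover DistilBERT, whose architecture differs (for example, it has 6 layers and no pooler).

```python
def bert_param_count(vocab_size: int, hidden: int, layers: int, intermediate: int,
                     max_positions: int = 512, type_vocab: int = 2) -> int:
    """Count BertModel parameters (weights + biases) from its hyperparameters."""
    layer_norm = 2 * hidden  # LayerNorm scale and shift
    # Word, position, and token-type embedding tables, then a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections (each hidden x hidden + bias), then a LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two feed-forward linear layers (hidden -> intermediate -> hidden), then a LayerNorm.
    feed_forward = 2 * hidden * intermediate + intermediate + hidden + layer_norm
    # Pooler: one dense layer applied to the [CLS] hidden state.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


base = bert_param_count(vocab_size=30522, hidden=768, layers=12, intermediate=3072)
large = bert_param_count(vocab_size=30522, hidden=1024, layers=24, intermediate=4096)
print(f"bert-base:  {base:,}")   # bert-base:  109,482,240
print(f"bert-large: {large:,}")  # bert-large: 335,141,888
```

The number of attention heads does not appear because splitting the hidden size into heads does not change the size of the Q, K, and V matrices. After loading a checkpoint, `model.num_parameters()` should give a matching figure for these two models.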

---

### Question 6: Applications of BERT (Discussion)
BERT's ability to generate rich contextualized embeddings has made it suitable for a wide array of NLP tasks.

1.  **List 3 NLP Tasks:** Name at least three distinct NLP tasks where BERT models are commonly used. For each task, briefly explain how BERT's capabilities (e.g., contextual embeddings, bidirectional understanding) contribute to its effectiveness.
2.  **Fine-tuning Concept:** Briefly explain the concept of "fine-tuning" a pre-trained BERT model for a downstream task. Why is this approach so powerful?
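
As a concrete picture of fine-tuning: a small task-specific head is placed on top of the pre-trained encoder, and all weights are then updated on labelled data. The sketch below runs a few optimisation steps of `BertForSequenceClassification` on a two-example toy batch. To run offline it uses a tiny randomly initialised model and hard-coded token IDs (both assumptions); in practice you would start from `BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)` and a real tokenized dataset. The learning rate and step count are illustrative only.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny stand-in for a pre-trained checkpoint (hypothetical sizes).
config = BertConfig(vocab_size=30, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)  # encoder + linear classification head
model.train()  # training mode: dropout is active

# Toy batch: two already-tokenized sequences ([CLS]=2 ... [SEP]=3) and their labels.
input_ids = torch.tensor([[2, 10, 11, 12, 3], [2, 20, 21, 22, 3]])
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 1])

# Every parameter (encoder and head) is passed to the optimizer and gets updated.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

losses = []
for _ in range(20):
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    outputs.loss.backward()  # cross-entropy loss computed by the classification head
    optimizer.step()
    optimizer.zero_grad()
    losses.append(outputs.loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Starting from pre-trained weights rather than random ones is what makes the approach powerful: the encoder already carries general language knowledge, so the task only needs to be learned from a comparatively small labelled dataset.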

---

## Submission Guidelines
- Ensure your notebook runs without errors from top to bottom.
- Save your notebook as `your_name_bert_huggingface_assignment.ipynb`.
- Clearly answer all questions and provide explanations where requested in Markdown cells.
- Feel free to add additional code cells or markdown cells for clarity or experimentation.

---