# BERT Text Classification

In this notebook, we’ll explore how to use **BERT (Bidirectional Encoder Representations from Transformers)** for text classification, one of the most common NLP tasks, with applications such as **sentiment analysis** and **spam detection**.

---

### Objectives
- Understand what makes BERT powerful.
- Learn how to fine-tune BERT on a text classification dataset.
- Implement BERT using the Hugging Face `transformers` library.
- Evaluate performance using accuracy and evaluation loss.

## 🧩 1. What is BERT?

BERT is built on the **Transformer encoder** architecture. It learns **bidirectional context**: the representation of each token draws on the words to both its left and its right in the sentence.

### Key Features
- Pre-trained on large text corpora (English Wikipedia and BooksCorpus)
- Uses **Masked Language Modeling (MLM)** and **Next Sentence Prediction (NSP)** during pre-training
- Can be fine-tuned for downstream tasks such as classification, question answering (QA), or token tagging
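
To make the MLM objective concrete, here is a minimal sketch of the corruption step in plain Python. It assumes the commonly described recipe: about 15% of tokens are selected for prediction, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. The function name `mask_tokens` and the toy vocabulary are illustrative only; real pre-training works on WordPiece tokens and skips special tokens.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels) using the 80/10/10 MLM rule.

    labels[i] holds the original token where the model must predict it,
    and None at positions that are not part of the MLM loss.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # this position is a prediction target
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)    # 80%: replace with [MASK]
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)     # 10%: keep the original token
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

tokens = "the movie was absolutely wonderful".split()
vocab = ["the", "movie", "was", "good", "bad", "film"]  # toy vocabulary
# mask_prob is raised well above 15% so the effect is visible on a short sentence.
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

Because a selected token is sometimes left as-is or swapped for a random one, the model cannot rely on `[MASK]` always marking the positions it must predict.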

**Architecture:**
```
Input Text → Tokenization → BERT Encoder Layers → [CLS] Token → Classifier → Output Label
```
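
The last two stages of the pipeline above can be sketched without any deep learning library. The example assumes the classifier is a single linear layer applied to the `[CLS]` vector, followed by an argmax over the logits. The 4-dimensional vector and the weights are made-up toy values (`bert-base` uses 768 dimensions), and the Hugging Face implementation also applies a pooling layer and dropout before the classifier, which this sketch skips.

```python
def linear(vec, weights, bias):
    """Compute weights @ vec + bias, with weights given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def classify(cls_vector, weights, bias):
    """Map a [CLS] vector to (logits, predicted_label_index)."""
    logits = linear(cls_vector, weights, bias)
    label = max(range(len(logits)), key=logits.__getitem__)  # argmax
    return logits, label

# Toy values, chosen only for illustration.
cls_vector = [0.5, -1.0, 2.0, 0.0]
weights = [
    [0.1, 0.2, -0.3, 0.0],   # row producing the logit for class 0
    [-0.1, 0.0, 0.4, 0.2],   # row producing the logit for class 1
]
bias = [0.0, 0.1]

logits, label = classify(cls_vector, weights, bias)
print(logits, label)  # class 1 has the larger logit here
```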

## ⚙️ 2. Installing Required Libraries

In [None]:
# Uncomment if running locally
# !pip install torch torchvision torchaudio --quiet
# !pip install transformers datasets --quiet

## 📦 3. Importing Libraries

In [None]:
import torch
from torch import nn
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset
import numpy as np
from sklearn.metrics import accuracy_score

## 🧾 4. Loading a Dataset

We'll use the **IMDb movie reviews** dataset from Hugging Face, in which each review is labeled as positive or negative. To keep training fast, we sample 1,000 training reviews and 500 test reviews.

In [None]:
dataset = load_dataset('imdb')
small_train = dataset['train'].shuffle(seed=42).select(range(1000))
small_test = dataset['test'].shuffle(seed=42).select(range(500))
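
Shuffling with a fixed seed before selecting matters: if a split happens to be ordered by label, taking the first rows without shuffling could yield a subset with only one class. The sketch below mimics the shuffle-then-select pattern on hypothetical toy data and checks the class balance of the result. The helper `subsample` is illustrative and is not part of the `datasets` API.

```python
import random
from collections import Counter

def subsample(examples, n, seed=42):
    """Shuffle a copy of `examples` with a fixed seed and keep the first n."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[:n]

# Hypothetical toy data: (text, label) pairs, 1 = positive, 0 = negative.
toy = [(f"review {i}", i % 2) for i in range(20)]

subset = subsample(toy, 6)
counts = Counter(label for _, label in subset)
print(counts)  # how many examples of each class ended up in the subset
```

The same check on the real subsets would count the values of the `label` column of `small_train` and `small_test`.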

## 🔤 5. Tokenizing the Data
We use BERT’s tokenizer to convert each review into token IDs that the model understands. Every review is padded or truncated to a fixed length of 128 tokens.
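
BERT's tokenizer splits unknown words into subword units with the WordPiece scheme. As a rough illustration, the sketch below applies greedy longest-match-first splitting to a single word, marking continuation pieces with `##`. The tiny vocabulary is made up for the example; the real `bert-base-uncased` vocabulary has roughly 30,000 entries, and the real tokenizer also lowercases and splits on punctuation first.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest match."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation of a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

toy_vocab = {"wonder", "##ful", "##ly", "movie", "love", "##d"}

print(wordpiece("wonderfully", toy_vocab))  # ['wonder', '##ful', '##ly']
print(wordpiece("loved", toy_vocab))        # ['love', '##d']
print(wordpiece("xyz", toy_vocab))          # ['[UNK]']
```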

In [None]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True, max_length=128)

train_enc = small_train.map(tokenize, batched=True, batch_size=None)
test_enc = small_test.map(tokenize, batched=True, batch_size=None)

train_enc.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
test_enc.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
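
The effect of `padding='max_length'` and `truncation=True` can be sketched in plain Python. The example assumes a single input sequence and uses the special-token IDs of `bert-base-uncased` (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102); the content IDs are arbitrary placeholders, and `encode_fixed_length` is a hypothetical helper, not a tokenizer method.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token IDs in bert-base-uncased

def encode_fixed_length(content_ids, max_length):
    """Add [CLS]/[SEP], truncate the content if needed, and pad to max_length.

    Returns (input_ids, attention_mask). The mask is 1 for real tokens
    and 0 for padding, so the model can ignore the padded positions.
    """
    kept = content_ids[: max_length - 2]      # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad

# Short input: padded on the right.
print(encode_fixed_length([11, 12, 13], max_length=8))
# ([101, 11, 12, 13, 102, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0])

# Long input: content is cut, but [CLS] and [SEP] are kept.
print(encode_fixed_length(list(range(11, 21)), max_length=6))
# ([101, 11, 12, 13, 14, 102], [1, 1, 1, 1, 1, 1])
```

With `max_length=128`, any review longer than 126 content tokens loses its tail, which is worth remembering for long IMDb reviews.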

## 🧠 6. Loading BERT Model for Classification

In [None]:
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
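
Loading the model this way attaches a new classification layer on top of the pre-trained encoder. That layer starts from random weights, which is why fine-tuning is needed before predictions mean anything. As a quick back-of-the-envelope check, assuming the head is a single linear layer from the 768-dimensional hidden state of `bert-base` to `num_labels` outputs, its parameter count is tiny compared with the encoder:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of trainable parameters in a dense (linear) layer."""
    return in_features * out_features + (out_features if bias else 0)

HIDDEN_SIZE = 768   # hidden size of bert-base models
NUM_LABELS = 2      # positive / negative

head_params = linear_param_count(HIDDEN_SIZE, NUM_LABELS)
print(head_params)  # 1538
```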

## 🏋️‍♂️ 7. Training Configuration

In [None]:
def compute_metrics(pred):
    labels = pred.label_ids
    preds = np.argmax(pred.predictions, axis=1)
    acc = accuracy_score(labels, preds)
    return {'accuracy': acc}

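training_args = TrainingArguments(
    output_dir='./results',           # where checkpoints are written
    num_train_epochs=1,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy='epoch',      # evaluate after each epoch; newer transformers releases name this eval_strategy
    logging_dir='./logs',
    save_strategy='epoch',            # save a checkpoint after each epoch
    logging_steps=10                  # log the training loss every 10 steps
)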

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_enc,
    eval_dataset=test_enc,
    compute_metrics=compute_metrics
)
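
It helps to know how many optimization steps this configuration implies. The arithmetic below is a sketch that assumes a single device and no gradient accumulation, which matches the arguments above as long as training runs on one GPU or on CPU. The helper name `steps_per_epoch` is illustrative.

```python
import math

def steps_per_epoch(num_examples, batch_size):
    """Batches needed to see every example once (the last batch may be smaller)."""
    return math.ceil(num_examples / batch_size)

train_steps = steps_per_epoch(1000, 8)   # 1,000 training reviews, batch size 8
eval_batches = steps_per_epoch(500, 8)   # 500 test reviews, batch size 8
log_events = train_steps // 10           # logging_steps=10

print(train_steps)   # 125
print(eval_batches)  # 63
print(log_events)    # 12
```

So one epoch is 125 training steps, with roughly a dozen loss values logged along the way.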

## 🚀 8. Fine-tuning BERT

This step may take a few minutes depending on your hardware (GPU recommended).

In [None]:
# trainer.train()  # Uncomment to train the model

## 📊 9. Evaluate the Model

In [None]:
# results = trainer.evaluate()
# print(results)
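
The accuracy reported by `trainer.evaluate()` comes from the `compute_metrics` function defined earlier: take the argmax of each row of logits and compare it with the true label. The sketch below reproduces that logic in plain Python on made-up logits, and adds confusion counts, which accuracy alone hides. The helper names are illustrative.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax equals the true label."""
    preds = [argmax(row) for row in logits]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def confusion_counts(preds, labels):
    """Counts for binary labels, treating 1 as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, y in zip(preds, labels):
        if p == 1 and y == 1:
            counts["tp"] += 1
        elif p == 0 and y == 0:
            counts["tn"] += 1
        elif p == 1 and y == 0:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up logits for four reviews; columns are [negative, positive].
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 1, 1]

toy_preds = [argmax(row) for row in toy_logits]
print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
print(confusion_counts(toy_preds, toy_labels))
# {'tp': 2, 'tn': 1, 'fp': 0, 'fn': 1}
```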

## 💬 10. Making Predictions

In [None]:
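text = "The movie was absolutely wonderful, I loved every moment!"
inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=128)
# Predictions are only meaningful after fine-tuning: the classification head starts from random weights.
# inputs = {k: v.to(model.device) for k, v in inputs.items()}  # put inputs on the model's device
# model.eval()  # disable dropout for inference
# with torch.no_grad():
#     outputs = model(**inputs)
# prediction = torch.argmax(outputs.logits, dim=1).item()
# print('Sentiment:', 'Positive' if prediction == 1 else 'Negative')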
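
The argmax above gives a label but no sense of how confident the model is. A softmax over the logits turns them into probabilities. The sketch below does this in plain Python on made-up logits; it assumes the IMDb convention that label 0 is negative and label 1 is positive. `ID2LABEL` and `describe` are illustrative names.

```python
import math

ID2LABEL = {0: "Negative", 1: "Positive"}  # IMDb label convention

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def describe(logits):
    """Return (label_name, probability_of_that_label) for one example."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

# Made-up logits standing in for outputs.logits[0].tolist()
sentiment, confidence = describe([-1.2, 2.3])
print(f"Sentiment: {sentiment} (p = {confidence:.2f})")
```

With the real model, the same result can be obtained from `torch.softmax(outputs.logits, dim=1)`.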

## 🧭 Summary

- **BERT** is a bidirectional Transformer-based model.
- Pre-trained on large corpora, then **fine-tuned** for downstream tasks.
- Hugging Face’s `transformers` makes implementation simple.
- We can adapt BERT to tasks beyond classification, such as **QA, NER, and extractive summarization**.

---
**Next:** `14-Question_Answering_with_BERT.ipynb` → Build a Q&A model using pre-trained BERT!