# BERT: Bidirectional Encoder Representations from Transformers
This notebook covers the BERT model and its architecture, then walks through a hands-on example: a Question Answering (Q&A) system built with the Hugging Face `transformers` library.

## What is BERT?
- BERT is a **transformer-based model** developed by Google AI in 2018.
- It stands for **Bidirectional Encoder Representations from Transformers**.
- BERT is pre-trained on a large unlabeled text corpus (BooksCorpus and English Wikipedia). It uses **bidirectional training**, so the representation of each word draws on the words both before and after it.

## Key Innovations of BERT
1. **Bidirectional Context**: Earlier language models either read text in one direction (GPT is left-to-right) or combined separately trained left-to-right and right-to-left models (ELMo). BERT instead conditions every token on both its **left and right context** jointly, in every layer.
2. **Transformer Architecture**: Uses a transformer model with **self-attention** mechanisms.
3. **Pre-training and Fine-tuning**: BERT is pre-trained once on a large dataset and can then be fine-tuned for specific NLP tasks such as classification and Q&A.
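To make the bidirectional vs. left-to-right contrast concrete, here is a minimal NumPy sketch. It is an illustration only, not BERT's actual code. The sketch uses a single attention head with no learned projections, so queries, keys, and values are the raw input vectors. That is a simplifying assumption, and the helper name `attention` is hypothetical. The script compares a full attention mask, as in BERT, with a causal mask, as in left-to-right models like GPT. It then changes only the last token and checks whether the first token's output moves.

```python
import numpy as np

def attention(x, mask):
    """Single-head self-attention with queries = keys = values = x (no learned projections)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])     # (seq, seq) similarity scores
    scores = np.where(mask, scores, -1e9)       # blocked positions get (effectively) zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                          # each row: weighted mix of the visible tokens

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))

bidirectional = np.ones((seq_len, seq_len), dtype=bool)    # BERT: every token sees every token
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))  # left-to-right: token i sees 0..i only

# Change only the LAST token, then check whether the FIRST token's output changes
x_changed = x.copy()
x_changed[-1] += 1.0

for name, mask in [("bidirectional", bidirectional), ("left-to-right", causal)]:
    delta = np.abs(attention(x, mask)[0] - attention(x_changed, mask)[0]).max()
    print(f"{name}: max change in first token's output = {delta:.4f}")
```

Under the causal mask the first token can only attend to itself, so its output is unaffected (the left-to-right line prints `0.0000`). Under the full mask it also "sees" the last token.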

## BERT Model Architecture
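- Uses **stacked transformer encoder layers**. BERT-Base has 12 (hidden size 768, 12 attention heads, about 110M parameters). BERT-Large has 24 (hidden size 1024, 16 attention heads, about 340M parameters).
- **Encoder-only model** (unlike GPT, which is decoder-only).
- Each layer applies **self-attention** followed by a position-wise **feed-forward network**. Each of these sublayers is wrapped in a residual connection and layer normalization.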
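The layer structure above can be sketched in NumPy as follows. This is a minimal, hypothetical illustration with toy sizes and random weights, not the Hugging Face implementation. It makes several simplifying assumptions:

- biases and the learned LayerNorm gain/offset are omitted;
- GELU uses its tanh approximation;
- a random matrix stands in for the summed token, position, and segment embeddings.

The helper names (`encoder_layer`, `layer_norm`, `init`, etc.) belong to this sketch only.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-12):
    # Normalize each token vector to zero mean and unit variance (learned gain/offset omitted)
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_layer(x, p, num_heads):
    """One post-LayerNorm encoder layer: multi-head self-attention, then feed-forward."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = (split_heads(x @ p[name]) for name in ("W_q", "W_k", "W_v"))
    # No mask: every token attends to every token (bidirectional)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, seq, seq)
    heads = weights @ v                                            # (heads, seq, d_head)
    attn_out = heads.transpose(1, 0, 2).reshape(seq_len, d_model) @ p["W_o"]

    x = layer_norm(x + attn_out)                # residual connection + LayerNorm
    ffn_out = gelu(x @ p["W_1"]) @ p["W_2"]     # position-wise feed-forward network
    return layer_norm(x + ffn_out)              # residual connection + LayerNorm

# Toy sizes; BERT-Base uses d_model=768, 12 heads, d_ff=3072 and 12 layers
d_model, num_heads, d_ff, seq_len, num_layers = 16, 4, 64, 5, 2

def init(*shape):
    return rng.normal(scale=0.02, size=shape)

layers = [
    {"W_q": init(d_model, d_model), "W_k": init(d_model, d_model),
     "W_v": init(d_model, d_model), "W_o": init(d_model, d_model),
     "W_1": init(d_model, d_ff), "W_2": init(d_ff, d_model)}
    for _ in range(num_layers)
]

h = rng.normal(size=(seq_len, d_model))  # stand-in for summed token/position/segment embeddings
for p in layers:                         # "stacked" layers: each output feeds the next layer
    h = encoder_layer(h, p, num_heads)
print(h.shape)  # (5, 16)
```

Because no attention mask is applied, every token attends to every other token. This is the bidirectional behavior that distinguishes an encoder from a GPT-style decoder.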

## How Does BERT Work?
BERT uses two main pre-training tasks:
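1. **Masked Language Modeling (MLM)**: Randomly selects 15% of the input tokens and hides them, mostly by replacing them with a `[MASK]` token. The model is then trained to predict the original tokens from the surrounding context.
2. **Next Sentence Prediction (NSP)**: Given a pair of sentences A and B, predicts whether B actually follows A in the original text. During training, 50% of pairs use the true next sentence and 50% use a random sentence from the corpus.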
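The two pre-training objectives can be sketched in plain Python. The 15% selection rate and the 80/10/10 split follow the BERT paper: of the selected tokens, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. Everything else is an assumption of this sketch. Tokens are plain strings, and each token is selected independently with probability 0.15, so the masking rate is 15% only on average. The helpers `mask_tokens` and `make_nsp_example` are hypothetical names.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style MLM corruption (sketch). Returns (corrupted, labels), where
    labels[i] is the original token if position i must be predicted, else None."""
    rng = random.Random(seed)
    corrupted, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                          # not selected: no prediction target here
        labels[i] = tok                       # the model must recover this token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the token unchanged
    return corrupted, labels

def make_nsp_example(doc_sentences, corpus_sentences, i, rng):
    """NSP pair (sketch): 50% the true next sentence, 50% a random corpus sentence."""
    if rng.random() < 0.5:
        return doc_sentences[i], doc_sentences[i + 1], "IsNext"
    return doc_sentences[i], rng.choice(corpus_sentences), "NotNext"

tokens = "the model reads the whole sentence at once".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, seed=1)
print(corrupted)
print(labels)

doc = ["BERT was released in 2018.", "It uses only the transformer encoder."]
corpus = ["The weather was sunny.", "Cats sleep a lot."]
print(make_nsp_example(doc, corpus, 0, random.Random(0)))
```

Only 80% of the selected tokens become `[MASK]`. This reduces the mismatch with fine-tuning, where `[MASK]` never appears in the input.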

## Fine-Tuning BERT for NLP Tasks
BERT can be fine-tuned for various NLP tasks:
- **Question Answering (Q&A)**
- **Sentiment Analysis**
- **Named Entity Recognition (NER)**
- **Text Classification**
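These tasks differ mainly in the small output layer placed on top of the encoder. Fine-tuning then trains that layer together with all of BERT's weights. The NumPy sketch below illustrates which vectors each head reads and what shape of output it produces. It uses random numbers as a stand-in for BERT's final hidden states and untrained weight matrices (the names `W_cls`, `W_tok`, and `W_span` are hypothetical). The Hugging Face classification model additionally passes the `[CLS]` vector through a small "pooler" layer (dense + tanh) first; that step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 6, 16
# Stand-in for BERT's final-layer output; position 0 is the [CLS] token
hidden_states = rng.normal(size=(seq_len, hidden))

# Sentiment analysis / text classification: one linear layer on the [CLS] vector
num_classes = 2
W_cls = rng.normal(scale=0.02, size=(hidden, num_classes))
class_logits = hidden_states[0] @ W_cls               # shape (num_classes,)

# NER: the same kind of linear layer applied to every token
num_tags = 5
W_tok = rng.normal(scale=0.02, size=(hidden, num_tags))
tag_logits = hidden_states @ W_tok                    # shape (seq_len, num_tags)

# Question answering: a 2-unit layer gives a start and an end logit per token
W_span = rng.normal(scale=0.02, size=(hidden, 2))
start_logits, end_logits = (hidden_states @ W_span).T  # each shape (seq_len,)

print(class_logits.shape, tag_logits.shape, start_logits.shape)
```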

## BERT for Question Answering
- BERT performs **extractive** Q&A: the answer is always a span copied from the given context, not newly generated text.
- It does this by predicting the **start and end positions** of the answer span in the context.
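If the start and end positions are chosen independently, the result can be an invalid span whose end comes before its start. A common remedy, sketched below in plain Python, scores every valid pair `(start, end)` by the sum of the two logits and keeps the best one, with a cap on answer length. The function name `best_span` and the default `max_answer_len=30` are assumptions for illustration.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[start] + end_logits[end],
    subject to start <= end < start + max_answer_len (end is inclusive)."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy logits for 6 tokens: independent argmaxes would give start=4, end=1 (an invalid span)
start_logits = [0.1, 2.0, 0.3, 0.2, 2.5, 0.0]
end_logits = [0.0, 3.0, 0.5, 1.0, 0.2, 0.4]
print(best_span(start_logits, end_logits))  # → (1, 1)
```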

```python
# Install necessary libraries
!pip install transformers torch
```

## Importing Required Libraries

```python
import torch
from transformers import BertTokenizer, BertForQuestionAnswering
```

## Loading Pre-trained BERT Model and Tokenizer

```python
# Load the pre-trained BERT model for Question Answering
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)
```

## Define Context and Questions

```python
context = """
OpenAI is an artificial intelligence research laboratory consisting of researchers and engineers. 
It was founded by Elon Musk, Sam Altman, and others with the mission to ensure that artificial general intelligence (AGI) benefits all of humanity. 
OpenAI has developed several state-of-the-art AI technologies, including the GPT series of models and ChatGPT, which are used for natural language understanding and generation.
"""

questions = [
    "Who founded OpenAI?",
    "What is the mission of OpenAI?",
    "What technologies has OpenAI developed?"
]
```


## Define Question Answering Function

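```python
def answer_question(question, context):
    """Given a question and a context passage, return the answer span predicted by BERT."""
    # Encode as "[CLS] question [SEP] context [SEP]"; truncate only the context
    # so that the pair fits within BERT's 512-token limit
    inputs = tokenizer(question, context, return_tensors="pt",
                       truncation="only_second", max_length=512)
    input_ids = inputs["input_ids"][0].tolist()

    with torch.no_grad():  # inference only: no gradients needed
        outputs = model(**inputs)

    # One logit per token for being the start / end of the answer span
    start_scores = outputs.start_logits[0]
    end_scores = outputs.end_logits[0]

    # Most likely start, then the most likely end at or after that start
    start_index = int(torch.argmax(start_scores))
    end_index = start_index + int(torch.argmax(end_scores[start_index:])) + 1

    answer = tokenizer.convert_tokens_to_string(
        tokenizer.convert_ids_to_tokens(input_ids[start_index:end_index])
    )
    return answer
```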
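The call `tokenizer(question, context, ...)` packs the two texts into a single sequence. It also produces segment IDs (`token_type_ids`) that tell BERT which part is the question and which is the context. The plain-Python sketch below mimics that layout on pre-split toy tokens. `pack_pair` is a hypothetical helper; the real tokenizer additionally splits words into WordPiece subwords and maps them to integer IDs.

```python
def pack_pair(question_tokens, context_tokens):
    """Build BERT's sentence-pair layout and the matching segment (token_type) IDs."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    # Segment 0: [CLS], the question, and the first [SEP]; segment 1: the context and final [SEP]
    token_type_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, token_type_ids

tokens, types = pack_pair(["who", "founded", "openai", "?"],
                          ["it", "was", "founded", "by", "elon", "musk"])

# Only context positions (segment 1, excluding the final [SEP]) are valid answer positions
context_positions = [i for i, t in enumerate(types) if t == 1][:-1]
for tok, seg in zip(tokens, types):
    print(f"{tok:10s} segment={seg}")
print("candidate answer positions:", context_positions)
```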


## Running the Q&A System

```python
for question in questions:
    print(f"Question: {question}")
    answer = answer_question(question, context)
    print(f"Answer: {answer}\n")
```


## Advantages of BERT in Q&A
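- **Contextual Understanding**: Because its attention is bidirectional, every token's representation can draw on both the question and the whole passage.
- **Pre-training plus Fine-tuning**: Pre-training on large unlabeled corpora gives BERT general language knowledge. Fine-tuning on task-specific datasets such as SQuAD (the checkpoint used above is fine-tuned on SQuAD) then makes it highly effective for Q&A.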
- **Versatile**: Can be fine-tuned for a wide range of NLP tasks beyond Q&A.

## Limitations of BERT
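- **Computationally Expensive**: BERT-Large has about 340M parameters. Pre-training and fine-tuning typically require GPUs or TPUs, and inference on a CPU is noticeably slower.
- **Long Context Limitation**: Standard BERT models accept at most 512 tokens, counting the question, the context, and the special tokens together. Longer documents must be truncated or split into overlapping windows.
- **Not Always Perfect**: May struggle with ambiguous or complex questions. Because the model is extractive, the `answer_question` function above always returns some span, even when the context does not contain the answer.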
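One common workaround for the 512-token limit is to split a long context into overlapping windows. The model is run on each window, and the highest-scoring span is kept. The sketch below shows only the splitting step, on a list of token IDs. `sliding_windows` is a hypothetical helper, and the 3 reserved positions account for `[CLS]` and the two `[SEP]` tokens. Hugging Face fast tokenizers (e.g. `BertTokenizerFast`) can produce such windows directly via `return_overflowing_tokens=True` together with `stride`.

```python
def sliding_windows(context_ids, question_len, max_len=512, stride=128):
    """Split a long context into overlapping chunks. Each chunk fits in max_len together
    with the question and the 3 special tokens ([CLS], [SEP], [SEP])."""
    budget = max_len - question_len - 3          # context tokens that fit in one window
    if budget <= 0:
        raise ValueError("question is too long for max_len")
    step = budget - stride                       # consecutive windows overlap by `stride`
    if step <= 0:
        raise ValueError("stride must be smaller than the per-window context budget")
    windows, start = [], 0
    while True:
        windows.append(context_ids[start:start + budget])
        if start + budget >= len(context_ids):   # this window reaches the end of the context
            break
        start += step
    return windows

windows = sliding_windows(list(range(1000)), question_len=10)
print([(w[0], w[-1]) for w in windows])  # first and last token ID in each window
```

The overlap makes it likely that an answer cut off at the edge of one window appears whole in the next.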

## Conclusion
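- BERT is a transformer encoder whose pre-train-then-fine-tune recipe became a standard approach across NLP.
- Its bidirectional self-attention, combined with large-scale pre-training, lets it represent each word in the context of the whole input.
- With a checkpoint already fine-tuned on SQuAD, an extractive Q&A system takes only a few lines of code, as the example above shows.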