# Unit 1 Assignment: The Model Benchmark Challenge

This notebook benchmarks BERT, RoBERTa, and BART on tasks that highlight architectural differences.

## Install/Import Transformers


In [None]:
# If running in a fresh environment, install transformers and a backend such as PyTorch
# !pip install transformers torch --quiet

from transformers import pipeline, set_seed


## Models to Test
- **BERT** (`bert-base-uncased`) – Encoder-only
- **RoBERTa** (`roberta-base`) – Encoder-only
- **BART** (`facebook/bart-base`) – Encoder-decoder


## Experiment 1: Text Generation
Prompt: **"The future of Artificial Intelligence is"**


In [None]:
set_seed(42)

generation_prompt = "The future of Artificial Intelligence is"

text_generation_models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base",
}

text_generation_results = {}

for name, model_id in text_generation_models.items():
    try:
        generator = pipeline("text-generation", model=model_id)
        output = generator(generation_prompt, max_length=40, num_return_sequences=1)
        text_generation_results[name] = output[0]["generated_text"]
    except Exception as exc:
        text_generation_results[name] = f"Failed: {exc}"

text_generation_results


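**Hypothesis:** Encoder-only models (BERT, RoBERTa) should struggle with text generation because they are pretrained with bidirectional masked language modeling rather than left-to-right (autoregressive) next-token prediction. Expect weak or repetitive output, or an error recorded as `Failed: ...`, rather than a fluent continuation.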
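To make "autoregressive decoding" concrete, here is a minimal sketch in plain Python (no model weights involved) of the two attention patterns behind the hypothesis. The helper names `build_attention_mask` and `show` are hypothetical and exist only for this illustration; real models build these masks internally.

```python
def build_attention_mask(seq_len, causal):
    """Return a seq_len x seq_len matrix where 1 means
    'position i (row) may attend to position j (column)'."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask):
    """Render a mask as rows of 0/1 for quick inspection."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


# Encoder-style attention: every token sees every other token.
bidirectional = build_attention_mask(4, causal=False)

# Decoder-style attention: each token sees only itself and earlier tokens,
# which is what makes left-to-right generation possible.
causal = build_attention_mask(4, causal=True)

print("Bidirectional (encoder-style):")
print(show(bidirectional))
print("Causal (decoder-style):")
print(show(causal))
```

A model pretrained only with the first pattern has never been asked to predict the next token from a left-only context, which is the skill the text-generation task depends on.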


## Experiment 2: Masked Language Modeling (Missing Word)
Sentence: **"The goal of Generative AI is to [MASK] new content."**


In [None]:
fill_mask_sentence = "The goal of Generative AI is to [MASK] new content."

mask_models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base",
}

fill_mask_results = {}

for name, model_id in mask_models.items():
    try:
        masker = pipeline("fill-mask", model=model_id)
        masked_input = fill_mask_sentence.replace("[MASK]", masker.tokenizer.mask_token)
        output = masker(masked_input)
        fill_mask_results[name] = output[0]["token_str"].strip()
    except Exception as exc:
        fill_mask_results[name] = f"Failed: {exc}"

fill_mask_results


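**Hypothesis:** BERT and RoBERTa should perform well because masked language modeling is their pretraining objective. BART is pretrained as a denoising model that also fills in masked spans, so it may produce a plausible token as well.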
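One practical detail in this experiment is that the mask token differs between tokenizers: `bert-base-uncased` uses `[MASK]`, while `roberta-base` and `facebook/bart-base` use `<mask>`. The sketch below shows the substitution in isolation, without loading any model. The `MASK_TOKENS` table and the `adapt_mask` helper are illustrative names, not part of the transformers API; in the cell above the token is read from `masker.tokenizer.mask_token` instead of being hard-coded.

```python
# Mask tokens for the three checkpoints used in this notebook.
MASK_TOKENS = {
    "BERT": "[MASK]",
    "RoBERTa": "<mask>",
    "BART": "<mask>",
}


def adapt_mask(sentence, mask_token, placeholder="[MASK]"):
    """Swap the generic placeholder for a model-specific mask token.

    Raises ValueError unless the sentence contains exactly one placeholder,
    so the top-1 prediction always refers to a single position.
    """
    if sentence.count(placeholder) != 1:
        raise ValueError("sentence must contain exactly one placeholder")
    return sentence.replace(placeholder, mask_token)


sentence = "The goal of Generative AI is to [MASK] new content."
adapted = {name: adapt_mask(sentence, token) for name, token in MASK_TOKENS.items()}

for name, text in adapted.items():
    print(f"{name}: {text}")
```

Passing the wrong mask token to a fill-mask pipeline raises an error, so a failure here would reflect the input format rather than the model's ability.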


## Experiment 3: Question Answering
- Question: **"What are the risks?"**
- Context: **"Generative AI poses significant risks such as hallucinations, bias, and deepfakes."**


In [None]:
qa_question = "What are the risks?"
qa_context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

qa_models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base",
}

qa_results = {}

for name, model_id in qa_models.items():
    try:
        qa_pipeline = pipeline("question-answering", model=model_id)
        output = qa_pipeline(question=qa_question, context=qa_context)
        qa_results[name] = output.get("answer")
    except Exception as exc:
        qa_results[name] = f"Failed: {exc}"

qa_results


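**Note:** These base checkpoints are not fine-tuned for QA, so each one is loaded with a newly initialized span-prediction head. Answers may therefore be poor or effectively random, and they can change between runs.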


## Observation Table
Fill in the outputs you observe after running each cell.

| Experiment | BERT (bert-base-uncased) | RoBERTa (roberta-base) | BART (facebook/bart-base) | Observations |
| --- | --- | --- | --- | --- |
| Text Generation |  |  |  |  |
| Masked LM |  |  |  |  |
| Question Answering |  |  |  |  |
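If you would rather generate the table rows than type them by hand, a small helper along these lines could format each results dictionary into the row layout used above. This is a sketch: `to_table_row` is a hypothetical helper, and the strings in the example are placeholders, not real model outputs.

```python
def to_table_row(experiment, results, model_order=("BERT", "RoBERTa", "BART")):
    """Format one results dict as a markdown table row.

    The final cell (Observations) is left empty for you to fill in.
    """

    def clean(value):
        # Collapse newlines and repeated spaces, and escape pipes,
        # so long outputs cannot break the table layout.
        text = " ".join(str(value).split())
        return text.replace("|", "\\|")

    cells = [clean(results.get(name, "")) for name in model_order]
    return "| " + " | ".join([experiment, *cells, ""]) + " |"


# Placeholder values; in the notebook, pass fill_mask_results and the
# other results dictionaries produced by the cells above.
example_results = {
    "BERT": "<BERT output>",
    "RoBERTa": "<RoBERTa output>",
    "BART": "<BART output>",
}
print(to_table_row("Masked LM", example_results))
```

In the notebook you could call it once per experiment, for example `to_table_row("Text Generation", text_generation_results)`, and paste the printed rows into the table.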
