In [1]:
!pip install -q transformers torch accelerate

from transformers import pipeline
import pandas as pd # To help display our Observation Table at the end

In [2]:
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}

# Dictionary to store our results for the final table
results = []

Experiment 1: Text Generation

In [4]:
prompt = "The future of Artificial Intelligence is"

print("--- Experiment 1: Text Generation Results ---")
for name, model_path in models.items():
    print(f"\n[Testing Model: {name}]")
    try:
        # Initialize the text-generation pipeline.
        # Note: BERT and RoBERTa are encoder-only (not causal), so expect an `is_decoder` warning.
        generator = pipeline("text-generation", model=model_path)

        # Generating text
        output = generator(prompt, max_new_tokens=15, truncation=True)
        generated_text = output[0]['generated_text']

        # Print the result immediately
        print(f"Result: {generated_text}")
        results.append({"Model": name, "Task": "Text Gen", "Output": generated_text})

    except Exception as e:
        # Not every failure is an architecture mismatch, so report the exception type instead.
        error_msg = f"FAILED ({type(e).__name__}: {str(e)[:50]}...)"
        print(f"Result: {error_msg}")
        results.append({"Model": name, "Task": "Text Gen", "Output": error_msg})

print("\n--- Experiment 1 Complete ---")

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`


--- Experiment 1: Text Generation Results ---

[Testing Model: BERT]


Device set to use cuda:0
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Result: The future of Artificial Intelligence is...............

[Testing Model: RoBERTa]


Device set to use cuda:0


Result: The future of Artificial Intelligence is

[Testing Model: BART]


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Result: The future of Artificial Intelligence is salad salad hangs thorough thorough Bry cue thorough thorough thoroughificeificeryptionryption

--- Experiment 1 Complete ---
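
The comment in the cell above notes that BERT and RoBERTa are not natively causal. As a rough illustration of what that means, the sketch below builds the two attention patterns as plain Python lists: a bidirectional mask, where every position sees every other (encoder style), and a causal mask, where position i sees only positions 0..i (what a left-to-right generator needs). The function names are invented for this illustration and are not part of transformers.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every token may attend to every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: token i may attend only to tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


# Toy tokens for display only; no model is involved here.
tokens = ["The", "future", "of", "AI"]

for label, mask in [("bidirectional", bidirectional_mask(len(tokens))),
                    ("causal", causal_mask(len(tokens)))]:
    print(label)
    for token, row in zip(tokens, mask):
        print(f"  {token:>8}: {row}")
```

A model pretrained only under the first pattern has always seen the words to the right of the position it predicts, which is one plausible reason the encoder checkpoints above add nothing useful when asked to continue a prompt.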


Experiment 2: Masked Language Modeling (Missing Word)

In [6]:
# Experiment 2: Masked Language Modeling (Fill-Mask)
# Task: Predict the [MASK] token.
print("Experiment 2: Masked Language Modeling")
prompt_mask = "The goal of Generative AI is to [MASK] new content."

for name, model_id in models.items():
    print(f"\nTesting {name} ({model_id})...")
    try:
        # BERT uses [MASK], RoBERTa/BART use <mask>, so we adjust automatically.
        if "roberta" in model_id or "bart" in model_id:
            current_prompt = prompt_mask.replace("[MASK]", "<mask>")
        else:
            current_prompt = prompt_mask

        mask_pipe = pipeline('fill-mask', model=model_id)
        result = mask_pipe(current_prompt)
        print(f"Top prediction: {result[0]['token_str']} (Score: {result[0]['score']:.4f})")
    except Exception as e:
        print(f"Result: FAILED. Error: {e}")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Experiment 2: Masked Language Modeling

Testing BERT (bert-base-uncased)...


Device set to use cuda:0


Top prediction: create (Score: 0.5397)

Testing RoBERTa (roberta-base)...


Device set to use cuda:0


Top prediction:  generate (Score: 0.3711)

Testing BART (facebook/bart-base)...


Device set to use cuda:0


Top prediction:  create (Score: 0.0746)
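
The loop above chooses the mask token by matching substrings in the model ID. A slightly more general option is to write the prompt with a neutral placeholder and substitute whatever token the tokenizer reports. The helper below is a minimal sketch of that substitution only; `fill_placeholder` and the `{mask}` placeholder are names invented here. In the notebook, the token could be read from `mask_pipe.tokenizer.mask_token` once the pipeline has been built.

```python
def fill_placeholder(template, mask_token, placeholder="{mask}"):
    """Swap a neutral placeholder for a model-specific mask token."""
    if template.count(placeholder) != 1:
        raise ValueError(f"expected exactly one {placeholder!r} in the prompt")
    return template.replace(placeholder, mask_token)


template = "The goal of Generative AI is to {mask} new content."

# Mask tokens used by the three checkpoints in this notebook.
for name, mask_token in [("BERT", "[MASK]"), ("RoBERTa", "<mask>"), ("BART", "<mask>")]:
    print(f"{name}: {fill_placeholder(template, mask_token)}")
```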


Experiment 3: Question Answering


In [7]:
# Experiment 3: Question Answering
# Task: Extract answer from context.
print("Experiment 3: Question Answering")
qa_context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
qa_question = "What are the risks?"

for name, model_id in models.items():
    print(f"\nTesting {name} ({model_id})...")
    try:
        qa_pipe = pipeline('question-answering', model=model_id)
        result = qa_pipe(question=qa_question, context=qa_context)
        print(f"Answer: '{result['answer']}' (Score: {result['score']:.4f})")
    except Exception as e:
        print(f"Result: FAILED. Error: {e}")

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Experiment 3: Question Answering

Testing BERT (bert-base-uncased)...


Device set to use cuda:0
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: ', and deepfakes' (Score: 0.0120)

Testing RoBERTa (roberta-base)...


Device set to use cuda:0
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: 'as hallucinations, bias, and deepfakes' (Score: 0.0080)

Testing BART (facebook/bart-base)...


Device set to use cuda:0


Answer: 'Generative' (Score: 0.0196)
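
Each run above warns that `qa_outputs.weight` and `qa_outputs.bias` are newly initialized. An extractive QA head scores every token as a possible answer start and as a possible answer end, and the answer is the highest-scoring valid span. The sketch below illustrates only that selection step on hand-written scores; `best_span`, the toy tokens, and all numbers are hypothetical and are not taken from the models. With scores from an untrained head the chosen span is essentially arbitrary, which is consistent with the low-confidence answers above.

```python
def best_span(start_scores, end_scores, max_len=8):
    """Return (start, end, score) for the best span with start <= end."""
    best = None
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


tokens = ["risks", "such", "as", "hallucinations", ",", "bias", ",", "and", "deepfakes"]

# Hypothetical scores a fine-tuned head might produce: the start score peaks
# at "hallucinations" (index 3) and the end score peaks at "deepfakes" (index 8).
start_scores = [0.1, 0.0, 0.2, 2.5, 0.0, 0.3, 0.0, 0.1, 0.4]
end_scores = [0.0, 0.1, 0.0, 0.5, 0.0, 0.6, 0.0, 0.1, 2.8]

start, end, score = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]), round(score, 2))
```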


| Task | Model | Classification (Success / Partial Success / Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| :--- | :--- | :--- | :--- | :--- |
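| **Generation** | BERT | *Failure* | Appended a repetitive string of dots: `...............` | BERT is an encoder-only model pretrained with masked language modeling over bidirectional context; it was never trained to predict the next token left to right, and the pipeline warns that `is_decoder=True` is not set. |
| | RoBERTa | *Failure* | Returned only the prompt, with no new tokens. | RoBERTa is also an encoder-only model pretrained with masked language modeling, so it has no left-to-right next-token training to draw on. |
| | BART | *Partial Success* | Produced new tokens, but as "word salad": `salad salad hangs thorough...` | BART is an encoder-decoder with an autoregressive decoder, but the `text-generation` pipeline loaded it as `BartForCausalLM`, and the log reports `lm_head.weight` and `model.decoder.embed_tokens.weight` as newly initialized, so the output layer is effectively untrained. |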
| **Fill-Mask** | BERT | *Success* | Predicted '**create**' (Score: 0.5397). | BERT is natively trained on Masked Language Modeling (MLM). |
| | RoBERTa | *Success* | Predicted '**generate**' (Score: 0.3711). | RoBERTa is an encoder pretrained solely with MLM (it drops BERT's next-sentence-prediction objective), so fill-mask matches its training task. |
| | BART | *Success* | Predicted '**create**' (Score: 0.0746). | BART's denoising pretraining objective includes text infilling, so it can fill a masked gap, though with much lower confidence here than the pure encoders. |
| **QA** | BERT | *Failure* | Returned a fragment: `', and deepfakes'` (Score: 0.0120). | The encoder is pretrained, but the log shows the QA head (`qa_outputs.weight`, `qa_outputs.bias`) is newly initialized; it needs fine-tuning on a dataset like SQuAD to locate answer spans. |
| | RoBERTa | *Failure* | Returned a partial phrase: `'as hallucinations, bias, and deepfakes'` (Score: 0.0080). | Same cause: the `qa_outputs` head is randomly initialized, so the near-miss span is likely coincidental, as the very low score suggests. |
| | BART | *Failure* | Returned a single irrelevant word: `'Generative'` (Score: 0.0196). | Same cause: `BartForQuestionAnswering` adds a `qa_outputs` head that the base checkpoint does not contain, so span selection is effectively random. |
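
The `results` list and the `pandas` import in the first cell suggest the intent to render the observations as a table. A minimal sketch of that step follows, assuming each entry keeps the `Model` / `Task` / `Output` shape used in Experiment 1. The sample rows copy outputs printed above; in the notebook, `results` would already hold whatever the experiment loops appended.

```python
import pandas as pd

# Sample rows in the same shape the Experiment 1 loop appends to `results`.
results = [
    {"Model": "BERT", "Task": "Text Gen",
     "Output": "The future of Artificial Intelligence is..............."},
    {"Model": "RoBERTa", "Task": "Text Gen",
     "Output": "The future of Artificial Intelligence is"},
    {"Model": "BERT", "Task": "Fill-Mask", "Output": "create"},
    {"Model": "RoBERTa", "Task": "Fill-Mask", "Output": " generate"},
]

df = pd.DataFrame(results)

# One row per task, one column per model.
table = df.pivot(index="Task", columns="Model", values="Output")
print(table.to_string())
```

Note that `pivot` raises an error if the same model and task pair appears twice, so rerunning a cell that appends to `results` would require clearing the list first.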