In [1]:
# Install necessary libraries
!pip install transformers torch

# Import pipeline
from transformers import pipeline





In [2]:
print("--- Experiment 1: Text Generation ---")
prompt = "The future of Artificial Intelligence is"

# 1. BERT (Encoder)
try:
    # Expectation: Poor performance. BERT is not a generator.
    gen_bert = pipeline('text-generation', model='bert-base-uncased')
    print(f"BERT Output: {gen_bert(prompt)[0]['generated_text']}")
except Exception as e:
    print(f"BERT Failed: {e}")

# 2. RoBERTa (Encoder)
try:
    # Expectation: Poor performance.
    gen_roberta = pipeline('text-generation', model='roberta-base')
    print(f"RoBERTa Output: {gen_roberta(prompt)[0]['generated_text']}")
except Exception as e:
    print(f"RoBERTa Failed: {e}")

# 3. BART (Encoder-Decoder)
try:
    # Expectation: Success. BART has a decoder component.
    gen_bart = pipeline('text-generation', model='facebook/bart-base')
    print(f"BART Output: {gen_bart(prompt)[0]['generated_text']}")
except Exception as e:
    print(f"BART Failed: {e}")

--- Experiment 1: Text Generation ---


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cpu


BERT Output: The future of Artificial Intelligence is................................................................................................................................................................................................................................................................


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cpu


RoBERTa Output: The future of Artificial Intelligence is


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device set to use cpu


BART Output: The future of Artificial Intelligence is TTask668668668 resurg resurg resurghester resurg resurgBalanceIONS effected grant Qingmanagedoval semantics imag imag imagAuthor scale Practices disingen JerSample disingenei spanning TARM resurgARMARM disingen668 resurgquit spanning resurg spanning hollow spanningocatingUSARM spanningARM disingenmanagedARMARMARM Pewmanaged spanningmanagedMediummanagedocatingSampleSampleSample choking choking disingen Must disingenmanaged Chevroletocating termin ChevroletmanagedcoldARM MustmanagedARMocating VehiclemanagedMediumocatingocating archae presentertransmanaged presenterARMARMWorld presentermanagedocatingcoldcold spanning Must Mustocating coastlineocatingocatingmanagedmanaged Mustmanaged Mustocating terminocating termin exh Mustcoldcoldcold Mustocatingocatingcold apartmentocatingcoldquit spanningcoldSamplecoldocating presentercoldquitmanaged MustAuthorocatingiosyncold Must ChevroletocatingSampleARMocating Mustquitocating rowsquit maternalco
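
A note on what "text generation" means mechanically: the `text-generation` pipeline extends the prompt autoregressively, picking one token at a time based on what has been produced so far. The sketch below illustrates that loop with greedy selection. It is a toy under stated assumptions: `NEXT_TOKEN_SCORES` and `greedy_generate` are hypothetical names, the scores are made up, and the lookup conditions only on the last token, whereas a real decoder conditions on the whole prefix.

```python
# Toy stand-in for a trained decoder: maps the last token to scores for
# possible next tokens. The values are invented for illustration only.
NEXT_TOKEN_SCORES = {
    "the": {"future": 0.6, "model": 0.4},
    "future": {"is": 0.9, "of": 0.1},
    "is": {"bright": 0.7, "unknown": 0.3},
    "bright": {"<eos>": 1.0},
}


def greedy_generate(prompt_tokens, max_new_tokens=5):
    """Append the highest-scoring next token until <eos> or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_SCORES.get(tokens[-1])
        if not candidates:
            break  # the toy table has no continuation for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break  # end-of-sequence token stops generation
        tokens.append(next_token)
    return tokens


print(" ".join(greedy_generate(["the"])))  # prints: the future is bright
```

An encoder-only model is not trained to produce this kind of left-to-right next-token distribution, which is consistent with the degenerate BERT and RoBERTa outputs recorded above.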

In [3]:

print("\n--- Experiment 2: Fill-Mask ---")
sentence_bert = "The goal of Generative AI is to [MASK] new content."
sentence_roberta = "The goal of Generative AI is to <mask> new content."

# 1. BERT
fill_bert = pipeline('fill-mask', model='bert-base-uncased')
print(f"BERT Prediction: {fill_bert(sentence_bert)[0]['token_str']}")

# 2. RoBERTa
fill_roberta = pipeline('fill-mask', model='roberta-base')
print(f"RoBERTa Prediction: {fill_roberta(sentence_roberta)[0]['token_str']}")

# 3. BART (its tokenizer uses the same <mask> token as RoBERTa, so the RoBERTa sentence is reused)
fill_bart = pipeline('fill-mask', model='facebook/bart-base')
print(f"BART Prediction: {fill_bart(sentence_roberta)[0]['token_str']}")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



--- Experiment 2: Fill-Mask ---


Device set to use cpu


BERT Prediction: create


Device set to use cpu


RoBERTa Prediction:  generate


Device set to use cpu


BART Prediction:  create
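
The experiment above hard-codes two versions of the sentence because BERT expects `[MASK]` while RoBERTa and BART expect `<mask>`. One way to avoid that duplication, sketched here as an assumption rather than as the notebook's method, is to read the mask token from the pipeline's tokenizer. `TEMPLATE`, `build_masked_sentence`, and `top_prediction` are hypothetical helper names; `top_prediction` downloads the model when called, so it is defined but not invoked here.

```python
TEMPLATE = "The goal of Generative AI is to {mask} new content."


def build_masked_sentence(template, mask_token):
    """Insert the model-specific mask token into the sentence template."""
    return template.format(mask=mask_token)


def top_prediction(model_name, template=TEMPLATE):
    """Return the highest-scoring fill for the template (downloads the model)."""
    from transformers import pipeline  # imported lazily; only needed when called

    fill = pipeline("fill-mask", model=model_name)
    sentence = build_masked_sentence(template, fill.tokenizer.mask_token)
    # token_str can carry a leading space for BPE tokenizers, so strip it.
    return fill(sentence)[0]["token_str"].strip()


# The two sentence variants used in the experiment, built from one template.
print(build_masked_sentence(TEMPLATE, "[MASK]"))
print(build_masked_sentence(TEMPLATE, "<mask>"))
```

With this helper, a call such as `top_prediction('roberta-base')` would build the correct sentence for that checkpoint without a separate variable per model.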


In [4]:
print("\n--- Experiment 3: Question Answering ---")
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
question = "What are the risks?"

# Note: these are base checkpoints, so each pipeline attaches a randomly initialized QA head.
# 1. BERT
qa_bert = pipeline('question-answering', model='bert-base-uncased')
print(f"BERT Answer: {qa_bert(question=question, context=context)['answer']}")

# 2. RoBERTa
qa_roberta = pipeline('question-answering', model='roberta-base')
print(f"RoBERTa Answer: {qa_roberta(question=question, context=context)['answer']}")

# 3. BART
qa_bart = pipeline('question-answering', model='facebook/bart-base')
print(f"BART Answer: {qa_bart(question=question, context=context)['answer']}")

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



--- Experiment 3: Question Answering ---


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BERT Answer: AI poses significant risks such as hallucinations, bias, and deepfakes


Device set to use cpu
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


RoBERTa Answer: , bias


Device set to use cpu


BART Answer: deepfakes
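
The warnings above state that `qa_outputs.weight` and `qa_outputs.bias` were newly initialized for all three models. An extractive QA head scores every position as a possible answer start and answer end, then returns the best-scoring span, so with random weights the chosen span is arbitrary. The simulation below is a simplified sketch of that effect, not the pipeline's actual code: `random_head_answer` is a hypothetical function, it works on whitespace-separated words rather than model tokens, and it draws the scores directly from a random generator.

```python
import random

CONTEXT = (
    "Generative AI poses significant risks such as "
    "hallucinations, bias, and deepfakes."
)


def random_head_answer(context, seed):
    """Pick the best (start, end) span under random start/end scores."""
    rng = random.Random(seed)
    words = context.split()
    # An untrained head yields scores unrelated to the question.
    start_scores = [rng.gauss(0, 1) for _ in words]
    end_scores = [rng.gauss(0, 1) for _ in words]
    # Choose the valid span (start <= end) with the highest combined score.
    start, end = max(
        ((i, j) for i in range(len(words)) for j in range(i, len(words))),
        key=lambda span: start_scores[span[0]] + end_scores[span[1]],
    )
    return " ".join(words[start:end + 1])


# Different seeds stand in for different random initializations.
for seed in range(3):
    print(f"seed={seed}: {random_head_answer(CONTEXT, seed)!r}")
```

Every returned span is a valid piece of the context, but which piece depends only on the seed. This is why the answers recorded above should not be read as evidence that any of the base models can answer questions.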


**Final Benchmark Results**

| Task | Model | Classification | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| :--- | :--- | :--- | :--- | :--- |
| **Generation** | BERT | Failure | Generated a long string of repeated periods: `....................` | BERT is an **Encoder-only** model. It is designed to understand full sentences at once (bidirectional attention), not to generate text one word at a time (autoregressive generation). |
| | RoBERTa | Failure | Generated nothing new (output stopped immediately after the prompt). | RoBERTa is also **Encoder-only**. Like BERT, it lacks the Decoder mechanism required to predict the "next token" in a sequence effectively. |
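| | BART | Failure / Nonsense | Generated a long run of unrelated tokens: `TTask668668668 resurg resurg...` | BART has a **Decoder**, so it *can* generate in principle. In this run, however, the `text-generation` pipeline loaded `BartForCausalLM`, and the load warning reports that `lm_head.weight` and `model.decoder.embed_tokens.weight` were newly initialized instead of being read from the checkpoint, so the output tokens are effectively random. In addition, the "base" model is pre-trained on denoising (reconstructing corrupted text), not open-ended continuation. |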
| **Fill-Mask** | BERT | Success | Predicted: `create` | BERT is pre-trained using **Masked Language Modeling (MLM)**, so predicting missing words is its native capability. |
| | RoBERTa | Success | Predicted: `generate` | RoBERTa is an optimized version of BERT, also trained on MLM (using dynamic masking), making it highly effective here. |
| | BART | Success | Predicted: `create` | BART is trained on a **Text Infilling** objective (reconstructing corrupted text). This allows it to handle masked tokens just as well as BERT. |
| **QA** | BERT | Partial Success | Answered: `AI poses significant risks such as hallucinations, bias, and deepfakes` | This is a "base" checkpoint, and the load warning confirms that its **QA head weights (`qa_outputs`) are randomly initialized**. The selected span happens to contain the risks, but it covers almost the entire sentence. The model has not learned to locate answers, and another run can return a different span. |
| | RoBERTa | Failure | Answered: `, bias` | Like BERT, the base checkpoint has not been fine-tuned on a QA dataset such as SQuAD, so its predicted start and end positions are arbitrary. |
| | BART | Partial Success | Answered: `deepfakes` | BART (base) is not fine-tuned for extractive QA either. Its randomly initialized head happened to select one of the three risks, which reflects chance rather than comprehension. |
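
The contrast the table draws between bidirectional attention and autoregressive generation can be made concrete with attention masks. The sketch below is illustrative only: `bidirectional_mask` and `causal_mask` are hypothetical helpers, and real models apply such masks to attention scores rather than printing them. A `1` in row *q*, column *k* means the token at position *q* may attend to the token at position *k*.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position attends to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: each position attends only to itself and earlier ones."""
    return [[1 if key <= query else 0 for key in range(n)] for query in range(n)]


tokens = ["The", "future", "of", "AI"]
for name, mask in [
    ("bidirectional (encoder)", bidirectional_mask(len(tokens))),
    ("causal (decoder)", causal_mask(len(tokens))),
]:
    print(name)
    for token, row in zip(tokens, mask):
        print(f"{token:>8} {row}")

# causal_mask(3) == [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

Under the causal mask, a position never sees the tokens that follow it, which is the property that next-token generation relies on. Under the bidirectional mask, every position sees the full sentence, which suits fill-mask prediction.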
