In [1]:
!pip install transformers



In [2]:
from transformers import pipeline, set_seed

# Set seed for reproducibility
set_seed(42)

# Define the models to test
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}

In [3]:
# Experiment 1: Text Generation
# Task: Generate text from a prompt.

print("Experiment 1: Text Generation")
prompt_gen = "The future of Artificial Intelligence is"

for name, model_id in models.items():
    print(f"\nTesting {name} ({model_id})...")
    try:
        # BERT and RoBERTa are encoder-only models that were not trained for
        # left-to-right generation, so this may raise an error or produce degenerate output.
        # Catch any exception so the loop continues with the next model.
        gen_pipe = pipeline('text-generation', model=model_id, max_new_tokens=20)
        result = gen_pipe(prompt_gen)
        print(f"Result: {result[0]['generated_text']}")
    except Exception as e:
        print(f"Result: FAILED. Error: {e}")

Experiment 1: Text Generation

Testing BERT (bert-base-uncased)...


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cuda:0


Result: The future of Artificial Intelligence is....................

Testing RoBERTa (roberta-base)...


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cuda:0


Result: The future of Artificial Intelligence is

Testing BART (facebook/bart-base)...


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Device set to use cuda:0


Result: The future of Artificial Intelligence is Cosby Bradford Bradford Stras IdolChain \(\ inqu Bradford Bradford Scholarship assailantPrevious Idol distinguishes enrich cumbersome cumbersome charms
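
The comment in the cell above says BERT and RoBERTa are not designed for generation. One way to see why is to compare which positions each kind of model may look at. The sketch below is a plain-Python illustration, not code from `transformers`; the helper name `attention_mask` is hypothetical. A bidirectional (encoder-style) mask lets every token see every other token, while a causal (decoder-style) mask lets a token see only itself and earlier tokens, which is the setup needed to learn next-token prediction.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Build a seq_len x seq_len matrix of 0/1 flags.

    Entry [i][j] is 1 when position i is allowed to attend to position j.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Encoder-style: every position sees the whole sequence.
bidirectional = attention_mask(4, causal=False)

# Decoder-style: position i sees only positions 0..i.
causal = attention_mask(4, causal=True)

for row in causal:
    print(row)
```

A model pretrained only with the bidirectional pattern has never had to predict a token without seeing what comes after it, which fits the degenerate continuations above.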


In [4]:
# Experiment 2: Masked Language Modeling (Fill-Mask)
# Task: Predict the [MASK] token.
print("Experiment 2: Masked Language Modeling")
prompt_mask = "The goal of Generative AI is to [MASK] new content."

for name, model_id in models.items():
    print(f"\nTesting {name} ({model_id})...")
    try:
        # BERT uses [MASK], RoBERTa/BART use <mask>, so we adjust automatically.
        if "roberta" in model_id or "bart" in model_id:
            current_prompt = prompt_mask.replace("[MASK]", "<mask>")
        else:
            current_prompt = prompt_mask

        mask_pipe = pipeline('fill-mask', model=model_id)
        result = mask_pipe(current_prompt)
        print(f"Top prediction: {result[0]['token_str']} (Score: {result[0]['score']:.4f})")
    except Exception as e:
        print(f"Result: FAILED. Error: {e}")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Experiment 2: Masked Language Modeling

Testing BERT (bert-base-uncased)...


Device set to use cuda:0


Top prediction: create (Score: 0.5397)

Testing RoBERTa (roberta-base)...


Device set to use cuda:0


Top prediction:  generate (Score: 0.3711)

Testing BART (facebook/bart-base)...


Device set to use cuda:0


Top prediction:  create (Score: 0.0746)
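
The cell above chooses the mask token by matching substrings of the model ID. Hugging Face tokenizers also expose the token directly as a `mask_token` attribute, so the prompt can be adapted without hard-coding model names. The sketch below assumes that attribute and uses stand-in objects instead of real tokenizers so it runs offline; `adapt_mask` is a hypothetical helper, not part of the library.

```python
from types import SimpleNamespace


def adapt_mask(prompt: str, tokenizer, placeholder: str = "[MASK]") -> str:
    """Swap a generic placeholder for the tokenizer's own mask token."""
    return prompt.replace(placeholder, tokenizer.mask_token)


# Stand-ins for real tokenizers (assumption: only `mask_token` is needed here).
bert_like = SimpleNamespace(mask_token="[MASK]")
roberta_like = SimpleNamespace(mask_token="<mask>")

prompt = "The goal of Generative AI is to [MASK] new content."
print(adapt_mask(prompt, bert_like))
print(adapt_mask(prompt, roberta_like))
```

With a real pipeline, the same helper could be called as `adapt_mask(prompt_mask, mask_pipe.tokenizer)` after the pipeline has been created.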


In [5]:
# Experiment 3: Question Answering
# Task: Extract answer from context.
print("Experiment 3: Question Answering")
qa_context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
qa_question = "What are the risks?"

for name, model_id in models.items():
    print(f"\nTesting {name} ({model_id})...")
    try:
        qa_pipe = pipeline('question-answering', model=model_id)
        result = qa_pipe(question=qa_question, context=qa_context)
        print(f"Answer: '{result['answer']}' (Score: {result['score']:.4f})")
    except Exception as e:
        print(f"Result: FAILED. Error: {e}")

Experiment 3: Question Answering

Testing BERT (bert-base-uncased)...


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0


Answer: ', and deepfakes' (Score: 0.0083)

Testing RoBERTa (roberta-base)...


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: 'risks such as hallucinations, bias,' (Score: 0.0042)

Testing BART (facebook/bart-base)...


Device set to use cuda:0


Answer: 'significant risks such as hallucinations' (Score: 0.0143)
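
All three scores above are close to zero. A simplified sketch of how an extractive QA head's outputs are commonly turned into an answer may help explain that: the head produces one start logit and one end logit per token, and the answer is the span whose start and end probabilities have the largest product. This is an illustration under stated assumptions, not the pipeline's actual code; `softmax`, `best_span`, and the `max_len` cap are hypothetical names and values.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(
    start_logits: List[float], end_logits: List[float], max_len: int = 15
) -> Tuple[int, int, float]:
    """Return (start, end, score) for the span maximizing p_start * p_end."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, ps in enumerate(p_start):
        # Only consider spans that end at or after the start, up to max_len tokens.
        for j in range(i, min(i + max_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Uninformative head: every token gets the same logit.
n_tokens = 16
flat = best_span([0.0] * n_tokens, [0.0] * n_tokens)

# Confident head: one clear start and one clear end.
peaked = best_span([0.0, 0.0, 8.0, 0.0], [0.0, 0.0, 0.0, 8.0])

print(f"flat score:   {flat[2]:.4f}")
print(f"peaked score: {peaked[2]:.4f}")
```

When the logits carry no information, the best score over `n` tokens is about `1 / n**2`, so very small scores are what one would expect from a `qa_outputs` layer that was newly initialized, as the warnings above report.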


# Unit 1 Assignment: Observation Table

| Task           | Model       | Classification (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| :------------- | :---------- | :------------------------------- | :------------------------------------ | :------------------------------------------ |
| **Generation** | **BERT**    | **Failure**                      | Returned the prompt followed only by a run of periods (`....`). | **BERT is an Encoder-only model.** It was pretrained to use the full context of a sentence at once (bidirectional), not to predict the next word in a sequence. The pipeline can wrap it in `BertLMHeadModel` (the log suggests `is_decoder=True`), but the weights were never trained to continue text. |
|                | **RoBERTa** | **Failure**                      | Returned the prompt with no visible continuation. | **RoBERTa is also Encoder-only.** Like BERT, it is optimized for Masked Language Modeling (understanding), not for causal text generation. |
|                | **BART**    | **Failure**                      | Appended unrelated tokens ("Cosby Bradford Bradford Stras Idol..."). | **BART is an Encoder-Decoder model, but the `text-generation` pipeline loaded only its decoder as `BartForCausalLM`.** The log reports `lm_head.weight` and `model.decoder.embed_tokens.weight` as newly initialized, so the output layer was random. BART's decoder is autoregressive, but it is pretrained to work together with the encoder as a sequence-to-sequence model (`BartForConditionalGeneration`). |
| **Fill-Mask**  | **BERT**    | **Success**                      | Top prediction was "create" (score 0.5397), which fits the sentence. | **This is BERT's native training task.** It was trained on **Masked Language Modeling (MLM)**, where it learns to fill in missing words using surrounding context. |
|                | **RoBERTa** | **Success**                      | Top prediction was "generate" (score 0.3711), also fitting, though with lower confidence than BERT in this run. | **RoBERTa is an optimized BERT.** It uses the same MLM objective (with dynamic masking) and is highly effective at using context to fill gaps. |
|                | **BART**    | **Success (low confidence)**     | Top prediction was "create", but with a score of only 0.0746. | **BART is trained on Text Infilling.** This objective is similar to masking: it learns to reconstruct corrupted text, so it can fill in blanks. In BART a single mask can stand for a span of several tokens, which may explain why less probability lands on any one word. |
| **QA**         | **BERT**    | **Failure / Poor**               | Returned the partial span `', and deepfakes'` with a score of 0.0083. | **Lack of Fine-Tuning.** This is a "Base" model. The architecture _can_ do QA, but the log shows `qa_outputs.weight` and `qa_outputs.bias` were newly initialized, so span selection is effectively random until the model is fine-tuned on a QA dataset (like SQuAD). |
|                | **RoBERTa** | **Failure / Poor**               | Returned the partial span `'risks such as hallucinations, bias,'` with a score of 0.0042. | **Lack of Fine-Tuning.** Like BERT, the base model understands language, but its QA head (`qa_outputs`) was newly initialized and has not learned to extract answer spans. |
|                | **BART**    | **Failure / Poor**               | Returned the partial span `'significant risks such as hallucinations'` with a score of 0.0143. | **Lack of Fine-Tuning.** Even though BART has a decoder, `BartForQuestionAnswering` relies on the same kind of untrained `qa_outputs` head, so it has not been taught to extract answers from a provided context. |
