## Objective

The objective of this assignment is to compare BERT, RoBERTa, and BART models across different NLP tasks and understand how model architecture affects performance.


DISABLE VS CODE WIDGET ERRORS

In [1]:
import os
os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"


INSTALL REQUIRED LIBRARIES

In [1]:
!pip install transformers torch




IMPORT PIPELINE

In [2]:
from transformers import pipeline


DEFINE MODELS

In [3]:
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}



## Experiment 1: Text Generation

Prompt: "The future of Artificial Intelligence is"


In [4]:
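prompt = "The future of Artificial Intelligence is"

# Run the text-generation pipeline on every checkpoint, including the
# encoder-only ones, so that failures are recorded rather than skipped.
for name, model in models.items():
    print(f"\n{name} Output:")
    try:
        generator = pipeline("text-generation", model=model)
        # max_length counts the prompt tokens as well as the generated ones.
        print(generator(prompt, max_length=30))
    except Exception as e:
        print("Failed:", e)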



BERT Output:



If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is................................................................................................................................................................................................................................................................'}]

RoBERTa Output:


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is'}]

BART Output:


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is plur VIDEgenderInitial Tit freezeitional insurrectionago Vulkan insurrection insurrection insurrection explore insurrectionCorrectionProeming insurrection insurrectioncondition Eleanor Sas insurrection insurrection answers insurrection insurrectiongender insurrectiongenderCorrectionCorrection insurrection markup markup insurrectionMissing insurrection compromisesdelay insurrectionMissingMissinggender insurrection insurrection cafes insurrection insurrectionCorrectiongender insurrectiondelay insurrectiongender�gendergender TTLMissing insurrection insurrection� ant insurrectiongender cafesdelay insurrection EbolagenderMissing UDP insurrection insurrectionMissing cafesgender insurrectionCorrection insurrection Ginggender Ginggender uncle markup Minion MinionMissingCorrection insurrection insurrection markupCorrection markup insurrection GingCorrection insurrectionACTIONgender insurrection disqualified insurrection markup insurr

Observations

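BERT:
The model did not continue the sentence. It raised no error, but the output is just the prompt followed by a long run of periods. The log also warns that `BertLMHeadModel` needs `is_decoder=True` for standalone use.

RoBERTa:
Like BERT, it produced no continuation: the returned text is the prompt, unchanged.

BART:
The model produced text, but not a meaningful continuation. The output is a stream of unrelated tokens dominated by repeats such as "insurrection" and "gender". The load warning explains why: `lm_head.weight` and `model.decoder.embed_tokens.weight` were not loaded from the checkpoint and were newly initialized. The log also shows `max_new_tokens` (=256) taking precedence over `max_length` (=30), which is why the BERT and BART outputs run far past 30 tokens.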
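
Two of the failure modes in the logged outputs are mechanical and can be flagged without reading the text: the prompt coming back unchanged, and the prompt followed by a single repeated character. The sketch below is a hypothetical helper, not part of the original notebook. `continuation` and `classify_continuation` are illustrative names, and `generator` is assumed to be any callable that behaves like a `text-generation` pipeline, returning a list of dicts with a `generated_text` key. It passes `max_new_tokens` alone, which should avoid the `max_length` conflict reported in the log. It does not judge coherence; that still needs a human read.

```python
def continuation(generator, prompt, max_new_tokens=30):
    """Return only the text generated after the prompt."""
    result = generator(prompt, max_new_tokens=max_new_tokens)
    text = result[0]["generated_text"]
    # Text-generation pipelines echo the prompt by default; strip it if present.
    return text[len(prompt):] if text.startswith(prompt) else text


def classify_continuation(text):
    """Label a continuation as 'empty', 'degenerate', or 'text'."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    # A single repeated character (e.g. a run of periods) carries no content.
    if len(set(stripped)) == 1:
        return "degenerate"
    return "text"


if __name__ == "__main__":
    prompt = "The future of Artificial Intelligence is"
    # Stand-in generators (assumptions) that mimic the logged output shapes,
    # so the example runs without downloading a model.
    fake_bert = lambda p, **kwargs: [{"generated_text": p + "." * 40}]
    fake_roberta = lambda p, **kwargs: [{"generated_text": p}]
    for name, gen in [("BERT-like", fake_bert), ("RoBERTa-like", fake_roberta)]:
        print(name, classify_continuation(continuation(gen, prompt)))
```

To use it with a real model, pass the object returned by `pipeline("text-generation", model=...)` as `generator`.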

## Experiment 2: Masked Language Modeling (Fill Mask)

Sentence: "The goal of Generative AI is to [MASK] new content."


In [6]:
# Each tokenizer has its own mask token: BERT uses [MASK], while
# RoBERTa and BART use <mask>.
sentences = {
    "BERT": "The goal of Generative AI is to [MASK] new content.",
    "RoBERTa": "The goal of Generative AI is to <mask> new content.",
    "BART": "The goal of Generative AI is to <mask> new content."
}
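
Writing the sentence out three times invites copy-paste drift. As a minimal sketch, the per-model sentences could be built from one template instead. The names `MASK_TOKENS` and `build_masked_sentences` and the `{mask}` placeholder are hypothetical; the mask tokens themselves are the ones used in the dictionary above. With a loaded pipeline, the token is usually also available as `masker.tokenizer.mask_token`, which would avoid hard-coding it.

```python
# Mask tokens as used in the sentences above (hard-coded for illustration).
MASK_TOKENS = {"BERT": "[MASK]", "RoBERTa": "<mask>", "BART": "<mask>"}


def build_masked_sentences(template, mask_tokens, placeholder="{mask}"):
    """Substitute each model's mask token into a shared template."""
    if placeholder not in template:
        raise ValueError(f"template must contain {placeholder!r}")
    return {
        name: template.replace(placeholder, token)
        for name, token in mask_tokens.items()
    }


template = "The goal of Generative AI is to {mask} new content."
for name, sentence in build_masked_sentences(template, MASK_TOKENS).items():
    print(f"{name}: {sentence}")
```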


In [7]:
for name, model in models.items():
    print(f"\n{name} Output:")
    try:
        masker = pipeline("fill-mask", model=model)
        result = masker(sentences[name])
        print(result[:2])   # show top 2 predictions
    except Exception as e:
        print("Failed:", e)



BERT Output:


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.539692759513855, 'token': 3443, 'token_str': 'create', 'sequence': 'the goal of generative ai is to create new content.'}, {'score': 0.15575766563415527, 'token': 9699, 'token_str': 'generate', 'sequence': 'the goal of generative ai is to generate new content.'}]

RoBERTa Output:


Device set to use cpu


[{'score': 0.37113118171691895, 'token': 5368, 'token_str': ' generate', 'sequence': 'The goal of Generative AI is to generate new content.'}, {'score': 0.3677138090133667, 'token': 1045, 'token_str': ' create', 'sequence': 'The goal of Generative AI is to create new content.'}]

BART Output:


Device set to use cpu


[{'score': 0.07461544126272202, 'token': 1045, 'token_str': ' create', 'sequence': 'The goal of Generative AI is to create new content.'}, {'score': 0.06571853160858154, 'token': 244, 'token_str': ' help', 'sequence': 'The goal of Generative AI is to help new content.'}]


Observations

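BERT:
Predicted suitable words: "create" ranked first (score 0.54) and "generate" second (0.16).

RoBERTa:
Predicted the same two words in the opposite order, "generate" (0.371) and "create" (0.368), with the scores nearly tied. Unlike `bert-base-uncased`, it kept the original casing of the sentence.

BART:
The model filled the mask, but with low confidence. Its top prediction, "create" (0.075), fits the sentence; its second, "help" (0.066), yields the ungrammatical "to help new content."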
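
Comparing predictions by eye gets harder as more of them are printed. As a minimal sketch, the top prediction and its margin over the runner-up can be pulled out of each result list. The helper name `summarize_fill_mask` is hypothetical, and it assumes the list-of-dicts shape with `token_str` and `score` keys shown in the outputs above. The scores are copied from those outputs and rounded to four decimals.

```python
def summarize_fill_mask(results):
    """Return (top_token, top_score, margin_over_second) for fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    top = ranked[0]
    second_score = ranked[1]["score"] if len(ranked) > 1 else 0.0
    # RoBERTa and BART tokens carry a leading space; strip it for display.
    return top["token_str"].strip(), top["score"], top["score"] - second_score


# Top-2 predictions copied (rounded) from the logged outputs.
logged = {
    "BERT": [{"token_str": "create", "score": 0.5397},
             {"token_str": "generate", "score": 0.1558}],
    "RoBERTa": [{"token_str": " generate", "score": 0.3711},
                {"token_str": " create", "score": 0.3677}],
    "BART": [{"token_str": " create", "score": 0.0746},
             {"token_str": " help", "score": 0.0657}],
}

for name, results in logged.items():
    token, score, margin = summarize_fill_mask(results)
    print(f"{name}: top={token!r} score={score:.3f} margin={margin:.3f}")
```

A small margin, as with RoBERTa, suggests the model treats its top two words as nearly interchangeable. A low top score, as with BART, suggests the probability is spread across many candidates.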

## Experiment 3: Question Answering


In [8]:
question = "What are the risks?"
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

for name, model in models.items():
    print(f"\n{name} Output:")
    try:
        qa = pipeline("question-answering", model=model)
        print(qa(question=question, context=context))
    except Exception as e:
        print("Failed:", e)



BERT Output:


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.00834148214198649, 'start': 0, 'end': 10, 'answer': 'Generative'}

RoBERTa Output:


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.004547878634184599, 'start': 60, 'end': 82, 'answer': ', bias, and deepfakes.'}

BART Output:


Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.038231756538152695, 'start': 20, 'end': 71, 'answer': 'significant risks such as hallucinations, bias, and'}


Observations

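BERT:
Returned "Generative" (score 0.008), the first word of the context, which does not answer the question.

RoBERTa:
Returned the span ", bias, and deepfakes." (score 0.005). It names two of the three risks but starts mid-list and omits "hallucinations".

BART:
Returned "significant risks such as hallucinations, bias, and" (score 0.038), the longest span of the three, but it cuts off before "deepfakes". The question-answering pipeline extracts a span from the context rather than generating text, so fluency is not a factor here. All three models logged that `qa_outputs.weight` and `qa_outputs.bias` were newly initialized, so each near-zero score comes from an untrained answer head.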
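
Because the `question-answering` pipeline returns character offsets, each answer can be checked against the context directly. The following sketch uses hypothetical helper names (`span_matches`, `covered_terms`) and a hand-written list of the three risks named in the context as an assumed reference answer. The result dicts are copied from the logged outputs above, with the scores left out.

```python
context = ("Generative AI poses significant risks such as "
           "hallucinations, bias, and deepfakes.")
# Assumed reference answer: the three risks the context lists.
expected_terms = ["hallucinations", "bias", "deepfakes"]

# Offsets and answers copied from the logged pipeline outputs.
logged = {
    "BERT": {"start": 0, "end": 10, "answer": "Generative"},
    "RoBERTa": {"start": 60, "end": 82, "answer": ", bias, and deepfakes."},
    "BART": {"start": 20, "end": 71,
             "answer": "significant risks such as hallucinations, bias, and"},
}


def span_matches(result, context):
    """Check that the reported offsets really slice out the reported answer."""
    return context[result["start"]:result["end"]] == result["answer"]


def covered_terms(result, terms):
    """List the expected terms that appear in the extracted answer."""
    return [term for term in terms if term in result["answer"]]


for name, result in logged.items():
    print(name, span_matches(result, context),
          covered_terms(result, expected_terms))
```

No model covers all three terms, which matches the observations: each span is either off-target or cut short.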

| Task           | Model   | Classification | Observation                                                          | Why did this happen? (Architectural Reason)                                                                                                                        |
| -------------- | ------- | -------------- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Generation** | BERT    | Failure        | No error, but the output was the prompt followed by a run of periods | BERT is an **encoder-only** model and is not trained for autoregressive text generation                                                                            |
|                | RoBERTa | Failure        | Returned the prompt unchanged                                        | RoBERTa is also **encoder-only**, optimized for understanding, not generation                                                                                      |
|                | BART    | Failure        | Produced a long stream of unrelated, repeated tokens                 | BART is an **encoder-decoder** model; the `text-generation` pipeline loaded it as `BartForCausalLM`, and the log shows `lm_head.weight` was newly initialized      |
| **Fill-Mask**  | BERT    | Success        | Top predictions were *create* (0.54) and *generate* (0.16)           | BERT is trained using **Masked Language Modeling (MLM)**                                                                                                           |
|                | RoBERTa | Success        | Top predictions were *generate* (0.37) and *create* (0.37)           | RoBERTa is also trained with MLM, using more data and no NSP                                                                                                       |
|                | BART    | Partial        | *create* (0.07) fits, *help* (0.07) does not; low scores overall     | BART is pretrained as a denoising model; MLM is not its primary objective                                                                                          |
| **QA**         | BERT    | Failure        | Returned "Generative" (score 0.008)                                  | Base BERT is **not fine-tuned for QA (SQuAD)**; the `qa_outputs` head was newly initialized                                                                        |
|                | RoBERTa | Partial        | Returned ", bias, and deepfakes." (0.005), omitting *hallucinations* | Same limitation: **no QA fine-tuning**, so the `qa_outputs` head was newly initialized                                                                             |
|                | BART    | Partial        | Returned a span ending at "bias, and" (0.038), omitting *deepfakes*  | The QA pipeline extracts a span rather than generating text, and BART's `qa_outputs` head is also untrained; QA requires **task-specific fine-tuning**             |


## Final Understanding

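Each model performed well only on the task that matched its pretraining. The MLM-trained encoders (BERT and RoBERTa) succeeded at fill-mask, while every task that needed a head missing from the base checkpoint (causal generation, extractive QA) failed or gave unreliable output. Architecture and pretraining objective together determine what a base checkpoint can do without fine-tuning, which is why model architecture is important.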
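
A quick signal that a checkpoint was not trained for a task is the load warning itself: when a task head is missing from the checkpoint, the log lists the weights that were newly initialized. As a rough sketch, those warnings can be parsed into a per-class summary. The helper name `untrained_weights` is hypothetical, and the regular expression assumes the exact warning wording shown in this notebook's logs; other versions of the library may phrase it differently.

```python
import re

# Matches the "newly initialized" warning as worded in the logs above.
WARNING_RE = re.compile(
    r"Some weights of (\w+) were not initialized from the model checkpoint at "
    r"(\S+) and are newly initialized: \[(.*?)\]"
)


def untrained_weights(log_text):
    """Map each model class in a load log to its newly initialized weights."""
    found = {}
    for model_class, _checkpoint, names in WARNING_RE.findall(log_text):
        found[model_class] = re.findall(r"'([^']+)'", names)
    return found


# Two warnings copied from the logged outputs.
log = (
    "Some weights of BertForQuestionAnswering were not initialized from the model "
    "checkpoint at bert-base-uncased and are newly initialized: "
    "['qa_outputs.bias', 'qa_outputs.weight']\n"
    "Some weights of BartForCausalLM were not initialized from the model checkpoint "
    "at facebook/bart-base and are newly initialized: "
    "['lm_head.weight', 'model.decoder.embed_tokens.weight']\n"
)

for model_class, weights in untrained_weights(log).items():
    print(f"{model_class}: {weights}")
```

An empty result for a given load suggests the checkpoint already contained the head the pipeline needed.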
