In [1]:
!pip install transformers torch


Defaulting to user installation because normal site-packages is not writeable





In [2]:
from transformers import pipeline
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}


Objective: To understand how model architecture affects performance on different tasks.
We compare three models:
BERT - Encoder-only

RoBERTa - Encoder-only, with an improved pretraining recipe

BART - Encoder–Decoder

In [4]:
prompt = "Artificial Intelligence will change the world because"

for name, model in models.items():
    print(f"\nModel: {name}")
    try:
        gen = pipeline("text-generation", model=model)
        out = gen(prompt, max_length=40, num_return_sequences=1)
        print(out[0]['generated_text'])
    except Exception as e:
        print("Generation Failed:", e)



Model: BERT


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Artificial Intelligence will change the world because...............................

Model: RoBERTa


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Artificial Intelligence will change the world because

Model: BART


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['decoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Artificial Intelligence will change the world because advocqualifiedrouchrouchrouch advocrouchrouch Laura Searshlhlhlrouchhlhlqualifiedqualifiedauthent CourtesyrouchrouchLeadLeadLeadworldlyLeadLeadplaced


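For each model in `models`, the loop loads a text-generation pipeline and tries to generate a continuation of the prompt. None of the three outputs is a fluent continuation: BERT appends only periods, RoBERTa adds nothing, and BART appends unrelated tokens. The warnings explain part of this: BERT and RoBERTa are loaded without `is_decoder=True`, and BART's `lm_head.weight` and `decoder.embed_tokens.weight` are newly initialized rather than loaded from the checkpoint.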
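The BART warning above lists the weights that were not loaded from the checkpoint. As a minimal sketch, the names can be pulled out of a captured warning string to check whether an output head is random. The helper name `newly_initialized` is hypothetical, and the warning text is copied from the output above.

```python
import ast
import re

# Warning text copied from the BART output above.
BART_WARNING = (
    "Some weights of BartForCausalLM were not initialized from the model "
    "checkpoint at facebook/bart-base and are newly initialized: "
    "['decoder.embed_tokens.weight', 'lm_head.weight']"
)


def newly_initialized(warning: str) -> list[str]:
    """Return the parameter names listed after 'newly initialized:' (hypothetical helper)."""
    match = re.search(r"newly initialized: (\[.*?\])", warning)
    if match is None:
        return []
    # The list is printed as a Python literal, so literal_eval can parse it safely.
    return list(ast.literal_eval(match.group(1)))


names = newly_initialized(BART_WARNING)
print(names)
# A randomly initialized LM or QA head means the pipeline output is not meaningful.
print("output head is random:", any("lm_head" in n or "qa_outputs" in n for n in names))
```

If the list contains `lm_head.weight`, the generated tokens come from an untrained projection, which is consistent with the unrelated tokens BART produced.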

In [6]:
for name, model in models.items():
    print(f"\nModel: {name}")
    fill = pipeline("fill-mask", model=model)
    
    # Use the correct mask token for that model
    masked = f"Generative AI will transform the {fill.tokenizer.mask_token} industry."
    
    predictions = fill(masked)
    for p in predictions[:3]:
        print(p['token_str'], ":", round(p['score'], 3))



Model: BERT


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


healthcare : 0.077
it : 0.058
software : 0.049

Model: RoBERTa
 tech : 0.107
 healthcare : 0.091
 entertainment : 0.069

Model: BART
 entire : 0.095
 health : 0.043
 healthcare : 0.028


For each model in `models`, the loop loads a fill-mask pipeline, builds a sentence with that model's own mask token, and prints the top three predictions with their scores.
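To compare how confident each model was, one rough option is to sum the scores of its top three predictions. The sketch below uses only the scores printed above; the helper name `top_k_mass` is hypothetical, and top-3 mass is an illustrative measure rather than a standard benchmark.

```python
# Top-3 (token, score) pairs copied from the fill-mask output above.
recorded = {
    "BERT": [("healthcare", 0.077), ("it", 0.058), ("software", 0.049)],
    "RoBERTa": [("tech", 0.107), ("healthcare", 0.091), ("entertainment", 0.069)],
    "BART": [("entire", 0.095), ("health", 0.043), ("healthcare", 0.028)],
}


def top_k_mass(predictions):
    """Sum of the scores of the listed predictions (hypothetical helper)."""
    return round(sum(score for _, score in predictions), 3)


summary = {name: top_k_mass(preds) for name, preds in recorded.items()}

# Print models from most to least concentrated probability mass.
for name, mass in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: top-3 probability mass = {mass}")
```

On these recorded scores RoBERTa concentrates the most probability in its top three tokens and BART the least, even though BART's single top score is higher than BERT's.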

In [7]:
context = "Generative AI creates new content such as text, images, audio, and code."
question = "What does Generative AI create?"

for name, model in models.items():
    print(f"\nModel: {name}")
    try:
        qa = pipeline("question-answering", model=model)
        ans = qa(question=question, context=context)
        print("Answer:", ans['answer'])
    except Exception as e:
        print("QA Failed:", e)



Model: BERT


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: AI creates new content

Model: RoBERTa


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: text, images, audio,

Model: BART


Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: audio, and code


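For each model in `models`, the loop loads a question-answering pipeline and asks the model to extract the answer from the given context. All three warnings report newly initialized `qa_outputs` weights, so these base checkpoints have no trained QA head, and the answers above should not be read as a measure of each model's QA ability.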
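The three answers are different partial spans of the same sentence. One way to compare them is a token-overlap F1 score, sketched here as a simplified version of SQuAD-style scoring (it skips article removal). The reference span is an assumption chosen for illustration and does not appear in the notebook; `tokens` and `token_f1` are hypothetical helper names.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span."""
    pred, ref = tokens(prediction), tokens(reference)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Assumption: reference span picked for illustration, not taken from the notebook.
reference = "new content such as text, images, audio, and code"

# Answers copied from the QA output above.
answers = {
    "BERT": "AI creates new content",
    "RoBERTa": "text, images, audio,",
    "BART": "audio, and code",
}

for name, answer in answers.items():
    print(f"{name}: F1 = {token_f1(answer, reference):.2f}")
```

Against this reference none of the answers reaches a full match, which fits the fact that the QA heads were untrained.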

Encoder models (BERT, RoBERTa): Best for understanding tasks like classification, fill-mask, and QA.

Decoder models (GPT): Best for text generation and sentence continuation.

Encoder–Decoder models (BART, T5): Best for sequence-to-sequence tasks like summarization and translation.
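The split above comes largely from which positions each token may attend to. The sketch below builds the two self-attention patterns in plain Python as an illustration only; it is not how the libraries implement attention, and `attention_mask` and `show` are hypothetical helpers. An encoder–decoder such as BART uses the bidirectional pattern in its encoder and the causal pattern in its decoder.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Build a mask where mask[i][j] == 1 means position i may attend to position j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(title: str, mask: list[list[int]]) -> None:
    """Print a mask as rows of 0s and 1s under a title."""
    print(title)
    for row in mask:
        print(" ".join(str(v) for v in row))


# Encoder-style (bidirectional) self-attention: every token sees the whole sentence.
show("Encoder (bidirectional):", attention_mask(4, causal=False))

# Decoder-style (causal) self-attention: token i sees only tokens 0..i.
show("Decoder (causal):", attention_mask(4, causal=True))
```

Because an encoder-only model is trained with the full mask, it can fill a masked slot using context on both sides, but it has never been trained to produce the next token from the left context alone.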

| Task | Model | Outcome (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| ---------- | ------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- |
| Generation | BERT | Failure | Appended only a run of periods to the prompt. | BERT is an encoder-only model trained with masked language modeling, not next-token prediction; it was also loaded without `is_decoder=True`. |
| | RoBERTa | Failure | Returned the prompt unchanged, with no continuation. | RoBERTa is also encoder-only and is not trained to predict the next token left to right. |
| | BART | Partial Failure | Appended unrelated tokens (for example "advocqualifiedrouchrouch") instead of a coherent continuation. | BART is an encoder–decoder trained for seq2seq tasks, not causal language modeling like GPT; loaded as `BartForCausalLM`, its `lm_head.weight` and `decoder.embed_tokens.weight` were newly initialized. |
| Fill-Mask | BERT | Success | Predicted 'healthcare' (0.077), 'it' (0.058), 'software' (0.049). | BERT is pretrained with Masked Language Modeling (MLM). |
| | RoBERTa | Success | Predicted 'tech' (0.107), 'healthcare' (0.091), 'entertainment' (0.069), the highest score at each rank among the three models. | RoBERTa is pretrained with MLM only, using more data and dynamic masking. |
| | BART | Partial Success | Top prediction was the generic 'entire' (0.095); 'health' (0.043) and 'healthcare' (0.028) followed with lower scores. | BART is pretrained to reconstruct corrupted text through its decoder, not for single-token MLM. |
| QA | BERT | Partial Success | Returned "AI creates new content", which restates the question and omits the listed examples. | Encoder models suit extractive QA, but the `qa_outputs` head of this base checkpoint is newly initialized, so the span is not reliable. |
| | RoBERTa | Partial Success | Returned "text, images, audio,", a relevant span that drops "and code". | Same encoder advantage, and the same untrained `qa_outputs` head. |
| | BART | Partial Success | Returned "audio, and code", only the end of the list. | BART's QA head is also newly initialized; without fine-tuning it cannot select spans reliably. |


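Conclusion:
Encoder-only models like BERT and RoBERTa perform well on Fill-Mask because they are pretrained with Masked Language Modeling. Their bidirectional contextual encoding also makes them a natural fit for understanding tasks such as extractive Question Answering, once a task head has been fine-tuned.
BART, an Encoder–Decoder model, is designed for sequence-to-sequence tasks such as summarization and translation. It is not optimized for causal sentence continuation, and in this experiment its Fill-Mask predictions were weaker than RoBERTa's. The generation and QA outputs here come from base checkpoints with newly initialized or mismatched heads, so they show a mismatch between model and task setup, not how the models perform after fine-tuning.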