In [1]:
!pip install transformers torch --quiet


In [2]:
from transformers import pipeline




In [3]:
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}


## Experiment 1: Text Generation

In [4]:
prompt = "The future of Artificial Intelligence is"

for name, model in models.items():
    print(f"\n{name} Output:")
    try:
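        # text-generation loads a causal-LM head; none of these checkpoints was trained as one.
        generator = pipeline("text-generation", model=model)
        # As the log below shows, the default max_new_tokens (256) overrides max_length here.
        output = generator(prompt, max_length=40)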
        print(output[0]["generated_text"])
    except Exception as e:
        print("FAILED:", e)



BERT Output:





If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


The future of Artificial Intelligence is................................................................................................................................................................................................................................................................

RoBERTa Output:



If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`



Device set to use cpu


The future of Artificial Intelligence is

BART Output:



Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Device set to use cpu


The future of Artificial Intelligence is Secure agreed Ben SlaughterOverall deficient norm sameaware identitiesIDAIDA Labphe mileage sacrifice chimpanzeesarbaware books alleging noisesarb QRaware compe chimpanzees slowrestling slow Dinosaur secession Running~~-------- sacrificePurardsards Virtual secession~~ secessionarbawarearbarb Kam Kam roadway sacrifice~~define chimpanzees reper bloss chimpanzeesdefine Virtual VirtualdefinedefineOurarb Virtual noises Running Runningcyclesards massiveards secession Virtualards pitcher chimpanzeespload hiringarbarbarb~~~~ hiring rampant Virtual chimpanzees graduation landscape~~ chimpanzees chimpanzeesarbarb def~~ Running~~atchatch sacrifice~~ sacrificeeni lararbarb tit noises secession~~ universities roadwayards lar Virtual Virtual Virtualatch~~~~~~ lar lar lar alleging secession dB allegingLind~~ hiringaware graduationaware Running graduationaware chimpanzeesardsatch Virtual Virtual rampant chimpanzees rampantorganisms alleging dB dBaware generalar

**Observation:**
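BERT and RoBERTa fail to generate meaningful text: BERT emits only a run of periods and RoBERTa returns the prompt unchanged. Both are encoder-only models pre-trained with bidirectional attention, not as left-to-right decoders, which is why the loader warns that `is_decoder=True` is needed to use them standalone. BART follows an encoder–decoder architecture and does emit tokens, but the output is incoherent. The log explains why: the pipeline loads it as `BartForCausalLM`, and `lm_head.weight` and `model.decoder.embed_tokens.weight` are newly initialized rather than taken from the checkpoint, so this run says little about what a properly loaded or fine-tuned BART can generate. Overall, the experiment shows that a model's architecture and pre-training objective determine which tasks it suits out of the box.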
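
The difference between bidirectional and autoregressive (causal) attention can be illustrated without loading any weights. The sketch below is a toy illustration only: `attention_mask` is a hypothetical helper written for this notebook, not a function from `transformers`, and real models build their masks internally in a different form.

```python
def attention_mask(num_tokens: int, causal: bool) -> list[list[int]]:
    """Return a square mask where 1 means "the row token may attend to the column token".

    Hypothetical helper for illustration; not part of the transformers API.
    """
    return [
        [1 if (not causal or col <= row) else 0 for col in range(num_tokens)]
        for row in range(num_tokens)
    ]


tokens = ["The", "future", "of", "AI", "is"]

# Encoder-style (BERT, RoBERTa): every token sees the whole sentence.
bidirectional = attention_mask(len(tokens), causal=False)

# Decoder-style: each token sees only itself and the tokens to its left,
# which is what makes left-to-right generation well defined.
causal = attention_mask(len(tokens), causal=True)

for label, mask in (("bidirectional", bidirectional), ("causal", causal)):
    print(label)
    for token, row in zip(tokens, mask):
        print(f"  {token:>6}: {row}")
```

A model pre-trained only under the first pattern has never been asked to predict the next token from the left context alone, which is consistent with the degenerate BERT and RoBERTa outputs above.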


## Experiment 2: Fill-Mask (Masked Language Modeling)

In [5]:
sentence = "The goal of Generative AI is to [MASK] new content."

for name, model in models.items():
    print(f"\n{name} Output:")
    try:
        fill_mask = pipeline("fill-mask", model=model)
        output = fill_mask(sentence)
        print(output[:3])
    except Exception as e:
        print("FAILED:", e)



BERT Output:


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.5396932363510132, 'token': 3443, 'token_str': 'create', 'sequence': 'the goal of generative ai is to create new content.'}, {'score': 0.15575720369815826, 'token': 9699, 'token_str': 'generate', 'sequence': 'the goal of generative ai is to generate new content.'}, {'score': 0.05405500903725624, 'token': 3965, 'token_str': 'produce', 'sequence': 'the goal of generative ai is to produce new content.'}]

RoBERTa Output:


Device set to use cpu


FAILED: No mask_token (<mask>) found on the input

BART Output:


Device set to use cpu


FAILED: No mask_token (<mask>) found on the input


**Observation:**
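BERT successfully predicts appropriate words for the masked token ("create", "generate", "produce") because it is explicitly trained with the Masked Language Modeling (MLM) objective. RoBERTa and BART fail here because of mask token incompatibility: the sentence uses BERT's `[MASK]`, whereas their tokenizers expect `<mask>`, as the error message states. The failure therefore reflects the input format rather than the models' capabilities; RoBERTa is itself pre-trained with MLM, and BART with a denoising objective that includes masked spans. MLM remains a core strength of encoder-only models like BERT.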
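
The `No mask_token (<mask>) found on the input` errors point at the hard-coded `[MASK]` in the sentence. One way to avoid them, sketched here under the assumption that the pipeline's `tokenizer.mask_token` attribute is used as the source of truth, is to substitute each model's own mask token before calling the pipeline. The helper names `to_model_mask` and `top_fill_mask` are hypothetical, and the predictions this would give for RoBERTa and BART are not part of the original run.

```python
def to_model_mask(template: str, mask_token: str, placeholder: str = "[MASK]") -> str:
    """Swap the generic placeholder for the mask token a given tokenizer expects."""
    return template.replace(placeholder, mask_token)


def top_fill_mask(model_id: str, template: str, top_k: int = 3):
    """Run fill-mask with the model's own mask token (downloads the checkpoint)."""
    # Imported lazily so the string helper above works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    sentence = to_model_mask(template, fill_mask.tokenizer.mask_token)
    return [(p["token_str"], round(p["score"], 3)) for p in fill_mask(sentence, top_k=top_k)]


template = "The goal of Generative AI is to [MASK] new content."

# What RoBERTa and BART would receive after substitution.
print(to_model_mask(template, "<mask>"))

# Usage (downloads weights, so it is left as a comment):
# for model_id in ("bert-base-uncased", "roberta-base", "facebook/bart-base"):
#     print(model_id, top_fill_mask(model_id, template))
```

Reading the token from the tokenizer keeps the loop model-agnostic, so adding another checkpoint does not require knowing its mask syntax in advance.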


## Experiment 3: Question Answering

In [6]:
question = "What are the risks?"
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

for name, model in models.items():
    print(f"\n{name} Output:")
    try:
        qa = pipeline("question-answering", model=model)
        output = qa(question=question, context=context)
        print(output)
    except Exception as e:
        print("FAILED:", e)


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



BERT Output:


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'score': 0.008687064051628113, 'start': 38, 'end': 60, 'answer': 'such as hallucinations'}

RoBERTa Output:


Device set to use cpu
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'score': 0.012086488772183657, 'start': 72, 'end': 82, 'answer': 'deepfakes.'}

BART Output:


Device set to use cpu


{'score': 0.040478650480508804, 'start': 20, 'end': 71, 'answer': 'significant risks such as hallucinations, bias, and'}


**Observation:**
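All three base models perform poorly on question answering because none is fine-tuned on a QA dataset: in each case the log reports that `qa_outputs.weight` and `qa_outputs.bias` are newly initialized. The confidence scores are correspondingly low (about 0.009 for BERT, 0.012 for RoBERTa, and 0.040 for BART), and the returned spans are partial. BART's span covers more of the correct answer, but with an untrained QA head this cannot be credited to its encoder–decoder structure. Accurate QA requires task-specific fine-tuning regardless of architecture.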
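
To put the recorded scores in perspective, the snippet below re-checks the three results copied from the output above: it confirms that each `start`/`end` pair indexes the reported answer in the context and prints the scores as percentages. The `finetuned_answer` function is a hedged suggestion, not something run in this notebook; it assumes the public `distilbert-base-cased-distilled-squad` checkpoint, which was fine-tuned on SQuAD, as one possible comparison point.

```python
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

# Results copied verbatim from the run above.
recorded = {
    "BERT": {"score": 0.008687064051628113, "start": 38, "end": 60,
             "answer": "such as hallucinations"},
    "RoBERTa": {"score": 0.012086488772183657, "start": 72, "end": 82,
                "answer": "deepfakes."},
    "BART": {"score": 0.040478650480508804, "start": 20, "end": 71,
             "answer": "significant risks such as hallucinations, bias, and"},
}


def span_matches(result: dict, text: str) -> bool:
    """Check that the character offsets really select the reported answer."""
    return text[result["start"]:result["end"]] == result["answer"]


for name, r in recorded.items():
    print(f"{name:8s} score={r['score']:.1%} span_ok={span_matches(r, context)} answer={r['answer']!r}")


def finetuned_answer(question: str, text: str,
                     model_id: str = "distilbert-base-cased-distilled-squad"):
    """Assumed comparison: a SQuAD fine-tuned checkpoint (downloads weights when called)."""
    from transformers import pipeline  # lazy import; requires transformers and torch

    qa = pipeline("question-answering", model=model_id)
    return qa(question=question, context=text)
```

Every recorded score is below 5%, so differences between the three base models are within what a randomly initialized QA head could produce.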


## Conclusion

These experiments demonstrate that there is no universally “best” model. Encoder-only models like BERT and RoBERTa are best suited to understanding tasks such as masked word prediction, provided the input uses each model's own mask token. Encoder–decoder models like BART are designed for text generation but need the right task head and fine-tuning to produce coherent output. Task performance is determined by both model architecture and training objectives.
