In [28]:
!pip install transformers




In [29]:
from transformers import pipeline



In [30]:
# Base (pretrained-only) checkpoints; none has been fine-tuned for a downstream task.
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}


In [31]:
prompt = "The future of Artificial Intelligence is"

for name, model in models.items():
    print(f"\n{name}:")
    try:
        generator = pipeline("text-generation", model=model)
        output = generator(prompt, max_length=30)
        print(output)
    except Exception as e:
        print("Error:", e)


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`



BERT:


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


[{'generated_text': 'The future of Artificial Intelligence is................................................................................................................................................................................................................................................................'}]

RoBERTa:


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is'}]

BART:


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence isOtherwise ShakOtherwise sure Shak Shak323208 df empir squat Shak chuckms healer df MarxismOtherwisePatrick Walkdy Shak slipsreleased df df32 slips debugger Walk slips Shak denim Shak df df df Walk Drawn Drawn Drawn 361 slipsino Person Princeton Shak chuckSpoilerSpoiler slips df df initially dismant Drawn Crkas Drawn Drawn spotsenc Shak Drawn Drawnino Drawn Drawn opposition Shak workload slips Shak slips df Drawn DrawnEngland Drawn32 Drawn DrawnSel Drawn workload Drawn Drawn Output Drawn Drawn communism Drawn Drawn debugger spots spots dfPost df df workload Drawn df Drawn dfJe Drawn DrawnaxeFrames df kingdoms df Drawnlein Drawn Drawn slips Drawn Drawneller Drawn Drawn df spots spots game Drawn Drawn LT Drawn Drawn Alvin Drawn origins sure Drawn Drawnaze spots Drawn Drawn princip Drawn gameStatus Alvin df df Alvin df impacting impacting spots spotsaily princip futures df principAlert skysc df spots principStatus Drawn Drawn Bee
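
The warnings above report that `max_length=30` was overridden by a default `max_new_tokens` of 256, which is why the outputs run far past 30 tokens. The following is a minimal sketch, not the notebook's original code, of passing `max_new_tokens` directly, which should avoid that conflict. The helper names `continuation` and `generate_short` are hypothetical and introduced only for illustration; `continuation` strips the prompt so that it is easy to see what, if anything, a model appended.

```python
def continuation(prompt: str, generated_text: str) -> str:
    """Return only the text the model appended after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # If the prompt was altered (e.g. lowercased), return the full text unchanged.
    return generated_text


def generate_short(model_name: str, prompt: str, new_tokens: int = 30) -> str:
    """Generate at most `new_tokens` new tokens and return only the continuation."""
    # Imported lazily so `continuation` can be used without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    # max_new_tokens limits the generated part only, independent of prompt length.
    output = generator(prompt, max_new_tokens=new_tokens)
    return continuation(prompt, output[0]["generated_text"])


# Applied to the RoBERTa output logged above: the model appended nothing.
prompt = "The future of Artificial Intelligence is"
print(repr(continuation(prompt, "The future of Artificial Intelligence is")))  # ''
```

Calling `generate_short("distilgpt2", prompt)` would download the model and is left out of the block for that reason.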

### Experiment 1

In [32]:
sentence = "The goal of Generative AI is to [MASK] new content."

for name, model in models.items():
    print(f"\n{name}:")
    try:
        fill = pipeline("fill-mask", model=model)
        output = fill(sentence)
        print(output[:3])
    except Exception as e:
        print("Error:", e)



BERT:


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.5396937131881714, 'token': 3443, 'token_str': 'create', 'sequence': 'the goal of generative ai is to create new content.'}, {'score': 0.15575705468654633, 'token': 9699, 'token_str': 'generate', 'sequence': 'the goal of generative ai is to generate new content.'}, {'score': 0.05405480042099953, 'token': 3965, 'token_str': 'produce', 'sequence': 'the goal of generative ai is to produce new content.'}]

RoBERTa:


Device set to use cpu


Error: No mask_token (<mask>) found on the input

BART:


Device set to use cpu


Error: No mask_token (<mask>) found on the input
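
Both errors come from the input string rather than from the models: the sentence uses BERT's `[MASK]` placeholder, while the error message shows that the RoBERTa and BART tokenizers look for `<mask>`. A minimal sketch of one way to rerun the comparison fairly is to read the mask token from each pipeline's tokenizer. The helper names `with_model_mask` and `top_fill_mask` are hypothetical and not part of the original notebook.

```python
def with_model_mask(sentence: str, mask_token: str, placeholder: str = "[MASK]") -> str:
    """Swap a generic placeholder for the mask token a given tokenizer expects."""
    return sentence.replace(placeholder, mask_token)


def top_fill_mask(model_name: str, sentence: str, k: int = 3):
    """Run fill-mask with the placeholder rewritten for this model's tokenizer."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    # tokenizer.mask_token is "[MASK]" for BERT and "<mask>" for RoBERTa and BART.
    adapted = with_model_mask(sentence, fill.tokenizer.mask_token)
    return fill(adapted, top_k=k)


# Pure string check, no model download needed:
sentence = "The goal of Generative AI is to [MASK] new content."
print(with_model_mask(sentence, "<mask>"))
# The goal of Generative AI is to <mask> new content.
```

With this in place, the loop body could call `top_fill_mask(model, sentence)` for each entry in `models`. What RoBERTa and BART would actually predict is not shown here, since the original run did not get that far.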


### Experiment 2

In [33]:
question = "What are the risks?"
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

for name, model in models.items():
    print(f"\n{name}:")
    try:
        qa = pipeline("question-answering", model=model)
        output = qa(question=question, context=context)
        print(output)
    except Exception as e:
        print("Error:", e)


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



BERT:


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'score': 0.007581671699881554, 'start': 46, 'end': 81, 'answer': 'hallucinations, bias, and deepfakes'}

RoBERTa:


Device set to use cpu


{'score': 0.004133951384574175, 'start': 43, 'end': 82, 'answer': 'as hallucinations, bias, and deepfakes.'}

BART:


Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.027780067175626755, 'start': 0, 'end': 19, 'answer': 'Generative AI poses'}
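
Each result reports character offsets into the context, so the `answer` field can be checked against `context[start:end]`. The sketch below does that for the three logged results and also outlines how the same question could be put to a checkpoint that has a trained QA head. The helper names `span_text` and `answer_with` are hypothetical, and the checkpoint name in the comment (`distilbert-base-cased-distilled-squad`) is an assumption that does not appear in the original notebook.

```python
def span_text(context: str, result: dict) -> str:
    """Recover the answer text from the start/end character offsets."""
    return context[result["start"]:result["end"]]


def answer_with(model_name: str, question: str, context: str) -> dict:
    """Run extractive QA with the given checkpoint."""
    # Imported lazily so `span_text` can be used without transformers installed.
    from transformers import pipeline

    qa = pipeline("question-answering", model=model_name)
    return qa(question=question, context=context)


context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

# Offsets copied from the logged outputs above.
print(span_text(context, {"start": 46, "end": 81}))  # hallucinations, bias, and deepfakes
print(span_text(context, {"start": 43, "end": 82}))  # as hallucinations, bias, and deepfakes.
print(span_text(context, {"start": 0, "end": 19}))   # Generative AI poses

# Assumed example of a SQuAD-fine-tuned checkpoint (not used in the original run):
# answer_with("distilbert-base-cased-distilled-squad", "What are the risks?", context)
```

Because the logs report `qa_outputs` as newly initialized for all three base models, the scores above are best read as the output of an untrained head rather than as calibrated confidence.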


### Experiment 3

In [None]:
prompt = "Generative AI is a revolutionary technology that"

In [None]:
# Initialize the pipeline with the specific model
fast_generator = pipeline('text-generation', model='distilgpt2')

# Generate text
output_fast = fast_generator(prompt, max_length=50, num_return_sequences=1)
print(output_fast[0]['generated_text'])

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


# Observation Table

| Task | Model | Classification (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
|------|--------|----------------------------------|----------------------------------------|---------------------------------------------|
| Generation | BERT | Failure | Appended only a long run of dots to the prompt. | BERT is an encoder-only model pretrained with masked language modeling, not left-to-right next-token prediction. The log also warns that it needs `is_decoder=True` to be used as a standalone LM head model. |
| Generation | RoBERTa | Failure | Returned the prompt unchanged, with no continuation. | RoBERTa is also encoder-only and pretrained with masked language modeling, so it was never trained to generate tokens sequentially. |
| Generation | BART | Partial Success | Generated additional tokens, but the output was incoherent and nonsensical. | BART's encoder-decoder architecture supports generation, but the `text-generation` pipeline loaded it as `BartForCausalLM`, and the log reports `lm_head.weight` and `model.decoder.embed_tokens.weight` as newly initialized. The tokens therefore came from untrained weights. |
| Fill-Mask | BERT | Success | Predicted "create" (score 0.54), "generate" (0.16), and "produce" (0.05). | BERT is trained using Masked Language Modeling (MLM), making it well-suited for predicting missing words. |
| Fill-Mask | RoBERTa | Failure | Raised `No mask_token (<mask>) found on the input`. | RoBERTa expects the mask token `<mask>` instead of `[MASK]`, so the input format was incompatible. The failure reflects the input, not the model's ability to fill masks. |
| Fill-Mask | BART | Failure | Raised the same `No mask_token (<mask>) found on the input` error. | BART's tokenizer also uses `<mask>`, so the `[MASK]` input was rejected before any prediction ran. BART is pretrained as a denoising autoencoder with text infilling rather than BERT-style MLM, but this run did not test that ability. |
| QA | BERT | Partial Success | Extracted the correct phrase "hallucinations, bias, and deepfakes" but with a very low score (about 0.008). | The log reports the `qa_outputs` weights as newly initialized, so the span-prediction head is untrained and the prediction is unreliable. |
| QA | RoBERTa | Partial Success | Returned the similar span "as hallucinations, bias, and deepfakes." with a very low score (about 0.004). | RoBERTa base has not been fine-tuned on a QA dataset such as SQuAD, and its `qa_outputs` weights were also newly initialized. |
| QA | BART | Failure | Returned "Generative AI poses" instead of the actual risks, with a score of about 0.028. | BART base lacks QA fine-tuning, and the `BartForQuestionAnswering` head was newly initialized. Its score was the highest of the three despite the wrong answer, which shows that these scores are not meaningful. |


## Conclusion

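These experiments show that both architecture and training objective shape task performance. BERT, an encoder-only model pretrained with masked language modeling, filled the mask well but could not continue a prompt. RoBERTa, also encoder-only, likewise failed at generation; its fill-mask failure came from the `[MASK]` placeholder, because its tokenizer expects `<mask>`, so that run says nothing about its masked-prediction ability. The same input error stopped BART's fill-mask run. BART's encoder-decoder design supports generation, but the `text-generation` pipeline loaded it with newly initialized output weights, which produced incoherent text. Finally, none of the base checkpoints has a trained question-answering head, so their answers came with very low scores and, in BART's case, were wrong. A base model needs a matching pretraining objective or task-specific fine-tuning before a pipeline can use it reliably.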
