In [3]:
!pip install transformers torch --quiet


In [4]:
from transformers import pipeline


In [5]:
prompt = "The future of Artificial Intelligence is"


In [6]:
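try:
    # BERT is encoder-only; the pipeline wraps it as BertLMHeadModel without
    # is_decoder=True (see the warning in the output below).
    bert_generator = pipeline("text-generation", model="bert-base-uncased")
    bert_output = bert_generator(prompt, max_length=30)
    print("BERT Output:", bert_output)
except Exception as e:
    print("BERT Error:", e)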


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `m

BERT Output: [{'generated_text': 'The future of Artificial Intelligence is................................................................................................................................................................................................................................................................'}]


In [13]:
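try:
    roberta_generator = pipeline("text-generation", model="roberta-base")
    # Note: per the warning below, the default max_new_tokens (256) takes
    # precedence over the max_length=30 passed here.
    roberta_output = roberta_generator(prompt, max_length=30)
    print("RoBERTa Output:", roberta_output)
except Exception as e:
    print("RoBERTa Error:", e)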

If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


RoBERTa Output: [{'generated_text': 'The future of Artificial Intelligence is'}]


In [10]:
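# Loaded through this pipeline, BART becomes BartForCausalLM; the warning below
# reports that lm_head and decoder embedding weights are newly initialized.
bart_generator = pipeline("text-generation", model="facebook/bart-base")
bart_output = bart_generator(prompt, max_length=40)
print("BART Output:", bart_output)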


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=40) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


BART Output: [{'generated_text': 'The future of Artificial Intelligence isidesINGExampleINGExampleAdds heated refine refineDetroitExampleExample heated Planetary clinics Dis stretch Raysenforcement Rays misled misled groom threat scientist revamped heated heated heated refine Defeat Stanford clinics existsoomenforcementenforcement Eye Gamer clinics clinics pepp clinics pepp Eye Eyeive Eye clinics clinics clinics rebel threat Proof clinics Eye Dong clinics clinics 1932 Eye Donn voters process Defeat clinics clinics pavement clinics clinics Kus clinics clinics reperc surf clinics clinics carniv clinics clinics tightly Ru Donn Donn Mirror clinics clinics blending Donn interception wasted clinics wastedergy wasted Mirror adjourn clinics blending Mirror milliseconds Donn process wasted clinics Def wasted Melt clinics wasted DN Featuring intelligent clinics process Eye Eye clinics adjourn Eye Eye Donnhour Ru Donn clinics clinics Donn Donn EyePD existsPD DonnPD intelligent Ru Allow adjourn bl
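
To compare the three generation results, it can help to strip the echoed prompt and look only at what each model added. The sketch below is one possible way to do that. The helper name `continuation` is hypothetical, the RoBERTa value is copied from the recorded output above, and the BERT-like string is a shortened stand-in for the long run of dots, not the exact recorded output.

```python
def continuation(prompt: str, pipeline_output: list[dict]) -> str:
    """Return only the text that follows the prompt (hypothetical helper)."""
    text = pipeline_output[0]["generated_text"]
    # The recorded outputs start with the prompt, so slice it off when present.
    return text[len(prompt):] if text.startswith(prompt) else text


prompt = "The future of Artificial Intelligence is"

# Copied from the recorded RoBERTa output: nothing was added to the prompt.
roberta_recorded = [{"generated_text": "The future of Artificial Intelligence is"}]
# Shortened stand-in for the BERT output (assumption: only five dots kept).
bert_like = [{"generated_text": prompt + "....."}]

print("RoBERTa added:", repr(continuation(prompt, roberta_recorded)))
print("BERT-like added:", repr(continuation(prompt, bert_like)))
```

An empty continuation makes the RoBERTa failure explicit, while a continuation made only of punctuation flags the BERT failure.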

Experiment 2: Masked Language Modeling (Missing Word)

Defining model-specific prompts

In [14]:
# Each tokenizer has its own mask token: BERT uses [MASK], RoBERTa and BART use <mask>.
bert_text = "The goal of Generative AI is to [MASK] new content."
roberta_text = "The goal of Generative AI is to <mask> new content."
bart_text = "The goal of Generative AI is to <mask> new content."


Run Fill-Mask Pipelines

BERT

In [15]:
bert_fm = pipeline("fill-mask", model="bert-base-uncased")
bert_preds = bert_fm(bert_text)

for p in bert_preds[:5]:
    print(p["token_str"], ":", round(p["score"], 4))


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


create : 0.5397
generate : 0.1558
produce : 0.0541
develop : 0.0445
add : 0.0176


RoBERTa

In [16]:
roberta_fm = pipeline("fill-mask", model="roberta-base")
roberta_preds = roberta_fm(roberta_text)

for p in roberta_preds[:5]:
    print(p["token_str"], ":", round(p["score"], 4))


Device set to use cpu


 generate : 0.3711
 create : 0.3677
 discover : 0.0835
 find : 0.0213
 provide : 0.0165


BART

In [17]:
bart_fm = pipeline("fill-mask", model="facebook/bart-base")
bart_preds = bart_fm(bart_text)

for p in bart_preds[:5]:
    print(p["token_str"], ":", round(p["score"], 4))


Device set to use cpu


 create : 0.0746
 help : 0.0657
 provide : 0.0609
 enable : 0.0359
 improve : 0.0332
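
The three fill-mask results can be summarized by how much probability each model puts on its top five candidates. The following is a minimal sketch that only re-uses the scores printed above; the names `recorded` and `top_k_mass` are hypothetical, and leading spaces in the RoBERTa and BART tokens are dropped for readability.

```python
# Scores copied from the recorded fill-mask outputs above.
recorded = {
    "BERT": {"create": 0.5397, "generate": 0.1558, "produce": 0.0541,
             "develop": 0.0445, "add": 0.0176},
    "RoBERTa": {"generate": 0.3711, "create": 0.3677, "discover": 0.0835,
                "find": 0.0213, "provide": 0.0165},
    "BART": {"create": 0.0746, "help": 0.0657, "provide": 0.0609,
             "enable": 0.0359, "improve": 0.0332},
}


def top_k_mass(preds: dict[str, float]) -> float:
    """Total probability assigned to the listed candidates."""
    return sum(preds.values())


for model, preds in recorded.items():
    best = max(preds, key=preds.get)  # token with the highest score
    print(f"{model:8s} top-1={best!r} ({preds[best]:.4f}) "
          f"top-5 mass={top_k_mass(preds):.4f}")
```

On these numbers, BERT and RoBERTa concentrate more than 0.8 of the probability in their top five tokens, while BART's top five sum to roughly 0.27, which is one way to quantify its lower confidence.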


Experiment 3: Question Answering

In [18]:
question = "What are the risks?"
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."


In [19]:
bert_qa = pipeline("question-answering", model="bert-base-uncased")

bert_answer = bert_qa(
    question=question,
    context=context
)

print("BERT Answer:", bert_answer)


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


BERT Answer: {'score': 0.007461190689355135, 'start': 46, 'end': 61, 'answer': 'hallucinations,'}


In [20]:
roberta_qa = pipeline("question-answering", model="roberta-base")

roberta_answer = roberta_qa(
    question=question,
    context=context
)

print("RoBERTa Answer:", roberta_answer)


Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


RoBERTa Answer: {'score': 0.023195627378299832, 'start': 72, 'end': 81, 'answer': 'deepfakes'}


In [21]:
bart_qa = pipeline("question-answering", model="facebook/bart-base")

bart_answer = bart_qa(
    question=question,
    context=context
)

print("BART Answer:", bart_answer)


Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


BART Answer: {'score': 0.03156507387757301, 'start': 20, 'end': 67, 'answer': 'significant risks such as hallucinations, bias,'}
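
The `start` and `end` fields returned by the question-answering pipeline are character offsets into `context`, so each answer can be checked by slicing. The sketch below re-uses only the recorded outputs above; `recorded`, `risks`, and `covered_risks` are hypothetical names, and the list of risks is read off the context sentence by hand.

```python
context = ("Generative AI poses significant risks such as "
           "hallucinations, bias, and deepfakes.")

# Offsets and answers copied from the recorded pipeline outputs above.
recorded = {
    "BERT": {"start": 46, "end": 61, "answer": "hallucinations,"},
    "RoBERTa": {"start": 72, "end": 81, "answer": "deepfakes"},
    "BART": {"start": 20, "end": 67,
             "answer": "significant risks such as hallucinations, bias,"},
}

# The three risks named in the context (listed by hand for this check).
risks = ["hallucinations", "bias", "deepfakes"]


def covered_risks(span: str) -> list[str]:
    """Return the risks that appear inside an extracted span."""
    return [risk for risk in risks if risk in span]


for model, ans in recorded.items():
    span = context[ans["start"]:ans["end"]]
    # The slice should reproduce the pipeline's answer string exactly.
    assert span == ans["answer"]
    print(f"{model:8s} covers {covered_risks(span)}")
```

None of the three spans covers all three risks: BERT and RoBERTa each return one, and BART returns two.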


Observation Table

| Task           | Model   | Classification (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| :------------- | :------ | :------------------------------- | :------------------------------------ | :------------------------------------------ |
| **Generation** | BERT    | *Failure*                        | The model produced a long run of dots instead of meaningful text. | BERT is an encoder-only model trained for masked-token prediction, not autoregressive next-token generation. |
|                | RoBERTa | *Failure*                        | The output repeated the input prompt without generating any new tokens. | RoBERTa is also encoder-only; it was pretrained with MLM rather than as a left-to-right decoder, so it cannot continue the prompt. |
|                | BART    | *Success (Low quality)*          | The model generated a long continuation, but the text was noisy, repetitive, and semantically incoherent. | BART has an encoder-decoder architecture with an autoregressive decoder, so it can emit new tokens. However, the pipeline loaded it as `BartForCausalLM` with `lm_head.weight` and `model.decoder.embed_tokens.weight` newly initialized (see the warning), which explains the incoherent output. |
| **Fill-Mask**  | BERT    | *Success*                        | Predicted appropriate verbs, led by “create” (0.54) and “generate” (0.16). | BERT is pretrained with Masked Language Modeling (MLM), which makes it highly effective at predicting missing words. |
|                | RoBERTa | *Success*                        | Predicted “generate” (0.37) and “create” (0.37) with nearly equal, high scores. | RoBERTa is an optimized encoder-only model trained extensively on MLM with more data and better tuning. |
|                | BART    | *Partial Success*                | Predicted reasonable but less precise verbs (“create”, “help”, “provide”) with much lower scores (top score 0.07). | BART is trained as a denoising encoder-decoder model; it is not specifically optimized for single-token MLM tasks. |
| **QA**         | BERT    | *Partial Success*                | Extracted only a single risk (“hallucinations,”) with a very low score (0.007). | BERT supports extractive QA, but the base checkpoint is not fine-tuned on a QA dataset such as SQuAD; its `qa_outputs` head is newly initialized (see the warning). |
|                | RoBERTa | *Partial Success*                | Returned a different single risk (“deepfakes”), again with a low score (0.023) and incomplete coverage. | RoBERTa can perform span extraction, but `roberta-base` has no QA fine-tuning, so its newly initialized `qa_outputs` head cannot reliably select the full answer span. |
|                | BART    | *Partial Success*                | Extracted a longer phrase covering two of the three risks, but it is still incomplete and low-confidence (0.032). | BART is designed for generative seq2seq tasks rather than extractive QA, and `facebook/bart-base` likewise has a newly initialized `qa_outputs` head. |
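
The architectural reasons in the table come down to which positions each token may attend to. The sketch below is a simplified illustration of that idea, not the models' actual implementation; the function name `attention_mask` is hypothetical. An encoder in the style of BERT or RoBERTa lets every token see every other token, while an autoregressive decoder such as BART's only lets a token see itself and earlier positions.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Build a seq_len x seq_len mask: 1 means row i may attend to column j.

    causal=False -> bidirectional (encoder-style): all ones.
    causal=True  -> left-to-right (decoder-style): lower-triangular.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


tokens = ["The", "future", "of", "AI"]

print("Bidirectional (encoder-style):")
for tok, row in zip(tokens, attention_mask(len(tokens), causal=False)):
    print(f"{tok:>7s}", row)

print("Causal (decoder-style):")
for tok, row in zip(tokens, attention_mask(len(tokens), causal=True)):
    print(f"{tok:>7s}", row)
```

With the causal mask, the prediction at each position depends only on tokens to its left, which is what next-token generation needs. The bidirectional mask suits fill-mask, where context on both sides of the gap is available.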