In [None]:
!pip install transformers torch
from transformers import pipeline



# Experiment 1: Text Generation
Prompt: "The future of Artificial Intelligence is"

In [None]:
#BERT
gen_bert = pipeline("text-generation", model="bert-base-uncased")
gen_bert("The future of Artificial Intelligence is")


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu


[{'generated_text': 'The future of Artificial Intelligence is................................................................................................................................................................................................................................................................'}]

In [None]:
#RoBERTa
gen_roberta = pipeline("text-generation", model="roberta-base")
gen_roberta("The future of Artificial Intelligence is")


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cpu


[{'generated_text': 'The future of Artificial Intelligence is'}]

In [None]:
#BART
gen_bart = pipeline("text-generation", model="facebook/bart-base")
gen_bart("The future of Artificial Intelligence is", max_length=30)


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is duck goodicho Assessment Christine ton 1200 Assessment Assessment Assessment laughterMKTemperature Genetic Assessment dramas Assessment Langreddederationederation Transactions Assessment Christine total good Inn Inn Inncient Assessment Assessment Tunnel AssessmentCurrent enslcheat«cheat offended Christine Christine Christine total Christine bitcoin toncheati Assessment teenager took Assessment 1929 Assessmentaging« Transactions Inn duck 570 Assessment Assessment Christineexcluding prank dashed teenager AUTH GeneticMK Inn Genetic cooperative Genetic scissorsRedd teenager tonii Genetic AssessmentGirl Genetic toni hat Assessment Assessment 1929 ton teenager AUTH ton WITHOUT Assessment Inn Inn insists toni Assessment Inn teenager vaping ton Assessment teenager Assessmentvere Inn Assessment bitcoin Inn Assessment scarf insists ton teenager bedrocki Genetici Assessment== teenager bitcoin tonantha smear Assessment 1929iMK ton teena

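None of the three models produced coherent free-form text. BERT and RoBERTa are encoder-only models pretrained with masked language modeling, so they were never trained to predict the next token left to right; the `is_decoder=True` warnings above point at the same limitation. BART does have a decoder, but loading `facebook/bart-base` as `BartForCausalLM` left `lm_head.weight` and `model.decoder.embed_tokens.weight` newly initialized, so its output is a stream of unrelated tokens. The warning also shows that `max_length=30` was overridden by the default `max_new_tokens` (=256), which is why the BART output is so long.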
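The architectural difference can be illustrated without loading any model. The sketch below builds the two attention patterns involved: a bidirectional mask, as used by encoder-only models such as BERT and RoBERTa, and a causal (lower-triangular) mask, which next-token generation relies on. The function names are hypothetical and the masks are toy Python lists, not the tensors that `transformers` builds internally.

```python
def bidirectional_mask(n):
    """Every position may attend to every other position (encoder style)."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Position i may attend only to positions 0..i (decoder style)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def visible_positions(mask, i):
    """Return the indices that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


# Toy tokenization of the prompt, for illustration only.
tokens = ["The", "future", "of", "AI", "is"]
n = len(tokens)

for name, mask in [("bidirectional", bidirectional_mask(n)),
                   ("causal", causal_mask(n))]:
    seen = [tokens[j] for j in visible_positions(mask, 1)]
    print(f"{name}: 'future' can attend to {seen}")
```

Under the causal mask a token never sees anything to its right, which is the setting a model must be trained in to continue a prompt. A checkpoint pretrained only with the bidirectional pattern has had no such training signal, which is consistent with the outputs above.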

# Experiment 2: Fill-Mask (Masked Language Modeling)
Sentence: "The goal of Generative AI is to [MASK] new content."

In [None]:
#BERT
fill_bert = pipeline("fill-mask", model="bert-base-uncased")
fill_bert("The goal of Generative AI is to [MASK] new content.")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.5396932363510132,
  'token': 3443,
  'token_str': 'create',
  'sequence': 'the goal of generative ai is to create new content.'},
 {'score': 0.15575720369815826,
  'token': 9699,
  'token_str': 'generate',
  'sequence': 'the goal of generative ai is to generate new content.'},
 {'score': 0.05405500903725624,
  'token': 3965,
  'token_str': 'produce',
  'sequence': 'the goal of generative ai is to produce new content.'},
 {'score': 0.04451530799269676,
  'token': 4503,
  'token_str': 'develop',
  'sequence': 'the goal of generative ai is to develop new content.'},
 {'score': 0.01757744885981083,
  'token': 5587,
  'token_str': 'add',
  'sequence': 'the goal of generative ai is to add new content.'}]

In [None]:
#RoBERTa
fill_roberta = pipeline("fill-mask", model="roberta-base")
fill_roberta("The goal of Generative AI is to <mask> new content.")

Device set to use cpu


[{'score': 0.3711312413215637,
  'token': 5368,
  'token_str': ' generate',
  'sequence': 'The goal of Generative AI is to generate new content.'},
 {'score': 0.3677145540714264,
  'token': 1045,
  'token_str': ' create',
  'sequence': 'The goal of Generative AI is to create new content.'},
 {'score': 0.08351420611143112,
  'token': 8286,
  'token_str': ' discover',
  'sequence': 'The goal of Generative AI is to discover new content.'},
 {'score': 0.021335121244192123,
  'token': 465,
  'token_str': ' find',
  'sequence': 'The goal of Generative AI is to find new content.'},
 {'score': 0.016521666198968887,
  'token': 694,
  'token_str': ' provide',
  'sequence': 'The goal of Generative AI is to provide new content.'}]

In [None]:
#BART
fill_bart = pipeline("fill-mask", model="facebook/bart-base")
fill_bart("The goal of Generative AI is to <mask> new content.")

Device set to use cpu


[{'score': 0.07461541891098022,
  'token': 1045,
  'token_str': ' create',
  'sequence': 'The goal of Generative AI is to create new content.'},
 {'score': 0.06571870297193527,
  'token': 244,
  'token_str': ' help',
  'sequence': 'The goal of Generative AI is to help new content.'},
 {'score': 0.060880109667778015,
  'token': 694,
  'token_str': ' provide',
  'sequence': 'The goal of Generative AI is to provide new content.'},
 {'score': 0.03593561053276062,
  'token': 3155,
  'token_str': ' enable',
  'sequence': 'The goal of Generative AI is to enable new content.'},
 {'score': 0.03319477662444115,
  'token': 1477,
  'token_str': ' improve',
  'sequence': 'The goal of Generative AI is to improve new content.'}]

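All three models filled the mask with a plausible verb, because each was pretrained to reconstruct missing tokens from the surrounding context. Their confidence differed, however: BERT and RoBERTa gave their top predictions scores of about 0.54 and 0.37, while BART's top score was only about 0.07 and its list included weaker fits such as "help" and "improve". Note that BERT expects `[MASK]` as the mask token, whereas RoBERTa and BART expect `<mask>`.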
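The `score` values in the outputs above are probabilities over the vocabulary at the masked position. A minimal sketch of that ranking step follows, assuming a toy four-word vocabulary and made-up logits; `top_k_fill` is a hypothetical helper, not part of `transformers`, and a real model scores tens of thousands of tokens.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_fill(vocab, mask_logits, k=5):
    """Rank candidate tokens for the masked position by probability."""
    probs = softmax(mask_logits)
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return [{"token_str": tok, "score": p} for tok, p in ranked[:k]]


# Hypothetical vocabulary and logits, chosen only for illustration.
toy_vocab = ["create", "generate", "add", "banana"]
toy_logits = [3.0, 2.0, 0.5, -2.0]

for candidate in top_k_fill(toy_vocab, toy_logits, k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```

A sharply peaked distribution (as with BERT here) yields one dominant score, while a flatter one (as with BART) spreads probability across many candidates, so even the top prediction scores low.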

# Experiment 3: Question Answering
Question: "What are the risks?"

Context: "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."


In [None]:
#BERT
qa_bert = pipeline("question-answering", model="bert-base-uncased")

qa_bert({
    "question": "What are the risks?",
    "context": "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
})

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.010208480060100555,
 'start': 0,
 'end': 71,
 'answer': 'Generative AI poses significant risks such as hallucinations, bias, and'}

In [None]:
#RoBERTa
qa_roberta = pipeline("question-answering", model="roberta-base")

qa_roberta({
    "question": "What are the risks?",
    "context": "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
})

Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.007568269269540906, 'start': 0, 'end': 10, 'answer': 'Generative'}

In [None]:
#BART
qa_bart = pipeline("question-answering", model="facebook/bart-base")

qa_bart({
    "question": "What are the risks?",
    "context": "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
})

Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.051301321014761925, 'start': 72, 'end': 82, 'answer': 'deepfakes.'}

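The QA pipeline performs extractive question answering: it selects a span of the context as the answer. Encoder models such as BERT suit this task well, but only after fine-tuning on a QA dataset. None of the three base checkpoints used here was fine-tuned, so in every case the `qa_outputs` head was newly initialized (see the warnings above) and the selected spans are close to arbitrary, with scores between about 0.008 and 0.05. BART happened to select "deepfakes.", which is one of the three listed risks, but the low score shows this was not a confident prediction.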
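To see why an untrained head gives low scores, it helps to look at how a span is chosen. The sketch below shows roughly how extractive QA decoding works: the head emits one start logit and one end logit per token, and the answer is the (start, end) pair with the highest joint probability. The logits are invented for illustration and `best_span` is a hypothetical helper; the actual pipeline adds further steps such as masking out question tokens.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_len=15):
    """Pick the (start, end) token pair with the highest joint probability."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # The end index may not precede the start index.
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


tokens = ["risks", "such", "as", "hallucinations", ",", "bias"]

# Hypothetical logits from a fine-tuned head: sharply peaked.
trained = best_span([0, 0, 0, 6, 0, 0], [0, 0, 0, 0, 0, 6])
# Hypothetical logits from a newly initialized head: nearly flat.
untrained = best_span([0.1, 0.0, 0.2, 0.0, 0.1, 0.0],
                      [0.0, 0.1, 0.0, 0.2, 0.0, 0.1])

for label, (start, end, score) in [("trained", trained), ("untrained", untrained)]:
    print(label, tokens[start:end + 1], round(score, 3))
```

With nearly flat logits every span is about equally likely, so the winning span is decided by small random differences and its score stays low, matching the pattern in the three outputs above.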

# Deliverable: Observation Table

| Task        | Model     | Classification (Success/Partial Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
|------------|-----------|----------------------------------|---------------------------------------|---------------------------------------------|
| Generation | BERT      | Failure                          | Appended only a long run of periods to the prompt | Encoder-only model pretrained with MLM, so it was not trained to generate text autoregressively |
| Generation | RoBERTa   | Failure                          | Returned the prompt unchanged, with no new tokens | Encoder-only model with no decoder trained for generation |
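| Generation | BART      | Failure                          | Produced long, incoherent, repetitive text made of unrelated tokens with no meaningful structure | The encoder-decoder architecture supports generation, but loading the checkpoint as `BartForCausalLM` left `lm_head.weight` and `model.decoder.embed_tokens.weight` newly initialized |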
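| Fill-Mask  | BERT      | Success                          | Top prediction "create" (score about 0.54), followed by "generate" and "produce" | Trained using Masked Language Modeling (MLM) |
| Fill-Mask  | RoBERTa   | Success                          | Top predictions "generate" and "create", each with a score of about 0.37 | Improved MLM training and more data |
| Fill-Mask  | BART      | Success                          | Top prediction "create", but with a low score (about 0.07) and weaker alternatives such as "help" | Denoising autoencoder objective |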
| QA         | BERT      | Partial Success                  | Returned almost the entire context sentence (characters 0 to 71) with a score of about 0.01 | `qa_outputs` head newly initialized; not fine-tuned on QA datasets |
| QA         | RoBERTa   | Failure                          | Returned the single word "Generative", which does not answer the question, with a score of about 0.008 | Strong encoder, but the `qa_outputs` head was newly initialized with no QA fine-tuning |
| QA         | BART      | Partial Success                  | Returned "deepfakes.", one of the three listed risks, with a low score of about 0.05 | `qa_outputs` head newly initialized; base model not fine-tuned for extractive QA |
