In [1]:
# Install transformers and torch if not already installed
!pip install transformers torch



In [2]:
# Import necessary libraries
from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

Libraries imported successfully!


**Experiment 1: Text Generation**

Setup for Text Generation

In [3]:
prompt = "The future of Artificial Intelligence is"

print("="*80)
print("EXPERIMENT 1: TEXT GENERATION")
print("="*80)
print(f"\nPrompt: '{prompt}'\n")

EXPERIMENT 1: TEXT GENERATION

Prompt: 'The future of Artificial Intelligence is'



Test BERT for Text Generation

In [4]:
print("Testing BERT (bert-base-uncased)...")
print("-" * 50)

try:
    generator_bert = pipeline('text-generation', model='bert-base-uncased')
    result_bert = generator_bert(prompt, max_length=20)
    print(f"Result: {result_bert}")
except Exception as e:
    print(f"ERROR: {type(e).__name__}")
    print(f"Message: {str(e)[:200]}")

print("\n")

Testing BERT (bert-base-uncased)...
--------------------------------------------------


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: [{'generated_text': 'The future of Artificial Intelligence is................................................................................................................................................................................................................................................................'}]
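
The last warning in the log above says that `max_new_tokens` (=256) takes precedence over `max_length` (=20), which is why the output runs far past 20 tokens. The sketch below is a simplified model of that precedence rule, not the library's own code. The function name `effective_new_tokens` and the prompt length of 7 tokens are assumptions made for illustration.

```python
from typing import Optional


def effective_new_tokens(prompt_tokens: int,
                         max_length: Optional[int] = None,
                         max_new_tokens: Optional[int] = None) -> Optional[int]:
    """Simplified model of how many new tokens a generation call may produce."""
    if max_new_tokens is not None:
        # When both limits are set, max_new_tokens wins (as the warning states).
        return max_new_tokens
    if max_length is not None:
        # max_length counts the prompt tokens as well as the generated ones.
        return max(max_length - prompt_tokens, 0)
    return None


# Hypothetical prompt length of 7 tokens, with the limits reported in the log.
print(effective_new_tokens(7, max_length=20, max_new_tokens=256))
# Same call if only max_length were in effect.
print(effective_new_tokens(7, max_length=20))
```

If the intent is to cap the output at 20 generated tokens, passing `max_new_tokens=20` to the pipeline call instead of `max_length=20` is one way to avoid the conflict.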




Test RoBERTa for Text Generation

In [5]:
print("Testing RoBERTa (roberta-base)...")
print("-" * 50)

try:
    generator_roberta = pipeline('text-generation', model='roberta-base')
    result_roberta = generator_roberta(prompt, max_length=20)
    print(f"Result: {result_roberta}")
except Exception as e:
    print(f"ERROR: {type(e).__name__}")
    print(f"Message: {str(e)[:200]}")

print("\n")

If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Testing RoBERTa (roberta-base)...
--------------------------------------------------


Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: [{'generated_text': 'The future of Artificial Intelligence is'}]




Test BART for Text Generation

In [6]:
print("Testing BART (facebook/bart-base)...")
print("-" * 50)

try:
    generator_bart = pipeline('text-generation', model='facebook/bart-base')
    result_bart = generator_bart(prompt, max_length=20)
    print(f"Result: {result_bart}")
except Exception as e:
    print(f"ERROR: {type(e).__name__}")
    print(f"Message: {str(e)[:200]}")

print("\n")

Testing BART (facebook/bart-base)...
--------------------------------------------------


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: [{'generated_text': 'The future of Artificial Intelligence isURA patrons patrons Controller Ole Ole OleDON Ole Nost SY Ole subsection Ole Guests Guests Nostrahim Nost patrons patronsIPS flush patrons patrons Ole Ole Pe Nost Nost patrons Nost Might patronslette patronsorative flush flush flush Guests futures Nost Ole Ole PCR nurture Pe flush flush sack ornabove flush disorderagements deepen Become Nost Nost nurture kind patrons propagate propagate NostRat Nost Sevenwas ravumo demonstrate Guests "\'Elizabethposeセ Leopard Nost Nost demonstrate candles Seven appre demonstrate demonstrate Nostintersinters drawbacknell demonstrate Seven nurture nurture demonstrate nurture demonstratelyn nurture propagate rav Mand grosswas nurture corrections rav rav mammals rav Nost nurtureusher Nost Nost candles candles demonstrate Pe Nost gross candleswas demonstrate propagate rav rav rav wandering Nost NostCharge drawback Nost Nost Array drawback demonstrate refresh hotsldom propagate Seven exhibi
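
The BERT and RoBERTa logs above both suggest adding `is_decoder=True`. That flag concerns which positions each token may attend to. The toy sketch below contrasts a bidirectional (encoder-style) mask with a causal (decoder-style) mask. It is an illustration in plain Python, not the library's implementation, and the helper names `attention_mask` and `show` are hypothetical.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Row i marks which positions token i may attend to (1 = visible)."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask: List[List[int]]) -> str:
    """Format a mask as one text row per token."""
    return "\n".join(" ".join(str(v) for v in row) for row in mask)


print("Bidirectional (encoder-style): every token sees the whole sequence")
print(show(attention_mask(4, causal=False)))
print("Causal (decoder-style): token i sees only positions 0..i")
print(show(attention_mask(4, causal=True)))
```

A model trained only with the bidirectional pattern has never had to predict a token from its left context alone, which is the situation text generation puts it in.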

**Experiment 2: Fill-Mask**

Setup for Fill-Mask

In [7]:
# Note: BERT uses [MASK] as its mask token; RoBERTa and BART use <mask>
text_bert = "The goal of Generative AI is to [MASK] new content."
text_roberta = "The goal of Generative AI is to <mask> new content."
text_bart = "The goal of Generative AI is to <mask> new content."

print("="*80)
print("EXPERIMENT 2: MASKED LANGUAGE MODELING (FILL-MASK)")
print("="*80)
print()

EXPERIMENT 2: MASKED LANGUAGE MODELING (FILL-MASK)



Test BERT for Fill-Mask

In [8]:
print("Testing BERT (bert-base-uncased)...")
print("-" * 50)
print(f"Input: '{text_bert}'\n")

try:
    fill_mask_bert = pipeline('fill-mask', model='bert-base-uncased')
    result_bert = fill_mask_bert(text_bert)
    print("Top 3 predictions:")
    for i, pred in enumerate(result_bert[:3], 1):
        print(f"{i}. '{pred['token_str']}' (score: {pred['score']:.4f})")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Testing BERT (bert-base-uncased)...
--------------------------------------------------
Input: 'The goal of Generative AI is to [MASK] new content.'



Device set to use cuda:0


Top 3 predictions:
1. 'create' (score: 0.5397)
2. 'generate' (score: 0.1558)
3. 'produce' (score: 0.0541)




Test RoBERTa for Fill-Mask

In [9]:
print("Testing RoBERTa (roberta-base)...")
print("-" * 50)
print(f"Input: '{text_roberta}'\n")

try:
    fill_mask_roberta = pipeline('fill-mask', model='roberta-base')
    result_roberta = fill_mask_roberta(text_roberta)
    print("Top 3 predictions:")
    for i, pred in enumerate(result_roberta[:3], 1):
        print(f"{i}. '{pred['token_str']}' (score: {pred['score']:.4f})")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Testing RoBERTa (roberta-base)...
--------------------------------------------------
Input: 'The goal of Generative AI is to <mask> new content.'



Device set to use cuda:0


Top 3 predictions:
1. ' generate' (score: 0.3711)
2. ' create' (score: 0.3677)
3. ' discover' (score: 0.0835)




Test BART for Fill-Mask

In [10]:
print("Testing BART (facebook/bart-base)...")
print("-" * 50)
print(f"Input: '{text_bart}'\n")

try:
    fill_mask_bart = pipeline('fill-mask', model='facebook/bart-base')
    result_bart = fill_mask_bart(text_bart)
    print("Top 3 predictions:")
    for i, pred in enumerate(result_bart[:3], 1):
        print(f"{i}. '{pred['token_str']}' (score: {pred['score']:.4f})")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Testing BART (facebook/bart-base)...
--------------------------------------------------
Input: 'The goal of Generative AI is to <mask> new content.'



Device set to use cuda:0


Top 3 predictions:
1. ' create' (score: 0.0746)
2. ' help' (score: 0.0657)
3. ' provide' (score: 0.0609)
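
The setup cell for this experiment defines one input string per model because the mask token differs between them. A small sketch of how the three inputs could be built from a single template is shown below. The `{mask}` placeholder, the `MASK_TOKENS` table, and the `build_masked_text` helper are assumptions introduced for illustration; the token values are the ones used in the notebook.

```python
# Mask tokens as they appear in the notebook's input strings.
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
    "facebook/bart-base": "<mask>",
}


def build_masked_text(template: str, model_name: str) -> str:
    """Fill the hypothetical '{mask}' placeholder with the model's mask token."""
    return template.format(mask=MASK_TOKENS[model_name])


template = "The goal of Generative AI is to {mask} new content."
for name in MASK_TOKENS:
    print(f"{name}: {build_masked_text(template, name)}")
```

With a loaded pipeline, the same value can be read from the tokenizer instead of a hard-coded table, for example `fill_mask_bert.tokenizer.mask_token`.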




**Experiment 3: Question Answering**

Setup for QA

In [11]:
context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
question = "What are the risks?"

print("="*80)
print("EXPERIMENT 3: QUESTION ANSWERING")
print("="*80)
print(f"\nContext: '{context}'")
print(f"Question: '{question}'\n")

EXPERIMENT 3: QUESTION ANSWERING

Context: 'Generative AI poses significant risks such as hallucinations, bias, and deepfakes.'
Question: 'What are the risks?'



Test BERT for QA

In [12]:
print("Testing BERT (bert-base-uncased)...")
print("-" * 50)

try:
    qa_bert = pipeline('question-answering', model='bert-base-uncased')
    result_bert = qa_bert(question=question, context=context)
    print(f"Answer: '{result_bert['answer']}'")
    print(f"Confidence Score: {result_bert['score']:.4f}")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Testing BERT (bert-base-uncased)...
--------------------------------------------------


Device set to use cuda:0


Answer: ', and deepfakes'
Confidence Score: 0.0073




Test RoBERTa for QA

In [13]:
print("Testing RoBERTa (roberta-base)...")
print("-" * 50)

try:
    qa_roberta = pipeline('question-answering', model='roberta-base')
    result_roberta = qa_roberta(question=question, context=context)
    print(f"Answer: '{result_roberta['answer']}'")
    print(f"Confidence Score: {result_roberta['score']:.4f}")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Testing RoBERTa (roberta-base)...
--------------------------------------------------


Device set to use cuda:0


Answer: 'Generative AI poses significant risks such as hallucinations, bias,'
Confidence Score: 0.0089




Test BART for QA

In [14]:
print("Testing BART (facebook/bart-base)...")
print("-" * 50)

try:
    qa_bart = pipeline('question-answering', model='facebook/bart-base')
    result_bart = qa_bart(question=question, context=context)
    print(f"Answer: '{result_bart['answer']}'")
    print(f"Confidence Score: {result_bart['score']:.4f}")
except Exception as e:
    print(f"ERROR: {str(e)[:200]}")

print("\n")

Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Testing BART (facebook/bart-base)...
--------------------------------------------------


Device set to use cuda:0


Answer: 'deepfakes.'
Confidence Score: 0.0526
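
All three QA logs above report that `qa_outputs` was newly initialized. An extractive QA head scores every token as a possible answer start and end, and the answer is the best-scoring span. The sketch below shows one common way to pick that span, assuming the score is the product of the start and end probabilities. It is a simplified illustration, not the pipeline's exact post-processing, and the logits are made-up values.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with start <= end."""
    start_p, end_p = softmax(start_logits), softmax(end_logits)
    best = (0, 0, 0.0)
    for i, ps in enumerate(start_p):
        for j in range(i, min(i + max_answer_len, len(end_p))):
            score = ps * end_p[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Made-up logits for a head that clearly prefers tokens 1..3.
print("trained-like head:", best_span([0, 8, 0, 0, 0], [0, 0, 0, 8, 0]))
# Made-up flat logits, standing in for a head that has learned nothing.
print("untrained-like head:", best_span([0, 0, 0, 0, 0], [0, 0, 0, 0, 0]))
```

When the logits carry no signal, every span receives roughly the same small score, which is consistent with the very low confidence values and arbitrary spans recorded above.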




## Deliverable: Observation Table

Based on the experiments conducted, here are the observations:

| Task | Model | Classification (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
|------|-------|----------------------------------|---------------------------------------|----------------------------------------------|
| **Generation** | BERT | Failure | Appended only a long run of periods to the prompt ('........'); no meaningful text | BERT is an encoder-only model pre-trained with bidirectional masked language modeling, not left-to-right next-token prediction. It does have a language modeling head (the pipeline loads it as `BertLMHeadModel`), but without `is_decoder=True` and without causal training it cannot continue a prompt |
| | RoBERTa | Failure | Returned the input prompt unchanged; no new text was generated | RoBERTa is also encoder-only and pre-trained with MLM; like BERT, it is built to encode context, not to generate new sequences token by token |
| | BART | Failure | Generated incoherent tokens: 'patrons', 'Ole', 'flush', 'demonstrate', 'nurture', etc. (complete gibberish) | BART's encoder-decoder architecture is suited to generation, but the `text-generation` pipeline loaded it as `BartForCausalLM`, and the log reports that `lm_head.weight` and `model.decoder.embed_tokens.weight` were newly initialized. The output therefore comes from partly random weights, not from the pre-trained model |
| **Fill-Mask** | BERT | Success | Predicted 'create' (0.5397), 'generate' (0.1558), 'produce' (0.0541) - all semantically correct | BERT is trained on Masked Language Modeling (MLM) as its core pre-training objective; this is exactly what it was designed for |
| | RoBERTa | Success | Predicted 'generate' (0.3711), 'create' (0.3677), 'discover' (0.0835) - semantically accurate predictions | RoBERTa uses optimized MLM training with more data, dynamic masking, and longer training; performs excellently on its native task |
| | BART | Success | Predicted 'create' (0.0746), 'help' (0.0657), 'provide' (0.0609) - reasonable but lower confidence scores | BART uses a denoising autoencoder approach with text infilling capabilities, but MLM isn't its primary training objective, hence lower confidence |
| **QA** | BERT | Failure | Answered ', and deepfakes' with very low confidence (0.0073); the span is incomplete and starts mid-list | The encoder architecture suits span extraction, but `bert-base-uncased` is not fine-tuned on SQuAD or any other QA dataset. The log shows the `qa_outputs` head was newly initialized, so the selected span is essentially arbitrary |
| | RoBERTa | Failure | Returned nearly the entire context sentence (confidence 0.0089); failed to isolate the answer span | Same cause as BERT: a suitable encoder with a randomly initialized `qa_outputs` head. The near-zero confidence reflects an untrained head, not a judgment about the text |
| | BART | Failure | Answered 'deepfakes.' with low confidence (0.0526); partially correct but incomplete | Same cause again: the log shows the `qa_outputs` head of `BartForQuestionAnswering` was newly initialized. The encoder-decoder architecture does not rule out extractive QA, but the checkpoint needs QA fine-tuning before its spans are meaningful |
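
Several failures in the table trace back to "newly initialized" warnings in the logs. One way to catch this programmatically is to inspect the loading report that `from_pretrained` can return when called with `output_loading_info=True`. The sketch below operates on a stand-in dictionary so that it runs without downloading a model. The helper name `untrained_parameters` is hypothetical, and the example keys are copied from the QA warnings in this notebook.

```python
from typing import Dict, List


def untrained_parameters(loading_info: Dict[str, List[str]]) -> List[str]:
    """Return the parameter names that the checkpoint did not provide."""
    return sorted(loading_info.get("missing_keys", []))


# Stand-in for the loading report; keys copied from the QA warnings in the logs.
example_info = {
    "missing_keys": ["qa_outputs.weight", "qa_outputs.bias"],
    "unexpected_keys": [],
}

missing = untrained_parameters(example_info)
if missing:
    print("Randomly initialized, results are not meaningful:", missing)

# With transformers installed and network access, the real report can be
# obtained like this (not executed here):
# from transformers import AutoModelForQuestionAnswering
# model, info = AutoModelForQuestionAnswering.from_pretrained(
#     "bert-base-uncased", output_loading_info=True)
# print(untrained_parameters(info))
```

A check like this before running a task makes it clear whether a poor result reflects the architecture or simply an untrained head.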