In [8]:
from transformers import pipeline

In [9]:
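# Hugging Face Hub checkpoint IDs. All three are base (pre-trained only)
# checkpoints; none has been fine-tuned for a downstream task.
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}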

In [10]:
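def text_generation(models):
    """Try to continue a fixed prompt with each model via the text-generation pipeline."""
    print("\n===== TEXT GENERATION BENCHMARK =====")
    prompt = "The future of Artificial Intelligence is"

    for name, model_id in models.items():
        print(f"\n{name}:")
        try:
            # BERT and RoBERTa are encoder-only masked language models and BART is an
            # encoder-decoder; none was pre-trained as a standalone causal LM.
            gen = pipeline("text-generation", model=model_id)
            out = gen(prompt, max_length=30)
            # SUCCESS only means the call did not raise; it does not assess output quality.
            print("Status: SUCCESS")
            # Show at most the first 100 characters (prompt included).
            print("Output:", out[0]["generated_text"][:100])
        except Exception as e:
            print("Status: FAILURE")
            print("Reason:", e)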



In [13]:
text_generation(models)


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`



===== TEXT GENERATION BENCHMARK =====

BERT:


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Status: SUCCESS
Output: The future of Artificial Intelligence is............................................................

RoBERTa:


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Status: SUCCESS
Output: The future of Artificial Intelligence is

BART:


Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Status: SUCCESS
Output: The future of Artificial Intelligence isachaacha Idaho Johns droid droid raided outweigh correctraph
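
In the run above, "Status: SUCCESS" only records that the pipeline call returned without raising: BERT appended a run of dots, RoBERTa appended nothing, and BART appended unrelated tokens. A rough way to tell these cases apart is to look at the text that follows the prompt. The sketch below is a heuristic only; `continuation_quality` and its labels are hypothetical names, not part of transformers.

```python
def continuation_quality(prompt: str, generated: str) -> str:
    """Classify the text appended to the prompt (heuristic, illustrative only)."""
    # The text-generation pipeline returns prompt + continuation by default,
    # so drop the prompt prefix when it is present.
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    stripped = continuation.strip()
    if not stripped:
        return "empty"                 # nothing was generated
    if len(set(stripped)) == 1:
        return "repeated-character"    # e.g. a run of "."
    return "non-empty"                 # something was generated; quality not judged


prompt = "The future of Artificial Intelligence is"
print(continuation_quality(prompt, prompt + "......"))
```

A "non-empty" label does not mean the continuation is sensible; the BART output would receive it as well, so this check only filters out the most obvious failures.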


In [14]:
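def text_fill_mask(models):
    """Ask each model to fill one masked token and print its top three predictions."""
    print("\n===== FILL-MASK BENCHMARK =====")

    # Each tokenizer defines its own mask token: BERT uses [MASK],
    # while RoBERTa and BART use <mask>.
    sentences = {
        "BERT": "The goal of Generative AI is to [MASK] new content.",
        "RoBERTa": "The goal of Generative AI is to <mask> new content.",
        "BART": "The goal of Generative AI is to <mask> new content."
    }

    for name, model_id in models.items():
        print(f"\n{name}:")
        try:
            fill = pipeline("fill-mask", model=model_id)
            # The pipeline returns candidates sorted by descending score.
            outputs = fill(sentences[name])
            print("Status: SUCCESS")
            print("Top predictions:")
            for o in outputs[:3]:
                # RoBERTa and BART tokens may carry a leading space from their BPE vocabulary.
                print("-", o["token_str"])
        except Exception as e:
            print("Status: FAILURE")
            print("Reason:", e)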

In [15]:
text_fill_mask(models)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



===== FILL-MASK BENCHMARK =====

BERT:


Device set to use cpu


Status: SUCCESS
Top predictions:
- create
- generate
- produce

RoBERTa:


Device set to use cpu


Status: SUCCESS
Top predictions:
-  generate
-  create
-  discover

BART:


Device set to use cpu


Status: SUCCESS
Top predictions:
-  create
-  help
-  provide
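
The RoBERTa and BART predictions above print with an extra space because their tokenizers fold the preceding space into the token itself. A minimal sketch for normalizing predictions before comparing models follows; `top_tokens` is a hypothetical helper, and the sample list only imitates the `token_str` key of the fill-mask pipeline output, using the RoBERTa tokens printed above.

```python
def top_tokens(outputs, k=3):
    """Return the first k predicted tokens with surrounding whitespace removed."""
    return [o["token_str"].strip() for o in outputs[:k]]


# Illustrative input shaped like fill-mask results (only the key used here).
sample = [
    {"token_str": " generate"},
    {"token_str": " create"},
    {"token_str": " discover"},
]
print(top_tokens(sample))  # ['generate', 'create', 'discover']
```

In the same spirit, the hard-coded `sentences` dictionary could probably be replaced by a single template filled with `fill.tokenizer.mask_token`, so that each model receives its own mask token automatically.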


In [16]:
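def text_qa(models):
    """Run extractive question answering with each model on one context/question pair."""
    print("\n===== QUESTION ANSWERING BENCHMARK =====")

    context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
    question = "What are the risks?"

    for name, model_id in models.items():
        print(f"\n{name}:")
        try:
            # The base checkpoints have no fine-tuned QA head, so the qa_outputs
            # layer is newly initialized and the extracted spans are unreliable.
            qa = pipeline("question-answering", model=model_id)
            answer = qa(question=question, context=context)
            # SUCCESS only means the call did not raise; check the score for confidence.
            print("Status: SUCCESS")
            print("Answer:", answer["answer"])
            print("Score:", round(answer["score"], 4))
        except Exception as e:
            print("Status: FAILURE")
            print("Reason:", e)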


In [17]:
text_qa(models)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



===== QUESTION ANSWERING BENCHMARK =====

BERT:


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Status: SUCCESS
Answer: , and deepfakes
Score: 0.0095

RoBERTa:


Device set to use cpu
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Status: SUCCESS
Answer: Generative
Score: 0.0073

BART:


Device set to use cpu


Status: SUCCESS
Answer: such
Score: 0.0333
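
All three scores above are below 0.04, which is consistent with the warnings that `qa_outputs` was newly initialized for each model. A sketch of a guard that reports such answers as unreliable rather than as a plain success is shown below; `label_answer` and the 0.5 cutoff are assumptions chosen for illustration, not values taken from transformers.

```python
def label_answer(answer: dict, min_score: float = 0.5) -> str:
    """Format a QA result and flag it when the score is below min_score."""
    # min_score is an arbitrary illustrative threshold, not a library default.
    status = "OK" if answer["score"] >= min_score else "LOW CONFIDENCE"
    return f"{status}: {answer['answer']!r} (score={answer['score']:.4f})"


# Values copied from the BART run above.
print(label_answer({"answer": "such", "score": 0.0333}))
# LOW CONFIDENCE: 'such' (score=0.0333)
```

A checkpoint that has actually been fine-tuned for extractive QA would be expected to clear such a threshold far more often; the right cutoff depends on the model and data.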
