* Designing different prompts to explore how varying instructions impact the model's output.

* Using a pretrained model (GPT-2) for a summarization task.



In [3]:
# Importing the required Python libraries.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu

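# Downloading required NLTK data (tokenizer models used by nltk.word_tokenize)
nltk.download('punkt')
nltk.download('punkt_tab')  # also required by word_tokenize in newer NLTK releases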


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [4]:
# Loading pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")




In [5]:
# Creating a function to generate text based on the prompts.

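def generate_response_gpt2(prompt, max_length=150, temperature=0.7):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,  # pass the mask explicitly to avoid the attention-mask warning
        pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no pad token, so reuse the EOS token
        max_length=max_length,                 # counts prompt tokens plus generated tokens
        temperature=temperature,               # only used when do_sample=True; ignored by greedy decoding
        num_return_sequences=1,
        no_repeat_ngram_size=2                 # never repeat the same 2-gram
    )
    # outputs[0] holds the prompt tokens followed by the generated tokens
    return tokenizer.decode(outputs[0], skip_special_tokens=True)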
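Because `generate_response_gpt2` does not pass `do_sample=True`, `model.generate` decodes greedily and the `temperature` argument has no effect on the output. The sketch below illustrates why using a toy logit vector (hypothetical values, not taken from GPT-2): temperature divides the logits before the softmax, which reshapes the sampling distribution but never changes which token is most probable, so greedy (argmax) decoding picks the same token at every temperature.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(probs):
    """Greedy decoding step: index of the most probable token."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical next-token logits for a 4-token vocabulary.
toy_logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    top = greedy_pick(probs)
    print(f"T={t}: greedy token={top}, p(top)={probs[top]:.3f}")
```

Lower temperatures concentrate probability on the top token and higher ones flatten the distribution, but that only matters when tokens are sampled. For `temperature` to take effect in the notebook's function, `do_sample=True` would also have to be passed to `model.generate`; the output then becomes random, so a seed (for example via `torch.manual_seed`) would be needed for reproducible scores.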


In [6]:
# Designing prompts for the summarization task.

def summarization_prompts(text):
    prompts = {
        "baseline": f"Summarize the following article:\n{text}",
        "three_sentence": f"In three sentences, summarize the key points of the following article:\n{text}",
        "contextual": f"Based on the context, write a concise summary of this article:\n{text}",
        "few_shot": f"Example: 'The economy is facing challenges due to inflation.' Now summarize the next article:\n{text}"
    }
    return prompts
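The `few_shot` prompt above gives GPT-2 one example sentence but no article-to-summary pair, so the model never sees a demonstration of the input/output pattern it is meant to follow. A more conventional few-shot layout pairs each example article with its summary and ends with an open slot for the new article. The sketch below builds such a prompt. The function name and the demonstration pair are hypothetical, written only for illustration. The `TL;DR:` cue follows the one the GPT-2 authors used to elicit zero-shot summaries, but whether it improves scores on this notebook's input has not been tested here.

```python
def build_few_shot_prompt(examples, text, cue="TL;DR:"):
    """Build a few-shot summarization prompt from (article, summary) pairs.

    Each demonstration shows an article followed by the cue and its summary;
    the final block leaves the summary empty for the model to complete.
    """
    blocks = [f"Article: {article}\n{cue} {summary}" for article, summary in examples]
    blocks.append(f"Article: {text}\n{cue}")
    return "\n\n".join(blocks)

# Hypothetical demonstration pair, written only for illustration.
demo_examples = [
    ("Prices rose sharply this year, and central banks raised interest rates "
     "several times to slow inflation.",
     "The economy is facing challenges due to inflation."),
]

prompt = build_few_shot_prompt(
    demo_examples,
    "Machine learning is a subset of artificial intelligence (AI) that allows "
    "machines to learn and improve from experience without explicit programming.",
)
print(prompt)
```

The resulting string could be passed to `generate_response_gpt2` like any other prompt, or added as an extra entry in the dictionary returned by `summarization_prompts`.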


In [7]:
# Evaluating the generated summaries with ROUGE and BLEU

def evaluate_summaries(reference, generated):

    # ROUGE evaluation
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    rouge_scores = scorer.score(reference, generated)

    # BLEU evaluation
    reference_tokens = [nltk.word_tokenize(reference)]
    generated_tokens = nltk.word_tokenize(generated)
    bleu_score = sentence_bleu(reference_tokens, generated_tokens)

    return rouge_scores, bleu_score
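`rouge_scorer.RougeScorer.score` returns, for each metric, a `Score` tuple with `precision`, `recall`, and `fmeasure` fields; for `rouge1` these are computed from overlapping unigrams. The sketch below reimplements that idea in plain Python to make the numbers concrete. It lowercases and keeps only alphanumeric tokens, which approximates the library's default tokenization, and it omits the Porter stemming enabled by `use_stemmer=True`, so its values can differ slightly from `rouge_score`.

```python
import re
from collections import Counter

def unigrams(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1(reference, generated):
    """Unigram-overlap precision, recall and F1 (no stemming)."""
    ref_counts = Counter(unigrams(reference))
    gen_counts = Counter(unigrams(generated))
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & gen_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

reference = "Machine learning allows machines to learn and predict the result."
short = "Machine learning allows machines to learn."
print(rouge1(reference, short))
```

Precision is measured against the generated text and recall against the reference, so a long output that copies the article can keep recall high while its precision drops. This matters here because every generated text in this notebook begins with the echoed prompt.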


In [8]:
# Experimenting with different prompts

def run_experiment(text, reference_summary):
    prompts = summarization_prompts(text)
    results = {}

    for prompt_type, prompt in prompts.items():
        # Generate response from GPT-2
        generated_summary = generate_response_gpt2(prompt)
        print(f"Generated Summary ({prompt_type}):\n", generated_summary, "\n")

        # Evaluate the summary
        rouge_scores, bleu_score = evaluate_summaries(reference_summary, generated_summary)
        results[prompt_type] = {
            "generated_summary": generated_summary,
            "rouge_scores": rouge_scores,
            "bleu_score": bleu_score
        }

    return results
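`run_experiment` returns a dictionary keyed by prompt type, so comparing the prompts, as the summary at the end of this notebook does, amounts to sorting that dictionary by a score. The helper below is a hypothetical addition, not part of the original notebook: it reads `bleu_score`, or the `fmeasure` field of the `rougeL` entry, from each result and returns the prompt types ordered from best to worst.

```python
def rank_prompts(results, metric="bleu"):
    """Order prompt types from best to worst by BLEU or ROUGE-L F1.

    `results` is assumed to have the shape returned by run_experiment:
    {prompt_type: {"bleu_score": float, "rouge_scores": {"rougeL": Score, ...}, ...}}
    """
    def score(item):
        _, result = item
        if metric == "bleu":
            return result["bleu_score"]
        if metric == "rougeL":
            return result["rouge_scores"]["rougeL"].fmeasure
        raise ValueError(f"unknown metric: {metric}")

    ranked = sorted(results.items(), key=score, reverse=True)
    return [(name, score((name, result))) for name, result in ranked]
```

After the experiment cell has run, `for name, value in rank_prompts(results): print(f"{name}: {value:.4f}")` would list the prompts by BLEU, and `rank_prompts(results, "rougeL")` by ROUGE-L F1.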


In [10]:
# Generating the summary

if __name__ == "__main__":

    text = """Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve
     from experience without explicit programming."""

    reference_summary = "Machine learning allows machines to learn and predict the result."

    # Running the experiment
    results = run_experiment(text, reference_summary)


    for prompt_type, result in results.items():
        print(f"Prompt: {prompt_type}")
        print(f"Generated Summary: {result['generated_summary']}")
        print(f"ROUGE Scores: {result['rouge_scores']}")
        print(f"BLEU Score: {result['bleu_score']}\n")




Generated Summary (baseline):
 Summarize the following article:
Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve
     from experience without explicit programming. Machine learning can be used to improve the performance of a machine, but it can also be applied to other tasks, such as learning to read a book, or to perform a task that requires a certain amount of memory.
The following is an example of machine learning. The machine learns to recognize a given word, and then uses that word to determine the correct answer. It then learns the word by looking at the right side of the screen, then looks at that side to see if it is correct. If it does not, it will try to guess the answer 





Generated Summary (three_sentence):
 In three sentences, summarize the key points of the following article:
Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve
     from experience without explicit programming.

The following is an excerpt from the book, "Machine Learning: The Science of Learning and the Future of Human Intelligence," by David A. Karpeles, PhD, and published by the University of California, Berkeley. The book is available at http://www.academia.edu/karpels/machinelearning.html. 





Generated Summary (contextual):
 Based on the context, write a concise summary of this article:
Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve
     from experience without explicit programming. Machine learning can be used to improve the performance of a machine, but it can also be applied to other tasks, such as learning to read a book, or to perform a task.
The following is an example of machine learning. It is based on a simple example, and is not intended to be a complete description of the techniques. The following code is for a typical machine-learning program. This example is intended for use with the following tools: 

Generated Summary (few_shot):
 Example: 'The economy is facing challenges due to inflation.' Now summarize the next article:
Machine learning is a subset of artificial intelligence (AI) that allows machines to learn and improve
     from experience without explicit programming.

The next two articles will fo

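Summary:
* In this notebook we used different types of prompts to summarize a single sentence about machine learning.
* We used the pretrained GPT-2 model to perform the task.
* We evaluated the prompts using ROUGE and BLEU scores.
* Comparing the scores, the few_shot prompt performed best, with a BLEU score of 0.05036014288593501.
* Every generated text begins with the echoed prompt, because the decoded output contains the input tokens, so the scores reflect the copied article as well as the model's continuation.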
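The best BLEU score reported above, about 0.05, is low on BLEU's 0 to 1 scale. NLTK's `sentence_bleu`, with its default uniform weights over 1- to 4-grams, takes the geometric mean of clipped n-gram precisions and multiplies it by a brevity penalty. The plain-Python sketch below follows that definition to show where the number comes from. It is simplified (one reference, a regex tokenizer standing in for `nltk.word_tokenize`, no smoothing) and is not a drop-in replacement for the library; like NLTK's unsmoothed score, it collapses to zero when any n-gram order has no match.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Split into words and punctuation (a rough stand-in for nltk.word_tokenize)."""
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(seq, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def simple_sentence_bleu(reference, hypothesis, max_n=4):
    """Single-reference sentence BLEU with uniform weights and no smoothing."""
    ref, hyp = tokens(reference), tokens(hypothesis)
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum((hyp_counts & ref_counts).values())
        total = sum(hyp_counts.values())
        if matches == 0:
            # Unsmoothed BLEU is (effectively) zero if any n-gram order has no match.
            return 0.0
        log_precision_sum += math.log(matches / total) / max_n
    # Brevity penalty: only hypotheses shorter than (or equal to) the reference length are scaled.
    c, r = len(hyp), len(ref)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)

reference = "Machine learning allows machines to learn and predict the result."
print(simple_sentence_bleu(reference, reference))  # identical text scores 1.0
print(simple_sentence_bleu(
    reference,
    "Machine learning allows machines to learn and improve from experience.",
))
```

A long generation contains many n-grams that never occur in a one-sentence reference, which lowers every precision term; that is one reason the BLEU values in this experiment stay small. For very short texts, NLTK also lets you pass `smoothing_function=SmoothingFunction().method1` to `sentence_bleu` to avoid the hard zero.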
