# <center> Generative AI with LLM - Understanding Prompt Engineering with Summarization </center>

# Table of Contents



- [ 1 - Set up Required Dependencies](#1)
- [ 2 - Summarize News article without Prompt Engineering](#2)
- [ 3 - Summarize News article with an Instruction Prompt](#3)
  - [ 3.1 - Zero Shot Inference with an Instruction Prompt](#3.1)
  - [ 3.2 - Zero Shot Inference with the Prompt Template from Zephyr model](#3.2)
- [ 4 - Summarize News article with One Shot and Few Shot Inference](#4)
  - [ 4.1 - One Shot Inference](#4.1)
  - [ 4.2 - Few Shot Inference](#4.2)
- [ 5 - Generative Configuration Parameters for Inference](#5)


<a name='1'></a>
## 1 - Set up Required Dependencies

In [1]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install -U transformers
%pip install -U datasets
# accelerate is needed for device_map="auto" in the pipeline call below
%pip install -U accelerate

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting transformers
  Using cached transformers-4.36.2-py3-none-any.whl.metadata (126 kB)
Collecting tokenizers<0.19,>=0.14 (from transformers)
  Downloading tokenizers-0.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading transformers-4.36.2-py3-none-any.whl (8.2 MB)
Downloading tokenizers-0.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.3
    Uninstalling tokenizers-0.

In [2]:
import torch
from datasets import load_dataset
from transformers import pipeline
from transformers import AutoTokenizer
from transformers import GenerationConfig



<a name='2'></a>
## 2 - Summarize News article without Prompt Engineering

In this use case, you will generate a summary of a news article with the pre-trained Large Language Model (LLM) Zephyr-7B from Hugging Face. The list of available models in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index). 

Let's load some news articles from the [News-Sum](https://huggingface.co/datasets/glnmario/news-qa-summarization) Hugging Face dataset. This dataset contains 10,000+ news articles with corresponding manually labeled summaries, as well as questions and answers about each story. 

In [3]:
huggingface_dataset_name = "glnmario/news-qa-summarization"

dataset = load_dataset(huggingface_dataset_name, data_files="data.jsonl")

In [4]:
example_indices = [90, 270]
dash_line = '-' * 99

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT STORY:')
    print(dataset['train'][index]['story'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(dataset['train'][index]['summary'])
    print(dash_line)
    print('QUESTION:')
    print(dataset['train'][index]['questions'])
    print('ANSWER:')
    print(dataset['train'][index]['answers'])
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT STORY:
(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is very brave and is talking," Fulp said. "If she walks again, I get the first dance." The woman was on Continental Flight 511 en route from Houston, Texas, to McAllen early Saturday, a one-hour trip that had been delayed more than three hours becau

Load the [Zephyr-7B model](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) as a `text-generation` pipeline, which bundles the model together with its tokenizer.

In [5]:
model_name='HuggingFaceH4/zephyr-7b-beta'

model = pipeline("text-generation", model=model_name, torch_dtype=torch.bfloat16, device_map="auto")


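The model does not read raw text: the input is first converted into a tokenized form (encoding), and the generated token IDs are converted back into text (decoding). **Tokenization** is the process of splitting text into smaller units, called tokens, that the LLM can process. The pipeline's tokenizer is available as `model.tokenizer`. 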
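To make encoding and decoding concrete, here is a toy sketch in plain Python. It is not Zephyr's tokenizer, which uses a learned subword vocabulary; the vocabulary `VOCAB` and the functions `encode`/`decode` are hypothetical, and greedy longest-match splitting is used only for illustration.

```python
# Hypothetical toy vocabulary: index 0 is reserved for unknown characters.
VOCAB = ["<unk>", " duty", "-", "free", " shop", "s", " sell", " goods", "."]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
# Try longer pieces first so " shop" wins over "s" where both could match.
PIECES_BY_LENGTH = sorted(VOCAB[1:], key=len, reverse=True)


def encode(text: str) -> list[int]:
    """Split text into token IDs with greedy longest-match (toy scheme)."""
    ids = []
    i = 0
    while i < len(text):
        match = next((p for p in PIECES_BY_LENGTH if text.startswith(p, i)), None)
        if match is None:
            ids.append(TOKEN_TO_ID["<unk>"])  # unknown character
            i += 1
        else:
            ids.append(TOKEN_TO_ID[match])
            i += len(match)
    return ids


def decode(ids: list[int]) -> str:
    """Map token IDs back to text by concatenating their pieces."""
    return "".join(VOCAB[i] for i in ids)


text = " duty-free shops sell goods."
ids = encode(text)
print(ids)
print(decode(ids))
```

With the real Hugging Face tokenizer, the analogous calls are `model.tokenizer.encode(text)` and `model.tokenizer.decode(ids)`.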

Now it's time to explore how well the LLM summarizes a story without any prompt engineering. **Prompt engineering** is the practice of changing the **prompt** (the input) to improve the model's response for a given task.

**Things to Note**
* Use the model's own prompt template. Here this is done with the tokenizer's `apply_chat_template` method, as recommended in the model card.
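To see what `apply_chat_template` produces, the sketch below builds the same kind of string by hand. The helper `format_zephyr_prompt` is hypothetical and mimics only the format visible in the printed prompt of the next cell (a `<|role|>` header line, then the message followed by `</s>`); the trailing `<|assistant|>` line stands in for `add_generation_prompt=True` and is an assumption about the template. In real code, keep using the tokenizer's template, which is the source of truth.

```python
def format_zephyr_prompt(messages, add_generation_prompt=True, eos="</s>"):
    """Hypothetical helper: render chat messages in a Zephyr-like format.

    Each message becomes '<|role|>\\n' + content + eos + '\\n'. If
    add_generation_prompt is True, an '<|assistant|>' header is appended
    so the model continues as the assistant (assumed format).
    """
    parts = [f"<|{m['role']}|>\n{m['content']}{eos}\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a news article summarizer."},
    {"role": "user", "content": "(Budget Travel) -- For many travelers, duty-free is ..."},
]
print(format_zephyr_prompt(messages))
```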


In [21]:
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
story = dataset['train'][40]['story']
messages = [
    {
        "role": "system",
        "content": "You are a news article summarizer.",
    },
    {"role": "user", "content": f"{story}"},
]
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print('MODEL PROMPT:', prompt)
print(dash_line)
# By default, the text-generation pipeline returns the prompt followed by the generated continuation
outputs = model(prompt, max_new_tokens=60)
print('MODEL GENERATION - WITHOUT PROMPT ENGINEERING:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][40]['summary'])

MODEL PROMPT: <|system|>
You are a news article summarizer.</s>
<|user|>
(Budget Travel) -- For many travelers, duty-free is a luxurious enigma wrapped up in discounted Swiss chocolate and soaked in tax-free vodka. Duty-free goods are mostly sold inside international airport terminals, ferry stations, cruise ports, and border stops. 

Duty-free shops sell products without local import tax. 

As the name implies, duty-free shops sell products without duty (a.k.a. local import tax). For example, by buying goods in a duty-free shop at Paris's Charles de Gaulle, you avoid paying the duty that France slaps on imported goods (like Swedish vodka) and that French stores ordinarily include as part of a product's list price. 

In Europe, there's a bonus perk: Duty-free shops in airports and ports are "tax-free shops," too, which means you are spared the value added tax (or V.A.T., a type of sales tax) that would otherwise be included in the price of goods sold elsewhere in the European Union. Tha

<a name='3'></a>
## 3 - Summarize News article with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation. You can check out [this blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science for a quick introduction to prompt engineering.

<a name='3.1'></a>
### 3.1 - Zero Shot Inference with an Instruction Prompt

To instruct the model to perform a task (here, summarizing a news story) without showing it any solved examples, you can wrap the story in an instruction prompt. This is often called **zero shot inference**. You can check out [this blog from AWS](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) for a quick description of what zero shot learning is and why it is an important concept for LLMs.

Wrap the story in a descriptive instruction and see how the generated text changes:

In [None]:
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
story = dataset['train'][40]['story']
messages = [
    {
        "role": "system",
        "content": """You are a news article summarizer, 
        your role is to read the content delimited by triple backticks 
        and understand the important key points of the story and summarize them into 
        three sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section.
        <Note>:
        The summary should not contain incomplete sentences.
        """,
    },
    {"role": "user", "content": f"```{story}```"},
]
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60)
print('MODEL GENERATION - ZERO SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][40]['summary'])

<a name='4'></a>
## 4 - Summarize News article with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt you want completed. This is called "in-context learning": the examples show the model the expected format and content of the response without updating its weights. You can read more about it in [this blog from HuggingFace](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api).
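Zero, one and few shot prompts differ only in how many solved examples precede the query. The hypothetical helper `build_k_shot_prompt` below (plain text, no chat template) makes that explicit; `make_messages` in the next section applies the same idea with chat messages.

```python
def build_k_shot_prompt(instruction, examples, query):
    """Hypothetical helper: instruction, then k solved examples, then the query.

    k = len(examples): 0 gives zero shot, 1 gives one shot, >1 gives few shot.
    The prompt ends with 'Summary:' so the model's continuation is the summary.
    """
    lines = [instruction, ""]
    for story, summary in examples:
        lines += [f"Story: {story}", f"Summary: {summary}", ""]
    lines += [f"Story: {query}", "Summary:"]
    return "\n".join(lines)


instruction = "Summarize the story in three complete sentences."
zero_shot = build_k_shot_prompt(instruction, [], "A flight hit severe turbulence ...")
few_shot = build_k_shot_prompt(
    instruction,
    [("Story one ...", "Summary one."), ("Story two ...", "Summary two.")],
    "A flight hit severe turbulence ...",
)
print(few_shot)
```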

<a name='4.1'></a>
### 4.1 - One Shot Inference

Let's build a function that takes a list of `example_indices_full`, generates a prompt with full examples, then at the end appends the story you want the model to summarize (`example_index_to_summarize`). The function reuses the instruction prompt style from section [3.1](#3.1), and the messages are formatted with the Zephyr chat template via `apply_chat_template`.

In [34]:
def make_messages(example_indices_full, example_index_to_summarize):
    """Build chat messages for one shot / few shot summarization.

    The system message holds the instruction plus one story/summary pair per
    index in `example_indices_full`; the user message holds the story to summarize.
    """
    content = """You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        """
    # Append each solved example (story + human-written summary) to the system prompt
    for index in example_indices_full:
        story = dataset['train'][index]['story']
        summary = dataset['train'][index]['summary']
        content += f"""
        Story: 
        ```{story}```
        Summary:
        {summary}
        """
    # The story the model should actually summarize goes in the user turn
    story_to_summarize = dataset['train'][example_index_to_summarize]['story']
    messages = [
        {"role": "system", "content": content},
        {"role": "user", "content": f"```{story_to_summarize}```"},
    ]
    return messages

In [35]:
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = make_messages(example_indices_full= [40], example_index_to_summarize= 270)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60)
print('MODEL GENERATION - ONE SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][270]['summary'])

MODEL GENERATION - ONE SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(Budget Travel) -- For many travelers, duty-free is a luxurious enigma wrapped up in discounted Swiss chocolate and soaked in tax-free vodka. Duty-free goods are mostly sold inside international airport terminals, ferry stations, cruise ports, and border stops. 

Duty-free shops sell products without local import tax. 

As the name implies, duty-free shops sell products without duty (a.k.a. local import tax). For example, by buying goods in a duty-free shop at Paris's Charles de Gaulle, you avoid paying the duty that France

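**The model still seems to struggle to complete the summary.** One possible factor is the `max_new_tokens=60` cap, which can stop generation before a three-sentence summary is finished.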

<a name='4.2'></a>
### 4.2 - Few Shot Inference

Let's explore few shot inference by using three full story-summary pairs in the prompt instead of one.

In [36]:
example_indices_full = [90, 80, 120]
example_index_to_summarize = 270

messages = make_messages(example_indices_full= example_indices_full, 
                         example_index_to_summarize= example_index_to_summarize)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60)
print('MODEL GENERATION - FEW SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][example_index_to_summarize]['summary'])

MODEL GENERATION - FEW SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is ve

**With more examples, the model seems to understand the task better and completes the summary more fully.**

<a name='5'></a>
## 5 - Generative Configuration Parameters for Inference

You can change the generation parameters to get a different output from the LLM; the pipeline passes them on to the model's `generate()` method. So far the only parameter you have set is `max_new_tokens=60`, which defines the maximum number of new tokens to generate. A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig). 
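The sketch below shows, in simplified form, what a token budget such as `max_new_tokens` means for greedy decoding: the loop stops either when the model emits an end-of-sequence token or when the budget is used up, and in the second case the text can end mid-sentence. The next-token table `NEXT` is a hypothetical stand-in for the LLM, and this is not how `generate()` is implemented internally.

```python
# Hypothetical toy "model": maps the last token to next-token probabilities.
NEXT = {
    "<s>": {"The": 0.9, "A": 0.1},
    "The": {"flight": 0.7, "woman": 0.3},
    "flight": {"was": 0.8, "landed": 0.2},
    "was": {"delayed": 0.6, "late": 0.4},
    "delayed": {".": 0.9, "again": 0.1},
    ".": {"</s>": 1.0},
}


def greedy_generate(max_new_tokens, eos="</s>"):
    """Append the most probable next token until EOS or the token budget runs out."""
    tokens = ["<s>"]
    for _ in range(max_new_tokens):
        probs = NEXT[tokens[-1]]
        next_token = max(probs, key=probs.get)  # greedy: pick the most probable token
        if next_token == eos:
            break  # the model chose to stop on its own
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


print(greedy_generate(10))  # stops at EOS after a full sentence
print(greedy_generate(3))   # budget exhausted: the sentence is cut off
```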

A convenient way of organizing the configuration parameters is the `GenerationConfig` class. 

In this notebook, the parameters are passed directly as keyword arguments to the pipeline call instead.

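Setting `do_sample=True` makes the model sample the next token from its predicted probability distribution instead of always picking the most probable token (greedy decoding), which gives more variety in the output. The `temperature` value controls how random that sampling is: values below 1 make the distribution sharper, and values above 1 make it flatter.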
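Here is a minimal sketch of temperature scaling, assuming the model's raw scores (logits) for three candidate tokens are `[2.0, 1.0, 0.1]` (made-up values). Dividing the logits by the temperature before the softmax sharpens the distribution for values below 1 and flattens it for values above 1; sampling then draws the next token from that distribution. The function names are illustrative, not library APIs.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw one token index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)  # seeded so repeated runs draw the same tokens
print([sample(softmax_with_temperature(logits, 0.5), rng) for _ in range(10)])
```

Note that temperature never changes which token is most probable, so it has no effect on greedy decoding; it only matters once `do_sample=True`.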

In [37]:
example_indices_full = [90, 80, 120]
example_index_to_summarize = 270

messages = make_messages(example_indices_full= example_indices_full, 
                         example_index_to_summarize= example_index_to_summarize)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60, do_sample=True)
print('MODEL GENERATION - FEW SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][example_index_to_summarize]['summary'])

MODEL GENERATION - FEW SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is ve

In [38]:
example_indices_full = [90, 80, 120]
example_index_to_summarize = 270

messages = make_messages(example_indices_full= example_indices_full, 
                         example_index_to_summarize= example_index_to_summarize)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60, do_sample=True, temperature=0.5)
print('MODEL GENERATION - FEW SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][example_index_to_summarize]['summary'])



MODEL GENERATION - FEW SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is ve

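**top_k and top_p**
* Top-k restricts the model to choosing the next token only from the k most probable tokens in the model's softmax output.
* Top-p (nucleus sampling) restricts the model to choosing the next token only from the smallest set of most probable tokens whose cumulative probability reaches p.
* Both settings only take effect when `do_sample=True`.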
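The two filters can be sketched on a plain probability list (made-up values, hypothetical helper names). Each keeps a subset of tokens and renormalizes before sampling; the library applies comparable steps to logits during `generate()`, so treat this as an illustration rather than its implementation.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))    # only the two most probable tokens survive
print(top_p_filter(probs, 0.9))  # 0.5 + 0.3 + 0.15 reaches 0.9, so three survive
```

When `top_k` and `top_p` are combined, as in the next cell, a token has to survive both filters to be sampled.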


In [39]:
example_indices_full = [90, 80, 120]
example_index_to_summarize = 270

messages = make_messages(example_indices_full= example_indices_full, 
                         example_index_to_summarize= example_index_to_summarize)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60, do_sample=True, temperature=0.5, top_k=50, top_p=0.95)
print('MODEL GENERATION - FEW SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][example_index_to_summarize]['summary'])

MODEL GENERATION - FEW SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is ve

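**In this run, adding `top_p=0.95` seems to confuse the model's output, and the summary is not better.** Because sampling is random, a single run is not conclusive; calling `transformers.set_seed` before each generation makes such comparisons reproducible.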

In [41]:
example_indices_full = [90, 80, 120]
example_index_to_summarize = 270

messages = make_messages(example_indices_full= example_indices_full, 
                         example_index_to_summarize= example_index_to_summarize)
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model(prompt, max_new_tokens=60, do_sample=True, temperature=0.5, top_k=50)
print('MODEL GENERATION - FEW SHOT INFERENCE:')
print(outputs[0]["generated_text"])
print(dash_line)
print("BASELINE HUMAN SUMMARY:")
print(dataset['train'][example_index_to_summarize]['summary'])

MODEL GENERATION - FEW SHOT INFERENCE:
<|system|>
You are a news article summarizer, 
        your role is to understand the content delimited by triple backticks 
        and summarize into three complete sentences such that the complete gist of the story is delivered. 
        You need to follow the instructions under <Note> section. 
        <Note>:
        The summary should not contain incomplete sentences.
        <Example>:
        
        Story: 
        ```(CNN)  -- A 47-year-old woman who became paralyzed after breaking her neck and back on a turbulent flight is developing some motion in her toes and regaining some sensation after two operations, her doctor said Wednesday. Dr. Trey Fulp, an orthopedic spine surgeon who performed the surgeries at McAllen Medical Center in McAllen, Texas, told CNN that the woman initially was paralyzed from the chest down. She underwent six hours of surgery Saturday and a more than five-hour operation late Tuesday, the surgeon said. "She is ve

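**Without `top_p`, the model samples from the 50 most probable tokens, weighted by their probabilities, and in this run the summary contains more information than the previous one.** Note that 50 is also the default value of `top_k` in `GenerationConfig`, so unless the model's generation config overrides it, this call is likely equivalent to the temperature-only run above, and the difference between the two outputs would then come from sampling randomness.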

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations. 