<a name='1'></a>
## 1 - Set up Kernel and Required Dependencies

Install the required packages to use PyTorch and Hugging Face transformers and datasets.



In [14]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==2.0.1 \
    torchdata==0.6.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.14.4  --quiet

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Note: you may need to restart the kernel to use updated packages.

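The warning above suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of doing that from inside the notebook, assuming the cell runs before any tokenizer is used:

```python
import os

# Set this before transformers/tokenizers are first used, so the value is
# already in place if the process is forked later (for example by %pip).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```

Choosing `"false"` trades some tokenization speed for the absence of the fork warning; `"true"` is the other value the warning names.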


Import the dataset utilities and the helper modules that build the prompts and load the Large Language Model (LLM) together with its tokenizer and configuration.

In [2]:
from datasets import load_dataset
from create_hed_prompts import *
import llm
import datasets

<a name='2'></a>
## 2 - Load dataset and models
We experiment with pre-trained Large Language Models (LLMs) from Hugging Face (the list of models available in the Hugging Face `transformers` package can be found [here](https://huggingface.co/docs/transformers/index)).
The examples come from a dataset constructed from https://github.com/dungscout96/HED-LLM/blob/main/examples.tsv
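
The helper `create_hugging_dataset()` is defined in `create_hed_prompts` and is not shown in this notebook. As a rough sketch of the idea, and assuming the TSV has one `HED` column and one `description` column (an assumption based on the dataset features printed below, not on the actual file), rows could be read with the standard library like this:

```python
import csv
import io

# Hypothetical two-row sample standing in for examples.tsv; the real file's
# header names and column order are assumptions here.
sample_tsv = (
    "HED\tdescription\n"
    "(Cross,White)\tThere is a white cross.\n"
    "(Background-view,Black)\tThe background is black.\n"
)


def read_hed_rows(handle):
    """Read tab-separated rows into a list of {'HED': ..., 'description': ...} dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [{"HED": row["HED"], "description": row["description"]} for row in reader]


rows = read_hed_rows(io.StringIO(sample_tsv))
print(len(rows), rows[0]["HED"])
```

A list of dictionaries like `rows` could then be turned into a Hugging Face dataset, for example with `datasets.Dataset.from_list(rows)`; whether the notebook's helper does this is not something the source confirms.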

In [23]:
dataset = create_hugging_dataset()
print(dataset)

Dataset({
    features: ['HED', 'description'],
    num_rows: 2
})


Print one or more HED annotations together with their baseline human descriptions.

In [24]:
example_indices = [0]

dash_line = '-' * 99  # separator line, same length as the original join-based expression

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT HED:')
    print(dataset[index]['HED'])
    print(dash_line)
    print('Description:')
    print(dataset[index]['description'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT HED:
(Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))
---------------------------------------------------------------------------------------------------
Description:
The visual presentation has a black background view. In its foreground view, the center is associated with a computer screen and there's a white cross. There's also a grayscale element that includes features like a face, hair, and an image.
---------------------------------------------------------------------------------------------------
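
To make the grouping in the HED string above easier to see, here is a small sketch that parses the parenthesized annotation into nested Python lists. The function name `parse_hed` is hypothetical and not part of the notebook's helper modules; it only assumes that tags are separated by commas and grouped by balanced parentheses.

```python
def parse_hed(annotation):
    """Parse a parenthesized, comma-separated HED string into nested lists."""
    root = []
    stack = [root]   # stack[-1] is the group currently being filled
    token = []       # characters of the tag currently being read

    def flush():
        # Store the tag read so far (if any) in the current group.
        text = "".join(token).strip()
        if text:
            stack[-1].append(text)
        token.clear()

    for ch in annotation:
        if ch == "(":
            flush()
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif ch == ")":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        elif ch == ",":
            flush()
        else:
            token.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


hed_example = (
    "(Visual-presentation,(Background-view,Black),"
    "(Foreground-view,((Center-of,Computer-screen),(Cross,White)),"
    "(Grayscale,(Face,Hair,Image))))"
)
tree = parse_hed(hed_example)
print(tree[0][0])   # the first tag of the outermost group
print(tree[0][1])   # the background group
```

Each nested list corresponds to one pair of parentheses, which is the association that the instruction prompts in section 3 ask the model to express in sentences.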



Load a model by selecting its index in the list of choices below.

In [25]:
model_choices = ['FLANT5', 'GPT2', 'BART']
choice_idx = 1
model = llm.create_model(model_choices[choice_idx])
print(model.description)


OpenAI GPT-2 model was proposed in Language Models are Unsupervised Multitask Learners by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever from OpenAI.
It’s a causal (unidirectional) transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
    


<a name='3'></a>
## 3 - HED translation with an Instruction Prompt

Prompt engineering is an important concept in using foundation models for text generation. You can check out [this blog](https://www.amazon.science/blog/emnlp-prompt-engineering-is-the-new-feature-engineering) from Amazon Science for a quick introduction to prompt engineering.

<a name='3.1'></a>
### 3.1 - Zero Shot Inference with an Instruction Prompt

To instruct the model to perform a task (here, translating a HED annotation into plain sentences), you can wrap the annotation in an instruction prompt. Because the prompt contains no worked examples, this is often called **zero shot inference**. You can check out [this blog from AWS](https://aws.amazon.com/blogs/machine-learning/zero-shot-prompting-for-the-flan-t5-foundation-model-in-amazon-sagemaker-jumpstart/) for a quick description of what zero shot learning is and why it is an important concept for LLMs.

Choose one of the instructions below or create your own.

In [6]:
instructions = create_instructions()

Option 0:
	Instruction: "Translate the following tagging into sentences assuming that parentheses mean association:"
	Query: "Translation:"


Option 1:
	Instruction: "The following tagging give short hand annotations with parentheses grouping related concepts together:"
	Query: "Convert the tagging into full sentences:"




In [28]:
# Choose from the list of options above by providing the selection index
instruct_idx = 1
instruction, query = instructions[instruct_idx]

# Or override the selection with your own inputs (comment out the two lines below to use the option chosen above)
instruction = "Convert the following HED tags into sentences with correct grammar, no repeating of the input."
query = "Sentences:"

print(f'''Instruction: "{instruction}"''')
print(f'''Query: "{query}"''')

Instruction: "Convert the following HED tags into sentences with correct grammar, no repeating of the input."
Query: "Sentences:"


In [29]:
hed = dataset[0]['HED']
desc = dataset[0]['description']

prompt = f"""
{instruction}

{hed}

{query}
"""

# Pass the constructed prompt to the model.
output = model.decode(prompt)

print(dash_line)
print('Example ', 1)  # dataset[0] is the first example; avoids reusing the loop variable `i` from an earlier cell
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN DESCRIPTION:\n{desc}')
print(dash_line)    
print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')
    

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input 
Convert the following HED tags into sentences with correct grammar, no repeating of the input.

(Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))

Sentences:

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Convert the following HED tags into sentences with correct grammar, no repeating of the input.

(Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))

Sentences:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN DESCRIPTION:
The visual presentation has a black background view. In its foreground view, the center is associated with a computer screen and there's a white 

<a name='4'></a>
## 4 - HED Translation with One Shot and Few Shot Inference

**One shot and few shot inference** are the practices of providing an LLM with one or more full examples of prompt-response pairs that match your task, placed before the actual prompt that you want completed. This is called "in-context learning": the examples in the prompt show the model the task and the expected output format without changing its weights. You can read more about it in [this blog from HuggingFace](https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api).

<a name='4.1'></a>
### 4.1 - One Shot Inference

We use a function, `make_prompt`, that takes a list of `example_indices_full`, generates a prompt with those full examples, and then appends the prompt that you want the model to complete (`example_index_to_translate`).
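
The real `make_prompt` comes from `create_hed_prompts` and is not shown in this notebook. The following is a minimal sketch of how such a function could work, assuming it follows the layout of the prompts printed in this notebook (instruction, HED string, query, then the reference description for solved examples). The name `make_prompt_sketch` and the toy rows are hypothetical.

```python
def make_prompt_sketch(dataset, example_indices_full, example_index_to_translate,
                       instruction, query):
    """Build a one/few shot prompt: solved examples first, then the open query."""
    prompt = ""
    for index in example_indices_full:
        hed = dataset[index]["HED"]
        description = dataset[index]["description"]
        # A complete example: instruction, tags, query, and the reference answer.
        prompt += f"\n{instruction}\n\n{hed}\n\n{query}\n{description}\n\n\n"
    # The final block stops at the query so the model continues from there.
    hed = dataset[example_index_to_translate]["HED"]
    prompt += f"\n{instruction}\n\n{hed}\n\n{query}\n"
    return prompt


# Toy stand-in for the Hugging Face dataset (hypothetical rows).
toy_dataset = [
    {"HED": "(Cross,White)", "description": "There is a white cross."},
    {"HED": "(Background-view,Black)", "description": "The background is black."},
]

toy_prompt = make_prompt_sketch(
    toy_dataset,
    example_indices_full=[0],
    example_index_to_translate=1,
    instruction="Convert the following HED tags into sentences.",
    query="Sentences:",
)
print(toy_prompt)
```

Passing one index in `example_indices_full` gives a one shot prompt; passing several gives a few shot prompt. The description of the example to translate is deliberately left out, since that is what the model is asked to produce.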

In [30]:
example_indices_full = [0]
example_index_to_translate = 1
one_shot_prompt = make_prompt(dataset, example_indices_full, example_index_to_translate, instruction, query)
desc = dataset[example_index_to_translate]['description']

output = model.decode(one_shot_prompt)

print(dash_line)
print(f'BASELINE HUMAN DESCRIPTION:\n{desc}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Input 
Convert the following HED tags into sentences with correct grammar, no repeating of the input.

(Visual-presentation,(Background-view,Black),(Foreground-view,((Center-of,Computer-screen),(Cross,White)),(Grayscale,(Face,Hair,Image))))

Sentences:
The visual presentation has a black background view. In its foreground view, the center is associated with a computer screen and there's a white cross. There's also a grayscale element that includes features like a face, hair, and an image.



Convert the following HED tags into sentences with correct grammar, no repeating of the input.

(Foreground-view, ((Item-count, High), Ingestible-object)), (Background-view, ((Human, Body, Agent-trait/Adult), Outdoors, Furnishing, Natural-feature/Sky, Urban, Man-made-object))

Sentences:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN DESCRIPTION:
<Example contributing description>

-------------------------------------------------

<a name='4.2'></a>
### 4.2 - Few Shot Inference

We explore few shot inference by adding two more full HED-description pairs to the prompt.

In [None]:
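# These indices assume a dataset with more than 200 rows. The dataset printed
# in section 2 has only 2 rows, so adjust them to the rows actually available.
example_indices_full = [40, 80, 120]
example_index_to_translate = 200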

few_shot_prompt = make_prompt(dataset, example_indices_full, example_index_to_translate, instruction, query)

print(few_shot_prompt)

Now pass this prompt to perform a few shot inference:

In [None]:
desc = dataset[example_index_to_translate]['description']

output = model.decode(few_shot_prompt)

print(dash_line)
print(f'BASELINE HUMAN DESCRIPTION:\n{desc}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

<a name='5'></a>
## 5 - Generative Configuration Parameters for Inference

You can change the configuration parameters of the `generate()` method to see a different output from the LLM. So far the only parameter that you have been setting was `max_new_tokens=50`, which defines the maximum number of tokens to generate. A full list of available parameters can be found in the [Hugging Face Generation documentation](https://huggingface.co/docs/transformers/v4.29.1/en/main_classes/text_generation#transformers.GenerationConfig). 

A convenient way of organizing the configuration parameters is to use the `GenerationConfig` class.

**Exercise:**

Change the configuration parameters to investigate their influence on the output. 

Setting `do_sample=True` enables sampling-based decoding: the next token is drawn from the probability distribution over the vocabulary instead of always being the most likely token. You can then adjust the outputs by changing `temperature` and other parameters (such as `top_k` and `top_p`).
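
To build intuition for `temperature` and `top_k`, here is a toy sketch in plain Python. It is not how `transformers` implements sampling internally; the vocabulary and the scores are made up for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, top_k=None, rng=random):
    """Sample one token, optionally restricted to the top_k highest-scoring ones."""
    ranked = sorted(zip(tokens, logits), key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    names = [name for name, _ in ranked]
    probs = softmax_with_temperature([score for _, score in ranked], temperature)
    return rng.choices(names, weights=probs, k=1)[0]


tokens = ["cross", "circle", "face", "house"]  # hypothetical vocabulary
logits = [2.0, 1.0, 0.5, -1.0]                 # hypothetical model scores

# Lower temperatures concentrate probability on the highest-scoring token.
for temperature in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

# With top_k=2 only the two highest-scoring tokens can be drawn.
print(sample_token(tokens, logits, temperature=0.5, top_k=2, rng=random.Random(0)))
```

At a temperature close to 0 the distribution is almost entirely on the top token, so sampling behaves much like greedy decoding; at higher temperatures the lower-scoring tokens are drawn more often.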

Uncomment the lines in the cell below and rerun the code. Try to analyze the results. You can read some comments below.

In [None]:
from transformers import GenerationConfig

generation_config = GenerationConfig(max_new_tokens=50)
# generation_config = GenerationConfig(max_new_tokens=10)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.1)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=0.5)
# generation_config = GenerationConfig(max_new_tokens=50, do_sample=True, temperature=1.0)

output = model.decode(one_shot_prompt, generation_config)

# one_shot_prompt asks the model to translate dataset[1] (see section 4.1),
# so that row's description is the matching baseline.
desc = dataset[1]['description']

print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')
print(dash_line)
print(f'BASELINE HUMAN DESCRIPTION:\n{desc}\n')

Comments related to the choice of the parameters in the code cell above:
- Choosing `max_new_tokens=10` makes the output too short, so the generated description is cut off.
- Setting `do_sample=True` and raising the temperature makes the output more varied; lower temperatures keep it closer to the most likely tokens.

As you can see, prompt engineering can take you a long way for this use case, but there are some limitations. Next, you will start to explore how you can use fine-tuning to help your LLM to understand a particular use case in better depth!