<a href="https://colab.research.google.com/github/DeKUT-DSAIL/DSA-2024-NLP/blob/main/main-lab/prompt_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## DSA 2024 - NLP Lab Session

### Part 3: Prompt Engineering with LLaMA-2

Prompt engineering is the discipline of developing and optimising prompts so that large language models (LLMs) can be used effectively to obtain the desired outputs across a wide variety of applications, including research. Developing prompt engineering skills also helps us better understand the capabilities and limitations of these LLMs.

The key aspects of prompt engineering include, but are not limited to:
* Crafting clear prompts: The model's output is significantly affected by the wording of the prompt. To get accurate and relevant responses, prompts should be clear, concise, and specific.
* Providing context: Prompts that include sufficient context within them help the models understand the background and generate more informed responses. Contexts can involve giving background information, setting the scene, or specifying the desired format of the answer.
* Iterative refinement: Prompt engineering is often an iterative process where initial prompts are continuously adjusted and refined to improve the quality of the response.
* Instruction precision: Explicitly stating what you want from the model can dramatically improve outcomes. Using words like "list", "describe", etc. helps guide the model more effectively.
* Balancing length and detail: Although detailed prompts can provide more guidance, overly long prompts tend to confuse the model. Striking a balance between providing enough details and maintaining brevity is important.
* Leveraging special tokens: Some models allow the use of special tokens or specific structures to control responses, such as separators or format indicators. `LLaMA-2` is one such model.
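
Several of the aspects above (context, a precise instruction, a desired format, and a closing cue) can be combined mechanically. The following is a minimal sketch, not part of the original lab: `build_prompt` and its parameter names are hypothetical, and the `Context:` and `Format:` prefixes are just one possible convention.

```python
def build_prompt(instruction, context=None, output_format=None, cue=None):
    """Assemble a prompt from optional parts, skipping any that are missing.

    Hypothetical helper for illustration only.
    """
    parts = []
    if context:
        # Background information comes first so the instruction can rely on it.
        parts.append(f"Context: {context.strip()}")
    parts.append(instruction.strip())
    if output_format:
        parts.append(f"Format: {output_format.strip()}")
    prompt = "\n\n".join(parts)
    if cue:
        # The cue goes last so the model continues right after it.
        prompt += f"\n{cue.strip()}"
    return prompt


demo_prompt = build_prompt(
    instruction="List three safety checks to do before riding a bicycle.",
    context="The reader is a first-time commuter cyclist.",
    output_format="A numbered list, one short sentence per item.",
    cue="Checks:",
)
print(demo_prompt)
```

Keeping each part optional makes it easy to add or remove one element at a time while refining a prompt iteratively.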

#### Prerequisites
This lab is targeted at Python developers who have some familiarity with LLMs, such as by using ChatGPT or Gemini, but have limited experience in working with LLMs in a programmatic way.

If you're familiar with the underpinnings of LLMs, you'll have a slight advantage. However, familiarity with basic Python and a basic understanding of LLMs will be sufficient to help you get a lot out of this course.

For this lab, we shall be working with the `LLaMA-2` model available at [HuggingFace](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ). In order to download this model for use in this notebook, you will need to install the [transformers](https://pypi.org/project/transformers/) package. Not to worry, the steps for installing it are baked into this Colab notebook and you will not need to take any extra steps, except if you choose to run the notebook locally.

**Note:** If you have trouble getting the code to work, or if it is taking too long, consider copy-pasting the prompts into the ChatGPT web interface from OpenAI - [link here](https://chat.openai.com). You can create a free account or log in with your Google credentials.

#### Learning Objectives
1. Use a `transformers` pipeline to generate responses from a LLaMA-2 LLM.
2. Iteratively write precise prompts to get the desired output from the LLM.
3. Work with the LLaMA-2 prompt template, which is used during instruction fine-tuning, to perform few-shot learning.
4. Use LLaMA-2 to generate JSON data for potential use in downstream processing tasks.


Let's get cracking.

In [None]:
# Run this cell to prevent code cell outputs from overflowing.
from IPython.display import HTML, display

def set_css():
    display(HTML('''
        <style>
            pre {
                white-space: pre-wrap;
            }
        </style>
    '''))

get_ipython().events.register('pre_run_cell', set_css)

First, let's suppress the warnings to keep the cell outputs nice and clean.

In [None]:
import warnings
warnings.filterwarnings("ignore")

def warn(*args, **kwargs):
    pass
warnings.warn = warn

### Setup

Let's set up our environment by installing the `transformers` package along with the other packages it needs here, and then importing them.

Remember to change the runtime type on Colab to GPU by following these steps:
* On the menu bar, click on `Runtime`.
* Click `change runtime type`.
* In the `Hardware accelerator` radio options, select `T4 GPU`.
* Click `save`.



In [None]:
! pip install transformers
! pip install accelerate
! pip install optimum
! pip install auto-gptq

In [None]:
import transformers
import auto_gptq
import accelerate
import optimum



In [None]:
import json
from transformers import pipeline
# import accelerate

## 1. Task One: Iterative Prompt Refinement
### Create LLaMA-2 Pipeline

In [None]:
model = "TheBloke/Llama-2-13B-chat-GPTQ"
llama_pipe = pipeline("text-generation", model=model, device_map="auto")

#### A Helper Function

In [None]:
def generate(prompt, max_length=1024, pipe=llama_pipe, **kwargs):
    """
    This function takes a prompt, passes it to the model (e.g. LLaMA-2), and returns the model's response.

    Parameters:
    @param prompt (str): The input text prompt to generate a response for.
    @param max_length (int): The maximum total length, in tokens, of the prompt plus the generated response.
    @param pipe (callable): The model's pipeline function used for generation.
    @param **kwargs: Additional keyword arguments that are passed to the pipeline function.

    Returns:
    str: The generated text response from the model, trimmed of leading and trailing whitespace.

    Example usage:
    ```
    prompt_text = "Explain the theory of relativity."
    response = generate(prompt_text, max_length=512, pipe=my_custom_pipeline, temperature=0.7)
    print(response)
    ```
    """

    def_kwargs = dict(return_full_text=False, return_dict=False)
    response = pipe(prompt.strip(), max_length=max_length,  truncation=True, **kwargs, **def_kwargs)
    return response[0]['generated_text'].strip()

**First prompt: Capital of California**

We shall begin with a very simple prompt and pass it to the `LLaMA-2` model. The desired outcome is that the model responds to us with only the name of the capital of California, which is *Sacramento*, with nothing else in the response.

In [None]:
prompt = "What is the capital of California?"

print(generate(prompt))

Answer: The capital of California is Sacramento.


The model responds with more than just the name of the city. We are not interested in this additional text. We want the name of the city and nothing else, so let's craft a prompt that is more **specific**.

In [None]:
prompt = "What is the capital of California? Only answer this question and do so in as few a words as possible."

print(generate(prompt))

Answer: Sacramento


That was a better response, but we still get a leading `Answer:` in the response. We can prevent this behaviour by ending the prompt with the **cue** `Answer:`. Because the cue is already in the prompt, the model has no need to produce that text itself.

In [None]:
prompt = "What is the capital of California? Only answer this question and do so in as few a words as possible. Answer: "

print(generate(prompt))

Sacramento.
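
Adding the cue to the prompt is one way to avoid a leading `Answer:`. A complementary safeguard is to strip an echoed cue in post-processing, in case the model produces it anyway. This is a minimal sketch; `strip_cue` is a hypothetical helper, not something the lab or `transformers` provides.

```python
def strip_cue(response, cue="Answer:"):
    """Remove a leading cue (case-insensitive) from a model response, if present."""
    text = response.strip()
    if text.lower().startswith(cue.lower()):
        # Drop the cue itself and any whitespace that follows it.
        text = text[len(cue):].strip()
    return text


print(strip_cue("Answer: Sacramento"))  # prints "Sacramento"
print(strip_cue("Sacramento."))         # prints "Sacramento."
```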


**Second prompt: Vowels in 'Sacramento'**

In this part of the notebook, we ask the model to do something more complicated, i.e. tell us the vowels found in the name of the capital of California.

We know the correct answer is S**a**cr**a**m**e**nt**o** -> **aaeo** -> **aeo**. Notice that, in order to arrive at the correct answer, you probably had to perform the task in multiple steps.

In [None]:
prompt = "Tell me the vowels in the capital of California."

print(generate(prompt))

Answer: The vowels in the capital of California are "A" and "E".


When LLMs are faced with the need to reason in a way that requires multiple steps, it is often helpful to craft a prompt instructing the model to perform multiple intermediary steps, like asking it to show its working. This technique is often described as giving the model **"time to think"**.

Let's now craft a new prompt asking the model to take the intermediate step of identifying the capital of California before identifying the vowels in it.

In [None]:
prompt = "Tell me the capital of California, and then tell me all the vowels in it."

print(generate(prompt))

I'll start: the capital of California is Sacramento.

Now, the vowels in Sacramento are a, e, and o.


We can see that giving the model **time to think** made a big difference. Armed with this new technique, let's ask the model to do something more complicated: tell us the vowels in the name of the capital of California in reverse alphabetical order.

The correct answer is S**a**cr**a**m**e**nt**o** -> **aeo** -> **oea**
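
Both reference answers used in this part can be computed directly in Python, which is handy when checking model responses. The sketch below is illustrative only; `unique_vowels` is a hypothetical helper and treats only `a`, `e`, `i`, `o`, `u` as vowels.

```python
def unique_vowels(word):
    """Return the distinct vowels in `word`, in order of first appearance."""
    seen = []
    for ch in word.lower():
        if ch in "aeiou" and ch not in seen:
            seen.append(ch)
    return seen


vowels = unique_vowels("Sacramento")
print(vowels)                        # ['a', 'e', 'o']
print(sorted(vowels, reverse=True))  # ['o', 'e', 'a']
```

Notice that the code mirrors the intermediate steps we ask the model to take: find the vowels, remove duplicates, then reorder.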

In [None]:
prompt = "Tell me the vowels in the capital of California in reverse alphabetical order?"

print(generate(prompt))

I'm thinking... uh... oh, I know this one! The vowels in the capital of California in reverse alphabetical order are... e-a-o-u!


Obviously, we did not give the model **time to think**. Let's ask it to break the task down to intermediate steps and show its work.

In [None]:
prompt = "Tell me the capital of California, and then tell me all the vowels in it, then tell me the vowels in reverse-alphabetical order."

print(generate(prompt))

I'm not sure if you're aware, but the capital of California is Sacramento.

Now, let's get to the vowels. The vowels in Sacramento are A, E, and O.

Now, let's put them in reverse alphabetical order. The reverse alphabetical order of the vowels in Sacramento is O, E, and A.


## 2. Task Two: Using the LLaMA-2 Prompt Template
In this section, we shall use the LLaMA-2 model to perform sentiment analysis on textual reviews and generate data for downstream tasks. You will learn how to use **few-shot learning** to improve the accuracy of the model output by providing it with instructive examples.

### Data - Nyota Ndogo Bike Reviews

The following are customer reviews for the Nyota Ndogo bicycle offered by a fictitious bicycle company known as Nyota Bicycles. We shall be asking the LLM to perform **sentiment analysis** on these reviews as well as to generate data for downstream use.

In [None]:
review = """
I recently purchased the Nyota Ndogo from Nyota Bicycles, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.
"""

In [None]:
review_negative = """
Got the Nyota Ndogo last week, and I'm a bit disappointed. \
The brakes are not as responsive as I'd like and the gears often get stuck. \
The design is good but performance-wise, it leaves much to be desired.
"""

In [None]:
review_positive = """
I recently purchased the Nyota Ndogo from Nyota Bicycles, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
The seat was very comfortable for longer rides and the color options were great. \
The build quality and the performance of the bike are commendable. It's a good value for the money.
"""

#### Sentiment Analysis

We will ask the model to perform sentiment analysis by telling us the overall sentiment of one of the reviews. The model should respond with a single word: either `positive`, `negative`, or `neutral`.

**NOTE:** The following code cell takes far longer to run (~20 mins) than we have time for in this lab. Leave it commented out for now; you can run it later in your own time. For now, just know that the model's output will be a bunch of **junk**.

In [23]:
# prompt = f"""
# What is the overall sentiment of {review}
# """

# print(generate(prompt))

It's not at all clear why the model gave us such junk in response to the prompt above, but this is exactly why we work on prompts iteratively and why **precision** matters when crafting them. Let's add a `?` to the end of the prompt, which should make it clearer to the model that we are asking a question we hope to get an answer to.

In [24]:
# this cell will take a little over 2 minutes to run
prompt = f"""
What is the overall sentiment of {review}?
"""

print(generate(prompt))

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


The overall sentiment of the text is positive. The reviewer mentions that they are "thoroughly impressed" with the bike and that it handles urban terrains with ease. However, they also mention a few minor drawbacks, such as the seat being uncomfortable for longer rides and the color options being limited. Despite these issues, the reviewer concludes that the bike is a good value for the money.


As you can see, the `?` made a huge difference! This example reminds us that minor tweaks to prompts can sometimes lead to drastic changes in the model's responses.

Given that our goal here is to get a single word response back from the model, let's make the prompt more **specific** about the response we would like.

In [25]:
# this cell will take a little over 2 minutes to run
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "positive", "negative", or "neutral".
"""

print(generate(prompt))

This review is positive. The reviewer mentions that the ride is smooth, the bike handles urban terrains with ease, and the build quality is commendable. They also mention that the bike is a good value for the money. However, they do mention that the seat can be uncomfortable for longer rides and that the color options could be better. Overall, the review is positive because the reviewer has more positive comments than negative ones.


Although the model's response is helpful, it is more than the single-word response we are aiming for. Let's add a **cue** to the prompt as we have done before and check its response.

In [26]:
# adding the cue made the model faster at giving its output
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "positive", "negative", or "neutral".
Overall Sentiment:
"""

print(generate(prompt))

Positive


That's much better. But we expect this review to be categorised as `neutral` rather than `positive`. Here is the review again for reference.

In [27]:
review = """
I recently purchased the Nyota Ndogo from Nyota Bicycles, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.
"""

Let's make a minor change to the prompt that will lead to a drastic change in the model's response. We do so by changing the order of the options that the model has to choose from.

In [28]:
# this cell will take a little over 1 minute to run
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "neutral", "negative", or "positive".
Overall Sentiment:
"""

print(generate(prompt))

Neutral. The reviewer has mixed feelings about the bike. They like the smooth ride and the bike's ability to handle urban terrain, but they found the seat uncomfortable and the color options limited.
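
Responses such as `Positive` and `Neutral. The reviewer has mixed feelings...` need to be reduced to a single label before they can be compared or used downstream. The following is a minimal sketch under the assumption that the first label mentioned is the model's verdict; `extract_label` is a hypothetical helper and this heuristic can be wrong for responses that discuss several labels.

```python
import re

LABELS = ("negative", "neutral", "positive")


def extract_label(response, labels=LABELS):
    """Return the first label that appears as a whole word in the response, or None."""
    pattern = r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b"
    match = re.search(pattern, response.lower())
    return match.group(1) if match else None


print(extract_label("Positive"))                                          # positive
print(extract_label("Neutral. The reviewer has mixed feelings about it."))  # neutral
print(extract_label("I cannot tell."))                                    # None
```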


Before we can conclude this experiment, let's change the order of the options one more time.

In [29]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment:
"""

print(generate(prompt))

Positive


Three different orders of the options, and the model's answer flips between `Positive` and `Neutral`. That's not very helpful. We conclude that our prompt does not currently give us confidence that we will get meaningful responses from the model.
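
The ordering experiment above can be made systematic by trying every permutation of the options and tallying the answers. This is a sketch only: `build_sentiment_prompt` and `label_order_votes` are hypothetical helpers, and `first_option_classifier` is a stand-in for the model so that the code runs without a GPU. In the notebook you could pass a function that wraps `generate` instead.

```python
from collections import Counter
from itertools import permutations

LABELS = ("negative", "neutral", "positive")


def build_sentiment_prompt(review_text, labels):
    """Build the sentiment prompt with the options listed in the given order."""
    options = ", ".join(f'"{label}"' for label in labels[:-1]) + f', or "{labels[-1]}"'
    return (
        f"What is the overall sentiment of this review {review_text}?\n\n"
        f"Choose one of {options}.\n"
        "Overall Sentiment:\n"
    )


def label_order_votes(review_text, classify, labels=LABELS):
    """Ask once per ordering of the labels and count the answers."""
    votes = Counter()
    for ordering in permutations(labels):
        answer = classify(build_sentiment_prompt(review_text, ordering))
        votes[answer.strip().lower()] += 1
    return votes


def first_option_classifier(prompt):
    """Stand-in 'model' that always picks the first option listed in the prompt."""
    return prompt.split('"')[1]


votes = label_order_votes("The bike is fine.", first_option_classifier)
print(votes)
```

A classifier whose tally is spread across several labels, as with this deliberately order-biased stand-in, is reacting to the order of the options rather than to the review.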

In order to get more reliable, trustworthy responses, let's turn our attention to an important technique, **few-shot learning**, which will allow us to provide instructive examples to the model about how it ought to behave.

#### **Few-Shot Learning - Providing Examples**

Depending on the number of examples given, this technique can be called **one-shot learning**, **two-shot learning**, **three-shot learning**, or **many-shot learning**. A **shot** is an example prompt/response pair provided to the model to help guide its behaviour.

These shots are typically prepended to the main prompt we wish the model to generate a response for. Depending on the model being used, there are specific ways to format our shots that will help the model understand that what we are providing it are prompt/response examples.

On Hugging Face, there are models of the `-chat` variant, such as the `Llama-2-13b-chat` that we have been using. These models have had additional training on top of the base models e.g. `Llama-2-13b` that makes them better at following instructions in support of their use in chat applications. They are said to have been **instruction fine-tuned**.

Depending on the model, the prompt/response pairs will be formatted in different ways using a training template known as a **prompt template**, typically found in the model's documentation. Here is a simplified version of the **Llama-2 prompt template** we shall be using shortly.

```python
<s>[INST] {{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s>
```

Let's dissect this **prompt template**:
* A single user/model interaction is contained between the `<s>` and `</s>` tags.
* The user part of the user/model interaction is contained between the `[INST]` and `[/INST]` tags.
* The model part of the user/model interaction follows the `[/INST]` tag and ends at the interaction-concluding `</s>` tag.

This template is used during **instruction fine-tuning** and we can use it to provide our own instructive examples to the model for how it ought to behave. We shall make our work easier by creating a helper function.

#### A Helper Function

In [30]:
def prompt_with_examples(prompt, examples=[]):
    """
    This function takes an initial prompt and a list of example prompt/response pairs, then
    formats them into a single string according to the model's prompt template.
    Each example is included in the final prompt, which could be beneficial for models that
    take into account the context provided by examples.

    Parameters:
    @param prompt (str): The main prompt to be processed by the language model.
    @param examples (list of tuples): A list where each tuple contains a pair of strings
      (example_prompt, example_response). Default is an empty list.

    Returns:
    str: A string with the structured prompt and examples formatted for a language model.

    Example usage:
    ```
    main_prompt = "Translate the following sentence into French:"
    example_pairs = [("Hello, how are you?", "Bonjour, comment ça va?"),
                     ("Thank you very much!", "Merci beaucoup!")]
    formatted_prompt = prompt_with_examples(main_prompt, example_pairs)
    print(formatted_prompt)
    ```
    """

    # Start with the initial part of the prompt
    full_prompt = "<s>[INST]\n"

    # Add each example to the prompt
    for example_prompt, example_response in examples:
        full_prompt += f"{example_prompt} [/INST] {example_response} </s><s>[INST]"

    # Add the main prompt and close the template
    full_prompt += f"{prompt} [/INST]"

    return full_prompt

#### One-Shot Learning Example

Let's briefly step away from our sentiment analysis task, and use a simple text generation prompt to explore how we can use the `prompt_with_examples` function to provide an instructive example, or put another way, perform **one-shot learning**.



In [31]:
example_prompt = "Give me an all uppercase color that starts with the letter 'O'."
example_response = "ORANGE"

example_1 = (example_prompt, example_response)
examples = [example_1]
prompt = "Give me an all uppercase color that starts with the letter 'P'."

prompt_with_one_example = prompt_with_examples(prompt, examples)

In [32]:
print(prompt_with_one_example)

<s>[INST]
Give me an all uppercase color that starts with the letter 'O'. [/INST] ORANGE </s><s>[INST]Give me an all uppercase color that starts with the letter 'P'. [/INST]


`prompt_with_one_example` above includes a single user/model interaction (`<s>...</s>`), using the LLaMA-2 **prompt template**, prepended to the main prompt. Note that the main prompt only includes the user part of the interaction (between the `[INST]` and `[/INST]` tags) and leaves the rest of the interaction (the model's response and the `</s>` tag) for the model to complete.
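
The structure just described can be checked by splitting a formatted prompt back into its user/model turns. The sketch below is illustrative and assumes the simplified template used in this notebook (no system prompt); `split_turns` and `TURN_PATTERN` are hypothetical names.

```python
import re

# One turn: "<s>[INST] user text [/INST] model text", closed by "</s>"
# or, for the final unanswered turn, by the end of the string.
TURN_PATTERN = re.compile(r"<s>\[INST\](.*?)\[/INST\](.*?)(?:</s>|$)", re.DOTALL)


def split_turns(formatted_prompt):
    """Return a list of (user_text, model_text) pairs found in the prompt."""
    return [
        (user.strip(), model.strip())
        for user, model in TURN_PATTERN.findall(formatted_prompt)
    ]


demo = (
    "<s>[INST]\nGive me an all uppercase color that starts with the letter 'O'. "
    "[/INST] ORANGE </s>"
    "<s>[INST]Give me an all uppercase color that starts with the letter 'P'. [/INST]"
)

for user, model in split_turns(demo):
    print(repr(user), "->", repr(model))
```

The last pair has an empty model part, which is exactly the slot the model is expected to fill.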

Before using `prompt_with_one_example` let's see what kind of response we get back from the model passing in just the main prompt, without an instructive example.

In [33]:
print(generate(prompt))

Answer: Sure! Here's an all uppercase color that starts with the letter 'P':

PURPLE


Although we got `PURPLE`, we also got additional chat-like text before the output that we want.

Now let's get a response using `prompt_with_one_example`, which contains an example of how the model should respond: with only the word for the color we are interested in.

In [34]:
print(generate(prompt_with_one_example))

PURPLE


#### Sentiment Analysis with Examples

Previously we lacked confidence about whether the model would correctly label a neutral review. Let's now apply what we just learned about **one-shot learning** to provide our model with an instructive example of responding to what, as a human, we would consider a neutral review.

Here is an example we would like classified as a neutral review, which while clearly not negative contains both positive and negative sentiments about the bike.

In [35]:
example_neutral_review = """
I've had the chance to put several miles on my new Nyota Ndogo from Nyota Bicyles.
First off, the bike's design is sleek, and it provides an exceptionally stable ride,
even when navigating the bustle of city streets. The gear shifting is fluid,
and the bike feels robust, promising longevity. On the downside, the braking system,
while reliable, lacks the responsiveness I've experienced with other bikes.
I also noticed that the handlebar grips can become rather uncomfortable on prolonged journeys.
Nevertheless, these issues aside, the bike offers impressive performance for its price range,
making it a solid, middle-of-the-road choice for both commuting and leisure rides.
"""

We will use the example review to construct an `examples` list of prompt/response pairs that we can pass into the `prompt_with_examples` function.

In [36]:
example_prompt_neutral = f"""
What is the overall sentiment of this review {example_neutral_review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment:
"""
example_response_neutral = "neutral"

example_neutral = (example_prompt_neutral, example_response_neutral)
examples = [example_neutral]

Now we construct the main prompt, which again uses the review from above that we hope to be classified as neutral.

In [37]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment:
"""

prompt_with_one_example = prompt_with_examples(prompt, examples)

In [38]:
print(generate(prompt_with_one_example))

Positive


This is not the response we were hoping for, since the model still labels the review as `Positive`. Perhaps providing more examples will guide the model better. Let's give two-shot learning a shot, pun intended.

#### Two-Shot Learning

In addition to the neutral example we have given the model, let's also provide it with an example of a positive review in hopes that it will be more clear about the difference between the two.

In [39]:
example_review_positive = """
I've been absolutely delighted with my Starlight Cruiser purchase from Star Bikes.
The bike exudes a charm with its sleek design that turns heads as I glide through city lanes.
It's not just about looks though; the bike performs wonderfully. The gears shift like a dream,
making for a ride that's as smooth as silk across various urban terrains. I was initially skeptical
about the comfort of the seat, but it proved to be pleasantly supportive, even on my longer weekend adventures.
While the color choices were limited, I found one that suited my style perfectly.
Any minor imperfections pale in comparison to the bike's overall quality and the sheer joy it brings to my daily commutes.
For the price, the Starlight Cruiser is an undeniable gem that I would happily recommend.
"""

In [40]:
example_prompt_positive = f"""
What is the overall sentiment of this review {example_review_positive}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment:
"""
example_response_positive = "positive"

In [41]:
main_prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment:
"""

#### Exercise: Perform Two-Shot Learning

Perform **two-shot learning** by providing the model with both a neutral and a positive example interaction before prompting it for a response to `review` which we are hoping the model will classify as `neutral`.

- Use `example_neutral` (already defined above) as one example.
- Use `example_review_positive`, `example_prompt_positive` and `example_response_positive` above to construct a positive user/model interaction example. Call it `example_positive`.
- Use both examples (neutral and positive) along with `main_prompt` above, to construct a prompt with two examples (using the `prompt_with_examples` function).
- Generate and print a model response using your prompt with two examples.

If you get stuck, there is a working solution below.

#### Your work here

In [42]:
example_positive = (example_prompt_positive, example_response_positive)
examples = [example_neutral, example_positive]

prompt_with_two_examples = prompt_with_examples(main_prompt, examples)

print(generate(prompt_with_two_examples))

neutral


#### Solution

<details>
<summary>
Click here to see the solution
</summary>

```python
example_positive = (example_prompt_positive, example_response_positive)
examples = [example_neutral, example_positive]

prompt_with_two_examples = prompt_with_examples(main_prompt, examples)

print(generate(prompt_with_two_examples))
```

</details>

## Additional Task (Optional)
### Generating Data for Downstream Consumption

Now that our model is able to perform **sentiment analysis** effectively, let's extend its analysis capabilities to be able to generate JSON objects for downstream consumption that contain a given review's positive and negative points.

We will begin iterating on a prompt by simply asking the model to separately list out the positive and negative points in a review.

In [43]:
prompt = f"""
From the review, list the positive points and negative points separately: {review}
"""

print(generate(prompt))

Positive points:

* Smooth ride
* Handles urban terrains with ease
* Good value for the money

Negative points:

* Seat can be uncomfortable for longer rides
* Limited color options


Not surprisingly, the model did quite well. Let's iterate on the prompt to get the model to produce a JSON object for us.

In [44]:
prompt = f"""
From the review, list down the positive points and negative points separately, in JSON: {review}
"""

print(generate(prompt))

Here is the JSON representation of the review:

{
"positive_points": [
"smooth ride",
"ease of handling urban terrain",
"good value for money"
],
"negative_points": [
"seat uncomfortable for longer rides",
"limited color options"
]
}


That got us something that looks like JSON, but it is wrapped in chat-like text and uses key names of the model's own choosing. Let's try to be more **precise** in our prompt about how we want the output to be formatted. We must use double curly braces `{{` and `}}` rather than single ones because we are using Python f-string literals, which interpret single curly braces as placeholders for Python expressions.

**NOTE:** In some instances, the output does look like a JSON output but in reality is not a valid JSON string.
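
To make the brace escaping concrete, here is a small sketch showing two equivalent ways of placing a literal JSON format inside an f-string. The variable names (`review_text`, `prompt_a`, `prompt_b`, `format_spec`) are illustrative only.

```python
import json

review_text = "Smooth ride, but the seat is uncomfortable."

# Option 1: escape literal braces by doubling them inside the f-string.
prompt_a = f"""Use the following format:

{{"positive": [], "negative": []}}

Review: {review_text}
"""

# Option 2: build the format string with json.dumps and interpolate it,
# which avoids brace escaping entirely.
format_spec = json.dumps({"positive": [], "negative": []})
prompt_b = f"""Use the following format:

{format_spec}

Review: {review_text}
"""

print(prompt_a)
print(prompt_a == prompt_b)  # True
```

The second option scales better when the desired format grows, since the format is an ordinary Python dictionary rather than hand-escaped text.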

In [45]:
prompt = f"""
From the review below, list down the positive points and negative points separately, in JSON. Use the following format:

{{"positive": [], "negative": []}}

Review: {review}
"""

print(generate(prompt))

Positive points:

* Smooth ride
* Handles urban terrains with ease
* Good value for the money

Negative points:

* Seat can be uncomfortable for longer rides
* Limited color options


This time the model ignored the JSON instruction altogether and returned plain bullet lists.

To output a JSON response, we need to employ the methods we have learnt so far. We can do so in two ways:
* Providing the model with **instructive examples** i.e. example reviews alongside their corresponding JSON outputs.
* Adding a **cue** to the prompt.

We also provide a helper function called `pretty_print_json`, to which you can pass the LLM's output. It prints the output with nice indentation if it is valid JSON and raises an error otherwise.

If you get stuck, a solution is provided further down below.

**1. Using instructive examples**

In [46]:
example_reviews = [
"""\
I recently purchased the Starlight Cruiser from Star Bikes, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.\
""",
"""\
Got the Starlight Cruiser last week, and I'm a bit disappointed. \
The brakes are not as responsive as I'd like and the gears often get stuck. \
The design is good but performance-wise, it leaves much to be desired.\
"""
]

In [47]:
example_outputs = [
    {
        "positive": ["smooth ride", "ease of handling urban terrains", "good value for the money"],
        "negative": ["seat uncomfortable for longer rides", "limited color options"]
    },
    {
        "positive": ["good design"],
        "negative": ["brakes not responsive", "gears often get stuck", "performance leaves much to be desired"]
    }
]

In [48]:
def pretty_print_json(json_string):
    print(json.dumps(json.loads(json_string), indent=4))
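
`pretty_print_json` above fails when the model wraps its JSON in chat-like text, as in `Here is the JSON representation of the review: {...}`. One possible workaround, sketched here, is to pull out the first JSON object embedded in the response. `extract_first_json_object` is a hypothetical helper; it relies on the standard library's `json.JSONDecoder.raw_decode`, which parses a JSON value starting at a given index and ignores whatever follows it.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode returns the parsed value and the index where it ended.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


chatty = (
    "Here is the JSON representation of the review:\n\n"
    '{"positive": ["smooth ride"], "negative": ["limited color options"]}\n'
    "Let me know if you need anything else!"
)
print(json.dumps(extract_first_json_object(chatty), indent=4))
```

This only rescues responses that contain well-formed JSON somewhere; output that merely resembles JSON still yields `None`.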

#### Your work here

In [51]:
examples = [(example_review, json.dumps(example_output)) for example_review, example_output in zip(example_reviews, example_outputs)]

# we use `review` as our base prompt
prompt = prompt_with_examples(review, examples)
pretty_print_json(generate(prompt))

{
    "positive": [
        "smooth ride",
        "ease of handling urban terrains",
        "good value for the money"
    ],
    "negative": [
        "seat uncomfortable for longer rides",
        "limited color options"
    ]
}


Now the output is a valid JSON string.

#### Solution

<details>
<summary>
Click here to see the solution
</summary>

```python
examples = [(example_review, json.dumps(example_output)) for example_review, example_output in zip(example_reviews, example_outputs)]

# we use `review` as our base prompt
prompt = prompt_with_examples(review, examples)
pretty_print_json(generate(prompt))
```
</details>

**2. Adding a cue to the prompt**

While Solution **1** demonstrates an effective use of **two-shot learning**, in line with this notebook's objectives, it is worth mentioning that one can also get a working solution by adding a **cue** of `JSON output:` to the prompt we were iterating on earlier.

In [None]:
prompt = f"""
From the review below, list down the positive points and negative points separately, in JSON. Use the following format:

{{"positive": [], "negative": []}}

Review: {review}
JSON output:
"""

pretty_print_json(generate(prompt))

### Review of Main Concepts

The following key concepts were introduced in this lab:

- **Precision**: Being as explicit as necessary to guide the response of an LLM.
- **Cue**: Text placed at the end of a prompt for the model to continue from, often used to stop the model from producing that text itself.
- **"Time to think"**: Asking the model to take intermediate steps and show its work, which helps with tasks that require multi-step reasoning.
- **Sentiment Analysis**: Identifying the mood or sentiment of a piece of text.
- **Instruction Fine-Tuning**: Additional training of a base model on prompt/response examples so that it follows instructions better.
- **LLaMA-2 Prompt Template**: The format used for user/model interactions during LLaMA-2 instruction fine-tuning, which we reuse to supply our own examples.
- **Few-shot Learning**: Prepending one or more instructive examples to a prompt to improve the model's responses.