# Advanced Prompt Engineering for Meta Llama 3
---

> *This notebook should work well with the **`Python 3`** kernel in SageMaker Studio in the **`us-west-2`** region.*

## What is Prompt Engineering?
Prompt engineering is an emerging discipline focused on developing and optimizing prompts to adapt language models efficiently to various tasks. By using prompt engineering techniques, you can often get better responses from foundation models (FMs) without spending effort and resources on fine-tuning them.

There are multiple advanced prompt engineering techniques. This notebook covers:
1. **Few-shot prompting**: This technique involves providing the large language model with a few examples (typically between 3 and 5) of the desired task or output. It helps the model pick up patterns from limited data, making it particularly useful when extensive datasets are unavailable. Few-shot prompting can enable the model to generalize effectively from minimal input.
2. **Chain-of-Thought (CoT)**: This technique encourages the language model to articulate its reasoning step by step, which can improve transparency and understanding of the model's thought and decision-making process. By breaking down complex problems into more manageable parts, CoT prompting can increase problem-solving accuracy.
3. **Self-consistency**: This technique improves the performance and consistency of language models by generating multiple outputs for the same input and selecting the most consistent answer among them.
4. **ReAct (Reasoning and Acting)**: This technique combines reasoning with action by having the model generate both logical reasoning steps and specific actions to take based on those steps. This allows for more dynamic interactions where the model can adapt its responses based on ongoing interactions.
5. **Prompt chaining**: This technique involves linking (hence, chaining) multiple prompts together to create a coherent, consistent flow of interactions. By referencing previous inputs or outputs, prompt chaining encourages deeper interactions and provides more consistent responses from the language model.


### Implementation
In this notebook, we show how to implement various prompt engineering techniques with the **Meta Llama 3** foundation model, using the **Amazon Bedrock API** through the `Boto3` client.


<div class="alert alert-block alert-info">
    <b>Remark</b>: If this is your first time using Amazon Bedrock, please make sure you have enabled access to the Llama models in the Amazon Bedrock console.
</div>

### Set up
---

**Note**: You can ignore the error message during the `pip install`.

In [1]:
%pip install --upgrade --quiet boto3

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-multimodal 1.1.1 requires nvidia-ml-py3==7.352.0, which is not installed.
aiobotocore 2.13.2 requires botocore<1.34.132,>=1.34.70, but you have botocore 1.38.4 which is incompatible.
amazon-sagemaker-sql-magic 0.1.3 requires sqlparse==0.5.0, but you have sqlparse 0.5.1 which is incompatible.
autogluon-core 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-features 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-multimodal 1.1.1 requires omegaconf<2.3.0,>=2.1.1, but you have omegaconf 2.3.0 which is incompatible.
autogluon-multimodal 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.4.2 which is incompatible.
autogluon-tabular 1.1.1 requires scikit-learn<1.4.1,>=1

In [2]:
import boto3
from botocore.exceptions import ClientError
import ipywidgets as widgets
import json
import time

If you want to use another Meta foundation model, here are the available `model_id` values. Note that the dropdown below uses `us.`-prefixed IDs for the two Llama 4 models:

- Llama 3.1 8B: `'meta.llama3-1-8b-instruct-v1:0'`
- Llama 3.1 70B: `'meta.llama3-1-70b-instruct-v1:0'`
- Llama 3.1 405B: `'meta.llama3-1-405b-instruct-v1:0'`
- Llama 3 8B: `'meta.llama3-8b-instruct-v1:0'`
- Llama 3 70B: `'meta.llama3-70b-instruct-v1:0'`
- Llama 4 Scout: `'meta.llama4-scout-17b-instruct-v1:0'`
- Llama 4 Maverick: `'meta.llama4-maverick-17b-instruct-v1:0'`

In [11]:
model_selection = widgets.Dropdown(
    options=[
        'meta.llama3-1-8b-instruct-v1:0',
        'meta.llama3-1-70b-instruct-v1:0',
        'meta.llama3-1-405b-instruct-v1:0',
        'meta.llama3-8b-instruct-v1:0',
        'meta.llama3-70b-instruct-v1:0',
        'us.meta.llama4-maverick-17b-instruct-v1:0',
        'us.meta.llama4-scout-17b-instruct-v1:0'
    ],
    value='us.meta.llama4-maverick-17b-instruct-v1:0',
    description='Select model:',
    disabled=False,
)
model_selection

Dropdown(description='Select model:', index=5, options=('meta.llama3-1-8b-instruct-v1:0', 'meta.llama3-1-70b-i…

In [12]:
model_id = model_selection.value
print('You selected: {}'.format(model_id))

You selected: us.meta.llama4-maverick-17b-instruct-v1:0


In [13]:
boto_session = boto3.session.Session()
bedrock_runtime_client = boto_session.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

For this notebook, we will use the `converse` API of the Boto3 client.

For more details, please refer to the Boto3 documentation [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-runtime/client/converse.html) and the user guide [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-meta.html).

In [14]:
def generate_text(
    model_id: str,
    user_message: str,
    max_tokens: int=1024,
    temperature: float=0.2,
    top_p: float=0.8,
    verbose: bool=False
) -> dict:
    """Send one user message to a Bedrock model through the Converse API.

    Returns a dict with 'text', 'usage', and 'metrics'.
    The dict is empty if the call fails, so check for 'text' before using it.
    """
    output_dict = dict()
    # The Converse API takes a list of messages, each with a role and content blocks
    message_payload = [{
        'role': 'user',
        'content': [{
            'text': user_message
        }],
    }]
    # Sampling settings shared by all models behind the Converse API
    inference_config = {
        'maxTokens': max_tokens,
        'temperature': temperature,
        'topP': top_p,
    }
    if verbose:
        print(' -- You are using: {}'.format(model_id))
        print(' -- Here is the inference configuration:\n{}'.format(json.dumps(inference_config, indent=2)))

    try:
        resp = bedrock_runtime_client.converse(
            modelId=model_id,
            messages=message_payload,
            inferenceConfig=inference_config,
            additionalModelRequestFields={}
        )
        # Keep the generated text plus the token usage and latency metrics
        output_dict['text'] = resp['output']['message']['content'][0]['text'].strip()
        output_dict['usage'] = resp['usage']
        output_dict['metrics'] = resp['metrics']

    except Exception as e:
        # ClientError is a subclass of Exception, so service errors are caught here too
        print('Error: cannot invoke {model_id}. Reason: {error}'.format(model_id=model_id, error=e))

    return output_dict
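
The request sent to the `converse` API can also be assembled separately, which makes it easy to inspect the payload or to add a system prompt. The sketch below is a hypothetical helper (`build_converse_request` is not part of this notebook or of Boto3); it only builds the keyword arguments and assumes the Converse API's `system`, `messages`, and `inferenceConfig` fields.

```python
from typing import Optional


def build_converse_request(
    model_id: str,
    user_message: str,
    system_prompt: Optional[str] = None,
    max_tokens: int = 1024,
    temperature: float = 0.2,
    top_p: float = 0.8,
) -> dict:
    """Assemble keyword arguments for a Converse API call (hypothetical helper)."""
    request = {
        'modelId': model_id,
        'messages': [{'role': 'user', 'content': [{'text': user_message}]}],
        'inferenceConfig': {
            'maxTokens': max_tokens,
            'temperature': temperature,
            'topP': top_p,
        },
    }
    # The system prompt is optional; add it only when one is provided
    if system_prompt:
        request['system'] = [{'text': system_prompt}]
    return request


request = build_converse_request(
    'meta.llama3-8b-instruct-v1:0',
    'Please rate this comment: This movie is excellent.',
    system_prompt='Answer with a single star rating only.',
)
print(sorted(request.keys()))
```

With a configured client, the call would then look like `bedrock_runtime_client.converse(**request)`.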



## Few-shot prompting
---
This technique involves providing the language model with a **few** specific examples of the desired task or output, which generally results in more accurate and consistent output. The user supplies a few **input-output** pairs that illustrate the desired response.



First, let's look at plain input-output (zero-shot) prompting for a movie sentiment use case.

In [15]:
user_prompt = '''Please rate this comment from the scale of 1 to 5 stars.
Here is the comment: This movie is excellent.
'''

resp = generate_text(model_id, user_prompt)

print(resp['text'])

I would rate this comment 3 out of 5 stars. The comment is brief and to the point, but it lacks specific details or explanations to support the claim that the movie is "excellent." A more detailed review would be more helpful and engaging. 

Here's a breakdown of the rating:
* The comment is clear and easy to understand (1 star for clarity).
* The comment is positive and shows enthusiasm for the movie (1 star for positivity).
* The comment lacks specific examples or details to support the claim (deduction of 1 star for lack of detail).

Overall, it's a decent but unimpressive comment, hence 3 stars.


Let's continue with the movie sentiment use case, this time with a few examples that show how to rate a comment.

In [16]:
user_prompt = '''Please rate this comment from the scale of 1 to 5 stars
Probably the best movie of my life!: 5 star
I'm ok with this movie: 3 star
This movie is waste of time: 1 star

Here is the comment: This movie is excellent.
'''
resp = generate_text(model_id, user_prompt)
print(resp['text'])

To rate the comment "This movie is excellent," let's analyze the given ratings and their corresponding comments:

1. 5-star: "Probably the best movie of my life!" - This indicates a very positive review.
2. 3-star: "I'm ok with this movie" - This suggests a neutral or mediocre opinion.
3. 1-star: "This movie is waste of time" - This shows a very negative review.

The comment "This movie is excellent" is very positive, similar to "Probably the best movie of my life!" It expresses a high level of satisfaction and approval. Therefore, it aligns with the 5-star rating.

Rating: 5 stars.


As observed from the output, the language model now provides a more consistent and more accurate rating.
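
When the examples come from a list rather than being typed by hand, the few-shot prompt can be assembled programmatically. The following is a minimal sketch; `build_few_shot_prompt` is a hypothetical helper name, and the layout simply mirrors the prompt used in the cell above.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Join an instruction, labeled examples, and the new input into one prompt."""
    lines = [instruction]
    # One "input: label" line per example
    for comment, label in examples:
        lines.append('{}: {}'.format(comment, label))
    lines.append('')
    lines.append('Here is the comment: {}'.format(query))
    return '\n'.join(lines) + '\n'


few_shot_prompt = build_few_shot_prompt(
    'Please rate this comment from the scale of 1 to 5 stars',
    [
        ('Probably the best movie of my life!', '5 star'),
        ("I'm ok with this movie", '3 star'),
        ('This movie is waste of time', '1 star'),
    ],
    'This movie is excellent.',
)
print(few_shot_prompt)
```

The resulting string can be passed to `generate_text(model_id, few_shot_prompt)` in the same way as the handwritten prompt.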

## Chain-of-Thought (CoT)
---
This technique prompts the language model to articulate its reasoning or to follow specific steps given in the prompt, which can improve transparency, help guide its thinking, and produce a more coherent and relevant response. By breaking down complex problems into more manageable steps, CoT prompting can improve problem-solving accuracy.

Let's start with an example of a generic prompt.

In [17]:
user_prompt = '''Can you describe Tokyo Tower to your audience?
'''

# Let's generate 2 examples for comparison
for i in range(2):
    print('Round {}'.format(i+1))
    resp = generate_text(model_id, user_prompt, max_tokens=1024, temperature=.5)
    print(resp['text'])
    print(' ----- ')

Round 1
Tokyo Tower is a communications tower located in the heart of Tokyo, Japan. Standing at 332.9 meters (1,092 feet) tall, it was the tallest tower in the world when completed in 1958, and it held that title until 1965. The tower is modeled after the Eiffel Tower in Paris, but it is painted in a distinctive orange and white color scheme to comply with air traffic regulations.

The tower has two main observation decks: the Main Deck at 150 meters (492 feet) and the Top Deck at 250 meters (820 feet). Visitors can enjoy panoramic views of the city from these decks, and on a clear day, they can see Mount Fuji, the famous Japanese mountain.

Tokyo Tower is not only a popular tourist destination but also an important broadcasting tower, with many radio and television stations using it to transmit their signals. Over the years, it has become an iconic symbol of Tokyo and a must-visit attraction for anyone traveling to the city.

In recent years, Tokyo Tower has been surrounded by other t

Now let's test with CoT prompting techniques.

In [18]:
user_prompt = '''Can you describe Tokyo Tower to your audience? Begin with:
1. What is it
2. When it was built, and how long it took them to build
3. Why it was built
4. End it with why should I visit this place, throw in fun facts to engage with the audiences.
'''

for i in range(2):
    print('Round {}'.format(i+1))
    resp = generate_text(model_id, user_prompt, max_tokens=1024, temperature=.5)
    print(resp['text'])
    print('-----\n\n')

Round 1
Tokyo Tower! Let's dive into the fascinating world of this iconic landmark.

1. **What is it?** Tokyo Tower is a communications tower located in the heart of Tokyo, Japan. It's a lattice tower, also known as a truss tower, that's painted in a distinctive orange and white color scheme to comply with air traffic regulations.

2. **When was it built, and how long did it take?** Construction on Tokyo Tower began in June 1957 and was completed in December 1958. It took approximately 1.5 years to build, with a workforce of over 1,000 laborers. The tower was officially opened to the public on December 23, 1958.

3. **Why was it built?** Tokyo Tower was built as a broadcasting tower to support the growing demand for television and radio broadcasting in Japan. At the time, it was the tallest tower in the world, standing at 332.9 meters (1,092 feet) tall. Its primary purpose was to provide a broadcasting signal to the Tokyo metropolitan area.

4. **Why should you visit Tokyo Tower?** You

As you can see from the responses, the language model answers in a more structured and consistent way. This gives us more control over the language model's output.
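
The step list in such a prompt can be generated from data, which helps when the same structure is reused across many questions. This is only a sketch; `build_stepwise_prompt` is a hypothetical helper, and the "Begin with:" wording is taken from the prompt in the cell above.

```python
from typing import List


def build_stepwise_prompt(question: str, steps: List[str]) -> str:
    """Append a numbered list of steps the model should follow to a question."""
    numbered = ['{}. {}'.format(i, step) for i, step in enumerate(steps, start=1)]
    return '{} Begin with:\n{}\n'.format(question, '\n'.join(numbered))


cot_prompt = build_stepwise_prompt(
    'Can you describe Tokyo Tower to your audience?',
    [
        'What is it',
        'When it was built, and how long it took them to build',
        'Why it was built',
        'End it with why should I visit this place, throw in fun facts to engage with the audiences.',
    ],
)
print(cot_prompt)
```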

## Self-consistency 
---
Language models are probabilistic, so even with the Chain-of-Thought technique, a single generation might produce incorrect or inconsistent results. Self-consistency samples multiple generations for the same prompt and selects the most frequent answer, which improves the consistency of what the user receives.
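
The selection step can be sketched as a simple majority vote. This is a minimal illustration, not a definitive implementation: `most_consistent_answer` is a hypothetical helper, it assumes the model returns a short final answer that can be compared by exact match after normalization, and the canned responses stand in for real model calls so the example runs offline.

```python
from collections import Counter
from typing import Callable, List, Tuple


def most_consistent_answer(
    generate: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
) -> Tuple[str, List[str]]:
    """Sample the model n_samples times and return the most frequent answer."""
    # Normalize whitespace and case so trivially different answers count as equal
    answers = [generate(prompt).strip().lower() for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner, answers


# Stand-in for a model call (assumption); with Bedrock this could be something like
# lambda p: generate_text(model_id, p, temperature=0.7)['text']
canned = iter(['5 stars', '4 stars', '5 stars', '5 Stars', '3 stars'])
winner, samples = most_consistent_answer(lambda _prompt: next(canned), 'Rate this comment.')
print(winner)
```

Exact-match voting only works when answers are short and comparable. For free-form text, such as the customer support answers in this section, the final answer would first need to be extracted or summarized before counting.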

Let's examine an example for a customer support use case.

In [19]:
user_prompt = '''
If a customer have ordered a laptop but received a tablet instead.
What are the steps you will take?
'''
for i in range(2):
    print('Round {}'.format(i+1))
    resp = generate_text(model_id, user_prompt)
    print(resp['text'])
    print('-----\n\n')
    time.sleep(2)

Round 1
To resolve the issue of a customer receiving a tablet instead of the laptop they ordered, the following steps can be taken:

1. **Acknowledge and Apologize**: Respond to the customer's complaint promptly, acknowledging the mistake, and apologize for the inconvenience caused.

2. **Verify the Order**: Check the customer's order details to confirm that a laptop was indeed ordered and to identify the specific laptop model and any other relevant details.

3. **Investigate the Issue**: Look into the cause of the discrepancy to understand how a tablet was shipped instead of a laptop. This could involve checking the warehouse or shipping procedures.

4. **Arrange for Correct Order Fulfillment**: Immediately process a new order for the correct laptop, ensuring it is shipped out as soon as possible. Provide the customer with a tracking number for the new shipment.

5. **Return or Dispose of Incorrect Item**: Inform the customer that they can either keep the tablet (potentially at a disc

In [20]:
user_prompt = '''
If a customer have ordered a laptop but received a tablet instead.
The support agent should first acknowledge the mistake,
guide the customer on how to return the tablet,
and ensure that the laptop is shipped promptly.
'''
for i in range(2):
    print('Round {}'.format(i+1))
    resp = generate_text(model_id, user_prompt)
    print(resp['text'])
    print('-----\n\n')
    time.sleep(2)

Round 1
That's a great approach to handling the situation. Here's a more detailed step-by-step guide on how the support agent can resolve the issue:

1. **Acknowledge the mistake**: The support agent should start by apologizing for the error and acknowledging the customer's frustration. This shows that the agent is taking the issue seriously and is committed to making it right.

Example: "I'm so sorry to hear that you received a tablet instead of the laptop you ordered. I can imagine how frustrating that must be, and I apologize for the mistake."

2. **Guide the customer on how to return the tablet**: The agent should then provide clear instructions on how to return the tablet, including any relevant details such as:
	* Return shipping labels or instructions on how to obtain one.
	* Packaging requirements.
	* Any specific documentation or information required for the return.

Example: "To return the tablet, please use the pre-paid return shipping label that will be sent to you via emai

## ReAct (Reasoning and Acting)
---
This technique combines both reasoning and action planning within language models. This allows the models to generate both reasoning traces and task-specific actions, enhancing the models' ability to interact with external environments and adapt their responses accordingly. As a result, this technique can significantly improve the accuracy and reliability of models' responses.
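
When the model can call external tools, the reasoning and acting steps are usually driven by a loop around the model. The sketch below illustrates that loop under stated assumptions: the `Action: tool[input]` and `Final Answer:` text formats, the `react_loop` helper, and the toy `lookup` tool are all hypothetical, and a scripted stand-in replaces the model so the example runs offline. In practice, the prompt would also need to describe this format and the available tools to the model.

```python
import re
from typing import Callable, Dict

# Assumed output conventions: "Action: tool_name[tool input]" and "Final Answer: ..."
ACTION_PATTERN = re.compile(r'Action:\s*(\w+)\[(.*?)\]')
FINAL_PATTERN = re.compile(r'Final Answer:\s*(.*)', re.DOTALL)


def react_loop(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate model reasoning with tool calls until a final answer appears."""
    transcript = 'Question: {}\n'.format(question)
    for _ in range(max_steps):
        step = generate(transcript)
        transcript += step + '\n'
        final = FINAL_PATTERN.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_PATTERN.search(step)
        if action:
            tool_name, tool_input = action.groups()
            tool = tools.get(tool_name)
            observation = tool(tool_input) if tool else 'Unknown tool: {}'.format(tool_name)
            # Feed the tool result back so the next reasoning step can use it
            transcript += 'Observation: {}\n'.format(observation)
    return 'No final answer within {} steps.'.format(max_steps)


# Scripted stand-in for the model and a toy tool (both are assumptions)
scripted_steps = iter([
    'Thought: I need market data first.\nAction: lookup[Malaysia tech market]',
    'Thought: I have enough information.\nFinal Answer: Start with a local partner.',
])
toy_tools = {'lookup': lambda query: 'Placeholder facts about {}'.format(query)}
answer = react_loop(lambda _transcript: next(scripted_steps), toy_tools, 'How to expand into Malaysia?')
print(answer)
```

The cells in this section use a simpler, single-prompt variant in which the model is asked to gather information, evaluate, and then act within one response.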


In [21]:
user_prompt = '''
What should tech company do if it wants to expand its operations into Malaysia?
'''

resp = generate_text(model_id, user_prompt,)
print(resp['text'])

Expanding Operations into Malaysia: A Tech Company's Guide

If a tech company wants to expand its operations into Malaysia, here are some steps it can take:

1. **Research the Malaysian Market**: Understand the local tech industry, market trends, and consumer behavior. Identify potential competitors and partners.
2. **Register the Business**: Register the company with the Companies Commission of Malaysia (SSM) and obtain necessary licenses and permits. Consider registering as a foreign-owned company or setting up a subsidiary.
3. **Understand Local Regulations**: Familiarize yourself with Malaysian laws and regulations, such as the Personal Data Protection Act 2010 and the Communications and Multimedia Act 1998.
4. **Obtain Necessary Permits and Licenses**: Obtain permits and licenses from relevant authorities, such as the Malaysian Communications and Multimedia Commission (MCMC) and the Ministry of International Trade and Industry (MITI).
5. **Set up a Local Office**: Establish a loca

In [22]:
user_prompt = '''
A tech company is considering expanding its operations into Malaysia. 
First, gather information about the market conditions in that country, 
then evaluate potential risks and benefits, and finally suggest an action plan.
'''
resp = generate_text(model_id, user_prompt,)
print(resp['text'])

To address the task of expanding a tech company into Malaysia, we'll follow a step-by-step approach that includes gathering information about market conditions, evaluating potential risks and benefits, and suggesting an action plan.

### 1. Gathering Information about Market Conditions

#### Economic Overview
Malaysia has a diverse economy with a strong presence of manufacturing, services, and to a lesser extent, agriculture. The country is known for its stable economic growth and has been a hub for multinational corporations (MNCs) in Southeast Asia.

#### Tech Industry
The tech industry in Malaysia is growing, driven by government initiatives to promote digitalization and innovation. Key areas include:
- **Digital Economy:** The Malaysian government has been proactive in promoting the digital economy through various initiatives, including the Malaysia Digital Economy Corporation (MDEC) and the National Digitalization Blueprint (MyDIGITAL).
- **Startups and Innovation:** There's a gro

## Prompt chaining
---
This technique involves breaking down a complex task into smaller, manageable subtasks, where the output of one prompt serves as the input for the next one. It enhances the reliability and performance of the language model by letting it process information in a more structured manner, one specific task at a time.
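
The hand-off between prompts can be expressed as a small loop. This is a sketch only: `run_prompt_chain` and the `{previous}` placeholder are hypothetical names, and an echo-style function stands in for the model so the example runs offline.

```python
from typing import Callable, List


def run_prompt_chain(
    generate: Callable[[str], str],
    first_prompt: str,
    follow_up_templates: List[str],
) -> List[str]:
    """Run prompts in sequence, feeding each output into the next template."""
    outputs = [generate(first_prompt)]
    for template in follow_up_templates:
        # Each template receives the previous step's output as {previous}
        prompt = template.format(previous=outputs[-1])
        outputs.append(generate(prompt))
    return outputs


def fake_generate(prompt: str) -> str:
    """Stand-in for a model call (assumption): echoes the first line of the prompt."""
    return '<answer to: {}>'.format(prompt.strip().splitlines()[0])


chain_outputs = run_prompt_chain(
    fake_generate,
    'Identify top 3 target audiences for an eco-friendly product.',
    [
        'Suggest marketing channels for this audience.\n{previous}',
        'Develop a messaging strategy for these channels.\n{previous}',
    ],
)
print(len(chain_outputs))
```

With Bedrock, `fake_generate` could be replaced by a function that calls `generate_text` and returns its `'text'` field.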

One common **use case** is writing a long document (e.g., a proposal or a marketing plan). Let's first observe traditional prompting.

In [23]:
user_prompt = '''
Write a 300-words marketing plan for an eco-friendly product.
'''
resp = generate_text(model_id, user_prompt,)
print(resp['text'])

**Marketing Plan for Eco-Friendly Product: "GreenClean"**

**Objective:** Position GreenClean, a biodegradable and non-toxic cleaning solution, as the go-to eco-friendly alternative for households and businesses, achieving $250,000 in sales within the first year.

**Target Market:** Environmentally conscious consumers, eco-friendly influencers, and businesses seeking sustainable practices.

**Marketing Strategies:**

1. **Digital Marketing:** Utilize social media platforms (Facebook, Instagram, Twitter) to share engaging content highlighting GreenClean's eco-friendly features and benefits. Collaborate with eco-influencers to promote the product.
2. **Content Marketing:** Develop a blog on the GreenClean website featuring articles on sustainable living, eco-friendly cleaning tips, and the environmental impact of traditional cleaning products.
3. **Email Marketing:** Build an email list and send regular newsletters with promotions, product updates, and educational content.
4. **Partnersh

Now, let's implement it with prompt chaining.

In [24]:
step1_prompt = 'Identify top 3 target audiences for an eco-friendly product.'
resp = generate_text(model_id, step1_prompt,)
output1_resp = resp['text']
print(output1_resp)

Based on current market trends and consumer behavior, the top 3 target audiences for an eco-friendly product are:

1. **Millennials (born 1981-1996) and Gen Z (born 1997-2012)**: These younger generations are more likely to prioritize sustainability and environmental responsibility when making purchasing decisions. They are digitally native, active on social media, and often influence their families and friends to adopt eco-friendly habits.

2. **Health-Conscious Consumers**: Individuals who prioritize their health and wellbeing are more likely to be interested in eco-friendly products, as they often perceive these products as being safer and more sustainable. This audience may include parents seeking non-toxic products for their families, individuals with allergies or sensitivities, and those who follow a wellness-oriented lifestyle.

3. **Environmentally Aware and Active Individuals**: People who are already engaged in environmentally friendly activities, such as recycling, reducing 

In [25]:
step2_prompt = '''
Based on the identified target audience, suggest effective marketing channels.
Here are the target audience: 
{target_audience}
'''.format(target_audience=output1_resp)
resp = generate_text(model_id, step2_prompt,)
output2_resp = resp['text']
print(output2_resp)

Based on the identified target audiences for an eco-friendly product, the following are effective marketing channels to reach them:

**For Millennials (born 1981-1996) and Gen Z (born 1997-2012):**

1. **Social Media**: Utilize platforms like Instagram, TikTok, Facebook, and Twitter to create engaging content, promote products, and collaborate with eco-influencers.
2. **Influencer Marketing**: Partner with social media influencers and content creators who have a strong focus on sustainability and eco-friendliness.
3. **Online Advertising**: Use targeted online ads, such as Google Ads and social media ads, to reach this digitally native audience.
4. **Content Marketing**: Create informative blog posts, videos, and guides that highlight the eco-friendly features and benefits of the product.

**For Health-Conscious Consumers:**

1. **Wellness and Health-Focused Media**: Advertise in publications and online platforms that cater to health-conscious individuals, such as wellness blogs and ma

In [26]:
step3_prompt = '''
Develop a messaging strategy that emphasizes sustainability for this audience.
Here are the target audience and channel:
{step2_resp}
'''.format(step2_resp=output2_resp)
resp = generate_text(model_id, step3_prompt,)
output3_resp = resp['text']
print(output3_resp)

**Sustainability-Focused Messaging Strategy**

To effectively engage with the target audiences for an eco-friendly product, we will develop a messaging strategy that emphasizes sustainability across the identified marketing channels.

**Core Messaging Pillars:**

1. **Environmental Responsibility**: Highlight the product's eco-friendly features, sustainable materials, and reduced environmental impact.
2. **Health and Wellness**: Emphasize the product's safety, non-toxicity, and benefits for human health and well-being.
3. **Sustainable Lifestyle**: Promote the product as part of a broader sustainable lifestyle, encouraging consumers to make environmentally conscious choices.

**Channel-Specific Messaging:**

### For Millennials and Gen Z (Social Media, Influencer Marketing, Online Advertising, Content Marketing)

1. **Visual Storytelling**: Utilize engaging visuals and videos to showcase the product's eco-friendly features and sustainable lifestyle.
2. **Influencer Partnerships**: Coll

## Bonus: Tree-of-Thoughts (ToT)
---
Tree-of-Thoughts (ToT) prompting is a more recent approach in prompt engineering designed to enhance the problem-solving capabilities of large language models. This technique explores multiple reasoning paths simultaneously, which improves the models' ability to tackle complex tasks that require strategic planning and evaluation.

Let's consider the example of birthday party planning.

In [27]:
tot_prompt = '''
1. Plan a surprise birthday party for a friend.
Choice 1: A small gathering at a local park.
Choice 2: A fancy dinner at a restaurant.
Choice 3: A themed party at home.
Choice 4: A weekend getaway.
2. For each choice, consider:
- Guest list
- Decorations
- Activities
3. Weigh the pros and cons of each idea before finalizing the plan.

Based on the pros and cons, finalize one plan in your <response> tag, 
put the rest of your comparison and the reason why you chose the plan in your <thought_process> tag.
'''

resp = generate_text(model_id, tot_prompt)
print(resp['text'])

### Finalized Plan

<response>
I will plan a themed party at home for my friend's surprise birthday party. 
The party will be a 90s-themed party with a guest list of close friends and family. 
The decorations will include neon lights, old 90s posters, and a playlist of popular 90s music. 
The activities will include a trivia game about 90s pop culture, a karaoke session, and a dance competition.
</response>

### Thought Process

<thought_process>
To plan a surprise birthday party for my friend, I considered four different options: a small gathering at a local park, a fancy dinner at a restaurant, a themed party at home, and a weekend getaway.

1. **Small Gathering at a Local Park:**
   - Guest list: Close friends and family (around 20 people).
   - Decorations: Balloons, a birthday banner, and a picnic setup.
   - Activities: Outdoor games like frisbee or soccer, and a cake-cutting ceremony.
   - Pros: Casual and relaxed atmosphere, easy to organize.
   - Cons: Weather-dependent, limit
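
Because the prompt asks for `<response>` and `<thought_process>` tags, the two parts can be separated after generation. The following is a minimal sketch with a hypothetical `extract_tag` helper and a shortened, made-up sample string; it returns `None` when a closing tag is missing, which can happen if the output is cut off at the token limit.

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag>...</tag> pair, or None if absent."""
    pattern = r'<{0}>(.*?)</{0}>'.format(re.escape(tag))
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


# Shortened sample in the same shape as the model output (illustrative only)
sample_output = '''<response>
I will plan a themed party at home.
</response>
<thought_process>
The park option is weather-dependent.
'''
final_plan = extract_tag(sample_output, 'response')
reasoning = extract_tag(sample_output, 'thought_process')
print(final_plan)
print(reasoning)
```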

## In summary
--- 
Prompt engineering is an important practice for optimizing how you apply, develop, and understand large language models. At its core, it is about designing prompts and interactions that align what large language models can do with your objectives, address their weaknesses such as hallucination, and give insight into how they behave.

In a production environment, you should experiment with multiple prompting techniques, and you may need to combine several of them to meet the objectives of your use case.