## Llama 3.1: Prompt Engineering Guide - Bedrock Converse API Best-Practices

This notebook should work well with the Data Science 3.0 kernel in SageMaker Studio.

This guide is designed to help you adapt prompts written for the Messages API format so that they work well with Llama 3.1 and the Bedrock Converse API. Our goal is to show how closely prompting Llama 3.1 through the Converse API resembles the Messages API format for common tasks. While this notebook may not cover every edge case or use case, it provides a strong foundation for getting started with Llama 3.1.

We will begin by reviewing how to format prompts and create a chat completion interface. Additionally, we will highlight the performance and efficiency of Llama 3.1 through a few examples, illustrating its effectiveness in different applications.

## Llama 3.1

Llama 3.1 is a high-quality open-source language model developed by Meta. It is designed for a wide range of natural language processing tasks, including text generation, question answering, and more.
Llama 3.1 has the following key features:
    
- Proficient in several languages, including English, French, Italian, German, Spanish, and others.
- Context window of up to 128k tokens, allowing for long-form text processing.
- Capable of handling various tasks, such as text generation, summarization, and question answering.
- Generates coherent and fluent responses.

## How to Format Prompts for Llama 3.1

Here we describe the prompt format for Llama 3.1, with an emphasis on its new features.

- **<|begin_of_text|> Token:** This token denotes the beginning of a prompt. Llama 3.1 will understand that there will be no other tokens before this special beginning of sequence token.
- **<|end_of_text|> Token:** This token denotes the end of a prompt. Llama 3.1 will understand that there will be no other tokens after this special end of sequence token.
- **<|start_header_id|> and <|end_header_id|> Tokens:** These tokens enclose the role for a particular message. The possible roles are system, user, assistant, and ipython.
- **<|eot_id|> Token:** End of turn. Represents when the model has determined that it has finished interacting with the user message that initiated its response.
- **<|eom_id|> Token:** End of message. Represents a possible stopping point for execution where the model can inform the executor that a tool call needs to be made.
- **<|python_tag|> Token:** Used in the model’s response to signify a tool call.

When you invoke the Amazon Bedrock Converse API, these tokens are abstracted away, so you can run inference without writing the special tokens into the prompt yourself. We will take a look in the cells below.
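
For comparison, here is a minimal sketch of what a raw single-turn prompt looks like when the special tokens are written by hand. The helper name `build_llama31_prompt` is hypothetical, and the layout is assumed from the token descriptions above and Meta's model card; it is not something the Converse API requires you to write.

```python
def build_llama31_prompt(system_text: str, user_text: str) -> str:
    """Assemble a single-turn raw prompt using the Llama 3.1 special tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_text}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_text}<|eot_id|>"
        # Ending with an open assistant header asks the model to write the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_llama31_prompt("You are a helpful assistant.", "What is an LLM?")
print(prompt)
```

With the Converse API, the same request is expressed as role-tagged messages and this string is never built in your code.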

## Comparison with the Messages API format

If you have used OpenAI models or other platforms that follow the Messages API format for chat completion, you will recognize that instructions for the model are organized into system, user, and assistant roles:

- System Role: This role sets the context or instructions for the model.
- User Role: This role includes the specific task or question posed to the model.
- Assistant Role: This role is the model's response.

Llama 3.1 models on Bedrock behave very similarly when called through the Converse API and use the same roles.

Additionally, Llama 3.1 now supports tool calling. It introduces a new `ipython` role, which marks messages that carry the output of a tool call when they are sent back to the model from the executor. With the Bedrock Converse API, we can focus on the JSON schema and the JSON output, rather than on the `ipython` role itself. For more information on the roles Llama 3.1 supports, see the [Llama 3.1 model card](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1).

## Prerequisites:
- AWS SDK for Python (Boto3): Ensure you have the Boto3 library installed.
- Amazon Bedrock Model ID: Obtain the model ID for the conversational AI model you wish to use.
- AWS Credentials: Configure your AWS credentials to authenticate API requests.

## Import necessary libraries

In the next cell, we set up our environment by importing boto3 and the other libraries needed to configure a Bedrock client, which we will use to invoke the Llama 3.1 model through the Amazon Bedrock Converse API.

In [1]:
import boto3
from boto3 import client
from botocore.config import Config
import json
import re
import logging
from botocore.exceptions import ClientError
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

In [2]:
config = Config(read_timeout=2000)
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name="us-west-2", config=config)

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


## Instantiate the Llama 3.1 model

In [3]:
meta_llama_3_70b = 'meta.llama3-1-70b-instruct-v1:0'

model_id = meta_llama_3_70b

## Prompting with Amazon Bedrock Converse API

The Amazon Bedrock Converse API can be used to invoke large language models on Amazon Bedrock. It abstracts away the special stop tokens and simplifies building conversational applications by handling the exchange of messages between users and Amazon Bedrock models.

**Key features include:**
- **Consistent Interface:**
The Converse API provides a uniform interface that works across all Amazon Bedrock models supporting messages. This consistency allows developers to write code once and use it with different models without needing to adjust for model-specific differences.

- **Turn-Based Conversations:**
The API supports multi-turn conversations, where a series of messages are exchanged between the user (acting as the user role) and the model (acting as the assistant role). This enables the development of chatbots and other conversational agents that can maintain context over multiple interactions.

- **Tool Use (Function Calling):**
The Converse API supports tool use, allowing models to request the invocation of external functions or tools. This is particularly useful for tasks that require interaction with external APIs or services. The model generates a JSON structure with the necessary parameters, which the calling application then uses to invoke the specified tool.
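
To make the turn-based feature concrete, here is a minimal sketch of keeping a conversation history in the message shape used throughout this notebook. The helper `add_turn` is hypothetical, and the check assumes that a conversation starts with a user turn and then alternates between user and assistant.

```python
def add_turn(history, role, text):
    """Append one turn, enforcing that roles alternate and start with 'user'."""
    if role not in ("user", "assistant"):
        raise ValueError(f"unsupported role: {role}")
    # Even positions (0, 2, ...) are user turns, odd positions are assistant turns.
    expected = "user" if len(history) % 2 == 0 else "assistant"
    if role != expected:
        raise ValueError(f"expected a {expected} turn, got {role}")
    history.append({"role": role, "content": [{"text": text}]})
    return history


history = []
add_turn(history, "user", "Name one AWS storage service.")
add_turn(history, "assistant", "Amazon S3.")
add_turn(history, "user", "What is it typically used for?")
print([turn["role"] for turn in history])
```

The resulting `history` list could be passed as the `messages` argument on each call, with the model's reply appended before the next user turn.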

## Prompting with the Bedrock Converse API

Much like the OpenAI Messages format, where you define roles to build a conversational interface, the Amazon Bedrock Converse API uses the same concept.

#### Prompting with OpenAI API

```
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a LLM?"}
  ]
)
```

After setting your model ID to the Llama 3.1 70B model, you can start using the Converse API by placing the prompt text inside the `content` field of a message.

The example below shows how to assign a role and add input for that role.

#### Prompting with Converse API on Bedrock

```
{
    "role": "user | assistant",
    "content": [
        {
            "text": "string"
        }
    ]
}
```
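
As a rough sketch of the migration, the snippet below converts an OpenAI-style message list into the Converse shape. It assumes the OpenAI messages carry plain string content; the helper name `to_converse` is hypothetical. System messages are collected separately because the Converse API accepts them through a top-level `system` parameter and not as a message role.

```python
def to_converse(openai_messages):
    """Split OpenAI-style messages into Converse `system` and `messages` lists."""
    system, messages = [], []
    for msg in openai_messages:
        block = {"text": msg["content"]}
        if msg["role"] == "system":
            # Converse takes system prompts as a separate top-level parameter.
            system.append(block)
        else:
            messages.append({"role": msg["role"], "content": [block]})
    return system, messages


system, messages = to_converse([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an LLM?"},
])
print(system)
print(messages)
```

The two lists could then be passed as `system=system` and `messages=messages` when calling `bedrock_client.converse`.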

In [4]:
# Define the message with content as a list
message = {
    "role": "user",
    "content": [
        {"text": "Write me a story of a fairytale fable"}
    ]
}


## Calling the Bedrock Converse API 

The Amazon Bedrock Converse API allows you to interact with conversational AI models. Below is a step-by-step guide on how to call this API using the bedrock_client.converse method in Python.

Parameters:
- modelId: This parameter specifies the ID of the conversational AI model you want to use. Replace model_id with your actual model ID.
- messages: This parameter contains the conversation history. It should be a list of message objects. Each message object typically includes the role (e.g., "user" or "assistant") and the content of the message. In this example, message is wrapped in a list to form the conversation history.
- inferenceConfig: This parameter configures the inference settings for the API call. It includes:
  - maxTokens: The maximum number of tokens (words or subwords) to generate in the response. In this example, it is set to 2000.
  - temperature: Controls the randomness of the output. Lower values make the output more deterministic. Here, it is set to 0, making the output as deterministic as possible.
  - topP: Controls nucleus sampling, where the model considers the smallest set of tokens whose cumulative probability is greater than or equal to topP. Here, it is set to 0.5, balancing between diversity and focus.

The bedrock_client.converse method is called with the specified parameters, and the response from the API call is stored in the response variable. For more information on calling the Converse API, see the [Amazon Bedrock Converse API documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html#conversation-inference-call).
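
The parameters above can also be gathered into a small builder so that every call uses the same defaults. This is a sketch only: `build_converse_request` is a hypothetical helper, and its range checks assume the 0 to 1 bounds listed for `temperature` and `topP` in the Converse API reference.

```python
def build_converse_request(model_id, user_text, max_tokens=2000,
                           temperature=0.0, top_p=0.5):
    """Return keyword arguments suitable for bedrock_client.converse(**request)."""
    for name, value in (("temperature", temperature), ("topP", top_p)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if max_tokens < 1:
        raise ValueError("maxTokens must be at least 1")
    return {
        "modelId": model_id,
        # A single user message wrapped in a list forms the conversation history.
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    }


request = build_converse_request(
    "meta.llama3-1-70b-instruct-v1:0", "Write me a story of a fairytale fable"
)
print(request["inferenceConfig"])
```

Calling `bedrock_client.converse(**request)` would then be equivalent to the explicit call in the next cell.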

In [5]:
response = bedrock_client.converse(
    modelId=model_id,
    messages=[message],  # Wrap the message in a list
    inferenceConfig={
        "maxTokens": 2000,
        "temperature": 0,
        "topP": .5
    }
)

In [6]:
# Print the response
print("Full Response:")
print(json.dumps(response, indent=4))

Full Response:
{
    "ResponseMetadata": {
        "RequestId": "6335488c-c096-4b4e-a316-08488ef85f55",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Wed, 14 Aug 2024 01:29:13 GMT",
            "content-type": "application/json",
            "content-length": "3618",
            "connection": "keep-alive",
            "x-amzn-requestid": "6335488c-c096-4b4e-a316-08488ef85f55"
        },
        "RetryAttempts": 0
    },
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "\n\nOnce upon a time, in a land far, far away, there was a tiny village nestled in the heart of a dense forest. The villagers lived simple lives, tending to their gardens, raising their animals, and relying on the forest for their livelihood.\n\nIn the center of the village stood an enormous tree, its trunk as wide as a house and its branches stretching up towards the sky like giant arms. The tree wa

## Code Generation: One-shot Prompting with Roles

The code below demonstrates how to structure a conversation for a code generation task, using a one-shot prompting approach with the Amazon Bedrock Converse API and Llama 3.1. Here's a breakdown of the process:

- **Conversation Structure:**
The chat_convo list contains a series of messages that alternate between user and assistant roles. This structure mimics a real conversation, providing context for the model.
- **Initial User Query with Example Code:**
The conversation starts with a user request for a specific coding task: uploading a file to Amazon RDS from a Lambda function. The same message carries a second content block containing an example script. Note that this example uploads to S3 rather than RDS, so the model has to adapt it instead of copying it, which highlights the importance of clear prompts and, for complex tasks, multiple conversation turns.
- **Assistant Agreement:**
The assistant agrees to help, which keeps the roles alternating.
- **User Context:**
The user provides additional context about the assistant's expertise.
- **Assistant Acknowledgment:**
The assistant acknowledges the user's statement, maintaining the conversational flow.
- **Final User Prompt:**
The user asks for help, which prompts the model to generate the code.
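
For reference, a common alternative layout for one-shot prompting places the worked example in its own user and assistant pair, followed by the real request. The sketch below is not the layout used in the next cell, and `one_shot_conversation` is a hypothetical helper.

```python
def one_shot_conversation(example_request, example_answer, new_request):
    """Build a one-shot prompt: one worked example, then the real request."""
    return [
        {"role": "user", "content": [{"text": example_request}]},
        # The example answer is presented as if the assistant had written it.
        {"role": "assistant", "content": [{"text": example_answer}]},
        {"role": "user", "content": [{"text": new_request}]},
    ]


convo = one_shot_conversation(
    "Write a Python function that uploads a file to Amazon S3.",
    "def upload_to_s3(file_name, bucket): ...",
    "Write a Python Lambda handler that stores a file in Amazon RDS.",
)
print([turn["role"] for turn in convo])
```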

In [7]:
chat_convo = [
    {
        "role": "user",
        "content": [{"text": "Can you write me a Python script to upload a file to Amazon RDS using a lambda function?"},
                    {"text": """
import boto3
from botocore.exceptions import NoCredentialsError

def upload_to_s3(file_name, bucket, object_name=None):
    # If S3 object_name was not specified, use file_name
    if object_name is None:
        object_name = file_name

    # Upload the file
    s3_client = boto3.client('s3')
    try:
        s3_client.upload_file(file_name, bucket, object_name)
        print(f"File {file_name} uploaded to {bucket}/{object_name}")
    except FileNotFoundError:
        print(f"The file {file_name} was not found")
    except NoCredentialsError:
        print("Credentials not available")

# Example usage
upload_to_s3('my_file.txt', 'my_bucket')
""" }]
    },
    {
        "role": "assistant",
        "content": [{"text": "Sure, I can help with that."}]
    },
    {
        "role": "user",
        "content": [{"text": "You are an expert at writing Python code for AWS services."}]
    },
    {
        "role": "assistant",
        "content": [{"text": "Thank you! If you have any more questions or need further assistance, feel free to ask."}]
    },
    {
        "role": "user",
        "content": [{"text": "Can you help with that?"}]
    }
]

In [8]:
response = bedrock_client.converse(
    modelId=model_id,
    messages=chat_convo,  # chat_convo is already a list of messages
    inferenceConfig={
        "maxTokens": 2000,
        "temperature": 0,
        "topP": .5
    }
)

In [9]:
# Extract the assistant's message
assistant_message = response['output']['message']['content'][0]['text']

# Print the response in a human-readable format
print("Assistant's Response:")
print(assistant_message.strip())

# Print the response metrics
print("\nResponse Metrics:")
print(f"Input Tokens: {response['usage']['inputTokens']}")
print(f"Output Tokens: {response['usage']['outputTokens']}")
print(f"Total Tokens: {response['usage']['totalTokens']}")
print(f"Latency: {response['metrics']['latencyMs']} ms")

Assistant's Response:
To upload a file to Amazon RDS using a Lambda function, you'll need to use the `boto3` library to interact with the RDS service. However, RDS is a relational database service, and it's not designed for storing files. Instead, you can store files in Amazon S3 and then use the Lambda function to upload the file to S3.

But if you want to store the file in RDS, you can store it as a BLOB (Binary Large OBject) in a database table. Here's an example of how you can modify the script to upload a file to RDS using a Lambda function:
```
import boto3
import psycopg2

def upload_to_rds(file_name, db_instance_identifier, db_name, db_username, db_password, table_name):
    # Create a connection to the RDS instance
    rds_client = boto3.client('rds')
    db_instance = rds_client.describe_db_instances(DBInstanceIdentifier=db_instance_identifier)
    db_endpoint = db_instance['DBInstances'][0]['Endpoint']['Address']
    db_port = db_instance['DBInstances'][0]['Endpoint']['Port'

In the cell above, we can see that Llama 3.1 produced a Python script for a Lambda function that loads a file into RDS, while also pointing out that S3 is usually the better fit for file storage. Guided by the example in the conversation, the model produced a response structured similarly to the provided Python script.
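
The `usage` and `metrics` fields printed above can be combined into a rough throughput figure. This is a sketch with hypothetical sample values; `latencyMs` is assumed to cover the whole request, so the result is an approximation, not a pure generation speed.

```python
def summarize_usage(response):
    """Derive a rough output-token throughput from a Converse response."""
    usage = response["usage"]
    latency_s = response["metrics"]["latencyMs"] / 1000
    rate = usage["outputTokens"] / latency_s if latency_s > 0 else None
    return {
        "input_tokens": usage["inputTokens"],
        "output_tokens": usage["outputTokens"],
        "output_tokens_per_second": rate,
    }


# Hypothetical values, shaped like the fields printed in the cell above.
sample = {
    "usage": {"inputTokens": 200, "outputTokens": 500, "totalTokens": 700},
    "metrics": {"latencyMs": 10000},
}
print(summarize_usage(sample))
```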

## Tool Calling with Llama 3.1

The Llama 3.1 Instruct models are recommended for applications combining chat conversation and tool calling. This works much as it does with OpenAI models, where an API call can describe functions and the model intelligently chooses to output a JSON object containing the arguments to call one or more of them. In both cases the API does not call the function; instead, the model generates JSON that you can use to call the function in your code.

---

### JSON Generation and Outputting

Llama 3.1 models have built-in capabilities to generate structured JSON outputs, which can be used for various applications, including tool calling and data extraction. This functionality is particularly useful for integrating AI models with other systems and APIs that require structured data.
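
Because the JSON is generated text, it can be worth checking it before handing it to another system. The sketch below checks only required fields and `enum` values, using a deliberately small hypothetical schema; `check_tool_input` is an illustrative helper, not a full JSON Schema validator.

```python
def check_tool_input(schema, payload):
    """Return a list of problems found when comparing payload to a schema subset."""
    problems = []
    for field in schema.get("required", []):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field in payload and "enum" in spec and payload[field] not in spec["enum"]:
            problems.append(f"{field} must be one of {spec['enum']}")
    return problems


# Hypothetical, trimmed-down schema in the same style as a toolSpec inputSchema.
schema = {
    "type": "object",
    "properties": {
        "priority_level": {"type": "integer"},
        "assigned_department": {"type": "string", "enum": ["Billing", "Sales"]},
    },
    "required": ["priority_level", "assigned_department"],
}
print(check_tool_input(schema, {"assigned_department": "Marketing"}))
```

An empty list means the payload passed these two checks; anything else can be logged or sent back to the model for correction.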

In [10]:
tools = [
    {
        "toolSpec": {
            "name": "create_support_ticket",
            "description": "Create a support ticket based on email content.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "ticket_id": {
                            "type": "string",
                            "description": "A unique identifier for the support ticket."
                        },
                        "issue_summary": {
                            "type": "string",
                            "description": "A brief summary of the issue described in the email."
                        },
                        "issue_description": {
                            "type": "string",
                            "description": "A detailed description of the issue."
                        },
                        "priority_level": {
                            "type": "integer",
                            "description": "The priority level of the issue on a scale from 1-5",
                            "minimum": 1,
                            "maximum": 5
                        },
                        "customer_contact_info": {
                            "type": "object",
                            "properties": {
                                "name": {
                                    "type": "string",
                                    "description": "The name of the customer."
                                },
                                "email": {
                                    "type": "string",
                                    "description": "The email address of the customer."
                                },
                                "phone": {
                                    "type": "string",
                                    "description": "The phone number of the customer."
                                }
                            }
                        },
                        "assigned_department": {
                            "type": "string",
                            "description": "The department to which the ticket should be assigned.",
                            "enum": ["Technical Support", "Customer Service", "Billing", "Sales"]
                        },
                        "attachments": {
                            "type": "array",
                            "description": "An array of file attachments related to the issue.",
                            "items": { "type": "string" }
                        },
                        "related_tickets": {
                            "type": "array",
                            "description": "An array of related ticket IDs.",
                            "items": { "type": "string" }
                        }
                    },
                    "required": [
                        "ticket_id",
                        "issue_summary",
                        "issue_description",
                        "priority_level",
                        "customer_contact_info",
                        "assigned_department"
                    ]
                }
            }
        }
    }
]

In [11]:
content = """Dear Support Team,

I am experiencing an issue with my account login. Every time I try to log in, I receive an error message saying "Invalid credentials." I have tried resetting my password multiple times, but the issue persists. This is causing a significant disruption to my work.

Please find attached screenshots of the error messages.

Regards,
Jane Doe
Email: janedoe@example.com
Phone: 123-456-7890
"""

# Define the message to be sent to the model
message = {
    "role": "user",
    "content": [
        { "text": f"<content>{content}</content>" },
        { "text": "Please use the create_support_ticket tool to generate a support ticket JSON based on the content within the <content> tags." }
    ],
}

In [12]:
response = bedrock_client.converse(
    modelId=model_id,
    messages=[message],
    inferenceConfig={
        "maxTokens": 2000,
        "temperature": 0
    },
    toolConfig={
        "tools": tools  # Correct parameter name and value
    }
)

# Print the entire response for debugging
print("Full Response:")
print(json.dumps(response, indent=4))

# Process the response
response_message = response['output']['message']
response_content_blocks = response_message['content']

# In this run the tool call is returned as text tagged with <|python_tag|>
# rather than as a 'toolUse' block, so look for that tag in the text blocks
function_call_block = next((block for block in response_content_blocks if 'text' in block and '<|python_tag|>' in block['text']), None)

if function_call_block:
    # Extract the JSON string from the function call block
    json_string = function_call_block['text'].split('<|python_tag|>')[1]
    
    # Print the raw JSON string for debugging
    print("Raw JSON string:")
    print(json_string)
    
    # Try to clean up the JSON string
    try:
        # Remove any leading/trailing whitespace
        json_string = json_string.strip()
        # Use regex to extract the JSON object
        match = re.search(r'\{.*\}', json_string, re.DOTALL)
        if match:
            json_string = match.group(0)
        
        # Replace escaped quotes with actual quotes
        json_string = json_string.replace('\\"', '"')
        
        # Parse the JSON string
        tool_result_dict = json.loads(json_string)
        print("\nParsed JSON:")
        print(json.dumps(tool_result_dict, indent=4))
    except json.JSONDecodeError as e:
        print(f"\nError parsing JSON: {e}")
        print("JSON string causing the error:")
        print(json_string)
else:
    print("No function call block found.")

Full Response:
{
    "ResponseMetadata": {
        "RequestId": "aa4a57c3-64ca-49a4-8527-e8ffdb1c4278",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "date": "Wed, 14 Aug 2024 01:29:35 GMT",
            "content-type": "application/json",
            "content-length": "906",
            "connection": "keep-alive",
            "x-amzn-requestid": "aa4a57c3-64ca-49a4-8527-e8ffdb1c4278"
        },
        "RetryAttempts": 0
    },
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": "\n\n<|python_tag|>{\"type\":\"function\",\"name\":\"create_support_ticket\",\"parameters\":{\"customer_contact_info\":{\"name\":\"Jane Doe\",\"phone\":\"123-456-7890\",\"email\":\"janedoe@example.com\"},\"issue_description\":\"I am experiencing an issue with my account login. Every time I try to log in, I receive an error message saying \\\\\"Invalid credentials.\\\\\" I have tried resetting my passwo

### Output 

Here we can see that Llama 3.1 used the JSON schema supplied in toolConfig to generate the JSON for the support ticket.
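
As an alternative to the regular expression used above, the JSON object after `<|python_tag|>` can be read with `json.JSONDecoder.raw_decode`, which parses one JSON value and ignores any trailing text. This is a sketch on a hypothetical, well-formed sample string; it does not repair malformed escaping such as the doubled backslashes seen in the actual output, and `parse_python_tag_call` is an illustrative name.

```python
import json

PYTHON_TAG = "<|python_tag|>"


def parse_python_tag_call(text):
    """Return the JSON object that follows <|python_tag|>, or None if absent."""
    _, tag, rest = text.partition(PYTHON_TAG)
    if not tag:
        return None
    start = rest.find("{")
    if start == -1:
        return None
    # raw_decode parses one JSON value and ignores anything after it.
    obj, _ = json.JSONDecoder().raw_decode(rest[start:])
    return obj


# Hypothetical model output with trailing text after the JSON object.
sample = (
    '\n\n<|python_tag|>{"type": "function", "name": "create_support_ticket", '
    '"parameters": {"priority_level": 3}}<|eom_id|>'
)
call = parse_python_tag_call(sample)
print(call["name"], call["parameters"])
```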

## Llama 3.1 Tool usage with Converse API

Amazon Bedrock's Converse API allows you to create conversational applications that can interact with Llama 3.1 models for various purposes, including tool usage. This functionality enables the model to call external tools or APIs to fetch real-time data or perform specific tasks, enhancing the model's capabilities.
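
When the model requests a tool, the request arrives as a `toolUse` content block, and it is not guaranteed to be the first block in the message. The sketch below collects every such block from a response; the sample response and its `toolUseId` are hypothetical, and `find_tool_uses` is an illustrative helper.

```python
def find_tool_uses(response):
    """Collect every toolUse block from a Converse response, in order."""
    blocks = response["output"]["message"]["content"]
    return [block["toolUse"] for block in blocks if "toolUse" in block]


# Hypothetical response fragment with a text block before the tool request.
sample_response = {
    "output": {"message": {"role": "assistant", "content": [
        {"text": "Let me look that up."},
        {"toolUse": {
            "toolUseId": "tooluse_example",
            "name": "movie_showtimes",
            "input": {"theater": "Cineplex", "movie": "Inception"},
        }},
    ]}},
    "stopReason": "tool_use",
}

for tool_use in find_tool_uses(sample_response):
    print(tool_use["name"], tool_use["input"])
```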

In [13]:
# Define the tool configuration
toolConfig = {
    "tools": [
        {
            "toolSpec": {
                "name": "movie_showtimes",
                "description": "Fetches movie showtimes for a specified theater and movie.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "theater": {
                                "type": "string",
                                "description": "The theater name."
                            },
                            "movie": {
                                "type": "string",
                                "description": "The movie title."
                            }
                        },
                        "required": ["theater", "movie"]
                    }
                }
            }
        }
    ]
}

In [14]:
def movie_showtimes(theater, movie):
    showtimes = {
        "Cineplex": {
            "Inception": ["14:00", "17:30", "21:00"],
            "The Godfather": ["15:00", "19:00"],
            "Pulp Fiction": ["16:30", "20:30"]
        },
        "AMC": {
            "Inception": ["13:30", "16:45", "20:15"],
            "The Godfather": ["14:45", "18:30"],
            "Pulp Fiction": ["15:15", "19:45"]
        }
    }
    return showtimes.get(theater, {}).get(movie, "No showtimes found")

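# Note: despite the "8b" in its name, this helper calls whichever model_id
# was set earlier in the notebook (the 70B model in this run).
def prompt_llama3_1_8b(prompt):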
    messages = [{"role": "user", "content": [{"text": prompt}]}]
    converse_api_params = {
        "modelId": model_id,
        "messages": messages,
        "toolConfig": toolConfig,  
        "inferenceConfig": {"temperature": 0.0, "maxTokens": 400},
    }

    response = bedrock_client.converse(**converse_api_params)
    
    if response['output']['message']['content'][0].get('toolUse'):
        tool_use = response['output']['message']['content'][0]['toolUse']
        tool_name = tool_use['name']
        tool_inputs = tool_use['input']

        if tool_name == "movie_showtimes":
            print("Llama 3.1 wants to use the movie_showtimes tool")
            theater = tool_inputs["theater"]
            movie = tool_inputs["movie"]
            
            try:
                result = movie_showtimes(theater, movie)
                print(f"Showtimes for {movie} at {theater}:", result)
            except ValueError as e:
                print(f"Error: {str(e)}")

    else:
        print("Llama 3.1 responded with:")
        print(response['output']['message']['content'][0]['text'])

In [15]:
prompt_llama3_1_8b("What are the showtimes for Inception at Cineplex?")

Llama 3.1 wants to use the movie_showtimes tool
Showtimes for Inception at Cineplex: ['14:00', '17:30', '21:00']
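
The cell above stops once the tool has run locally. To let the model phrase a final answer, the tool output is typically sent back in a follow-up user message containing a `toolResult` block. The sketch below only builds that message; the `toolUseId` value is hypothetical, and `tool_result_message` is an illustrative helper.

```python
def tool_result_message(tool_use_id, result):
    """Wrap a tool's output in a user-role toolResult block."""
    return {
        "role": "user",
        "content": [{
            "toolResult": {
                # Must match the toolUseId from the model's toolUse block.
                "toolUseId": tool_use_id,
                # The json block holds an object, so list results are wrapped in a dict.
                "content": [{"json": result}],
            }
        }],
    }


follow_up = tool_result_message(
    "tooluse_example", {"showtimes": ["14:00", "17:30", "21:00"]}
)
print(follow_up)
```

A second `converse` call would then pass the original user message, the assistant message containing the `toolUse` block, and this follow-up message, together with the same `toolConfig`.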


---
## Distributors
- Amazon Web Services
- Meta