# Step 2 - Optimize Prompt Using Bedrock Prompt Optimizer

## Introduction

In this step, we'll tackle a critical aspect of successful model migration: **prompt optimization**. Different models respond best to different prompt structures and instructions. Simply reusing your existing prompts with new models often leads to suboptimal performance.

### Why Prompt Optimization Matters

When migrating between models, especially across different model families (e.g., from OpenAI to Claude or Llama), the same prompt can produce dramatically different results.

### Bedrock Prompt Optimizer

For this workshop, we'll use [Amazon Bedrock Prompt Optimizer](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-management-optimize.html), an automated tool that:

1. Analyzes your source prompt structure and intent
2. Reformats it to best leverage the target model's capabilities
3. Optimizes instructions for clarity and effectiveness
4. Preserves the core functionality while improving performance

This can save hours of manual prompt engineering and often yields more reliable results. Let's see it in action!

## Setting Up Our Environment

First, we'll import the necessary libraries for working with AWS services, data manipulation, and prompt optimization. These tools will enable us to interact with Bedrock services and maintain our evaluation tracking system from the previous step.


In [2]:
import boto3
import random
import json
import sys
import pandas as pd
from IPython.display import display


## Retrieving Our Progress

Let's load our tracking dataframe from Step 1, which contains information about our source model and the candidate models we'll be evaluating. This dataframe will serve as our central repository for all evaluation metrics throughout the workshop.


In [8]:

### load our tracking df

evaluation_tracking_file = '../data/evaluation_tracking.csv'
evaluation_tracking = pd.read_csv(evaluation_tracking_file)
display(evaluation_tracking)

Unnamed: 0,model,model_clean_name
0,source_model,source_model
1,amazon.nova-lite-v1:0,amazon.nova-lite-v1-0
2,us.anthropic.claude-3-5-haiku-20241022-v1:0,us.anthropic.claude-3-5-haiku-20241022-v1-0


## Understanding the Source Prompt

### Analyzing Our Starting Point

Before we can optimize prompts for our candidate models, we need to understand the prompt structure used by our source model. This prompt defines:

1. How we present input context to the model
2. What specific task we're asking the model to perform
3. Any constraints or formatting requirements

Let's first prepare our tracking dataframe to store prompts for each model, then examine the source model's prompt:


In [9]:
## Let's prepare prompts for each model
evaluation_tracking['text_prompt'] = ''
evaluation_tracking['region'] = 'us-east-1' ## set it default to us-east-1
evaluation_tracking['inference_profile'] = 'standard' ## standard or optimized

In [10]:
## add the source_model prompt

raw_prompt = """
First, please read the article below.
{context}
 Now, can you write me an extremely short abstract for it?
"""


evaluation_tracking.loc[evaluation_tracking['model'] == 'source_model', 'text_prompt'] = raw_prompt
display(evaluation_tracking)

Unnamed: 0,model,model_clean_name,text_prompt,region,inference_profile
0,source_model,source_model,"\nFirst, please read the article below.\n{cont...",us-east-1,standard
1,amazon.nova-lite-v1:0,amazon.nova-lite-v1-0,,us-east-1,standard
2,us.anthropic.claude-3-5-haiku-20241022-v1:0,us.anthropic.claude-3-5-haiku-20241022-v1-0,,us-east-1,standard
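
The `{context}` placeholder in `raw_prompt` is an ordinary Python format field, so a complete prompt can be built with `str.format`. The following is a minimal sketch of that substitution; `sample_article` is a made-up string for illustration and is not part of the workshop data.

```python
# Source prompt template, copied from the cell above
raw_prompt = """
First, please read the article below.
{context}
 Now, can you write me an extremely short abstract for it?
"""

# Hypothetical article text, used only for illustration
sample_article = "Researchers measured how prompt wording changes summary quality."

# str.format replaces the {context} field with the article text
full_prompt = raw_prompt.format(context=sample_article)
print(full_prompt)
```

Keeping the placeholder name identical across all prompts means the same `format(context=...)` call should work for every row of the tracking dataframe.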


## Automated Prompt Optimization

Below, we'll define a function that calls the Bedrock Prompt Optimizer API to transform our source prompt for each candidate model. This function:

- Takes a target model ID and our source prompt as inputs
- Calls the Bedrock Prompt Optimizer API
- Processes the response to extract the optimized prompt
- Formats the result for use in our evaluation


In [11]:
def optimize_prompt(target_model_id, prompt, region, inference_profile):
    """Return the prompt optimized for target_model_id, or None on failure.

    Relies on the module-level bedrock-agent-runtime `client`, which must be
    created before this function is called. `region` and `inference_profile`
    mirror the tracking dataframe columns but are not used by the API call.
    """
    # Create input structure
    input_structure = {
        "textPrompt": {
            "text": prompt
        }
    }
    try:
        # Make the API call
        response = client.optimize_prompt(
            input=input_structure,
            targetModelId=target_model_id
        )
    except Exception as e:
        print(f"Error calling optimize_prompt API: {e}")
        return None

    # Print request ID for reference
    print("Request ID:", response.get("ResponseMetadata", {}).get("RequestId"))

    # Process the response stream and extract the optimized prompt
    try:
        for event in response['optimizedPrompt']:
            if 'optimizedPromptEvent' in event:
                print(f"========================== OPTIMIZED PROMPT FOR MODEL {target_model_id} ======================\n")
                optimized_text = event['optimizedPromptEvent']['optimizedPrompt']['textPrompt']['text']

                # The text is decoded with json.loads, as in the original workshop code
                optimized_prompt = json.loads(optimized_text)

                ## we will not use the prompt variables, because we need the full prompts for llm-as-a-judge evaluation in step 3
                optimized_prompt = optimized_prompt.replace("{{", "{").replace("}}", "}")
                return optimized_prompt
    except Exception as e:
        print(f"Error processing response stream: {e}")
        return None

    # The stream ended without an optimizedPromptEvent
    print(f"No optimized prompt returned for model {target_model_id}")
    return None
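
The post-processing inside the function (decode the JSON-encoded text, then turn `{{variable}}` into `{variable}`) can be tried in isolation without calling Bedrock. The sketch below is an illustration only: `normalize_optimized_text` is a hypothetical helper name, the sample strings are invented, and the fallback to the raw text when decoding fails is an added assumption rather than something the workshop code does.

```python
import json


def normalize_optimized_text(raw_text):
    """Decode a JSON-encoded prompt string and convert {{var}} to {var}."""
    try:
        text = json.loads(raw_text)
    except json.JSONDecodeError:
        # Assumption: if the text is not valid JSON, use it unchanged
        text = raw_text
    if not isinstance(text, str):
        # json.loads may return a number, list, etc.; keep the raw text then
        text = raw_text
    return text.replace("{{", "{").replace("}}", "}")


# Invented example of a JSON-encoded prompt with a double-brace variable
encoded = r'"Summarize the article.\n{{context}}"'
template = normalize_optimized_text(encoded)
print(template)

# The result is a regular Python format string
print(template.format(context="Some article text"))
```

One caveat with this approach: if an optimized prompt contains other literal braces (for example, a JSON snippet in its instructions), a later `str.format` call would raise an error, so such prompts would need their literal braces escaped first.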



### Applying Optimization to Our Candidate Models

Now that we've defined our optimization function, let's apply it to each candidate model in our tracking dataframe. For each model, we'll:

1. Extract the model ID (removing the cross-region prefix where present)
2. Send our source prompt to the Prompt Optimizer API
3. Store the optimized prompt in our tracking dataframe

This process will automatically tailor our summarization prompt for each model's specific capabilities and preferred instruction format.


In [12]:
## let's get prompt optimized for our candidate models
client = boto3.client('bedrock-agent-runtime')

for index, row in evaluation_tracking.iterrows():
    model_id = row['model']
    if model_id == "source_model":
        continue
    if model_id.startswith("us."):
        model_id = model_id[len("us."):] ## us. prefix is for cross-region inference
    optimized_prompt = optimize_prompt(model_id, raw_prompt, row['region'], row['inference_profile'])
    # Only store a result when the optimizer returned one
    if optimized_prompt is not None:
        evaluation_tracking.loc[evaluation_tracking['model'] == row['model'], 'text_prompt'] = optimized_prompt
    
    
#display(evaluation_tracking)
    

Request ID: 35f59fb8-253f-4d9d-9a6a-bfc3f45c0678

Request ID: 4c1a402a-1403-4dc2-8ff0-38a762bb087e
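
Cross-region model IDs carry a geographic prefix such as `us.`, and that prefix should be removed only when it appears at the very start of the ID. The sketch below shows one way to do this; `strip_region_prefix` is a hypothetical helper, and the third model ID is invented to show why a substring check would be unsafe.

```python
def strip_region_prefix(model_id, prefixes=("us.",)):
    """Remove a leading cross-region prefix from a model ID, if present."""
    for prefix in prefixes:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


# Model IDs from the tracking dataframe
print(strip_region_prefix("us.anthropic.claude-3-5-haiku-20241022-v1:0"))
print(strip_region_prefix("amazon.nova-lite-v1:0"))

# Invented ID that contains "us." in the middle; it is left untouched,
# whereas str.replace("us.", "") would have corrupted it
print(strip_region_prefix("vendor.nexus.model-v1:0"))
```

Other geographic prefixes could be supplied through the `prefixes` argument if your account uses them.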



In [13]:
# check optimized prompt

for index, row in evaluation_tracking.iterrows():
    print(f"model: {row['model']}")
    print(f"prompt: \n{row['text_prompt']}")
    print("\n" + "="*80)

model: source_model
prompt: 

First, please read the article below.
{context}
 Now, can you write me an extremely short abstract for it?


model: amazon.nova-lite-v1:0
prompt: 
## Instruction
Your task is to read the given article in the {context} section and provide an extremely short abstract or summary of the key points in 1-2 concise sentences.

To produce a high-quality abstract, please follow these guidelines:

### Summarization Guidelines
- Read the article carefully to fully understand its content and main ideas.
- Identify the core topics, arguments, and essential information presented.
- Synthesize the key points into 1-2 brief sentences that capture the essence of the article.
- Use clear and concise language to convey the main takeaways succinctly.
- Avoid extraneous details and focus solely on the critical concepts.

### Article to Summarize
{context}

Please provide your extremely short abstract immediately without any preamble:

model: us.anthropic.claude-3-5-haiku-20241

## Saving Our Progress

With our prompts optimized for each model, let's save our updated tracking dataframe. This will ensure our optimized prompts are available for the next steps in our evaluation process.

In the upcoming notebooks, we'll use these optimized prompts to:
1. Generate responses from each candidate model
2. Evaluate response quality
3. Compare results across models to guide our migration decision

In [14]:
## saving the tracking

evaluation_tracking.to_csv(evaluation_tracking_file, index=False)
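
Because later steps depend on these prompts, it may be worth checking that each one is non-empty and still contains the `{context}` placeholder. The following is a minimal sketch using only the standard library; `placeholder_names` and `is_usable_prompt` are hypothetical helper names, not part of the workshop code.

```python
from string import Formatter


def placeholder_names(template):
    """Return the set of format-field names used in a template string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def is_usable_prompt(template, required=("context",)):
    """True if template is a non-empty string containing all required fields."""
    if not isinstance(template, str) or not template.strip():
        return False  # covers None, NaN, and empty strings
    try:
        names = placeholder_names(template)
    except ValueError:
        return False  # unbalanced braces would break str.format later
    return set(required) <= names


print(is_usable_prompt("Summarize this article:\n{context}"))
print(is_usable_prompt(""))
print(is_usable_prompt("Summarize the article."))
```

With the tracking dataframe, the same check could be applied as `evaluation_tracking['text_prompt'].apply(is_usable_prompt)`. Note that pandas reads empty CSV fields back as `NaN` by default, which is why the helper rejects non-string values.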

## Summary and Next Steps

In this notebook, we've successfully:

1. ✅ **Analyzed** our source model's prompt structure
2. ✅ **Leveraged** Bedrock Prompt Optimizer to automatically adapt our prompt for each candidate model
3. ✅ **Compared** the differences in prompt structure and instruction style between models
4. ✅ **Prepared** optimized prompts for our evaluation process

### Key Takeaways

- Different models respond best to different prompt structures
- Automated tools like Bedrock Prompt Optimizer can significantly reduce the effort required for prompt migration
- Effective prompts often include clear task definitions, structured instructions, and specific formatting guidance
- Optimal prompts can vary dramatically between models, even for identical tasks

### Important Considerations for Real-World Applications

For this workshop, we've simplified the prompt optimization process to focus on the core evaluation workflow. In production scenarios, remember that:

- **Prompt optimization is iterative**: The first optimized prompt is rarely the final one. In practice, you would:
  - Test multiple prompt variations
  - Analyze performance on different data samples
  - Refine based on error patterns 
  
- **Consider hybrid approaches**:
  - Start with automated optimization 
  - Follow with human-in-the-loop refinement by domain experts
  - Incorporate few-shot examples for complex tasks
  - Add specific instructions to address error patterns

- **Implement continuous optimization**:
  - Establish a prompt version control system
  - Regularly test prompt variations as models are updated
  - Create an A/B testing framework for prompt comparison
  - Document which prompts work best for which scenarios

These practices help your prompts evolve alongside model capabilities and your business needs, maintaining performance over time. To learn more about prompt optimization techniques, see [Improve the performance of your Generative AI applications with Prompt Optimization on Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/improve-the-performance-of-your-generative-ai-applications-with-prompt-optimization-on-amazon-bedrock/).
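
As one illustration of the prompt version control idea, each prompt can be given an identifier derived from its content, so that any change to the text produces a new version. This is a rough sketch under stated assumptions: the in-memory `registry` dictionary and the helper names are hypothetical, and a real setup would more likely store versions in a file, a database, or a source control system.

```python
import hashlib


def prompt_version(model_id, prompt_text):
    """Build a version ID from the model ID and a hash of the prompt text."""
    digest = hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]
    return f"{model_id}@{digest}"


def register_prompt(registry, model_id, prompt_text, note=""):
    """Store the prompt under its version ID and return that ID."""
    version = prompt_version(model_id, prompt_text)
    # setdefault keeps the first entry if the same prompt is registered twice
    registry.setdefault(
        version, {"model": model_id, "prompt": prompt_text, "note": note}
    )
    return version


registry = {}
v1 = register_prompt(registry, "amazon.nova-lite-v1:0", "Summarize:\n{context}", "baseline")
v2 = register_prompt(registry, "amazon.nova-lite-v1:0", "Summarize in 1-2 sentences:\n{context}", "shorter output")
print(v1)
print(v2)
print(len(registry))
```

Recording the version ID next to each evaluation result makes it possible to trace a score back to the exact prompt text that produced it.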

In the next notebook, we'll use these optimized prompts to generate responses from our candidate models and measure their performance characteristics, particularly focusing on **latency metrics**.