# Comparing AWS Bedrock Foundation Model Outputs - Claude Sonnet vs. Claude Haiku
* Notebook by Adam Lang
* Date: 2/27/2025

# Overview
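1. Head-to-head test of the outputs of Claude-3.5-Sonnet vs. Claude-3-Haiku. Note that the Sonnet model ID used in the code below (`anthropic.claude-3-sonnet-20240229-v1:0`) is the Claude 3 Sonnet ID; substitute a Claude 3.5 Sonnet ID if that is the model you want to test.
2. Generating related keywords for a list of input keywords. The goal is similar to a nearest-neighbors search, so the outputs are related terms and not only direct synonyms.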


# Install Dependencies

In [8]:
%%capture
!pip install boto3

In [23]:
import boto3
import json
import asyncio
import random
import pandas as pd 
from botocore.exceptions import ClientError
from IPython.display import display, Markdown
import nest_asyncio

nest_asyncio.apply()

In [10]:
# Set up Bedrock client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='<your region here>'  # Replace with your region
)

In [None]:
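## init boto session and confirm that credentials resolve
session = boto3.Session()
credentials = session.get_credentials()
if credentials is None:
    print("No AWS credentials found")
else:
    # Mask both keys so secrets do not end up in saved notebook output
    print(f"Access Key: {credentials.access_key[:4]}{'*' * (len(credentials.access_key) - 4)}")
    print(f"Secret Key: {'*' * len(credentials.secret_key)}")
print(f"Region: {session.region_name}")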

# Class to run API call to Claude on AWS Bedrock
* We now generate related words instead of just synonyms.
* We can now process a list of keywords and generate related words for each.
* The class also includes a built-in method, `generate_comparisons`, that collects the outputs of `Claude-3.5-Sonnet` and `Claude-3 Haiku` for the same keywords.
* The comparison metrics live in the `compare_outputs` method of the `ModelComparator` class in section 2, which we can extend. Here are some suggestions for additional metrics:
    1. Token count: You can implement a function to count tokens in the responses.
    2. Response time: Measure the time taken by each model to generate responses.
    3. Unique words: Count the number of unique words in each model's output.
    4. Semantic similarity: Use a pre-trained word embedding model (e.g., Word2Vec, GloVe) to calculate the average semantic similarity between the input keyword and the generated words.
    5. Part-of-speech diversity: Analyze the diversity of parts of speech in the generated words.
    6. Sentiment analysis: Compare the sentiment of the generated words between models.
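
Two of the suggested metrics (response time and unique words) need nothing beyond the standard library. The sketch below is one possible way to compute them, plus a simple set-overlap score between two word lists. The helper names `timed`, `normalize`, `unique_word_count`, and `jaccard_overlap` are hypothetical and are not part of the classes in this notebook.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def normalize(words):
    """Lowercase, strip whitespace and trailing periods, drop empty strings."""
    cleaned = (w.strip().rstrip(".").lower() for w in words)
    return {w for w in cleaned if w}


def unique_word_count(words):
    """Number of distinct words after normalization."""
    return len(normalize(words))


def jaccard_overlap(words_a, words_b):
    """Share of distinct words that appear in both lists (0.0 to 1.0)."""
    a, b = normalize(words_a), normalize(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Example with hand-written lists (not model output)
overlap, seconds = timed(jaccard_overlap, ["Honesty", "ethics"], ["honesty", "Virtue"])
```

Normalizing case and trailing punctuation before comparing matters here, because the two models do not always capitalize or punctuate their lists the same way.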


* Notes:
1. You can change `n`, which defaults to 10. To change the default, edit this method in the class:
   * `async def get_related_words(self, word, n=10, model_id=None):`
2. The model `temperature` is set to 0.7 to introduce some randomness. Move it closer to 0 to make the output more deterministic, or closer to 1 to make it more diverse.
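
To make the two knobs above easier to experiment with, the request body can be built by a small function that takes `n` and `temperature` as arguments. This is only a sketch: `build_request_body` is a hypothetical helper, and the field names simply mirror the body used in the class below.

```python
import json


def build_request_body(word, n=10, temperature=0.7, max_tokens=300):
    """Build the JSON request body for one keyword (hypothetical helper)."""
    prompt = (
        f'Generate {n} related words for the keyword "{word}".\n'
        "Include synonyms and words that might be found in a nearest neighbor search.\n"
        "Provide only the related words as a comma-separated list, "
        "without any additional text or explanation."
    )
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # lower = more deterministic
        "top_p": 1,
    })


# A more deterministic request that asks for 5 words
body = build_request_body("Integrity", n=5, temperature=0.2)
```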

# 1. Generate Related Words from both Claude-3.5-Sonnet and Claude-3-Haiku
* We will demo Claude-3 Haiku on Bedrock and then we will compare it to Sonnet.

In [24]:
## related words generator
class RelatedWordsGenerator:
    def __init__(self, region_name='<your region here>'):
        self.bedrock = boto3.client('bedrock-runtime', region_name=region_name)
        self.claude_sonnet = 'anthropic.claude-3-sonnet-20240229-v1:0' ## sonnet model
        self.claude_haiku = 'anthropic.claude-3-haiku-20240307-v1:0' ## haiku model


    async def invoke_with_retry(self, model_id, body, max_retries=5, initial_delay=1):
        for attempt in range(max_retries):
            try:
                response = await asyncio.to_thread(
                    self.bedrock.invoke_model,
                    modelId=model_id,
                    body=body
                )
                return response
            except ClientError as e:
                if e.response['Error']['Code'] == 'ThrottlingException':
                    # Exponential backoff with jitter, scaled by initial_delay
                    delay = initial_delay * (2 ** attempt) + random.uniform(0, 1)
                    print(f"Request throttled. Retrying in {delay:.2f} seconds...")
                    await asyncio.sleep(delay)
                else:
                    raise
        raise Exception("Max retries reached")


    async def get_related_words(self, word, n=10, model_id=None):
        if model_id is None:
            model_id = self.claude_sonnet

        prompt = f"""Generate {n} related words for the keyword "{word}".
        Include synonyms and words that might be found in a nearest neighbor search.
        Provide only the related words as a comma-separated list, without any additional text or explanation."""

        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 300,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.7,
            "top_p": 1,
        })

        try:
            response = await self.invoke_with_retry(model_id, body)
            response_body = json.loads(response['body'].read())
            related_words = response_body['content'][0]['text'].strip().split(', ')
            return related_words[:n]  # Ensure we return at most n related words
        except Exception as e:
            print(f"Error in API call: {str(e)}")
            return []


    async def process_keyword_list(self, keywords, n=10, model_id=None):
        results = {}
        for keyword in keywords:
            related_words = await self.get_related_words(keyword, n, model_id)
            results[keyword] = related_words
        return results


    async def display_related_words(self, results):
        for keyword, related_words in results.items():
            markdown_output = f"## Related words for \"{keyword}\"\n\n" + "\n".join(f"- {word}" for word in related_words)
            display(Markdown(markdown_output))

    async def generate_comparisons(self, keywords, n=10):
        sonnet_results = await self.process_keyword_list(keywords, n, self.claude_sonnet)
        haiku_results = await self.process_keyword_list(keywords, n, self.claude_haiku)
        return sonnet_results, haiku_results
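
`get_related_words` splits the response on `', '`, so a trailing period or a missing space after a comma ends up inside a word. A more tolerant parser could look like the sketch below. `parse_related_words` is a hypothetical helper, not something the class currently calls, and it assumes the model returns a comma-separated or newline-separated list.

```python
import re


def parse_related_words(text, n=10):
    """Split a model response into at most n cleaned words.

    Splits on commas or newlines, then strips whitespace and
    trailing periods and drops empty entries.
    """
    parts = re.split(r"[,\n]", text)
    words = [p.strip().rstrip(".").strip() for p in parts]
    return [w for w in words if w][:n]


# Example with a hand-written response string (not real model output)
words = parse_related_words("honesty, ethics,principles, probity.")
```

Swapping this in would change the stored words slightly (for example, no trailing periods), so token estimates and similarity scores computed from them would shift as well.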

## Utilize RelatedWordsGenerator

In [25]:
# init keyword generator
generator = RelatedWordsGenerator()
## setup list of keywords
keywords = ["Integrity", "Innovation", "Customer Focus", 
            "Excellence", "Collaboration", "Respect", 
            "Accountability", "Sustainability", "Agility", 
            "Community", "AI"]


In [26]:
## function to create dataframe from LLM results dictionary 
def create_dataframe(sonnet_results, haiku_results):
    data = []
    for keyword in sonnet_results.keys():
        sonnet_words = sonnet_results[keyword]
        haiku_words = haiku_results[keyword]
        
        # Ensure both lists have the same length
        max_length = max(len(sonnet_words), len(haiku_words))
        sonnet_words = sonnet_words + [''] * (max_length - len(sonnet_words))
        haiku_words = haiku_words + [''] * (max_length - len(haiku_words))
        
        for i in range(max_length):
            data.append({
                'Keyword': keyword,
                'Model': 'Sonnet',
                'Related Word': sonnet_words[i],
                'Word Index': i + 1
            })
            data.append({
                'Keyword': keyword,
                'Model': 'Haiku',
                'Related Word': haiku_words[i],
                'Word Index': i + 1
            })
    
    df = pd.DataFrame(data)
    return df

# Generate the results
sonnet_results, haiku_results = await generator.generate_comparisons(keywords)

# Create the output DataFrame
df = create_dataframe(sonnet_results, haiku_results)

# Display the DataFrame
print(df)

# You can also save the results to a CSV file if needed
# df.to_csv('related_words_comparison.csv', index=False)

       Keyword   Model                 Related Word  Word Index
0    Integrity  Sonnet                      honesty           1
1    Integrity   Haiku                      Honesty           1
2    Integrity  Sonnet                       ethics           2
3    Integrity   Haiku                     Morality           2
4    Integrity  Sonnet                   principles           3
..         ...     ...                          ...         ...
215         AI   Haiku  natural language processing           8
216         AI  Sonnet                deep learning           9
217         AI   Haiku                     big data           9
218         AI  Sonnet                    cognitive          10
219         AI   Haiku                    analytics          10

[220 rows x 4 columns]


In [60]:
df.head()

     Keyword   Model Related Word  Word Index
0  Integrity  Sonnet      honesty           1
1  Integrity   Haiku      Honesty           1
2  Integrity  Sonnet       ethics           2
3  Integrity   Haiku     Morality           2
4  Integrity  Sonnet   principles           3


In [30]:
## results as a pandas pivot table
def create_pivot_table(sonnet_results, haiku_results):
    data = []
    for keyword in sonnet_results.keys():
        sonnet_words = sonnet_results[keyword]
        haiku_words = haiku_results[keyword]
        
        # Ensure both lists have the same length
        max_length = max(len(sonnet_words), len(haiku_words))
        sonnet_words = sonnet_words + [''] * (max_length - len(sonnet_words))
        haiku_words = haiku_words + [''] * (max_length - len(haiku_words))
        
        for i in range(max_length):
            data.append({
                'Keyword': keyword,
                'Sonnet': sonnet_words[i],
                'Haiku': haiku_words[i],
                'Word Index': i + 1
            })
    
    df = pd.DataFrame(data)
    
    # Create pivot table
    pivot_df = df.pivot(index='Keyword', columns='Word Index', values=['Sonnet', 'Haiku'])
    
    # Flatten column multi-index
    pivot_df.columns = [f'{col[0]}_{col[1]}' for col in pivot_df.columns]
    
    return pivot_df

# # Generate the results
# sonnet_results, haiku_results = await generator.generate_comparisons(keywords)

# Create the pivot table
pivot_df = create_pivot_table(sonnet_results, haiku_results)

# Display the pivot table
print(pivot_df)

# You can also save it to a CSV file if needed
# pivot_df.to_csv('related_words_comparison_pivot.csv')

                               Sonnet_1          Sonnet_2  \
Keyword                                                     
AI              artificial intelligence  machine learning   
Accountability           Responsibility     Answerability   
Agility                      Nimbleness         Dexterity   
Collaboration                  Teamwork       Cooperation   
Community                  neighborhood           society   
Customer Focus           Client-centric   User Experience   
Excellence                   Perfection           Quality   
Innovation                   Creativity         Invention   
Integrity                       honesty            ethics   
Respect                           honor           dignity   
Sustainability              Environment      eco-friendly   

                           Sonnet_3               Sonnet_4  \
Keyword                                                      
AI                  neural networks             algorithms   
Accountability      

In [69]:
pivot_df.T.to_csv('keywords_claude_vs_haiku.csv',index=True)

# 2. Compare Sonnet vs. Haiku model outputs
* For measuring token counts, the exact tokenizer used by the Claude 3 models is not publicly available, so the counts here are estimates.
* As a rough approximation we use `tiktoken` with the `cl100k_base` encoding. This is an OpenAI encoding, not Claude's, so the estimated counts (and the costs derived from them) can differ from what Bedrock actually bills.
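
An alternative to estimating is to read the token counts the service reports. As far as I know, the Anthropic Messages response body returned through Bedrock includes a `usage` object with `input_tokens` and `output_tokens`. The sketch below assumes that shape and falls back gracefully if it is missing. `extract_usage` and `cost_from_usage` are hypothetical helpers, and the rates are passed in by the caller, so no pricing is assumed here.

```python
def extract_usage(response_body):
    """Return (input_tokens, output_tokens) from a parsed response body.

    Assumes the body carries a "usage" dict; returns (None, None) if not.
    """
    usage = response_body.get("usage") or {}
    return usage.get("input_tokens"), usage.get("output_tokens")


def cost_from_usage(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD given per-token rates supplied by the caller."""
    return input_tokens * input_rate + output_tokens * output_rate


# Example with a hand-written body (not real model output)
fake_body = {
    "content": [{"type": "text", "text": "honesty, ethics"}],
    "usage": {"input_tokens": 40, "output_tokens": 6},
}
in_tok, out_tok = extract_usage(fake_body)
```

In this notebook, `response_body` would be the dictionary produced by `json.loads(response['body'].read())` inside `get_related_words`, which currently keeps only the text.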

In [50]:
%%capture 
!pip install sentence-transformers nltk scikit-learn tiktoken



In [51]:
## download nltk modules
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/sagemaker-user/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/sagemaker-user/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /home/sagemaker-
[nltk_data]     user/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [53]:
## additional imports 
import time
from collections import Counter
import numpy as np

## NLP imports
from nltk import pos_tag, word_tokenize
from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import tiktoken ## we will use this instead of the nltk tokenizer

In [65]:
## model comparison with totals
class ModelComparator:
    def __init__(self):
        self.sentence_transformer = SentenceTransformer('all-MiniLM-L6-v2')
        self.stop_words = set(stopwords.words('english'))
        self.claude_sonnet = 'anthropic.claude-3-sonnet-20240229-v1:0'
        self.claude_haiku = 'anthropic.claude-3-haiku-20240307-v1:0'
        self.tokenizer = tiktoken.get_encoding("cl100k_base")

    def estimate_tokens(self, text):
        return len(self.tokenizer.encode(text))

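    def calculate_token_cost(self, token_count, model):
        # Per-token USD rates as set in this notebook; check current Bedrock
        # pricing, which bills input and output tokens at different rates
        rates = {
            self.claude_sonnet: 0.00001563,
            self.claude_haiku: 0.00000735
        }
        return token_count * rates[model]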

    def calculate_cosine_similarity(self, text1, text2):
        embedding1 = self.sentence_transformer.encode([text1])[0]
        embedding2 = self.sentence_transformer.encode([text2])[0]
        return cosine_similarity([embedding1], [embedding2])[0][0]

    def compare_outputs(self, sonnet_results, haiku_results):
        comparison_results = {}
        total_sonnet_tokens = 0
        total_haiku_tokens = 0
        total_sonnet_cost = 0
        total_haiku_cost = 0
        total_cosine_similarity = 0

        for keyword in sonnet_results.keys():
            sonnet_words = sonnet_results[keyword]
            haiku_words = haiku_results[keyword]

            sonnet_text = " ".join(sonnet_words)
            haiku_text = " ".join(haiku_words)

            # Estimated token counts
            sonnet_estimated_tokens = self.estimate_tokens(sonnet_text)
            haiku_estimated_tokens = self.estimate_tokens(haiku_text)

            # Estimated cost, derived from the estimated token counts
            sonnet_cost = self.calculate_token_cost(sonnet_estimated_tokens, self.claude_sonnet)
            haiku_cost = self.calculate_token_cost(haiku_estimated_tokens, self.claude_haiku)

            # Cosine similarity
            cosine_sim = self.calculate_cosine_similarity(sonnet_text, haiku_text)

            # Update totals
            total_sonnet_tokens += sonnet_estimated_tokens
            total_haiku_tokens += haiku_estimated_tokens
            total_sonnet_cost += sonnet_cost
            total_haiku_cost += haiku_cost
            total_cosine_similarity += cosine_sim

            comparison_results[keyword] = {
                "sonnet": {
                    "words": sonnet_words,
                    "estimated_tokens": sonnet_estimated_tokens,
                    "estimated_cost": sonnet_cost,
                },
                "haiku": {
                    "words": haiku_words,
                    "estimated_tokens": haiku_estimated_tokens,
                    "estimated_cost": haiku_cost,
                },
                "cosine_similarity": cosine_sim
            }

        num_keywords = len(sonnet_results)
        avg_cosine_similarity = total_cosine_similarity / num_keywords

        summary = {
            "total_sonnet_tokens": total_sonnet_tokens,
            "total_haiku_tokens": total_haiku_tokens,
            "total_sonnet_cost": total_sonnet_cost,
            "total_haiku_cost": total_haiku_cost,
            "avg_cosine_similarity": avg_cosine_similarity
        }

        return comparison_results, summary

    def display_comparison(self, comparison_results, summary):
        print("Detailed Comparison:")
        for keyword, results in comparison_results.items():
            print(f"\nKeyword: {keyword}")
            print(f"Sonnet words: {results['sonnet']['words']}")
            print(f"Haiku words: {results['haiku']['words']}")
            print(f"Sonnet estimated tokens: {results['sonnet']['estimated_tokens']}")
            print(f"Haiku estimated tokens: {results['haiku']['estimated_tokens']}")
            print(f"Sonnet estimated cost: ${results['sonnet']['estimated_cost']:.6f}")
            print(f"Haiku estimated cost: ${results['haiku']['estimated_cost']:.6f}")
            print(f"Cosine similarity: {results['cosine_similarity']:.4f}")

        print("\nSummary:")
        print(f"Total Sonnet tokens: {summary['total_sonnet_tokens']}")
        print(f"Total Haiku tokens: {summary['total_haiku_tokens']}")
        print(f"Total Sonnet cost: ${summary['total_sonnet_cost']:.6f}")
        print(f"Total Haiku cost: ${summary['total_haiku_cost']:.6f}")
        print(f"Average cosine similarity: {summary['avg_cosine_similarity']:.4f}")
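
For readers who want to see what the cosine similarity score measures, here is a minimal pure-Python sketch of the formula, dot(u, v) / (|u| * |v|). It is not the scikit-learn implementation used in the class, and `cosine_sim` is a hypothetical name. Note that `compare_outputs` joins each model's words into one string before embedding, so the score compares the two lists as a whole and not word by word.

```python
import math


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings
same_direction = cosine_sim([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing the same way score close to 1 and orthogonal vectors score 0, which is why near-identical word lists such as the "Innovation" outputs score higher than lists with less overlap.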


In [66]:
## get results
comparator = ModelComparator()
comparison_results, summary = comparator.compare_outputs(sonnet_results, haiku_results)

# Display the detailed comparison and summary
comparator.display_comparison(comparison_results, summary)

Detailed Comparison:

Keyword: Integrity
Sonnet words: ['honesty', 'ethics', 'principles', 'morals', 'values', 'character', 'virtue', 'righteousness', 'uprightness', 'probity.']
Haiku words: ['Honesty', 'Morality', 'Ethics', 'Virtue', 'Principle', 'Sincerity', 'Trustworthiness', 'Uprightness', 'Credibility', 'Accountability']
Sonnet estimated tokens: 15
Haiku estimated tokens: 23
Sonnet estimated cost: $0.000234
Haiku estimated cost: $0.000169
Cosine similarity: 0.7981

Keyword: Innovation
Sonnet words: ['Creativity', 'Invention', 'Ingenuity', 'Advancement', 'Modernization', 'Novelty', 'Transformation', 'Disruption', 'Breakthrough', 'Pioneering.']
Haiku words: ['Creativity', 'Invention', 'Disruption', 'Transformation', 'Advancement', 'Modernization', 'Breakthrough', 'Ingenuity', 'Originality', 'Pioneering']
Sonnet estimated tokens: 21
Haiku estimated tokens: 20
Sonnet estimated cost: $0.000328
Haiku estimated cost: $0.000147
Cosine similarity: 0.9310

Keyword: Customer Focus
Sonnet wor