## Bedrock embedding with Cohere: Load testing

In [63]:
import boto3
import json

# Create a Boto3 client for Bedrock Runtime
bedrock_client = boto3.client(service_name='bedrock-runtime')

def split_into_chunks(file_path, chunk_size):
    with open(file_path, 'r') as file:
        content = file.read()
        chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
        return chunks

# Example usage
file_path = 'doc.txt'
chunk_size = 2048  # Specify the desired chunk size

chunks = split_into_chunks(file_path, chunk_size)
print("Chunks lens: ", len(chunks))
print("Chunks content first item: ", chunks[0])
body = json.dumps({'texts': chunks, 'input_type': "search_document"})

modelId = 'cohere.embed-multilingual-v3'
accept = "*/*"
contentType = 'application/json'


Chunks lens:  20
Chunks content first item:  We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.

Customers across diverse industries from finance to travel and hospitality to healthcare to consumer tech
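The body built above packs all 20 chunks into a single request. Cohere Embed on Bedrock limits how many texts one call may carry; the sketch below assumes a cap of 96 texts per request, which should be checked against the current model documentation. A longer document would then need to be split into several request bodies. `MAX_TEXTS_PER_CALL`, `batch_texts`, and `build_bodies` are hypothetical helper names, not part of the notebook.

```python
import json

# Assumption: per-request cap on the number of texts; verify in the Bedrock docs.
MAX_TEXTS_PER_CALL = 96

def batch_texts(texts, batch_size=MAX_TEXTS_PER_CALL):
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

def build_bodies(texts, input_type="search_document"):
    """Return one JSON request body per batch, in the shape used above."""
    return [
        json.dumps({"texts": batch, "input_type": input_type})
        for batch in batch_texts(texts)
    ]

# Example with placeholder strings: 200 texts become 3 request bodies.
bodies = build_bodies([f"chunk {i}" for i in range(200)])
print(len(bodies))
```

Each body could then be passed to `invoke_model` in turn; for the 20 chunks in this notebook a single body is enough.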

In [64]:
# Invoke the model
response = bedrock_client.invoke_model(
    modelId=modelId,
    accept=accept,
    contentType=contentType,
    body=body
)

# Process the response
response_body = json.loads(response.get('body').read())
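The cell above parses the response but does not look at it. As a quick sanity check, a sketch like the following reports how many vectors came back and their dimension. It assumes the parsed body carries an `embeddings` key holding one list of floats per input text, which is how the Cohere Embed response is understood to be shaped; `summarize_embeddings` is a hypothetical helper and the sample body is made up.

```python
def summarize_embeddings(response_body):
    """Return (number of vectors, vector dimension) from a parsed response body."""
    # Assumption: "embeddings" is a list of float lists, one per input text.
    embeddings = response_body.get("embeddings", [])
    count = len(embeddings)
    dim = len(embeddings[0]) if embeddings else 0
    return count, dim

# Made-up stand-in for a real response, just to show the shape.
fake_body = {"embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]}
print(summarize_embeddings(fake_body))  # (2, 3)
```

On the real `response_body`, the count should equal `len(chunks)`, since one vector is expected per chunk.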

### Testing throughput and latency with Locust

In [59]:
!pip install locust



In [60]:
%%writefile locustfile.py

from locust import User, task, between
import logging

import boto3
import json

# Create a Boto3 client for Bedrock Runtime
bedrock_client = boto3.client(service_name='bedrock-runtime')

def split_into_chunks(file_path, chunk_size):
    with open(file_path, 'r') as file:
        content = file.read()
        chunks = [content[i:i+chunk_size] for i in range(0, len(content), chunk_size)]
        return chunks

# Example usage
file_path = 'doc.txt'
chunk_size = 2048  # Specify the desired chunk size

chunks = split_into_chunks(file_path, chunk_size)
print("Chunks lens: ", len(chunks))
print("Chunks content first item: ", chunks[0])
body = json.dumps({'texts': chunks, 'input_type': "search_document"})

modelId = 'cohere.embed-multilingual-v3'
accept = "*/*"
contentType = 'application/json'

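class LLMUser(User):
    # No wait_time is set, so each simulated user sends requests back to back.
    @task
    def generation(self):
        # measure() times the block and reports it to Locust as one request
        # with type "[Send]" and name "Prompt".
        with self.environment.events.request.measure("[Send]", "Prompt"):
            response = bedrock_client.invoke_model(
                modelId=modelId,
                accept=accept,
                contentType=contentType,
                body=body
            )
            # Read and parse the body inside the timed block so the measured
            # latency includes downloading the embeddings.
            response_body = json.loads(response.get('body').read())

        logging.info("Finished generation!")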


Overwriting locustfile.py


The test runs are configured with Locust command-line options (full list: https://docs.locust.io/en/stable/configuration.html). The ones used here:

--users <int> Peak number of concurrent Locust users. Primarily used together with --headless or --autostart.
    
--headless Disable the web interface, and start the test immediately.
    
--csv Store request stats to files in CSV format.

--spawn-rate <float> Rate to spawn users at (users per second)

In this example, --users sets the peak number of concurrent users to 30 and --spawn-rate sets the ramp-up to 30 users per second. Because the spawn rate equals the user count, all 30 users are started within roughly the first second, and the test then holds at 30 concurrent users.

The --run-time option sets the duration of the test. A bare number is interpreted as seconds, so with --run-time 120 the test stops after 120 seconds.

!locust --headless --users 30 --spawn-rate 30 --run-time 120 --csv ./benchmark_metric/benchmark_u30

In [62]:
!locust --headless --users 10 --spawn-rate 10 --run-time 60 --csv ./benchmark_metric/benchmark_u10

Chunks lens:  20
Chunks content first item:  We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.

Customers across diverse industries from finance to travel and hospitality to healthcare to consumer tech
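With --csv, Locust writes its results to files that share the given prefix, including a `<prefix>_stats.csv` summary. A minimal sketch for pulling throughput and latency for the "Prompt" request out of that file follows. The column names (`Name`, `Request Count`, `Failure Count`, `Average Response Time`, `Requests/s`) are assumed to match what recent Locust releases write and should be checked against the actual file header; `summarize_stats` is a hypothetical helper, and the sample rows are made up, not measured results.

```python
import csv
import io

def summarize_stats(csv_text, name="Prompt"):
    """Return request count, failures, mean latency (ms) and requests/s for one row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Name"] == name:
            return {
                "requests": int(row["Request Count"]),
                "failures": int(row["Failure Count"]),
                "avg_ms": float(row["Average Response Time"]),
                "rps": float(row["Requests/s"]),
            }
    return None  # no row with that name

# Made-up sample in the assumed layout (only the columns used here).
sample = (
    "Type,Name,Request Count,Failure Count,Average Response Time,Requests/s\n"
    "[Send],Prompt,100,2,850.5,1.67\n"
    ",Aggregated,100,2,850.5,1.67\n"
)
print(summarize_stats(sample))

# On a real run (path taken from the command above):
# with open("./benchmark_metric/benchmark_u10_stats.csv") as f:
#     print(summarize_stats(f.read()))
```

Comparing the `rps` and `avg_ms` values between the 10-user and 30-user runs gives a first view of how throughput and latency change with concurrency.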