### Testing throughput and latency with Provisioned Throughput

Throughput refers to the number and rate of inputs and outputs that a model processes and returns. You can purchase Provisioned Throughput to provision a higher level of throughput for a model at a fixed cost.

When you purchase Provisioned Throughput, you specify a number of Model Units (MUs). Each MU delivers a specific throughput level for the chosen model. That level is defined as the number of input and output tokens the MU can process across all requests within one minute.
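Capacity is expressed per minute, so sizing a Provisioned Throughput is a ceiling division: the tokens per minute you need, divided by what one MU handles. Below is a minimal sketch. It assumes you have looked up the per-MU input and output token capacity for your model. The `required_model_units` helper and all numbers in it are hypothetical, not published Bedrock quotas.

```python
import math


def required_model_units(
    input_tpm_needed: float,
    output_tpm_needed: float,
    input_tpm_per_mu: float,
    output_tpm_per_mu: float,
) -> int:
    """Smallest number of MUs that covers both input- and output-token demand.

    Hypothetical helper: the per-MU tokens-per-minute (TPM) figures must come
    from the Bedrock console or documentation for your specific model.
    """
    if input_tpm_per_mu <= 0 or output_tpm_per_mu <= 0:
        raise ValueError("per-MU capacity must be positive")
    units_for_input = math.ceil(input_tpm_needed / input_tpm_per_mu)
    units_for_output = math.ceil(output_tpm_needed / output_tpm_per_mu)
    # A Provisioned Throughput always has at least one MU.
    return max(1, units_for_input, units_for_output)


# Made-up example: 60 requests/min, ~150 input and 300 output tokens per
# request, and a hypothetical capacity of 20,000 input TPM and 10,000 output
# TPM per MU.
print(required_model_units(60 * 150, 60 * 300, 20_000, 10_000))  # → 2
```

Taking the maximum of the two ceilings ensures that whichever side (input or output tokens) is the bottleneck determines the MU count.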

<div align="center">
    <img src="image/provisioned_throughput.png" />
</div>

<div align="center">
    <img src="image/provisioned_throughput_details.png" />
</div>

```python
# ARN of the Provisioned Throughput to call (account ID masked)
modelId = 'arn:aws:bedrock:us-east-1:70768*******:provisioned-model/nnp8ar503q42'
```

To run inference against a Provisioned Throughput, pass its ARN as the `modelId`. See the code samples at https://docs.aws.amazon.com/bedrock/latest/userguide/prov-thru-code-samples.html.
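If you do not have the ARN at hand, the Bedrock control-plane API can list your Provisioned Throughputs. The sketch below assumes a boto3 `bedrock` client (note: `bedrock`, not `bedrock-runtime`) and permission to call `ListProvisionedModelThroughputs`. The `in_service_provisioned_models` helper is hypothetical. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
from typing import Any


def in_service_provisioned_models(bedrock_client: Any) -> list[dict]:
    """List name, ARN and MU count of every InService Provisioned Throughput.

    `bedrock_client` is expected to be boto3.client("bedrock", region_name=...).
    """
    models = []
    request = {"statusEquals": "InService"}
    while True:
        page = bedrock_client.list_provisioned_model_throughputs(**request)
        for summary in page.get("provisionedModelSummaries", []):
            models.append({
                "name": summary["provisionedModelName"],
                "arn": summary["provisionedModelArn"],
                "model_units": summary["modelUnits"],
            })
        # Results are paginated; keep requesting until no nextToken is returned.
        next_token = page.get("nextToken")
        if not next_token:
            return models
        request["nextToken"] = next_token


# Usage (requires AWS credentials):
# import boto3
# for m in in_service_provisioned_models(boto3.client("bedrock", region_name="us-east-1")):
#     print(m["name"], m["arn"], m["model_units"])
```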

```python
import logging

import boto3
import json

max_tokens_to_sample = 200

system_message = f"You are a long and high-quality story teller. Make the story longer than {max_tokens_to_sample}"

messages = [
  {"role": "user", "content": """
  Rex and Charlie were best friends who did everything together. 
  They lived next door to each other with their human families and spent all day playing in the backyard. 
  Rex was a golden retriever, always happy and eager for fun. Charlie was a German shepherd, more serious but very loyal.  
  Every morning, Rex and Charlie would wake up and bark excitedly, ready to start the day's adventures. 
  Their families would let them out into the backyard and they'd run around chasing each other and sniffing for interesting smells. 
  After tiring themselves out, they'd nap in the shade of the big oak tree, Rex's tail still thumping contentedly even in his sleep. 
  """
  }
]
```


```python
print(system_message)
print(messages)
```

You are a long and high-quality story teller. Make the story longer than 200
[{'role': 'user', 'content': "\n  Rex and Charlie were best friends who did everything together. \n  They lived next door to each other with their human families and spent all day playing in the backyard. \n  Rex was a golden retriever, always happy and eager for fun. Charlie was a German shepherd, more serious but very loyal.  \n  Every morning, Rex and Charlie would wake up and bark excitedly, ready to start the day's adventures. \n  Their families would let them out into the backyard and they'd run around chasing each other and sniffing for interesting smells. \n  After tiring themselves out, they'd nap in the shade of the big oak tree, Rex's tail still thumping contentedly even in his sleep. \n  "}]


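```python
# Create a Boto3 client for the Bedrock Runtime API in the same Region as the
# Provisioned Throughput ARN (us-east-1)
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

# Request and response content types for invoke_model
accept = 'application/json'
contentType = 'application/json'
```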

```python
# Invoke the Provisioned Throughput with an Anthropic Messages API request body
response = bedrock_client.invoke_model(
    modelId=modelId,
    accept=accept,
    contentType=contentType,
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "messages": messages,
        "system": system_message,
        "max_tokens": 300,  # upper bound on the number of generated tokens
        "temperature": 0.7,
        "top_p": 0.9})
)

# Parse the JSON response body and print the generated text
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```

Here is a longer version of the story about Rex and Charlie, the best friend dogs:

Rex and Charlie were best friends who did everything together. They lived next door to each other with their human families and spent all day, every day playing in the backyard. Rex was a golden retriever, always happy and eager for fun and adventure. Charlie was a German shepherd, more serious and focused but extremely loyal to his goofy pal.

Every morning, before the sun was fully up, Rex and Charlie would wake up and start barking excitedly, ready to begin the day's adventures and shenanigans. Their families would stumble out of bed, smiling at the pups' boundless energy, and let them out into the big backyard. Rex and Charlie would burst through the dog doors and immediately break into a spirited game of chase, running circles around the yard, kicking up grass and dirt. 

After wearing themselves out from their morning zoomies, they'd plop down in the cool shade of the giant oak tree in the corner 
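The reply above ends mid-sentence. That is what you would expect if generation hit the `max_tokens` limit of 300. The Messages response body reports this in `stop_reason`. It also returns token counts under `usage`, which are more useful than the text when measuring throughput. The sketch below uses a hypothetical `summarize_response` helper. Its field names follow the Anthropic Messages response format on Bedrock.

```python
def summarize_response(response_body: dict, elapsed_s: float) -> dict:
    """Extract text, stop reason and token usage from an Anthropic Messages
    response body, plus output tokens per second for the measured call."""
    usage = response_body.get("usage", {})
    output_tokens = usage.get("output_tokens", 0)
    # Concatenate all text blocks in the reply.
    text = "".join(
        block.get("text", "")
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )
    return {
        "text": text,
        "stop_reason": response_body.get("stop_reason"),  # e.g. "end_turn" or "max_tokens"
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": output_tokens,
        "output_tokens_per_s": output_tokens / elapsed_s if elapsed_s > 0 else float("nan"),
    }


# Usage: time the call and the body read together, e.g.
# start = time.perf_counter()
# response = bedrock_client.invoke_model(...)
# response_body = json.loads(response['body'].read())
# summary = summarize_response(response_body, time.perf_counter() - start)
```

Because the elapsed time includes the network round trip and prompt processing, `output_tokens_per_s` is an end-to-end figure rather than pure decoding speed.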

### Load testing with Locust

```
!pip install locust
```
```python
%%writefile locustfile.py

import json
import logging

import boto3
from locust import User, task

modelId = 'arn:aws:bedrock:us-east-1:70768*******:provisioned-model/nnp8ar503q42'
max_tokens_to_sample = 200

system_message = f"You are a long and high-quality story teller. Make the story longer than {max_tokens_to_sample}"

messages = [
  {"role": "user", "content": """
  Rex and Charlie were best friends who did everything together. 
  They lived next door to each other with their human families and spent all day playing in the backyard. 
  Rex was a golden retriever, always happy and eager for fun. Charlie was a German shepherd, more serious but very loyal.  
  Every morning, Rex and Charlie would wake up and bark excitedly, ready to start the day's adventures. 
  Their families would let them out into the backyard and they'd run around chasing each other and sniffing for interesting smells. 
  After tiring themselves out, they'd nap in the shade of the big oak tree, Rex's tail still thumping contentedly even in his sleep. 
  """
  }
]

# One Bedrock Runtime client shared by all simulated users, in the same
# Region as the Provisioned Throughput ARN
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name='us-east-1')

# Request and response content types for invoke_model
accept = 'application/json'
contentType = 'application/json'


class LLMUser(User):
    # No wait_time is set, so each user sends its next request as soon as
    # the previous one returns.

    @task
    def generation(self):
        # Time the whole call as "[Send] Prompt" in Locust's stats; an
        # exception raised inside the block marks the request as failed.
        with self.environment.events.request.measure("[Send]", "Prompt"):
            response = bedrock_client.invoke_model(
                modelId=modelId,
                accept=accept,
                contentType=contentType,
                body=json.dumps({
                    "anthropic_version": "bedrock-2023-05-31",
                    "messages": messages,
                    "system": system_message,
                    "max_tokens": 300,
                    "temperature": 0.7,
                    "top_p": 0.9})
            )

            # Read the body inside the block so it counts toward response time
            response_body = json.loads(response['body'].read())
            print(response_body['content'][0]['text'])

        logging.info("Finished generation!")
```

Overwriting locustfile.py


Locust is configured here with command-line options (see https://docs.locust.io/en/stable/configuration.html). The options used below are:

- `--users <int>`: Peak number of concurrent Locust users. Primarily used together with `--headless` or `--autostart`.
- `--headless`: Disable the web interface and start the test immediately.
- `--csv <prefix>`: Store request stats to files in CSV format, using the given path prefix.
- `--spawn-rate <float>`: Rate at which to spawn users (users per second).

In this example, `--users 30` sets the peak number of concurrent users to 30, and `--spawn-rate 30` starts them at 30 users per second. Because the spawn rate equals the number of users, all 30 users start within about the first second instead of ramping up gradually. `LLMUser` defines no `wait_time`, so each user sends its next request as soon as the previous one returns. As a result, up to 30 requests are in flight at any time during the test.
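Two back-of-the-envelope numbers follow from these settings. First, ramp-up takes `users / spawn_rate` seconds. Second, in steady state a closed loop of users with no wait time completes about `users / average_latency` requests per second (Little's law). The sketch below assumes latency stays roughly constant under load, and both helpers are hypothetical. The 8.935 s latency in the example is the average of the first 30 requests in the run output further down, used only for illustration.

```python
def ramp_up_seconds(users: int, spawn_rate: float) -> float:
    """Seconds Locust needs to start `users` users at `spawn_rate` users/s."""
    return users / spawn_rate


def expected_requests_per_second(users: int, avg_latency_s: float, wait_time_s: float = 0.0) -> float:
    """Steady-state request rate of `users` closed-loop users (Little's law).

    Each user finishes one request every (latency + wait time) seconds.
    """
    return users / (avg_latency_s + wait_time_s)


print(ramp_up_seconds(30, 30))  # → 1.0
print(ramp_up_seconds(30, 10))  # → 3.0
print(round(expected_requests_per_second(30, 8.935), 2))  # → 3.36
```

Comparing this estimate with the req/s column that Locust reports is a quick consistency check on a run.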

The `--run-time` option sets how long the test runs. A bare number is read as seconds, and values such as `2m` or `1h30m` also work. The command below runs for 120 seconds; the recorded run in the next cell used `--run-time 240` instead.

```
!locust --headless --users 30 --spawn-rate 30 --run-time 120 --csv ./benchmark_metric/benchmark_u30
```
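With `--csv ./benchmark_metric/benchmark_u30`, Locust writes several files that use that prefix. One of them is `benchmark_u30_stats.csv`, which has one row per request name plus an `Aggregated` row. The sketch below pulls the headline numbers out of it. The column names are assumed from the Locust 2.x stats header, so check them against your own file. The `read_locust_stats` helper is hypothetical.

```python
import csv
from typing import Iterable


def read_locust_stats(lines: Iterable[str], name: str = "Aggregated") -> dict:
    """Return headline numbers for one row of a Locust *_stats.csv file.

    Column names assume the Locust 2.x header; adjust them if yours differ.
    """
    for row in csv.DictReader(lines):
        if row["Name"] == name:
            return {
                "requests": int(row["Request Count"]),
                "failures": int(row["Failure Count"]),
                "avg_ms": float(row["Average Response Time"]),
                "p95_ms": float(row["95%"]),
                "requests_per_s": float(row["Requests/s"]),
            }
    raise KeyError(f"no row named {name!r}")


# Usage after the test has finished:
# with open("./benchmark_metric/benchmark_u30_stats.csv", newline="") as f:
#     print(read_locust_stats(f, name="Prompt"))
```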

```
!locust --headless --users 30 --spawn-rate 30 --run-time 240 --csv ./benchmark_metric/benchmark_u30
```

[2024-04-23 03:10:30,658] ip-172-16-187-106.ec2.internal/INFO/locust.main: Run time limit set to 240 seconds
[2024-04-23 03:10:30,658] ip-172-16-187-106.ec2.internal/INFO/locust.main: Starting Locust 2.26.0
Type     Name  # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------||-------|-------------|-------|-------|-------|-------|--------|-----------
--------||-------|-------------|-------|-------|-------|-------|--------|-----------
         Aggregated       0     0(0.00%) |      0       0       0      0 |    0.00        0.00

[2024-04-23 03:10:30,659] ip-172-16-187-106.ec2.internal/INFO/locust.runners: Ramping to 30 users at a rate of 30.00 per second
[2024-04-23 03:10:41,086] ip-172-16-187-106.ec2.internal/INFO/root: Finished generation!
Type     Name  # reqs      # fails |    Avg     Min     Max    Med |   req/s  failures/s
--------||-------|-------------|-------|-------|-------|-------|--------|-----------
[Send]   Prompt      30     0(0.00%) |   8935  