SageMaker Endpoint Load Testing

SageMaker Endpoint Load Testing using Locust

Setup Instructions:

  1. Create a Python virtual environment or conda environment, or use the system Python on your machine
  2. From your terminal, install the dependencies:
    $ pip install locust or $ pip install -r requirements.txt
  3. Set your parameters in config.py (a sketch of such a file follows this list)
  4. Start the Locust server:
    $ locust --host=http://localhost:8080 --locustfile=locustfile.py
  5. Open the Locust web UI at http://localhost:8089/
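
The repository keeps the test parameters in config.py. A minimal sketch of what such a file could look like (the variable names and values below are assumptions for illustration, not the repository's actual contents):

    # config.py (hypothetical example; the real parameter names may differ)
    REGION = "us-east-1"                  # AWS region hosting the endpoint
    ENDPOINT_NAME = "my-sm-endpoint"      # SageMaker endpoint to load test
    CONTENT_TYPE = "text/csv"             # content type the model container expects
    SAMPLE_PAYLOAD = "5.1,3.5,1.4,0.2"    # request body each simulated user sends

The locustfile can then import these values so that the endpoint name, payload, and content type only need to be changed in one place.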

This document contains the instructions for load testing and calibrating a SageMaker model endpoint using Locust, along with pointers to help pick an optimal auto scaling strategy.

Locust Resources:

In the Locust UI, we need to set two main parameters:

  • Number of users: the number of simulated users testing your application. Each user opens a TCP connection to your application and sends requests against it.
  • Hatch (spawn) rate: how many users are added to the current pool each second until the total number of users is reached. Each time a user is hatched, Locust calls its on_start method if one is defined in locustfile.py (sketched below).
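
A minimal sketch of a user class with an on_start hook (the class name, header, and request path below are placeholders, not taken from this repository):

    from locust import HttpUser, task

    class EndpointUser(HttpUser):
        def on_start(self):
            # Runs once per simulated user, right after it is hatched;
            # a good place for one-time setup such as default headers.
            self.client.headers.update({"Content-Type": "text/csv"})

        @task
        def invoke(self):
            # Sends one request against the host passed via --host.
            self.client.post("/invocations", data="5.1,3.5,1.4,0.2")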

Example:

  • Number of users: 1000
  • Hatch rate: 10
  • Every second, 10 users are added to the current pool, starting from 0, so after 100 seconds you will have all 1000 users. Locust tries to distribute the load evenly across users.

Requests per second (RPS), or throughput, indicates the number of transactions per second your application can handle. Response time, or latency, is the time from the moment a user sends a request until the application indicates that the request has completed.

Overall throughput tends to decrease as the response time of an average transaction increases. The reason is that after sending the 1st request, each Locust user waits (blocking) until that request completes before sending the 2nd request.
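
As a rough steady-state estimate (an assumption for illustration, not a formula from this repository), each user's throughput is bounded by its response time plus any wait time between requests:

    # Back-of-the-envelope upper bound on throughput, assuming each user
    # sends requests sequentially and the endpoint is not yet saturated.
    users = 1000          # concurrent simulated users
    avg_latency_s = 0.2   # average response time per request (seconds)
    wait_time_s = 1.0     # per-user wait between requests (seconds), if any

    approx_rps = users / (avg_latency_s + wait_time_s)
    print(round(approx_rps))  # ~833 requests per second in this scenario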

In the Locust UI, we can track the following statistics:

  • Requests — Total number of requests made so far
  • Fails — Number of requests that have failed
  • Median — Median (50th percentile) response time in ms
  • 90%ile — 90th percentile response time in ms
  • Average — Average response time in ms
  • Min — Minimum response time in ms
  • Max — Maximum response time in ms
  • Average bytes — Average response size in bytes
  • Current RPS — Current requests per second
  • Current Failures/s — Current number of failures per second

In Locust, each simulated user does the following:

  1. Pick one of the tasks from your locustfile
  2. Run the task (execute that task function)
  3. Pick a random wait time between min_wait and max_wait (specified in your locustfile.py; see the sketch after this list)
  4. Wait that amount of time
  5. Repeat from 1
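
A minimal locustfile sketch of this loop (the payloads and task weights are placeholders; note that recent Locust versions express min_wait/max_wait through a wait_time attribute instead):

    from locust import HttpUser, task, between

    class EndpointUser(HttpUser):
        # Steps 3-4: wait a random 1-2 seconds between task executions.
        # (Older Locust releases used min_wait/max_wait attributes for this.)
        wait_time = between(1, 2)

        # Steps 1-2: Locust picks one of the @task methods (weighted 3:1
        # here) and executes it, then the loop repeats.
        @task(3)
        def predict_small(self):
            self.client.post("/invocations", data="5.1,3.5,1.4,0.2")

        @task(1)
        def predict_large(self):
            self.client.post("/invocations", data="6.2,3.4,5.4,2.3")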
