Additional load testing recommendations #12974

Merged
merged 5 commits into from
Jan 5, 2024
Changes from 2 commits
23 changes: 23 additions & 0 deletions docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,29 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
| Up to 50,000 | 6vCPU | 16 GB |
| Up to 80,000 | 6vCPU, with almost 90% CPU usage | 16 GB |

:::info This was the best-performing AWS setup we tested on EKS

- EC2 c5.2xlarge: 9.2 rps/node throughput
- EC2 c5.4xlarge: 19.5 rps/node throughput

You can always choose a larger compute-optimized instance, such as c5.4xlarge, with more CPUs per node to maximize throughput per node.

:::

| AWS                      | Rasa Pro                                     | Rasa Action Server                        |
|--------------------------|----------------------------------------------|-------------------------------------------|
| EC2: c5.2xlarge          | 3vCPU, 10 GB Memory, 3 Sanic Threads         | 3vCPU, 2 GB Memory, 3 Sanic Threads       |
| EC2: c5.4xlarge          | 7vCPU, 16 GB Memory, 7 Sanic Threads         | 7vCPU, 12 GB Memory, 7 Sanic Threads      |
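
The per-node throughput figures above can be turned into a rough node-count estimate. The sketch below assumes throughput adds up linearly across nodes, which the tests above do not confirm, and uses a hypothetical target of 100 rps; `nodes_needed` is an illustrative helper, not part of Rasa.

```python
import math


def nodes_needed(target_rps: float, rps_per_node: float) -> int:
    """Smallest node count whose combined throughput covers target_rps.

    Assumes throughput scales linearly with the number of nodes
    (an assumption, not something the load tests measured).
    """
    if target_rps < 0 or rps_per_node <= 0:
        raise ValueError("target_rps must be >= 0 and rps_per_node must be > 0")
    return math.ceil(target_rps / rps_per_node)


# Per-node throughput figures quoted in the info box above.
RPS_PER_NODE = {"c5.2xlarge": 9.2, "c5.4xlarge": 19.5}

TARGET_RPS = 100  # hypothetical peak load, chosen only for illustration

for instance, rps in RPS_PER_NODE.items():
    print(f"{instance}: {nodes_needed(TARGET_RPS, rps)} nodes for {TARGET_RPS} rps")
```

Treat the result as a starting point for a load test of your own assistant, not as a sizing guarantee.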

### Some recommendations to improve latency
- Running the action server as a sidecar saved about 100 ms on average per round trip to the action server in our tests. Results may vary depending on the number of calls made to the action server.
- Sanic workers must be mapped 1:1 to CPUs for both Rasa Pro and the Rasa Action Server.
- Create `async` actions to avoid any blocking I/O
- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on HTTP requests.
- `enable_selective_domain: true`: The domain is sent only for actions that need it. This substantially reduces the payload between the two pods.
- Consider using c5n.nxlarge machines, which are more compute-optimized and support better parallelization of HTTP requests. However, as they are low on memory, models need to be kept lightweight, so these machines are not suitable if you want to run transformers.
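
To illustrate the `async` recommendation above, here is a minimal standard-library sketch of why awaiting I/O helps under load. It is not Rasa SDK code: `fetch_profile` and `handle_action` are hypothetical names, and `asyncio.sleep` stands in for an awaited HTTP or database call. In a custom action, the same pattern would presumably live inside an `async def run(...)` method.

```python
import asyncio
import time


async def fetch_profile(user_id: str) -> dict:
    """Hypothetical non-blocking lookup; the sleep stands in for awaited I/O."""
    await asyncio.sleep(0.1)
    return {"user_id": user_id, "tier": "gold"}


async def handle_action(user_id: str) -> str:
    """Hypothetical action handler that awaits its I/O instead of blocking."""
    profile = await fetch_profile(user_id)
    return f"{profile['user_id']}:{profile['tier']}"


async def serve_concurrently(user_ids):
    """Run all handlers on one event loop and report the total wall time."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_action(u) for u in user_ids))
    return list(results), time.perf_counter() - start


def run_demo(n: int = 10):
    """Serve n simulated requests at once; returns (results, elapsed_seconds)."""
    return asyncio.run(serve_concurrently([f"user-{i}" for i in range(n)]))


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(f"{len(results)} requests served in {elapsed:.2f}s")
```

Because each handler yields control while it waits, the simulated requests overlap instead of queuing behind one another. Replacing the awaited sleep with a blocking call such as `time.sleep` would serialize them on the worker.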


### Debugging bot related issues while scaling up
