Additional load testing recommendations

RasaHQ · Dec 11, 2023 · d56545c
1 parent 7cfc7d0
Showing 1 changed file with 21 additions and 0 deletions.
diff --git a/docs/docs/monitoring/load-testing-guidelines.mdx b/docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,27 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
 | Up to 50,000             |         6vCPU                                |      16 GB    |
 | Up to 80,000             |         6vCPU, with almost 90% CPU usage     |      16 GB    |
 
+::: This is the best-performing AWS setup we tested on EKS:
+    EC2 c5.2xlarge: 9.2 rps/node throughput
+    EC2 c5.4xlarge: 19.5 rps/node throughput
+    You can always choose a larger compute-optimized instance, such as c5.4xlarge, with more CPU per node to maximize throughput per node.
+:::
+
+|        AWS               |               Rasa Pro                       |      Rasa Action Server                   |
+|--------------------------|----------------------------------------------|-------------------------------------------|
+| EC2: c5.2xlarge          |         3vCPU, 10 GB Memory, 3 Sanic Threads |      3vCPU, 2 GB Memory, 3 Sanic Threads  |
+| EC2: c5.4xlarge          |         7vCPU, 16 GB Memory, 7 Sanic Threads |      7vCPU, 12 GB Memory, 7 Sanic Threads |
+
+### Some recommendations to improve latency
+- Running the action server as a sidecar saved about 100 ms on average per round trip to the action server in our tests. Results may vary depending on the number of calls made to the action server.
+- Map Sanic workers 1:1 to CPUs for both Rasa Pro and the Rasa Action Server.
+- Create `async` actions to avoid blocking I/O.
+- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on HTTP requests.
+- `enable_selective_domain: true`: the domain is only sent for actions that need it. This substantially trims the payload between the two pods.
+- Consider using c5n.nxlarge machines, which are more compute-optimized and support better parallelization of HTTP requests.
+  However, as they are low on memory, models need to be kept lightweight.
+  They are not suitable if you want to run transformers.
+
 
 ### Debugging bot related issues while scaling up
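
Note on the `async` recommendation in the diff above: the sketch below is not part of the commit and does not use the Rasa SDK. It is a minimal, stdlib-only illustration of why awaiting I/O matters when one worker serves many requests. The names `fetch_order_status`, `run_action`, and `handle_requests` are hypothetical, and `asyncio.sleep` stands in for a real non-blocking call (an HTTP or database client with an async API is assumed). In a real custom action, the same idea would apply by declaring the action's `run` method with `async def` and awaiting the I/O inside it.

```python
import asyncio
import time


async def fetch_order_status(order_id: str) -> str:
    """Hypothetical backend lookup; asyncio.sleep simulates non-blocking I/O."""
    await asyncio.sleep(0.1)
    return f"order {order_id}: shipped"


async def run_action(order_id: str) -> str:
    """Shape of an async action body: await the I/O instead of blocking on it."""
    return await fetch_order_status(order_id)


async def handle_requests(order_ids: list[str]) -> list[str]:
    """Serve several requests concurrently on a single event loop (one worker)."""
    return await asyncio.gather(*(run_action(o) for o in order_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_requests([str(i) for i in range(10)]))
    elapsed = time.perf_counter() - start
    # Ten 0.1 s lookups overlap instead of running back to back.
    print(f"{len(results)} results in {elapsed:.2f}s")
```

If `fetch_order_status` used a blocking call such as `time.sleep` instead, the ten lookups would run one after another on the worker and hold up every other request routed to it.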