Conversation
Thanks, interesting data. It's surprising that CPU is that much of a bottleneck for Airnode (though the number of triggers is a bit low). Fortunately, Airnode can be scaled linearly by simply deploying more instances. The same applies to signed APIs: API providers deploying their own distributes the load naturally, and we can put ours behind a sort of load balancer. We must find out the limits of a single instance first, though.
I'm not sure what you refer to as the expected load. I would look at the following.
We should have headroom in both the number of Airnodes and the number of Beacons per Airnode; 100 Beacons is potentially on the low side. Similarly, 25 triggers is not necessarily the upper limit. Even the best machine configuration will probably fail to perform well in the most pessimistic scenario, in which case we can adjust the specs.
I guess you mean Airnode feed.
I tried to mimic 300 unique dAPIs with updates every 1s, with the API supporting batching.
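For context, "batching" here means grouping beacons into fewer API calls. A hypothetical sketch (not the actual Airnode feed code) of how 300 beacons batched per 100 collapse into 3 triggers:

```typescript
// Group a list of beacon IDs into fixed-size batches; one trigger per batch.
// The beacon IDs are made up for illustration.
const chunk = <T>(items: T[], size: number): T[][] => {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};

const beaconIds = Array.from({ length: 300 }, (_, i) => `beacon-${i}`);
const batches = chunk(beaconIds, 100);
console.log(batches.length); // 3
```

With batching, each fetch interval costs 3 API calls instead of 300, which is why the batched scenarios below tolerate much smaller machines.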
Sorry, I meant that those numbers are the minimum. I consider
Sure, it's fairly quick for me to run these, but I didn't get the:
What do you mean by the "25 Airnodes" part? Do you mean 25 Airnode feeds pushing to a single Signed API?
It's not clear to me what you mean by a dAPI; Airnode and the signed API only care about Beacons.
Yes. So the signed API receives 25 * 300 individual API calls a second in the case that no API supports batching (or rather double that, since we will have API providers deploy Airnodes redundantly).
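The arithmetic above as a back-of-the-envelope sketch, using the numbers from this thread (25 feeds, 300 beacons each, 1s pushes, 2x redundancy):

```typescript
// Worst-case load on a single Signed API instance, assuming no batching
// and redundant (x2) Airnode deployment per API provider.
const airnodeFeeds = 25;
const beaconsPerFeed = 300;
const redundancyFactor = 2;

const callsPerSecond = airnodeFeeds * beaconsPerFeed * redundancyFactor;
console.log(callsPerSecond); // 15000
```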
I mean beacons.
Let's talk about it on tomorrow's call.
FYI: I did some tests like we talked about, but the results are pretty bad, and I didn't even run the fully scaled one... My intention is to properly identify what is causing the performance issues, because I think the performance should be much better.
Context update

The bottleneck mentioned in this comment is signed data verification, which takes 2-3ms per data point. That does not scale to the number of API calls we want the Signed API to handle. We discussed possible options internally, but in the meantime I continued the performance tests with verification removed from the Signed API (which then performs well). Because the Signed API currently has neither authentication nor signed data verification, there is not much point in stress testing it in its current state (although the auth to be added should be lightweight, so the results should be similar). We agreed to wrap up the performance test and focus on different Airnode feed scenarios. Here are the results.

Results

Airnode feed (cpu=2048, mem=4096), 300 beacons, 1s fetchInterval, no batching

A smaller machine results in 100% CPU. As a note, the target Signed API without verification (cpu=1024, mem=2048) was at 12% CPU and 6% memory, which scaled linearly with the number of Airnode feeds deployed.

Airnode feed (cpu=1024, mem=2048), 300 beacons, 5s fetchInterval, no batching
I also tried to deploy cpu=512, mem=1024, but the result was 100% CPU (and it never went down).

Airnode feed (cpu=1024, mem=2048), 100 beacons, 1s fetchInterval, no batching
Airnode feed (cpu=512, mem=1024), 100 beacons, 5s fetchInterval, no batching
Airnode feed (cpu=1024, mem=2048), 300 beacons, 1s fetchInterval, batched per 100 (3 triggers)
As a note, increasing to 400 beacons ends up with 100% CPU. I also did a similar test with 300 beacons yesterday with a less optimal Signed API (larger responses), and that had 100% CPU as well.

Airnode feed (cpu=1024, mem=2048), 100 beacons, 1s fetchInterval, batched per 100 (1 trigger)
Final notes
I understand that these instructions might be hard to follow, but it's quite complex to get the configurations right. If you are interested, I can explain it on a call or in a short Q&A.
The templates here risk going stale, but it's convenient to have them here. In the README I explicitly mention to check whether the CF template is correct.
andreogle left a comment
I read through everything and it doesn't seem too complicated. I didn't replicate it locally, though.
```js
apiCredentials: [],
nodeSettings: {
  nodeVersion: '0.1.0',
  airnodeWalletMnemonic: 'destroy manual orange pole pioneer enemy detail lady cake bus shed visa',
```
Randomly generated wallet. I hardcoded it for convenience.
```json
},
{
  "urlPath": "/30s-delay",
  "delaySeconds": 20
```
Suggested change:

```diff
- "delaySeconds": 20
+ "delaySeconds": 30
```
Hehe, I actually never ended up using these.







Closes #40
Rationale
I used https://pool.nodary.io/ as an API and created a small script that uses all endpoints from https://pool.nodary.io/<AIRNODE> and treats them as different dAPIs. There are ~25 different Airnodes, each with ~100 beacons, giving an upper limit of ~2500 for the perf test. I immediately noticed I was being rate limited, so I deployed a bunch of cloud workers to avoid sending all requests from a single source (which worked).
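A hypothetical sketch of the rate-limit workaround: spread the test requests round-robin across the deployed cloud workers so no single source sends them all (the worker URLs below are made up):

```typescript
// Made-up worker endpoints standing in for the deployed cloud workers.
const workerUrls = [
  'https://worker-1.example.workers.dev',
  'https://worker-2.example.workers.dev',
  'https://worker-3.example.workers.dev',
];

// Pick a worker for the n-th request, cycling through the list.
const pickWorker = (requestIndex: number): string =>
  workerUrls[requestIndex % workerUrls.length];

console.log(pickWorker(0)); // https://worker-1.example.workers.dev
console.log(pickWorker(4)); // https://worker-2.example.workers.dev
```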
Results overview
Details
I deployed everything in us-east-1 or us-east-2, with one exception at the end when I debugged the latency issues of the Signed API.
~2500 dAPIs across 25 triggers each with 60s fetchInterval
cpu=256, mem=512, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
cpu=512, mem=1024, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
cpu=512, mem=1024, logLevel=warn
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
cpu=1024, mem=2048, logLevel=warn
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
The stats are looking OK on these machines, but the Signed API is slow: requests take ~5s and the response size is ~1MB. Half of the requests fail with a 502 error.
cpu=2048, mem=4096, logLevel=warn
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
"Only" 1/4 of requests fail with 502, so it seems the bottleneck is somewhere else. I also redeployed using logLevel=debug and the results were similar, so the cost of logging is negligible.
399 dAPIs across 4 triggers each with 60s fetchInterval
cpu=2048, mem=4096, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
All requests successful; response time is ~200ms (but ~120ms of that is inherent latency).
cpu=256, mem=512, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
1/4 of requests fail; some requests take ~500ms and some ~5s, probably depending on whether the machine is busy processing something or not.
cpu=512, mem=1024, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
30 requests, with a 1s delay between them:
399 dAPIs across 4 triggers each with 1s fetchInterval
cpu=512, mem=1024, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
Request stats:
cpu=2048, mem=4096, logLevel=debug
Airnode feed: (stats screenshot omitted)
Signed API: (stats screenshot omitted)
Weaker machines were hitting 100% CPU in the Airnode feed. Request stats:
cpu=8192, mem=16384, logLevel=debug
Signed API
Redeployed just the Signed API to see if a bigger machine helps. Stats:
Debugging the Signed API latency
From the AWS logs, the server spends ~50ms processing the query. I ran a smaller Airnode feed (40 dAPIs), which decreased the response payload to 15kB (from 180kB), and request times were stable at around ~180ms, which was expected. I suspected this was all caused by a combination of network distance and response payload size.
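A rough sanity check of that hypothesis; the cross-region throughput figure below is an assumption for illustration, not a measurement:

```typescript
// If a ~1MB response travels over an effective ~2MB/s cross-region path,
// the transfer alone adds ~500ms on top of the ~50ms of server processing,
// which is consistent with the multi-second times seen for larger payloads.
const payloadBytes = 1_000_000;
const assumedBytesPerSecond = 2_000_000; // assumed effective throughput
const transferMs = (payloadBytes / assumedBytesPerSecond) * 1000;
console.log(transferMs); // 500
```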
I confirmed the hypothesis by deploying the Signed API (cpu=512, mem=1024) in Europe. This should not be an issue in prod, because we will use a CDN in front of the Signed API. Stats:
I ran the script a bit longer, and out of 2000 calls only 2 failed. The Signed API's CPU and memory were stable.
