Original work was done in #1529 to reduce the time taken by the initial request that kicks off a MapShed job. However, the latency on submitting async tasks, including MapShed, is still quite long: an exploration of ELB logs and p95 latency charts shows that these submission views consistently take 10-20 seconds to execute. The main culprit seems to be pinging the workers to get a list via `choose_worker`. Investigate alternatives. Things to keep in mind:
We're not trying to reduce only the latency of the submission HTTP call (for example, by invoking Celery on a new thread and responding to the request sooner), but rather the time it takes to add the new job to the queue. If it takes 15 seconds to queue the job and 5 seconds to execute it, it still takes 20 seconds to get results, no matter how quickly the initial HTTP request is resolved.
Caching the list of workers for a period of time may not be that effective in practice. If we cache for a period short enough that workers are unlikely to cycle in and out, it may not be long enough to help job submissions that arrive relatively infrequently. That said, for a lot of our jobs the submissions come in very close together, because the requests are fired off all at once. Caching in a static variable in Python would be even less helpful, because the requests are likely routed to a number of app server instances, each of which executes `choose_worker`.
The latency is highly variable (2s to 30s), and it's unclear what causes the variation.
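To make the caching trade-off above concrete, here is a minimal sketch of a TTL cache around the worker lookup. It is not the existing implementation: `CachedWorkerList`, `fetch_workers`, the `store` mapping, and the 30 second TTL are all hypothetical names and values chosen for illustration. The `store` stands in for a cache shared between app server instances; a plain module-level dict would only be shared within a single process, which is the limitation noted above.

```python
import time
from typing import Callable, List, MutableMapping, Optional, Tuple


class CachedWorkerList:
    """Cache the result of a slow worker lookup for `ttl` seconds.

    Hypothetical sketch: `fetch_workers` stands in for the slow call that
    pings the workers, and `store` stands in for a cache shared by all app
    servers. A plain dict is used here only so the example is runnable.
    """

    KEY = "worker-list"

    def __init__(
        self,
        fetch_workers: Callable[[], List[str]],
        ttl: float,
        store: Optional[MutableMapping[str, Tuple[float, List[str]]]] = None,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch_workers = fetch_workers
        self._ttl = ttl
        self._store = store if store is not None else {}
        self._clock = clock

    def get(self) -> List[str]:
        now = self._clock()
        entry = self._store.get(self.KEY)
        if entry is not None:
            expires_at, workers = entry
            if now < expires_at:
                # Cache hit: skip the slow ping entirely.
                return workers
        # Cache miss or expired entry: pay the full lookup cost once.
        workers = self._fetch_workers()
        self._store[self.KEY] = (now + self._ttl, workers)
        return workers


def demo() -> Tuple[int, int]:
    """Simulate a burst of submissions, then one infrequent submission."""
    slow_calls: List[float] = []
    fake_now = [0.0]  # fake clock so the demo does not need to sleep

    def slow_ping() -> List[str]:
        slow_calls.append(fake_now[0])
        return ["worker-1", "worker-2"]

    cache = CachedWorkerList(slow_ping, ttl=30.0, clock=lambda: fake_now[0])

    # Five submissions fired off within one second share a single ping.
    for i in range(5):
        fake_now[0] = i * 0.2
        cache.get()
    calls_after_burst = len(slow_calls)

    # A lone submission ten minutes later misses the cache and pings again.
    fake_now[0] = 600.0
    cache.get()
    return calls_after_burst, len(slow_calls)
```

In this sketch the burst benefits from the cache while the isolated later request does not, which is the concern raised above about infrequent submissions.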
Here are the 20 highest-latency requests over 3 days in March. Note they're all to `/modeling/start/*`. I've anonymized the IP addresses, but kept the labels consistent (`AAA` is the same IP in all requests).
Additional anecdotal evidence: when running the multi-year model on a small AoI (65 km²), the job submission endpoint took considerably longer than actually executing the model: