
fix: reset RequestQueue state after 5 minutes of inactivity#1324

Merged
B4nan merged 1 commit into master from zero-concurrency on Apr 7, 2022

Conversation

Member

@B4nan B4nan commented Apr 6, 2022

We now track last activity done on a RequestQueue instance:

  • added new request
  • started processing a request (added to inProgress cache)
  • marked request as handled
  • reclaimed request

If we don't detect one of those actions in the last 5 minutes, and we have some requests in the inProgress cache, we try to reset the state. This limit can be overridden via the APIFY_INTERNAL_TIMEOUT env var. If the request handler timeout value is higher, we use that instead. The same internal timeout is also used when executing the user-provided request handler, so we know there has to be some activity within this interval (the handler either finished or timed out).
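A minimal sketch of the bookkeeping described above (the class and method names here are illustrative, not the actual SDK API):

```javascript
// Minimal sketch of the activity tracking described above.
// Class and method names are illustrative, not the actual SDK API.
class ActivityTrackedQueue {
    constructor({ handleRequestTimeoutMillis = 60e3 } = {}) {
        const fromEnv = process.env.APIFY_INTERNAL_TIMEOUT;
        // Env var wins; otherwise allow at least 5 minutes,
        // or the request handler timeout if that is higher.
        this.internalTimeoutMillis = fromEnv != null
            ? +fromEnv
            : Math.max(handleRequestTimeoutMillis, 300e3);
        this.lastActivity = new Date();
        this.inProgress = new Set();
    }

    touch() { this.lastActivity = new Date(); }

    // Each tracked action bumps the activity timestamp.
    addRequest(id) { this.touch(); }
    startProcessing(id) { this.touch(); this.inProgress.add(id); }
    markHandled(id) { this.touch(); this.inProgress.delete(id); }
    reclaimRequest(id) { this.touch(); }

    // Stuck = something is in progress, but no tracked activity
    // happened within the internal timeout window.
    isStuck() {
        return this.inProgress.size > 0
            && (Date.now() - +this.lastActivity) > this.internalTimeoutMillis;
    }
}
```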

This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the inProgress cache.

Closes #997

Member Author

B4nan commented Apr 6, 2022

Output from the E2E test:

➜  apify-js git:(zero-concurrency) ✗ node test/request_queue_reset.test-e2e.js
WARN  Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="/Users/adamek/htdocs/apify/apify-js/apify_storage"
INFO  System info {"apifyVersion":"2.3.0","apifyClientVersion":"2.2.0","osType":"Darwin","nodeVersion":"v16.10.0"}
WARN  Neither APIFY_LOCAL_STORAGE_DIR nor APIFY_TOKEN environment variable is set, defaulting to APIFY_LOCAL_STORAGE_DIR="/Users/adamek/htdocs/apify/apify-js/apify_storage"
INFO  CheerioCrawler:AutoscaledPool:Snapshotter: Setting max memory of this run to 4096 MB. Use the APIFY_MEMORY_MBYTES environment variable to override it.
DEBUG CheerioCrawler:AutoscaledPool: scaling up {"oldConcurrency":1,"newConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.7,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
INFO  CheerioCrawler:AutoscaledPool: state {"currentConcurrency":0,"desiredConcurrency":2,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":null},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.7,"actualRatio":null},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":null},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":null}}}
DEBUG CheerioCrawler:SessionPool: Created new Session - session_cffdMdWFM4
DEBUG CheerioCrawler:SessionPool: Created new Session - session_NiDeZFYe1C
INFO  pvjOOriUMWoJNqA
INFO  35bn3sNHmEkLZ5B
WARN  RequestQueue: Queue head returned a request that is already in progress?! {"nextRequestId":"QkJTmIh3PUZUkQs","inProgress":true,"recentlyHandled":false}
DEBUG CheerioCrawler:SessionPool: Created new Session - session_0wAmzeXEWj
WARN  RequestQueue: The request queue seems to be stuck for 30s, resetting internal state. {"inProgress":["QkJTmIh3PUZUkQs"]}
DEBUG CheerioCrawler:SessionPool: Created new Session - session_TCguiWXgZF
INFO  QkJTmIh3PUZUkQs
INFO  CheerioCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
DEBUG CheerioCrawler:SessionPool: Persisting state {"persistStateKey":"SDK_SESSION_POOL_STATE"}
DEBUG Statistics: Persisting state {"persistStateKey":"SDK_CRAWLER_STATISTICS_0"}
INFO  CheerioCrawler: Final request statistics: {"requestsFinished":3,"requestsFailed":0,"retryHistogram":[3],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":753,"requestsFinishedPerMinute":6,"requestsFailedPerMinute":0,"requestTotalDurationMillis":2258,"requestsTotal":3,"crawlerRuntimeMillis":31834}

One of the 3 requests is added to the inProgress cache to trigger the 0 concurrency state. We don't explicitly check whether we are at 0 concurrency; instead, we check that the RQ is being actively used. The test gets stuck after the first 2 requests are processed, waits for the internal timeout to pass, then automatically resets the RQ, reloads its state, and processes the missing third request.
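The reset-and-reload step the test exercises can be sketched like this (the names below are assumptions for illustration, not the SDK's real internals):

```javascript
// Sketch of the stuck-queue reset the E2E test exercises.
// Names are assumptions for illustration, not the SDK's real internals.
class ResettableQueue {
    constructor(internalTimeoutMillis) {
        this.internalTimeoutMillis = internalTimeoutMillis;
        this.lastActivity = new Date();
        this.inProgress = new Set();
        this.queueHeadCache = null; // cached queue head, dropped on reset
    }

    // Called periodically; returns true if the state was reset.
    resetIfStuck() {
        const idleFor = Date.now() - +this.lastActivity;
        if (this.inProgress.size > 0 && idleFor > this.internalTimeoutMillis) {
            this.inProgress.clear();     // forget stuck request(s) so they get fetched again
            this.queueHeadCache = null;  // force reloading the queue head
            this.lastActivity = new Date();
            return true;
        }
        return false;
    }
}
```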

Will test this on the platform now via patch-package on some real-world example, with browsers involved too.

…tivity

We now track last activity done on a `RequestQueue` instance:
- added new request
- started processing a request (added to `inProgress` cache)
- marked request as handled
- reclaimed request

If we don't detect one of those actions in the last 5 minutes, and we have some
requests in the `inProgress` cache, we try to reset the state. We can override
this limit via the `APIFY_INTERNAL_TIMEOUT` env var.

This should finally resolve the 0 concurrency bug, as it was always about
stuck requests in the `inProgress` cache.

Closes #997
@B4nan B4nan force-pushed the zero-concurrency branch from 55b75be to 8cad475 Compare April 6, 2022 14:55
@B4nan B4nan requested review from metalwarrior665 and mnmkng April 6, 2022 14:58
```diff
-this.internalTimeoutMillis = Math.max(this.handleRequestTimeoutMillis, 300e3); // allow at least 5min for internal timeouts
+const tryEnv = (val) => (val == null ? val : +val);
+// allow at least 5min for internal timeouts
+this.internalTimeoutMillis = tryEnv(process.env.APIFY_INTERNAL_TIMEOUT) ?? Math.max(this.handleRequestTimeoutMillis, 300e3);
```
Member

I would go with something like this.handleRequestTimeoutMillis * 2 + 10. There are all the internal SDK function calls that need to complete as well.

Member Author

The timeout was already there; all I changed was making it configurable via env vars. It's been in production for about 4 months, so I'd say the value is fine. But we can double it for the new RQ reset check to be really sure.

Note that this value is used as a timeout for several things separately; it is not meant to be some "master timeout" that fits the whole request life cycle. It is used for the internal API calls, and also as a timeout for the user handler.
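Using one value as a timeout around both internal calls and the user handler boils down to a generic wrapper along these lines (a sketch; `withTimeout` is not the SDK's actual helper):

```javascript
// Generic promise-timeout wrapper illustrating how a single timeout value
// can guard both internal API calls and the user-provided handler.
// This is a sketch; the SDK's actual helper may differ.
const withTimeout = (promise, millis, message) => {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(message)), millis);
    });
    // Whichever settles first wins; always clear the timer afterwards.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};
```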

Member

Even if it was there, that does not change the validity of my argument that such a timeout has a race condition 😄

```js
async isFinished() {
    if (this.inProgressCount() > 0 && (Date.now() - +this.lastActivity) > this.internalTimeoutMillis) {
        const message = `The request queue seems to be stuck for ${this.internalTimeoutMillis / 1e3}s, resetting internal state.`;
        this.log.warning(message, { inProgress: [...this.inProgress] });
        // ...
```
Member

It would be nice to know where it got stuck here: whether it was while fetching a request, handling a request, marking a request as handled, and so on. Can we do this?

Member Author

@B4nan B4nan Apr 6, 2022


I can save the last action that was done next to the time, but I already know it won't help us. The problem was never about the last action done on the queue, but about a stuck request - and that one was processed much earlier, not at the very end. We can detect this only once all other requests are processed.

Member

Yeah, I meant to have this tied to the stuck request, not the one currently being processed.

@B4nan B4nan changed the title fix: automatically reset RequestQueue state after 5 minutes of inactivity fix: reset RequestQueue state after 5 minutes of inactivity Apr 7, 2022
@B4nan B4nan merged commit fc6e4bf into master Apr 7, 2022
@B4nan B4nan deleted the zero-concurrency branch April 7, 2022 08:21
B4nan added a commit that referenced this pull request Apr 7, 2022


Successfully merging this pull request may close these issues.

Actor stuck on 0 concurrency for infinite time
