fix: reset RequestQueue state after 5 minutes of inactivity#1324
fix: reset RequestQueue state after 5 minutes of inactivity#1324
RequestQueue state after 5 minutes of inactivity#1324Conversation
|
Output from the E2E test: One of the 3 requests is added to the Will test this on platform now via patch-package on some real world example, with browsers involved too. |
…tivity We now track last activity done on a `RequestQueue` instance: - added new request - started processing a request (added to `inProgress` cache) - marked request as handled - reclaimed request If we don't detect one of those actions in last 5 minutes, and we have some requests in the `inProgress` cache, we try to reset the state. We can override this limit via `APIFY_INTERNAL_TIMEOUT` env var. This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the `inProgress` cache. Closes #997
| this.internalTimeoutMillis = Math.max(this.handleRequestTimeoutMillis, 300e3); // allow at least 5min for internal timeouts | ||
| const tryEnv = (val) => (val == null ? val : +val); | ||
| // allow at least 5min for internal timeouts | ||
| this.internalTimeoutMillis = tryEnv(process.env.APIFY_INTERNAL_TIMEOUT) ?? Math.max(this.handleRequestTimeoutMillis, 300e3); |
There was a problem hiding this comment.
I would go with something like this.handleRequestTimeoutMillis * 2 + 10. There are all the internal SDK function calls that need to complete as well.
There was a problem hiding this comment.
The timeout was already there, all I changed was to make it configurable via env vars. It's been in production for about 4 months, so I'd say the value is fine. But we can double it for the new RQ reset check to be reallu sure.
Note that this value is used as a timeout for several things separately, it is not meant to be some "master timeout that fits all for a request life cycle", it is used for the internal API calls, and also as a timeout for the user handler.
There was a problem hiding this comment.
Even if it was there, that does not change the validity of my argument that such a timeout has a race condition 😄
| async isFinished() { | ||
| if (this.inProgressCount() > 0 && (Date.now() - +this.lastActivity) > this.internalTimeoutMillis) { | ||
| const message = `The request queue seems to be stuck for ${this.internalTimeoutMillis / 1e3}s, resetting internal state.`; | ||
| this.log.warning(message, { inProgress: [...this.inProgress] }); |
There was a problem hiding this comment.
Would be nice to know where it got stuck here. If it was when fetching request, or handling request, marking request as handled and so on. Can we do this?
There was a problem hiding this comment.
I can save the last action that was done next to the time, but I already know it won't help us. The problem was never about the last action that was done on the queue, but a stuck request - and that one was processed much ealier, not at the very end. We can detect this only once all other requests are processed.
There was a problem hiding this comment.
Yeah, I meant to have this tied to the stuck request, not the one currently being processed.
RequestQueue state after 5 minutes of inactivityRequestQueue state after 5 minutes of inactivity
We now track last activity done on a
RequestQueueinstance:inProgresscache)If we don't detect one of those actions in last 5 minutes, and we have some requests in the
inProgresscache, we try to reset the state. We can override this limit viaAPIFY_INTERNAL_TIMEOUTenv var. If request handler timeout value is higher, we use that instead. This internal timeout is also used when executing the user-provided request handler, so we know there needs to be some activity in this internal (the handler either finished or timeouted).This should finally resolve the 0 concurrency bug, as it was always about stuck requests in the
inProgresscache.Closes #997