Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CheerioCrawler may timeout and _requestFunction promise might stick around #571

Closed
pocesar opened this issue Feb 11, 2020 · 1 comment
Closed

Comments

@pocesar
Copy link
Contributor

pocesar commented Feb 11, 2020

after some debugging, and checking for bugs related to got, when using stream: true, the code below might be stuck "forever" (note the quotes):

https://github.com/apifytech/apify-js/blob/336db28c357c7817e4f06d38e254321805de2a46/src/crawlers/cheerio_crawler.js#L582-L591

since it has a "timeout" attached to it using addTimeoutToPromise, it might never complete and that other promise (_parseHtmlToDom) will stick for a while, which might take longer to be GC'ed (consider it a feature).
the stream: true fails to emit the "error" event when there's a timeoutSecs provided that's low enough (tested with 1 sec with consistent results), the timeout happens before streaming content and it gets stuck in a limbo, the passthrough stream never starts emitting "readable" or "data" events.

I've also tried attaching 'end', 'close' and 'error' events to the parser (and also to the response stream) before calling pipe(), alas, they don't emit as well, because the response is never generating anything.

a workaround would to manually emit an 'error' (and/or 'close') when it reaches the timeout (like half the time of the timeout provided to the upper function?) and clearing the timeout when the resolve/reject is called, so it does two things at once, settles the promise so it no longer lingers, and destroys the stream because of the error emitted (unless autoDestroy is set to false, that is the default).
calling destroy() would emit a 'close' event only, and would not be caught by the "on" listener there, that is expecting only the 'error' event.

@B4nan
Copy link
Member

B4nan commented Jul 18, 2022

Closing as outdated, quite a lot of internals changed, and it could have been a nodejs bug as well I guess. Or do you think this is still valid?

@B4nan B4nan closed this as completed Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants