Replies: 2 comments
-
Is this still relevant? Have you tried up-to-date versions? It's hard to help without seeing some code.
-
Thanks for getting back to me on this! I managed to work around the issue by throwing a CriticalError once I want to finish the crawl (either because the crawl no longer progresses or because the maximum crawl length I want has been reached). I did not do any further testing, as this solved the issue for me.
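For anyone landing here later, the workaround above could be sketched roughly as follows. This is only an illustration under assumptions, not the code actually used in the project: `CrawlGuard`, `CrawlAbortError`, `maxPages` and `maxIdleMillis` are hypothetical names, "no progress" is interpreted as "too much time since the last handled page", and the local error class merely stands in for the CriticalError mentioned above so that the snippet runs without Crawlee installed.

```typescript
// Hypothetical helper, not part of Crawlee: decides when a crawl should stop.
interface CrawlGuardOptions {
    maxPages: number;       // crawl budget (assumed name)
    maxIdleMillis: number;  // time without a handled page that counts as "no progress"
    now?: () => number;     // injectable clock, defaults to Date.now
}

// Stand-in for the error that ends the crawl. In a real request handler,
// this is where the CriticalError from the workaround would be thrown.
class CrawlAbortError extends Error {
    constructor(message: string) {
        super(message);
        this.name = 'CrawlAbortError';
    }
}

class CrawlGuard {
    private readonly maxPages: number;
    private readonly maxIdleMillis: number;
    private readonly now: () => number;
    private pagesDone = 0;
    private lastProgressAt: number;

    constructor(options: CrawlGuardOptions) {
        this.maxPages = options.maxPages;
        this.maxIdleMillis = options.maxIdleMillis;
        this.now = options.now ?? Date.now;
        this.lastProgressAt = this.now();
    }

    // Call after each page that was handled successfully.
    recordProgress(): void {
        this.pagesDone += 1;
        this.lastProgressAt = this.now();
    }

    // Throws once the budget is used up or the crawl has stopped progressing.
    assertShouldContinue(): void {
        if (this.pagesDone >= this.maxPages) {
            throw new CrawlAbortError(`Reached crawl budget of ${this.maxPages} pages`);
        }
        const idleMillis = this.now() - this.lastProgressAt;
        if (idleMillis > this.maxIdleMillis) {
            throw new CrawlAbortError(`No progress for ${idleMillis} ms`);
        }
    }
}
```

The idea would be to call `assertShouldContinue()` at the start of the request handler and `recordProgress()` at the end. Note that this only fires when the handler is actually invoked, so it does not cover a crawler that is fully stuck on a crashed page.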
-
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
Dear Crawlee Dev team,
I am using the Crawlee Playwright Chromium crawler in headful mode to crawl different domains one at a time. In general it works very well, but sometimes (randomly) a page crashes and is not recovered by Crawlee. This happens both when running my project natively on Windows 11 and when running it in a Docker container (using the Dockerfile from the Crawlee example).
The page in the browser simply stays open showing the error (mostly some 40X error) and is not closed automatically. The crawler then never finishes, even after maxRequestsPerCrawl is reached. The PlaywrightCrawler:AutoscaledPool: state keeps showing a currentConcurrency of 1 in the logs endlessly. It is basically in a deadlock. On Windows I can finish the crawl by closing the Chromium window, but this is obviously not possible in a Docker container.
Is this a known issue?
How might I detect such a deadlock during the crawl and stop the crawl? I tried using CriticalError, but when thrown outside of the crawler callback it crashes the whole runtime, not just the crawl. And I can't throw it inside the crawler, because the crawler is stuck and doesn't load any new pages.
Thank you for your help, and let me know if I can provide any additional information! I think Crawlee is a great project and the best Node.js crawler currently out there.
Best wishes
Tobias
Code sample
No response
Package version
3.1.4
Node.js version
Node.js 16
Operating system
Windows 11 as well as Docker
Apify platform
I have tested this on the next release
No response
Other context
No response