Use case:
I am scraping a website with concurrency higher than one and have multiple tabs open in each browser. In handlePageFunction or gotoFunction I can detect that I was blocked by anti-scraping protection, so I need to retry the URL with a different IP address.
Workaround:
The easiest way is to kill the browser using browser.close() and throw an error so that the request gets retried in a new browser. The problem is that this kills every open tab in that browser, including tabs that are still processing pages that loaded successfully.
Proper implementation:
Add a puppeteerPool.retire(browser) or browser.retire() method and call it in the case mentioned above.
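To make the requested behavior concrete, here is a minimal sketch of what "retire" could mean: a retired browser gets no new tabs, and it is closed only after its last open tab finishes. This is not the SDK's actual pool. The class RetirableBrowserPool, its acquire() and release() methods, and the launchBrowser factory are all hypothetical names used for illustration, and the sketch assumes each newly launched browser would use a different proxy or IP address.

```javascript
// Hypothetical sketch of the proposed retire semantics, not the real PuppeteerPool.
class RetirableBrowserPool {
    constructor(launchBrowser) {
        this.launchBrowser = launchBrowser; // assumed factory returning a browser-like object with close()
        this.active = null;                 // browser that currently receives new tabs
        this.openTabs = new Map();          // browser -> number of tabs still open
        this.retired = new Set();           // browsers that must not receive new tabs
    }

    // Reserve a tab slot, launching a fresh browser if the current one was retired.
    acquire() {
        if (!this.active) {
            this.active = this.launchBrowser();
            this.openTabs.set(this.active, 0);
        }
        const browser = this.active;
        this.openTabs.set(browser, this.openTabs.get(browser) + 1);
        return browser;
    }

    // Stop handing out this browser; close it only once it has no open tabs.
    retire(browser) {
        if (!this.openTabs.has(browser)) return; // unknown or already closed
        this.retired.add(browser);
        if (this.active === browser) this.active = null;
        this.closeIfIdle(browser);
    }

    // Called when a tab finishes, whether it succeeded or failed.
    release(browser) {
        if (!this.openTabs.has(browser)) return;
        this.openTabs.set(browser, Math.max(0, this.openTabs.get(browser) - 1));
        this.closeIfIdle(browser);
    }

    closeIfIdle(browser) {
        if (this.retired.has(browser) && this.openTabs.get(browser) === 0) {
            this.retired.delete(browser);
            this.openTabs.delete(browser);
            browser.close();
        }
    }
}
```

With something like this, the tab that detects the block would call retire(browser), release its own slot, and throw so that the request is retried. The retry then lands in a newly launched browser, while the other tabs of the retired browser finish undisturbed.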