Add PuppeteerPool.retire(browser) method #122

Closed
mtrunkat opened this issue Jul 4, 2018 · 3 comments
Labels
feature Issues that represent new features or improvements to existing features.

Comments

@mtrunkat
Member

mtrunkat commented Jul 4, 2018

Use case:

I am scraping a website with concurrency higher than one and have multiple tabs open in each browser. In handlePageFunction or gotoFunction I can detect that I was blocked by anti-scraping protection, so I need to retry the URL with a different IP address.

Workaround:

The easiest way is to kill the browser using browser.close() and throw an error so that the request gets retried in a new browser. The problem is that this kills all the open tabs, even though some of them may be processing pages that opened successfully.

Proper implementation:

Add a puppeteerPool.retire(browser) or browser.retire() method, and in the case mentioned above call:

handlePageFunction({ browser, puppeteerPool, request }) {
   ...
   puppeteerPool.retire(browser);
   throw new Error('Request was blocked!');
   ...
}
mtrunkat added the feature label Jul 4, 2018
@metalwarrior665
Member

This means the browser would wait for other tabs to finish their functions and then shut down?

@mtrunkat
Member Author

mtrunkat commented Jul 5, 2018

Yes, a retired browser waits for all of its tabs to be closed (with a 5-minute timeout).
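
For readers who want to see the retire semantics spelled out, here is a minimal sketch. It is not the actual PuppeteerPool code: SketchPool, openTab, closeTab and the launchBrowser factory are hypothetical names made up for illustration, and a "browser" is assumed to be any object with a close() method. It only shows the idea that a retired browser gets no new tabs and is closed once its last tab closes or the timeout expires.

```javascript
// Hypothetical sketch of "retire" semantics; not the real PuppeteerPool internals.
const RETIRE_TIMEOUT_MS = 5 * 60 * 1000; // the 5-minute timeout mentioned above

class SketchPool {
  constructor(launchBrowser) {
    this.launchBrowser = launchBrowser; // assumed factory returning { close() }
    this.activeBrowser = null;          // browser currently accepting new tabs
    this.state = new Map();             // browser -> { openTabs, retired, timer }
  }

  // Opens a tab in the active browser, launching a fresh one if needed.
  openTab() {
    if (!this.activeBrowser) {
      this.activeBrowser = this.launchBrowser();
      this.state.set(this.activeBrowser, { openTabs: 0, retired: false, timer: null });
    }
    this.state.get(this.activeBrowser).openTabs += 1;
    return this.activeBrowser;
  }

  // Records that a tab finished; a retired browser closes after its last tab.
  closeTab(browser) {
    const s = this.state.get(browser);
    if (!s) return; // browser was already closed
    s.openTabs -= 1;
    if (s.retired && s.openTabs === 0) this._close(browser);
  }

  // Stops giving the browser new tabs, but lets its open tabs finish.
  retire(browser) {
    const s = this.state.get(browser);
    if (!s || s.retired) return;
    s.retired = true;
    if (this.activeBrowser === browser) this.activeBrowser = null;
    if (s.openTabs === 0) {
      this._close(browser);
      return;
    }
    // Safety net: force-close if the remaining tabs never finish.
    s.timer = setTimeout(() => this._close(browser), RETIRE_TIMEOUT_MS);
    s.timer.unref(); // do not keep the Node.js process alive just for this timer
  }

  _close(browser) {
    const s = this.state.get(browser);
    if (!s) return;
    clearTimeout(s.timer);
    this.state.delete(browser);
    browser.close();
  }
}
```

Compared with calling browser.close() directly, the tabs that are still working in the retired browser get to finish, while any retried request lands in a newly launched browser.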

@mtrunkat
Member Author

Fixed.
