Missing timeout option #28

Closed
Jordi-m opened this issue Jun 12, 2016 · 6 comments

@Jordi-m

Jordi-m commented Jun 12, 2016

Hi,

I've tried to use this library to scrape about 40 websites asynchronously. I do this by using the Promise object returned by ScrapeIt for each site and then doing something like this:

Promise.all(promises).then(function (results) { /* handle all results */ });

The problem is that one of the websites I scrape can be down or slow at unpredictable times (I have no control over it), and ScrapeIt never seems to time out: I waited a few minutes, but it never returned, so the Promise.all(...) callback never ran.

Any suggestions on how I can make it time out (while still using the library's promises)? Did I miss an option?
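One common way to get a timeout without a library option is to race each promise against a timer. The sketch below assumes only that scrape-it returns a promise, as described above; `withTimeout` is a hypothetical helper, not part of scrape-it, and stand-in promises replace real requests. Note that it only stops *waiting*: the underlying HTTP request is not aborted.

```javascript
// Hypothetical helper, not part of scrape-it: rejects if `promise` has not
// settled within `ms` milliseconds. The underlying request is NOT aborted;
// the caller simply stops waiting for it.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever promise settles first decides the result; clearing the timer
  // afterwards keeps a finished race from holding the event loop open.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Demo with stand-in promises instead of real scrape-it calls.
const fastSite = new Promise(resolve => setTimeout(() => resolve("fast site"), 10));
const hungSite = new Promise(() => {}); // never settles, like a request that hangs

Promise.all([
  withTimeout(fastSite, 100, "fast site"),
  withTimeout(hungSite, 100, "hung site"),
])
  .then(results => console.log("all done:", results))
  .catch(err => console.error(err.message)); // prints "hung site timed out after 100 ms"
```

With real requests, each scrape-it promise would be wrapped the same way, e.g. `urls.map(url => withTimeout(scrapeIt(url, opts), 10000, url))`, where `urls`, `opts`, and the 10-second limit are placeholders.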

@IonicaBizau
Owner

IonicaBizau commented Jun 12, 2016

@Jordi-m Could it be returning an error (are you using .catch?)? It's probably relying on the default timeout of the underlying Node.js request.
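`Promise.all` rejects as soon as any of its input promises rejects, and the `.then` callback never runs in that case; without a `.catch`, such a failure can go unnoticed and look like the scraper never returning. A minimal sketch of two options follows: attaching `.catch`, or settling each promise individually so that one failing site does not discard the other results. `settle` and `settleAll` are hypothetical helpers; newer Node.js versions provide `Promise.allSettled`, which returns objects of the same shape.

```javascript
// Hypothetical helpers: wrap each promise so it always fulfills with a
// status object, mirroring the shape returned by Promise.allSettled.
function settle(promise) {
  return promise.then(
    value => ({ status: "fulfilled", value }),
    reason => ({ status: "rejected", reason })
  );
}

function settleAll(promises) {
  return Promise.all(promises.map(settle));
}

// Stand-ins for scrape-it promises: one site works, one is down.
const makeSites = () => [
  Promise.resolve({ title: "working site" }),
  Promise.reject(new Error("site is down")),
];

// Option 1: plain Promise.all with .catch; one failure rejects everything.
Promise.all(makeSites())
  .then(results => console.log("all results:", results))
  .catch(err => console.error("a scrape failed:", err.message));

// Option 2: settle every promise and inspect each result individually.
settleAll(makeSites()).then(results => {
  for (const r of results) {
    if (r.status === "fulfilled") console.log("ok:", r.value.title);
    else console.error("failed:", r.reason.message);
  }
});
```

Option 2 fits the 40-site case better: a single slow or broken site is reported on its own instead of hiding the results of all the others.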

@Jordi-m
Author

Jordi-m commented Jun 15, 2016

I did add the .catch, but for the last three days all 40 websites have been up and running, and I'm unable to reproduce the situation myself (I even added a slow-loading URL). If it becomes an issue again, I'll come back to this GitHub issue. Thanks for the suggestion so far.

@IonicaBizau
Owner

@Jordi-m I will keep an eye on it too. I'm also scraping lots of pages and haven't had this problem yet...

@Jordi-m
Author

Jordi-m commented Aug 5, 2016

@IonicaBizau It's been a while, but today I spotted the problem described in this issue again and immediately tried to find out which website was causing it.

It turns out that this specific website causes an (apparently) infinite HTTP redirect. When I visit the page in a browser, the developer console shows HTTP 508: Loop Detected. I suspect the library just waits 'forever' for the redirects to stop?
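A client can guard against redirect loops by capping the number of hops and remembering which URLs it has already visited. The sketch below shows that technique in isolation and is not how scrape-it or follow-redirects is implemented; `followRedirects` and `getLocation` are hypothetical names, and `getLocation` stands in for a real HTTP request that returns the `Location` header of a 3xx response (or `null` for a final response).

```javascript
// Hypothetical sketch: follow redirects with loop detection and a hop limit.
// `getLocation(url)` stands in for an HTTP request; it returns the Location
// header of a 3xx response, or null when the response is final.
function followRedirects(startUrl, getLocation, maxRedirects = 10) {
  let url = new URL(startUrl).href; // normalize so comparisons are consistent
  const seen = new Set([url]);
  let hops = 0;
  let location;
  while ((location = getLocation(url)) !== null) {
    // Location may be relative, so resolve it against the current URL.
    const next = new URL(location, url).href;
    if (seen.has(next)) {
      throw new Error(`Redirect loop: ${url} -> ${next}`);
    }
    // The hop limit also catches "loops" whose URLs never repeat exactly.
    if (++hops > maxRedirects) {
      throw new Error(`Too many redirects (limit ${maxRedirects})`);
    }
    seen.add(next);
    url = next;
  }
  return { url, hops };
}

// Simulated server: /a -> /b -> /c (final), and /loop redirects to itself.
const redirects = {
  "http://example.com/a": "/b",
  "http://example.com/b": "/c",
  "http://example.com/loop": "/loop",
};
const getLocation = url => redirects[url] || null;

console.log(followRedirects("http://example.com/a", getLocation));
try {
  followRedirects("http://example.com/loop", getLocation);
} catch (err) {
  console.error(err.message);
}
```

The `seen` set catches a URL that redirects back to itself or to an earlier URL, while the hop limit covers servers that produce a fresh URL on every redirect (for example, by appending a changing query parameter).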

@IonicaBizau
Owner

@Jordi-m Hmm, good point. Reported here: follow-redirects/follow-redirects#41

@IonicaBizau
Owner

This should be fixed now.
