Crawl timeout #44
Comments
Hi Julien, You can set the … I believe you can also call Crawler.Stop() after some time? I'm not sure how well that works, to be honest; I haven't used gocrawl in a long time. HTH,
Thanks for your quick response!
The "STOPPING!" string is called from Filter and Visit:
So my question now is: how can I access the … BTW, many thanks for gocrawl, I love it so far!
Ah, I see now; it's the value returned by …
So I ended up doing this:

```go
// ...
c := gocrawl.NewCrawlerWithOptions(opts)

// New code: stop the crawler after one hour.
go func(crawler *gocrawl.Crawler) {
	time.Sleep(1 * time.Hour)
	crawler.Stop()
}(c)
// End of new code

c.Run(baseUrl)
// ...
```
I would like crawls to stop after 1 hour, because some websites have an effectively unlimited number of URLs. So I think it would be great to have a `CrawlTimeout` option, or an Extender function like `ShouldFinishCrawl() bool`.

Note: I use it with `SameHostOnly = true`, to crawl only one site.