Scheduled Threads for crawling #919
Are you looking to start the crawler when a URL is added somewhere? That is not currently supported out of the box. The Crawler does support reading URLs dynamically at startup via IStartURLsProvider. You can also generate a file from your source and feed it to urlsFile via a variable. Either way, the URLs are read only at startup time. Perhaps we can suggest something else if you share more details.
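A minimal sketch of the second option mentioned above (generating a URLs file from an external source before startup, so the crawler config can point at it via a variable); the file path and the URL list here are placeholders for whatever your real source produces:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class UrlsFileWriter {

    // Writes one URL per line; the resulting file can then be referenced
    // from the crawler configuration as its start-URLs file.
    public static Path writeUrlsFile(Path target, List<String> urls)
            throws IOException {
        return Files.write(target, urls);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical source; replace with your own feed (DB, API, etc.).
        List<String> urls = List.of(
                "https://example.com/a",
                "https://example.com/b");
        Path file = writeUrlsFile(
                Files.createTempFile("start-urls", ".txt"), urls);
        System.out.println(Files.readAllLines(file).size() + " URLs written");
    }
}
```

Regenerating this file and restarting the crawler is the simplest way to pick up new URLs, since they are only read at startup.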
Yes, I want a crawling service that stays running, continuously receives URLs from another source, and crawls, collects, and commits whenever new URLs arrive.
This is not currently supported. You will have to build your own Java application that uses the Crawler (examples here). Further helpful info can be found here. Consider the following idea: have your application listen for new URLs from your source and launch a crawler session whenever some arrive.
I strongly recommend setting an upper limit on the number of crawler instances this app can spawn. If programming is not your forte, you could also script this.
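The suggestion above can be sketched in plain Java, with a BlockingQueue standing in for your URL source and launchCrawler as a placeholder where a real crawler session would be configured and started; the Semaphore enforces the recommended upper limit on concurrent instances:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class CrawlDispatcher {

    // Upper limit on concurrently running crawler instances.
    private static final int MAX_INSTANCES = 3;

    private final BlockingQueue<String> incoming = new ArrayBlockingQueue<>(1000);
    private final Semaphore permits = new Semaphore(MAX_INSTANCES);
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private volatile boolean running = true;

    // Called by whatever receives URLs from your external source.
    public void submit(String url) throws InterruptedException {
        incoming.put(url);
    }

    // Long-running loop: take a URL, wait for a free instance slot,
    // then run one crawl session for it on the pool.
    public void run() throws InterruptedException {
        while (running) {
            String url = incoming.poll(100, TimeUnit.MILLISECONDS);
            if (url == null) {
                continue;
            }
            permits.acquire();
            pool.execute(() -> {
                try {
                    launchCrawler(url);
                } finally {
                    permits.release();
                }
            });
        }
        pool.shutdown();
    }

    public void stop() {
        running = false;
    }

    // Placeholder: in a real app this would configure and start a crawler
    // for the URL; its committer commits when that session finishes.
    protected void launchCrawler(String url) {
        System.out.println("Crawling " + url);
    }
}
```

Because each submitted URL gets its own short crawler session, the committer runs at the end of every session rather than waiting for one never-ending crawl to stop.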
Thank you so much. I'm trying it now.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi,
I want to set up a service that listens for new URLs from other sources. The service should keep running, and every time I import a URL, it should crawl it and commit the results.
I have two problems to solve based on the code. First, I think the stop option should be disabled, but the committers only execute after all data extraction has stopped. Second, I think importing URLs amounts to adding them to the queue, but I haven't found where to do that. Can you give me some hints?