Elasticsearch spouts don't load newly added seeds #478
Since version 1.5, the spouts try to optimise the query on nextFetchDate by reusing the same date value across queries until a query returns no documents. This works fine when all the seeds are injected at the same time, but not when new seeds are added while the crawl is already under way: the new documents are not picked up while the old date value is still in use.
We should add a new config option indicating that we always want to use NOW as the nextFetchDate or, better, a maximum amount of time for which the previous value of nextFetchDate can be reused, so that it gets updated e.g. every few minutes.
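
To make the proposal concrete, here is a minimal sketch of the refresh logic, not the actual spout code. The class `QueryDateTracker`, its method `dateForQuery` and the `maxReuse` setting are all hypothetical names used for illustration. A value of zero would mean "always use NOW", and a negative value would keep the current behaviour.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper illustrating the proposal; not part of the existing spouts. */
class QueryDateTracker {
    // Assumed setting: how long a date may be reused.
    // Zero = always use NOW, negative = reuse until a query comes back empty.
    private final Duration maxReuse;
    // Date currently used in the filter on nextFetchDate; null before the first query.
    private Instant queryDate;

    QueryDateTracker(Duration maxReuse) {
        this.maxReuse = maxReuse;
    }

    /** Returns the date to use for the next query, resetting it to now when needed. */
    Instant dateForQuery(Instant now, boolean lastQueryWasEmpty) {
        boolean expired = queryDate != null
                && !maxReuse.isNegative()
                && Duration.between(queryDate, now).compareTo(maxReuse) >= 0;
        if (queryDate == null || lastQueryWasEmpty || expired) {
            queryDate = now;
        }
        return queryDate;
    }
}
```

With `maxReuse` set to a few minutes, seeds injected during the crawl would become visible to the spout at most that long after being added, while most queries would still reuse the same date.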