Purge internal queues of tuples which have already reached time out #564
URLs can sit in the internal queues of the FetcherBolt for longer than the value set in topology.message.timeout.secs. This happens, for instance, when there aren't enough fetching threads or when the server corresponding to the queue is slow. By the time the URL gets fetched, its tuple will already have been failed by Storm. Even with ES, where es.status.ttl.purgatory delays an acked or failed URL before it is allowed back through the topology, the same URL can re-enter the queues later on. We could deduplicate the URLs in the queues, but it is probably better and simpler to purge those that have gone over the timeout. The URLs queued behind them will then also have less time to wait.
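A minimal sketch of the purge idea, not the actual FetcherBolt code: each queued URL records when it entered the queue, and anything older than the timeout is dropped before the next URL is handed to a fetch thread. The class `ExpiringFetchQueue`, its methods, and the explicit `nowMs` parameter are hypothetical names chosen for illustration; a real implementation would read the timeout from topology.message.timeout.secs and would also have to decide how to report the purged tuples.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical illustration only; StormCrawler's real queues differ. */
final class ExpiringFetchQueue {

    /** A queued URL together with the time (in ms) at which it was enqueued. */
    private static final class Item {
        final String url;
        final long enqueuedAtMs;

        Item(String url, long enqueuedAtMs) {
            this.url = url;
            this.enqueuedAtMs = enqueuedAtMs;
        }
    }

    private final Deque<Item> queue = new ArrayDeque<>();
    private final long timeoutMs;

    /** timeoutSecs is assumed to mirror topology.message.timeout.secs. */
    ExpiringFetchQueue(long timeoutSecs) {
        this.timeoutMs = timeoutSecs * 1000L;
    }

    void add(String url, long nowMs) {
        queue.addLast(new Item(url, nowMs));
    }

    /** Drops every item that has waited longer than the timeout; returns how many were dropped. */
    int purgeExpired(long nowMs) {
        int purged = 0;
        Iterator<Item> it = queue.iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().enqueuedAtMs > timeoutMs) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }

    /** Purges stale items first, then returns the next URL to fetch, or null if none is left. */
    String poll(long nowMs) {
        purgeExpired(nowMs);
        Item next = queue.pollFirst();
        return next == null ? null : next.url;
    }

    int size() {
        return queue.size();
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test; in a bolt it would typically come from System.currentTimeMillis().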
added a commit
Apr 18, 2018
es.status.ttl.purgatory must be set close to