SimpleFetcherBolt to send URLs back to its own queue if time to wait above threshold #582

Closed
jnioche opened this Issue Jun 11, 2018 · 0 comments

jnioche commented Jun 11, 2018

The SimpleFetcherBolt is simpler than the standard FetcherBolt: it does not maintain internal fetch queues and instead relies on running many instances (threads managed by Storm). However, its performance is usually worse because it enforces politeness by sleeping for the required delay, which blocks that instance from processing URLs from other servers in the meantime.

We could send any tuple whose wait time exceeds a configurable threshold back to the bolt's own queue instead of sleeping. The advantage is that the bolt moves on more quickly to a URL from a different server; a possible drawback is that a URL could time out if it is sent to the back of the queue too often.

By default, the threshold would be set to -1, which preserves the existing behaviour: the bolt sleeps for every delay, however long.
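
To make the proposal concrete, here is a minimal sketch of the decision rule in plain Java. It is not the code from the referenced commit and does not use the Storm or StormCrawler APIs: the class `RequeueSketch`, the methods `decide` and `simulate`, and the parameter `maxWaitMs` are hypothetical names chosen for illustration, and the queue and clock are simulated rather than real.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: names and structure are assumptions, not StormCrawler code.
final class RequeueSketch {

    enum Action { FETCH_NOW, SLEEP_THEN_FETCH, REQUEUE }

    /**
     * Decides what to do with a URL that must wait waitMs before it can be
     * fetched politely. A negative maxWaitMs (e.g. -1) disables requeueing,
     * so every delay is slept, as in the existing behaviour.
     */
    static Action decide(long waitMs, long maxWaitMs) {
        if (waitMs <= 0) {
            return Action.FETCH_NOW;
        }
        if (maxWaitMs >= 0 && waitMs > maxWaitMs) {
            return Action.REQUEUE;
        }
        return Action.SLEEP_THEN_FETCH;
    }

    /**
     * Simulates a single bolt instance with a fake clock and returns the
     * order in which URLs are fetched. URLs are written as "host/path".
     */
    static List<String> simulate(List<String> urls, long politenessMs, long maxWaitMs) {
        Deque<String> queue = new ArrayDeque<>(urls);
        Map<String, Long> nextAllowed = new HashMap<>(); // host -> earliest fetch time
        List<String> fetched = new ArrayList<>();
        long now = 0; // simulated time in milliseconds

        while (!queue.isEmpty()) {
            String url = queue.poll();
            String host = url.substring(0, url.indexOf('/'));
            long wait = nextAllowed.getOrDefault(host, 0L) - now;

            switch (decide(wait, maxWaitMs)) {
                case REQUEUE:
                    queue.addLast(url); // back of the bolt's own queue
                    now += 1;           // handling the tuple takes some time
                    continue;
                case SLEEP_THEN_FETCH:
                    now += wait;        // stands in for Thread.sleep(wait)
                    break;
                case FETCH_NOW:
                    break;
            }
            fetched.add(url);
            nextAllowed.put(host, now + politenessMs);
        }
        return fetched;
    }
}
```

With a threshold of -1, `simulate` fetches the URLs strictly in arrival order and sleeps whenever two consecutive URLs share a host. With a non-negative threshold, a URL from another host can overtake one that is still waiting. The sketch also shows the drawback mentioned above: a URL whose host is busy may go round the queue many times before it is fetched, and in a real topology each round counts against the tuple timeout.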

@jnioche jnioche added this to the 1.10 milestone Jun 11, 2018

jnioche added a commit that referenced this issue Jun 11, 2018

@jnioche jnioche closed this Jun 11, 2018
