Issue tracker [STORM-3362] Solved: eventHubSpout uses a blocking receiver in nextTuple()#2981
|
Thanks for the contribution. The change looks fine, but I think we should let people configure the timeout. Could we add it to https://github.com/apache/storm/blob/8d4432233c8776247da520b5935b7eebfc267ba5/external/storm-eventhubs/src/main/java/org/apache/storm/eventhubs/spout/EventHubSpoutConfig.java ? |
8d44322 to
b779c00
Compare
|
Good suggestion. How about this? @srdo |
|
LGTM, thanks. Please update the commit message to contain the JIRA issue (see the other PRs for example). Will merge after the 24 hour waiting period, assuming tests pass :) |
b779c00 to
7419fd1
Compare
|
Sure, done |
|
Great, +1 pending tests. |
|
Test failure is unrelated. |
|
yeah, i didn't get it either. do I need to do anything? |
|
Not for this PR, no. |
|
Alright thx |
HeartSaVioR
left a comment
The code itself looks good. My 2 cents: the default value may need to be reconsidered.
| // disabling filter | ||
| private String connectionString; | ||
| private String topologyName; | ||
| private int receiverTimeoutInMillis = 10; // default |
Is 10ms a realistic value for the timeout? Since it's the default and end users won't touch it unless they hit a timeout error, the value needs to cover most cases. We know it should be short so the spout can handle ack/fail quickly, but 10ms sounds too small to reach the remote end.
You are right. I think this depends on how big the payload in Event Hub is. Event Hub has a 1 MB limit on packet size. Considering the worst-case scenario, the bandwidth requirement is 1 MB / 0.01 sec = 100 MB/sec, which seems really high. Based on this, let's tune it to something like 100 ms, so the bandwidth requirement becomes 10 MB/sec. Do you think this is reasonable? Let me know. In my current application the payload is pretty small, only several bytes, so it doesn't matter.
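The worst-case arithmetic above can be checked with a small helper. This is only a sketch: `TimeoutBandwidth` and `requiredBytesPerSecond` are hypothetical names that do not exist in storm-eventhubs, and 1 MB is assumed to mean 1,000,000 bytes.

```java
// Hypothetical helper, not part of storm-eventhubs: worst-case bandwidth needed
// to receive one maximum-size event within the receive timeout.
final class TimeoutBandwidth {

    // Assumption: "1 MB" is taken as 1,000,000 bytes.
    static final long MAX_EVENT_BYTES = 1_000_000L;

    /** Bytes per second needed to move payloadBytes within timeoutMillis. */
    static double requiredBytesPerSecond(long payloadBytes, long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        return payloadBytes * 1000.0 / timeoutMillis;
    }

    public static void main(String[] args) {
        // Print the requirement for the three timeouts discussed in this thread.
        for (long timeoutMillis : new long[] {10, 100, 1000}) {
            double mbPerSec = requiredBytesPerSecond(MAX_EVENT_BYTES, timeoutMillis) / 1_000_000.0;
            System.out.println(timeoutMillis + " ms -> " + mbPerSec + " MB/sec");
        }
    }
}
```

This only covers payload transfer time; it ignores network round-trip latency, which is a separate constraint on the timeout.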
I'm not familiar with the EH client library, so I don't know if it fetches data in a background thread, but assuming it only fetches data when polled, latency to EH will be a concern. With a default of 100ms, you will only be able to fetch data at all if the latency is less than 100ms.
The Kafka spout uses 200ms by default, but Kafka can be run "near" the spout (in network hops terms, e.g. in the same LAN), while EH is a hosted service. It might be good to set it conservatively, e.g. to 500-1000ms, and let people fine tune it manually.
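To illustrate the latency concern, here is a minimal simulation, assuming (as above) that data is only fetched when polled. It does not use the Event Hubs client at all: `PollWithTimeout`, `simulatedFetch`, and `poll` are hypothetical names, and the "remote" fetch is just a sleeping task.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation, not the Event Hubs client API.
final class PollWithTimeout {

    /** Simulates a remote fetch that takes latencyMillis to return one event. */
    static CompletableFuture<String> simulatedFetch(long latencyMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "event";
        });
    }

    /** Starts a fetch and waits at most timeoutMillis; empty if the timeout elapses first. */
    static Optional<String> poll(long latencyMillis, long timeoutMillis) {
        CompletableFuture<String> fetch = simulatedFetch(latencyMillis);
        try {
            return Optional.of(fetch.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty(); // timeout shorter than latency: nothing received
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With a simulated latency of 200 ms, `poll(200, 20)` comes back empty while `poll(200, 2000)` returns the event, which is why a conservative default is safer for a hosted service.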
sounds good. let me set it to 1 sec then to be safe
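A configurable timeout with a 1 second default might look roughly like the sketch below. Only the field name `receiverTimeoutInMillis` comes from the diff in this PR; the class `ReceiverTimeoutConfig` and its getter and setter are hypothetical and are not the actual `EventHubSpoutConfig` API.

```java
// Hypothetical stand-in for the relevant part of a spout config class.
final class ReceiverTimeoutConfig {

    // Conservative default of 1 second, as agreed in this thread.
    private int receiverTimeoutInMillis = 1000;

    int getReceiverTimeoutInMillis() {
        return receiverTimeoutInMillis;
    }

    /** Overrides the default; returns this so calls can be chained. */
    ReceiverTimeoutConfig setReceiverTimeoutInMillis(int timeoutInMillis) {
        if (timeoutInMillis <= 0) {
            throw new IllegalArgumentException("timeout must be positive, got " + timeoutInMillis);
        }
        this.receiverTimeoutInMillis = timeoutInMillis;
        return this;
    }
}
```

Users with small payloads and low latency could then lower the value, for example `new ReceiverTimeoutConfig().setReceiverTimeoutInMillis(200)`.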
7419fd1 to
ec0a99a
Compare