Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STORM-583: Initiali check-in for storm-eventhubs. #336

Closed
wants to merge 5 commits into from

Conversation

shanyu
Copy link
Contributor

@shanyu shanyu commented Dec 3, 2014

Signed-off-by: Shanyu Zhao shzhao@microsoft.com

Signed-off-by: Shanyu Zhao <shzhao@microsoft.com>
@revans2
Copy link
Contributor

revans2 commented Dec 5, 2014

@shanyu This is a ton of code. Do you have a JIRA for this? Do you have a design document of some sort describing how this works?

@shanyu
Copy link
Contributor Author

shanyu commented Dec 6, 2014

@revans2 Yes, this is the JIRA:
https://issues.apache.org/jira/browse/STORM-583
A design document is also attached to the JIRA. Thanks!

@lazyval
Copy link
Contributor

lazyval commented Dec 8, 2014

@shanyu it might be a good idea to prefix this PR title with STORM-583: this way jira's github bot and people could relate issue and PR to each other.

@shanyu shanyu changed the title Initiali check-in for storm-eventhubs. STORM-583: Initiali check-in for storm-eventhubs. Dec 8, 2014
@@ -0,0 +1,107 @@
<?xml version="1.0" encoding="UTF-8"?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need apache license header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ptgoetz !This is fixed.

@ptgoetz
Copy link
Member

ptgoetz commented Dec 12, 2014

I would support pulling this in, but we need additional committers to approve it, and one or more to volunteer as sponsor(s).

As @revans2 mentioned, this is a sizable amount of code, and effectively amounts to a code donation from Microsoft. As such it needs to go through the IP clearance process before it can be merged.

If there are enough +1 votes from committers (an no vetoes), I will initiate the IP clearance process.

If downstream throughput of a Storm topology is low, the EventHubSpout keeps
sending tuples downstream but not get acked. This can result in OutOfMemoryError
in the worker process that runs EventHubSpout because the spout needs to cache
all pending messages.

We need to introduce a new configuration:
 Eventhubspout.max.pending.messages.per.partition
And the default value of this configuration is 1024. With this configuration,
if the pending messages reach this value, then EH spout won’t fetch more messages
from eventhubs entity.

Signed-off-by: Shanyu Zhao <shzhao@microsoft.com>
Currently when a Storm topology first creates an EventHubSpout (no offset saved
in Zookeeper), it always fetches events from the oldest messages in the EventHubs.

One way to solve this problem is to provide an additional configuration
"eventhub.receiver.enqueuetime.after", which if set AND there is no offset saved
in Zookeeper (e.g. first time running the topology) then we’ll use it as the
enqueue-time-based filter to create EventHubs receiver. With this configuration,
Storm topologies can quickly “bootstrap” and starts from recent messages
(ignoring old messages in EventHubs).

Signed-off-by: Shanyu Zhao <shzhao@microsoft.com>
Added copywrite headers, and also include curator framework jar
in the jars with dependencies (because storm relocates the jar).

Signed-off-by: Shanyu Zhao <shzhao@microsoft.com>
@ghost
Copy link

ghost commented Jan 23, 2015

@ptgoetz Is storm-eventhubs available to use?

@harshach
Copy link
Contributor

@sasincj its in the process of getting into storm/external. But you can use by going into the shanyu's branch directly

@harshach
Copy link
Contributor

@ptgoetz @revans2 I am +1 on merging this in

@revans2
Copy link
Contributor

revans2 commented Jan 23, 2015

@sasincj
I don't understand the EventHub system well enough to really be able to judge if the code is functioning properly in corner cases, etc. I get at a high level what it does (http://azure.microsoft.com/en-us/services/event-hubs/) but I have never used it myself. The code itself looks OK, but I am not comfortable voting on it either way with my current knowledge.

I am a bit nervous about pulling this in as we don't have a very good way to test it beyond the unit tests. EventHubs is a Microsoft Azure specific system. I cannot spin up my own instance of it to validate that the code is doing what I expect prior to a release. Although it may be a bit painful, I can do that for Kafka, Redis, HDFS, and HBase integration that we have accepted in so far. That doesn't mean we cannot/should not pull it in. To me it mostly means that we need more assurances and support prior to doing so.

@ghost
Copy link

ghost commented Jan 24, 2015

@revans2 Without storm-eventhubs, Developers who are using storm with eventhub has to write their own Spout to receive the events. So I thought it would save developer's time when we have it as storm's add-on. For now I'll use it from @shanyu 's branch and let you know my experience...

…anugage

Signed-off-by: Shanyu Zhao <shzhao@microsoft.com>
@ptgoetz
Copy link
Member

ptgoetz commented Jan 28, 2015

+1 for accepting this as a code donation from Microsoft and moving forward with the IP clearance process.

COMMTTERS please note that this must not be merged until the IP clearance process is completed successfully. I am working with @shanyu to complete those requirements.

@ptgoetz
Copy link
Member

ptgoetz commented Jan 28, 2015

@shanyu Also, once the source code tarball for IP clearance has been created, please don't add any commits to this pull request. Any necessary modifications can take place later after IP clearance is complete.

@shanyu
Copy link
Contributor Author

shanyu commented Jan 29, 2015

@ptgoetz Got it. I will keep the pull request intact during the IP clearance phase. Thanks a lot!

@shanyu
Copy link
Contributor Author

shanyu commented Mar 11, 2015

@ptgoetz It has been a while, is the IP clearance done? Thx

@ptgoetz
Copy link
Member

ptgoetz commented Apr 24, 2015

Thanks @shanyu. I merged this into master.

Parth-Brahmbhatt pushed a commit to Parth-Brahmbhatt/incubator-storm that referenced this pull request May 4, 2015
…#336)

Conflicts:
	CHANGELOG.md
	pom.xml
	storm-dist/binary/src/main/assembly/binary.xml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants