Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Generalize the interface of a Source by allowing it to emit a collection at once #2526

Merged
merged 2 commits into from
Nov 9, 2017

Conversation

srkukarni
Copy link
Contributor

No description provided.

@lucperkins
Copy link
Contributor

@srkukarni I do like the addition of a collections interface here, but I'm not sure how I feel about making Collection<T> get() the sole method for retrieving data from a source. The current T get() method makes a lot more sense for me when it comes to basically all of the use cases that I've worked with. What would be some use cases where the collections interface would be, in fact, required? I suppose batch processing might be an example, but even then it's not yet clear to me what the collections construct yields. I'm open to being convinced, especially because this is a new problem domain for me, but would like to hear more about the rationale.

@srkukarni
Copy link
Contributor Author

@lucperkins This change affects only the complex source. Thus the simple sources are still just having a simple interface. The reason why I propose this change is that most of the real-world spouts (kafka/pulsar) actually prefer to emit a list at a time rather than one element. Thus it naturally fits the use case. Also having the ability to emit multiple means that sources get less complicated:- they dont need to have to buffer data logic in them, etc.

@nwangtw
Copy link
Contributor

nwangtw commented Nov 8, 2017

QQ: I am new to Heron/Storm so this question may not make sense. Collection<> interface makes sense in some cases. I am wondering if efficiency a factor here?

About the interface, I am wondering if some kind of hint about the collection size could be helpful to avoid over-sized batch?

@lucperkins
Copy link
Contributor

@srkukarni Okay, as long as the simple SerializableGenerator option exists, I don't see any reason to oppose this. +1

@srkukarni
Copy link
Contributor Author

@nwangtw I did think about some kind of a hint. The issue is that the impl cannot really make smart choice regarding this hint value. There is nothing that the base api exposes any info about this.
We could in the future think about enhancing the base api to support this, but that probably requires some extra thought.

@nwangtw
Copy link
Contributor

nwangtw commented Nov 9, 2017

SGTM. Thanks.

@srkukarni srkukarni merged commit 2115ca3 into master Nov 9, 2017
@srkukarni srkukarni deleted the sanjeevk/generalize_source branch November 9, 2017 18:09
nicknezis pushed a commit that referenced this pull request Sep 14, 2020
…ion at once (#2526)

* Generalize the interface of a Source by allowing it to emit a collection at once

* Stylecheck fixes
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants