
[BEAM-158] add support for bounded sources in streaming #104

Closed
wants to merge 4 commits into from


mxm
Contributor

@mxm mxm commented Mar 31, 2016

Apart from a few smaller improvements, this PR adds support for bounded sources in streaming mode. The BoundedSource wrapper (SourceInputFormat) is the same one used by the batch part of the runner. Upon reading from the source, the translator assigns processing-time timestamps and ingestion-time watermarks. We could make watermark generation more flexible if we had an UnboundedSource wrapper for a BoundedSource.
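The ingestion-time scheme described above (each element is stamped with the processing time at which it is read, and the watermark trails the latest stamp) can be sketched without any Flink dependency. This is a minimal, hypothetical stand-in modeled loosely on the behavior of Flink's `IngestionTimeExtractor`; the class `IngestionTimeAssigner`, its methods, and the injectable clock are illustrative assumptions, not a Flink or Beam API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical stand-in for an ingestion-time assigner (not the Flink API):
// each element is stamped with the wall-clock time at which it is read, and
// the watermark trails the latest stamp by one millisecond.
final class IngestionTimeAssigner {
    private final LongSupplier clock;           // injectable so the logic is testable
    private long maxTimestamp = Long.MIN_VALUE;

    IngestionTimeAssigner(LongSupplier clock) {
        this.clock = clock;
    }

    /** Timestamp for the next element; never decreases, even if the clock jumps back. */
    long assignTimestamp() {
        maxTimestamp = Math.max(clock.getAsLong(), maxTimestamp);
        return maxTimestamp;
    }

    /**
     * Watermark: every future element is stamped with at least the current
     * time, so "now - 1" signals that no element at or below it will follow.
     */
    long currentWatermark() {
        maxTimestamp = Math.max(clock.getAsLong(), maxTimestamp);
        return maxTimestamp - 1;
    }
}

final class IngestionTimeDemo {
    public static void main(String[] args) {
        // Fake clock that jumps backwards from 105 to 103.
        Iterator<Long> ticks = List.of(100L, 105L, 103L, 110L).iterator();
        IngestionTimeAssigner assigner = new IngestionTimeAssigner(ticks::next);

        System.out.println(assigner.assignTimestamp());  // 100
        System.out.println(assigner.assignTimestamp());  // 105
        System.out.println(assigner.assignTimestamp());  // 105 (clock said 103)
        System.out.println(assigner.currentWatermark()); // 109
    }
}
```

Clamping to the maximum seen so far keeps timestamps monotonic, which is what makes a watermark derived from the clock safe for a bounded source read in a streaming job.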

Perhaps we could have a common utility for runners to handle the serialization of PipelineOptions; at some point, every runner has to ship them. I had to change the serialization code because I ran into a bug that caused a serialization loop. Debugging it was almost impossible because, due to some magic in the VM, the stack trace doesn't show all serialization calls. I didn't find any cyclic references between the PipelineOptions and the Flink components. I'm assuming this is a bug, and the workaround of serializing the options to a byte array is fair enough. See SourceInputFormat.
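The byte-array workaround described above can be sketched in plain Java: encode the options once on the client and keep only the resulting `byte[]` in the object that gets shipped, so serializing the shipped object never traverses the live options. This is a minimal sketch under assumptions: `ToyOptions`, `SerdeUtil`, and `OptionsHolder` are hypothetical names, and Java serialization is swapped in for the Jackson `ObjectMapper` (JSON) decoding visible in the reviewed code, only to keep the example dependency-free.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical options type standing in for PipelineOptions.
final class ToyOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    final Map<String, String> values = new HashMap<>();
}

// Helpers: Java serialization to and from a byte array.
final class SerdeUtil {
    private SerdeUtil() {}

    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize", e);
        }
    }
}

// Hypothetical wrapper that ships the options only as an opaque byte array,
// so serializing the wrapper never walks the live options object graph.
final class OptionsHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private final byte[] serializedOptions;

    OptionsHolder(ToyOptions options) {
        this.serializedOptions = SerdeUtil.toBytes(options); // encode once, on the client
    }

    ToyOptions decode() {
        return (ToyOptions) SerdeUtil.fromBytes(serializedOptions); // fresh copy on the worker
    }
}

final class OptionsShippingDemo {
    public static void main(String[] args) {
        ToyOptions options = new ToyOptions();
        options.values.put("runner", "flink");

        // Simulate shipping the holder to a worker and decoding it there.
        OptionsHolder shipped =
                (OptionsHolder) SerdeUtil.fromBytes(SerdeUtil.toBytes(new OptionsHolder(options)));
        System.out.println(shipped.decode().values.get("runner")); // flink
    }
}
```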

@kennknowles
Member

I suggest R=@dhalperi for a second pair of eyes on the I/O bits, and perhaps ideas from @lukecwik on the pipeline options.

@@ -73,8 +69,9 @@ public void configure(Configuration configuration) {}

@Override
public void open(SourceInputSplit<T> sourceInputSplit) throws IOException {
options = new ObjectMapper().readValue(serializedOptions, PipelineOptions.class);
Member


It seems inefficient to decode the pipeline options several times.

Is this to protect the user from mutating it?

Contributor Author


Good point. open() is called for every input split. We can move the deserialization code to the configure method.
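Moving the decoding into configure can be sketched as follows. In Flink, configure(Configuration) is invoked on an input format before its splits are opened, while open runs once per split, which is why decoding there saves repeated work. This is a minimal, dependency-free model of that lifecycle, not Flink's `InputFormat` interface; `DecodeOnceFormat`, the injected decoder, and the string "options" are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical, simplified input-format lifecycle (not Flink's InputFormat
// interface): configure() runs once per instance, open() once per split.
final class DecodeOnceFormat<O> {
    private final byte[] serializedOptions;     // what gets shipped to the worker
    private final Function<byte[], O> decoder;  // stands in for e.g. a JSON reader
    private O options;                          // decoded in configure(), then reused

    DecodeOnceFormat(byte[] serializedOptions, Function<byte[], O> decoder) {
        this.serializedOptions = serializedOptions;
        this.decoder = decoder;
    }

    /** Called once: decode the options a single time. */
    void configure() {
        options = decoder.apply(serializedOptions);
    }

    /** Called per input split: returns the shared, already-decoded options. */
    O open(int splitIndex) {
        if (options == null) {
            throw new IllegalStateException("configure() must run before open(" + splitIndex + ")");
        }
        return options;
    }
}

final class DecodeOnceDemo {
    public static void main(String[] args) {
        int[] decodes = {0}; // counts how often the decoder runs
        DecodeOnceFormat<String> format = new DecodeOnceFormat<>(
                "runner=flink".getBytes(StandardCharsets.UTF_8),
                bytes -> {
                    decodes[0]++;
                    return new String(bytes, StandardCharsets.UTF_8);
                });

        format.configure();
        for (int split = 0; split < 3; split++) {
            format.open(split);
        }
        System.out.println("decoded " + decodes[0] + " time(s)"); // decoded 1 time(s)
    }
}
```

This also bears on the reviewer's second question: after the change, every split sees the same options instance, so if guarding against mutation were a goal, per-split decoding (or a defensive copy) would still be needed.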

@mxm
Contributor Author

mxm commented Apr 6, 2016

Incorporated the suggestions. Would like to merge later on.

WindowedValue.of(value,
BoundedWindow.TIMESTAMP_MIN_VALUE,
GlobalWindow.INSTANCE,
PaneInfo.NO_FIRING));
}
}).assignTimestampsAndWatermarks(new IngestionTimeExtractor<WindowedValue<T>>());
Contributor


I don't know this area well; I just want to confirm that it's okay to use an "IngestionTimeExtractor" for a collection in which all elements have a timestamp of MIN_VALUE.

@dhalperi
Contributor

dhalperi commented Apr 6, 2016

LGTM. I left one comment, but merge as you see fit!

@dhalperi
Contributor

dhalperi commented Apr 6, 2016

(And a gentle reminder to squash CLs as appropriate)

@dhalperi
Contributor

Hi @mxm, just a ping that (AFAIK) this is ready for you to merge.

@dhalperi
Contributor

Hi @mxm, just a ping that this is ready for you to rebase and merge.

@mxm
Contributor Author

mxm commented Apr 18, 2016

Hi @dhalperi, I was completely knocked out for a week. Will merge this later on.

@asfgit asfgit closed this in 56e28a9 Apr 18, 2016
@mxm mxm deleted the BEAM-158 branch April 18, 2016 14:38
@mxm
Contributor Author

mxm commented Apr 18, 2016

Rebased and merged accordingly.

@dhalperi
Contributor

Hope you're feeling better!

@mxm
Contributor Author

mxm commented Apr 18, 2016

Thanks, much better :)

iemejia referenced this pull request in iemejia/beam Jan 12, 2018
mareksimunek pushed a commit to mareksimunek/beam that referenced this pull request May 9, 2018
#! Add contact information to README.md
alnzng pushed a commit to alnzng/beam that referenced this pull request Aug 25, 2023