[BEAM-160] NexMark #366

Closed
mshields822 wants to merge 4 commits

@mshields822 mshields822 commented May 20, 2016

The NexMark integration suite, ported to Beam and generalized to support multiple runners.

@aljoscha
Contributor

Wow, that's a big one. Could you give some guidance on how one should look at this: where are the starting points, how are the benchmarks invoked, what results can they produce and what do they mean, how can it be run on different runners, and is it for streaming/batch only?

Maybe this is all coming in the README and I'm just too eager ... 😅

@mshields822
Contributor Author

Added README.md to the laundry list :-)

@iemejia
Member

iemejia commented May 25, 2016

Excellent work, Mark. I am also waiting for the README to see how to run this monster. I briefly discussed this benchmark with Davor at ApacheCon, and we expect to test it on our local company cluster with all possible runners. In addition, I can help if you need anything extra to run this on Spark too (well, the cases we can support with Spark Streaming).

@iemejia
Member

iemejia commented May 25, 2016

Oh, and one additional question: where do queries 9-12 come from? I was just checking the website/publication but they are not there (or do I have an old ref?).
http://datalab.cs.pdx.edu/niagara/NEXMark/
http://datalab.cs.pdx.edu/niagara/pstream/nexmark.pdf

@mshields822
Contributor Author

Well spotted. We've added some additional 'queries' to fill in some gaps,
for example session windows, interaction with external systems which makes
'closing' windows very expensive, working purely in processing time.
They've generally been inspired by various customer scenarios we've
encountered here at Google.

I'll be getting back to the README and adding an InProcessDriver once we've
gotten our current release out the door. Adding a SparkDriver should be a
5min exercise.

I should mention my recipe for bringing this up on a flink-on-google-cloud
cluster is very ugly so I'll be asking for feedback on where I've taken the
long way unnecessarily.

Thx for your patience,
-m

@mshields822 mshields822 changed the title from "[BEAM-160] DRAFT NexMark" to "[BEAM-160] NexMark" on Jun 2, 2016

@mshields822
Contributor Author

Is the mvn failure me? Very weird.

@mshields822
Contributor Author

R: @davorbonaci
Parking with you Davor.
I will leave this as is on my account.

@mxm
Contributor

mxm commented Jun 8, 2016

Great work @mshields822! This looks like it is almost ready to be merged. I think I found the culprit that currently prevents the Flink Runner from completing the benchmark. I'd like to try this out on GCE and give you feedback within the next week.

dhalperi pushed a commit to dhalperi/beam that referenced this pull request Aug 23, 2016
This PTransform Reifies and gathers all panes produced for each
window, outputting a single pane per window at the time the window is
Garbage Collected.

For use in DataflowAssert, to match over the final contents of each
window.

@asfgit asfgit closed this in 565319b Oct 5, 2016

@iemejia
Member

iemejia commented Oct 6, 2016

Hello, this is a really interesting PR, in particular for comparing runners. I suppose it was closed because it was inactive for a long time, but I think we should revive it, especially now that the runners have matured. Is anyone else interested?
[Testing this with the Spark runner is still on my personal TODO list.]

Launch!
**NOTE:** As of flink-1.0.3 this will throw a `NullPointerException`
in `org.apache.beam.sdk.io.PubsubUnboundedSink$WriterFn.startBundle`.
See Jira issue [BEAM-196](https://issues.apache.org/jira/browse/BEAM-196).
Contributor

This is resolved.

@mxm
Contributor

mxm commented Oct 6, 2016

I'd still be interested in helping to solve any remaining problems as well. Since this uses the Beam API and doesn't integrate with the core, it should still be usable. It just needs some love :)

@davorbonaci
Member

It would be great to have this! However, I think the amount of work needed is quite significant. If there's anybody who could take this over, I'd love to assist with that.

@mxm
Contributor

mxm commented Oct 13, 2016

I assumed this was already in a pretty mature state (mature enough to merge and keep improving). Perhaps we can find someone from the community to take this over?

iemejia pushed a commit to iemejia/beam that referenced this pull request Jan 12, 2018
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023