[BEAM-5817] Use an explicit enum for naming Nexmark benchmarks #6780

Merged
echauchot merged 1 commit into apache:master from kennknowles:nexmark-join-to-files
Oct 29, 2018
Conversation

@kennknowles
Member

The "Nexmark" queries are numbered 0 through 12, but only 1 through 8 are actually from the original Nexmark suite. That makes sense: the data set is so convenient that we might have many interesting queries and use cases to build on it.

Preparing to add more, I wanted to stop just making up numbers and start giving them names.


@kennknowles
Member Author

It could also be a string all the way through. It hardly matters.

@echauchot
Contributor

@kennknowles Thanks for your work. I am finishing something on the Spark runner and will start the review before the end of the week.

@echauchot
Contributor

echauchot commented Oct 24, 2018

@kennknowles I'm taking a break from the Spark runner :) so I can take a look at your PR after all.

Contributor

@echauchot echauchot left a comment

Thanks Kenn for your work!
The code looks good to me. Using a Map instead of a List for the list of queries is an improvement with regard to null handling. Apart from that, calling a query "3" or "THREE" does not make a big difference IMHO. I would have liked a more meaningful name, but I could not find a concise one from either the functional description ("Who is selling in particular US states?") or the technical one ("Illustrates an incremental join (using per-key state and timer) and filter."). I guess you could not find one either, and that is why you used "THREE".
Also, can you fix the errors in the build?

query = options.getQuery();
try {
query = NexmarkQueryName.valueOf(options.getQuery());
} catch (IllegalArgumentException exc) {
Contributor

If I understand correctly, the user can specify either --query=3 or --query=THREE and both should end up as the same NexmarkConfiguration containing NexmarkQueryName.THREE, and you use the exception to differentiate between the two cases. Right?

Member Author

Yeah. The only reason I am keeping the numeric version is for backwards compatibility, since there are probably lots of dashboards and shell scripts that call this.

Contributor

@echauchot echauchot Oct 26, 2018

Yes, you were right to keep the numbers, because, as you mentioned, all the dashboards use them, as do some scripts and docs. Besides, it is also easier from the CLI to specify --query=0 instead of --query=PASSTHROUGH.
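The fallback discussed in this thread (try the enum name first, then fall back to the legacy number) could be sketched roughly as follows. This is not the code from the PR: `QueryName`, its `number` field, `fromNumber`, and `QueryArgs.parse` are hypothetical stand-ins, and only a few constants are shown.

```java
// Hypothetical stand-in for the PR's NexmarkQueryName; only a few constants are listed.
enum QueryName {
  PASSTHROUGH(0),
  THREE(3),
  SEVEN(7);

  // Legacy numeric identifier (an assumption made for this sketch).
  private final int number;

  QueryName(int number) {
    this.number = number;
  }

  int getNumber() {
    return number;
  }

  // Look up a constant by its legacy number.
  static QueryName fromNumber(int number) {
    for (QueryName name : values()) {
      if (name.number == number) {
        return name;
      }
    }
    throw new IllegalArgumentException("Unknown query number: " + number);
  }
}

// Hypothetical helper that accepts either "THREE" or "3".
final class QueryArgs {
  private QueryArgs() {}

  static QueryName parse(String raw) {
    try {
      // First attempt: the argument is an enum constant name.
      return QueryName.valueOf(raw);
    } catch (IllegalArgumentException notAName) {
      try {
        // Fallback: the argument is a legacy numeric identifier.
        return QueryName.fromNumber(Integer.parseInt(raw));
      } catch (NumberFormatException notANumber) {
        throw new IllegalArgumentException("Unknown query: " + raw, notANumber);
      }
    }
  }
}
```

With this sketch, `QueryArgs.parse("3")` and `QueryArgs.parse("THREE")` resolve to the same constant, so existing scripts that pass numbers would keep working.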


private NexmarkQueryModel getNexmarkQueryModel() {
- List<NexmarkQueryModel> models = createQueryModels();
+ Map<NexmarkQueryName, NexmarkQueryModel> models = createQueryModels();
Contributor

It looks more natural indeed to use a map for that !

.put(NexmarkQueryName.SEVEN, new Query7Model(configuration))
.put(NexmarkQueryName.EIGHT, new Query8Model(configuration))
.put(NexmarkQueryName.NINE, new Query9Model(configuration))
.build();
Contributor

A lot better than the list indeed! Using a list with null values seemed weak and error-prone.
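To illustrate why an enum-keyed map is less fragile than a positional list padded with nulls, here is a minimal sketch. The diff above uses a builder with `.put(...)` and `.build()`; this sketch swaps in `java.util.EnumMap` to stay dependency-free. `ModelQueryName`, `ModelRegistry`, and the `Model` interface are hypothetical names, not the classes from the PR.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical subset of query names, for illustration only.
enum ModelQueryName { PASSTHROUGH, SEVEN, EIGHT, NINE, TEN }

final class ModelRegistry {
  private ModelRegistry() {}

  // Hypothetical placeholder for NexmarkQueryModel.
  interface Model {
    String describe();
  }

  // Only queries that actually have a model get an entry; no null padding is needed.
  static Map<ModelQueryName, Model> createQueryModels() {
    Map<ModelQueryName, Model> models = new EnumMap<>(ModelQueryName.class);
    models.put(ModelQueryName.SEVEN, () -> "model for query 7");
    models.put(ModelQueryName.EIGHT, () -> "model for query 8");
    models.put(ModelQueryName.NINE, () -> "model for query 9");
    return Collections.unmodifiableMap(models);
  }

  // A missing model is an explicit empty Optional rather than a null list slot.
  static Optional<Model> modelFor(ModelQueryName name) {
    return Optional.ofNullable(createQueryModels().get(name));
  }
}
```

With a list, the position is the only link between a query and its model, so adding or reordering entries can silently shift that mapping. With a map keyed by the enum, the association is stated at each `put`.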

Member Author

@kennknowles kennknowles left a comment

I just went back and checked http://datalab.cs.pdx.edu/niagara/pstream/nexmark.pdf and they have pretty good subtitles for the queries. I think it will be good to use them actually.

@kennknowles kennknowles force-pushed the nexmark-join-to-files branch 3 times, most recently from 4c6be19 to d72ee2a Compare October 25, 2018 17:59
@kennknowles
Member Author

OK, please have another look. I didn't rename all the Java classes (yet?) but I could.

@echauchot
Contributor

This is perfect! I did not think of searching for the names in the original Nexmark paper. Well done! Renaming the Java classes is not needed, I think, because they are referenced all over the place and maintainers know them by their numeric names.

Contributor

@echauchot echauchot left a comment

One last thing that I forgot in review round one (sorry). NexmarkUtils.tableSpec needs to output the same table names as before. Otherwise, all the historical values of the dashboards (https://apache-beam-testing.appspot.com/dashboard-admin, containing more than 100 tables) will be in BigQuery tables named like nexmark_0_DirectRunner_batch, while newly inserted values will go to BigQuery tables named like nexmark_PASSTHROUGH_DirectRunner_batch. Can you make NexmarkUtils.tableSpec output the same names as before? (These names are also used to save output to BQ and to name Kafka/Pub/Sub topics, etc., but that is less crucial because those have no history of 100 tables.) Thanks

@kennknowles
Member Author

I started to work on that, but I noticed that the query name used there is actually not this one; it is NexmarkQuery.getName(), which is the string passed here: https://github.com/apache/beam/blob/master/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query0.java#L41

@kennknowles
Member Author

Hmm, actually the table name does not use that, so I was wrong.

@kennknowles kennknowles force-pushed the nexmark-join-to-files branch from d72ee2a to c4667f0 Compare October 26, 2018 19:33
@kennknowles
Member Author

OK I have made the table have the same name. I rebased to resolve some conflicts.
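The compatibility concern in this thread (historical tables are named like nexmark_0_DirectRunner_batch, so the number rather than the enum name has to appear in the table name) can be sketched as below. This is not the actual NexmarkUtils.tableSpec: `TableQueryName`, its `number` field, `TableNames.tableName`, and the "streaming" suffix are assumptions made for illustration.

```java
// Hypothetical enum that keeps the legacy number alongside the descriptive name.
enum TableQueryName {
  PASSTHROUGH(0),
  THREE(3);

  private final int number;

  TableQueryName(int number) {
    this.number = number;
  }

  int getNumber() {
    return number;
  }
}

final class TableNames {
  private TableNames() {}

  // Build the table name from the legacy number, not from name(), so that
  // new rows land in the same tables as the historical dashboard data.
  // The "streaming" suffix is an assumption; only "batch" appears in the discussion.
  static String tableName(TableQueryName query, String runner, boolean streaming) {
    return String.format(
        "nexmark_%d_%s_%s", query.getNumber(), runner, streaming ? "streaming" : "batch");
  }
}
```

Under these assumptions, `TableNames.tableName(TableQueryName.PASSTHROUGH, "DirectRunner", false)` yields nexmark_0_DirectRunner_batch rather than nexmark_PASSTHROUGH_DirectRunner_batch.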

Contributor

@echauchot echauchot left a comment

LGTM ! Thanks for that Kenn !
Merging.

@echauchot echauchot merged commit f8bbd8b into apache:master Oct 29, 2018