[BEAM-5817] Use an explicit enum for naming Nexmark benchmarks#6780
[BEAM-5817] Use an explicit enum for naming Nexmark benchmarks#6780echauchot merged 1 commit intoapache:masterfrom
Conversation
05f6eec to
586df61
Compare
|
Could also be a string all the way through. It hardly matters, that. |
|
@kennknowles Thanks for your work. I finish a thing on the spark runner and I start the review before the end of the week |
|
@kennknowles I'm doing a spark runner break :) so I can take a look at your PR after all. |
echauchot
left a comment
There was a problem hiding this comment.
Thanks Kenn for your work !
Code looks good to me. I see an improvement of using a Map instead of a List in queries list regarding null management. Except that, for ex calling query "3" or "THREE" does not make a big difference IMHO. I would have liked having a more meaningful name but I could not find a concise name neither from the functional description ("Who is selling in particular US states?") nor from the technical one (" Illustrates an incremental join (using per-key state and timer) and filter."). I guess you did not find either, and that is why you used "THREE".
Also can you fix the errors in the build ?
| query = options.getQuery(); | ||
| try { | ||
| query = NexmarkQueryName.valueOf(options.getQuery()); | ||
| } catch (IllegalArgumentException exc) { |
There was a problem hiding this comment.
If I understand properly, user can specify either --query=3 or --query=THREE and it should end up being the same NexmarkConfiguration containing NexmarkQueryName.THREE. And you use the exception to differentiate between the two cases. Right ?
There was a problem hiding this comment.
Yea. The only reason I am keeping the numeric version is to have some backwards compatibility since there's probably lots of dashboards and shell scripts that call this.
There was a problem hiding this comment.
Yes you were right to keep the numbers also because, as you mentioned, all the dashboards use them and some scripts and docs also. Besides it is also easier from to CLI to specify --query=0 instead of --query=PASSTHROUGH
|
|
||
| private NexmarkQueryModel getNexmarkQueryModel() { | ||
| List<NexmarkQueryModel> models = createQueryModels(); | ||
| Map<NexmarkQueryName, NexmarkQueryModel> models = createQueryModels(); |
There was a problem hiding this comment.
It looks more natural indeed to use a map for that !
| .put(NexmarkQueryName.SEVEN, new Query7Model(configuration)) | ||
| .put(NexmarkQueryName.EIGHT, new Query8Model(configuration)) | ||
| .put(NexmarkQueryName.NINE, new Query9Model(configuration)) | ||
| .build(); |
There was a problem hiding this comment.
A lot better than the list indeed ! Using a list with null values seemed weak and error prone.
kennknowles
left a comment
There was a problem hiding this comment.
I just went back and checked http://datalab.cs.pdx.edu/niagara/pstream/nexmark.pdf and they have pretty good subtitles for the queries. I think it will be good to use them actually.
| query = options.getQuery(); | ||
| try { | ||
| query = NexmarkQueryName.valueOf(options.getQuery()); | ||
| } catch (IllegalArgumentException exc) { |
There was a problem hiding this comment.
Yea. The only reason I am keeping the numeric version is to have some backwards compatibility since there's probably lots of dashboards and shell scripts that call this.
4c6be19 to
d72ee2a
Compare
|
OK, please have another look. I didn't rename all the Java classes (yet?) but I could. |
|
This is perfect ! I did not think about searching the names in the original nexmark paper. Well done ! Renaming the java classes is not needed I think because they are referenced all over the place nad maintainers know them by their numeric names. |
echauchot
left a comment
There was a problem hiding this comment.
One last thing that I forgot in review round one (sorry). NexmarkUtils.tableSpec needs to output the same table name as before otherwise all the historical values of the dashboards (https://apache-beam-testing.appspot.com/dashboard-admin containing more than 100 tables) will be in BigQuery tables named nexmark_0_DirectRunner_batch and new inserted values will be in BigQuery tables named nexmark_PASSTHROUGH_DirectRunner_batch. Can you make that NexmarkUtils.tableSpec outputs names as before (these names are also used to save output to BQ and to name kafka/pubsub topics etc... but it is less crucial because for them there is no 100 tables history)? Thanks
|
I started to work on that, but I noticed that actually the query name is not this one, but it is the |
|
Hmm I mean the table name isn't that so I am wrong. |
d72ee2a to
c4667f0
Compare
|
OK I have made the table have the same name. I rebased to resolve some conflicts. |
echauchot
left a comment
There was a problem hiding this comment.
LGTM ! Thanks for that Kenn !
Merging.
The "Nexmark" queries are numbers 0 through twelve. Only 1 through 8 are actually from the original Nexmark suite. It makes sense because the data set is so convenient that we might have many interesting queries and use cases to build.
Preparing to add more, I wanted to stop just making up numbers and start giving them names.
Follow this checklist to help us incorporate your contribution quickly and easily:
[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.It will help us expedite review of your Pull Request if you tag someone (e.g.
@username) to look at it.Post-Commit Tests Status (on master branch)