[BEAM-2855] nexmark python suite implement query 3, 4, 5, 6, 7, 8, 11 #12580

leiyiz · 2020-08-14T01:25:07Z

Implemented queries 3, 4, 5, 6, 7, 8, and 11.
Made a small change to the Nexmark launcher to fix a bug where errors were not propagated correctly.

leiyiz · 2020-08-14T01:25:34Z

R: @y1chi
R: @pabloem

sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_count.py

sdks/python/apache_beam/testing/benchmarks/nexmark/models/bids_per_session.py

sdks/python/apache_beam/testing/benchmarks/nexmark/models/auction_count.py

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py

y1chi

LGTM, just some minor nits.

sdks/python/apache_beam/testing/benchmarks/nexmark/models/result_name.py

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py

leiyiz · 2020-08-20T07:26:19Z

Run Python PreCommit

sdks/python/apache_beam/testing/benchmarks/nexmark/nexmark_launcher.py

pabloem · 2020-08-21T05:41:58Z

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query7.py

+  return (
+      sliding_bids
+      | 'select_bids' >> beam.ParDo(
+          SelectMaxBidFn(), beam.pvalue.AsSingleton(max_prices)))


Is this how this is implemented in Java? I am wondering if we should make bids comparable. If they were comparable, then you would be able to just return max_prices.

e.g.:

    @functools.total_ordering
    class ComparableBidByPrice(object):
      def __init__(self, bid):
        self.bid = bid

      def __eq__(self, other):
        return self.bid == other.bid

      def __lt__(self, other):
        return self.bid.price < other.bid.price

And then you'd do:

    max_bids = (
        sliding_bids
        | beam.Map(ComparableBidByPrice)
        | beam.CombineGlobally(max).without_defaults())

Thoughts? The main benefit here is having one fewer stage, and thus higher performance, but I think the best option is to do whatever Java does.

I think one of the purposes is to benchmark the performance of side inputs, so some pipelines deliberately use certain Beam semantics that may not be the most efficient option.

That's reasonable. Then can you just add a comment, @leiyiz?

Yeah, I read the code, and the reason for not using a combiner, which would be more efficient, is to exercise the side-input functionality.

Can you add a comment on the file in a follow-up PR, please?
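
For reference, the comparable-wrapper idea from this thread can be tried outside Beam. The following is a minimal sketch, not the code in this PR: `Bid` is a hypothetical namedtuple standing in for the Nexmark bid model (its field names are assumptions), and `__eq__` compares prices here so that equality agrees with `__lt__`, which differs slightly from the snippet suggested in the review.

```python
import functools
from collections import namedtuple

# Hypothetical stand-in for the Nexmark bid model; field names are assumptions.
Bid = namedtuple('Bid', ['auction', 'bidder', 'price'])


@functools.total_ordering
class ComparableBidByPrice(object):
  """Wraps a bid so that ordering is decided by price alone."""
  def __init__(self, bid):
    self.bid = bid

  def __eq__(self, other):
    # Compare on price so that equality is consistent with __lt__.
    return self.bid.price == other.bid.price

  def __lt__(self, other):
    return self.bid.price < other.bid.price


bids = [Bid(1, 10, 100), Bid(2, 11, 250), Bid(1, 12, 175)]

# Plain max() works on the wrappers; unwrap the winner afterwards.
highest = max(ComparableBidByPrice(b) for b in bids).bid
```

Because the wrapper supports `max`, a single global combine would be enough to find the highest bid, which is the "one fewer stage" point made above. The side-input version is kept in the query because the benchmark is meant to exercise side inputs.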

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py

pabloem · 2020-08-21T05:54:14Z

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query6.py

+      | beam.WindowInto(
+          window.GlobalWindows(),
+          trigger=trigger.Repeatedly(trigger.AfterCount(1)),
+          accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
+          allowed_lateness=0)


This trigger is a little hard to wrap my head around :) Can you help me understand it? So I guess we simply accumulate fired panes and fire everything every time? (Let's say the stream contains a new element every second. Would we fire 1000 elements after 1000 seconds?)

Yes, every time an event arrives, the trigger should fire and the result is recalculated.

Because the query calculates a mean, every arrival causes the mean to be calculated one more time.
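
To make the behaviour described in this thread concrete, here is a plain-Python simulation. It is only a sketch of the idea (fire on every element, keep everything accumulated so far) and does not use Beam; `accumulating_panes` and `mean` are hypothetical helper names.

```python
def accumulating_panes(prices):
  """Yields one pane per arrival; each pane holds every element seen so far."""
  seen = []
  for price in prices:
    seen.append(price)
    # Copy the list so later arrivals do not mutate earlier panes.
    yield list(seen)


def mean(values):
  return sum(values) / len(values)


# Three arrivals produce three firings, each recomputing the mean.
means = [mean(pane) for pane in accumulating_panes([10, 20, 60])]
# means == [10.0, 15.0, 30.0]
```

Under this model, a stream with one element per second would produce 1000 firings after 1000 seconds, and each firing reflects all elements accumulated up to that point.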

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query4.py

pabloem · 2020-08-21T06:03:23Z

sdks/python/apache_beam/testing/benchmarks/nexmark/queries/query3.py

+
+  @on_timer(person_timer_spec)
+  def expiry(self, person_state=beam.DoFn.StateParam(person_spec)):
+    person_state.clear()


Can it happen that the same person creates new auctions after more than max_auction_waiting_time? Will there be a new person event? If we get new auctions after the person is expired, then we'll just keep adding them to auction_state forever, no?

I think it would. In that case the new auction for that person is just added to the state and eventually dropped when the pipeline ends. But the Nexmark spec also specifies "clear the state after TTL", and the default timer is about 600 seconds long, which is much longer than the test duration, so I think it is less of an issue?
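
The scenario raised in this thread can be illustrated with a toy model. This is a simplified sketch under stated assumptions, not the DoFn in this PR: `JoinState` is a hypothetical class for a single seller key, time is passed in explicitly as `now` instead of being driven by a Beam timer, and the 600-second TTL is taken from the comment above.

```python
class JoinState(object):
  """Toy per-seller state: one person slot plus a buffer of pending auctions."""
  def __init__(self, ttl):
    self.ttl = ttl
    self.person = None
    self.expires_at = None
    self.pending_auctions = []

  def _expire(self, now):
    # Mimics the expiry timer: drop the person once the TTL has passed.
    if self.expires_at is not None and now >= self.expires_at:
      self.person = None
      self.expires_at = None

  def on_person(self, person, now):
    self._expire(now)
    self.person = person
    self.expires_at = now + self.ttl
    # Flush auctions that arrived before the person did.
    matched = [(person, auction) for auction in self.pending_auctions]
    self.pending_auctions = []
    return matched

  def on_auction(self, auction, now):
    self._expire(now)
    if self.person is not None:
      return [(self.person, auction)]
    # No known person: buffer the auction until one shows up.
    self.pending_auctions.append(auction)
    return []


state = JoinState(ttl=600)
early = state.on_auction('a1', now=0)      # buffered, nothing emitted
joined = state.on_person('alice', now=10)  # flushes the buffered auction
live = state.on_auction('a2', now=20)      # joined immediately
late = state.on_auction('a3', now=700)     # person expired, buffered again
```

In this model the late auction `a3` stays in `pending_auctions` until another person event arrives for the same key, which is the unbounded-growth concern raised in the question. With a TTL longer than the test run, the expiry path is not reached.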

… implement implement query 3, 4, 5, 6, 7, 8, 11
* implement query 3, 4, 5, 6, 7, 8, 11
* forgot to run pylint2_3
* using dict instead of object for results of query
* added to_type_hint for coders, fixed issues brought up in code review
* reversed the sorting to not remove from front of list
Co-authored-by: Leiyi Zhang <leiyiz@google.com>

implement query 3, 4, 5, 6, 7, 8, 11

3085e4a

probot-autolabeler bot added the python label Aug 14, 2020

forgot to run pylint2_3

4a67c7b

y1chi reviewed Aug 14, 2020

leiyiz marked this pull request as draft August 14, 2020 18:26

using dict instead of object for results of query

9aa549f

leiyiz marked this pull request as ready for review August 17, 2020 22:44

y1chi approved these changes Aug 18, 2020

added to_type_hint for coders, fixed issues brought up in code review

f972b52

pabloem reviewed Aug 21, 2020

reversed the sorting to not remove from front of list

553c078

pabloem merged commit 66055db into apache:master Aug 22, 2020

leiyiz deleted the nexmark_query_implementation branch August 22, 2020 00:38
