Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-8575] Added a unit test to CombineTest class to test that Combi… #10159

Conversation

bumblebee-coming
Copy link
Contributor

@bumblebee-coming bumblebee-coming commented Nov 19, 2019

…neGlobally works fine with fanout and in accumulation mode at the same time.

Added a unit test to CombineTest class to test that CombineGlobally works fine with fanout and in accumulation mode at the same time.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@bumblebee-coming
Copy link
Contributor Author

bumblebee-coming commented Nov 19, 2019

The Java parity of this test is:
testHotKeyCombiningWithAccumulationMode
in file:
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java

There are three things to clarify:

  1. The Java test used .pane().isLast() to check and then emit the result for the last pane. This is not necessary. Plus the PaneInfo is not easy to get in Python, which requires using WindowedValue from apache_beam.utils. So in the Python code, we compare the result directly.

  2. Although the names contain "HotKey", the Java test combines globally, not combining per key, and uses WithFanout, not WithHotKeyFanout.

  3. Since all the elements have the same timestamp, ACCUMULATING or DISCARDING doesn't change the result. I'm not very sure about the Java test's intention.

The Python test followed the Java test.

@bumblebee-coming
Copy link
Contributor Author

R: @robertwb

@bumblebee-coming
Copy link
Contributor Author

I re-implemented the test in two different ways:

  1. Use TimestampedValue only.
    test_hot_key_combining_with_accumulation_mode
  2. Use TestStream.
    test_hot_key_combining_with_accumulation_mode2

Both tests succeeded no matter the accumulating mode is ACCUMULATING or DISCARDING.
Is that a bug?

@bumblebee-coming
Copy link
Contributor Author

I re-implemented the test using TestStream, and fixed the order of add_elements and advance_watermark_to.
However, elements are still not discarded in DISCARDING mode.

@bumblebee-coming
Copy link
Contributor Author

I re-implemented the test using TestStream, and fixed the order of add_elements and advance_watermark_to.
However, elements are still not discarded in DISCARDING mode.

I also tried mixing and matching FixedWindows, GlobalWindows with trigger=Repeatedly(AfterCount(1)), trigger=AfterWatermark(early=AfterCount(1)).
all cases: RESULT
FixedWindows trigger=Repeatedly(AfterCount(1)) ACCUMULATING [15]
DISCARDING [15]
FixedWindows trigger=AfterWatermark(early=AfterCount(1)) ACCUMULATING [15, 15]
DISCARDING [15, 0]
GlobalWindows trigger=Repeatedly(AfterCount(1)) ACCUMULATING [15]
DISCARDING [15]
GlobalWindows trigger=AfterWatermark(early=AfterCount(1)) ACCUMULATING [15, 15]
DISCARDING [15, 0]

It seems in both ACCUMULATING mode and DISCARDING mode, all the TimestampedValues are in one single pane.

@bumblebee-coming
Copy link
Contributor Author

Solved the previous problem (all elements crowded into a single firing pane) by make the test pipeline viewed as a streaming.
options = PipelineOptions()
options.view_as(StandardOptions).streaming = True
with TestPipeline(options=options) as p:

I also added some code to record firing panes so that the difference between ACCUMULATING and DISCARDING modes is obvious.

@bumblebee-coming
Copy link
Contributor Author

Note that The number of firings is uncertain.
After all elements have been collected (in ACCUMULATING mode), the trigger may still fire one or two times.
The firings could be
[1, 3, 6, 10, 15, 15] (6 firings)
or [1, 3, 6, 10, 15, 15, 15] (7 firings)
Thus the test only makes sure the first 5 firings are correct.

@liumomo315
Copy link
Contributor

Is this PR ready for another round of review?

@bumblebee-coming bumblebee-coming force-pushed the hot-key-combining-with-accumulation-mode branch 2 times, most recently from da6904e to 2c5d1fd Compare December 17, 2019 23:10
@bumblebee-coming
Copy link
Contributor Author

Run Python PreCommit

@bumblebee-coming bumblebee-coming force-pushed the hot-key-combining-with-accumulation-mode branch from 862cf2f to 29b6c39 Compare December 18, 2019 18:53
@bumblebee-coming bumblebee-coming force-pushed the hot-key-combining-with-accumulation-mode branch from fdee934 to dd66920 Compare January 2, 2020 18:45
@bumblebee-coming
Copy link
Contributor Author

Run Python PreCommit

@bumblebee-coming
Copy link
Contributor Author

retest this please

@bumblebee-coming
Copy link
Contributor Author

Run Python PreCommit

@chamikaramj
Copy link
Contributor

Retest this please

1 similar comment
@chamikaramj
Copy link
Contributor

Retest this please

@robertwb robertwb merged commit 0bfa5e7 into apache:master Jan 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants