Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-2606] make WindowFnTestUtils use the value in addition to the timestamp of the elements #3592

Closed

Conversation

echauchot
Copy link
Contributor

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

R: @kennknowles
Besides, I saw that the current WindowFnTestUtils has no test. I guess that is because all windows use WindowFnTestUtils to test themselves. So I did not add any test. That is said, the methods added in that PR are actually called by the current windows tests but always with a null value in TimestampedValue. I was thinking I could wait for that PR #3286 to be merged. Indeed it defines a CustomWindowFn in WindowTest that relies on values for window assignment. I could then add a test of this CustomWindowFn with the new WindowFnTestUtils.
@kennknowles WDYT?

@echauchot
Copy link
Contributor Author

retest this please

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 3f02004 on echauchot:use_timestampeValue_WindowFnTestUtils into ** on apache:master**.

@echauchot
Copy link
Contributor Author

@kennknowles now that PR #3286 is merged, do you think it's useful to add tests for each public method WindowFnTestUtils.*WithValue that use the CustomWindow (https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/windowing/WindowTest.java#L629) to rely on element values ?

@echauchot
Copy link
Contributor Author

+R: @bjchambers

timestampedValues.add(TimestampedValue.of((T) null, new Instant(timestamp)));
}
return runWindowFnWithValue(windowFn, timestampedValues);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary blank line

@@ -67,14 +68,30 @@
public static <T, W extends BoundedWindow> Map<W, Set<String>> runWindowFn(
WindowFn<T, W> windowFn,
List<Long> timestamps) throws Exception {
// lift List<Timestamp> into List<TimestampedValue> to factorize implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment is probably unnecessary -- the code is relatively straightforward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// lift List<Timestamp> into List<TimestampedValue> to factorize implementation
List<TimestampedValue<T>> timestampedValues = new ArrayList<>();
for (Long timestamp : timestamps){
timestampedValues.add(TimestampedValue.of((T) null, new Instant(timestamp)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should take the value rather than using null? Alternatively, it could take an Iterable values and pair them up. Then it could use Iterables.cycle((T) null) as a default behavior?

Copy link
Contributor Author

@echauchot echauchot Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. There is already a runWindowFn() version that takes the values, it is runWindowFnWithValue(). Similarly, I did 2 versions of each public method (one that takes only the timestamps and one that takes also the value with a TimestampedValue). Indeed, I wanted to keep the existing API and enhance it with a version that also relies on values of elements for window assignment. The *withValue version is called with null by the non-value version just to factorize the code.

public static <T, W extends BoundedWindow> Collection<W> assignedWindows(
WindowFn<T, W> windowFn, long timestamp) throws Exception {
return windowFn.assignWindows(new TestAssignContext<T, W>(new Instant(timestamp), windowFn));
// lift Timestamp into TimestampedValue to factorize implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment is probably unnecessary too

@@ -197,9 +229,21 @@ public void merge(Collection<W> otherWindows, W window) {
*/
public static <T, W extends BoundedWindow> void validateNonInterferingOutputTimes(
WindowFn<T, W> windowFn, long timestamp) throws Exception {
Collection<W> windows = WindowFnTestUtils.<T, W>assignedWindows(windowFn, timestamp);
// lift Timestamp into TimestampedValue to factorize implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ditto

*/
public static <T, W extends BoundedWindow> void validateNonInterferingOutputTimesWithValue(
WindowFn<T, W> windowFn, TimestampedValue<T> timestampedValue) throws Exception {
Collection<W> windows = assignedWindowsWithValue(windowFn, timestampedValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For methods like this where the value isn't used, I think it would be better to go the other direction -- move the logic into a single method that takes an Instant timestamp, and then call that using timestampedValue.getTimestamp().

It seems odd to have to create a TimestampedValue using null only to discard it and go back to just the timestamp.

Copy link
Contributor Author

@echauchot echauchot Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method the value is actually used in assignedWindowsWithValue() which creates a TestAssignContext initialized with a proper timestampedValue that has non-null value. It is there to allow assigning elements to windows based on values in addition to timestamp (see WindowTest#testMergingCustomWindows). If I move the code to a method that only takes the timestamp, I'm unable to assign windows based on values anymore. Of course, there is still the version that only takes the timestamp. In order to have for each public method the 2 versions, do you see a better way to factorize than generalizing timestampedValue and pass it null for the timestamp-only version?

@@ -269,15 +333,46 @@ void validateGetOutputTimestamps(
TimestampCombiner timestampCombiner,
List<List<Long>> timestampsPerWindow) throws Exception {

// lift List<List<Timestamp>> into List<List<TimestampedValue>> to factorize implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same here


/**
* Runs the {@link WindowFn} over the provided input, returning a map
* of windows to the timestamps in those windows. This version allows to pass a list of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated, but odd, and maybe worth addressing. This says it is a map from windows to the timestamps in those windows, but it is a Set<String>. Should this be Set<Instant>?

Copy link
Contributor Author

@echauchot echauchot Aug 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this comes from the original API that transforms the timestamp into a String see

private static String timestampValue(long timestamp) {
    return "T" + new Instant(timestamp);
  }

Once again, I tried to not alter the public test API. Maybe using String instead of Instant was chosen to avoid creating many instances of Instant in the tests. See for example FixedWindowsTest or other tests, they use long primitive type and use WindowFnTestUtils.set() which uses WindowFnTestUtils.timestampValue() to convert to String.

@@ -67,14 +68,30 @@
public static <T, W extends BoundedWindow> Map<W, Set<String>> runWindowFn(
WindowFn<T, W> windowFn,
List<Long> timestamps) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using List<Long> to represent timestamps, we should probably use List<Instant> so the types are more explicit?

Copy link
Contributor Author

@echauchot echauchot Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I totally agree, more explicit indeed. I did not change it to keep the same test API. I guess it was originally chosen to use long instead of Instant to avoid creating a lot of Instant instances in the tests and also make the test code more concise. But I can change it to Instant and update all the tests that use WindowFnTestUtils if you think it's better

@echauchot
Copy link
Contributor Author

Thanks Ben for your review

@echauchot echauchot force-pushed the use_timestampeValue_WindowFnTestUtils branch from 3f02004 to 396977c Compare August 11, 2017 12:31
@echauchot
Copy link
Contributor Author

@bjchambers, I only rebased on master and cleaned unneeded comments. PTAL at my responses to your comments.
Thanks.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 69.987% when pulling 396977c on echauchot:use_timestampeValue_WindowFnTestUtils into 07e8cd5 on apache:master.

@echauchot
Copy link
Contributor Author

@bjchambers can you please take a look at my answers to your comments?

@lukecwik
Copy link
Member

Merging

@asfgit asfgit closed this in da531b7 Sep 27, 2017
@echauchot
Copy link
Contributor Author

Thank you guys :)

@echauchot echauchot deleted the use_timestampeValue_WindowFnTestUtils branch September 29, 2017 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants