-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-2914] Add TimestampPrefixingWindowCoder as beam:coder:custom_window:v1. #15137
Conversation
Codecov Report
@@ Coverage Diff @@
## master #15137 +/- ##
==========================================
+ Coverage 83.77% 83.83% +0.06%
==========================================
Files 441 441
Lines 59800 59692 -108
==========================================
- Hits 50095 50043 -52
+ Misses 9705 9649 -56
Continue to review full report at Codecov.
|
a416bd0
to
b7e1e7a
Compare
Run Java PreCommit |
Run GoPortable PreCommit |
...nstruction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java
Show resolved
Hide resolved
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoder.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/IntervalWindow.java
Show resolved
Hide resolved
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java
Outdated
Show resolved
Hide resolved
sdks/java/core/src/test/java/org/apache/beam/sdk/coders/TimestampPrefixingWindowCoderTest.java
Show resolved
Hide resolved
Run Java PreCommit |
1 similar comment
Run Java PreCommit |
# } | ||
# System.out.println(example); | ||
coder: | ||
urn: "beam:coder:custom_window:v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be timestamp_prefixed (similar to length_prefixed)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think custom_window makes it a bit more clear that this is a coder for window type. timestamp_prefixed also works as long as sdk and runner agree with the urn I guess.
# System.out.println(example); | ||
coder: | ||
urn: "beam:coder:custom_window:v1" | ||
components: [{urn: "beam:coder:interval_window:v1"}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be limited to interval_windows, any windowed_coder should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, using interval_window coder here makes it easier to verify in standard coder test.
// maxTimestamp - A big endian 8 byte integer representing millis-since-epoch. | ||
// The encoded representation is shifted so that the byte representation | ||
// of negative values are lexicographically ordered before the byte | ||
// representation of positive values. This is typically done by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"typically done" makes it sound like any transformation preserving lexicographic ordering is acceptable. There is no choice here.
Perhaps we could reference other coders here (e.g. the encoding used for TimestmpCoder, or in WindowedValueCoder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested copying it verbatim as that is the text in those segments already.
|
||
// Encodes an arbitrary user defined window and its max timestamp (inclusive). | ||
// The encoding format is: | ||
// maxTimestamp window |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the window encoded? Should we length prefix it as well? Or do we expect TimestampPrefixed(LenghtPrefixed(CustomWindowCoder))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whether it should be length prefixed can be determined by runner with coder overwrite, since we are not limiting the wrapped window coder type, known coder can be also used(though it is unlikely in real use case, but it is handy for unit tests) which don't need length prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather have the length prefix to be a component as Yichi describes since this will allow other length prefixing schemes then the current one we have.
def is_deterministic(self) -> bool: | ||
return self._window_coder.is_deterministic() | ||
|
||
def as_cloud_object(self, coders_context=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
@@ -21,6 +21,7 @@ | |||
class CloudObjectKinds { | |||
static final String KIND_GLOBAL_WINDOW = "kind:global_window"; | |||
static final String KIND_INTERVAL_WINDOW = "kind:interval_window"; | |||
static final String KIND_CUSTOM_WINDOW = "kind:custom_window"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For translating it as a known coder on runner side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't needed since we shouldn't need to pass these in anymore to Dataflow since the beam proto -> DFE conversion should happen internally but I didn't ask to remove it since it was already done.
private static class CustomWindowCoder extends CustomCoder<CustomWindow> { | ||
|
||
private static final Coder<IntervalWindow> INTERVAL_WINDOW_CODER = IntervalWindow.getCoder(); | ||
private static final int REGISTER_BYTE_SIZE = 1234; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following what is meant by the REGISTER_BYTE_SIZE
constant. BYTE_SIZE_TO_REGISTER
?
Add a new common standard coder beam:coder:custom_window:v1. The coder encodes arbitrary bounded window by prefixing the max timestamp to its encoded form using the window's registered coder. e.g.:
max_timestamp
encoded_window_bytes
The idea is that, any arbitrary window extending bounded window must have max timestamp defined and runner only needs to know the max timestamp when processing trigger timers. The encoded bytes using the original custom window's registered coder can be used transparently on runner side as the identity of the window.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.