[BEAM-3759] Add support for PaneInfo in WindowedValues #4763
R: @robertwb
""" | ||

-  def __init__(self, value, timestamp, windows):
+  def __init__(self, value, timestamp, windows, pane_info=None):
Robert, can you comment on whether this extra kwarg is suitable, performance-wise?
It should be fine, as long as it's not usually passed as a kwarg. (Also, most of the time we create these with .with_value which is more optimized.)
@@ -670,6 +670,85 @@ def _construct_from_sequence(self, components):
    return components


# A PaneInfo descriptor can be encoded in three different ways: (1) with a
# single byte (PANE_INFO_ENCODING_FIRST), (2) with a single byte followed by
# a varint describing a single index
and (3)?
Or perhaps just remove this, as the comment below is sufficient.
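For readers following the truncated comment, the three forms could look roughly like the sketch below. It assumes the third form is a single byte followed by two varints (an index and a non-speculative index), which the quoted comment does not state. The constant values, the 4-bit layout of the first byte, the rule for choosing a form, and the helper names (`encode_pane`, `decode_pane`, `_write_varint`, `_read_varint`) are all illustrative assumptions, not the code in this PR.

```python
import io

# Hypothetical tags for the encoding form, stored in the high 4 bits.
ENCODING_FIRST = 0        # (1) single byte only
ENCODING_ONE_INDEX = 1    # (2) single byte + one varint
ENCODING_TWO_INDICES = 2  # (3) single byte + two varints (assumed)


def _write_varint(out, n):
  """Writes a non-negative int as a base-128 varint."""
  while True:
    bits = n & 0x7F
    n >>= 7
    if n:
      out.write(bytes([bits | 0x80]))  # more bytes follow
    else:
      out.write(bytes([bits]))
      return


def _read_varint(inp):
  """Reads a base-128 varint written by _write_varint."""
  shift = 0
  result = 0
  while True:
    byte = inp.read(1)[0]
    result |= (byte & 0x7F) << shift
    if not byte & 0x80:
      return result
    shift += 7


def encode_pane(flags, index, nonspeculative_index):
  """Encodes a pane; flags is a 4-bit int (assumed layout)."""
  out = io.BytesIO()
  if index == 0 and nonspeculative_index == 0:
    out.write(bytes([flags | (ENCODING_FIRST << 4)]))
  elif index == nonspeculative_index:
    out.write(bytes([flags | (ENCODING_ONE_INDEX << 4)]))
    _write_varint(out, index)
  else:
    out.write(bytes([flags | (ENCODING_TWO_INDICES << 4)]))
    _write_varint(out, index)
    _write_varint(out, nonspeculative_index)
  return out.getvalue()


def decode_pane(encoded):
  """Returns (flags, index, nonspeculative_index)."""
  inp = io.BytesIO(encoded)
  first = inp.read(1)[0]
  flags, encoding = first & 0x0F, first >> 4
  if encoding == ENCODING_FIRST:
    return flags, 0, 0
  index = _read_varint(inp)
  if encoding == ENCODING_ONE_INDEX:
    return flags, index, index
  return flags, index, _read_varint(inp)
```

The point of the scheme is that the common case (first pane, both indices zero) costs a single byte, and the longer forms are only paid for when the indices carry information.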
@@ -104,11 +211,12 @@ def __reduce__(self):


 # TODO(robertwb): Move this to a static method.
-def create(value, timestamp_micros, windows):
+def create(value, timestamp_micros, windows, pane_info=None):
Just let the default value be PANE_INFO_UNKNOWN.
@@ -90,7 +196,8 @@ def __cmp__(left, right):  # pylint: disable=no-self-argument
   def _typed_eq(left, right):
     return (left.timestamp_micros == right.timestamp_micros
             and left.value == right.value
-            and left.windows == right.windows)
+            and left.windows == right.windows
+            and left.pane_info == right.pane_info)

   def with_value(self, new_value):
     """Creates a new WindowedValue with the same timestamps and windows as this.
Fix with_value to propagate the pane info (and test).
@@ -104,11 +211,12 @@ def __reduce__(self):
Fix __reduce__ to remember the PaneInfo (and test).
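The three requests so far (default to PANE_INFO_UNKNOWN, propagate in with_value, remember in __reduce__) can be illustrated together with a simplified model. This is a sketch only: `PaneInfoSketch`, `WindowedValueSketch`, and the field values chosen for `PANE_INFO_UNKNOWN` are hypothetical stand-ins, not the Beam classes touched by this PR.

```python
import collections

# Hypothetical stand-in for PaneInfo; the real class carries more fields.
PaneInfoSketch = collections.namedtuple(
    'PaneInfoSketch', ['is_first', 'is_last', 'index'])

# Assumed shape of the "unknown" default suggested in review.
PANE_INFO_UNKNOWN = PaneInfoSketch(is_first=True, is_last=True, index=0)


class WindowedValueSketch(object):
  """Simplified model of a windowed value, not the Beam class itself."""

  def __init__(self, value, timestamp_micros, windows,
               pane_info=PANE_INFO_UNKNOWN):
    self.value = value
    self.timestamp_micros = timestamp_micros
    self.windows = windows
    self.pane_info = pane_info

  def __eq__(self, other):
    return (type(self) == type(other)
            and self.timestamp_micros == other.timestamp_micros
            and self.value == other.value
            and self.windows == other.windows
            and self.pane_info == other.pane_info)

  def with_value(self, new_value):
    # Carry pane_info along; omitting it would silently reset the pane
    # to the default.
    return WindowedValueSketch(
        new_value, self.timestamp_micros, self.windows, self.pane_info)

  def __reduce__(self):
    # pickle calls the returned factory with these args, so pane_info must
    # be among them to survive a round trip.
    return (WindowedValueSketch,
            (self.value, self.timestamp_micros, self.windows,
             self.pane_info))


late_pane = PaneInfoSketch(is_first=False, is_last=False, index=3)
wv = WindowedValueSketch('a', 1000, ('w',), late_pane)

# Reconstruct the way pickle would: call the factory with the saved args.
factory, args = wv.__reduce__()
restored = factory(*args)
```

Tests for both methods would then check a non-default pane, since a value carrying the default pane looks correct even when propagation is broken.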
def _construct_pane_info_map():
  result = {}
Make this a list, not a map.
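A list works here when the keys are small, dense integers, such as the possible values of a 4-bit encoded byte, because the key can then be used directly as an index. A minimal sketch, assuming 16 well-known panes and a hypothetical bit layout (bit 0 is_first, bit 1 is_last, bits 2-3 timing); `PaneSketch` and `construct_well_known_panes` are illustrative names:

```python
from collections import namedtuple

# Hypothetical stand-in for PaneInfo.
PaneSketch = namedtuple('PaneSketch', ['is_first', 'is_last', 'timing'])


def construct_well_known_panes():
  """Builds a list where position i holds the pane encoded as byte i."""
  panes = []
  for encoded in range(16):
    panes.append(PaneSketch(
        is_first=bool(encoded & 0x1),   # assumed: bit 0
        is_last=bool(encoded & 0x2),    # assumed: bit 1
        timing=encoded >> 2))           # assumed: bits 2-3
  return panes


BYTE_TO_PANE = construct_well_known_panes()

# Lookup is plain indexing, with no hashing involved.
pane = BYTE_TO_PANE[0b0111]
```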

   def _create_impl(self):
     return coder_impl.WindowedValueCoderImpl(
         self.wrapped_value_coder.get_impl(),
         self.timestamp_coder.get_impl(),
-        self.window_coder.get_impl())
+        self.window_coder.get_impl(),
+        self.pane_info_coder.get_impl())
Rather than introducing a PaneInfoCoder, just instantiate the PaneInfoCoderImpl in WindowedValueCoderImpl's constructor. (There's a TODO to remove the parameterizability of the TimestampCoder as well.) Actually, we could consider just making {en,de}code_pane_info methods on WindowedValueCoder itself.
@@ -734,13 +814,14 @@ def decode_from_stream(self, in_stream, nested):
    # Read PaneInfo encoded byte.
    # TODO(BEAM-1522): Ignored for now but should be converted to pane info once
You can now remove this TODO.
@charlesccychen any updates on this?
Force-pushed from 077b2dd to 79e421b.
This change allows triggering information to be stored along with WindowedValues.
Force-pushed from 79e421b to 95956ae.
Thanks, PTAL.
@@ -90,7 +196,8 @@ def __cmp__(left, right):  # pylint: disable=no-self-argument
   def _typed_eq(left, right):
     return (left.timestamp_micros == right.timestamp_micros
             and left.value == right.value
-            and left.windows == right.windows)
+            and left.windows == right.windows
+            and left.pane_info == right.pane_info)

   def with_value(self, new_value):
     """Creates a new WindowedValue with the same timestamps and windows as this.
robertwb wrote:
Fix with_value to propagate the pane info (and test).
Done.
@@ -30,8 +30,6 @@

from types import NoneType

import six

from apache_beam.coders import observable
robertwb wrote:
Or perhaps just remove this, as the comment below is sufficient.
Done.
  def __hash__(self):
    return hash(type(self))
robertwb wrote:
Rather than introducing a PaneInfoCoder, just instantiate the PaneInfoCoderImpl in WindowedValueCoderImpl's constructor. (There's a TODO to remove the parameterizability of the TimestampCoder as well.) Actually, we could consider just making {en,de}code_pane_info methods on WindowedValueCoder itself.
I am going to get rid of PaneInfoCoder but keep the PaneInfoCoderImpl, because it's more natural to keep it separate, as it would clutter the WindowedValueCoderImpl with encode/decode_paneinfo_to_stream and estimate_paneinfo_size.
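The arrangement described here, a separate pane coder impl that the windowed-value coder impl builds for itself instead of receiving as a parameter, could be sketched as follows. All class names, the one-byte pane format, the length-prefixed value format, and the field order are hypothetical simplifications; the real impl also handles timestamps, windows, and size estimation.

```python
import io


class PaneInfoCoderImplSketch(object):
  """Hypothetical pane coder: one byte per pane, for illustration only."""

  def encode_to_stream(self, pane_byte, out):
    out.write(bytes([pane_byte]))

  def decode_from_stream(self, inp):
    return inp.read(1)[0]


class Utf8CoderImplSketch(object):
  """Hypothetical value coder: one length byte, then UTF-8 bytes."""

  def encode_to_stream(self, value, out):
    data = value.encode('utf-8')
    out.write(bytes([len(data)]))  # sketch only: values under 256 bytes
    out.write(data)

  def decode_from_stream(self, inp):
    size = inp.read(1)[0]
    return inp.read(size).decode('utf-8')


class WindowedValueCoderImplSketch(object):

  def __init__(self, value_coder):
    self._value_coder = value_coder
    # Built here rather than passed in: callers cannot swap the pane coder,
    # but the pane logic still lives in its own class.
    self._pane_info_coder = PaneInfoCoderImplSketch()

  def encode(self, value, pane_byte):
    out = io.BytesIO()
    self._pane_info_coder.encode_to_stream(pane_byte, out)
    self._value_coder.encode_to_stream(value, out)
    return out.getvalue()

  def decode(self, encoded):
    inp = io.BytesIO(encoded)
    pane_byte = self._pane_info_coder.decode_from_stream(inp)
    value = self._value_coder.decode_from_stream(inp)
    return value, pane_byte


coder = WindowedValueCoderImplSketch(Utf8CoderImplSketch())
round_tripped = coder.decode(coder.encode('hi', 0x0F))
```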
@@ -689,11 +766,13 @@ def _from_normal_time(self, value):
robertwb wrote:
You can now remove this TODO.
Done.
@@ -104,11 +211,12 @@ def __reduce__(self):
robertwb wrote:
Fix __reduce__ to remember the PaneInfo (and test).
Done.
@@ -104,11 +211,12 @@ def __reduce__(self):


 # TODO(robertwb): Move this to a static method.
-def create(value, timestamp_micros, windows):
+def create(value, timestamp_micros, windows, pane_info=None):
robertwb wrote:
Just let the default value be PANE_INFO_UNKNOWN.
Done.
def _construct_pane_info_map():
  result = {}
robertwb wrote:
Make this a list, not a map.
Done.
LGTM, thanks. Just to be sure, could you run the benchmarks at #4741 ?
Thanks. The benchmarks seem very comparable. Without this change (at HEAD^):
With this change:
Totally within the margin of error. Thanks.