[BEAM-2732][BEAM-4028] Logging relies on StateSampler for context #5356
Run Python Dataflow ValidatesRunner
Run Python PostCommit
Run Python PostCommit
Run Python PostCommit
Run Python PostCommit
Run Python Dataflow ValidatesRunner
This unifies context management in Python, which simplifies further feature work. It also expands the use of NameContext, which should improve the separation of the runner and the SDK harness.
cead665 to 84f8fb6
Run Python Dataflow ValidatesRunner
Run Python PreCommit
Run Python PreCommit
Run Python PreCommit
Run Python Dataflow ValidatesRunner
Run Python Dataflow ValidatesRunner
r: @charlesccychen
Thanks Pablo! This is a great cleanup. Can you also run Robert's benchmarks here with Cython, with and without this change: #4741?
self.operations = operations
self.stage_name = stage_name
# TODO(BEAM-4028): Remove arguments other than name_contexts.
Is this obsolete? The Jira is still open.
Oops. Not obsolete. Good catch!
@@ -539,19 +529,14 @@ def __init__(self,
 windowing: windowing properties of the output PCollection(s)
 tagged_receivers: a dict of tag name to Receiver objects
 step_name: the name of this step
-logging_context: a LoggingContext object
+logging_context: DEPRECATED
Can you add a JIRA to remove this?
Added BEAM-4728.
 self.windowed_coder = windowed_coder
 self.windowed_coder_impl = windowed_coder.get_impl()
-self.step_name = step_name
Is this deletion intentional? If so, can you add a comment / JIRA reference to clean up step_name in the arguments?
This is part of my goal with BEAM-4028. The step name is meant to only be retrievable through the name context.
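To illustrate the direction described here, a minimal sketch, assuming a simplified stand-in for `NameContext`. Both classes below are hypothetical and are not the actual Beam implementation; they only show an operation that stores the name context and derives the step name from it, with no separate `step_name` attribute.

```python
class NameContext(object):
    """Hypothetical stand-in: holds the names associated with a step."""

    def __init__(self, step_name):
        self.step_name = step_name

    def logging_name(self):
        # Name to attach to log records emitted while this step runs.
        return self.step_name


class SketchOperation(object):
    """Hypothetical operation that keeps no separate step_name attribute."""

    def __init__(self, name_context):
        self.name_context = name_context

    def describe(self):
        # The step name is reachable only through the name context.
        return 'operation for step %s' % self.name_context.step_name


op = SketchOperation(NameContext('s2'))
print(op.describe())
```

With this shape, any caller that needs the step name has to go through `op.name_context`, which is the property the reply above is aiming for.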
@@ -49,7 +48,7 @@ def get_data(self):
 per_thread_worker_data = _PerThreadWorkerData()

-class PerThreadLoggingContext(LoggingContext):
+class PerThreadLoggingContext(object):
Are we able to get rid of this class entirely? It looks like you removed the only usage in operations.py.
This class is used internally at Google, so we need to remove it from there first.
@@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
 def __init__(self):
 super(_PerThreadWorkerData, self).__init__()
-# TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
Accidental deletion?
The logging context will be removed once it's no longer useful (after it's removed from Google code), so optimizations should not be considered anymore. I'll remove it ASAP as part of BEAM-4728.
@@ -62,12 +64,22 @@ def __init__(self, prefix, counter_factory,
 sampling_period_ms=DEFAULT_SAMPLING_PERIOD_MS):
 self.states_by_name = {}
 self._prefix = prefix
-self._counter_factory = counter_factory
+self._counter_factory = counter_factory or CounterFactory()
It looks like the second branch is only used by tests. Can we have the tests pass empty `CounterFactory()`s instead of adding this optional behavior in the actual code?
Thanks Charles!
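A minimal sketch of the suggestion above, assuming hypothetical stand-ins for the counter factory and the sampler (neither class is the actual Beam code): the constructor takes the factory as given, with no `or CounterFactory()` fallback, and the test helper builds the empty factory itself.

```python
class CounterFactory(object):
    """Hypothetical stand-in for a counter factory."""

    def __init__(self):
        self.counters = {}

    def get_counter(self, name):
        # Create the counter on first use; return the same mutable cell afterwards.
        return self.counters.setdefault(name, [0])


class SketchSampler(object):
    """Hypothetical sampler that requires an explicit counter factory."""

    def __init__(self, prefix, counter_factory):
        self._prefix = prefix
        # No fallback here: callers, including tests, must supply a factory.
        self._counter_factory = counter_factory

    def record(self, state_name, msecs):
        counter = self._counter_factory.get_counter(
            '%s%s-msecs' % (self._prefix, state_name))
        counter[0] += msecs


def make_test_sampler():
    # Test code supplies the empty factory itself.
    factory = CounterFactory()
    return SketchSampler('basic-', factory), factory


sampler, factory = make_test_sampler()
sampler.record('statea', 5)
sampler.record('statea', 7)
print(factory.counters)
```

Keeping the default out of the constructor means production code has a single path, and a missing factory shows up as an error at the first use rather than being silently replaced.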
Also removed the extra path in |
Run Python Dataflow ValidatesRunner
Results of the On master:
With these changes:
|
Thanks Pablo! This LGTM. Can you rebase? It looks like this is a great performance improvement too, cutting down processing overhead.
Run Python Dataflow ValidatesRunner
Tests passing. I'll squash and merge this after lunch.
Squashed commits and resolved conflicts. Rerunning tests.
Run Python Dataflow ValidatesRunner
Merged. Thanks @charlesccychen for reviewing the large-ish change : )
This was meant to be removed, because optimizations are no longer needed.
This list only ever contains one element per work item.
On Wed, Aug 15, 2018 at 2:23 PM Charles Chen ***@***.***> wrote:
***@***.**** commented on this pull request.
In sdks/python/apache_beam/runners/worker/logger.py
<#5356 (comment)>:
> @@ -34,7 +34,6 @@ class _PerThreadWorkerData(threading.local):
def __init__(self):
super(_PerThreadWorkerData, self).__init__()
- # TODO(robertwb): Consider starting with an initial (ignored) ~20 elements
Part of this TODO was accidentally deleted. Please fix.
The Logging module will no longer implement its own context tracking for step and stage names.
Because it no longer plays a part in high-performance operations, I'm removing it and its Cython annotations.
Passing PostCommit: https://builds.apache.org/job/beam_PostCommit_Python_Verify/4985/
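The idea in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Beam worker code: `SketchStateSampler`, `set_current_sampler`, `StepNameFilter`, and `ListHandler` are hypothetical names. The point is only that the logging side keeps no step-tracking state of its own and asks the active sampler for the current step when a record is emitted.

```python
import logging
import threading

# Hypothetical thread-local registry of the active sampler.
_active = threading.local()


class SketchStateSampler(object):
    """Hypothetical sampler that knows which step is currently executing."""

    def __init__(self):
        self._step_stack = []

    def enter(self, step_name):
        self._step_stack.append(step_name)

    def exit(self):
        self._step_stack.pop()

    def current_step(self):
        return self._step_stack[-1] if self._step_stack else None


def set_current_sampler(sampler):
    _active.sampler = sampler


class StepNameFilter(logging.Filter):
    """Annotates each record with the step reported by the active sampler."""

    def filter(self, record):
        sampler = getattr(_active, 'sampler', None)
        record.step_name = sampler.current_step() if sampler else None
        return True


class ListHandler(logging.Handler):
    """Collects formatted records so the example can inspect them."""

    def __init__(self):
        super(ListHandler, self).__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter('[%(step_name)s] %(message)s'))
handler.addFilter(StepNameFilter())
logger = logging.getLogger('sketch_worker')
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

sampler = SketchStateSampler()
set_current_sampler(sampler)
sampler.enter('s1')
logger.info('processing element')
sampler.exit()
logger.info('idle')
print(handler.lines)
```

Because the filter reads the step from the sampler at emit time, there is a single source of truth for the execution context and nothing in the logging path that needs its own bookkeeping or optimization.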