Skip to content

[BEAM-2718] Add test to fix partial writeouts after a bundle retry#3833

Closed
mariapython wants to merge 5 commits intoapache:masterfrom
mariapython:retry_tests
Closed

[BEAM-2718] Add test to fix partial writeouts after a bundle retry#3833
mariapython wants to merge 5 commits intoapache:masterfrom
mariapython:retry_tests

Conversation

@mariapython
Copy link
Contributor

@mariapython mariapython commented Sep 11, 2017

No description provided.

@mariapython mariapython changed the title Add test to fix partial writeouts after a bundle retry [BEAM-2718] Add test to fix partial writeouts after a bundle retry Sep 12, 2017
@mariapython
Copy link
Contributor Author

R: @charlesccychen

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 69.503% when pulling af94b7b on mariapython:retry_tests into 8e391d9 on apache:master.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


def start_bundle(self):
self.step_context = self._execution_context.get_step_context()
self.step_context.clear_partial_states()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have a reset() method on _ExecutionContext that can be called to clear the step context and remove clear_partial_states(). We can then call this in attempt_call().

from collections import defaultdict
from apache_beam.runners.direct.transform_evaluator import _TransformEvaluator
from apache_beam.runners.direct.transform_evaluator import _GroupByKeyOnlyEvaluator
from apache_beam.runners.direct.evaluation_context import _ExecutionContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be at the top of the file (underscore imports are ok in tests).

self.partial_keyed_state[key] = self.existing_keyed_state[key].clone()
return self.partial_keyed_state[key]

def clear_partial_states(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have a reset() method on _ExecutionContext that can be called to clear the step context and remove clear_partial_states(). We can then call this in attempt_call().

Copy link
Contributor Author

@mariapython mariapython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

from collections import defaultdict
from apache_beam.runners.direct.transform_evaluator import _TransformEvaluator
from apache_beam.runners.direct.transform_evaluator import _GroupByKeyOnlyEvaluator
from apache_beam.runners.direct.evaluation_context import _ExecutionContext
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
These can be at the top of the file (underscore imports are ok in tests).

Done.

self.partial_keyed_state[key] = self.existing_keyed_state[key].clone()
return self.partial_keyed_state[key]

def clear_partial_states(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
I think it would be better to have a reset() method on _ExecutionContext that can be called to clear the step context and remove clear_partial_states(). We can then call this in attempt_call().

Done.


def start_bundle(self):
self.step_context = self._execution_context.get_step_context()
self.step_context.clear_partial_states()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
I think it would be better to have a reset() method on _ExecutionContext that can be called to clear the step context and remove clear_partial_states(). We can then call this in attempt_call().

Done.

@mariapython
Copy link
Contributor Author

Retest this please

1 similar comment
@mariapython
Copy link
Contributor Author

Retest this please

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 69.543% when pulling 3000042 on mariapython:retry_tests into bd0facc on apache:master.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


At sdks/python/apache_beam/runners/direct/evaluation_context.py:344:

          self.existing_keyed_state[key].copy())

This can fit on just one line.

side_input_values, scoped_metrics_container)
evaluator._execution_context.reset()
if hasattr(evaluator, 'step_context'):
evaluator.global_state = evaluator.step_context.get_keyed_state(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we set the global_state property here? The evaluator should be responsible for doing this.

return self._step_context

def reset(self):
if self._step_context:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can be simplified to:

# Reset step context, which may contain partial state.
self._step_context = None

Copy link
Contributor Author

@mariapython mariapython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At sdks/python/apache_beam/runners/direct/evaluation_context.py:344:

          self.existing_keyed_state[key].copy())

charlesccychen wrote:
This can fit on just one line.

Done.

side_input_values, scoped_metrics_container)
evaluator._execution_context.reset()
if hasattr(evaluator, 'step_context'):
evaluator.global_state = evaluator.step_context.get_keyed_state(None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
Why do we set the global_state property here? The evaluator should be responsible for doing this.

Done.

return self._step_context

def reset(self):
if self._step_context:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
This method can be simplified to:

# Reset step context, which may contain partial state.
self._step_context = None

Done.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 2bc3b24 on mariapython:retry_tests into ** on apache:master**.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if not self.partial_keyed_state.get(key):
self.partial_keyed_state[key] = (
self.existing_keyed_state[key].copy())
self.partial_keyed_state[key] = (self.existing_keyed_state[key].copy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for extra parentheses.

return self._step_context

def reset(self):
if self._step_context:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the if here.


def __init__(self):
self._execution_context = _ExecutionContext(None, {})
self._execution_context.get_step_context().get_keyed_state(None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this here? Can you clarify or comment?

from collections import defaultdict
from apache_beam.runners.direct.transform_evaluator import _TransformEvaluator
from apache_beam.runners.direct.transform_evaluator import _GroupByKeyOnlyEvaluator
from apache_beam.runners.direct.evaluation_context import _ExecutionContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mariapython wrote:
Done.

I don't see the change here.

Copy link
Contributor Author

@mariapython mariapython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

if not self.partial_keyed_state.get(key):
self.partial_keyed_state[key] = (
self.existing_keyed_state[key].copy())
self.partial_keyed_state[key] = (self.existing_keyed_state[key].copy())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
No need for extra parentheses.

Done.

return self._step_context

def reset(self):
if self._step_context:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
You don't need the if here.

Done.


def __init__(self):
self._execution_context = _ExecutionContext(None, {})
self._execution_context.get_step_context().get_keyed_state(None)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
Why do we do this here? Can you clarify or comment?

Done.

from collections import defaultdict
from apache_beam.runners.direct.transform_evaluator import _TransformEvaluator
from apache_beam.runners.direct.transform_evaluator import _GroupByKeyOnlyEvaluator
from apache_beam.runners.direct.evaluation_context import _ExecutionContext
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
I don't see the change here.

Done.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.02%) to 69.545% when pulling be5e425 on mariapython:retry_tests into 352f106 on apache:master.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! R: @chamikaramj for merge.


mariapython wrote:
PTAL

Done.

@asfgit asfgit closed this in 2aa3b5c Sep 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants