Skip to content

[BEAM-2718] Improve performance of bundle retry#3815

Closed
mariapython wants to merge 4 commits intoapache:masterfrom
mariapython:retry_custom
Closed

[BEAM-2718] Improve performance of bundle retry#3815
mariapython wants to merge 4 commits intoapache:masterfrom
mariapython:retry_custom

Conversation

@mariapython
Copy link
Contributor

@mariapython mariapython commented Sep 7, 2017

In order to keep a consistent state while handling bundles, every time the state is accessed, a copy is made and kept until the bundle is committed. Here a custom copy is used replicating only the inner structures that are needed and shallow-copying the rest.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.007%) to 69.691% when pulling eba397b on mariapython:retry_custom into 6dfd3a2 on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 69.693% when pulling 8650ea6 on mariapython:retry_custom into df1476d on apache:master.

@mariapython
Copy link
Contributor Author

R: @charlesccychen

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = copy.copy(self.state)
for window in self.state:
cloned_object.state[window] = copy.copy(self.state[window])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can just be an empty dictionary.

def custom_copy(self):
cloned_object = copy.copy(self)
cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do similar logic to the inner loop below here too.

cloned_object = copy.copy(self)
cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = copy.copy(self.state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just make this an empty dictionary.

self.global_state = {}
self.defensive_copy = defensive_copy

def custom_copy(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just rename this to clone().

if not self.partial_keyed_state.get(key):
self.partial_keyed_state[key] = self.existing_keyed_state[key].clone()
self.partial_keyed_state[key] = (
self.existing_keyed_state[key].custom_copy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just rename this to clone().

Copy link
Contributor Author

@mariapython mariapython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

cloned_object = copy.copy(self)
cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = copy.copy(self.state)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
We can just make this an empty dictionary.

Done.

self.global_state = {}
self.defensive_copy = defensive_copy

def custom_copy(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
Let's just rename this to clone().

See my comments above.

if not self.partial_keyed_state.get(key):
self.partial_keyed_state[key] = self.existing_keyed_state[key].clone()
self.partial_keyed_state[key] = (
self.existing_keyed_state[key].custom_copy())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
Let's just rename this to clone().

I preferred to avoid that name, as it was not really a deep copy. I changed it to copy().

cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = copy.copy(self.state)
for window in self.state:
cloned_object.state[window] = copy.copy(self.state[window])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
This can just be an empty dictionary.

Done.

def custom_copy(self):
cloned_object = copy.copy(self)
cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
We should do similar logic to the inner loop below here too.

global_state doesn't appear to be used by the transform evaluators. In fact, I was thinking of eliminating the line completely (same reason why defensive_copy is not copied).

Copy link
Contributor Author

@mariapython mariapython left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mariapython wrote:
PTAL

Done.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 69.493% when pulling 3f1a714 on mariapython:retry_custom into 50532f0 on apache:master.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

cloned_object.state = (
collections.defaultdict(lambda: collections.defaultdict(list)))
for window in self.state:
cloned_object.state[window] = collections.defaultdict(list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can omit this line--the dict is created for you by the defaultdict mechanism on cloned_object.state.

cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = (
collections.defaultdict(lambda: collections.defaultdict(list)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove these two lines if you use the approach in the comment above.

self.defensive_copy = defensive_copy

def copy(self):
cloned_object = copy.copy(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's omit the copy and just create a new object:

cloned_object = InMemoryUnmergedState(defensive_copy=self.defensive_copy)

cloned_object.timers = copy.deepcopy(self.timers)
cloned_object.global_state = copy.deepcopy(self.global_state)
cloned_object.state = (
collections.defaultdict(lambda: collections.defaultdict(list)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
You can remove these two lines if you use the approach in the comment above.

Done.

cloned_object.state = (
collections.defaultdict(lambda: collections.defaultdict(list)))
for window in self.state:
cloned_object.state[window] = collections.defaultdict(list)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
You can omit this line--the dict is created for you by the defaultdict mechanism on cloned_object.state.

Done.

self.defensive_copy = defensive_copy

def copy(self):
cloned_object = copy.copy(self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charlesccychen wrote:
Let's omit the copy and just create a new object:

cloned_object = InMemoryUnmergedState(defensive_copy=self.defensive_copy)

Done.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 69.496% when pulling 585d8f9 on mariapython:retry_custom into 307b036 on apache:master.

Copy link
Contributor

@charlesccychen charlesccychen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

R: @chamikaramj for merge

@asfgit asfgit closed this in 979923a Sep 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants