Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address #54 and improve interrupted shutdown #56

Merged
merged 6 commits into from Dec 16, 2019
Merged

Address #54 and improve interrupted shutdown #56

merged 6 commits into from Dec 16, 2019

Conversation

@romain-intel
Copy link
Contributor

romain-intel commented Dec 15, 2019

Although not fully guaranteeing a clean shutdown every time, this patch makes the shutdown a little more robust (giving time to children process to shutdown), prevents multiple CTRL-C from interrupting the shutdown (in most cases) and is possibly faster in the presence of large fan outs.

Romain Cledat added 2 commits Dec 14, 2019
The patch:
  - continuously polls all subprocesses until they terminate (within some time
    limit)
  - kills any stragglers
@romain-intel romain-intel requested a review from savingoyal Dec 15, 2019
metaflow/runtime.py Outdated Show resolved Hide resolved
metaflow/runtime.py Outdated Show resolved Hide resolved
metaflow/runtime.py Outdated Show resolved Hide resolved
Copy link
Member

savingoyal left a comment

Changes look good to me! I will test them out later today.

for worker in self._workers.values():
worker.kill()
temp_workers[worker] = 1
live_workers = temp_workers.keys()

This comment has been minimized.

Copy link
@savingoyal

savingoyal Dec 15, 2019

Member

isn't live_workers = self._workers.values()?

This comment has been minimized.

Copy link
@romain-intel

romain-intel Dec 15, 2019

Author Contributor

Not quite; in workers, they appear twice (one for each FD). This dedups them.

This comment has been minimized.

Copy link
@savingoyal

savingoyal Dec 16, 2019

Member

so, live_workers = set(self._workers.values())

This comment has been minimized.

Copy link
@seeravikiran

seeravikiran Dec 16, 2019

Contributor

Could you also detail what temp_workers is for? If it's for the sole purpose of creating the set, I'd instead do what @savingoyal suggests w.r.t set.

metaflow/runtime.py Outdated Show resolved Hide resolved
metaflow/runtime.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

romain-intel left a comment

Thanks for the review. I'll address the comments and push an update today.

metaflow/runtime.py Outdated Show resolved Hide resolved
for worker in self._workers.values():
worker.kill()
temp_workers[worker] = 1
live_workers = temp_workers.keys()

This comment has been minimized.

Copy link
@romain-intel

romain-intel Dec 15, 2019

Author Contributor

Not quite; in workers, they appear twice (one for each FD). This dedups them.

metaflow/runtime.py Outdated Show resolved Hide resolved
metaflow/runtime.py Outdated Show resolved Hide resolved
metaflow/runtime.py Outdated Show resolved Hide resolved
for worker in self._workers.values():
worker.kill()
temp_workers[worker] = 1
live_workers = temp_workers.keys()

This comment has been minimized.

Copy link
@savingoyal

savingoyal Dec 16, 2019

Member

so, live_workers = set(self._workers.values())

system_msg=True, bad=True)
for worker in live_workers:
worker.kill()
self._logger('Flushing logs -- you may see messages from the tasks that were shutdown...',

This comment has been minimized.

Copy link
@savingoyal

savingoyal Dec 16, 2019

Member

Should we remove this logging message?

This comment has been minimized.

Copy link
@romain-intel

romain-intel Dec 16, 2019

Author Contributor

I wanted to keep it to help users understand why they see printout after things have been waited for and terminated.

This comment has been minimized.

Copy link
@savingoyal

savingoyal Dec 16, 2019

Member

I am worried that this log is too verbose. Maybe just? - Flushing logs...

metaflow/runtime.py Outdated Show resolved Hide resolved
for worker in self._workers.values():
worker.kill()
temp_workers[worker] = 1
live_workers = temp_workers.keys()

This comment has been minimized.

Copy link
@seeravikiran

seeravikiran Dec 16, 2019

Contributor

Could you also detail what temp_workers is for? If it's for the sole purpose of creating the set, I'd instead do what @savingoyal suggests w.r.t set.

system_msg=True, bad=True)
while live_workers and int(time.time()) - now < 5:
# While not all workers are dead and we have waited less than 5 seconds
live_workers = [worker for worker in live_workers if not worker.clean()]

This comment has been minimized.

Copy link
@seeravikiran

seeravikiran Dec 16, 2019

Contributor

Feel free to reject this suggestion -

This seems like a good use-case for filter:
live_workers = filter(lambda worker: not worker.clean(), live_workers)
I don't know the mutability guarantees for list iteration, so I'm not sure if using a function like filter is better.

As bonus it might work on set(s) or list(s).

This comment has been minimized.

Copy link
@romain-intel

romain-intel Dec 16, 2019

Author Contributor

I read a bit that filter may be a tad slower (not a huge deal here though) but I find list comprehension to be more readable so if OK, I would prefer to leave it as such.

worker.kill()
# If we are here, all children have received a signal and are shutting down.
# We want to give them an opportunity to do so and then kill
live_workers = list(set(self._workers.values()))

This comment has been minimized.

Copy link
@seeravikiran

seeravikiran Dec 16, 2019

Contributor

Why do you need this be a list? It seems you just need iteration, and the below suggested filter.

This comment has been minimized.

Copy link
@romain-intel

romain-intel Dec 16, 2019

Author Contributor

sure, I can change it. It made more sense for it to be a list when I had a loop with add but now I can put just a set.

self.killed = False
self.cleaned = False
Comment on lines 787 to 788

This comment has been minimized.

Copy link
@seeravikiran

seeravikiran Dec 16, 2019

Contributor

Could you add a short comment on what self.killed=True represents vs self.cleaned=True?
It also seems if self.killed = True, then self.cleaned must be True?

Defining the state transition perhaps in a short comment helps understand why you are tracking two states.

Romain Cledat
@savingoyal savingoyal merged commit f6dd413 into master Dec 16, 2019
2 checks passed
2 checks passed
Test on ubuntu-latest
Details
Test on macos-latest
Details
@savingoyal savingoyal deleted the issue/54 branch Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.