Skip to content

Don't use argparse.FileType #141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 0 commits into from
Closed

Don't use argparse.FileType #141

wants to merge 0 commits into from

Conversation

ryansm1
Copy link
Contributor

@ryansm1 ryansm1 commented Nov 26, 2018

Using argparse.FileType for argparse's log-file argument stores an unserializable TextIOWrapper on args (see https://docs.python.org/3/library/argparse.html#argparse.FileType). This breaks the spawn multiprocessing creation method (i.e. Windows).

This commit reverts to the prior method of handling args.log_file and attempts to close the log file handle when exiting.

@Bogdanp
Copy link
Owner

Bogdanp commented Nov 28, 2018

Thanks and apologies for changing this without asking. :D

I'll merge this this weekend.

@ryansm1
Copy link
Contributor Author

ryansm1 commented Nov 29, 2018

No apology necessary! The FileType setup is cleaner, but I didn't know it wouldn't serialize until I tried it on my Windows box. Thanks for your consideration of the PR!

@ryansm1
Copy link
Contributor Author

ryansm1 commented Nov 30, 2018

Commit 38a23e3 addresses #137. Introducing a slight delay between worker process kills lets the process end cleanly (cleaner?) and prevents Windows from returning SIGTERM (15) instead of the intended return code (RET_IMPORT/2).

I ran this a large number of times on Windows and Linux and wasn't able to get it to fail, but it can be easily reverted if you see any issues.

dramatiq/cli.py Outdated
os.kill(proc.pid, signum)
if proc.is_alive():
os.kill(proc.pid, signum)
# For Windows, give the process time to clean up
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we potentially use os.waitpid here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that this issue is related to timing and how Windows handles the kill signal. The child process sometimes terminates between the check (proc.is_alive()) and the kill. If the main process sends the kill signal, Windows assigns the signal (SIGTERM is 15 in this case) to the child process return code, so the main process return code gets set to 15.

Depending upon the process timing, the child may have terminated before proc.is_alive(), so we never send a kill signal and don't unintentionally modify the return code--the happy path. Occasionally, the process hasn't terminated and gets killed, causing the intermittent failures.

Using a one-second timeout on a join call is cleaner and achieves the same result.

@Bogdanp
Copy link
Owner

Bogdanp commented Dec 22, 2018

@ryansm1 sorry, I accidentally pushed the wrong thing (the current master) to your branch and now GH won't let me rectify that. I'll open a new PR with your changes and some of my own additions on top.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants