Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pid port + duplicate rank_zero logging #2231

Merged
merged 7 commits into from
Jun 18, 2020
Merged

Pid port + duplicate rank_zero logging #2231

merged 7 commits into from
Jun 18, 2020

Conversation

williamFalcon
Copy link
Contributor

@williamFalcon williamFalcon commented Jun 18, 2020

Implements #2140 in a way that works.

For local ddp, we now use the process id as a numpy sub-seed to generate a random port. This way since the user sets the full seed we can use the parent process id as a seed before spawning off children.

Also solves an issue where on init when ranks are know, we shouldn't see duplicate messages

@mergify mergify bot requested a review from a team June 18, 2020 03:42
@williamFalcon williamFalcon mentioned this pull request Jun 18, 2020
5 tasks
@williamFalcon
Copy link
Contributor Author

@ZhaofengWu mind putting your Full name + email here for the commit?

@ZhaofengWu
Copy link
Contributor

Zhaofeng Wu; zfw7@cs.washington.edu :)

@ZhaofengWu
Copy link
Contributor

@williamFalcon williamFalcon changed the title Pid port Pid port + duplicate rank_zero logging Jun 18, 2020
@codecov
Copy link

codecov bot commented Jun 18, 2020

Codecov Report

Merging #2231 into master will decrease coverage by 0%.
The diff coverage is 22%.

@@          Coverage Diff           @@
##           master   #2231   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files          70      70           
  Lines        5415    5421    +6     
======================================
+ Hits         4759    4761    +2     
- Misses        656     660    +4     

@williamFalcon williamFalcon merged commit 476911d into master Jun 18, 2020
@Borda Borda added the bug Something isn't working label Jun 18, 2020
@Borda Borda added this to the 0.8.0 milestone Jun 18, 2020
@Borda Borda deleted the pid_port branch June 18, 2020 07:53
@Borda
Copy link
Member

Borda commented Jun 18, 2020

@ZhaofengWu mind putting your Full name + email here for the commit?

unfortunately, this way of co-authoring is pure git thing and dos do not appear in any GitHub stats, see my comment here #1920 (comment)

if 'LOCAL_RANK' in os.environ:
rank_zero_only.rank = os.environ['LOCAL_RANK']
if 'SLURM_JOB_ID' in os.environ:
rank_zero_only.rank = os.environ['SLURM_JOB_ID']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line hear breaks logging in my setup. rank_zero_only.rank is set to a non-zero value (the slurm job id) and thus non of the logging functions are ever executed. Did you mean SLURM_PROCID which is the MPI rank?

@ExpectationMax
Copy link
Contributor

ExpectationMax commented Jun 20, 2020

I think this pull request breaks logging in some setups (when running on slurm clusters). See https://github.com/PyTorchLightning/pytorch-lightning/pull/2231/files#r443131338

(sorry for the double post, I did not realize that the review also shows up in this thread)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants