Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quick-fix for --gpus flag bug #2674

Merged
merged 6 commits into from
Jul 24, 2020
Merged

quick-fix for --gpus flag bug #2674

merged 6 commits into from
Jul 24, 2020

Conversation

ananyahjha93
Copy link
Contributor

@ananyahjha93 ananyahjha93 commented Jul 22, 2020

What does this PR do?

Prevents setting self.on_gpu as True when --gpus flag is not passed by the user in the case where the following way is used to create the trainer

parser = Trainer.add_argparse_args(parser)
args = parser.parse_args()

trainer = Trainer.from_argparse_args(args)

Fixes #2669

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

@mergify mergify bot requested a review from a team July 22, 2020 19:16
@codecov
Copy link

codecov bot commented Jul 22, 2020

Codecov Report

Merging #2674 into master will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #2674   +/-   ##
======================================
  Coverage      92%     92%           
======================================
  Files          74      74           
  Lines        6381    6382    +1     
======================================
+ Hits         5850    5851    +1     
  Misses        531     531           

@ananyahjha93 ananyahjha93 changed the title [wip] quick-fix for --gpus flag bug quick-fix for --gpus flag bug Jul 22, 2020
@awaelchli
Copy link
Member

I think the true fix should be to parse the gpus flag first and then set the on_gpu property, right?

@ananyahjha93
Copy link
Contributor Author

@awaelchli In this case type takes a callable as a value in form Trainer._arg_default which then evaluates to the gpus flag values if something is passed to the script by the user. If the user does not include the --gpus flag, args.gpus refers to Trainer._arg_default callable.

https://github.com/PyTorchLightning/pytorch-lightning/blob/bugfix/gpus-flag/pytorch_lightning/trainer/trainer.py#L786

@awaelchli
Copy link
Member

awaelchli commented Jul 22, 2020

Yes I understand that. We also have a function that takes the user input and converts it to a list.
See distrib_parts.py::_parse_gpu_ids.
There, a callable results in None, which is what we want, right?
It happens later in the trainer init, so if you just move the on_gpu down there after it was parsed, it should be fine, no?
https://github.com/PyTorchLightning/pytorch-lightning/blob/62ce00f96c09de6d137c810921a6cd9e7b60aff5/pytorch_lightning/trainer/trainer.py#L531
Basically after here

@Borda
Copy link
Member

Borda commented Jul 22, 2020

yes, I would not pass the function at all, it makes unnecessary complex...

@ananyahjha93
Copy link
Contributor Author

@awaelchli yeah, you are right, thanks for pointing this out.

@Borda what's your suggestion here, if not pass a function?

@@ -532,6 +532,10 @@ def __init__(
self.root_gpu = determine_root_gpu_device(self.data_parallel_device_ids)
self.root_device = torch.device("cpu")

# self.data_parallel_device_ids is None if gpus is callable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand what this means

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.data_parallel_device_ids = _parse_gpu_ids(self.gpus)

_parse_gpu_ids returns None is self.gpus is callable, if you don't pass --gpus flag then self.gpus points to Trainer._arg_default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move the self.on_gpu = True if (gpus and torch.cuda.is_available()) else False found a few lines above
down here and simply write

self.on_gpu = True if (self.data_parallel_device_ids and torch.cuda.is_available()) else False

then there is no need for an extra if clause or a comment :)

@mergify mergify bot requested a review from a team July 23, 2020 11:10
Copy link
Member

@awaelchli awaelchli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice this is fixed, it was annoying with the default arg not properly set.

@mergify mergify bot requested a review from a team July 24, 2020 07:14
@mergify
Copy link
Contributor

mergify bot commented Jul 24, 2020

Great job! =)

@mergify mergify bot merged commit 6780214 into master Jul 24, 2020
@mergify
Copy link
Contributor

mergify bot commented Jul 24, 2020

Great job! =)

@Borda Borda deleted the bugfix/gpus-flag branch July 24, 2020 08:33
@Borda
Copy link
Member

Borda commented Jul 24, 2020

we shall re-count nb tests we are running as this was merged automatically even one test was failing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--gpus flag with add_argparse_args bug
5 participants