
LightningCli: Passing trainer_defaults #8008

Closed
tchaton opened this issue Jun 16, 2021 · 14 comments · Fixed by #11509 or #13283
Labels: 3rd party (Related to a 3rd-party) · bug (Something isn't working) · lightningcli (pl.cli.LightningCLI) · priority: 1 (Medium priority task)

tchaton (Contributor) commented Jun 16, 2021

🐛 Bug


To Reproduce

It seems that PyTorchProfiler can't be passed as the profiler argument:

from pytorch_lightning.profiler import PyTorchProfiler
from pytorch_lightning.utilities.cli import LightningCLI

LightningCLI(ModelClass, DataModuleClass, trainer_defaults={"profiler": PyTorchProfiler(profile_memory=True)})

profiler_example.py: error: Configuration check failed :: Parser key "trainer.profiler": Value "<pytorch_lightning.profiler.pytorch.PyTorchProfiler object at 0x7faeafcc7910>" does not validate against any of the types in typing.Union[pytorch_lightning.profiler.base.BaseProfiler, str, NoneType]:
  - Expected an str or a Dict with a class_path entry but got "<pytorch_lightning.profiler.pytorch.PyTorchProfiler object at 0x7faeafcc7910>"
  - Expected a <class 'str'> but got "<pytorch_lightning.profiler.pytorch.PyTorchProfiler object at 0x7faeafcc7910>"
  - Expected a <class 'NoneType'> but got "<pytorch_lightning.profiler.pytorch.PyTorchProfiler object at 0x7faeafcc7910>"

Is this a misuse of LightningCLI or a bug? @carmocca


tchaton added the bug, help wanted, and priority: 1 labels on Jun 16, 2021
tchaton changed the title from LightningCli to LightningCli: Passing trainer_defaults on Jun 16, 2021
carmocca added the argparse label on Jun 16, 2021
carmocca (Contributor) commented:

Seems to me like this is a bug upstream. This produces the same error:

from typing import Optional

import jsonargparse

print(jsonargparse.__version__)

class A:
    ...

def fn(arg: Optional[A] = None):
    print(arg)

parser = jsonargparse.ArgumentParser()
parser.add_function_arguments(fn)
parser.set_defaults({'arg': A()})  # a class instance as the default triggers the error
parser.parse_args()

cc: @mauvilsa

carmocca added the 3rd party label on Jun 17, 2021
mauvilsa (Contributor) commented:

Yes, this is an issue that relates to jsonargparse. In general, having a class instance as a default is not a good idea because it is mutable. Furthermore, it breaks print_config, because there is no way to convert the class instance into a class_path and init_args dict. Most likely what is needed is a default factory class to use in these cases, plus a clear error message when a class instance is used as a default.
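
To make the print_config concern concrete, here is a sketch (adapting the repro above, and assuming jsonargparse validates dict defaults the same way it validates config values) of a default given in the class_path/init_args form, which a config printer can round-trip:

from typing import Optional
import jsonargparse

class A:
    def __init__(self, flag: bool = False):
        self.flag = flag

def fn(arg: Optional[A] = None):
    print(arg)

parser = jsonargparse.ArgumentParser()
parser.add_function_arguments(fn)
# Instead of a live instance, the default is a class_path/init_args
# dict, which keeps enough information to serialize it back out.
parser.set_defaults({'arg': {'class_path': '__main__.A', 'init_args': {'flag': True}}})
print(parser.parse_args([]))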

carmocca (Contributor) commented:

So a function like parser.set_default_factory(profiler=callable)?

or parser.set_default_factory(profiler={'class': PyTorchProfiler, 'init_args': ...})?

mauvilsa (Contributor) commented:

Something more like

my_default_factory = default_class_factory(MyClass, init_args={...})

which then could be used to override a default as

parser.set_defaults({'my_module.my_class': my_default_factory})

or to use as default in a signature as

class MyModule:
    def __init__(
        self,
        my_class: MyClass = my_default_factory,
    ):
        ...

Something analogous to the field in dataclasses.
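
For reference, the dataclasses analogue works like this: field(default_factory=...) defers construction to instance-creation time, so no mutable default is shared.

from dataclasses import dataclass, field

@dataclass
class Config:
    # The factory runs once per instance, so each Config gets its own
    # list instead of all instances sharing one mutable default.
    tags: list = field(default_factory=list)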

tchaton (Contributor, Author) commented Jun 21, 2021

> Yes, this is an issue that relates to jsonargparse. In general, having a class instance as a default is not a good idea because it is mutable. Furthermore, it breaks print_config, because there is no way to convert the class instance into a class_path and init_args dict. Most likely what is needed is a default factory class to use in these cases, plus a clear error message when a class instance is used as a default.

Hey @mauvilsa,
I would argue that even though default_class_factory is definitely the recommended approach, we should still support trainer_defaults with instances provided by users.
Especially since the path can be inferred automatically as trainer.profiler, and we can inspect the class to get the proper print_config.
When a user directly provides arguments in this format, we should print a warning recommending the preferred approach.
Currently, I believe not supporting this in trainer_defaults is counter-intuitive, and users will become frustrated.

Best,
T.C

Borda added and then removed the good first issue label on Jun 21, 2021
mauvilsa (Contributor) commented Jun 21, 2021

@tchaton

> I would argue that even though default_class_factory is definitely the recommended approach, we should still support trainer_defaults with instances provided by users.

I agree that if a class instance of the correct type is given then it shouldn't fail like it does now.

> Especially since the path can be inferred automatically as trainer.profiler, and we can inspect the class to get the proper print_config.

The class path can be inferred, but the init args can't be. The file that SaveConfigCallback saves has two main uses: knowing the exact settings that were used for a run, and reproducibility. If only the class path is inferred and included in the saved config, the run is not reproducible, since custom init args could have been used for the default. On the other hand, if this config key is excluded from the saved config, the run is reproducible, because the correct default would be used when it is excluded. But then, by looking at the saved config, you don't know all the settings that were used. Probably the best would be some solution in the middle. When I have a better idea of how to make everything fit together, I will comment.

> When a user directly provides arguments in this format, we should print a warning recommending the preferred approach. Currently, I believe not supporting this in trainer_defaults is counter-intuitive, and users will become frustrated.

I agree we should do this.

stale bot commented Jul 22, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label on Jul 22, 2021
carmocca added this to the v1.5 milestone on Jul 22, 2021
stale bot removed the won't fix label on Jul 22, 2021
quancs (Member) commented Oct 18, 2021

Hi, I'm hitting another problem similar to this one with version 1.5.0rc0.
Config file:

fit:
  seed_everything: null
  trainer:
    plugins:
      - class_path: pytorch_lightning.plugins.DDPPlugin
        init_args:
          find_unused_parameters: false

or

fit:
  seed_everything: null
  trainer:
    plugins:
      - class_path: pytorch_lightning.plugins.DDPPlugin
        init_args:
          kwargs:
            find_unused_parameters: false

Neither form parses into a DDPPlugin (Configuration check failed :: No action for key "find_unused_parameters" to check its value.). Is this a misconfiguration?

Error for the first config file

usage: NBSS.py [-h] [--config CONFIG]
               [--print_config [={comments,skip_null}+]]
               {fit,validate,test,predict,tune} ...
NBSS.py: error: Parser key "trainer.plugins": Value "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str, typing.List[typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]], NoneType]:
  - Type <class 'pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"
  - Type <class 'pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"
  - Type <class 'pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"
  - Type <class 'pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"
  - Expected a <class 'str'> but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"
  - Value "{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]:
    - Problem with given class_path "pytorch_lightning.plugins.DDPPlugin":
      - 'Configuration check failed :: No action for key "find_unused_parameters" to check its value.'
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of PrecisionPlugin
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of ClusterEnvironment
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of CheckpointIO
    - Expected a <class 'str'> but got "{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}"
  - Expected a <class 'NoneType'> but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'find_unused_parameters': False}}]"

Error for the second config file

usage: NBSS.py [-h] [--config CONFIG]
               [--print_config [={comments,skip_null}+]]
               {fit,validate,test,predict,tune} ...
NBSS.py: error: Parser key "trainer.plugins": Value "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str, typing.List[typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]], NoneType]:
  - Type <class 'pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
  - Type <class 'pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
  - Type <class 'pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
  - Type <class 'pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO'> expects an str or a Dict with a class_path entry but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
  - Expected a <class 'str'> but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
  - Value "{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}" does not validate against any of the types in typing.Union[pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, pytorch_lightning.plugins.precision.precision_plugin.PrecisionPlugin, pytorch_lightning.plugins.environments.cluster_environment.ClusterEnvironment, pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO, str]:
    - Problem with given class_path "pytorch_lightning.plugins.DDPPlugin":
      - 'Configuration check failed :: No action for key "kwargs.find_unused_parameters" to check its value.'
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of PrecisionPlugin
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of ClusterEnvironment
    - "pytorch_lightning.plugins.DDPPlugin" is not a subclass of CheckpointIO
    - Expected a <class 'str'> but got "{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}"
  - Expected a <class 'NoneType'> but got "[{'class_path': 'pytorch_lightning.plugins.DDPPlugin', 'init_args': {'kwargs': {'find_unused_parameters': False}}}]"
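
The "No action for key" failure is consistent with the plugin forwarding options through **kwargs. As a hedged illustration, using a hypothetical class (not Lightning's actual code): parameters hidden behind **kwargs are invisible to signature introspection, so a signature-based parser registers no action for them.

import inspect

# Hypothetical plugin, for illustration only: find_unused_parameters
# would be swallowed by **kwargs instead of appearing as a named
# parameter, so signature introspection cannot see it.
class SomePlugin:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

print(inspect.signature(SomePlugin.__init__))  # prints: (self, **kwargs)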

carmocca (Contributor) commented:

@quancs Your issue is #8561, not this one

You can work around it by passing --trainer.plugins ddp_find_unused_parameters_false

quancs (Member) commented Oct 18, 2021

> @quancs Your issue is #8561, not this one
>
> You can work around it by passing --trainer.plugins ddp_find_unused_parameters_false

Aha, sorry....

carmocca (Contributor) commented:

No problem!

mauvilsa (Contributor) commented:

An update on this. jsonargparse was modified so that if a default is an instance of a class, parsing does not fail, and when the config is printed it shows the string representation of the default instance. Also, the lazy_instance function was implemented to allow creating a proper printed config; however, it does not currently work for nn.Module classes, see omni-us/jsonargparse#96.

awaelchli modified the milestone from v1.5 to 1.5.x on Nov 4, 2021
mauvilsa (Contributor) commented:

lazy_instance now works for nn.Module classes. This is part of jsonargparse v4.0.0.
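
For example, a sketch of how trainer_defaults could use it (ModelClass and DataModuleClass stand in for the user's own classes; the import paths assume the Lightning and jsonargparse versions discussed above):

from jsonargparse import lazy_instance
from pytorch_lightning.profiler import PyTorchProfiler
from pytorch_lightning.utilities.cli import LightningCLI

# lazy_instance defers construction and records class_path/init_args,
# so print_config can serialize the default properly.
LightningCLI(
    ModelClass,
    DataModuleClass,
    trainer_defaults={"profiler": lazy_instance(PyTorchProfiler, profile_memory=True)},
)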

carmocca removed the argparse label on Nov 16, 2021
carmocca (Contributor) commented:

@mauvilsa can you open a PR with a test using it with some of our components?

we'd need to merge #10426 first

carmocca added the lightningcli label and removed the help wanted label on Nov 16, 2021