ImageNet training example crashes on evaluation #10168

Closed
lengstrom opened this issue Oct 27, 2021 · 0 comments · Fixed by #10179
Assignees: justusschock
Labels: bug (Something isn't working), help wanted (Open to be worked on)

@lengstrom

🐛 Bug

The ImageNet training example crashes during evaluation (the --evaluate run in the reproduction steps below) with the following error:

Traceback (most recent call last):
  File "inet_orig.py", line 260, in <module>
    run_cli()
  File "inet_orig.py", line 255, in run_cli
    main(args)
  File "inet_orig.py", line 239, in main
    trainer.test(model)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 705, in test
    results = self._run(model)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 997, in run_stage
    return self._run_evaluate()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1083, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 118, in run
    output = self.on_run_end()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 133, in on_run_end
    self.evaluation_epoch_end(outputs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 238, in evaluation_epoch_end
    model.test_epoch_end(outputs)
  File "inet_orig.py", line 178, in test_epoch_end
    "test_loss": outputs["val_loss"],
TypeError: 'NoneType' object is not subscriptable
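The last frame shows test_epoch_end evaluating outputs["val_loss"] while outputs is None. Below is a minimal plain-Python sketch of that failure mode, assuming (not confirmed from the example's source) that test_epoch_end reuses the return value of a validation epoch-end hook that returns nothing. EvalHooksSketch and GuardedEvalHooksSketch are hypothetical names, not Lightning classes, and the guarded variant is one possible workaround rather than the change made in #10179.

```python
from typing import Any, Dict, List, Optional


class EvalHooksSketch:
    """Hypothetical module whose test hook reuses the validation hook's result."""

    def validation_epoch_end(self, outputs: List[Any]) -> Optional[Dict[str, Any]]:
        # Assumption: the hook is not overridden (or only logs), so it returns None.
        return None

    def test_epoch_end(self, outputs: List[Any]) -> Dict[str, Any]:
        # The reused return value is None here...
        outputs = self.validation_epoch_end(outputs)
        # ...so subscripting it raises TypeError, as in the traceback above.
        return {"test_loss": outputs["val_loss"]}


class GuardedEvalHooksSketch(EvalHooksSketch):
    """Hypothetical workaround: tolerate a hook that returns nothing."""

    def test_epoch_end(self, outputs: List[Any]) -> Optional[Dict[str, Any]]:
        val_outputs = self.validation_epoch_end(outputs)
        if val_outputs is None:
            # Nothing to rename; assume metrics were already logged per step.
            return None
        # Rename val_* keys to test_*, mirroring "test_loss": outputs["val_loss"].
        return {k.replace("val", "test", 1): v for k, v in val_outputs.items()}


if __name__ == "__main__":
    try:
        EvalHooksSketch().test_epoch_end([])
    except TypeError as exc:
        print(f"TypeError: {exc}")
    print(GuardedEvalHooksSketch().test_epoch_end([]))
```

Running it prints the same TypeError message as the traceback for the first class, then None for the guarded one.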

To Reproduce

Go to the examples directory:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/domain_templates/
Then run:

CUDA_VISIBLE_DEVICES=0 python imagenet.py --data-path /mnt/n1/imagenet --gpus=1 --workers=60 --max_epochs=2
CUDA_VISIBLE_DEVICES=0 python imagenet.py --data-path /mnt/n1/imagenet --gpus=1 --workers=60 --evaluate --default_root_dir=lightning_logs/version_0/checkpoints

Expected behavior

Evaluation completes without crashing and reports the model's accuracy.
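ImageNet accuracy is conventionally reported as top-1 and top-5 accuracy. A dependency-free sketch of what that metric computes, useful for sanity-checking the numbers once evaluation runs; topk_accuracy is a hypothetical helper, not a function from the example or from Lightning.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of samples whose target class is among the k highest scores."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices of the k largest scores for this sample.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top:
            hits += 1
    return hits / len(targets)


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    targets = [1, 1, 0]
    print("top-1:", topk_accuracy(scores, targets, k=1))
    print("top-2:", topk_accuracy(scores, targets, k=2))
```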

Environment

  • CUDA:
    - GPU:
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
      - A100-SXM4-40GB
    - available: True
    - version: 11.1
  • Packages:
    - numpy: 1.19.2
    - pyTorch_debug: False
    - pyTorch_version: 1.8.1+cu111
    - pytorch-lightning: 1.4.9
    - tqdm: 4.56.0
  • System:
    - OS: Linux
    - architecture:
      - 64bit
    - processor: x86_64
    - python: 3.7.10
    - version: #60~18.04.1-Ubuntu SMP Thu Sep 9 20:38:09 UTC 2021

Additional context

Happy to elaborate.

lengstrom added the bug and help wanted labels on Oct 27, 2021
justusschock self-assigned this on Oct 27, 2021
justusschock mentioned this issue on Oct 27, 2021
2 participants