ImageNet training example crashes on evaluation #10168

lengstrom · 2021-10-27T02:40:18Z

🐛 Bug

The ImageNet training example crashes when it is run in evaluation mode. I get the following error:

Traceback (most recent call last):
  File "inet_orig.py", line 260, in <module>
    run_cli()
  File "inet_orig.py", line 255, in run_cli
    main(args)
  File "inet_orig.py", line 239, in main
    trainer.test(model)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 705, in test
    results = self._run(model)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 922, in _run
    self._dispatch()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 986, in _dispatch
    self.accelerator.start_evaluating(self)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
    self.training_type_plugin.start_evaluating(trainer)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
    self._results = trainer.run_stage()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 997, in run_stage
    return self._run_evaluate()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1083, in _run_evaluate
    eval_loop_results = self._evaluation_loop.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 118, in run
    output = self.on_run_end()
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 133, in on_run_end
    self.evaluation_epoch_end(outputs)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 238, in evaluation_epoch_end
    model.test_epoch_end(outputs)
  File "inet_orig.py", line 178, in test_epoch_end
    "test_loss": outputs["val_loss"],
TypeError: 'NoneType' object is not subscriptable
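
The last frame of the traceback suggests that `outputs` was `None` at the point where `test_epoch_end` indexed it. One plausible cause, which the traceback alone does not confirm, is that `test_epoch_end` reuses the return value of a hook that only logs and returns nothing. The following is a minimal, framework-free sketch of that failure pattern and one way to avoid it. `SketchModule` and all of its methods are hypothetical stand-ins, not the code of the example script or the PyTorch Lightning API.

```python
# Hypothetical stand-in for a module whose epoch-end hook logs but returns nothing.
# None of these names come from the example script; they only illustrate the pattern.
class SketchModule:
    def __init__(self):
        self.logged = {}

    def log(self, name, value):
        # Stand-in for a logging call: just remember the value.
        self.logged[name] = value

    def validation_epoch_end(self, step_outputs):
        mean_loss = sum(o["val_loss"] for o in step_outputs) / len(step_outputs)
        self.log("val_loss", mean_loss)
        # No return statement, so the caller receives None.

    def test_epoch_end_broken(self, step_outputs):
        # Assumed failure mode: the reused hook returned None ...
        outputs = self.validation_epoch_end(step_outputs)
        # ... so indexing it raises TypeError: 'NoneType' object is not subscriptable.
        return {"test_loss": outputs["val_loss"]}

    def test_epoch_end_guarded(self, step_outputs):
        # Compute the metric directly instead of relying on another hook's return value.
        mean_loss = sum(o["val_loss"] for o in step_outputs) / len(step_outputs)
        self.log("test_loss", mean_loss)
        return {"test_loss": mean_loss}
```

Calling `test_epoch_end_broken` on a list of per-step dictionaries raises the same `TypeError` as above, while `test_epoch_end_guarded` returns the averaged loss.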

To Reproduce

Go to the examples directory:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pl_examples/domain_templates/
Then run:

CUDA_VISIBLE_DEVICES=0 python imagenet.py --data-path /mnt/n1/imagenet --gpus=1 --workers=60 --max_epochs=2
CUDA_VISIBLE_DEVICES=0 python imagenet.py --data-path /mnt/n1/imagenet --gpus=1 --workers=60 --evaluate --default_root_dir=lightning_logs/version_0/checkpoints

Expected behavior

The script should not crash and should instead report the accuracy of the model.

Environment

CUDA:
- GPU:
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
  - A100-SXM4-40GB
- available: True
- version: 11.1
Packages:
- numpy: 1.19.2
- pyTorch_debug: False
- pyTorch_version: 1.8.1+cu111
- pytorch-lightning: 1.4.9
- tqdm: 4.56.0
System:
- OS: Linux
- architecture:
  - 64bit
  -
- processor: x86_64
- python: 3.7.10
- version: #60~18.04.1-Ubuntu SMP Thu Sep 9 20:38:09 UTC 2021

Additional context

Happy to elaborate.

lengstrom added the bug (Something isn't working) and help wanted (Open to be worked on) labels Oct 27, 2021

justusschock self-assigned this Oct 27, 2021

justusschock mentioned this issue Oct 27, 2021 in "Fix Imagenet example" #10179 (merged)

justusschock closed this as completed in #10179 Oct 29, 2021
