Training fails if the --validate flag is set #39

Open
klois opened this issue Apr 30, 2020 · 3 comments

@klois

klois commented Apr 30, 2020

I got training on the COCO dataset working as expected.
However, when I set the --validate flag, as recommended in the documentation, training fails as soon as it reaches the first validation step.

The command

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2

works, while

./tools/dist_train.sh configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py 2 --validate

produces the following error:

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 58/58, 40.1 task/s, elapsed: 1s, ETA:     0s

Traceback (most recent call last):
  File "./tools/train.py", line 125, in <module>
    main()
  File "./tools/train.py", line 121, in main
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 103, in train_detector
    timestamp=timestamp)
  File "/home/klauskofler/SOLO/mmdet/apis/train.py", line 250, in _dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 278, in train
    self.call_hook('after_train_epoch')
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/runner.py", line 231, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 64, in after_train_epoch
    self.evaluate(runner, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/eval_hooks.py", line 124, in evaluate
    result_files = results2json(self.dataset, results, tmp_file)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 224, in results2json
    json_results = det2json(dataset, results)
  File "/home/klauskofler/SOLO/mmdet/core/evaluation/coco_utils.py", line 153, in det2json
    for i in range(bboxes.shape[0]):
AttributeError: 'NoneType' object has no attribute 'shape'
Traceback (most recent call last):
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/klauskofler/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/klauskofler/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/solo/decoupled_solo_light_r50_fpn_8gpu_3x.py', '--launcher', 'pytorch', '--validate']' returned non-zero exit status 1.

It seems to me that the validation step tries to compute evaluation metrics from bounding boxes produced by the network, while the network does not produce any bounding boxes. Is this behavior expected, or am I doing something wrong?
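
For anyone reading the traceback later, here is a minimal sketch of the failure mode described above. It is not the mmdet code: `det2json_sketch` is a hypothetical stand-in for the `det2json` step, and it assumes that results are stored per image and per class, with `None` where the model returned no boxes. Plain lists replace arrays so the sketch has no dependencies.

```python
def det2json_sketch(image_ids, results):
    """Turn per-image, per-class box lists into COCO-style dicts.

    Hypothetical illustration only. Each box is assumed to be
    [x1, y1, x2, y2, score]; a class entry may be None.
    """
    json_results = []
    for img_id, per_class in zip(image_ids, results):
        for label, bboxes in enumerate(per_class):
            # The loop in the traceback reads bboxes.shape[0] directly,
            # which raises AttributeError when bboxes is None.
            if bboxes is None:
                continue
            for x1, y1, x2, y2, score in bboxes:
                json_results.append({
                    "image_id": img_id,
                    "category_id": label,
                    # COCO boxes are [x, y, width, height].
                    "bbox": [x1, y1, x2 - x1, y2 - y1],
                    "score": score,
                })
    return json_results


# One image with two classes; the second class has no boxes at all.
example = det2json_sketch([7], [[[[1.0, 2.0, 4.0, 6.0, 0.9]], None]])
```

A guard like this only avoids the crash. It would not produce meaningful validation numbers for a model that outputs masks rather than boxes.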

@WXinlong
Owner

WXinlong commented May 5, 2020

@klois Validation during training is not supported yet.

@klois
Author

klois commented May 5, 2020

Thanks for the information.
Maybe the documentation at https://github.com/WXinlong/SOLO/blob/master/docs/GETTING_STARTED.md should be updated to avoid further confusion.

@WXinlong
Owner

@klois Yes, thanks for pointing it out. :)
