AttributeError: 'DistributedDataParallel' object has no attribute 'version' at the time of model checkpointing #475

@rushiagr

Description

Full stacktrace:

... <snipped> ...
Epoch: [1][97/804]      Time 1.485 (1.640)      Data 0.006 (0.069)      Loss 339.3378 (444.7247)
Epoch: [1][98/804]      Time 1.753 (1.641)      Data 0.012 (0.069)      Loss 372.9253 (443.9921)
Epoch: [1][99/804]      Time 1.500 (1.640)      Data 0.017 (0.068)      Loss 276.2173 (442.2974)
Epoch: [1][100/804]     Time 1.461 (1.638)      Data 0.006 (0.068)      Loss 314.9874 (441.0243)
Saving checkpoint model to /datasets/deepspeech/librispeech/deepspeech_checkpoint_epoch_1_iter_100.pth
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    wer_results=wer_results, cer_results=cer_results, avg_loss=avg_loss),
  File "/workspace/src/deepspeech/deepspeech.pytorch/model.py", line 251, in serialize
    'version': model.version,
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'DistributedDataParallel' object has no attribute 'version'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/src/deepspeech/deepspeech.pytorch/multiproc.py", line 46, in <module>
    cmd=p.args)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'train.py', '--rnn-type', 'lstm', '--hidden-size', '1024', '--hidden-layers', '5', '--train-manifest', '/datasets/deepspeech/librispeech/libri_train_manifest.csv', '--val-manifest', '/datasets/deepspeech/librispeech/libri_val_manifest.csv', '--epochs', '60', '--num-workers', '16', '--cuda', '--learning-anneal', '1.01', '--batch-size', '64', '--no-sortaGrad', '--visdom', '--opt-level', 'O1', '--loss-scale', '1', '--id', 'libri', '--checkpoint', '--save-folder', '/datasets/deepspeech/librispeech', '--model-path', '/datasets/deepspeech/librispeech/deepspeech_final.pth', '--checkpoint-per-batch', '100', '--opt-level', 'O1', '--loss-scale', '1.0', '--world-size', '4', '--rank', '0', '--gpu-rank', '0']' returned non-zero exit status 1.
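
The traceback shows `serialize` reading `model.version` from an object that is a `DistributedDataParallel` wrapper rather than the underlying model. `DistributedDataParallel` keeps the wrapped network in its `.module` attribute, so one possible workaround is to unwrap the model before reading custom attributes. The following is a minimal sketch of that idea, not the project's actual fix; `TinyModel`, `FakeDDP`, and `unwrap` are hypothetical stand-ins so that the snippet runs without PyTorch or multiple GPUs:

```python
class TinyModel:
    """Hypothetical stand-in for the real model; it only carries `version`."""
    version = "0.0.1"


class FakeDDP:
    """Hypothetical stand-in for DistributedDataParallel.

    Like the real wrapper, it stores the wrapped model in `.module`
    and does not forward custom attributes such as `version`.
    """

    def __init__(self, module):
        self.module = module


def unwrap(model):
    # Return the wrapped model if there is one, otherwise the model itself.
    # Assumption: the bare model has no unrelated attribute named `module`.
    return getattr(model, "module", model)


wrapped = FakeDDP(TinyModel())

# Reading wrapped.version directly would raise AttributeError, as in the
# traceback above; going through unwrap() reaches the real model.
package = {"version": unwrap(wrapped).version}
print(package)
```

Because `unwrap` falls back to the object itself, the same call works for single-GPU runs where the model is not wrapped.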
