Description
Full stacktrace:
... <snipped> ...
Epoch: [1][97/804] Time 1.485 (1.640) Data 0.006 (0.069) Loss 339.3378 (444.7247)
Epoch: [1][98/804] Time 1.753 (1.641) Data 0.012 (0.069) Loss 372.9253 (443.9921)
Epoch: [1][99/804] Time 1.500 (1.640) Data 0.017 (0.068) Loss 276.2173 (442.2974)
Epoch: [1][100/804] Time 1.461 (1.638) Data 0.006 (0.068) Loss 314.9874 (441.0243)
Saving checkpoint model to /datasets/deepspeech/librispeech/deepspeech_checkpoint_epoch_1_iter_100.pth
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    wer_results=wer_results, cer_results=cer_results, avg_loss=avg_loss),
  File "/workspace/src/deepspeech/deepspeech.pytorch/model.py", line 251, in serialize
    'version': model.version,
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'DistributedDataParallel' object has no attribute 'version'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/src/deepspeech/deepspeech.pytorch/multiproc.py", line 46, in <module>
    cmd=p.args)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'train.py', '--rnn-type', 'lstm', '--hidden-size', '1024', '--hidden-layers', '5', '--train-manifest', '/datasets/deepspeech/librispeech/libri_train_manifest.csv', '--val-manifest', '/datasets/deepspeech/librispeech/libri_val_manifest.csv', '--epochs', '60', '--num-workers', '16', '--cuda', '--learning-anneal', '1.01', '--batch-size', '64', '--no-sortaGrad', '--visdom', '--opt-level', 'O1', '--loss-scale', '1', '--id', 'libri', '--checkpoint', '--save-folder', '/datasets/deepspeech/librispeech', '--model-path', '/datasets/deepspeech/librispeech/deepspeech_final.pth', '--checkpoint-per-batch', '100', '--opt-level', 'O1', '--loss-scale', '1.0', '--world-size', '4', '--rank', '0', '--gpu-rank', '0']' returned non-zero exit status 1.
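The first traceback shows `serialize` reading `model.version` on a `DistributedDataParallel` object. That wrapper keeps the original network in its `.module` attribute and does not forward custom attributes such as `version`, which would explain the `AttributeError`. Below is a minimal sketch of the usual workaround: unwrap the model before reading its attributes. `TinyModel`, `FakeWrapper`, `unwrap`, and the version string are hypothetical stand-ins so the snippet runs without PyTorch or a process group. They are not the repository's code, and the fix actually applied upstream may differ.

```python
class TinyModel:
    """Hypothetical stand-in for the DeepSpeech model; only `version` matters here."""
    version = "0.0.1"  # illustrative value, not taken from the repository


class FakeWrapper:
    """Hypothetical stand-in that, like DistributedDataParallel, stores the
    wrapped model in `.module` and does not expose its custom attributes."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return the wrapped model if `model` has a `.module`, else `model` itself."""
    return model.module if hasattr(model, "module") else model


wrapped = FakeWrapper(TinyModel())

# Reading the attribute on the wrapper fails, as in the traceback above.
try:
    wrapped.version
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# Unwrapping first works for both wrapped and plain models.
version = unwrap(wrapped).version
plain_version = unwrap(TinyModel()).version
```

In real training code, an `isinstance` check against `torch.nn.parallel.DistributedDataParallel` (and `torch.nn.DataParallel`) would be a stricter test than `hasattr`, because an ordinary model could define its own submodule named `module`.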