
Division by Zero when training #12

Closed · LukeB42 opened this issue Jan 13, 2018 · 10 comments

LukeB42 commented Jan 13, 2018

  File "samplernn-pytorch/trainer/__init__.py", line 45, in call_plugins
    getattr(plugin, queue_name)(*args)
  File "/usr/local/lib/python3.6/site-packages/torch/utils/trainer/plugins/monitor.py", line 56, in epoch
    stats['epoch_mean'] = epoch_stats[0] / epoch_stats[1]
ZeroDivisionError: division by zero

This is with PyTorch 0.3.0.post4.

sbl commented Jan 21, 2018

Same behavior here:

  File "train.py", line 337, in <module>
    main(**vars(parser.parse_args()))
  File "train.py", line 235, in main
    trainer.run(params['epoch_limit'])
  File "/home/stephen/src/samplernn-pytorch/trainer/__init__.py", line 57, in run
    self.call_plugins('epoch', self.epochs)
  File "/home/stephen/src/samplernn-pytorch/trainer/__init__.py", line 44, in call_plugins
    getattr(plugin, queue_name)(*args)
  File "/home/stephen/anaconda3/lib/python3.6/site-packages/torch/utils/trainer/plugins/monitor.py", line 56, in epoch
    stats['epoch_mean'] = epoch_stats[0] / epoch_stats[1]
ZeroDivisionError: division by zero

koz4k (Member) commented Jan 25, 2018

Duplicate of #10.

The problem is that for validation we discard the last (incomplete) minibatch so it doesn't skew the result, as it might be smaller than the rest and we average the loss over minibatches with equal weights. Specifically, if you only have one minibatch, it tries to average over an empty set, hence division by zero. This could be handled better and we're planning to do that in the near future.
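As a minimal sketch of that failure mode (the values and names below are illustrative, not the repo's actual code):

# Hypothetical illustration: with drop_last-style batching, a dataset
# smaller than one batch yields zero complete minibatches per epoch.
dataset_size = 50
batch_size = 128

num_batches = dataset_size // batch_size  # 0: the only (incomplete) batch is dropped
loss_sum = 0.0                            # epoch_stats[0]: no losses accumulated
epoch_mean = loss_sum / num_batches       # epoch_stats[1] == 0 -> ZeroDivisionError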

LukeB42 (Author) commented Jan 31, 2018

@koz4k Thanks for the response, but what do you suggest for fixing this myself in the meantime?

Returning early when args is empty doesn't work, and wrapping the function body in a try/except causes the program to exit after around 1,000 exceptions.

koz4k (Member) commented Jan 31, 2018

Sorry, I was wrong: this is related to the size of the training set, not the validation set. Either way, the solution is to lower the batch size or use a bigger dataset. I would recommend a bigger dataset, because with such a small one you might not be able to achieve good results anyway.
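To make the arithmetic concrete (the numbers are hypothetical): the training loop only counts complete minibatches, so

40 // 128 == 0   # batch size larger than the dataset: mean over an empty set
40 // 32  == 1   # smaller batch size: at least one complete minibatch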

LukeB42 (Author) commented Feb 1, 2018

@koz4k OK, thanks for explaining that.

LukeB42 (Author) commented Feb 1, 2018

@koz4k Following your suggestion, using

python train.py --exp TEST --frame_sizes 16 4 --n_rnn 2 --dataset custom --batch_size 64

I'm getting the following result:

Traceback (most recent call last):
  File "train.py", line 360, in <module>
    main(**vars(parser.parse_args()))
  File "train.py", line 258, in main
    trainer.run(params['epoch_limit'])
  File "pytorch-samplernn/trainer/__init__.py", line 56, in run
    self.train()
  File "pytorch-samplernn/trainer/__init__.py", line 61, in train
    enumerate(self.dataset, self.iterations + 1):
  File "pytorch-samplernn/dataset.py", line 51, in __iter__
    for batch in super().__iter__():
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 96, in default_collate
    return torch.stack(batch, 0, out=out)
  File "/usr/local/lib/python3.6/site-packages/torch/functional.py", line 64, in stack
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /pytorch/torch/lib/TH/generic/THTensorMath.c:2864

What do you suggest I do to fix this for the time being?

comeweber commented

Are you sure that all the .wav files in your dataset directory have the same duration?
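
A quick way to check is the standard-library wave module (a minimal sketch; the directory path is a placeholder for your dataset folder):

import glob
import wave

# Print the length in frames of every .wav file in the dataset directory;
# for torch.stack to work in the collate step, all counts must be identical.
for path in sorted(glob.glob('datasets/custom/*.wav')):
    with wave.open(path, 'rb') as f:
        print(path, f.getnframes())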

LukeB42 (Author) commented Feb 1, 2018

@comeweber @koz4k Many thanks for your help, both of you. It's now training stably, using wav files that are 8 seconds long and a --batch_size of 32.

niuqun commented May 11, 2018

@LukeB42 Could you share the file structure of your custom folder? I cannot use youtube-dl to generate the training data right now, so I downloaded an audio file myself. Although I have 8-second chunks, training fails with the following error:

Traceback (most recent call last):
  File "train.py", line 360, in <module>
    main(**vars(parser.parse_args()))
  File "train.py", line 258, in main
    trainer.run(params['epoch_limit'])
  File "/root/Documents/samplernn-pytorch-master/trainer/__init__.py", line 56, in run
    self.train()
  File "/root/Documents/samplernn-pytorch-master/trainer/__init__.py", line 61, in train
    enumerate(self.dataset, self.iterations + 1):
  File "/root/Documents/samplernn-pytorch-master/dataset.py", line 51, in __iter__
    for batch in super().__iter__():
  File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 115, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 353344 and 352320 in dimension 1 at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:3586

koz4k (Member) commented Sep 21, 2018

You most likely have chunks that are not exactly equal in length; many tools for chunking audio files tend to do that. You can use ffmpeg, which cuts the files cleanly. See the downloading script for an example.
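
Something along these lines works (an illustrative command, not the exact one from the downloading script; the last segment may come out shorter and should be deleted afterwards):

ffmpeg -i recording.wav -f segment -segment_time 8 chunk%04d.wav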

koz4k closed this as completed Sep 21, 2018