Sampler that splits utterances into buckets #83
Conversation
Thanks for this @EgorLakomkin, has this shown an improvement in memory usage when profiled or checking nvidia-smi once this kicks in? EDIT: initial tests have shown improvements in memory usage over random sampling across the entire dataset. Just going to add a few comments!
I am also testing on librispeech+ted+voxforge for a couple of epochs to measure if memory blows up as in random sampling.
I've run a few epochs of librispeech 1k hrs training with this branch and still ended up with a random OOM. I think this is a step in the right direction, but I wonder if we can't take it a step further to try to guarantee a maximum sequence length.
Strange, mine is working quite well. Are you limiting the maximum length of the utterances?
Nope. Not limiting the length anywhere (where are you doing that, by the way?). I think the longest utterance in the LibriSpeech corpus is something like 35 seconds.
There is an option in the merge_manifests script to limit the minimum or maximum length. 35 seconds is indeed long; I guess for such long utterances the only solution would be to decrease the batch size.
I have tried several times and also caught an OOM. Set-up is batch_size 16, 5-layer GRU, 1024 hidden size, max length 15. Now I have added one more thing: del loss & del out references during training. I have seen somewhere that it might help with OOMs. The same set-up is working for me now without OOMs.
Also saw that in the forums recently. Glad that's helping.
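For anyone hitting the same OOMs, here is a minimal sketch of what dropping those references could look like inside the training loop. The model/criterion/optimizer/loader names are placeholders for illustration, not the exact code in this repo:

```python
def train_one_epoch(model, criterion, optimizer, train_loader):
    """Illustrative training loop that explicitly drops `out` and `loss`
    after each step so the autograd graph and activations can be freed
    before the next (possibly longer) batch is processed."""
    for inputs, targets, input_sizes, target_sizes in train_loader:
        out = model(inputs)                                   # forward pass
        loss = criterion(out, targets, input_sizes, target_sizes)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Drop the references; without this the tensors stay alive until the
        # names are rebound on the next iteration.
        del out
        del loss
```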
data/bucketing_sampler.py
self.bins_to_samples[bin_id].append(idx)

class BucketingSampler(Sampler):
    """
Mind adding a little description as to what this bucket sampler does?
I have added a few comments on SpectrogramDatasetWithLength.
train.py
@@ -52,6 +53,7 @@
parser.add_argument('--tensorboard', dest='tensorboard', action='store_true', help='Turn on tensorboard graphing')
parser.add_argument('--log_dir', default='visualize/deepspeech_final', help='Location of tensorboard log')
parser.add_argument('--log_params', dest='log_params', action='store_true', help='Log parameter values and gradients')
parser.add_argument('--bucketing', dest='bucketing', action='store_true', help='Split utterances into buckets and sample from them')
I think 'bucketing' should replace sorta grad given the memory savings. Would it make sense to make this the default and replace this parameter with --no_sorta_grad, which swaps out the sampler before starting the training process?
Concur
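For illustration, the suggested default could look roughly like this in train.py. Flag, class, and variable names here (BucketingSampler, train_dataset, collate_fn, etc.) are assumptions, not the PR's final code:

```python
import argparse

from torch.utils.data import DataLoader

parser = argparse.ArgumentParser()
# ... existing arguments ...
parser.add_argument('--no_sorta_grad', dest='no_sorta_grad', action='store_true',
                    help='Disable length-based sampling and shuffle batches randomly')
args = parser.parse_args()

# train_dataset, collate_fn and BucketingSampler are assumed to be defined elsewhere.
if args.no_sorta_grad:
    # Fall back to plain random sampling over the whole dataset.
    train_loader = DataLoader(train_dataset, batch_size=args.batch_size,
                              shuffle=True, collate_fn=collate_fn,
                              num_workers=args.num_workers)
else:
    # Default: bucket utterances by length so each batch has similar durations.
    sampler = BucketingSampler(train_dataset)
    train_loader = DataLoader(train_dataset, batch_size=args.batch_size,
                              sampler=sampler, collate_fn=collate_fn,
                              num_workers=args.num_workers)
```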
Thought I'd help out: finished the review and dealt with merge commits on the main branch. Let me know what you think!
Looks good! Thank you :)
Pulling this in now, and thanks for the contribution!
I think it might be a solution to #30. Instead of random sampling, we could first split the whole dataset into several buckets depending on the length of the utterance and then randomly sample from each of the buckets. As a result, we can have batches with similar-length utterances.
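A minimal sketch of the idea (this assumes the dataset exposes per-utterance durations through an `audio_lengths` attribute; it illustrates the approach rather than reproducing the exact code in this PR):

```python
from collections import defaultdict

import numpy as np
from torch.utils.data.sampler import Sampler


class BucketingSampler(Sampler):
    """Buckets utterances by length and shuffles only within each bucket,
    so every batch drawn by the DataLoader contains utterances of roughly
    the same duration and per-batch padding/memory stays bounded."""

    def __init__(self, data_source, bucket_size_seconds=1.0):
        self.data_source = data_source
        self.bins_to_samples = defaultdict(list)
        # `audio_lengths` (duration in seconds per utterance) is an assumed attribute.
        for idx, length in enumerate(data_source.audio_lengths):
            bin_id = int(length // bucket_size_seconds)
            self.bins_to_samples[bin_id].append(idx)

    def __iter__(self):
        # Walk buckets from shortest to longest, shuffling inside each bucket.
        for bin_id in sorted(self.bins_to_samples):
            indices = list(self.bins_to_samples[bin_id])
            np.random.shuffle(indices)
            for idx in indices:
                yield idx

    def __len__(self):
        return len(self.data_source)
```

Passed as the `sampler=` argument of a DataLoader, consecutive indices then come from the same bucket, so each batch only pads to roughly that bucket's length instead of to the longest utterance in the whole dataset.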