Use of function "calculateInputSizes(sizes)" in DeepSpeechModel.lua? #54
A way around this would be to do something like:
If each sample of your output has the same length. Hopefully this helps!
I have variable-length image samples (same height, varying widths), so the alternate trick won't work. What should be passed in the sizes parameter to the CTCCriterion for loss calculation? (here) From what you suggest, it is the sequence length of the input samples. Can you please confirm? So, in my case of images, since I pass a column-strip of the image at each time-step, sizes would be the width of each image in the batch after it has been passed through the SpatialConv layers?
Sorry for the late response! From what I can tell you will not need to touch the calculateInputSizes function. And just to confirm, it is the true length of the input samples AFTER going through the convolutional layers (which reduce the number of timesteps; that's why this is necessary).
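To make the conv arithmetic concrete, here is a minimal sketch of how such per-sample lengths could be computed after the convolutional layers. The kernel widths and strides below are placeholders rather than the repo's actual values, so substitute the time-dimension parameters of the SpatialConvolution layers you are actually using:

```lua
-- Sketch: apply the standard conv output-length formula, element-wise, to a
-- tensor holding the width (number of time steps) of each sample in the batch.
local function convOutputLengths(sizes, kW, dW)
    return torch.floor((sizes - kW) / dW + 1)
end

local sizes = torch.Tensor({120, 98, 87}) -- original widths of the images in the batch (illustrative)
sizes = convOutputLengths(sizes, 11, 2)   -- after conv layer 1 (hypothetical kW = 11, dW = 2)
sizes = convOutputLengths(sizes, 11, 1)   -- after conv layer 2 (hypothetical kW = 11, dW = 1)
-- 'sizes' now holds the number of time steps each sample contributes to the RNN and the CTC loss
```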
Thanks @SeanNaren :) calculateInputSize is really a pretty neat hack! Turns out my problem was the noisy samples in my dataset, which had an image width smaller than the width of the convolution kernels I was using. Simply removing these corrupted samples from the dataset did the job for me. Thanks again!
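For anyone hitting the same problem, a minimal sketch of that kind of filtering is below. The dataset layout and field names are hypothetical (a Lua table of samples, each holding a channels x height x width image tensor); adapt it to however your loader stores samples:

```lua
-- Keep only samples whose width is at least the first conv layer's kernel width,
-- since anything narrower produces zero (or negative) output time steps.
local function dropNarrowSamples(samples, minWidth)
    local kept = {}
    for _, sample in ipairs(samples) do
        if sample.image:size(3) >= minWidth then -- dim 3 assumed to be the width
            table.insert(kept, sample)
        end
    end
    return kept
end

-- e.g. with a (hypothetical) first-layer kernel width of 11:
-- trainingSamples = dropNarrowSamples(trainingSamples, 11)
```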
Ah that is a good point! I think it would be nice to add this somewhere into the documentation where appropriate; I ran into the same issue a lot when training these models!
@SeanNaren I can send you a PR once I get the code working fine myself. The model currently trains in a weird fashion for me: the training loss keeps fluctuating between really small values and inf :/ (take a look at the train logs below). Any tips on what might be going wrong? I am checking whether this is indeed exploding gradients, though I'm not hopeful it is, since the loss shouldn't come back to non-inf values once it has exploded, right?

Training Epoch: 3 Average Loss: 3.046032 Average Validation WER: inf Average Validation CER: inf
..
Training Epoch: 33 Average Loss: 0.000093 Average Validation WER: nan Average Validation CER: nan
Those are some fun losses! Have you tried changing the cutoff?
@SeanNaren I haven't tried that yet. On it. Btw, by cutoff you mean the maxNorm, right? For normalizing gradients?
Sorry, exactly! That is what I meant :) From tests I've done, lowering the maxNorm helps prevent gradients from exploding!
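For reference, max-norm gradient clipping in its generic form looks roughly like the sketch below. The variable names are hypothetical and the training script may implement it differently; the idea is simply to rescale the gradient vector whenever its norm exceeds the threshold:

```lua
-- Rescale the flattened gradient tensor so its L2 norm never exceeds maxNorm.
local function clipGradients(gradParameters, maxNorm)
    local norm = gradParameters:norm()
    if norm > maxNorm then
        gradParameters:mul(maxNorm / norm)
    end
end

-- e.g. inside the training closure, right after the backward pass:
-- clipGradients(gradParams, opt.maxNorm)
```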
@SeanNaren I've tested running the code, linearly bringing the maxNorm down to a value as low as 10, but I still face the nan losses and inf WER/CER issue. From your experience, should I keep going lower, or is this probably not the parameter I should be tuning? Please help.
Also, I have tried reducing the number of RNN hidden layers to something like 3 instead of the original 7. Still no positive signs though.
This goes against the grain of DS2, but could you try using cudnn.LSTMs instead of RNNs? Try to keep the number of parameters around 80 million. LSTMs might help out since they have a lot of improvements over the standard recurrent net!
@SeanNaren Will simply changing this line do the trick here? I see that there are BLSTM implementations also available, so just confirming.
Ah my apologies, that would be a bit strange. I'd suggest doing this in DeepSpeechModel.lua instead. Change:

```lua
local function RNNModule(inputDim, hiddenDim, opt)
    if opt.nGPU > 0 then
        require 'BatchBRNNReLU'
        return cudnn.BatchBRNNReLU(inputDim, hiddenDim)
    else
        require 'rnn'
        return nn.SeqBRNN(inputDim, hiddenDim)
    end
end
```

to something like:

```lua
local function RNNModule(inputDim, hiddenDim, opt)
    require 'cudnn'
    local rnn = nn.Sequential()
    rnn:add(cudnn.BLSTM(inputDim, hiddenDim, 1))
    rnn:add(nn.View(-1, 2, hiddenDim):setNumInputDims(2)) -- have to sum the forward/backward activations
    rnn:add(nn.Sum(3))
    return rnn
end
```

I would suggest changing the hidden size dimension to around 700, as the default for an LSTM would be pretty large!
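As a rough way to sanity-check a hidden size like 700 against the ~80 million parameter budget mentioned earlier, one could estimate the BLSTM weight count as in the sketch below. It counts only the LSTM weights (4 gates, input plus recurrent weights plus biases, times two directions), ignores the convolutional and output layers, and the input dimension and layer count are placeholders, so treat the result as a ballpark figure only:

```lua
-- Approximate parameter count of one bidirectional LSTM layer.
local function blstmParams(inputDim, hiddenDim)
    local perDirection = 4 * (inputDim * hiddenDim + hiddenDim * hiddenDim + hiddenDim)
    return 2 * perDirection
end

-- Approximate parameter count of a stack of such layers, where every layer after
-- the first sees the (direction-summed) hiddenDim-wide output of the previous one.
local function stackParams(inputDim, hiddenDim, nLayers)
    local total = blstmParams(inputDim, hiddenDim)
    for _ = 2, nLayers do
        total = total + blstmParams(hiddenDim, hiddenDim)
    end
    return total
end

print(stackParams(1000, 700, 7)) -- hypothetical input dim and layer count
```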
Thanks a lot for the clarification! Will update with results 😄
Open a new issue (maybe something like 'better convergence with custom dataset'); I think people will find this useful!
@SeanNaren Can you tell me what role does the
@SeanNaren I would like to know what exactly is the use of the function calculateInputSizes. I am using my own image data for a scene-text task (I have updated the spatial-conv params accordingly).
It looks like it calculates the size of the tensors obtained after passing the inputs through the two spatial-conv layers. However, this function is called just before doing the forward-backward passes (here) and that 'sizes' parameter is passed to the CTC-criterion.
AFAIK, the sizes passed to the CTC-criterion are the sizes of the target labels (as shown here) [NOTE: I might be getting this wrong. I posted the PR for updating the documentation in the CTC readme, so if I'm getting this all wrong, I need to update that readme too :P]. So shouldn't the size-calculation code be something like the one below? (Note that I take 'targets' as input instead of 'sizes' as previously.)
Please let me know what is going wrong here. I get an error saying...
If I go to the CTCCriterion.lua file at line 74, I see that it's simply creating a new tensor: local result = tensor.new():resize(sizes):zero(). When using the original calculateInputSizes function, my sizes tensor has negative values, and hence CUDA out-of-memory errors are thrown. If I instead use my variation of the calculateInputSizes function, I get the invalid-arguments error stated above. Please help.
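For reference, assuming the CTCCriterion here wraps Baidu's warp-ctc Torch bindings, a minimal sketch of how the sizes argument is used at that level is below. It gives the number of time steps in the activations of each sample, not the length of its target label sequence; the tensor values are arbitrary and purely illustrative:

```lua
-- Minimal warp-ctc sketch: one sample, 3 time steps, alphabet of 5 symbols
-- (index 0 is the CTC blank). Requires the warp-ctc Torch bindings.
require 'warp_ctc'

local acts = torch.Tensor({
    {0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0},
}):float()

local labels = {{1, 2}} -- the target label sequence (its length is NOT what goes in sizes)
local sizes  = {3}      -- number of time steps in the activations for this sample

-- Passing an empty tensor for the gradients computes only the costs.
local costs = cpu_ctc(acts, torch.Tensor(), labels, sizes)
print(costs[1])
```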