Cannot build model from audio files with a length of 3 seconds #98

Closed
m-haecker opened this issue Dec 14, 2019 · 6 comments


@m-haecker

I'm trying to train my own model, using Google's Speech Commands dataset as the basis. I also have six additional keywords (alexa, jarvis, and computer are three of them) that are longer than 1 second, so I padded all WAVs to a length of 3 seconds (many now have silence at the end). Then I call:

python -m utils.train --wanted_words alexa jarvis computer down left right learn dog sheila marvin --dev_every 1 --n_labels 12 --n_epochs 26 --weight_decay 0.00001 --lr 0.1 0.01 0.001 --schedule 3000 6000 --input_length 48000 --model res8 --no_cuda true --pos_key_size 1000 --data_folder ./speech_commands_v0.02/ --output_file ./speech_commands_v0.02/model.pt
(input_length is set to 48000 because the clips are 3 seconds long at 16 kHz)

However, this leads to the following error:

File "workspace/voice/honk/utils/model.py", line 258, in collate_fn
audio_tensor = torch.from_numpy(self.audio_processor.compute_mfccs(audio_data).reshape(1, 101, 40))
ValueError: cannot reshape array of size 12040 into shape (1,101,40)

I don't understand the message or how to fix it.
When I add the parameter "--audio_preprocess_type PCEN", I can build the model. I can also generate the weights file from it and use it in Honkling, but the recognition doesn't work at all: it constantly recognizes "computer" and nothing else, even when that keyword isn't spoken or nothing is spoken at all.

What can I do to make it work?

@daemon
Member

daemon commented Dec 14, 2019

Ah, that's a bug. Can you try replacing that line with
audio_tensor = torch.from_numpy(self.audio_processor.compute_mfccs(audio_data).reshape(1, -1, 40))?
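
The numbers in the traceback are consistent with simple frame-count arithmetic. The sketch below assumes 16 kHz audio, a hop of 160 samples (10 ms), and centered framing, which gives 1 + n_samples // hop frames; these values are assumptions, not read from the Honk source, and the helper names are hypothetical. Under those assumptions a 1-second clip yields 101 frames (the hard-coded shape), a 3-second clip yields 301 frames, and 301 × 40 = 12040, the size reported in the error. Passing -1 to reshape lets that axis be inferred from the array size instead.

```python
SAMPLE_RATE = 16000   # assumed: clips are sampled at 16 kHz
HOP_LENGTH = 160      # assumed: 10 ms hop between frames
N_FEATURES = 40       # last dimension of the reshape in the traceback


def num_frames(n_samples, hop_length=HOP_LENGTH):
    """Frame count for a centered short-time transform: 1 + n_samples // hop."""
    return 1 + n_samples // hop_length


def infer_frames(total_size, n_features=N_FEATURES):
    """Mimic the axis that reshape(1, -1, n_features) infers from the array size."""
    if total_size % n_features != 0:
        raise ValueError("size is not a multiple of n_features")
    return total_size // n_features


one_second = num_frames(1 * SAMPLE_RATE)
three_seconds = num_frames(3 * SAMPLE_RATE)

print(one_second)                    # 101
print(three_seconds)                 # 301
print(three_seconds * N_FEATURES)    # 12040
print(infer_frames(12040))           # 301
```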

@m-haecker
Author

Yes, thank you very much, the change allowed me to create the model 👍🏻
The final test accuracy was 0.921, which doesn't sound too bad (I used only 10 epochs).

So I created the JS file with the weights based on this model and fed it into Honkling. But in Honkling the recognition is very poor, and almost all keywords are recognized incorrectly.

Maybe this is because I padded all WAVs to a length of 3 seconds, so all WAVs from Google's Speech Commands dataset now have 2 seconds of silence at the end? The additional keywords I added are 1.5 to 3 seconds long, though, and I didn't know of a better approach.

Or do I have to adjust the code in Honk or Honkling to get along with WAVs > 1 second?
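
For reference, the padding described above can be sketched as follows. This is only an illustration of fixed-length padding with trailing silence, assuming 16 kHz audio held as a plain list of samples; `pad_or_trim` is a hypothetical helper, not part of Honk or Honkling.

```python
SAMPLE_RATE = 16000               # assumed: clips are sampled at 16 kHz
TARGET_LENGTH = 3 * SAMPLE_RATE   # 48000 samples, matching --input_length


def pad_or_trim(samples, target_length=TARGET_LENGTH, pad_value=0):
    """Return a copy of samples that is exactly target_length long.

    Shorter clips get trailing silence (pad_value); longer clips are cut.
    """
    if len(samples) >= target_length:
        return list(samples[:target_length])
    return list(samples) + [pad_value] * (target_length - len(samples))


one_second_clip = [1] * SAMPLE_RATE
padded = pad_or_trim(one_second_clip)

print(len(padded))                 # 48000
print(padded.count(0))             # 32000 samples (2 seconds) of trailing silence
```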

@daemon
Member

daemon commented Dec 15, 2019

@ljj7975 can best answer any Honkling questions. My guess is that Honkling also needs to be modified to support three-second audio.

Alternatively, if you'd like to support variable-length audio, you can use either a CTC-based decoder or one of those streaming seq2seq models.
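
As a rough illustration of the CTC idea, greedy decoding takes the most likely label per frame, merges consecutive repeats, and then drops a special blank symbol, so the output length no longer depends on the clip length. The sketch below covers only that collapse step on already-chosen frame labels; the blank symbol `"_"` and the function name are hypothetical, and nothing here is taken from Honk.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


frames = ["_", "alexa", "alexa", "_", "_", "left", "left", "left", "_"]
print(ctc_greedy_collapse(frames))                    # ['alexa', 'left']

# A blank between two identical labels keeps them as separate detections.
print(ctc_greedy_collapse(["left", "_", "left"]))     # ['left', 'left']
```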

@ljj7975
Member

ljj7975 commented Dec 15, 2019

Though I tried my best to make Honkling configurable,
I have never tried supporting audio longer than 1 second.

Can you try updating these?
https://github.com/castorini/honkling/blob/master/common/config.js#L9
https://github.com/castorini/honkling/blob/master/common/config.js#L186
https://github.com/castorini/honkling/blob/master/common/offlineAudioProcessor.js#L12

@m-haecker
Author

I haven't managed it yet, but now I know where to start and will hopefully get it to work.
Thank you very much!

@ljj7975
Member

ljj7975 commented Dec 17, 2019

That is great to hear.
Please let us know if you get it working so we can add the instructions to our page.
Thanks!

3 participants