Cannot build model from audio files with a length of 3 seconds #98
Comments
Ah, that's a bug. Can you try replacing that line with
Yes, thank you very much, the change allowed me to create the model 👍🏻 I then created the JS file with the weights from this model and fed it into Honkling. But with Honkling the recognition is very poor, and almost all keywords are recognized incorrectly. Maybe this is because I padded all WAVs to a length of 3 seconds, so all WAVs from Google's Speech Commands set now have 2 seconds of silence at the end? But the additional keywords I added are 1.5 to 3 seconds long, and I didn't know a better way to handle that. Or do I have to adjust the code in Honk or Honkling to handle WAVs longer than 1 second?
@ljj7975 can best answer any Honkling questions. My guess is that Honkling also needs to be modified to support three-second audio. Alternatively, if you'd like to support variable-length audio, you can use either a CTC-based decoder or one of those streaming seq2seq models.
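For readers unfamiliar with the CTC option mentioned above, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame predictions into a label sequence of variable length. It is not code from Honk or Honkling; the function name `ctc_greedy_decode` and the convention that label 0 is the blank are assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output label sequence.

    Hypothetical helper: first merge runs of identical consecutive labels,
    then drop the blank label (assumed here to be 0).
    """
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Eight frames of predictions; the blank between the two runs of 1 keeps them distinct.
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
print(decoded)
```

Because the output length depends only on the predicted frames, a decoder of this kind does not require every clip to be padded to the same duration.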
Though I tried my best to make Honkling configurable, Can you try updating these?
I haven't managed it yet. But now I know where to start, hopefully to get it to work.
That is great to hear.
I'm trying to create my own model, using Google's Speech Commands dataset as the basis. In addition, I have six keywords of my own (alexa, jarvis and computer are three of them) that are longer than 1 second. I therefore brought all WAVs to a length of 3 seconds (many now have silence at the end). Then I call:
python -m utils.train --wanted_words alexa jarvis computer down left right learn dog sheila marvin --dev_every 1 --n_labels 12 --n_epochs 26 --weight_decay 0.00001 --lr 0.1 0.01 0.001 --schedule 3000 6000 --input_length 48000 --model res8 --no_cuda true --pos_key_size 1000 --data_folder ./speech_commands_v0.02/ --output_file ./speech_commands_v0.02/model.pt
(input_length is set to 48000 because of the audio lengths)
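The value 48000 follows from the clip length and the sample rate: the Speech Commands clips are 16 kHz mono, so 3 seconds is 3 × 16000 = 48000 samples. A minimal sketch of the padding step described above, assuming the audio is already loaded as a 1-D NumPy array; the helper name `pad_or_trim` is hypothetical and not part of Honk:

```python
import numpy as np

SAMPLE_RATE = 16000                        # Speech Commands clips are 16 kHz mono
CLIP_SECONDS = 3                           # target clip length used in this issue
INPUT_LENGTH = SAMPLE_RATE * CLIP_SECONDS  # value passed as --input_length


def pad_or_trim(samples: np.ndarray, target_len: int = INPUT_LENGTH) -> np.ndarray:
    """Hypothetical helper: zero-pad at the end, or truncate, to target_len samples."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return np.pad(samples, (0, target_len - len(samples)), mode="constant")


# A stand-in for a 1-second clip from the original dataset.
one_second = np.ones(SAMPLE_RATE, dtype=np.int16)
padded = pad_or_trim(one_second)
print(INPUT_LENGTH, padded.shape)
```

With end-padding like this, every original 1-second clip ends in 2 seconds of digital silence.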
However, this leads to the following error:
I don't know what to do with the message or how to fix it.
When I add the parameter "--audio_preprocess_type PCEN", I am able to create the model. From this I can also create the file with the weights and use it in Honkling. But the recognition doesn't work at all. It constantly recognizes "computer" and nothing else, even if this keyword is not spoken or nothing is spoken at all.
What can I do to make it work?