support other audio format #1616
Signed-off-by: khursani8 <khursani8@gmail.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Thanks for the PR. It looks good to me.
I just added a small check to ensure that it continues to work with the io.BufferedReader objects that we use for tarred datasets.
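A minimal sketch of the kind of check described above, assuming the loader receives either a filesystem path or an already-open binary stream. The helper name `is_file_path` is hypothetical and is not taken from the PR.

```python
import io
import os


def is_file_path(audio_file):
    """Return True when audio_file is a filesystem path rather than an open stream."""
    # Hypothetical helper: only str and os.PathLike values are treated as paths.
    return isinstance(audio_file, (str, os.PathLike))


# A path can be inspected by its extension; a stream read from a tar archive cannot.
stream = io.BufferedReader(io.BytesIO(b"RIFF"))
print(is_file_path("clip.mp3"))
print(is_file_path(stream))
```

With a check like this, extension-based logic can be skipped for streams, so tarred datasets keep going through the original decoding path.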
* support other audio format
  Signed-off-by: khursani8 <khursani8@gmail.com>
* wrong order
  Signed-off-by: khursani8 <khursani8@gmail.com>
* load until duration
  Signed-off-by: khursani8 <khursani8@gmail.com>
* convert duration to seconds
  Signed-off-by: khursani8 <khursani8@gmail.com>
* add check that audio_file is a path
  Signed-off-by: Jason <jasoli@nvidia.com>

Co-authored-by: Jason <jasoli@nvidia.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
|
This MR makes training much slower. Pydub uses too much CPU when converting media files. We need to make sure pydub is used only for media formats that are otherwise unsupported; right now we are using FLAC, and it goes through pydub even though libsndfile supports FLAC. The part in the middle is with this MR, and the parts before and after are without it. Training was done on a DGX A100. |
|
OK, that's a problem. I'll go ahead and change the logic a bit. |
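The logic change requested above (prefer the native decoder, use pydub only for formats it cannot read) might be sketched as follows. This is not the code that was merged. `NATIVE_EXTENSIONS` is an illustrative subset of formats libsndfile reads, and `native_loader` and `fallback_loader` are hypothetical callables standing in for soundfile-based and pydub-based decoding.

```python
import os

# Assumption: an illustrative subset of extensions that libsndfile decodes natively.
NATIVE_EXTENSIONS = {".wav", ".flac", ".ogg"}


def load_audio(audio_file, native_loader, fallback_loader):
    """Decode with native_loader when possible; use fallback_loader otherwise.

    native_loader and fallback_loader are hypothetical callables that take the
    audio_file argument and return decoded samples.
    """
    if isinstance(audio_file, (str, os.PathLike)):
        ext = os.path.splitext(os.fspath(audio_file))[1].lower()
        if ext not in NATIVE_EXTENSIONS:
            # Only formats the native decoder cannot read pay the conversion cost.
            return fallback_loader(audio_file)
    # Streams (for example from tarred datasets) and natively supported
    # extensions stay on the fast path.
    return native_loader(audio_file)
```

Passing the two decoders in as arguments keeps the routing decision separate from the decoding libraries, so the routing can be exercised without installing either one.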
|
Will test as soon as we have a free video card available on the DGX (most likely tomorrow). |
|
Yes, it looks like it works as before (maybe a little slower, but that could be due to task/node load). |
|
Is there a reason this isn't using torchaudio? It supports multiple formats in a single library. |
|
I don’t have experience using torchaudio. After taking a look at their repo, I think torchaudio would be better for this, since it supports multiple audio formats, including the Kaldi format. Maybe someone who knows torchaudio better can contribute this.
Hi, torchaudio maintainer here. I commented in #1549, but the current release of torchaudio does not support file-like objects. I am adding that support right now, and it will be released in the next release (early March). |
|
Hmmm, I'm not actually affiliated with NeMo, and I don't know the requirements either. I think @blisc can answer this question. |
|
There's a very detailed and helpful response from @titu1994 on why torchaudio isn't used as a dependency. |

Might solve #1549
Currently I let pydub handle any file that is not in .wav format, because there is something I don't understand about the current implementation.
So far I have only tested inference with mp3 audio from Common Voice, using a model trained on wav files; there was no problem with the prediction results.