support other audio format #1616

Merged: 7 commits, Jan 11, 2021

@khursani8 (Contributor) commented Jan 9, 2021

Might solve #1549

Currently I let pydub handle files that are not in .wav format, because there are some things I don't understand about the current implementation:

  1. Why `samples.transpose()` is called before initializing with `cls`.
  2. Users also need to install ffmpeg to load non-WAV file formats. I don't know where to put that warning, or whether to write a script that runs when the user does `pip install`.
  3. I'm not sure whether the NeMo dataloader caches the loaded samples. If it doesn't, it might convert with ffmpeg every time it loads a non-WAV file.
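Points 1 and 2 come down to how pydub hands back audio. The following is a minimal sketch, not NeMo's implementation, assuming pydub's `AudioSegment.from_file`, `get_array_of_samples()`, `channels`, `sample_width` and `frame_rate`; the helper names `interleaved_to_frames` and `load_non_wav` are hypothetical. pydub returns integer samples interleaved across channels, so they have to be reshaped to `(frames, channels)` and scaled to floats. Transposing that result would give `(channels, frames)`, which is the kind of layout choice point 1 is asking about. pydub relies on ffmpeg for most non-WAV formats, which is where point 2's extra install comes from.

```python
import numpy as np


def interleaved_to_frames(samples, channels, sample_width):
    """Convert pydub-style interleaved integer samples to float32 (frames, channels).

    Assumes signed integer samples, scaled to the range [-1.0, 1.0).
    """
    # Copy into a new float32 array so the caller's buffer is never modified.
    data = np.array(samples, dtype=np.float32)
    # Multi-channel audio is interleaved: L0, R0, L1, R1, ...
    data = data.reshape(-1, channels)
    # Full-scale magnitude of a signed integer that is `sample_width` bytes wide.
    data /= float(1 << (8 * sample_width - 1))
    return data


def load_non_wav(path):
    """Hypothetical loader for formats pydub decodes via ffmpeg (e.g. MP3)."""
    from pydub import AudioSegment  # imported lazily; ffmpeg must be installed

    seg = AudioSegment.from_file(path)
    frames = interleaved_to_frames(
        seg.get_array_of_samples(), seg.channels, seg.sample_width
    )
    return frames, seg.frame_rate
```

Because every call to `load_non_wav` re-runs the ffmpeg decode, point 3 (caching) matters for training throughput.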

So far I have only tested inference on MP3 audio from Common Voice, using a model trained on WAV files, and the predictions were fine.

Signed-off-by: khursani8 <khursani8@gmail.com>
@blisc self-requested a review on January 11, 2021 18:22
@blisc (Collaborator) left a comment

Thanks for the PR. It looks good to me.
I just added a small check to ensure that it continues to work with the `io.BufferedReader` objects we use for tarred datasets.
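The check described above ("add check that audio_file is a path" in the squashed commit) can be sketched as follows. This is a hedged illustration, not the merged code: `is_path_like` and `pick_loader` are hypothetical names, and the rule "only real paths without a `.wav` extension go to pydub" is an assumption based on this PR's description. The demo uses Python's standard `tarfile` module, whose `extractfile()` returns an `io.BufferedReader` subclass, to show why a file-like object must not be routed by extension.

```python
import io
import os
import tarfile


def is_path_like(audio_file):
    """True for str / os.PathLike inputs, False for file-like objects."""
    return isinstance(audio_file, (str, os.PathLike))


def pick_loader(audio_file):
    """Hypothetical dispatch: only real paths without a .wav extension go to pydub."""
    # File-like objects (e.g. members of a tarred dataset) have no reliable
    # extension to inspect, so they stay on the soundfile code path.
    if is_path_like(audio_file) and not os.fspath(audio_file).lower().endswith(".wav"):
        return "pydub"
    return "soundfile"


def tar_member_demo():
    """Build an in-memory tar with one fake .mp3 member and return that member."""
    buf = io.BytesIO()
    payload = b"not really audio"
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="clip.mp3")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    tar = tarfile.open(fileobj=buf, mode="r")
    return tar.extractfile("clip.mp3")  # an io.BufferedReader subclass


if __name__ == "__main__":
    print(pick_loader("clip.mp3"), pick_loader(tar_member_demo()))
```

Running the script prints `pydub soundfile`: the same `.mp3` name is routed differently depending on whether it arrives as a path or as an open tar member.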

@blisc merged commit 212b1f3 into NVIDIA:main on Jan 11, 2021
borisfom pushed a commit to borisfom/NeMo that referenced this pull request Jan 12, 2021
* support other audio format

Signed-off-by: khursani8 <khursani8@gmail.com>

* wrong order

Signed-off-by: khursani8 <khursani8@gmail.com>

* load until duration

Signed-off-by: khursani8 <khursani8@gmail.com>

* convert duration to seconds

Signed-off-by: khursani8 <khursani8@gmail.com>

* add check that audio_file is a path

Signed-off-by: Jason <jasoli@nvidia.com>

Co-authored-by: Jason <jasoli@nvidia.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
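The commit titles "load until duration" and "convert duration to seconds" point to a unit mismatch: pydub indexes `AudioSegment` objects in milliseconds, while offsets and durations are typically given in seconds. Below is a hedged sketch of that conversion. The helper names and the "duration <= 0 means read to the end" convention are assumptions for illustration, not the merged code.

```python
def seconds_to_ms_slice(offset, duration):
    """Map (offset, duration) in seconds to pydub millisecond slice bounds.

    Assumed convention: duration <= 0 means "read to the end" (end is None).
    """
    start_ms = int(round(offset * 1000))
    end_ms = start_ms + int(round(duration * 1000)) if duration > 0 else None
    return start_ms, end_ms


def load_segment(path, offset=0.0, duration=0.0):
    """Hypothetical helper: decode with pydub, then keep only the requested span."""
    from pydub import AudioSegment  # imported lazily; ffmpeg needed for MP3 etc.

    seg = AudioSegment.from_file(path)
    start_ms, end_ms = seconds_to_ms_slice(offset, duration)
    # pydub slices are in milliseconds; an open-ended slice runs to the end.
    return seg[start_ms:end_ms]
```

Note that this still decodes the whole file before slicing, so it does not by itself reduce the CPU cost of the ffmpeg conversion.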
@yoks commented Jan 13, 2021

This MR makes training much slower: pydub uses too much CPU to convert media files. We need to make sure pydub is only used for unsupported media formats. Right now we are using FLAC, and it goes through pydub even though libsndfile supports FLAC.

[Image: training run with and without this MR]

The middle part is with this MR; the parts before and after are without it. Training was done on a DGX A100.
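One way to follow this suggestion is to send a file to pydub only when libsndfile cannot read its container. This is a sketch under assumptions: it uses the `soundfile` package (Python bindings for libsndfile) and its `available_formats()` function, whose keys are major format names such as `"WAV"`, `"FLAC"` and `"OGG"`; the helper names are hypothetical. Mapping a file extension straight onto a libsndfile format name is a simplification (for example, `.aif` would not match `"AIFF"`), and it applies only to real paths, not to file-like objects.

```python
import os


def extension_format(path):
    """Upper-cased extension without the dot, e.g. 'clip.flac' -> 'FLAC'."""
    return os.path.splitext(os.fspath(path))[1].lstrip(".").upper()


def needs_pydub(path, soundfile_formats):
    """True only if libsndfile cannot read this container, so ffmpeg is needed."""
    return extension_format(path) not in soundfile_formats


def libsndfile_formats():
    """Format names supported by the installed libsndfile build."""
    import soundfile as sf  # imported lazily; thin wrapper around libsndfile

    return set(sf.available_formats())
```

With this dispatch, FLAC files go through libsndfile directly and only formats it cannot read pay the ffmpeg conversion cost.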

@blisc (Collaborator) commented Jan 13, 2021

OK, that's a problem. I'll go ahead and change the logic a bit.

@khursani8 (Contributor, Author) commented Jan 13, 2021

Hi @blisc,
I created a hotfix for this bug.

@blisc (Collaborator) commented Jan 13, 2021

@yoks, can you try #1630 and see if performance returns?

@yoks commented Jan 13, 2021

Will test as soon as we have a free GPU available on the DGX (most likely tomorrow).

@yoks commented Jan 13, 2021

Yeah, it looks like it works as before (maybe a little slower, but that could be due to task/node load).

@rbracco (Contributor) commented Jan 22, 2021

Is there a reason this isn't using torchaudio? It supports multiple formats in a single library.

@khursani8 (Contributor, Author) commented

I don't have experience using torchaudio, but after taking a look at their repo I think torchaudio would be better for this: it supports multiple audio formats, including Kaldi formats. Maybe someone who knows torchaudio better can contribute this.

@mthrok commented Jan 22, 2021

> I don’t have experience using torchaudio, after take a look at their repo I think torchaudio will be better for this, support multiple audio format including kaldi format. Maybe someone know how to use torchaudio better can contribute to this

Hi,

torchaudio maintainer here. I commented in #1549, but the current release of torchaudio does not support file-like objects. I am adding that support right now, and it will be included in the next release (early March).
Are there any other requirements that NeMo has?

@khursani8 (Contributor, Author) commented

Hmm, I'm actually not affiliated with NeMo, so I don't know the requirements either. I think @blisc can answer this question.

@rbracco (Contributor) commented Jan 28, 2021

There's a very detailed and helpful response from @titu1994 on why torchaudio isn't used as a dependency.
