Vocal tract length perturbation #139

Closed
wants to merge 3 commits into from

akashrajkn commented May 1, 2022

Implemented VTLP as introduced in http://www.cs.toronto.edu/~hinton/absps/perturb.pdf. Adapted from the NumPy code here: https://github.com/makcedward/nlpaug/blob/master/nlpaug/model/audio/vtlp.py

Additional notes:

  • Only supports single-channel (mono) audio for now
  • At the moment, the entire batch uses a single randomized warp factor
  • Added a sanity test case

Fixes #115

akashrajkn changed the title from "Implemented asteroid-team/torch-audiomentations#115: Vocal tract length perturbation" to "Vocal tract length perturbation" on May 1, 2022
iver56 self-requested a review on May 1, 2022 18:29

iver56 commented May 3, 2022

Thanks for the pull request :) I'll find time to review this

iver56 left a comment

Interesting! I will have another look at this soon, but I've left some initial comments.

  • I also want to add a mono voice recording in the demo, so I can try it on that.

I understand that this transform does not support stereo/multichannel audio yet. Is that right? Would it be hard to add support for that? The other transforms in torch-audiomentations support multichannel audio.
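
One way multichannel support could be approached is to treat every channel as an independent mono signal while sharing one warp factor per example, so that the channels of a recording stay consistent with each other. The following is only a minimal sketch in plain Python, not the code from this pull request: the names `apply_mono_transform_per_channel` and `mono_transform`, the nested-list layout, and the default warp range of 0.9 to 1.1 are all assumptions made for illustration.

```python
import random


def apply_mono_transform_per_channel(batch, mono_transform, low=0.9, high=1.1, rng=None):
    """Apply a mono-only transform to every channel of every example.

    batch: list of examples; each example is a list of channels;
           each channel is a list of float samples (hypothetical layout).
    mono_transform: callable taking (channel, alpha) and returning a new channel.
    low, high: assumed bounds for the random warp factor alpha.
    """
    rng = rng or random.Random()
    output = []
    for example in batch:
        # One warp factor per example, shared by all of its channels.
        alpha = rng.uniform(low, high)
        output.append([mono_transform(channel, alpha) for channel in example])
    return output


if __name__ == "__main__":
    # Stand-in for a real mono VTLP: scales the samples by alpha.
    def dummy_transform(channel, alpha):
        return [alpha * x for x in channel]

    stereo_batch = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
    print(apply_mono_transform_per_channel(stereo_batch, dummy_transform))
```

With tensors, the same idea would typically be expressed by folding the channel dimension into the batch dimension before calling the mono transform and unfolding it afterwards.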

torch_audiomentations/augmentations/vtlp.py (two outdated review comments)

iver56 commented May 6, 2022

I added VTLP to the demo script and added a speech example there:
akashrajkn#1

I listened to the outputs, and the effect very much resembles a band-stop filter. Is that what you intended? I've attached the sounds in this zip:

vtlp_demo_output.zip

And I have another question: is this a different technique? https://www.isca-speech.org/archive/pdfs/interspeech_2019/kim19_interspeech.pdf


iver56 commented May 6, 2022

[Animated GIF: vtlp_spectrograms]

class VTLP(BaseWaveformTransform):
    """
    Apply Vocal Tract Length Perturbation as defined in
    http://www.cs.toronto.edu/~hinton/absps/perturb.pdf

It would be nice if you could explain a bit more about what this transform actually does, in an "explain it like I'm five" fashion, so that the average developer (including me) can understand it.


iver56 commented May 6, 2022

I'm trying to learn what Vocal Tract Length Perturbation is. Reading the paper, I get the idea that it's about frequency warping:

For VTLP, we generate a random warp factor α for each utterance, and warp the frequency axis, such that a frequency f is mapped to a new frequency f′

But I don't see frequencies getting re-mapped in the spectrogram gif I posted above
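
For reference, the piecewise-linear mapping described in the paper can be sketched as below. This is a hedged illustration written from my reading of the paper, not the code under review: the function name `vtlp_warp_frequency` is hypothetical, and the default boundary frequency `f_hi=4800.0` Hz is an assumption. Below the boundary the frequency is scaled by α; above it, a straight line is used so that the Nyquist frequency maps to itself.

```python
def vtlp_warp_frequency(f, alpha, sample_rate, f_hi=4800.0):
    """Map a frequency f (Hz) to its warped frequency f' for warp factor alpha.

    f_hi is an assumed boundary frequency meant to cover the significant
    formants; sample_rate / 2 (Nyquist) is always mapped to itself.
    """
    nyquist = sample_rate / 2.0
    scaled_f_hi = f_hi * min(alpha, 1.0)
    boundary = scaled_f_hi / alpha  # input frequency where the two segments meet
    if f <= boundary:
        # Lower segment: plain scaling of the frequency axis.
        return f * alpha
    # Upper segment: line from (boundary, scaled_f_hi) to (nyquist, nyquist).
    slope = (nyquist - scaled_f_hi) / (nyquist - boundary)
    return nyquist - slope * (nyquist - f)


if __name__ == "__main__":
    for alpha in (0.9, 1.0, 1.1):
        warped = [vtlp_warp_frequency(f, alpha, 16000) for f in (500, 2000, 6000, 8000)]
        print(alpha, warped)
```

If the implementation follows this mapping, spectral peaks such as formants should visibly shift up or down along the frequency axis, rather than a band being attenuated.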

I also found this video online:

https://www.youtube.com/watch?v=vCDnfUM6gn8

Could you try to enlighten me? Do you have some reference examples of what VTLP should sound like? Am I missing something, or is there a bug in your implementation?


akashrajkn commented May 6, 2022

I understand that this transform does not support stereo/multichannel audio yet. Is that right?

  • That is right. I will modify it to support stereo audio

Is that the case here, or is it enough to just use the sample_rate you get from forward calls?

  • You are right, it is not required in the init function. I'll change it accordingly

And I have another question: is this a different technique? https://www.isca-speech.org/archive/pdfs/interspeech_2019/kim19_interspeech.pdf

I wasn't aware of this paper. I will read it and get back to you.

Do you have some reference examples of what VTLP should sound like?

  • I postponed adding tests in this PR. I will add some reference examples

Am I missing something, or is there a bug in your implementation?

There may be an issue with my implementation. I will check and comment here (probably over the weekend).


iver56 commented Feb 7, 2024

Closing for inactivity. Feel free to suggest a reopen later if the work gets picked up again

iver56 closed this Feb 7, 2024