Vocal tract length perturbation #139

akashrajkn · 2022-05-01T16:47:55Z

Implemented VTLP as introduced in http://www.cs.toronto.edu/~hinton/absps/perturb.pdf. Adapted from the numpy code here: https://github.com/makcedward/nlpaug/blob/master/nlpaug/model/audio/vtlp.py

Additional notes:

Only supports single channel for now
At the moment, the entire batch will use one (randomized) warp factor
Added a sanity test case

Fixes #115
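To make the note about the shared warp factor concrete, here is a minimal sketch of the two sampling strategies: one warp factor for the whole batch (what the notes above describe) versus one per example. The function name `sample_warp_factors`, its parameters, and the default range of 0.9 to 1.1 (the range used in the linked paper) are illustrative assumptions, not code from this PR.

```python
import random


def sample_warp_factors(batch_size, low=0.9, high=1.1, per_example=True, rng=None):
    """Return a list with one warp factor per example in the batch.

    Hypothetical helper for illustration only; it is not part of this PR.
    """
    rng = rng or random.Random()
    if per_example:
        # Independent draw for every example in the batch.
        return [rng.uniform(low, high) for _ in range(batch_size)]
    # Single draw shared by the whole batch.
    alpha = rng.uniform(low, high)
    return [alpha] * batch_size


shared = sample_warp_factors(4, per_example=False, rng=random.Random(0))
independent = sample_warp_factors(4, per_example=True, rng=random.Random(0))
```

Per-example sampling gives more variety within a batch, at the cost of needing a warp that can be applied with a different factor per item.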

iver56 · 2022-05-03T15:27:14Z

Thanks for the pull request :) I'll find time to review this

iver56

Interesting! I will have another look at this soon, but I've left some initial comments.

I also want to add a mono voice recording in the demo, so I can try it on that.

I understand that this transform does not support stereo/multichannel audio yet. Is that right? Would it be hard to add support for that? The other transforms in torch-audiomentations support multichannel audio.

torch_audiomentations/augmentations/vtlp.py

iver56 · 2022-05-06T07:07:40Z

I added VTLP to the demo script and added a speech example there:
akashrajkn#1

I listened to the outputs, and it very much resembles a band stop filter. Is that what you intended? I've attached the sounds in this zip:

vtlp_demo_output.zip

And I have another question: is this a different technique? https://www.isca-speech.org/archive/pdfs/interspeech_2019/kim19_interspeech.pdf

iver56 · 2022-05-06T07:15:24Z

iver56 · 2022-05-06T07:41:37Z

torch_audiomentations/augmentations/vtlp.py

+class VTLP(BaseWaveformTransform):
+    """
+    Apply Vocal Tract Length Perturbation as defined in 
+    http://www.cs.toronto.edu/~hinton/absps/perturb.pdf


It would be nice if you could explain a bit more what this transform actually does, in an "explain it like I'm five" fashion, so that the average developer (including me) can understand

iver56 · 2022-05-06T08:07:23Z

I'm trying to learn about what Vocal Tract Length Perturbation is. Reading the paper, I get the idea that it's about frequency warping

For VTLP, we generate a random warp factor α for each utterance, and warp the frequency axis, such that a frequency f is mapped to a new frequency f′

But I don't see frequencies getting re-mapped in the spectrogram gif I posted above
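For reference, the remapping the paper describes is a piecewise-linear warp of the frequency axis. The sketch below is a plain-Python reading of that formula, not the code from this PR. The function name `vtlp_warp` is hypothetical, and the default boundary frequency `f_hi=4800.0` is taken from the paper and the nlpaug code linked in the PR description. It assumes `f_hi` is below the Nyquist frequency.

```python
def vtlp_warp(f, alpha, sample_rate, f_hi=4800.0):
    """Map a frequency f (Hz) to its warped frequency f' (Hz).

    Below the boundary, frequencies are scaled by alpha. Above it, a second
    linear segment keeps the Nyquist frequency fixed, so the warped axis
    still spans 0 .. sample_rate / 2.
    """
    nyquist = sample_rate / 2.0
    scale = f_hi * min(alpha, 1.0)
    boundary = scale / alpha  # frequency where the two segments meet
    if f <= boundary:
        return f * alpha
    slope = (nyquist - scale) / (nyquist - boundary)
    return nyquist - slope * (nyquist - f)


# Print how a few frequencies move for alpha = 1.1 at 16 kHz.
for freq in (500.0, 1000.0, 4000.0, 6000.0, 8000.0):
    print(freq, "->", round(vtlp_warp(freq, 1.1, 16000), 1))
```

With `alpha > 1` the formants move up and with `alpha < 1` they move down, so a working implementation should show spectral peaks shifting along the frequency axis rather than a band being attenuated.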

I also found this video online:

https://www.youtube.com/watch?v=vCDnfUM6gn8

Could you try to enlighten me? Do you have some reference examples of what VTLP should sound like? Am I missing something, or is there a bug in your implementation?

akashrajkn · 2022-05-06T13:19:05Z

> I understand that this transform does not support stereo/multichannel audio yet. Is that right?

That is right. I will modify it to support stereo audio

> Is that the case here, or is it enough to just use the sample_rate you get from forward calls?

You are right, it is not required in the init function. I'll change it accordingly

> And I have another question: is this a different technique? https://www.isca-speech.org/archive/pdfs/interspeech_2019/kim19_interspeech.pdf

I wasn't aware of this paper, I will read it and get back to you

> Do you have some reference examples of what VTLP should sound like?

I postponed adding tests in this PR. I will add some reference examples

> Am I missing something, or is there a bug in your implementation?

I think maybe there is an issue with my implementation. I will check it out and comment here (probably on the weekend)

iver56 · 2024-02-07T10:25:16Z

Closing for inactivity. Feel free to suggest a reopen later if the work gets picked up again

Implemented asteroid-team#115: Vocal tract length perturbation

bb5a083

akashrajkn changed the title ~~Implemented asteroid-team/torch-audiomentations#115: Vocal tract length perturbation~~ Vocal tract length perturbation May 1, 2022

iver56 self-requested a review May 1, 2022 18:29

iver56 requested changes May 4, 2022

iver56 reviewed May 6, 2022

akashrajkn added 2 commits May 10, 2022 19:22

Renamed VTLP to VocalTractLengthPerturbation

cbc3b0b

Removed sample_rate from init

05c3f41

iver56 force-pushed the master branch from 57fd377 to 643f320 Compare June 29, 2022 09:15

iver56 closed this Feb 7, 2024
