from mel_spectrogram to wav again #10

kimchi88 · 2019-06-11T07:00:09Z

Hi,
Do you have any suggestions on how to rebuild the audio file after augmentation?

KnowBetterHelps · 2019-07-02T12:05:28Z

I have the same question. In my case, computing librosa.feature.melspectrogram and then librosa.feature.mfcc does not give the same result as Kaldi's processing.

BTW, did you find a way to rebuild the audio?

kimchi88 · 2019-07-03T03:10:01Z

Hi,
Nope, still nothing. I've read some other posts and it doesn't seem trivial. There is a thread in the Kaldi GitHub repository where the developers discuss their findings after applying SpecAugment to existing Kaldi recipes. Hope it helps!

KnowBetterHelps · 2019-07-03T04:47:57Z

Thank you for your kind reply.

I will look for it.

dkakaie · 2019-07-03T08:02:44Z

I spent a few hours on this yesterday. This is what I finally settled on, at least for now. Sorry for the delay in sharing it.
The new version of librosa seems to include the functionality we need here, see #844. However, it has not been released yet, so you have to install from source. Version 0.7.0rc1 is what I used.
You could do

recov = librosa.feature.inverse.mel_to_audio(M=warped_masked_spectrogram,
                                             hop_length=128, sr=sampling_rate)

and use this function to save it

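import numpy as np
import scipy.io.wavfile

def save_wav(wav, path, sr=16000):
    # Peak-normalise to the int16 range without modifying the caller's array
    scaled = wav * (32767 / max(0.01, np.max(np.abs(wav))))
    scipy.io.wavfile.write(path, sr, scaled.astype(np.int16))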
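
One caveat when saving: the rate given to the writer has to be the rate of the reconstructed signal, and 16000 is only correct if the audio was loaded at 16 kHz (librosa.load resamples to 22050 Hz unless sr is passed). Below is a minimal sketch of a variant that takes the rate as an explicit argument and leaves the input array untouched. It uses the standard-library wave module instead of scipy, and the name save_wav_pcm16 is made up for illustration:

```python
import wave

import numpy as np


def save_wav_pcm16(wav, path, sample_rate):
    """Peak-normalise a mono float signal and write it as 16-bit PCM."""
    wav = np.asarray(wav, dtype=np.float64)
    # Same scaling as the helper above; the 0.01 floor guards against
    # dividing by (almost) zero on silent clips.
    peak = max(0.01, float(np.max(np.abs(wav))))
    pcm = (wav * (32767 / peak)).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
```

It would be called as save_wav_pcm16(recov, "out.wav", sampling_rate), with sampling_rate being whatever librosa.load returned.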

kimchi88 · 2019-07-05T03:28:35Z

Hi Roxima,
Thanks for sharing! I'll give it a try :)

dkakaie · 2019-07-05T09:26:48Z

@kimchi88 Great. Looking forward to your results.

kimchi88 · 2019-07-05T16:02:12Z

Confirmed! It works perfectly. The next step will be to use the augmented audio to improve ASR. Thanks for the help!

darisettysuneel · 2019-07-23T10:39:48Z

Hi @roxima / @kimchi88,

Can you please confirm how long the conversion from mel-spectrogram to wav takes, and on what hardware configuration? For me it takes 2 to 3 minutes on a CPU with 6 cores and 8 GB of RAM.

dkakaie · 2019-07-23T21:26:04Z

@darisettysuneel As far as I can remember, it finishes very quickly. What took time was the augmentation, not saving the resulting audio. I'll try to report back to you with a simple benchmark.

darisettysuneel · 2019-08-10T11:20:57Z

Hi @roxima

Do you have any statistics you could share?

Lomax314 · 2019-08-17T05:03:19Z

@roxima Hi, converting the mel_spectrogram to wav takes me more time than augmenting it. Do you have a better solution? Thanks

dkakaie · 2019-08-20T13:46:52Z

@darisettysuneel @Lomax314 So sorry for the late reply, I was as busy as a bee.
I'm on Windows 10, x64, i3-6100U, 8 GB DDR4 RAM, 128 GB SSD storage.
This is the result for the default sample audio in the repository:

Loaded audio in  0:00:00.509608
Tensorflow finished in  0:00:02.145270
librosa reconstructed audio in  0:00:25.873811
Audio saved in  0:00:00.005016
PyTorch finished in  0:00:00.050832
librosa reconstructed audio in  0:00:29.923980
Audio saved in  0:00:00.004015

As can be seen, reconstructing the audio takes much longer than the augmentations. However, I noticed that running this script consumes more than 8 GB of free space on my OS drive, so maybe there is an I/O bottleneck?! While it runs I am left with only 141 MB free.
No, I have not found a better solution. Maybe librosa is not yet fully optimized for this stage.

dkakaie · 2019-08-20T13:55:23Z

The previous run used librosa 0.7.0rc1; this one is with the latest 0.7.0 release:

Loaded audio in  0:00:00.512629
Tensorflow finished in  0:00:02.180432
librosa reconstructed audio in  0:00:20.358577
Audio saved in  0:00:00.006011
PyTorch finished in  0:00:00.045847
librosa reconstructed audio in  0:00:43.839765
Audio saved in  0:00:00.004988

One more run:

Loaded audio in  0:00:00.505621
Tensorflow finished in  0:00:02.230296
librosa reconstructed audio in  0:00:32.860149
Audio saved in  0:00:00.006980
PyTorch finished in  0:00:00.052857
librosa reconstructed audio in  0:00:46.224405
Audio saved in  0:00:00.005985
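
For anyone who wants to collect comparable numbers, here is a minimal sketch of a timing helper that prints durations in the same h:mm:ss.ffffff form as the logs above. The name timed and the stand-in workload are assumptions for illustration; this is not the script that produced those figures.

```python
import time
from datetime import timedelta


def timed(label, func, *args, **kwargs):
    """Run func, print how long it took, and return (result, elapsed)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = timedelta(seconds=time.perf_counter() - start)
    print(f"{label} in  {elapsed}")
    return result, elapsed


# Stand-in workload; replace with the load / augment / reconstruct calls.
total, elapsed = timed("Summed numbers", sum, range(1_000_000))
```

Wrapping each stage separately (loading, augmentation, reconstruction, saving) makes it easy to see which one dominates.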

darisettysuneel · 2019-08-20T14:05:54Z

@roxima Thanks for sharing the statistics! May I know the length of the audio file used for these results?

dkakaie · 2019-08-20T14:09:04Z

@darisettysuneel You're welcome. Exactly 2 s 970 ms.

darisettysuneel · 2019-08-20T14:18:34Z

@roxima For me it takes ~1.5 minutes for 8-10 seconds of audio. I need to take a look at the input data I pass to the reconstruction function. Once again, thanks.

Lomax314 · 2019-08-20T14:27:33Z

@roxima Many thanks for your reply! The librosa function takes a long time for me, so I hope I can find another solution. Once again, thanks.

AASHISHAG · 2019-11-27T18:18:50Z

@darisettysuneel @Lomax314 : Did you find a better method to achieve this?

Lomax314 · 2019-11-28T07:57:29Z

@AASHISHAG I'm sorry, but the answer is no. However, this method seems to be implemented in the Kaldi repository.

AASHISHAG · 2019-11-28T17:59:02Z

@Lomax314 : Thank you for the reply. I will have a look.

If you still have the setup running, could you please tell me your tensorflow, tensorflow_addons and gcc versions? I am trying to run the test script given in the README, but I get errors on the line "from specAugment import spec_augment_tensorflow". This is the code I am using:

import glob
import scipy.io.wavfile  # a bare "import scipy" does not reliably load the io.wavfile submodule
import librosa
import numpy as np
from specAugment import spec_augment_tensorflow

mozilla_augmented = '/mozilla_augmented/clips/*.wav'

for audio_path in glob.iglob(mozilla_augmented):
    print(audio_path)
    # librosa.load resamples to 22050 Hz unless sr is given
    audio, sampling_rate = librosa.load(audio_path)
    mel_spectrogram = librosa.feature.melspectrogram(y=audio,
                                                     sr=sampling_rate,
                                                     n_mels=256,
                                                     hop_length=128,
                                                     fmax=8000)
    warped_masked_spectrogram = spec_augment_tensorflow.spec_augment(mel_spectrogram=mel_spectrogram)
    # Pass the same fmax as above so the inverse uses a matching mel filter bank
    wav = librosa.feature.inverse.mel_to_audio(M=warped_masked_spectrogram, hop_length=128,
                                               sr=sampling_rate, fmax=8000)
    wav *= 32767 / max(0.01, np.max(np.abs(wav)))
    # Write at the rate the audio was loaded at (note: this overwrites the input file)
    scipy.io.wavfile.write(audio_path, sampling_rate, wav.astype(np.int16))

junaedifahmi · 2019-12-12T06:37:05Z

> @roxima For me it takes ~1.5 minutes for 8-10 seconds of audio. I need to take a look at the input data I pass to the reconstruction function. Once again, thanks.

It takes me 10 minutes for 10 seconds of audio, on a machine with 88 cores and 500 GB of memory. I use the code from the last comment to convert to audio. Do you have a better solution? Maybe with torchaudio? Thanks.

AASHISHAG · 2019-12-12T08:29:47Z

@juunnn : Could you please confirm your tensorflow and gcc versions? I am facing some dependency issues, which I think have to do with tensorflow and gcc.
It would be best if you could post the output of the following command: pip3 list

This will list all the installed versions.

junaedifahmi · 2019-12-16T03:18:17Z

I still have problems with the TensorFlow dependencies, which is why I use the PyTorch version instead. It works and does not take long to run, but for some audio files it says "output have no finite value everywhere" while converting back to audio. I don't know what to do.
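
That message suggests NaN or infinite values somewhere between the augmented spectrogram and the reconstructed signal, though nothing in this thread establishes where they come from. Here is a small sketch for checking the spectrogram before inversion and, as a blunt workaround, zeroing the invalid bins. The function name is hypothetical, and zero-filling is an assumption rather than a fix for the root cause; negative bins are treated as invalid too, since a power spectrogram cannot be negative.

```python
import numpy as np


def sanitize_mel(mel_spec):
    """Return (clean_copy, n_bad): non-finite or negative bins are set to 0."""
    mel_spec = np.asarray(mel_spec, dtype=np.float64)
    bad = ~np.isfinite(mel_spec) | (mel_spec < 0)
    clean = np.where(bad, 0.0, mel_spec)
    return clean, int(bad.sum())


# Tiny made-up spectrogram with one NaN, one inf and one negative bin.
mel = np.array([[1.0, np.nan], [np.inf, -2.0]])
clean, n_bad = sanitize_mel(mel)
```

If n_bad is non-zero for the files that fail, the problem is already present in the augmentation output rather than in the inversion step.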

AASHISHAG · 2019-12-16T08:37:05Z

@juunnn : Could you please share the code you wrote with the PyTorch dependencies? I don't have experience with either PyTorch or TensorFlow, so it would be really helpful.

I am using the code below and facing dependency issues.

import glob
import scipy.io.wavfile  # a bare "import scipy" does not reliably load the io.wavfile submodule
import librosa
import numpy as np
from specAugment import spec_augment_tensorflow

mozilla_augmented = '/mozilla_augmented/clips/*.wav'

for audio_path in glob.iglob(mozilla_augmented):
    print(audio_path)
    # librosa.load resamples to 22050 Hz unless sr is given
    audio, sampling_rate = librosa.load(audio_path)
    mel_spectrogram = librosa.feature.melspectrogram(y=audio,
                                                     sr=sampling_rate,
                                                     n_mels=256,
                                                     hop_length=128,
                                                     fmax=8000)
    warped_masked_spectrogram = spec_augment_tensorflow.spec_augment(mel_spectrogram=mel_spectrogram)
    # Pass the same fmax as above so the inverse uses a matching mel filter bank
    wav = librosa.feature.inverse.mel_to_audio(M=warped_masked_spectrogram, hop_length=128,
                                               sr=sampling_rate, fmax=8000)
    wav *= 32767 / max(0.01, np.max(np.abs(wav)))
    # Write at the rate the audio was loaded at (note: this overwrites the input file)
    scipy.io.wavfile.write(audio_path, sampling_rate, wav.astype(np.int16))

ma7555 · 2020-08-15T19:18:10Z

It does indeed take a lot of time to convert from a mel_spectrogram to audio. If someone comes across a faster way than the librosa built-in, please share.

For 1 minute of audio with 128 mels:

CPU times: user 8min 32s, sys: 5min 11s, total: 13min 43s
Wall time: 7min 14s
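
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: according to the librosa documentation, mel_to_audio first maps the mel bands back to linear-frequency bins with a non-negative least-squares solve and then runs Griffin-Lim, and I suspect the least-squares step is where much of the time goes. Replacing it with a pseudo-inverse of the mel filter bank is far cheaper, at the cost of a rougher approximation. The function names below are made up, and mel_kwargs must repeat whatever fmin/fmax were used to build the spectrogram.

```python
import numpy as np


def mel_to_linear_pinv(mel_spec, mel_basis, power=2.0):
    """Approximate an STFT magnitude from a mel spectrogram via pseudo-inverse.

    mel_spec:  (n_mels, n_frames) mel spectrogram (power=2.0: power spectrogram)
    mel_basis: (n_mels, 1 + n_fft // 2) filter bank that produced it
    """
    inverse_basis = np.linalg.pinv(mel_basis)
    # The pseudo-inverse can produce negative values; clip them to zero.
    linear = np.maximum(inverse_basis @ mel_spec, 0.0)
    return linear ** (1.0 / power)


def mel_to_audio_pinv(mel_spec, sr, n_fft=2048, hop_length=128, n_iter=32,
                      **mel_kwargs):
    """Pseudo-inverse followed by Griffin-Lim phase estimation."""
    import librosa  # imported here so the helper above needs only NumPy

    mel_basis = librosa.filters.mel(sr=sr, n_fft=n_fft,
                                    n_mels=mel_spec.shape[0], **mel_kwargs)
    magnitude = mel_to_linear_pinv(mel_spec, mel_basis)
    return librosa.griffinlim(magnitude, n_iter=n_iter, hop_length=hop_length)
```

Called as mel_to_audio_pinv(warped_masked_spectrogram, sr=sampling_rate, fmax=8000), it would take the place of librosa.feature.inverse.mel_to_audio in the loop posted earlier in this thread. Lowering n_iter trades reconstruction quality for further speed.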

neel04 · 2021-04-20T18:12:45Z

Any new updates for possibly faster implementations?
