Will this work with 44100hz audio? #23

dillfrescott · 2023-08-25T03:07:08Z

Just wondering, because this project seems great but a 16000 Hz sample rate is a bit too low for my needs.

RF5 · 2023-08-25T19:58:03Z

TL;DR: the kNN-VC method does work for any sample rate, but the pretrained models we provide are only trained for 16kHz.

Slightly longer version: the current components of the pretrained model that assume 16kHz are (1) the WavLM encoder and (2) the HiFiGAN vocoder. Both were trained on 16kHz audio and won't work well if you feed 44.1kHz audio straight into the existing 16kHz models. The good news is that both components (HiFiGAN and WavLM) can fairly easily be adapted to 44.1kHz by changing the audio and mel-spectrogram formats, so to get a 44.1kHz kNN-VC you would just need to train them on 44.1kHz audio.

Training WavLM on 44.1kHz audio is tough if your hardware is limited, so as a shortcut you can train only the HiFiGAN to vocode 44.1kHz audio. This will probably work reasonably well, but not as well as training both WavLM and HiFiGAN on 44.1kHz. To train the HiFiGAN to vocode to 44.1kHz, the process is roughly: (1) get a 44.1kHz dataset, (2) downsample the dataset to 16kHz and compute WavLM features for the 16kHz data, (3) modify the hifigan config here for the 44.1kHz audio format, (4) run the hifigan training script in this repo (as described in the readme) on the WavLM features and the 44.1kHz audio. In the last step you might need to slightly modify the hifigan training code so that the WavLM features can come from audio at a different sample rate than the hifigan output.
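To make the last point concrete, here is a minimal sketch of the bookkeeping needed when the features come from 16kHz audio but the training target is 44.1kHz audio. It assumes WavLM emits one feature frame per 320 input samples (20 ms at 16kHz); the names `samples_per_frame` and `crop_pair` are hypothetical and are not part of this repo's training code.

```python
FEATURE_SR = 16_000   # sample rate of the audio fed to WavLM
FEATURE_HOP = 320     # assumed WavLM stride: one frame per 320 samples (20 ms)
TARGET_SR = 44_100    # sample rate the vocoder should output


def samples_per_frame(target_sr: int = TARGET_SR) -> int:
    """Number of target-rate samples covered by one WavLM frame."""
    numerator = FEATURE_HOP * target_sr
    if numerator % FEATURE_SR:
        raise ValueError("target rate gives a fractional number of samples per frame")
    return numerator // FEATURE_SR


def crop_pair(n_frames: int, n_target_samples: int,
              start_frame: int, segment_frames: int):
    """Return (frame_slice, sample_slice) for one aligned training segment.

    n_frames: number of WavLM frames computed from the 16 kHz copy.
    n_target_samples: length of the original 44.1 kHz waveform.
    """
    hop = samples_per_frame()
    # Only use frames that are fully backed by target-rate audio.
    usable_frames = min(n_frames, n_target_samples // hop)
    end_frame = start_frame + segment_frames
    if start_frame < 0 or end_frame > usable_frames:
        raise ValueError("segment does not fit inside the utterance")
    return slice(start_frame, end_frame), slice(start_frame * hop, end_frame * hop)
```

For example, `crop_pair(100, 90_000, 10, 32)` selects frames 10 to 41 and the matching span of the 44.1kHz waveform, so every feature frame lines up with 882 output samples. Under the same assumption, the product of the vocoder's upsampling factors would also need to equal 882 rather than 320.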

Hope that helps!

dillfrescott · 2023-08-26T06:42:35Z

Thank you so much for the detailed response! <3 I really appreciate it!

EmreOzkose · 2023-12-17T19:21:47Z

Hi @RF5, should I change hop_size or win_size to train with a different sampling rate? Or is it enough to change the 'sampling_rate' parameter?

RF5 · 2023-12-17T20:22:21Z

It might be a good idea to change both hop size and win size to keep the number of milliseconds in each frame constant, but to simply get it working, changing the sampling_rate param should be sufficient.
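As a rough illustration of keeping the frame duration constant, the sketch below rescales hop and window sizes by the ratio of the two sample rates. The helper name `scale_frame_params` is hypothetical, and the starting values of 320 and 1024 are placeholders chosen for the example, so check them against the actual config before using the result.

```python
def scale_frame_params(hop_size: int, win_size: int,
                       old_sr: int, new_sr: int) -> tuple[int, int]:
    """Rescale frame sizes (in samples) so their duration in ms is unchanged."""
    def scale(n_samples: int) -> int:
        # duration = n_samples / old_sr seconds; convert back at the new rate
        return round(n_samples * new_sr / old_sr)

    return scale(hop_size), scale(win_size)


# Placeholder 16 kHz values (assumed, not read from the repo's config).
new_hop, new_win = scale_frame_params(320, 1024, old_sr=16_000, new_sr=44_100)
print(new_hop, new_win)
```

With these placeholder inputs, a 20 ms hop stays 20 ms and a 64 ms window stays 64 ms after rounding to whole samples.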

dillfrescott closed this as completed Aug 26, 2023

youssefabdelm mentioned this issue Nov 22, 2023

How to plug-in new finetuned HiFiGAN? #34

Closed
