Will this work with 44100hz audio? #23

Closed
dillfrescott opened this issue Aug 25, 2023 · 4 comments

Comments

@dillfrescott

Just wondering, because this project seems great but 16000 Hz is too low a sample rate for my needs.

@RF5
Collaborator

RF5 commented Aug 25, 2023

Hi @dillfrescott

TL;DR: the kNN-VC method does work for any sample rate, but the pretrained models we provide are only trained for 16kHz.

Slightly longer version: the components of the pretrained model that currently assume 16kHz are (1) the WavLM encoder, and (2) the HiFiGAN vocoder. Both were trained on 16kHz audio and won't work well if you just feed 44.1kHz audio into the existing 16kHz models. The good news is that both components (HiFiGAN and WavLM) can fairly easily be adapted to 44.1kHz by changing the audio and mel-spectrogram formats, so to get a 44.1kHz kNN-VC you would just need to train them on 44.1kHz audio.

Training WavLM on 44.1kHz is a bit tough if your hardware is limited, so as a shortcut you can train only the HiFiGAN to vocode 44.1kHz audio. This will probably work reasonably well, but not as well as training both WavLM and HiFiGAN on 44.1kHz. To train the HiFiGAN to vocode to 44.1kHz, the process would be roughly: (1) get a 44.1kHz dataset, (2) downsample the dataset to 16kHz and compute WavLM features for the 16kHz data, (3) modify the hifigan config here for the 44.1kHz audio format, (4) run the hifigan training script in this repo (as described in the readme) on the WavLM features and the 44.1kHz audio. In the last step you might need to slightly modify the hifigan training code so that the WavLM features can come from audio at a different sample rate than the hifigan output.
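
As a rough sketch of the bookkeeping behind step (4), and not code from this repo: the features come from 16kHz audio while the vocoder outputs 44.1kHz, so each feature frame has to map to a whole number of output samples. The snippet assumes WavLM produces one frame per 320 input samples (20 ms at 16kHz) and that the vocoder is a HiFi-GAN-style generator whose per-layer upsampling factors multiply to the hop size. The function names and the example upsample factors are hypothetical.

```python
import math
from fractions import Fraction

WAVLM_SAMPLE_RATE = 16_000  # the pretrained WavLM expects 16 kHz input
WAVLM_FRAME_STRIDE = 320    # ASSUMPTION: one feature frame per 320 input samples (20 ms)


def vocoder_hop_size(target_sample_rate: int) -> int:
    """Output samples the vocoder must generate per WavLM feature frame."""
    hop = Fraction(target_sample_rate * WAVLM_FRAME_STRIDE, WAVLM_SAMPLE_RATE)
    if hop.denominator != 1:
        raise ValueError(
            f"{target_sample_rate} Hz gives a non-integer hop of {float(hop)} samples"
        )
    return hop.numerator


def upsample_rates_match(upsample_rates: list[int], hop_size: int) -> bool:
    """Check that the per-layer upsampling factors multiply to the hop size."""
    return math.prod(upsample_rates) == hop_size


def audio_slice_for_frames(start_frame: int, num_frames: int, hop_size: int) -> tuple[int, int]:
    """Sample range of the target-rate audio that lines up with a run of feature frames."""
    start = start_frame * hop_size
    return start, start + num_frames * hop_size


hop = vocoder_hop_size(44_100)
print(hop)                                          # 882
print(upsample_rates_match([7, 7, 3, 3, 2], hop))   # True (hypothetical factors)
print(audio_slice_for_frames(10, 32, hop))          # (8820, 37044)
```

Under these assumptions a training segment of 32 feature frames pairs with 32 × 882 samples of 44.1kHz audio, which is the kind of indexing change the training code would likely need.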

Hope that helps!

@dillfrescott
Author

Thank you so much for the detailed response! <3 I really appreciate it!

@EmreOzkose

Hi @RF5, should I change hop_size or win_size to train with a different sampling rate? Or is it enough to change the 'sampling_rate' parameter?

@RF5
Collaborator

RF5 commented Dec 17, 2023

It might be a good idea to change both hop size and win size so that the number of milliseconds in each frame stays constant, but to just get it working, changing only the sampling_rate param should be sufficient.
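
A minimal sketch of what keeping the frame duration constant could look like. The key names `sampling_rate`, `hop_size` and `win_size` are the ones mentioned in this thread; the starting values are placeholders chosen for illustration and are not necessarily the values in this repo's config. `rescale_frame_params` and `frame_ms` are hypothetical helpers.

```python
def rescale_frame_params(config: dict, new_sampling_rate: int) -> dict:
    """Scale hop and window sizes so each frame spans the same number of milliseconds."""
    old_rate = config["sampling_rate"]
    scaled = dict(config)  # leave the original config untouched
    scaled["sampling_rate"] = new_sampling_rate
    for key in ("hop_size", "win_size"):
        # Round to the nearest whole sample; sizes must be integers.
        scaled[key] = round(config[key] * new_sampling_rate / old_rate)
    return scaled


def frame_ms(num_samples: int, sampling_rate: int) -> float:
    """Duration of `num_samples` samples in milliseconds."""
    return 1000.0 * num_samples / sampling_rate


# Placeholder 16 kHz values (ASSUMPTION, for illustration only).
base = {"sampling_rate": 16_000, "hop_size": 320, "win_size": 1024}
scaled = rescale_frame_params(base, 44_100)
print(scaled)  # {'sampling_rate': 44100, 'hop_size': 882, 'win_size': 2822}
print(frame_ms(base["hop_size"], 16_000), frame_ms(scaled["hop_size"], 44_100))  # 20.0 20.0
```

If the config also carries an FFT size, it would likely need to grow to at least the new window length, since an STFT window cannot be longer than the FFT.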


3 participants