How to Compute Speaker Embeddings at 24 kHz? #2552
-
You can find this in the speaker config file:
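As a rough illustration, the rate can also be read from that file in code, which makes a mismatch with the training recipe easy to spot. This is a minimal sketch that assumes the config is JSON with the rate stored under config["audio"]["sample_rate"] (check the layout of your own file). encoder_sample_rate and check_rates are hypothetical helper names, not Coqui TTS functions.

```python
import json


def encoder_sample_rate(config_path: str) -> int:
    """Read the audio sample rate from a speaker-encoder config file.

    Assumption: the file is JSON with the rate under config["audio"]["sample_rate"].
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return int(config["audio"]["sample_rate"])


def check_rates(config_path: str, train_sample_rate: int) -> bool:
    """Return True if the training rate matches the encoder's rate, else warn."""
    enc_sr = encoder_sample_rate(config_path)
    if enc_sr != train_sample_rate:
        print(f"Encoder expects {enc_sr} Hz but training uses {train_sample_rate} Hz; "
              "resample the audio before computing embeddings.")
        return False
    return True
```

With SAMPLE_RATE = 24000 in the recipe and a 16000 Hz encoder config, check_rates would return False, which matches the behaviour described in the question.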
-
The speaker encoder model (which creates the speaker embeddings) was trained on 16 kHz audio, so you will have to resample the dataset to 16 kHz to compute the embeddings. You can still train the TTS model itself at 22 kHz.
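A minimal sketch of that resampling step, assuming each clip is already loaded as a mono float NumPy array. It uses a simple FFT-based (band-limited) resampler purely for illustration; this is not how Coqui TTS resamples internally, and for a whole dataset an established resampler (librosa, torchaudio or sox) is the more usual choice. fft_resample is a hypothetical name.

```python
import numpy as np


def fft_resample(x: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Band-limited resampling of a mono signal via the real FFT.

    The clip is treated as periodic, so small artifacts can appear at the edges.
    """
    n_in = len(x)
    n_out = int(round(n_in * target_sr / orig_sr))
    spectrum = np.fft.rfft(x)
    n_bins = n_out // 2 + 1
    if n_bins <= len(spectrum):
        # Downsampling: discard everything above the new Nyquist frequency.
        spectrum = spectrum[:n_bins]
    else:
        # Upsampling: the newly available high-frequency bins are zero.
        spectrum = np.pad(spectrum, (0, n_bins - len(spectrum)))
    # Rescale so amplitudes are preserved (rfft sums n_in samples, irfft divides by n_out).
    return np.fft.irfft(spectrum, n=n_out) * (n_out / n_in)


# Example: 1 s of a 440 Hz tone at 24 kHz, brought down to the encoder's 16 kHz.
sr_train, sr_encoder = 24000, 16000
t = np.arange(sr_train) / sr_train
tone_24k = np.sin(2 * np.pi * 440 * t)
tone_16k = fft_resample(tone_24k, sr_train, sr_encoder)
```

Going from 24 kHz to 16 kHz drops everything above 8 kHz; since the encoder was trained on 16 kHz audio, it never saw that band anyway, while the TTS training data can stay at its own rate.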
-
If you want to use different sample rates, you have to write your own custom formatter functions for two methods, one for load_tts_samples:
and one for the embedding computation:
But that is not all: you also have to change the mapping that maps each audio file's embedding to its absolute file path. Here is an example of my script, which starts training a German (DE) model on the Thorsten dataset with custom formatters that use different sample rates for training and for embedding computation:
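The script itself is not reproduced here, but the idea can be sketched under stated assumptions: the dataset exists twice on disk (hypothetical folders wavs_24k/ for training and wavs_16k/ for embeddings), the metadata is an LJSpeech-style pipe-separated file, and the embedding file is a dict keyed by audio path. make_formatter and remap_embedding_keys are hypothetical helpers; the sample dict keys mirror what Coqui's built-in formatters return in recent versions, but verify them against your installed version.

```python
import csv
import os
from typing import Callable, Dict, List


def make_formatter(wav_subdir: str, speaker_name: str = "thorsten") -> Callable[..., List[dict]]:
    """Build a hypothetical formatter whose samples point into wav_subdir.

    Assumes an LJSpeech-style metadata file: "<clip_id>|<text>|..." per line.
    """

    def formatter(root_path: str, meta_file: str, **kwargs) -> List[dict]:
        items = []
        meta_path = os.path.join(root_path, meta_file)
        with open(meta_path, encoding="utf-8", newline="") as f:
            for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
                if not row:
                    continue  # skip blank lines
                wav_file = os.path.join(root_path, wav_subdir, row[0] + ".wav")
                items.append({
                    "text": row[1],
                    "audio_file": wav_file,
                    "speaker_name": speaker_name,
                    "root_path": root_path,
                })
        return items

    return formatter


# Same metadata, two copies of the audio at different sample rates.
formatter_train_24k = make_formatter("wavs_24k")  # intended for training samples
formatter_embed_16k = make_formatter("wavs_16k")  # intended for embedding computation


def remap_embedding_keys(mapping: Dict[str, dict], src_dir: str, dst_dir: str) -> Dict[str, dict]:
    """Re-key embeddings computed from src_dir files so they point at dst_dir files."""
    remapped = {}
    for path, value in mapping.items():
        rel = os.path.relpath(path, src_dir)  # e.g. "clip1.wav"
        remapped[os.path.join(dst_dir, rel)] = value
    return remapped
```

The remapping step is what lets an embedding computed from a 16 kHz file be found under the 24 kHz path that the training samples refer to.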
-
I set SAMPLE_RATE = 24000 in recipes/vctk/yourtts/train_yourtts.py.
However, it still loads the AudioProcessor at 16 kHz and computes the speaker embeddings at 16 kHz.
Where is this 16 kHz sample rate hard-coded?
Thanks!