Speaker id argument #7

Open
ilnmtlbnm opened this issue May 15, 2020 · 8 comments

@ilnmtlbnm

ilnmtlbnm commented May 15, 2020

There is a speaker ID argument in inference.py: parser.add_argument('-i', '--id', help='Speaker id', type=int).

Whenever I try to set it to anything other than 0, I get the following error:

Traceback (most recent call last):
  File "inference.py", line 122, in <module>
    args.n_frames, args.sigma, args.seed)
  File "inference.py", line 63, in infer
    speaker_vecs = trainset.get_speaker_id(speaker_id).cuda()
  File "/data/code/flowtron/data.py", line 83, in get_speaker_id
    return torch.LongTensor([self.speaker_ids[int(speaker_id)]])
KeyError: 2
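The last frame of the traceback is a plain dictionary lookup: data.py keeps a speaker_ids table that maps each speaker ID found in the training filelist to a row of the speaker embedding, so any -i value that is not a key raises KeyError. Below is a minimal sketch of that lookup with an explicit check. The helper speaker_index is hypothetical (not part of Flowtron), and the one-entry table {0: 0} is an assumed single-speaker setup, consistent with 0 being the only value that works here.

```python
def speaker_index(speaker_lookup, speaker_id):
    """Return the embedding index for speaker_id, or raise a readable error.

    speaker_lookup is assumed to be a dict shaped like the one in data.py:
    {speaker_id: embedding_index}. Hypothetical helper, for illustration only.
    """
    try:
        return speaker_lookup[int(speaker_id)]
    except KeyError:
        valid = sorted(speaker_lookup)
        raise ValueError(
            f"speaker id {speaker_id} is not in the training filelist; "
            f"valid ids: {valid}"
        ) from None


# Assumed single-speaker table: the only speaker id is 0
single_speaker_lookup = {0: 0}
print(speaker_index(single_speaker_lookup, 0))  # 0
try:
    speaker_index(single_speaker_lookup, 2)
except ValueError as err:
    print(err)  # explains which ids are valid instead of a bare KeyError
```
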
@karkirowle

karkirowle commented May 15, 2020

If you are using the LJS model, that might be expected, as it is a single-speaker model. You could try using the LibriTTS model.

@Quasimondo

Just a note: when using LibriTTS, you will also have to change the n_speakers parameter in config.json to 123:

"model_config": { "n_speakers": 123, "n_speaker_dim": 128, "n_text": 185, "n_text_dim": 512, "n_flows": 2, "n_mel_channels": 80, "n_attn_channels": 640, "n_hidden": 1024, "n_lstm_layers": 2, "mel_encoder_n_hidden": 512, "n_components": 0, "mean_scale": 0.0, "fixed_gaussian": true, "dummy_speaker_embedding": false, "use_gate_layer": true }

@ilnmtlbnm
Author

> If you are using the LJS model, that might be expected, as it is a single-speaker model. You could try using the LibriTTS model.


Of course, thanks @karkirowle!
And thanks @Quasimondo for specifying n_speakers for LibriTTS.

@ilnmtlbnm
Author

ilnmtlbnm commented May 15, 2020

D'oh! Again, I closed this too fast: it still doesn't work with LibriTTS.

python inference.py -c config.json -f models/flowtron_libritts.pt -w models/waveglow_256channels_v4.pt -t "But the machine only creates what humans have taught it to " -i 15 -n 777 -s 0.5

ilnmtlbnm reopened this May 15, 2020
@Quasimondo

Quasimondo commented May 15, 2020

Yeah, I realized that you will also have to adjust the "data_config" section:
"training_files": "filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"

And lastly, you will have to pick a speaker ID that actually exists. The IDs are not numbered consecutively, so you have to look them up in that filelist (they are the numbers at the end of each line).
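A minimal sketch of that lookup, assuming each filelist line has the form audiopath|text|speaker_id, so the speaker ID is the last |-separated field. The function name speaker_ids_from_filelist is hypothetical. The number of distinct IDs it returns should presumably match the n_speakers value in config.json.

```python
def speaker_ids_from_filelist(path):
    """Collect the distinct speaker ids found at the end of each filelist line.

    Assumes lines of the form "audiopath|text|speaker_id".
    Hypothetical helper, for illustration only.
    """
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # The speaker id is whatever follows the last '|'
            ids.add(int(line.rsplit("|", 1)[-1]))
    return sorted(ids)
```

For example, speaker_ids_from_filelist("filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt") returns the IDs that can be passed to -i.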

@ilnmtlbnm
Author

ilnmtlbnm commented May 15, 2020

Thanks again @Quasimondo

For reference, here are the valid IDs for LibriTTS:

40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
 1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
 3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
 4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
 6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
 7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838
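Because the list above is plain whitespace-separated text, it can be pasted into a small check that runs before inference.py, so an invalid -i value is caught up front instead of surfacing as a KeyError. A minimal sketch: the names VALID_LIBRITTS_IDS and check_speaker_id are hypothetical, and the set is copied verbatim from the list above (123 IDs, matching "n_speakers": 123).

```python
# Valid LibriTTS speaker ids, copied from the list above
VALID_LIBRITTS_IDS = {int(s) for s in """
40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838
""".split()}

# Sanity check: one id per speaker embedding
assert len(VALID_LIBRITTS_IDS) == 123


def check_speaker_id(speaker_id, valid_ids=VALID_LIBRITTS_IDS):
    """Return speaker_id if valid, else raise ValueError naming the closest valid id."""
    if speaker_id not in valid_ids:
        nearest = min(valid_ids, key=lambda v: abs(v - speaker_id))
        raise ValueError(
            f"speaker id {speaker_id} is not valid; closest valid id is {nearest}"
        )
    return speaker_id
```

For example, check_speaker_id(15) raises, since 15 (the value used in the command earlier in this thread) is not in the list, while check_speaker_id(40) passes.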

@rafaelvalle
Contributor

Thank you for compiling this list!

@yhgon

yhgon commented May 19, 2020

I added a script that extracts the available speaker IDs (sids). See below:

https://github.com/yhgon/flowtron/blob/master/inference_colab.ipynb

import numpy as np
import pandas as pd
from data import load_filepaths_and_text  # helper from the flowtron repository's data.py

speakerinfo_path = "/content/flowtron/filelists/libritts_speakerinfo.txt"
filelist_path = "/content/flowtron/filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"

# Show a few entries of the speaker info file (Python equivalent of the notebook
# shell command `!cat ... | tail -n +12 | head -n 10`, i.e. lines 12 to 21)
with open(speakerinfo_path) as f:
    print("".join(f.readlines()[11:21]), end="")

def create_speaker_lookup_table(audiopaths_and_text):
    # Map each speaker id (third field of each filelist line) to a contiguous index
    speaker_ids = np.sort(np.unique([x[2] for x in audiopaths_and_text]))
    d = {int(speaker_ids[i]): i for i in range(len(speaker_ids))}
    print("Number of speakers :", len(d))
    return d

audiopaths_and_text = load_filepaths_and_text(filelist_path)
speaker_ids = create_speaker_lookup_table(audiopaths_and_text).keys()
print(speaker_ids)

# The speaker info file is '|'-separated, with ';' comment lines
speakers = pd.read_csv(speakerinfo_path, engine='python', header=None, comment=';',
                       sep=r' *\| *', names=['ID', 'SEX', 'SUBSET', 'MINUTES', 'NAME'])
# Keep the LibriTTS id if Flowtron was trained on that speaker, otherwise mark it with -1
speakers['FLOWTRON_ID'] = speakers['ID'].apply(lambda x: x if x in speaker_ids else -1)

# Speakers with more than 20 minutes of audio; sample(frac=1) shuffles the order
female_speakers = speakers.query("SEX == 'F' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()
male_speakers = speakers.query("SEX == 'M' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()

print("female speakers : ", len(female_speakers), female_speakers)
print("male speakers   : ", len(male_speakers), male_speakers)

Labels: None yet
Projects: None yet
Development: No branches or pull requests
5 participants