
tensorflow-tts-normalize: "UnboundLocalError: local variable 'subdir' referenced before assignment" #19

Closed · ZDisket opened this issue May 31, 2020 · 15 comments
Labels: bug 🐛 Something isn't working · question ❓ Further information is requested

ZDisket (Collaborator) commented May 31, 2020:

I've formatted my dataset like the LJSpeech one in the README so I can skip writing a dataloader for finetuning.
This is my directory
[screenshot of the directory layout]
And this is my metadata.csv. I made it fileid|transcription|transcription because ljspeech.py has text = parts[2], which was giving me index-out-of-range errors with just fileid|trans.
[screenshot of metadata.csv contents]
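For context, the parsing that makes the third column necessary looks roughly like this, based on the text = parts[2] line mentioned above (the surrounding loop is an assumption, not verbatim ljspeech.py code):

    # metadata.csv rows look like: file0816|transcription|transcription
    with open("metadata.csv", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split("|")
            utt_id = parts[0]  # e.g. "file0816"
            text = parts[2]    # third field; with only fileid|trans this
                               # raises IndexError: list index out of range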
And this is a small portion of os.listdir("wavs")

file0816.wav
file0039.wav
file2292.wav
file2433.wav
file0794.wav
file1314.wav
file2486.wav
file0695.wav
file2564.wav

All the preprocessing steps run fine until the normalization one:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-normalize", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/normalize.py", line 115, in main
    np.save(os.path.join(args.outdir, subdir, "norm-feats", f"{utt_id}-norm-feats.npy"),
UnboundLocalError: local variable 'subdir' referenced before assignment

Am I doing something wrong?

dathudeptrai (Collaborator) commented:

@ZDisket Can you check the code in tensorflow_tts/bin/normalize.py? That error means that somehow your utt_id exists in neither train_utt_ids.npy nor valid_utt_ids.npy.
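For anyone hitting the same error, here is a minimal sketch of the failing pattern, reconstructed from the traceback (names and surrounding code are assumptions, not the verbatim normalize.py source):

    import os

    def norm_feat_path(utt_id, train_utt_ids, valid_utt_ids):
        if utt_id in train_utt_ids:
            subdir = "train"
        elif utt_id in valid_utt_ids:
            subdir = "valid"
        # no else branch: an utt_id in neither split leaves `subdir` unassigned,
        # so the next line raises UnboundLocalError
        return os.path.join("dump", subdir, "norm-feats", f"{utt_id}-norm-feats.npy")

    norm_feat_path("file0001-raw", ["file0001"], ["file0002"])  # UnboundLocalError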

ZDisket (Collaborator, Author) commented May 31, 2020:

@dathudeptrai
I've gone over the function, added subdir = "train" as a default to stop it from erroring out, plus some code to dump the id lists:
allids.txt contains the ids dumped inside the for items in tqdm(dataset): loop.
validids.txt and trainids.txt contain the ids of the two lists defined beforehand.
validids.txt
allids.txt
trainids.txt
The ids in allids.txt carry a -raw suffix that seems to be causing the problem; I'll add a step to remove it.

ZDisket (Collaborator, Author) commented May 31, 2020:

I've replaced utt_id = utt_id[0].numpy().decode("utf-8") with utt_id = utt_id[0].numpy().decode("utf-8").replace("-raw","") and everything worked.
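For reference, a sketch of the changed line as it sits inside normalize.py's iteration loop (surrounding context assumed):

    # inside `for items in tqdm(dataset):`
    utt_id = utt_id[0].numpy().decode("utf-8").replace("-raw", "")  # strip "-raw"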

dathudeptrai self-assigned this May 31, 2020
dathudeptrai added the bug 🐛 Something isn't working label May 31, 2020
dathudeptrai (Collaborator) commented:

@ZDisket So it's not my bug, right? BTW, what model will you be training?

ZDisket (Collaborator, Author) commented May 31, 2020:

@dathudeptrai

> So it's not my bug, right?

I think it's your bug; I haven't changed anything that would make it behave like that, only fixed it.

> BTW, what model will you be training?

I'm trying to fine-tune the pre-existing LJSpeech model on the voice of a fictional character. By the way, is it normal for 1200 iterations (12 epochs) to take 1 hour and 30 minutes on a Tesla P100? Judging by the alignment figures, I'm assuming one iteration in this implementation is worth 20 or 30 in the others I'm used to.

dathudeptrai (Collaborator) commented May 31, 2020:

@ZDisket For Tacotron-2, the training speed is 4 s/it on a 2080 Ti; FastSpeech is 3 it/s, MelGAN is 5 it/s, and MelGAN-STFT is 2.5 it/s. As far as I know, Tacotron-2's speed is comparable with NVIDIA's PyTorch implementation, and FastSpeech and MelGAN are the fastest training I've seen across all the frameworks I've tried. BTW, I don't know why your utt_ids have "-raw". In the LJSpeech processor you can see:

utt_id":` self.items[idx][1].split("/")[-1].split('.')[0]

which means that if your wav_path is ./.../file2292.wav, the utt_id returned is file2292.
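A standalone check of that expression (the path below is just an example):

    wav_path = "./datasets/ljspeech/wavs/file2292.wav"  # example path
    utt_id = wav_path.split("/")[-1].split(".")[0]
    print(utt_id)  # prints "file2292"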

ZDisket (Collaborator, Author) commented May 31, 2020:

@dathudeptrai
That's good; mine is 4.50 s/it on Tacotron-2.

> I don't know why your utt_ids have "-raw".

Neither do I.
And one question, just out of curiosity: since there's only one pretrained model available, would that make mine the second one ever trained with this repo?

dathudeptrai (Collaborator) commented:

@ZDisket

> And one question, just out of curiosity: since there's only one pretrained model available, would that make mine the second one ever trained with this repo?

Maybe :)). The repo was only released 18 hours ago.

BTW, if you see any model here training slower than in another repo, let me know :)) I'll focus on making training and convergence faster :D

ZDisket (Collaborator, Author) commented Jun 1, 2020:

@dathudeptrai

> BTW, if you see any model here training slower than in another repo, let me know :))

I can't compare against NVIDIA/Tacotron2 yet, but your repo is already looking better than ESPNet.
predictions.zip
I'll report back when I start doing inference at 20 epochs.

dathudeptrai (Collaborator) commented:

@ZDisket What dataset are you using, and how many samples does it have?

ZDisket (Collaborator, Author) commented Jun 1, 2020:

@dathudeptrai
9019 seconds (about 2.5 hours) of high-quality audio of a female speaker across varying emotions, distributed over 3489 files.

dathudeptrai (Collaborator) commented Jun 1, 2020:

@ZDisket The dataset seems small :)).

dathudeptrai reopened this Jun 1, 2020
dathudeptrai added the question ❓ Further information is requested label Jun 1, 2020
ZDisket (Collaborator, Author) commented Jun 1, 2020:

@dathudeptrai

> The dataset seems small :)).

That's the best case among all the datasets I have. By the way, to fine-tune MelGAN-STFT, do I load the pretrained weights for both the discriminator and the generator, or only one of them?

dathudeptrai (Collaborator) commented:

@ZDisket

> By the way, to fine-tune MelGAN-STFT, do I load the pretrained weights for both the discriminator and the generator, or only one of them?

That's your choice :))), but I think both :D
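A minimal sketch of loading both, assuming Keras-style models as used in this repo; the builder functions and checkpoint paths are hypothetical placeholders:

    generator = build_melgan_generator()          # hypothetical builder
    discriminator = build_melgan_discriminator()  # hypothetical builder

    # restore BOTH networks so the adversarial balance from pretraining is kept
    generator.load_weights("pretrained/generator.h5")          # example path
    discriminator.load_weights("pretrained/discriminator.h5")  # example path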

dathudeptrai (Collaborator) commented:

I'll close the issue :D.
