
How to train your own model and apply it? I have come this far but I'm having a problem at solver.py #28

FurkanGozukara opened this issue Jan 17, 2021 · 8 comments


FurkanGozukara commented Jan 17, 2021

OK, I have downloaded Visual Studio Code to debug and understand the code.

I see that make_spect_f0.py is used to generate the raptf0 and spmel folders.

This make_spect_f0.py reads each speaker folder and decides whether it is a male or female voice from the spk2gen.pkl file.
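(For reference, here is a minimal sketch for inspecting that mapping; it assumes spk2gen.pkl lives under assets/ and is a pickled dict keyed by speaker folder name, which may differ in your copy of the repo.)

import pickle

# Assumed layout: {'p225': 'F', 'p285': 'M', ...} - adjust the path and keys
# if your copy of the repo differs.
with open('assets/spk2gen.pkl', 'rb') as f:
    spk2gen = pickle.load(f)

print(spk2gen.get('p285'))  # should print the gender label recorded for p285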

As a starting point, I deleted the raptf0, spmel, and wavs folders.

Then I created a new wavs folder and, inside it, a folder named p285, which is assigned as a male speaker.

Inside p285 I put my wav file myfile.wav, which is more than 2 hours long.

Question 1: Does it have to be 16 kHz and mono, or can we use maximum quality?
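(Not an authoritative answer, but here is a minimal sketch for converting a recording to 16 kHz mono up front, in case the extraction scripts expect that rate; the file names are just examples.)

import librosa
import soundfile as sf

# Load the recording, downmix to mono, and resample to 16 kHz in one call.
wav, sr = librosa.load('assets/wavs/p285/myfile.wav', sr=16000, mono=True)

# Write the converted copy next to the original (example output name).
sf.write('assets/wavs/p285/myfile_16k.wav', wav, sr)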

After I ran make_spect_f0.py, it created a myfile.npy in both the raptf0 and spmel folders.

Then I ran make_metadata.py, and it created train.pkl inside spmel.
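(A minimal sketch for peeking into the generated train.pkl to see how many utterances it lists per speaker; the per-entry layout used below is an assumption.)

import pickle

with open('assets/spmel/train.pkl', 'rb') as f:
    metadata = pickle.load(f)

# Assumed layout per entry: [speaker_id, speaker_embedding, utt_path_1, ...]
for entry in metadata:
    print(entry[0], 'utterances listed:', len(entry) - 2)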

Then, when I run main.py, I get the error below at solver.py.

I want to train a model; I don't want to test.

Then I want to use this trained model to convert the style of a speech recording.

So I need help, thank you.

@auspicious3000

[screenshot of the error]

@FurkanGozukara
Author

Here is the console output from running main.py:

PS C:\SpeechSplit>  c:; cd 'c:\SpeechSplit'; & 'C:/Python37/python.exe' 'c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\launcher' '55577' '--' 'c:\SpeechSplit\main.py'
Namespace(beta1=0.9, beta2=0.999, device_id=0, g_lr=0.0001, log_dir='run/logs', log_step=10, model_save_dir='run/models', model_save_step=1000, num_iters=1000000, resume_iters=None, sample_dir='run/samples', sample_step=1000, use_tensorboard=False)
Hyperparameters:
  freq: 8       
  dim_neck: 8   
  freq_2: 8     
  dim_neck_2: 1 
  freq_3: 8     
  dim_neck_3: 32
  dim_enc: 512  
  dim_enc_2: 128
  dim_enc_3: 256
  dim_freq: 80
  dim_spk_emb: 82
  dim_f0: 257
  dim_dec: 512
  len_raw: 128
  chs_grp: 16
  min_len_seg: 19
  max_len_seg: 32
  min_len_seq: 64
  max_len_seq: 128
  max_len_pad: 192
  root_dir: assets/spmel
  feat_dir: assets/raptf0
  batch_size: 16
  mode: train
  shuffle: True
  num_workers: 0
  samplier: 8
Finished loading train dataset...
Generator_3(
  (encoder_1): Encoder_7(
    (convolutions_1): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(80, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(512, 512, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(32, 512, eps=1e-05, affine=True)
      )
    )
    (lstm_1): LSTM(512, 8, num_layers=2, batch_first=True, bidirectional=True)
    (convolutions_2): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(257, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
      (1): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
      (2): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(256, 256, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(16, 256, eps=1e-05, affine=True)
      )
    )
    (lstm_2): LSTM(256, 32, batch_first=True, bidirectional=True)
    (interp): InterpLnr()
  )
  (encoder_2): Encoder_t(
    (convolutions): ModuleList(
      (0): Sequential(
        (0): ConvNorm(
          (conv): Conv1d(80, 128, kernel_size=(5,), stride=(1,), padding=(2,))
        )
        (1): GroupNorm(8, 128, eps=1e-05, affine=True)
      )
    )
    (lstm): LSTM(128, 1, batch_first=True, bidirectional=True)
  )
  (decoder): Decoder_3(
    (lstm): LSTM(164, 512, num_layers=3, batch_first=True, bidirectional=True)
    (linear_projection): LinearNorm(
      (linear_layer): Linear(in_features=1024, out_features=80, bias=True)
    )
  )
)
G
The number of parameters: 19437800
Current learning rates, g_lr: 0.0001.
Start training...
We've got an error while stopping in unhandled exception: <class 'StopIteration'>.
Traceback (most recent call last):
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1994, in do_stop_on_unhandled_exception
    self.do_wait_suspend(thread, frame, 'exception', arg, EXCEPTION_TYPE_UNHANDLED)
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1855, in do_wait_suspend
    keep_suspended = self._do_wait_suspend(thread, frame, event, arg, suspend_type, from_this_thread, frames_tracker)
  File "c:\Users\King\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd.py", line 1890, in _do_wait_suspend
    time.sleep(0.01)

@FurkanGozukara
Author

data_loader object

[screenshot: data_loader object in the debugger]

@FurkanGozukara
Author

Error from PowerShell:

Man, just once I'd like one of these open source projects to work when the instructions are followed.

[screenshot: PowerShell error]

@vlad-i

vlad-i commented Jan 18, 2021

Man, just once I'd like one of these open source projects to work when the instructions are followed.

There are some that work; some even provide environment files and require very little effort. It's free cutting-edge technology, so I can't complain 😅

It would be really cool to be able to get this one working.

@tejuafonja

Hi,

I don't know if this will help, but I thought I'd mention that you should also check the make_metadata.py file as well (if you haven't already), because as it is, it's hardcoded; maybe that will help debug the error.

I'm able to train with the provided test folder, but I haven't tried training with custom data yet. I'll be sure to come back and update you if I run into the same error when I do.

[Screenshot 2021-01-23 at 23 13 15]

@FurkanGozukara
Author

FurkanGozukara commented Feb 3, 2021

@tejuafonja Yes, I have seen it. It defines whether a sound file is male or female. I gave mine the same name as a male speaker, but I am still getting the error.

I have uploaded my test here so you can check: https://github.com/FurkanGozukara/SpeechSplitTest

I will delete the repository once I can get it running.

Thank you very much

@yenebeb

yenebeb commented May 11, 2021

Hi @FurkanGozukara,

Probably a bit late, but here's a fix for anyone out there stumbling on this problem.

The main problem you have is the 2-hour-long wav file.
make_spect_f0 reads the file and computes the spectrogram and the f0.
This will, however, only generate one training file.

The error you're getting (StopIteration) is exactly because of that.
When you run the code, it fetches your data (only one item in your case):
line 113 in solver.py:
data_iter = iter(data_loader)
which sets the iterator to the first value.

Further down, at lines 141-145, you see this:

try:
    x_real_org, emb_org, f0_org, len_org = next(data_iter)
except:
    data_iter = iter(data_loader)
    x_real_org, emb_org, f0_org, len_org = next(data_iter)

Here we try to get the next batch from the iterator, which isn't possible (since there's only one file for training).
We catch the exception, recreate the iterator, and do exactly the same thing again.

Normally (with more than one training file) this would solve the problem, since we start again at the beginning of our training data. But with only one file for training, it throws a StopIteration again.
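(For anyone who wants to see this failure mode in isolation, here is a minimal sketch, not taken from the repo; it assumes the loader drops incomplete batches, e.g. drop_last=True, so a dataset smaller than the batch size yields no batches at all.)

import torch
from torch.utils.data import DataLoader, TensorDataset

# One sample, standing in for one training file; batch size larger than the dataset.
dataset = TensorDataset(torch.zeros(1, 10))
loader = DataLoader(dataset, batch_size=16, drop_last=True)

print(len(loader))  # 0 -> the loader never yields a batch

data_iter = iter(loader)
try:
    next(data_iter)
except StopIteration:
    # Recreating the iterator, as solver.py does in its except branch,
    # does not help: next() raises StopIteration again immediately.
    data_iter = iter(loader)
    next(data_iter)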

So to solve this, just use more than one file. For example, you can cut your 2-hour-long wav file into pieces and put them all in the p285 folder (it's important that recordings of the same voice go into the same folder).
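(As a rough illustration of that suggestion, here is a minimal sketch that cuts a long recording into 10-second pieces; the paths, output names, and chunk length are just examples, and it assumes 16 kHz mono input is fine.)

import librosa
import soundfile as sf

wav, sr = librosa.load('myfile.wav', sr=16000, mono=True)
chunk_len = 10 * sr  # 10 seconds per piece

for i in range(0, len(wav), chunk_len):
    chunk = wav[i:i + chunk_len]
    if len(chunk) < sr:  # skip a trailing fragment shorter than 1 second
        continue
    # Write each piece into the same speaker folder (same voice, same folder).
    sf.write(f'assets/wavs/p285/myfile_{i // chunk_len:04d}.wav', chunk, sr)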

@FurkanGozukara
Author

@yenebeb So basically, if I duplicate my training file, it should work?

I will test, thank you.
