Poor performance of DeepClustering #138

Closed
hangtingchen opened this issue Jun 3, 2020 · 13 comments


@hangtingchen (Contributor)

Hi,
First, thanks a lot for such an excellent tool for speech separation. I have tried the deep clustering recipe on wsj0-mix:
https://github.com/mpariente/asteroid/tree/master/egs/wsj0-mix/DeepClustering
My performance was poor (SI-SDR = 3.5 dB, SDR = 4.5 dB after 35 epochs of training on 1 GPU). As reported here, the SDR is expected to be close to 10 dB. I am wondering what caused the failure. Are there any tricks for training, or are more epochs needed for improvement?

Thanks a lot.

@mpariente (Collaborator)

Hey, thanks for opening an issue.
Let me paste my training config here:

```yaml
data:
  n_src: 2
  sample_rate: 8000
  train_dir: data/2speakers/wav8k/min/tr
  valid_dir: data/2speakers/wav8k/min/cv
filterbank:
  kernel_size: 256
  n_filters: 256
  stride: 64
main_args:
  exp_dir: exp/train_chimera_dcalone_newlr/
  help: null
masknet:
  dropout: 0.3
  embedding_dim: 40
  hidden_size: 600
  n_layers: 4
  n_src: 2
  rnn_type: lstm
  take_log: true
optim:
  lr: 0.0001
  optimizer: adam
  weight_decay: 0.0
positional arguments: {}
training:
  batch_size: 32
  early_stop: true
  epochs: 200
  half_lr: true
  loss_alpha: 1.0
  num_workers: 8
```

Most importantly, the learning rate is 1e-4 instead of 1e-5 in the recipe. I'll change the default now.
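To make that concrete, here is a minimal sketch in plain PyTorch (not the actual Asteroid training code) of what the `optim` section plus `half_lr: true` amount to: Adam at lr 1e-4, with the learning rate halved when the validation loss plateaus. The stand-in model and the patience value are assumptions, not taken from the config.

```python
# Sketch only: plain PyTorch equivalent of the `optim` + `half_lr: true` settings above.
import torch

model = torch.nn.LSTM(input_size=256, hidden_size=600, num_layers=4)  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.0)
# `half_lr: true` -> halve the learning rate when the val loss stops improving.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5  # patience is an assumption
)

# Call once per epoch with the validation loss:
val_loss = 2930.0  # e.g. the best val loss mentioned below
scheduler.step(val_loss)
```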

Here are the metrics:

```json
{
  "si_sdr": 9.846718255569847,
  "si_sdr_imp": 9.84787033402749,
  "sdr": 10.364474047942608,
  "sdr_imp": 10.213430163640966,
  "sir": 19.018816263769104,
  "sir_imp": 18.867772082693932,
  "sar": 11.336185832787844,
  "sar_imp": -64.43806377592414,
  "stoi": 0.8787521784931114,
  "stoi_imp": 0.1407063969268085
}
```
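For context, `si_sdr` above is the scale-invariant SDR. Here is a minimal NumPy sketch of its usual definition (Le Roux et al., "SDR - half-baked or well done?"); the actual evaluation scripts may differ in detail:

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```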

My best val loss was around 2930 and I trained for 130 epochs.

If you come over to Slack, I can share the pretrained model folder with you (under a research-only license).

@hangtingchen (Contributor, Author)

Thanks a lot, I am going to try this config. The results should come out tomorrow. If they look fine, I will close this issue. One more question:
https://github.com/mpariente/asteroid/blob/d106902c9c1c939dc70f9ff1b963223c42d61547/asteroid/data/wsj0_mix.py#L33
The dataset uses 4-second segments. Did you adopt the same setting?
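As a quick sanity check of what that setting means, assuming the 8 kHz sample rate from the config above:

```python
# 4-second segments at the 8 kHz sample rate from the training config.
sample_rate = 8000   # from `sample_rate: 8000`
segment_seconds = 4  # segment length referenced in wsj0_mix.py
print(sample_rate * segment_seconds)  # 32000 samples per training example
```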

@mpariente (Collaborator)

> The dataset uses 4-second segments. Did you adopt the same setting?

Yes, I did.

@hangtingchen (Contributor, Author)

Sorry to bother you again. After the first epoch, my loss was around 6000. Is this normal?
Could you share the model and log if it is convenient for you? I will check the problem myself.

@mpariente (Collaborator)

Sounds about right.
Here is the log file.
The exp folder is quite heavy; I cannot attach it on GitHub.
train_dcalone_newlr.log

@hangtingchen (Contributor, Author)

It seems that I cannot reproduce the reported results. After adopting the same settings, the SI-SDR is 8.3 dB, still more than 1 dB lower than yours. I don't know the reason, since I used the newest code, but the software versions might differ. Please comment if you have any suggestions; otherwise, I will close the issue. Thanks a lot.

@mpariente (Collaborator)

> It seems that I cannot reproduce the reported results. After adopting the same settings, the SI-SDR is 8.3 dB, still more than 1 dB lower than yours. I don't know the reason, since I used the newest code, but the software versions might differ. Please comment if you have any suggestions; otherwise, I will close the issue. Thanks a lot.

Could you please use version 0.2.0 (git checkout v0.2.0) and try training with the same config?
We have had some issues with LSTMs and the new version of PyTorch Lightning in the past; we have upgraded Lightning since then.

Thanks for reporting your problems.

@mpariente (Collaborator)

@hangtingchen Could you try it please?

@hangtingchen (Contributor, Author)

hangtingchen commented Jun 12, 2020 via email

@hangtingchen (Contributor, Author)

hangtingchen commented Jul 5, 2020 via email

@mpariente (Collaborator)

Maybe you're using wsj1?
I don't know; it seems weird to me. Can anybody else reproduce the problem?

hangtingchen reopened this Jul 7, 2020
@hangtingchen (Contributor, Author)

Hi,
Sorry to disturb you again.
Could you please answer one more question about DeepClustering?
I have noticed that the wsj0-2mix dataset was created from the waveforms generated by
https://github.com/mpariente/asteroid/blob/0bdec2644f2d770d037ce804b7f70cb98bd5c9fa/egs/wsj0-mix/DeepClustering/local/convert_sphere2wav.sh#L31
That line converts both wv1 and wv2 files. However, because both channels share the same utterance name, the wav generated from wv2 overwrites the one from wv1. The wv1 recordings are noise-free, whereas wv2 contains more noise.
In summary, I effectively used wv2 to generate the wsj0-mix dataset, but I don't know whether this is the reason for my poor performance. Which one did you use to generate the wav files?

An example:
[screenshot: two lines with the same name in sph.list]
[image: wv1 spectrogram]
[image: wv2 spectrogram]
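For illustration, here is one hypothetical way to avoid the overwrite by keeping a single file per utterance and preferring wv1; the file names and list format here are made up, not taken from the recipe:

```python
# Hypothetical sketch: deduplicate a sphere file list so that a wv2 file
# can no longer overwrite the wv1 conversion of the same utterance.
from pathlib import Path

def prefer_wv1(sph_paths):
    """Keep one .wv* file per utterance stem, preferring wv1 over wv2."""
    chosen = {}
    for path in map(Path, sph_paths):
        stem = path.stem  # utterance id, e.g. '011a0101'
        if stem not in chosen or path.suffix == ".wv1":
            chosen[stem] = path
    return sorted(chosen.values())

sph_list = ["disc1/011a0101.wv1", "disc2/011a0101.wv2", "disc1/011a0102.wv2"]
# Keeps 011a0101.wv1 (wv1 preferred) and 011a0102.wv2 (only channel available).
print(prefer_wv1(sph_list))
```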

@mpariente (Collaborator)

Oh, that's a very good point. Could you please open it as a separate issue?
Thanks!
