
RuntimeError: DF dataloader error: ThreadJoinError("Any { .. }") when attempting to train DeepFilterNet #187

Closed
AnonymousEliforp opened this issue Nov 22, 2022 · 13 comments · Fixed by #189

Comments

@AnonymousEliforp

I am trying to train DeepFilterNet but I am running into the following error when training: RuntimeError: DF dataloader error: ThreadJoinError("Any { .. }").

What I did

I installed DeepFilterNet via PyPI using:

pip3 install torch torchvision torchaudio
pip install deepfilternet[train]

Then I generated the following HDF5 dataset files:

TEST_SET_NOISE.hdf5
TEST_SET_SPEECH.hdf5
TRAIN_SET_NOISE.hdf5
TRAIN_SET_SPEECH.hdf5
VALID_SET_NOISE.hdf5
VALID_SET_SPEECH.hdf5

and placed them into the same directory .../data_folder.
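
For reference, HDF5 datasets like these are normally produced with the repository's prepare_data.py script. A minimal sketch of one such invocation, assuming the positional arguments are the dataset type, a text file listing the wav paths, and the output HDF5 file; the script path and the *.txt file lists are placeholders, not the exact commands used here:

# One call per split and type; speech_train.txt / noise_train.txt list one wav path per line (placeholders).
python df/scripts/prepare_data.py --sr 48000 speech speech_train.txt TRAIN_SET_SPEECH.hdf5
python df/scripts/prepare_data.py --sr 48000 noise noise_train.txt TRAIN_SET_NOISE.hdf5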

I also created a config.ini file with the following contents:

config.ini

[deepfilternet]
emb_hidden_dim = 256
df_hidden_dim = 256
df_num_layers = 3
conv_ch = 16
conv_lookahead = 0
conv_depthwise = True
convt_depthwise = True
conv_kernel = 1,3
conv_kernel_inp = 3,3
emb_num_layers = 2
emb_gru_skip = none
df_gru_skip = none
df_pathway_kernel_size_t = 1
enc_concat = False
df_n_iter = 1
linear_groups = 1
enc_linear_groups = 16
mask_pf = False
gru_type = grouped
gru_groups = 1
group_shuffle = True
dfop_method = real_unfold
df_output_layer = linear

[df]
fft_size = 960
nb_erb = 32
nb_df = 96
sr = 48000
hop_size = 480
norm_tau = 1
lsnr_max = 35
lsnr_min = -15
min_nb_erb_freqs = 2
df_order = 5
df_lookahead = 0
pad_mode = input

[train]
model = deepfilternet2
batch_size = 16
batch_size_eval = 16
num_workers = 4
overfit = false
lr = 0.001
max_epochs = 30
seed = 42
device = cuda:1
mask_only = False
df_only = False
jit = False
max_sample_len_s = 5.0
num_prefetch_batches = 32
global_ds_sampling_f = 1.0
dataloader_snrs = -5,0,5,10,20,40
batch_size_scheduling = 
validation_criteria = loss
validation_criteria_rule = min
early_stopping_patience = 5
start_eval = False

[distortion]
p_reverb = 0.0
p_bandwidth_ext = 0.0
p_clipping = 0.0
p_air_absorption = 0.0

[optim]
lr = 0.0005
momentum = 0
weight_decay = 0.05
optimizer = adamw
lr_min = 1e-06
lr_warmup = 0.0001
warmup_epochs = 3
lr_cycle_mul = 1.0
lr_cycle_decay = 0.5
lr_cycle_epochs = -1
weight_decay_end = -1

[maskloss]
factor = 0
mask = iam
gamma = 0.6
gamma_pred = 0.6
f_under = 2

[spectralloss]
factor_magnitude = 0
factor_complex = 0
factor_under = 1
gamma = 1

[multiresspecloss]
factor = 0
factor_complex = 0
gamma = 1
fft_sizes = 512,1024,2048

[sdrloss]
factor = 0

[localsnrloss]
factor = 0.0005

and put it in a directory .../log_folder.

I also created a dataset.cfg file with the following contents:

dataset.cfg

{
  "train": [
    [
      "TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_NOISE.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "VALID_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "VALID_SET_NOISE.hdf5",
      1.0
    ]
  ],
  "test": [
    [
      "TEST_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TEST_SET_NOISE.hdf5",
      1.0
    ]
  ]
}

Lastly, I ran the command to train the model:

python df/train.py path/to/dataset.cfg .../data_folder .../log_folder

and this is the output that I get:

2022-11-22 22:10:07 | INFO     | DF | Running on torch 1.13.0+cu117
2022-11-22 22:10:07 | INFO     | DF | Running on host workstation3
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2022-11-22 22:10:07 | INFO     | DF | Loading model settings of log_folder
2022-11-22 22:10:07 | INFO     | DF | Running on device cuda:1
2022-11-22 22:10:07 | INFO     | DF | Initializing model `deepfilternet2`
2022-11-22 22:10:08 | DEPRECATED | DF | Use of `linear` for `df_ouput_layer` is marked as deprecated.
2022-11-22 22:10:08 | INFO     | DF | Initializing dataloader with data directory /home/tester/bokleong/dl_project_dfn/data_folder/
2022-11-22 22:10:08 | INFO     | DF | Loading HDF5 key cache from /home/tester/bokleong/dl_project_dfn/.cache_dataset.cfg
2022-11-22 22:10:08 | INFO     | DF | Start train epoch 0 with batch size 32
thread 'DataLoader Worker 1' panicked at 'called `Option::unwrap()` on a `None` value', libDF/src/dataset.rs:1134:69
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'DataLoader Worker 0' panicked at 'called `Option::unwrap()` on a `None` value', libDF/src/dataset.rs:1134:69
thread 'DataLoader Worker 2' panicked at 'called `Option::unwrap()` on a `None` value', libDF/src/dataset.rs:1134:69
thread 'DataLoader Worker 3' panicked at 'called `Option::unwrap()` on a `None` value', libDF/src/dataset.rs:1134:69
Exception in thread PinMemoryLoop:
Traceback (most recent call last):
  File "/home/tester/miniconda3/envs/user5/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/tester/miniconda3/envs/user5/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/tester/miniconda3/envs/user5/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 49, in _pin_memory_loop
    do_one_step()
  File "/home/tester/miniconda3/envs/user5/lib/python3.9/site-packages/torch/utils/data/_utils/pin_memory.py", line 26, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/tester/miniconda3/envs/user5/lib/python3.9/site-packages/libdfdata/torch_dataloader.py", line 206, in get
    self.loader.cleanup()
RuntimeError: DF dataloader error: ThreadJoinError("Any { .. }")

Appreciate any help on this, thank you!

@bommyboy

Update: this was caused by not specifying --mono when using prepare_data.py on mono wav files to generate the HDF5 files.
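
A hedged sketch of a corrected invocation for one of the files, reusing the placeholder arguments from the sketch earlier in this thread:

python df/scripts/prepare_data.py --sr 48000 --mono speech speech_train.txt TRAIN_SET_SPEECH.hdf5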

@Rikorose
Owner

@AnonymousEliforp on which version did this error occur? On main there is no unwrap() at libDF/src/dataset.rs:1134:69. I would like to properly fix this error.

@AnonymousEliforp
Author

> @AnonymousEliforp on which version did this error occur? On main there is no unwrap() at libDF/src/dataset.rs:1134:69. I would like to properly fix this error.

Not sure if this is what you mean, but I was using deepfilternet2 on the main branch

@Rikorose
Owner

At what exact commit? The main branch has changed and does not match your stack trace.

@AnonymousEliforp
Author

> At what exact commit? The main branch has changed and does not match your stack trace.

As far as I can tell from git log, I am at the 17-Nov 15:27:25 commit bc6bd9102212731843ed0a74f30164472865b167

@Rikorose
Owner

Hm, there is also no unwrap call on that line, which reads:

Some(LpParam {

Your stack trace above said something else:
thread 'DataLoader Worker 0' panicked at 'called `Option::unwrap()` on a `None` value', libDF/src/dataset.rs:1134:69

@bommyboy

[screenshot: git log output]
Hi, I am a colleague of @AnonymousEliforp; this is the output from git log -1 on our machine. Also, I have verified that using the prepare_data.py script with --mono enabled on the test, train, and valid sound files separately does not cause this issue. However, using prepare_data.py with --mono enabled and then using the split_hdf5.py script still causes the error.

@Rikorose
Owner

Ah, now I see. You used the main branch but installed the Rust modules from PyPI. The error occurs because you don't have any noise samples in your dataset. If you run train.py with --debug you will see that either your sampling factor is too low or you split your dataset incorrectly.
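
A minimal sketch of the debug run, assuming --debug can simply be added to the training command used above (flag placement is an assumption):

python df/train.py --debug path/to/dataset.cfg .../data_folder .../log_folder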

@bommyboy

[screenshot: train.py --debug output]
Hi, I don't think I quite understand what you mean. I have run the code with --debug and the output seems to register my noise dataset. However, the noise dataset is definitely smaller than my speech dataset. Where might I be able to change the sampling factor?

@Rikorose
Owner

You only have speech datasets.
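
One quick way to verify what actually ended up in the HDF5 files is to list their top-level groups; a minimal sketch, assuming prepare_data.py stores audio under a group named after the dataset type ('speech', 'noise', ...) and that hdf5-tools and h5py are installed:

# Should show a 'noise' group with one entry per sample; a 'speech' group here indicates the wrong type was used.
h5ls -r TRAIN_SET_NOISE.hdf5
# The same check via h5py:
python -c "import h5py; print(list(h5py.File('TRAIN_SET_NOISE.hdf5', 'r').keys()))"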

@bommyboy

Ok, but I meant for the HDF5 files ending with NOISE to be the noise datasets, and in dataset.cfg I have tried to follow the example dataset configuration.
[screenshot: dataset configuration]
Is there an error in how I defined dataset.cfg?

@Rikorose
Owner

No, in your prepare_data.py usage.

@bommyboy

Ah ok, that is my bad then. I have realised my mistake now, so sorry for this and thanks for your help!
