When I retrain the model by models/DeepFilterNet2/config.ini, the result is strange~ #219

Closed
Vinttt opened this issue Dec 24, 2022 · 10 comments

@Vinttt

Vinttt commented Dec 24, 2022

Thanks for your awesome work!
I'm trying to reproduce the training using models/DeepFilterNet2/config.ini, but when I test the retrained model with DeepFilterNet/df/enhance.py, the output WAVs are nearly silent.
These are my retraining steps:
(1) copy the config.ini of DeepFilterNet2

    cd DeepFilterNet
    mkdir base_dir
    cp ../models/DeepFilterNet2/config.ini ./base_dir

(2) prepare the data
use VocalSet_48kHz_mono_000_NA_NA.tar.bz2 as clean data
use noise_fullband/datasets_fullband.noise_fullband.audioset_000.tar.bz2 as noise data

    find ../scripts/out/datasets_fullband/clean_fullband/VocalSet_48kHz_mono -iname "*.wav" > training_set_speech.txt
    python df/scripts/prepare_data.py --sr 48000 speech training_set_speech.txt ./data_dir/TRAIN_SET_SPEECH.hdf5
    find ../scripts/out/datasets_fullband/noise_fullband -iname "*.wav" > training_set_noise.txt
    python df/scripts/prepare_data.py --sr 48000 noise training_set_noise.txt ./data_dir/TRAIN_SET_NOISE.hdf5
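
Because prepare_data.py is called with --sr 48000 above, it may help to check which sample rates the listed files actually carry before building the HDF5 files. The following is a minimal sketch using only Python's standard `wave` module. The helper names `wav_sample_rate` and `sample_rate_histogram` are hypothetical and not part of DeepFilterNet, and the `wave` module handles plain PCM WAV files and may reject other encodings.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_sample_rate(path):
    """Return the sample rate stored in the header of a PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate()


def sample_rate_histogram(list_file):
    """Count sample rates over all files named in a text file (one path per line)."""
    rates = Counter()
    for line in Path(list_file).read_text().splitlines():
        line = line.strip()
        if line:  # skip empty lines
            rates[wav_sample_rate(line)] += 1
    return rates


# Example use (file name taken from the steps above):
# for rate, count in sorted(sample_rate_histogram("training_set_speech.txt").items()):
#     print(f"{rate} Hz: {count} files")
```

If the histogram shows anything other than 48000 Hz, those files go through the resampling branch of prepare_data.py.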

This is my dataset.cfg:

    {
      "train": [
        ["TRAIN_SET_SPEECH.hdf5", 1.0],
        ["TRAIN_SET_NOISE.hdf5", 1.0],
        ["TRAIN_SET_RIR.hdf5", 0.0]
      ],
      "valid": [
        ["TRAIN_SET_SPEECH.hdf5", 0.2],
        ["TRAIN_SET_NOISE.hdf5", 0.2],
        ["TRAIN_SET_RIR.hdf5", 0.0]
      ],
      "test": [
        ["TRAIN_SET_SPEECH.hdf5", 0.2],
        ["TRAIN_SET_NOISE.hdf5", 0.2],
        ["TRAIN_SET_RIR.hdf5", 0.0]
      ]
    }
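
A config like the one above can be sanity-checked before training. The sketch below assumes only the layout visible in this post: three splits named train, valid and test, each a list of [file name, factor] pairs. It is not DeepFilterNet's own loader, and the function name `check_dataset_cfg` is hypothetical. It reports missing files and files listed in more than one split; the latter may be intentional, so treat it as a prompt for review rather than an error.

```python
import json
from pathlib import Path

SPLITS = ("train", "valid", "test")


def check_dataset_cfg(cfg_path, data_dir):
    """Return a list of human-readable findings for a dataset config file."""
    cfg = json.loads(Path(cfg_path).read_text())
    findings = []
    seen = {}  # file name -> set of splits that list it
    for split in SPLITS:
        entries = cfg.get(split)
        if not isinstance(entries, list):
            findings.append(f"{split}: missing or not a list")
            continue
        for entry in entries:
            if not (isinstance(entry, list) and len(entry) == 2):
                findings.append(f"{split}: expected [file, factor], got {entry!r}")
                continue
            name, factor = entry
            if not isinstance(name, str):
                findings.append(f"{split}: file name is not a string: {name!r}")
                continue
            seen.setdefault(name, set()).add(split)
            if not isinstance(factor, (int, float)) or factor < 0:
                findings.append(f"{split}: bad factor for {name}: {factor!r}")
            if not (Path(data_dir) / name).is_file():
                findings.append(f"{split}: {name} not found in {data_dir} (factor {factor!r})")
    for name, splits in seen.items():
        if len(splits) > 1:
            findings.append(f"{name}: listed in more than one split ({', '.join(sorted(splits))})")
    return findings


# Example use (paths taken from the steps above):
# for finding in check_dataset_cfg("dataset.cfg", "./data_dir/"):
#     print(finding)
```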

(3) start retraining

    python df/train.py dataset.cfg ./data_dir/ ./base_dir/

(4) test the retrained model

    python DeepFilterNet/df/enhance.py -m DeepFilterNet/base_dir2 --output-dir DeepFilterNet/test_dir_out/ DeepFilterNet/test_dir_48k/interview_48k.wav

But the output of the retrained model looks like this:
[image: retraing_model_results]

I'm not sure which step is wrong or missing. Have you ever run into the same problem?

@Rikorose
Owner

If you are only training with VocalSet, I am not surprised that the results are not decent. A starting point for debugging would be the full training logs and some samples from the summaries folder.

@BWMa

BWMa commented Dec 27, 2022

How did you download the clean_fullband dataset? By executing download_process_dns4.sh?

@Rikorose
Owner

You should ask this at the DNS repo.

@Vinttt
Author

Vinttt commented Dec 29, 2022

> If you are only training with VocalSet, I am not surprised that the results are not decent. A starting point for debugging would be the full training logs and some samples from the summaries folder.

I finally got it working, thanks a lot! The model performance is excellent!

One small bug: VocalSet is labeled as 48000 Hz although its real sample rate is 16000 Hz. In split_hdf5.py you correct the sample rate to 16000, but the example WAVs in the summaries folder still have the wrong sample rate.

@Penguin168

> I finally got it working, thanks a lot! The model performance is excellent!
>
> One small bug: VocalSet is labeled as 48000 Hz although its real sample rate is 16000 Hz. In split_hdf5.py you correct the sample rate to 16000, but the example WAVs in the summaries folder still have the wrong sample rate.

Could I ask how you solved it?

I use Voicebank+Demand as my training set, but the output of the retrained model is just noise. I have now realized that all samples in that dataset are 16 kHz, so I might take my cue from your method.

@Vinttt
Author

Vinttt commented Jan 9, 2023

> > I finally got it working, thanks a lot! The model performance is excellent!
> > One small bug: VocalSet is labeled as 48000 Hz although its real sample rate is 16000 Hz. In split_hdf5.py you correct the sample rate to 16000, but the example WAVs in the summaries folder still have the wrong sample rate.
>
> Could I ask how you solved it?
>
> I use Voicebank+Demand as my training set, but the output of the retrained model is just noise. I have now realized that all samples in that dataset are 16 kHz, so I might take my cue from your method.

In df/scripts/prepare_data.py, there is the following code:

    if meta.sample_rate != self.sr:
        # Load as normalized float32 and resample
        x, sr = torchaudio.load(file, normalize=True)
        x = resample(x, sr, new_sr=self.sr, method="kaiser_best")
    else:
        x, sr = torchaudio.load(file, normalize=False)

If the signal is not resampled, it is loaded with normalize=False; if it is resampled, it is loaded with normalize=True. Resampled audio is therefore float32 in [-1, 1], while the other audio keeps its raw integer range, and this may lead to wrong training output. You can change the code to:

    if meta.sample_rate != self.sr:
        # Load as normalized float32 and resample
        x, sr = torchaudio.load(file, normalize=True)
        x = resample(x, sr, new_sr=self.sr, method="kaiser_best")
        # Scale back to the int16 range (assumes 16-bit PCM sources);
        # clamp first, because 1.0 * 32768 does not fit in int16
        x = (x * 32768).clamp(-32768, 32767).to(torch.int16)
    else:
        # Load the raw integer samples without normalization
        x, sr = torchaudio.load(file, normalize=False)

Hope this solves your problem.
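
To illustrate the scale mismatch described in this comment, here is a small pure-Python sketch with no torch dependency. It converts normalized float samples to the int16 range with clamping. The names `float_to_int16` and `peak` are made up for this example, and it assumes 16-bit PCM as the raw format.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def float_to_int16(samples):
    """Scale floats nominally in [-1.0, 1.0] to int16 values, clamping at the limits."""
    out = []
    for s in samples:
        v = int(round(s * 32768))
        out.append(max(INT16_MIN, min(INT16_MAX, v)))
    return out


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


normalized = [0.0, 0.25, -0.5, 1.0]  # the kind of values a normalized float load yields
raw = float_to_int16(normalized)     # the same signal on the raw int16 scale
print(raw)                           # [0, 8192, -16384, 32767]
print(peak(raw) / peak(normalized))  # 32767.0
```

The two representations of the same signal differ by a factor of roughly 2^15, so a dataset that mixes both would contain some files that are almost silent relative to the others.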

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the stale label on Apr 10, 2023
github-actions bot closed this as not planned on Apr 17, 2023

@QQQQQQQQY

> > > I finally got it working, thanks a lot! The model performance is excellent!
> > > One small bug: VocalSet is labeled as 48000 Hz although its real sample rate is 16000 Hz. In split_hdf5.py you correct the sample rate to 16000, but the example WAVs in the summaries folder still have the wrong sample rate.
> >
> > Could I ask how you solved it?
> > I use Voicebank+Demand as my training set, but the output of the retrained model is just noise. I have now realized that all samples in that dataset are 16 kHz, so I might take my cue from your method.
>
> In df/scripts/prepare_data.py, there is the following code:
>
>     if meta.sample_rate != self.sr:
>         # Load as normalized float32 and resample
>         x, sr = torchaudio.load(file, normalize=True)
>         x = resample(x, sr, new_sr=self.sr, method="kaiser_best")
>     else:
>         x, sr = torchaudio.load(file, normalize=False)
>
> If the signal is not resampled, it is loaded with normalize=False; if it is resampled, it is loaded with normalize=True. This may lead to wrong training output. You can change the code to:
>
>     if meta.sample_rate != self.sr:
>         # Load as normalized float32 and resample
>         x, sr = torchaudio.load(file, normalize=True)
>         x = resample(x, sr, new_sr=self.sr, method="kaiser_best")
>         x = (x * 32768).clamp(-32768, 32767).to(torch.int16)
>     else:
>         x, sr = torchaudio.load(file, normalize=False)
>
> Hope this solves your problem.

Hello, I ran into the same problem. I made the change you described, but nothing changed. Is there any other solution? Looking forward to your reply, thank you.

@QQQQQQQQY

> > I finally got it working, thanks a lot! The model performance is excellent!
> > One small bug: VocalSet is labeled as 48000 Hz although its real sample rate is 16000 Hz. In split_hdf5.py you correct the sample rate to 16000, but the example WAVs in the summaries folder still have the wrong sample rate.
>
> Could I ask how you solved it?
>
> I use Voicebank+Demand as my training set, but the output of the retrained model is just noise. I have now realized that all samples in that dataset are 16 kHz, so I might take my cue from your method.

Hello, I ran into the same problem. After making the change you described, the result is unchanged. Is there any other way to solve it? Looking forward to your reply, thank you!
