Perceptual Contrast Stretching on Target Feature for Speech Enhancement (Accepted by INTERSPEECH 2022)

RoyChao19477/PCS

Perceptual Contrast Stretching on Target Feature for Speech Enhancement

This repo is only dedicated to the post-processing PCS.

catalog

Introduction
PCS-tools
SpeechMetrics-tools
Citation
References

For speech enhancement systems that use a 400-sample window in the Short-Time Fourier Transform (STFT), we recommend using PCS400 instead of PCS. This avoids the distortion caused by a window-length mismatch.

Introduction

"PCS is derived based on the critical band importance function and applied to modify the targets of the SE model."
"It can also be used as a post-processing (PP) method to further sharpen the structure of enhanced speech and suppress residual noise."

More details can be found here: http://arxiv.org/abs/2203.17152 (arXiv preprint; accepted by INTERSPEECH 2022)
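The idea of contrast stretching can be illustrated on a single spectral frame. The following is a minimal sketch, not the code shipped in this repo: it assumes the stretching is a per-frequency weight applied to log1p-compressed magnitudes, and the function name `stretch_frame`, the frame values, and the weights are all hypothetical placeholders rather than the coefficients derived from the critical band importance function.

```python
import math


def stretch_frame(magnitudes, weights):
    """Scale each magnitude in the log1p domain, then map back with expm1.

    Assumption: one weight per frequency bin; a weight of 1.0 leaves the
    bin unchanged, a weight above 1.0 boosts it.
    """
    if len(magnitudes) != len(weights):
        raise ValueError("one weight per frequency bin is required")
    return [math.expm1(w * math.log1p(m)) for m, w in zip(magnitudes, weights)]


# Hypothetical magnitudes and weights, for illustration only.
frame = [0.0, 0.5, 2.0, 8.0]
weights = [1.0, 1.0, 1.2, 1.4]
stretched = stretch_frame(frame, weights)
```

With these placeholder values, bins with a weight of 1.0 pass through unchanged and silent bins stay at zero, while bins with larger weights are raised, and strong components are raised more in absolute terms than weak ones. In a post-processing setting the stretched magnitudes would be recombined with the original phase before the inverse STFT.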

Enhanced audio is generated by different baseline models, and post-processing PCS is then applied to it.
The experimental results are as follows:

Some examples are shown below:

PCS-tools

Post-processing PCS tools can be found in the /PCS and PCS400 folders, so you can post-process enhanced audio with PCS directly.

For speech enhancement systems that use a 400-sample window in the Short-Time Fourier Transform (STFT), we recommend using PCS400 instead of PCS. This avoids the distortion caused by a window-length mismatch.
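One plausible way to see why the window length matters is that per-frequency weights are tied to the STFT bin layout. The sketch below is an illustration under stated assumptions, not the repo's implementation: it assumes a 16 kHz sample rate, compares a 400-sample FFT against a 512-sample FFT (the 512 figure is an assumption for comparison, not stated in this README), and uses hypothetical band edges and weights. The helper names `num_bins` and `band_weights_to_bins` are made up for this example.

```python
def num_bins(n_fft):
    """Number of one-sided STFT frequency bins for an FFT of size n_fft."""
    return n_fft // 2 + 1


def band_weights_to_bins(band_edges_hz, band_weights, n_fft, sample_rate):
    """Expand per-band weights to one weight per STFT bin.

    Each bin takes the weight of the first band whose upper edge is at or
    above the bin's center frequency; bins beyond the last edge keep 1.0.
    """
    out = [1.0] * num_bins(n_fft)
    for k in range(len(out)):
        freq = k * sample_rate / n_fft
        for upper, weight in zip(band_edges_hz[1:], band_weights):
            if freq <= upper:
                out[k] = weight
                break
    return out


# Hypothetical bands and weights, for illustration only.
edges = [0.0, 1000.0, 4000.0, 8000.0]
band_w = [1.0, 1.1, 1.2]

w512 = band_weights_to_bins(edges, band_w, n_fft=512, sample_rate=16000)
w400 = band_weights_to_bins(edges, band_w, n_fft=400, sample_rate=16000)
```

Under these assumptions the two layouts have different lengths (257 versus 201 bins), and the same bin index falls at a different frequency: index 26 is 812.5 Hz with a 512-sample FFT but 1040 Hz with a 400-sample FFT. A weight vector built for one layout therefore cannot be applied to the other without shifting the weights to the wrong frequencies.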

Scoring-tools

Speech metric scores were computed with the tools in /speech_metrics.

Online Post-processing PCS Demo

https://lojoffy-pcs-online-demo-main-luu0rc.streamlitapp.com/

Citation:

If you find the code useful in your research, please cite:

@article{chao2022perceptual,
  title={Perceptual Contrast Stretching on Target Feature for Speech Enhancement},
  author={Chao, Rong and Yu, Cheng and Fu, Szu-Wei and Lu, Xugang and Tsao, Yu},
  journal={Proc. of INTERSPEECH},
  year={2022}
}

Reference:

SEGAN:

arXiv: https://arxiv.org/pdf/1703.09452.pdf

Wiener filter:

Wikipedia: https://en.wikipedia.org/wiki/Wiener_filter

Transformer T(c) / T(nc):

arXiv: https://arxiv.org/pdf/2006.10296.pdf

CRNN:

arXiv: https://arxiv.org/pdf/1805.00579.pdf

MetricGAN+:

arXiv: https://arxiv.org/pdf/2104.03538.pdf
From SpeechBrain: https://huggingface.co/speechbrain/metricgan-plus-voicebank

DPT-FSNet:

arXiv: https://arxiv.org/pdf/2104.13002.pdf
Reproduced and denoted as DPT*
