Khaitam911/Denoise-module


Problem description

Speech denoising is the process of removing unwanted noise from speech signals while preserving the integrity of the speech itself.
The problem of speech denoising arises when speech signals are corrupted by various types of noise, such as background noise, microphone noise, or electrical interference.
This project aims to denoise a signal: given a noisy signal as input, it produces the denoised signal as output and evaluates the result. The module includes three methods: spectral subtraction, FRCRN, and Noise2Noise.

Data usage

Data Usage Note: Bahnar Voice Dataset

The Bahnar voice dataset provided by Prof. Quan Thanh Tho is used for research purposes in this project. If you intend to use this dataset, please contact Prof. Tho to discuss your usage and obtain permission.

How to use the file

FRCRN module

Step 1: To use the denoise function, go to FRCRN_denoise and first install the required packages listed in requirements.txt.

Step 2: Open Denoise_module.ipynb (JupyterLab is recommended).
Step 3: Import the necessary packages.
Step 4: Denoise module
To use the spectral subtraction method, go to the Spectral Subtraction section, define the path of the file to be denoised, and run the code there. This method first computes an estimate of the noise and then denoises based on that noise level: it simply subtracts the frequency components of the noise from the noisy audio to obtain a cleaned/enhanced speech signal.
Spectral subtraction has two major shortcomings:

  • A noise-only segment must be chosen from the audio signal in order to estimate the noise to remove.
  • The noise is assumed to be present throughout the entire audio.
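
The subtraction step described above can be sketched as follows. This is a minimal illustration, not the repository's exact code: it assumes the first `noise_dur` seconds of the recording are noise-only, and the function name and parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_dur=0.5, nperseg=512):
    """Subtract an estimated noise magnitude spectrum from a noisy signal.

    Assumes the first `noise_dur` seconds contain noise only.
    """
    _, _, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    hop = nperseg // 2                                 # default 50% overlap
    n_frames = max(1, int(noise_dur * fs / hop))
    # Average magnitude over the leading noise-only frames = noise estimate.
    noise_mag = mag[:, :n_frames].mean(axis=1, keepdims=True)
    # Subtract the noise spectrum and floor negative values at zero.
    clean_mag = np.maximum(mag - noise_mag, 0.0)
    # Rebuild the waveform with the original (noisy) phase.
    _, clean = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return clean[: len(noisy)]
```

Note how the sketch embodies both shortcomings: it needs a noise-only segment to build `noise_mag`, and it subtracts that same estimate from every frame, i.e. it assumes the noise is stationary across the whole recording.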

To use FRCRN, the signal must be resampled, because the architecture requires a sampling rate of 16 kHz; if the input is not 16 kHz, the model will produce a poor-quality signal. FRCRN is a mask-based model: it computes masks in the time-frequency domain from the input noisy speech and applies them to suppress the noise.
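
The resampling requirement can be handled with a small helper like the one below (a sketch; the function name is illustrative and not part of the repository):

```python
from math import gcd
from scipy.signal import resample_poly

def to_16k(signal, orig_fs, target_fs=16000):
    """Resample a 1-D signal to 16 kHz, as FRCRN expects."""
    if orig_fs == target_fs:
        return signal
    g = gcd(orig_fs, target_fs)
    # Polyphase resampling by the rational factor target_fs / orig_fs.
    return resample_poly(signal, target_fs // g, orig_fs // g)
```

For example, a 44.1 kHz recording would be converted with `to_16k(noisy, 44100)` before being passed to the model.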

A demo model can be found here: https://drive.google.com/drive/folders/1vq9QoRC75hIHRN47mC_3NszCxtnR2LFx?usp=sharing
Place the path to that folder in the notebook.

Step 5: Evaluation description (for reference). This part evaluates how good a signal is compared to a clean reference signal. It uses the following two metrics:

  • PESQ (Perceptual Evaluation of Speech Quality) is an objective, full-reference speech quality evaluation method. The score ranges from -0.5 to 4.5; the higher the score, the better the speech quality.
  • STOI (Short-Time Objective Intelligibility) reflects the objective evaluation of speech intelligibility by the human auditory perception system. The STOI value is between 0 and 1; the larger the value, the higher the speech intelligibility.
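
Both metrics are available as Python packages (`pip install pesq pystoi`). A sketch of how such an evaluation could look, assuming the third-party packages `pesq`, `pystoi`, and `soundfile`, with an illustrative helper name:

```python
import soundfile as sf
from pesq import pesq
from pystoi import stoi

def evaluate(clean_path, denoised_path):
    """Score a denoised file against its clean reference."""
    ref, fs = sf.read(clean_path)
    deg, _ = sf.read(denoised_path)
    n = min(len(ref), len(deg))        # align lengths before scoring
    ref, deg = ref[:n], deg[:n]
    return {
        "pesq": pesq(fs, ref, deg, "wb"),          # wideband mode for 16 kHz
        "stoi": stoi(ref, deg, fs, extended=False),
    }
```

Both metrics are full-reference, so a clean signal must be available; they cannot score a denoised file on its own.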

Reference

The FRCRN denoising in this project is based on these GitHub repositories:
https://github.com/alibabasglab/FRCRN/tree/main
https://github.com/modelscope/modelscope
