Speech denoising is the process of removing unwanted noise from speech signals while preserving the integrity of the speech itself.
The problem of speech denoising arises when speech signals are corrupted by various types of noise, such as background noise, microphone noise, or electrical interference.
This project aims to denoise speech signals: given a noisy signal as input, it outputs the denoised signal and evaluates it. The module includes three methods: spectral subtraction, FRCRN, and Noise2Noise.
Data Usage Note: Bahnar Voice Dataset
The Bahnar voice dataset provided by Prof. Quan Thanh Tho is used for research purposes in this project. If you intend to use this dataset, please contact Prof. Tho to discuss your usage and obtain permission.
FRCRN module
Step 1: To use the denoise function, go to FRCRN_denoise and first install the required packages listed in requirements.txt

Step 2: Open the file Denoise_module.ipynb (JupyterLab is recommended)
Step 3: Import the necessary packages
Step 4: Denoise module
To use the spectral subtraction method, go to the Spectral Subtraction section, define the path of the audio to be denoised, and run the code that follows.
This method first estimates the noise and then denoises the signal based on the estimated noise level.
It simply subtracts the frequency components of the noise from the noisy audio to obtain cleaned/enhanced speech.

Spectral subtraction has two major shortcomings:
- We have to choose a noise sample from the audio signal in order to remove it.
- The noise should be present throughout the entire audio.
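
The subtraction described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code from Denoise_module.ipynb: the function name `spectral_subtraction`, the frame length, the hop size, and the Hann window are all assumptions chosen for the example.

```python
import numpy as np

def spectral_subtraction(noisy, noise, frame_len=512, hop=256):
    """Subtract the average noise magnitude spectrum from every frame of `noisy`.

    `noise` is a noise-only clip chosen by the user (hypothetical input).
    """
    # Periodic Hann window: overlapping copies sum to 1 at a 50% hop.
    window = np.hanning(frame_len + 1)[:-1]

    def frames(x):
        # Zero-pad so the signal splits into whole frames (at least one).
        n_frames = 1 + max(0, int(np.ceil((len(x) - frame_len) / hop)))
        padded = np.zeros((n_frames - 1) * hop + frame_len)
        padded[:len(x)] = x
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
        return padded[idx] * window

    # Step 1: estimate the noise as its average magnitude spectrum.
    noise_mag = np.abs(np.fft.rfft(frames(noise), axis=1)).mean(axis=0)

    # Step 2: subtract it from each noisy frame, clamp negative
    # magnitudes to zero, and keep the phase of the noisy signal.
    spec = np.fft.rfft(frames(noisy), axis=1)
    clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    clean_spec = clean_mag * np.exp(1j * np.angle(spec))
    clean_frames = np.fft.irfft(clean_spec, n=frame_len, axis=1)

    # Step 3: overlap-add the frames back into a waveform.
    out = np.zeros((len(clean_frames) - 1) * hop + frame_len)
    for i, frame in enumerate(clean_frames):
        out[i * hop:i * hop + frame_len] += frame
    return out[:len(noisy)]
```

Calling `spectral_subtraction(noisy, noise_clip)` returns a signal of the same length as `noisy`. The first and last half-frame are attenuated by the window taper, and the result depends heavily on how representative `noise_clip` is, which is exactly the first shortcoming listed above.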
To use FRCRN, the signal must first be resampled, because the architecture requires a sampling rate fs of 16 kHz; at any other sampling rate the model produces poor output.
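FRCRN is a mask-based model: it computes a mask in the time/frequency domain from the noisy input speech and applies it to the noisy spectrum to recover the speech. The mask in FRCRN is a complex ratio mask rather than a boolean array.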
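
The 16 kHz requirement can be handled with a small resampling step before the audio is passed to the model. The sketch below is an assumption for illustration, not the notebook's code: `resample_fft` and `TARGET_FS` are hypothetical names, and the Fourier-domain method shown is only one option (functions such as `scipy.signal.resample_poly` or `librosa.resample` are common alternatives).

```python
import numpy as np

TARGET_FS = 16000  # sampling rate that FRCRN expects, per the text above

def resample_fft(signal, fs_in, fs_out=TARGET_FS):
    """Band-limited resampling of a 1-D signal in the Fourier domain."""
    signal = np.asarray(signal, dtype=float)
    if fs_in == fs_out:
        return signal  # already at the target rate
    n_in = len(signal)
    n_out = int(round(n_in * fs_out / fs_in))
    spec = np.fft.rfft(signal)
    out_spec = np.zeros(n_out // 2 + 1, dtype=complex)
    # Downsampling drops the bins above the new Nyquist frequency;
    # upsampling zero-pads the spectrum instead.
    k = min(len(spec), len(out_spec))
    out_spec[:k] = spec[:k]
    # Rescale so the amplitude is preserved after the length change.
    return np.fft.irfft(out_spec, n=n_out) * (n_out / n_in)
```

For example, a one-second recording at 44.1 kHz (44100 samples) becomes 16000 samples after `resample_fft(x, 44100)`. The Fourier method treats the signal as periodic, so it can ring at the start and end of recordings that do not begin and end near silence.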

The demo model can be found here: https://drive.google.com/drive/folders/1vq9QoRC75hIHRN47mC_3NszCxtnR2LFx?usp=sharing
Place the link to the folder here

Step 5: Evaluation (for reference)
This part evaluates how good a signal is compared to a clean reference signal. It uses the following two metrics:
- PESQ (Perceptual Evaluation of Speech Quality) is an objective, full-reference speech quality evaluation method. The score ranges from -0.5 to 4.5. The higher the score, the better the speech quality.
- STOI (Short-Time Objective Intelligibility) is an objective measure of how intelligible the speech is to the human auditory perception system. The STOI value is between 0 and 1. The larger the value, the more intelligible and clearer the speech.
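
Both metrics can be computed from a clean reference and a denoised signal. The sketch below assumes the third-party packages `pesq` and `pystoi`, which the notebook may or may not use; the helper names `align` and `evaluate` are hypothetical.

```python
import numpy as np

def align(reference, estimate):
    """Trim both signals to the shorter length so they can be compared sample by sample."""
    reference = np.asarray(reference, dtype=np.float32)
    estimate = np.asarray(estimate, dtype=np.float32)
    n = min(len(reference), len(estimate))
    return reference[:n], estimate[:n]

def evaluate(reference, estimate, fs=16000):
    """Return PESQ and STOI for a denoised signal against its clean reference."""
    # Imported lazily: `pesq` and `pystoi` are optional third-party
    # packages (an assumption, not confirmed by this README).
    from pesq import pesq
    from pystoi import stoi

    reference, estimate = align(reference, estimate)
    return {
        # Wide-band PESQ mode requires fs == 16000.
        "pesq": pesq(fs, reference, estimate, "wb"),
        "stoi": stoi(reference, estimate, fs, extended=False),
    }
```

With both packages installed, `evaluate(clean, denoised)` returns a dictionary with one score per metric. Running it on the noisy input as well shows how much each denoising method improved the signal.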
The FRCRN denoising is based on the following GitHub repositories:
https://github.com/alibabasglab/FRCRN/tree/main
https://github.com/modelscope/modelscope