Audio Separation Project

An assignment for the artificial intelligence department: a simple implementation of a music separation project, based on the Demucs paper.

1 Direct Implementation

1.0 OS

Windows 10
Ubuntu 20.04
macOS (CPU only)

1.1 Modify configs/config.yaml

1.2 Run in a terminal:

python run.py --c configs/config.yaml

The results are then written to the records folder.

2 Train model

2.1 Dataset

Start with the data. We use a dataset from SigSep, an open-source site that hosts a range of datasets. We selected the following one:

MUSDB18

For Southeast University students, we have uploaded the dataset to pan.seu.edu.cn to speed up downloading. Here is the link:

MUSDB18.zip from pan.seu

After downloading and unzipping, convert the audio from .mp3 to .wav, since the current (Nov 2023) torchaudio only supports the WAV format. You can run the following bash script directly (remember to change the location!). I recommend placing the musicdb18 folder next to the project folder:

audioSep Project
|--changeAudioFormat.bash
|....
musicdb18
|-- piece1.mp3
|-- piece2.mp3
|...

# make sure you are in the project root
chmod +x changeAudioFormat.bash
./changeAudioFormat.bash

After the script finishes, a musicdb18_wav folder appears in your project folder. For more details about the dataset, refer to its introduction site or open the readme in the original downloaded dataset folder.
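For readers who prefer Python over bash, the conversion step can be sketched as below. This is not the content of changeAudioFormat.bash, which is not shown here. It assumes ffmpeg is installed and on the PATH, and the helper names (wav_target, ffmpeg_command, convert_tree) are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def wav_target(src: Path, src_root: Path, dst_root: Path) -> Path:
    """Map an .mp3 path under src_root to the matching .wav path under dst_root."""
    return (dst_root / src.relative_to(src_root)).with_suffix(".wav")


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg call that decodes src and writes dst.

    -y overwrites an existing output file, -i names the input file.
    ffmpeg picks the WAV container from the .wav extension of dst.
    """
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert_tree(src_root: Path, dst_root: Path) -> None:
    """Convert every .mp3 below src_root, mirroring the folder layout in dst_root."""
    for src in sorted(src_root.rglob("*.mp3")):
        dst = wav_target(src, src_root, dst_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: ffmpeg is available; check=True raises if a file fails.
        subprocess.run(ffmpeg_command(src, dst), check=True)
```

With the layout recommended above, a call such as `convert_tree(Path("../musicdb18"), Path("musicdb18_wav"))` from the project folder would produce the musicdb18_wav folder.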

3 Others

3.1 About metrics

STOI (Short-Time Objective Intelligibility)

Mono audio only.

The stoi function is designed to evaluate the intelligibility of speech signals, which are typically mono. Intelligibility is a measure of how comprehensible speech is in given conditions, and for this measurement, stereo or multi-channel audio does not provide additional information compared to mono audio.

If the source or predicted audio is stereo (i.e., has 2 channels), it's common practice to either:

(The method we adopt) Average the channels to get a mono signal.
Evaluate the metric on each channel separately and then average the results.
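The first option, averaging the channels into a mono signal, can be sketched in a few lines. The sketch uses plain Python lists to stay dependency-free, and the function name to_mono is hypothetical; the project itself may well do this on tensors.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample into one mono signal.

    channels is a list of channels, each a list of float samples,
    e.g. [left, right] for stereo.
    """
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per time step, holding that
    # step's sample from every channel.
    return [sum(samples) / len(channels) for samples in zip(*channels)]
```

The resulting mono signal is what would then be passed to a mono-only metric such as STOI.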

PESQ (Perceptual Evaluation of Speech Quality)

Mono audio only.

Like STOI, PESQ is designed for mono signals and particularly for evaluating the quality of speech signals. For stereo or multi-channel audio, the same approach as STOI can be taken.

Caveat: PESQ is based on perceptual models, so the results can be affected if applied to non-speech signals.

(The method we adopt) Average the channels to get a mono signal.

SDR (Source-to-Distortion Ratio)

Supports multi-channel audio: SDR is typically computed channel-wise, and the per-channel results are then averaged.
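As a minimal sketch of the channel-wise approach, the code below uses the simplified definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2), where s is the reference and s_hat the estimate. This is an assumption for illustration: the project's own SDR code is not shown here and may follow the fuller BSS Eval definition. The function names and the eps guard are hypothetical.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB for one channel.

    eps avoids log(0) and division by zero for silent or perfect signals.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def multichannel_sdr_db(reference, estimate):
    """Compute SDR per channel, then average the per-channel scores."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same channel count")
    scores = [sdr_db(ref_ch, est_ch) for ref_ch, est_ch in zip(reference, estimate)]
    return sum(scores) / len(scores)
```

For example, an estimate that is the reference scaled by 0.9 in every channel leaves an error of one tenth of the amplitude, which corresponds to roughly 20 dB.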

SNR (Signal-to-Noise Ratio)

Supports multi-channel audio: as with SDR, we typically compute SNR for each channel separately and then average.

SIR (Signal-to-Interference Ratio)

Supports multi-channel audio. SIR measures how much interference from other sources remains in the separated source. A higher SIR means the separated source contains less interference, which indicates better model performance.
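To make the idea of interference concrete, here is a sketch in the style of the BSS Eval decomposition with time-invariant gains, for a single channel. The estimate is projected onto the target source to get s_target, and onto the span of all true sources; the difference is the interference term e_interf, and SIR = 10 log10(||s_target||^2 / ||e_interf||^2). This is an illustrative assumption, not necessarily how the project computes SIR, and all function names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b))


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is non-singular (the sources are linearly independent).
    """
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def sir_db(estimate, sources, target_index, eps=1e-12):
    """SIR in dB of one estimated source against all true sources."""
    target = sources[target_index]
    # s_target: the part of the estimate explained by the target alone.
    gain = dot(estimate, target) / dot(target, target)
    s_target = [gain * x for x in target]
    # Projection of the estimate onto the span of all true sources,
    # obtained by solving the Gram system G c = [<estimate, s_i>].
    gram = [[dot(a, b) for b in sources] for a in sources]
    coeffs = solve(gram, [dot(estimate, s) for s in sources])
    projection = [
        sum(c * s[t] for c, s in zip(coeffs, sources))
        for t in range(len(estimate))
    ]
    # e_interf: what the other sources contribute beyond s_target.
    e_interf = [p - t for p, t in zip(projection, s_target)]
    return 10.0 * math.log10((dot(s_target, s_target) + eps) / (dot(e_interf, e_interf) + eps))
```

For multi-channel audio, the same function could be applied per channel and the scores averaged, in line with the other metrics above.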
