Audio Classification

Dependency Setup

Create a new conda virtual environment

conda create --name audio_classify python=3.7 -y
conda activate audio_classify

Installation

conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 -c pytorch -y
git clone https://github.com/chingi071/Audio_Classification
cd Audio_Classification
pip install -r requirements.txt

Dataset Preparation

Open-source audio datasets

Tomofun-AI dog sound recognition: https://github.com/lawrencechen0921/Tomofun-AI-

Kaggle Audio Cats and Dogs: https://www.kaggle.com/mmoreaux/audio-cats-and-dogs

Kaggle Freesound General-Purpose Audio Tagging Challenge: https://www.kaggle.com/c/freesound-audio-tagging/data

Data Preprocessing

To train on your own dataset, prepare the following items:

The training/validation audio files
The data label CSV
The dataset YAML

Using the Kaggle Audio Cats and Dogs dataset as an example, first sort the audio files into separate folders, one per category.

Next, create the data label CSV with the following notebook:

create_data_csv.ipynb
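The notebook itself is not reproduced here. As a rough sketch of the same idea, the standard-library function below walks a folder-per-category layout and writes one CSV row per clip. The function name build_label_csv, the column names filename and label, and the .wav-only filter are hypothetical and may not match what create_data_csv.ipynb actually writes.

```python
import csv
import os


def build_label_csv(root_dir, out_csv, exts=(".wav",)):
    """Write one CSV row per audio file: (path relative to root_dir, class index).

    Assumes the layout root_dir/<class_name>/<file>.wav. Class indices follow
    alphabetical folder order. Column names are hypothetical.
    """
    classes = sorted(
        d for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))
    )
    rows = []
    for label, class_name in enumerate(classes):
        class_dir = os.path.join(root_dir, class_name)
        for name in sorted(os.listdir(class_dir)):
            # Case-insensitive extension check, so "b.WAV" is also picked up.
            if name.lower().endswith(exts):
                rows.append((os.path.join(class_name, name), label))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])
        writer.writerows(rows)
    return classes, rows
```

For example, build_label_csv("cat_dog_data", "cat_dog_train.csv") would map the cat folder to 0 and the dog folder to 1; cat_dog_data is a placeholder folder name.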

Finally, create the dataset YAML:

cat_dog.yaml

For the Tomofun-AI dataset, run the following preprocessing notebook to generate tomofun_train.csv:

Tomofun_data_preprocessing.ipynb

Then create the dataset YAML.

The Tomofun-AI dataset structure is as follows:

train
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_train.csv
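If you need a validation split and only have a single labelled list such as tomofun_train.csv, one option is a per-class (stratified) split, which keeps the label proportions similar in both subsets. The sketch below assumes rows are (filename, label) pairs; stratified_split and its parameters are hypothetical and not part of this repository.

```python
import random
from collections import defaultdict


def stratified_split(rows, valid_ratio=0.2, seed=0):
    """Split (filename, label) rows into train/valid lists class by class.

    Each class contributes round(len(class) * valid_ratio) rows to the
    validation set, so label proportions stay roughly equal in both sets.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, valid = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_valid = round(len(group) * valid_ratio)
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid
```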

Data Augmentation

We use Audiomentations to generate additional augmented training samples with the following notebook:

data_augmentation.ipynb
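Audiomentations provides ready-made transforms for this, such as AddGaussianNoise and AddGaussianSNR. To show what an SNR-controlled noise transform computes, here is a plain-Python sketch: since SNR_dB = 10 * log10(P_signal / P_noise), the noise standard deviation is sqrt(P_signal / 10^(SNR_dB / 10)). The function add_gaussian_noise_snr is illustrative only and is not the code used in data_augmentation.ipynb.

```python
import math
import random


def add_gaussian_noise_snr(samples, snr_db, seed=None):
    """Return a noisy copy of `samples` with white Gaussian noise at `snr_db` dB.

    P_noise = P_signal / 10 ** (snr_db / 10), and the noise std is sqrt(P_noise).
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in samples]
```

Lower snr_db values produce noisier clips; libraries typically draw the SNR at random from a range for each clip.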

The augmented dataset keeps the original files alongside the augmented copies (prefixed with aug_). Its structure is as follows:

tomofun_aug_train
├── aug_0_train_00001.wav
├── aug_0_train_00002.wav
├── ...
├── train_00001.wav
├── train_00002.wav
├── ...
└── train_01200.wav
tomofun_aug_train.csv
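Each augmented clip needs its own row in tomofun_aug_train.csv. A minimal sketch of that bookkeeping, assuming rows are (filename, label) pairs and that the k-th augmentation pass uses the prefix aug_<k>_ (only aug_0_ is visible in the listing above, so numbering beyond 0 is an assumption); augmented_rows is a hypothetical helper:

```python
def augmented_rows(rows, n_copies):
    """Return the original rows plus one aug_<k>_ copy per row for each pass k.

    Each augmented copy inherits the label of its source file.
    """
    out = list(rows)  # keep the original, un-augmented clips
    for k in range(n_copies):
        for filename, label in rows:
            out.append((f"aug_{k}_{filename}", label))
    return out
```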

Data Visualization

data_visualize.ipynb
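Before plotting, it can help to confirm that all clips share a sample rate and a similar length. The hypothetical helper below reads a WAV header with Python's standard-library wave module; note that wave handles uncompressed PCM WAV files only.

```python
import wave


def wav_info(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "frames": frames,
            "duration_s": frames / rate,  # frames per channel / sample rate
        }
```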

Training

Available models (pass one of these names to --model): ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, SENet, DenseNet, Convnext_tiny, Convnext_small, Convnext_base, Convnext_large

Train on one GPU

python train.py --yaml_file=tomofun.yaml --model=ResNet18 --model_saved_path=workdirs

Train on multiple GPUs

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs
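With torch.distributed.launch, each worker process needs to know which GPU it owns. The launcher appends a --local_rank argument to every process (unless --use_env is given) and exports RANK and WORLD_SIZE. The sketch below recovers those values; parse_dist_args is a hypothetical helper, and train.py may handle this differently.

```python
import argparse
import os


def parse_dist_args(argv=None, env=None):
    """Return (local_rank, rank, world_size) for the current process.

    Reads --local_rank from the command line (falling back to the LOCAL_RANK
    environment variable) and RANK / WORLD_SIZE from the environment.
    Single-process runs default to rank 0 of a world of size 1.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int,
                        default=int(env.get("LOCAL_RANK", 0)))
    # parse_known_args ignores the script's other flags (e.g. --yaml_file).
    args, _ = parser.parse_known_args(argv)
    rank = int(env.get("RANK", args.local_rank))
    world_size = int(env.get("WORLD_SIZE", 1))
    return args.local_rank, rank, world_size
```

A process would then typically call torch.cuda.set_device(local_rank) and torch.distributed.init_process_group(backend="nccl"). Because CUDA_VISIBLE_DEVICES renumbers the visible GPUs, in a job launched with CUDA_VISIBLE_DEVICES=2,3 local rank 0 maps to physical GPU 2.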

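To run a second multi-GPU training job at the same time on other GPUs, give it a different --master_port; otherwise both jobs would try to use the launcher's default port (29500):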

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 --master_port 9999 train.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs
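Rather than hard-coding a port such as 9999, you can ask the operating system for one that is currently unused by binding a socket to port 0. This is a small standard-library sketch (find_free_port is a hypothetical helper); the port is only guaranteed free at the moment of the check, so another process could still claim it before the launcher starts.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Return a TCP port number that the OS reports as currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS pick an ephemeral port
        return s.getsockname()[1]
```

Pass the returned number to --master_port when starting the second job.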

Start TensorBoard

tensorboard --logdir runs

Predict

python predict.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --test_data=test_data
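How predict.py turns model outputs into labels is not shown here. Assuming the network emits one raw score (logit) per class, a typical final step is a softmax followed by an argmax; the plain-Python sketch below shows that step for a single clip. predict_label and the class names in any call to it are hypothetical.

```python
import math


def predict_label(logits, class_names):
    """Convert one clip's logits into (predicted class name, its probability)."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]  # softmax
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return class_names[best], probs[best]
```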

Convert to ONNX

pip install onnx onnxruntime==1.6.0

python convert_to_onnx.py --yaml_file=tomofun.yaml --model=Convnext_tiny --model_saved_path=workdirs --model_weights=best.pth

python onnx_predict.py --test_data=test_data

Record audio

pip install pyaudio

Record an audio file with the following notebook:

record.ipynb
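A pyaudio recording loop typically collects raw 16-bit PCM chunks with stream.read(chunk) from a stream opened with format=pyaudio.paInt16 and input=True. The hypothetical helper below shows how such chunks can be saved with the standard-library wave module; the 16 kHz mono defaults are assumptions and should match the sample rate your model was trained on.

```python
import wave


def save_frames_to_wav(frames, path, channels=1, rate=16000, sample_width=2):
    """Write a list of raw PCM byte chunks to a WAV file.

    sample_width=2 corresponds to 16-bit samples (pyaudio.paInt16).
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
```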

Result

Sample output of a two-GPU training run (each process prints its own metrics):

device: cuda:1, rank: 1, world_size: 2
device: cuda:0, rank: 0, world_size: 2
Train_Epoch: 0/99, Training_Loss: 0.011717653522888819 Training_acc: 0.42
Train_Epoch: 0/99, Training_Loss: 0.012225324138998985 Training_acc: 0.40
Valid_Epoch: 0/99, Valid_Loss: 0.010406222939491273 Valid_acc: 0.49
Valid_Epoch: 0/99, Valid_Loss: 0.01043313001592954 Valid_acc: 0.48
--------------------------------
Train_Epoch: 1/99, Training_Loss: 0.00876050346220533 Training_acc: 0.54
Train_Epoch: 1/99, Training_Loss: 0.008517718284080426 Training_acc: 0.56
Valid_Epoch: 1/99, Valid_Loss: 0.008887257364888986 Valid_acc: 0.57
Valid_Epoch: 1/99, Valid_Loss: 0.008429310657083989 Valid_acc: 0.58
--------------------------------

............

Train_Epoch: 99/99, Training_Loss: 4.295512663895462e-06 Training_acc: 1.00
Valid_Epoch: 99/99, Valid_Loss: 0.0004894535513647663 Valid_acc: 0.99
Train_Epoch: 99/99, Training_Loss: 2.0122603179591654e-06 Training_acc: 1.00
Valid_Epoch: 99/99, Valid_Loss: 0.0006921298647505341 Valid_acc: 0.99
--------------------------------
Finished Training.
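To plot accuracy and loss curves from a saved copy of this console output, the metric lines can be parsed with a regular expression. This sketch (parse_log is a hypothetical helper) assumes the exact line format shown above; because each of the two processes prints its own metrics, a two-GPU log yields two entries per epoch and phase.

```python
import re

# Matches e.g. "Train_Epoch: 0/99, Training_Loss: 0.0117 Training_acc: 0.42"
LOG_RE = re.compile(
    r"(Train|Valid)_Epoch: (\d+)/\d+, "
    r"(?:Training|Valid)_Loss: ([\d.eE+-]+) "
    r"(?:Training|Valid)_acc: ([\d.]+)"
)


def parse_log(lines):
    """Collect (epoch, loss, acc) tuples per phase ("Train" / "Valid")."""
    history = {"Train": [], "Valid": []}
    for line in lines:
        m = LOG_RE.search(line)
        if m:  # skip separators, device lines, etc.
            phase, epoch, loss, acc = m.groups()
            history[phase].append((int(epoch), float(loss), float(acc)))
    return history
```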

Accuracy

Loss

Confusion Matrix
