
English|简体中文

Audio Classification

This repository contains PyTorch implementations of several models for sound/audio classification.

MFCC + TDNN (Mel Frequency Cepstral Coefficients, Time-Delay Neural Network)
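
For reference, a minimal sketch of MFCC extraction with torchaudio; the file name, sample rate handling, and frame parameters below are illustrative assumptions, not the settings used by this repository.

import torchaudio

# Minimal MFCC extraction sketch for one audio clip (illustrative parameters).
waveform, sr = torchaudio.load("example.wav")               # (channels, samples)
mfcc_transform = torchaudio.transforms.MFCC(
    sample_rate=sr,
    n_mfcc=40,
    melkwargs={"n_fft": 400, "hop_length": 160, "n_mels": 64},
)
features = mfcc_transform(waveform)                         # (channels, n_mfcc, frames)
print(features.shape)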

Datasets

DCASE2020 : ./ackit/data_utils/soundreader2020.py

COUGHVID2021 : ./ackit/data_utils/soundexplore.ipynb, ./coughvid_reader.py

covid19-cough : ../SoundDL-CoughVID/covid19_explore

The preprocessing procedure is described in soundexplore.ipynb.

The DCASE2020 dataset contains 20,000 audio clips, each lasting 10 seconds, covering 7 machine categories and 23 individual machines.

Experiment Sets

dsptcls (Dataset-Pretrain-Classifiers)

cnn: CNNCls

set1

  • input: ./datacreate.ipynb, preprocessing from mariostrbac
  • 2.95s
  • run: ./ackit/dsptcls.py

set2

  • input: ./datacreate.ipynb, dcase2020 mariostrbac
  • 2.95s
  • run: ./ackit/dsptcls.py

set3 Failed

set4

  • dataset: covid19 (covid19-cough)
  • training cycled over 10 folds: results were poor
  • loss: CrossEntropy

set5

  • dataset: covid19
  • trained on fold 1 only (no cycling): results were good
  • loss: CrossEntropy

set6

  • dataset: covid19
  • trained on fold 1 only (no cycling): results were poor
  • loss: FocalLoss

set7 (best-performing set)

  • Data: coughvid; feature: Melspec
  • Pretrained: None; Model: mnv2; Loss: FocalLoss
  • LR: 0.0002; scheduler: torch.optim.lr_scheduler.CosineAnnealingLR; epoch_num: 220; batch_size: 128
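
For reference, a minimal sketch of the set7 configuration, assuming "mnv2" refers to torchvision's MobileNetV2; the FocalLoss, optimizer choice, and class count below are illustrative assumptions, not the repository's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class FocalLoss(nn.Module):
    # Illustrative focal loss (assumption), not the repository's FocalLoss.
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma
    def forward(self, logits, target):
        logp = F.log_softmax(logits, dim=1)
        logp_t = logp.gather(1, target.unsqueeze(1)).squeeze(1)   # log prob of true class
        p_t = logp_t.exp()
        return (-(1.0 - p_t) ** self.gamma * logp_t).mean()

model = mobilenet_v2(num_classes=2)        # "mnv2"; the class count is assumed
criterion = FocalLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)       # optimizer type assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=220)
# epoch_num: 220, batch_size: 128 as listed above; single-channel Mel spectrograms
# would need to be repeated to 3 channels to match MobileNetV2's expected input.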

TDNN - coughvid2021

  • trainer jupyter 1: ./ackit/coughcls_tdnn.ipynb
  • trainer jupyter 2: ./ackit/coughcls_tdnn_focalloss.ipynb
  • trainer py: ./ackit/trainer_tdnn.py
  • model: ./ackit/models/tdnn.py
  • dataset: ./ackit/data_utils/coughvid_reader.py
  • dataset: ./datasets/waveinfo_annotation.csv
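
For orientation, a minimal TDNN sketch (dilated 1-D convolutions over MFCC frames followed by statistics pooling); this is not the repository's ./ackit/models/tdnn.py, and the layer sizes and class count are illustrative.

import torch
import torch.nn as nn

class TinyTDNN(nn.Module):
    # Illustrative TDNN: frame-level dilated convolutions + statistics pooling.
    def __init__(self, feat_dim=40, num_classes=2):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.classifier = nn.Linear(512 * 2, num_classes)

    def forward(self, x):                       # x: (batch, feat_dim, frames)
        h = self.frame_layers(x)
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)   # statistics pooling
        return self.classifier(stats)

logits = TinyTDNN()(torch.randn(8, 40, 300))    # -> (8, 2)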

CNN - dcase2020

  • models: ./ackit/models/cnn_classifier.py
  • pretrain_model: ./runs/VAE/model_epoch_12/model_epoch12.pth
  • config: ./configs/autoencoder.yaml
  • result: ./runs/vae_cnncls/202404181142_ptvae/
  • accuracy, precision, recall: ./ackit/utils/plotter.py calc_accuracy(pred_matrix, label_vec, save_path)
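
A hedged sketch of what such a metric helper might compute from a prediction matrix (per-sample class scores) and a label vector; the actual calc_accuracy lives in ./ackit/utils/plotter.py and may differ.

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

def calc_metrics(pred_matrix, label_vec):
    # Hypothetical helper: argmax over class scores, then macro-averaged metrics.
    pred = np.argmax(pred_matrix, axis=1)
    acc = accuracy_score(label_vec, pred)
    prec = precision_score(label_vec, pred, average="macro", zero_division=0)
    rec = recall_score(label_vec, pred, average="macro", zero_division=0)
    return acc, prec, rec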

run train

trainer = TrainerEncoder(istrain=True, isdemo=False)
trainer.train_classifier()

>>python trainer_ConvEncoder.py

run t-SNE

trainer = TrainerEncoder(istrain=False, isdemo=False)
trainer.plot_reduction()

>>python trainer_ConvEncoder.py

See plot_reduction(self, resume_path="202404181142_ptvae") for how the input to t-SNE is constructed,

and ./ackit/utils/plotter.py plot_TSNE(embd, names, save_path) for the plotting itself.
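
A rough sketch of what a plot_TSNE-style helper could look like with scikit-learn and matplotlib; the repository's own version in ./ackit/utils/plotter.py may differ.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def tsne_scatter(embd, names, save_path):
    # Hypothetical helper: project embeddings to 2-D and save a labeled scatter plot.
    points = TSNE(n_components=2, init="pca", random_state=0).fit_transform(embd)
    plt.figure(figsize=(6, 6))
    plt.scatter(points[:, 0], points[:, 1], s=5)
    for (x, y), name in zip(points, names):
        plt.annotate(name, (x, y), fontsize=6)
    plt.savefig(save_path, dpi=200)
    plt.close()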

run Heatmap

The procedure is the same as for t-SNE.

trainer = TrainerEncoder(istrain=False, isdemo=False)
trainer.plot_heatmap()

>>python trainer_ConvEncoder.py

See ./ackit/utils/plotter.py plot_heatmap(pred_matrix, label_vec, savepath).
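
A rough sketch of a plot_heatmap-style helper, assuming it draws a confusion-matrix heatmap from class scores and ground-truth labels; the repository's version in ./ackit/utils/plotter.py may differ.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def confusion_heatmap(pred_matrix, label_vec, savepath):
    # Hypothetical helper: confusion matrix rendered as an image and saved to disk.
    pred = np.argmax(pred_matrix, axis=1)
    cm = confusion_matrix(label_vec, pred)
    plt.figure(figsize=(5, 5))
    plt.imshow(cm, cmap="Blues")
    plt.colorbar()
    plt.xlabel("Predicted class")
    plt.ylabel("True class")
    plt.savefig(savepath, dpi=200)
    plt.close()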
