This repository contains PyTorch implementations of several models for sound/audio classification.
MFCC + TDNN (Mel-Frequency Cepstral Coefficients + Time-Delay Neural Network)
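As a reference, a TDNN over MFCC frames can be sketched in PyTorch as below; the layer sizes, dilations, and pooling are illustrative assumptions, not taken from ./ackit/models/tdnn.py.

```python
import torch
import torch.nn as nn

class TDNN(nn.Module):
    """Minimal TDNN sketch: dilated 1-D convolutions over MFCC frames,
    followed by mean/std statistics pooling. Sizes are illustrative."""
    def __init__(self, n_mfcc=40, n_classes=2):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(n_mfcc, 128, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, dilation=3), nn.ReLU(),
        )
        self.fc = nn.Linear(128 * 2, n_classes)  # mean + std pooled stats

    def forward(self, x):                  # x: (batch, n_mfcc, frames)
        h = self.frame_layers(x)           # (batch, 128, frames')
        stats = torch.cat([h.mean(-1), h.std(-1)], dim=1)
        return self.fc(stats)

logits = TDNN()(torch.randn(4, 40, 300))   # 4 clips of 300 MFCC frames
```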
DCASE2020 : ./ackit/data_utils/soundreader2020.py
COUGHVID2021 : ./ackit/data_utils/soundexplore.ipynb : ./coughvid_reader.py
covid19-cough : ../SoundDL-CoughVID/covid19_explore
See soundexplore.ipynb for the preprocessing procedure.
This dataset contains 20,000 audio clips of 10 seconds each, covering 7 machine types and 23 individual machines in total.
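Sets 1 and 2 below cut clips into 2.95 s segments; assuming a 16 kHz sample rate (an assumption, not stated here), the splitting step can be sketched as:

```python
import numpy as np

def split_clip(wave, sr=16000, seg_sec=2.95):
    """Split a waveform into non-overlapping fixed-length segments,
    dropping the ragged tail. sr and seg_sec are illustrative."""
    seg_len = int(sr * seg_sec)
    n = len(wave) // seg_len
    return wave[: n * seg_len].reshape(n, seg_len)

clip = np.zeros(10 * 16000, dtype=np.float32)  # one 10 s clip
segs = split_clip(clip)                        # (3, 47200) at 16 kHz
```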
set1
- input: ./datacreate.ipynb, preprocessing from mariostrbac
- 2.95s
- run: ./ackit/dsptcls.py
set2
- input: ./datacreate.ipynb, dcase2020 mariostrbac
- 2.95s
- run: ./ackit/dsptcls.py
set3 (failed)
- input: ./datacreate.ipynb, reference: COVID-19 Screening From Audio | Part 2
- dataloader: reference: mariostrbac
set4
- dataset: covid19 covid19-cough
- trained cycling through all 10 folds: poor results
- loss: CrossEntropy
set5
- dataset: covid19
- trained on fold 1 only, no cycling: good results
- loss: CrossEntropy
set6
- dataset: covid19
- trained on fold 1 only, no cycling: poor results
- loss: FocalLoss
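Sets 6 and 7 use FocalLoss; a minimal PyTorch sketch (following Lin et al., 2017, with an assumed default gamma=2 and no class weighting — the repo's implementation may differ) is:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss sketch: down-weights easy examples by (1 - p_t)^gamma.
    gamma=2 and the absence of class weights are assumptions."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        # log-probability of the true class for each sample
        logp_t = F.log_softmax(logits, dim=-1).gather(
            1, targets.unsqueeze(1)).squeeze(1)
        p_t = logp_t.exp()
        return (-(1.0 - p_t) ** self.gamma * logp_t).mean()

logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
loss = FocalLoss(gamma=2.0)(logits, targets)
```

With gamma=0 the modulating factor vanishes and the loss reduces to plain cross-entropy, which is a handy sanity check.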
set7 (best results)
- data: coughvid; feature: Mel spectrogram
- pretrained: none; model: mnv2; loss: FocalLoss
- LR: 0.0002; scheduler: torch.optim.lr_scheduler.CosineAnnealingLR; epoch_num: 220; batch_size: 128
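The set7 optimizer/scheduler pairing can be wired up as below; the linear model is a stand-in and the per-epoch training loop is elided:

```python
import torch

# Hyperparameters from set7: LR 0.0002, CosineAnnealingLR, 220 epochs.
# The model is a placeholder; the optimizer choice (Adam) is an assumption.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=220)

for epoch in range(220):
    # ... one training epoch over batch_size=128 mini-batches goes here ...
    optimizer.step()       # placeholder step so the scheduler can advance
    scheduler.step()       # anneals LR from 0.0002 down to ~0 over 220 epochs
```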
- trainer jupyter 1: ./ackit/coughcls_tdnn.ipynb
- trainer jupyter 2: ./ackit/coughcls_tdnn_focalloss.ipynb
- trainer py: ./ackit/trainer_tdnn.py
- model: ./ackit/models/tdnn.py
- dataset: ./ackit/data_utils/coughvid_reader.py
- dataset: ./datasets/waveinfo_annotation.csv
- models: ./ackit/models/cnn_classifier.py
- pretrain_model: ./runs/VAE/model_epoch_12/model_epoch12.pth
- config: ./configs/autoencoder.yaml
- result: ./runs/vae_cnncls/202404181142_ptvae/
- accuracy, precision, recall: calc_accuracy(pred_matrix, label_vec, save_path) in ./ackit/utils/plotter.py
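A hypothetical stand-in for calc_accuracy, computing accuracy and per-class precision/recall from an (N, C) score matrix and an (N,) label vector (the real function's behavior may differ):

```python
import numpy as np

def calc_metrics(pred_matrix, label_vec):
    """Accuracy plus per-class precision/recall via a confusion matrix.
    Hypothetical sketch, not the repo's calc_accuracy."""
    pred = pred_matrix.argmax(axis=1)
    n_classes = pred_matrix.shape[1]
    cm = np.zeros((n_classes, n_classes), dtype=int)  # rows: true, cols: pred
    for t, p in zip(label_vec, pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()
    precision = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)  # per predicted class
    recall = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)     # per true class
    return accuracy, precision, recall

scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
acc, prec, rec = calc_metrics(scores, np.array([0, 1, 1, 1]))
```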
```python
# in trainer_ConvEncoder.py
trainer = TrainerEncoder(istrain=True, isdemo=False)
trainer.train_classifier()
```
```shell
python trainer_ConvEncoder.py
```
```python
# in trainer_ConvEncoder.py
trainer = TrainerEncoder(istrain=False, isdemo=False)
trainer.plot_reduction()
```
```shell
python trainer_ConvEncoder.py
```
See plot_reduction(self, resume_path="202404181142_ptvae") for how the input to t-SNE is constructed, and plot_TSNE(embd, names, save_path) in ./ackit/utils/plotter.py.
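A minimal sketch of the t-SNE step, assuming scikit-learn is available and using random stand-in embeddings in place of the encoder outputs:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the (n_samples, embedding_dim) matrix collected in
# plot_reduction(); the shapes here are illustrative.
embd = np.random.RandomState(0).randn(60, 32).astype(np.float32)
coords = TSNE(n_components=2, perplexity=10, init="pca",
              random_state=0).fit_transform(embd)  # (60, 2) 2-D coordinates
```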
The heatmap follows the same procedure as t-SNE:
```python
# in trainer_ConvEncoder.py
trainer = TrainerEncoder(istrain=False, isdemo=False)
trainer.plot_heatmap()
```
```shell
python trainer_ConvEncoder.py
```
See plot_heatmap(pred_matrix, label_vec, savepath) in ./ackit/utils/plotter.py.
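A hypothetical sketch of such a heatmap (a confusion matrix rendered with matplotlib; the repo's plot_heatmap may differ in layout and signature):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")          # headless backend so savefig works anywhere
import matplotlib.pyplot as plt

def plot_heatmap(pred_matrix, label_vec, savepath):
    """Hypothetical stand-in: build a confusion matrix from an (N, C)
    score matrix and (N,) labels, then render it as a heatmap."""
    n = pred_matrix.shape[1]
    cm = np.zeros((n, n), dtype=int)       # rows: true, cols: predicted
    for t, p in zip(label_vec, pred_matrix.argmax(axis=1)):
        cm[t, p] += 1
    fig, ax = plt.subplots()
    im = ax.imshow(cm, cmap="Blues")
    ax.set_xlabel("predicted")
    ax.set_ylabel("true")
    fig.colorbar(im)
    fig.savefig(savepath)
    plt.close(fig)
    return cm

cm = plot_heatmap(np.eye(3), np.array([0, 1, 2]), "heatmap.png")
```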