Please feel free to use the dataset, code, and models, and consider citing our paper as follows:
@inproceedings{hou23_interspeech,
  author={Yuanbo Hou and Siyang Song and Cheng Luo and Andrew Mitchell and Qiaoqiao Ren and Weicheng Xie and Jian Kang and Wenwu Wang and Dick Botteldooren},
  title={{Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={331--335},
  doi={10.21437/Interspeech.2023-1021}
}
1) Unzip the split archive Dataset.zip.001 ~ Dataset.zip.012 into the application folder
2) Unzip the split archive pretrained_models.zip.001 ~ pretrained_models.zip.037 into the application folder
3) Enter the application folder: cd application
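Split volumes such as Dataset.zip.001 ~ Dataset.zip.012 usually have to be recombined into a single archive before extraction. Below is a minimal Python sketch of that step; it assumes the parts are plain byte-wise splits of one zip file (true for archives produced by common split tools), and the helper function itself is illustrative, not part of this repository:

```python
import glob
import zipfile
from pathlib import Path

def merge_and_extract(prefix: str, dest: str = ".") -> None:
    """Concatenate split parts (prefix.001, prefix.002, ...) and extract.

    `prefix` is the archive name without the numeric suffix,
    e.g. "Dataset.zip" for Dataset.zip.001 ~ Dataset.zip.012.
    """
    # Lexicographic sort orders zero-padded suffixes (.001, .002, ...) correctly.
    parts = sorted(glob.glob(prefix + ".*"))
    merged = Path(prefix)
    with open(merged, "wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())
    with zipfile.ZipFile(merged) as zf:
        zf.extractall(dest)
```

Alternatively, tools like 7-Zip can extract directly from the first volume (e.g. `7z x Dataset.zip.001`).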
python inference.py -model HGRL
----------------------------------------------------------------------------------------
Loading data time: 0.613 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.0493635489891715, mae_loss: 0.8028289986728283, r2: 0.45838324378599105
AEC:
Acc: 0.9171348314606742
python inference.py -model DNN
----------------------------------------------------------------------------------------
Loading data time: 0.395 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.7334889331245231, mae_loss: 1.0105259865535778, r2: 0.10527990628586636
AEC:
Acc: 0.9048689138576779
python inference.py -model CNN
----------------------------------------------------------------------------------------
Loading data time: 0.346 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.6749440582583413, mae_loss: 0.9967529329128479, r2: 0.13549716059065042
AEC:
Acc: 0.9074906367041199
python inference.py -model CNN_Transformer
----------------------------------------------------------------------------------------
Loading data time: 0.401 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.4451657348827343, mae_loss: 0.9664605942522541, r2: 0.25409456210595127
AEC:
Acc: 0.8894194756554307
python inference.py -model PANN_fixed
----------------------------------------------------------------------------------------
Loading data time: 0.388 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.2619049372729334, mae_loss: 0.8799253472853243, r2: 0.3486824852696866
AEC:
Acc: 0.9105805243445693
python inference.py -model PANN_fine_tuning
----------------------------------------------------------------------------------------
Loading data time: 0.392 s
Split development data to 2200 training and 245 validation data and 445 test data.
Number of 445 audios in testing
ARP:
mse_loss: 1.162030314619814, mae_loss: 0.8582522204431255, r2: 0.4002316068315971
AEC:
Acc: 0.9188202247191011
If you want to modify the model structure, edit models_pytorch.py in the framework folder.
If you want to retrain these models, the data loader and model definitions are already in place; you only need to follow the standard PyTorch training procedure to write your own training script.
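As a rough sketch of that standard PyTorch training procedure, the loop below trains a hypothetical joint model with a shared encoder and two heads, one for annoyance rating prediction (ARP, regression) and one for audio event classification (AEC, multi-label). The class and argument names here (JointModel, n_events, alpha) are illustrative stand-ins, not this repository's API; the real architectures live in framework/models_pytorch.py:

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Schematic joint model: shared encoder, ARP head, AEC head."""
    def __init__(self, n_features=64, n_events=15):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU())
        self.arp_head = nn.Linear(128, 1)         # annoyance rating (scalar)
        self.aec_head = nn.Linear(128, n_events)  # event logits (multi-label)

    def forward(self, x):
        h = self.encoder(x)
        return self.arp_head(h).squeeze(-1), self.aec_head(h)

def train_one_epoch(model, loader, optimizer, alpha=1.0):
    """One pass over the loader; joint loss = MSE (ARP) + alpha * BCE (AEC)."""
    mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()
    model.train()
    total = 0.0
    for x, rating, events in loader:
        optimizer.zero_grad()
        pred_rating, event_logits = model(x)
        loss = mse(pred_rating, rating) + alpha * bce(event_logits, events)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)
```

Replace JointModel with a model from models_pytorch.py and the synthetic tensors with the provided data loader to obtain a full training script.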