PKU Team Code for DCASE 2021 Task6

This repository contains the PKU team's code for DCASE 2021 Task 6 (Automated Audio Captioning).

Setting up the Code and Environment

Clone this repository: https://github.com/WangHelin1997/DCASE2021_Task6_PKU.git
Install PyTorch >= 1.4.0.
Use pip to install the dependencies: pip install -r requirements.txt
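If you are unsure whether the installed PyTorch meets the >= 1.4.0 requirement, the small helper below compares version strings. It is a sketch and not part of the repository. It only handles numeric versions, drops local build tags such as +cu117, and ignores pre-release suffixes such as a0 or rc1.

```python
def parse_version(v: str) -> tuple:
    """Turn '1.13.1+cu117' or '2.0.0a0' into a comparable (major, minor, patch) tuple."""
    core = v.split("+")[0]  # drop local build tags such as '+cu117'
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at pre-release markers such as 'a0' or 'rc1'
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # '1.10' is treated as '1.10.0'
    return tuple(parts)


def meets_min_version(installed: str, minimum: str = "1.4.0") -> bool:
    """True if `installed` is at least `minimum` (pre-release suffixes are ignored)."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, after import torch, call meets_min_version(torch.__version__).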

Preparing the data

Download the Clotho dataset for the DCASE 2021 Automated Audio Captioning challenge. For preparing the training data and setting up coco caption, please refer to the instructions of the DCASE 2020 BUPT team.
Enter the audio_tag directory.
First, run python generate_word_list.py to create the word list word_list_pretrain_rules.p and TaggingtoEmbs, which maps tagging words to indexes of the embedding layer.
Then run python generate_tag.py to generate audioTagName_{development/validation/evaluation}_fin_nv.pickle and audioTagNum_{development/validation/evaluation}_fin_nv.pickle.
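The exact contents of these pickles are not documented here, so the following is only a hypothetical sketch of what the two steps might compute: a word list built from the captions, a mapping from tagging words to their indexes in that list (the role TaggingtoEmbs plays), and the tag names and tag numbers for one clip. All function names, the special tokens, and the tokenization rule are assumptions, not the actual logic of generate_word_list.py or generate_tag.py.

```python
import re
from collections import Counter


def tokenize(caption):
    # Hypothetical tokenizer: lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", caption.lower())


def build_word_list(captions, min_count=1, specials=("<sos>", "<eos>", "<pad>")):
    """Special tokens first, then every word seen at least `min_count` times, sorted."""
    counts = Counter(w for c in captions for w in tokenize(c))
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return list(specials) + words


def tags_to_embedding_indexes(tag_words, word_list):
    """Map each tagging word to its row in the word embedding layer (skips unknown tags)."""
    index = {w: i for i, w in enumerate(word_list)}
    return {t: index[t] for t in tag_words if t in index}


def tags_for_clip(clip_captions, tag_words):
    """Tag names and tag numbers (positions in tag_words) found in any caption of one clip."""
    present = {w for c in clip_captions for w in tokenize(c)}
    names = [t for t in tag_words if t in present]
    nums = [tag_words.index(t) for t in names]
    return names, nums
```

The results could then be written with pickle.dump(obj, f) on a file opened in binary mode ("wb").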

Configuration

The training configuration is stored in hparams.py; edit it to set your own parameters.
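The actual fields in hparams.py are not listed in this README. As a hypothetical sketch, a frozen dataclass keeps the settings in one place and validates mode. The epoch counts below come from this README; the field names, batch_size, and lr are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HParams:
    # Hypothetical field names; check hparams.py for the real ones.
    mode: str = "train"            # "train" or "test"
    tag_frozen_epochs: int = 80    # tagging model trained with the CNN frozen
    tag_finetune_epochs: int = 25  # tagging model fine-tuned afterwards
    decoder_epochs: int = 30       # captioning decoder trained with the encoder frozen
    batch_size: int = 32           # placeholder value
    lr: float = 1e-4               # placeholder value

    def __post_init__(self):
        if self.mode not in ("train", "test"):
            raise ValueError(f"mode must be 'train' or 'test', got {self.mode!r}")


def override(hp: HParams, **changes) -> HParams:
    """Return a copy with some fields changed; validation runs again on the copy."""
    return replace(hp, **changes)
```

For instance, override(HParams(), mode="test") gives the test-mode settings described in the Test section, without mutating the defaults.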

Training tagging model

Run python Tag_train.py. The tagging model is first trained with the CNN frozen for 80 epochs and then fine-tuned for 25 epochs. The final tagging mAP reaches 0.287 on the evaluation split.
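The README does not say exactly how Tag_train.py computes mAP. For reference, the sketch below uses the standard definition: for each tag, average the precision at every rank where a positive clip appears, then average over the tags that have at least one positive. Library implementations (for example scikit-learn's average_precision_score) may handle tied scores differently, so results can differ slightly.

```python
def average_precision(scores, labels):
    """AP for one tag: mean of precision@k over the ranks k where a positive clip appears."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(score_matrix, label_matrix):
    """Both matrices are lists of rows (clips) by columns (tags)."""
    n_tags = len(score_matrix[0])
    aps = []
    for j in range(n_tags):
        ap = average_precision([row[j] for row in score_matrix],
                               [row[j] for row in label_matrix])
        if ap is not None:  # skip tags with no positive clip
            aps.append(ap)
    return sum(aps) / len(aps)
```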

Training the captioning model

We use the 25th-epoch checkpoint of the keyword (tagging) pre-trained model as the final encoder.
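Reusing the tagging checkpoint as the captioning encoder usually means copying only the encoder weights. The helper below is a hypothetical sketch: the "encoder." key prefix, the removal of a "module." prefix (added when a model is saved from nn.DataParallel), and the checkpoint file name in the comment are all assumptions.

```python
def extract_encoder_weights(state_dict, prefix="encoder."):
    """Keep only entries under `prefix` and strip it, so keys match a standalone encoder."""
    out = {}
    for key, tensor in state_dict.items():
        if key.startswith("module."):  # checkpoints saved from nn.DataParallel
            key = key[len("module."):]
        if key.startswith(prefix):
            out[key[len(prefix):]] = tensor
    return out

# Hypothetical usage with PyTorch (file name and attribute names are assumptions):
#   ckpt = torch.load("tag_epoch_25.pt", map_location="cpu")
#   model.encoder.load_state_dict(extract_encoder_weights(ckpt))
```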

Train baseline model

Run python run.py. This freezes the encoder and trains only the decoder for 30 epochs. The model that performs best on the validation split is used for the next training stage.
Validation scores are printed after every epoch.
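Picking the checkpoint that performs best on the validation split can be as simple as the sketch below. It assumes the per-epoch validation scores are collected as dictionaries and that SPIDEr (the ranking metric of DCASE 2021 Task 6) is the selection criterion; run.py may use a different metric or data layout.

```python
def select_best_epoch(val_scores, metric="SPIDEr"):
    """val_scores: one dict of metric values per epoch, epoch 1 first.

    Returns (best_epoch, best_value). Strict '>' keeps the earliest epoch on ties.
    """
    best_epoch, best_value = None, float("-inf")
    for epoch, scores in enumerate(val_scores, start=1):
        if scores[metric] > best_value:
            best_epoch, best_value = epoch, scores[metric]
    return best_epoch, best_value
```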

Train baseline model by optimizing CIDEr

Run python run_rl.py to train the model by optimizing CIDEr.
Validation scores are printed after every epoch.
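A common way to optimize CIDEr directly is self-critical sequence training (SCST): the reward of a sampled caption is its CIDEr score minus the CIDEr score of the greedy-decoded caption, and that advantage weights the caption's log-likelihood. Whether run_rl.py uses exactly this baseline is an assumption. The sketch works on plain Python lists; in a real PyTorch loop the log-probabilities would be tensors and the rewards constants without gradients.

```python
def scst_loss(sample_log_probs, sample_rewards, greedy_rewards):
    """Batch-mean SCST policy-gradient loss to minimise.

    sample_log_probs: per clip, the token log-probabilities of one sampled caption.
    sample_rewards:   CIDEr score of each sampled caption.
    greedy_rewards:   CIDEr score of the greedy caption for the same clip (the baseline).
    """
    total = 0.0
    for log_probs, r_sample, r_greedy in zip(sample_log_probs, sample_rewards, greedy_rewards):
        advantage = r_sample - r_greedy  # positive if sampling beat greedy decoding
        total += -advantage * sum(log_probs)  # raises likelihood of captions with positive advantage
    return total / len(sample_rewards)
```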

Eval

Run python eval.py to get the scores of a single model.
Run python eval_ensemble.py to get the scores of an ensemble of models.
- Modify eval_ensemble.py to ensemble checkpoints from different epochs.
- Alternatively, modify ensemble.py to ensemble models trained with different random seeds.
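One common way to ensemble captioning models is to average the models' next-token probability distributions at every decoding step and choose the token from the average. Whether eval_ensemble.py and ensemble.py combine models exactly this way is an assumption; the sketch shows a single greedy decoding step.

```python
import math


def ensemble_next_token(per_model_log_probs):
    """Average probabilities (not log-probabilities) across models for one decoding step.

    per_model_log_probs: one list per model, each of log-probabilities over the same vocabulary.
    Returns (best_token_id, averaged_log_probs).
    """
    n_models = len(per_model_log_probs)
    vocab = len(per_model_log_probs[0])
    avg = [sum(math.exp(lp[v]) for lp in per_model_log_probs) / n_models for v in range(vocab)]
    best = max(range(vocab), key=lambda v: avg[v])
    return best, [math.log(p) if p > 0 else float("-inf") for p in avg]
```

Averaging probabilities lets a token that several models consider likely win even if no single model ranks it first.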

Test

Set mode=test in hparams.py. Then run python train.py, python train_rl.py, or python ensemble.py to get the final results on the test split.

Cite

This code implements the methods described in the papers below (the DCASE2021 workshop paper and the challenge technical report).

You can cite it as follows:

@inproceedings{Ye2021,
    author = "Ye, Zhongjie and Wang, Helin and Yang, Dongchao and Zou, Yuexian",
    title = "Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information",
    booktitle = "Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)",
    address = "Barcelona, Spain",
    month = "November",
    year = "2021",
    pages = "40--44"
}

or

@techreport{ye2021_t6,
    author = "Ye, Zhongjie and Wang, Helin and Yang, Dongchao and Zou, Yuexian",
    title = "Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Textual Information",
    institution = "DCASE2021 Challenge",
    year = "2021",
    month = "July"
}
