Code and data to reproduce the experiments presented in the article "A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries" (EACL 2023)

music-ner-eacl2023

This repository provides Python code to reproduce the experiments from the article "A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries", accepted for publication at EACL 2023.

Installation

git clone git@github.com:deezer/music-ner-eacl2023.git
cd music-ner-eacl2023

Setup

Build the Docker image, then run it in a container with an interactive bash session (the current Docker image requires a CUDA-capable GPU):

$ make build
$ make run-bash

Experiments

Data statistics and preparation

Print the data statistics shown in Table 2 of the paper:

poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset1
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset2
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset3
poetry run python3 music-ner/datasets/stats.py --data_dir=data/dataset4
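As a rough illustration of the kind of counts that go into a dataset statistics table, the following sketch reads a hypothetical CoNLL-style file (one "token TAG" pair per line, blank lines between queries, IOB2 tags) and counts queries, tokens and entities per type. The file format, the helper names and the Artist/WoA labels are assumptions for illustration; the format actually read by stats.py may differ.

```python
from collections import Counter


def read_conll(lines):
    """Group 'token TAG' lines into queries; a blank line ends a query (assumed format)."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def dataset_stats(sentences):
    """Count queries, tokens and entities per type (each entity starts with a B- tag)."""
    entity_counts = Counter(
        tag[2:] for sent in sentences for _, tag in sent if tag.startswith("B-")
    )
    return {
        "queries": len(sentences),
        "tokens": sum(len(s) for s in sentences),
        "entities": dict(entity_counts),
    }


# Hypothetical sample with placeholder entity labels.
sample = """play O
bohemian B-WoA
rhapsody I-WoA
by O
queen B-Artist

something O
like O
daft B-Artist
punk I-Artist
""".splitlines()

print(dataset_stats(read_conll(sample)))
# {'queries': 2, 'tokens': 9, 'entities': {'WoA': 1, 'Artist': 2}}
```

Counting only B- tags relies on strict IOB2 tagging; data that starts entities with I- tags would need the more lenient span extraction used by conlleval-style scorers.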

Prepare the ground-truth sets of seen and rare/unseen entities:

poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset1/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset2/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset3/ --th_seen=1 --th_rare_unseen=0
poetry run python3 music-ner/datasets/create_seen_rare_ds.py --data_dir data/dataset4/ --th_seen=1 --th_rare_unseen=0
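The two thresholds control how test entities are assigned to the seen and rare/unseen sets. One plausible reading, which is an assumption and not taken from create_seen_rare_ds.py, is that an entity counts as seen if it occurs at least th_seen times in the training data and as rare/unseen if it occurs at most th_rare_unseen times; with the values above, that means "seen at least once" versus "never seen". A minimal sketch of that rule (split_by_frequency is a hypothetical helper, and case-insensitive matching is also assumed):

```python
from collections import Counter


def split_by_frequency(train_entities, test_entities, th_seen=1, th_rare_unseen=0):
    """Partition test entities by how often they occur in the training data.

    Assumed semantics (not taken from create_seen_rare_ds.py):
    seen if train count >= th_seen, rare/unseen if train count <= th_rare_unseen.
    Entities whose count falls between the two thresholds go in neither set.
    """
    # Case-insensitive matching is an assumption of this sketch.
    train_counts = Counter(text.lower() for text in train_entities)
    seen, rare_unseen = [], []
    for text in test_entities:
        count = train_counts[text.lower()]
        if count >= th_seen:
            seen.append(text)
        elif count <= th_rare_unseen:
            rare_unseen.append(text)
    return seen, rare_unseen


train = ["Queen", "Daft Punk", "queen"]
test = ["Queen", "Radiohead", "Daft Punk", "Björk"]
seen, rare_unseen = split_by_frequency(train, test)
print(seen)         # ['Queen', 'Daft Punk']
print(rare_unseen)  # ['Radiohead', 'Björk']
```

Under this reading, raising th_rare_unseen (for example to 2) would add entities seen once or twice in training to the rare/unseen set, provided th_seen is raised above it as well.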

Fine-tuning

Note: scores may vary slightly between runs, and therefore from the exact values reported in the paper, but these differences are not statistically significant.

Fine-tune several transformer models (BERT, RoBERTa and MPNet) for music NER and print the results (Table 4):

./music-ner/scripts/run_ner_model_selection.sh
poetry run python3 music-ner/tables-and-stats/transformer_baselines.py --results_dir output
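NER results are commonly reported as entity-level scores, where a predicted entity counts as correct only if both its span and its type exactly match a gold entity. The sketch below computes micro-averaged precision, recall and F1 in that way from BIO tag sequences. It is an illustration rather than the repository's evaluation code; the helper names and tag labels are hypothetical.

```python
def bio_spans(tags):
    """Return the set of (type, start, end) entity spans in a BIO tag list (end exclusive)."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        # An open entity ends at O, at a new B-, or at an I- of another type.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            start, etype = None, None
        # A new entity starts at B-, or at a stray I- (lenient, conlleval-style).
        if prefix in ("B", "I") and start is None:
            start, etype = i, label
    return spans


def entity_prf(gold_seqs, pred_seqs):
    """Micro-averaged exact-match entity precision, recall and F1."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["O", "B-WoA", "I-WoA", "O", "B-Artist"], ["O", "O", "B-Artist", "I-Artist"]]
pred = [["O", "B-WoA", "O", "O", "B-Artist"], ["O", "O", "B-Artist", "I-Artist"]]
print([round(x, 3) for x in entity_prf(gold, pred)])  # [0.667, 0.667, 0.667]
```

The truncated WoA prediction in the first query earns no partial credit: exact-match scoring treats it as both a false positive and a false negative.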

Fine-tune BERT for music NER, export the human annotation results in the same JSON format as that produced by the transformers, and print the results (Tables 5 and 6):

./music-ner/scripts/run_ner.sh
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset1 --output_dir output/dataset1
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset2 --output_dir output/dataset2
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset3 --output_dir output/dataset3
poetry run python3 music-ner/src/compute_human_performance.py --data_dir data/dataset4 --output_dir output/dataset4
poetry run python3 music-ner/tables-and-stats/human_vs_bert.py --results_dir output
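Exporting the human annotations in the same JSON format as the model outputs lets a single script compare both kinds of predictor. The sketch below illustrates the idea with a hypothetical layout (one <predictor>_results.json file per dataset directory, holding an f1 key); the actual file names and keys written by compute_human_performance.py and the fine-tuning scripts may differ, and the scores in the usage lines are dummy placeholders, not results from the paper.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout: <results_dir>/<dataset>/<predictor>_results.json
SUFFIX = "_results.json"


def export_results(metrics, output_dir, predictor):
    """Write one predictor's metrics dict as JSON into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{predictor}{SUFFIX}"
    path.write_text(json.dumps(metrics, indent=2))
    return path


def collect_f1(results_dir):
    """Build {dataset: {predictor: f1}} from every results file found."""
    table = {}
    for path in sorted(Path(results_dir).glob(f"*/*{SUFFIX}")):
        predictor = path.name[: -len(SUFFIX)]
        metrics = json.loads(path.read_text())
        table.setdefault(path.parent.name, {})[predictor] = metrics["f1"]
    return table


# Usage with dummy placeholder scores (not results from the paper).
with tempfile.TemporaryDirectory() as tmp:
    export_results({"f1": 0.5}, Path(tmp) / "dataset1", "bert")
    export_results({"f1": 0.4}, Path(tmp) / "dataset1", "human")
    print(collect_f1(tmp))  # {'dataset1': {'bert': 0.5, 'human': 0.4}}
```

Because both predictors share one format, the comparison code never needs to know whether a file came from a fine-tuned model or from human annotators.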

Run the experiments on the seen and rare/unseen ground-truth sets and print the results (Table 7):

./music-ner/scripts/run_ner_seen_ents.sh
./music-ner/scripts/run_ner_rare_unseen_ents.sh
poetry run python3 music-ner/tables-and-stats/seen_vs_unseen.py --results_dir output

Reproduce Figure 1 with the detailed error analysis for BERT and human predictors:

poetry run python3 music-ner/tables-and-stats/graph_error_analysis.py --results_dir output
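One common way to break down entity-level NER errors is to compare each gold span with the predicted spans that overlap it. The sketch below uses that scheme (correct, wrong type, wrong boundary, wrong type and boundary, missed, spurious); it is an assumption and may not match the exact categories plotted by graph_error_analysis.py. Spans are hypothetical (type, start, end) tuples with an exclusive end.

```python
from collections import Counter


def overlaps(a, b):
    """True if two (type, start, end) spans share at least one token."""
    return a[1] < b[2] and b[1] < a[2]


def classify_errors(gold, pred):
    """Count outcome categories for one query's gold and predicted spans."""
    counts = Counter()
    for g in gold:
        if g in pred:
            counts["correct"] += 1
            continue
        hits = [p for p in pred if overlaps(g, p)]
        if not hits:
            counts["missed"] += 1
        elif any(p[1:] == g[1:] for p in hits):  # same boundaries, other type
            counts["wrong_type"] += 1
        elif any(p[0] == g[0] for p in hits):  # same type, other boundaries
            counts["wrong_boundary"] += 1
        else:
            counts["wrong_type_and_boundary"] += 1
    # Predictions that overlap no gold entity at all.
    counts["spurious"] += sum(1 for p in pred if not any(overlaps(p, g) for g in gold))
    return counts


gold = [("WoA", 1, 3), ("Artist", 4, 5), ("Artist", 7, 9)]
pred = [("WoA", 1, 2), ("WoA", 4, 5), ("Artist", 10, 11)]
print(dict(classify_errors(gold, pred)))
# {'wrong_boundary': 1, 'wrong_type': 1, 'missed': 1, 'spurious': 1}
```

Summing these counters over all queries, separately for each predictor, gives the per-category totals that an error-analysis chart would display.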

Paper

Please cite our paper if you use this data or code in your work:

@InProceedings{Epure2023,
  title={A Human Subject Study of Named Entity Recognition (NER) in Conversational Music Recommendation Queries},
  author={Epure, Elena and Hennequin, Romain},
  booktitle={Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  month={May},
  year={2023}
}
