Recognizing Musical Entities in User-generated Content

We present a novel method for detecting musical entities in user-generated content, modelling linguistic features with statistical models and extracting contextual information from a radio schedule. We analyzed tweets related to a classical music radio station, integrating its schedule to connect users' messages to the tracks broadcast.

This repository contains code to reproduce the results of our arXiv paper.

Reference:

Lorenzo Porcaro, Horacio Saggion (2019). Recognizing Musical Entities in User-generated Content. Paper presented at the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing) 2019, University of La Rochelle, La Rochelle, 7-13 April.

Contact:

lorenzo.porcaro at gmail.com

Reproduce our results

Installation:

Create a Python 2.7 (sorry!) virtual environment and install the dependencies:

pip install -r src/requirements.txt

Update config file:

Edit etc/config.yaml and insert your consumer key, consumer secret, access token and access secret from the Twitter API. More info about the API: https://developer.twitter.com/
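
Before running the scripts, it may help to check that no credential was left empty. The sketch below works on the mapping obtained after loading the YAML file. The key names in REQUIRED_KEYS are assumptions derived from the four credentials listed above, and missing_credentials is a hypothetical helper, not part of this repository; check etc/config.yaml for the actual key names.

```python
# Hypothetical key names: check etc/config.yaml for the actual ones.
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_secret")


def missing_credentials(config):
    """Return the required keys that are absent or empty in `config`.

    `config` is the mapping obtained after loading the YAML file.
    """
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]
```

An empty list means that all four values are filled in.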

Import data:

To receive the data for reproducing the experiment, please contact lorenzo.porcaro at gmail.com. Once you have received it, see the README in the data folder for more info.

Pre-process data:

To pre-process the data, run:

python src/hydrate_tweet.py -i ../path/to/input/file.json

It reads the tweet IDs and related annotations from the input file and creates the following output files:

INPUTFILE_entities.csv: list of entities annotated
INPUTFILE_summary.csv: tweets summary information (creation date, raw text, etc)
INPUTFILE_text_tkn.txt: tweet raw texts tokenized
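
As a quick sanity check after this step, the expected output paths can be derived from the input path. This is a minimal sketch, assuming that INPUTFILE is the input file name without its extension and that the outputs are written next to the input file; neither is confirmed by the script itself. expected_outputs and missing_outputs are hypothetical helpers, not part of this repository.

```python
import os

# Suffixes of the files listed above as created by hydrate_tweet.py.
OUTPUT_SUFFIXES = ("_entities.csv", "_summary.csv", "_text_tkn.txt")


def expected_outputs(input_path):
    """Return the output paths we would expect for a given input file.

    Assumption: INPUTFILE is the input name without its extension, and
    the outputs sit in the same directory as the input.
    """
    stem, _ = os.path.splitext(input_path)
    return [stem + suffix for suffix in OUTPUT_SUFFIXES]


def missing_outputs(input_path):
    """List the expected output files that do not exist yet."""
    return [path for path in expected_outputs(input_path) if not os.path.exists(path)]
```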

Extract features:

To extract the required features from the data, run:

python src/extract_features.py -i ../path/to/INPUTFILE_summary.csv -e ../path/to/INPUTFILE_entities.csv -o ../path/to/OUTPUTFILE_WEKA.csv -n ../path/to/OUTPUTFILE_biLSTM_CRF.csv

It extracts the features needed for the experiments from the input tweets. It takes INPUTFILE_summary.csv and INPUTFILE_entities.csv as input and creates two output files: one that can be used as input to WEKA, and one that can be used as input to a BiLSTM-CNN-CRF implementation for sequence tagging.

Schedule matching:

To run the matching against the schedule, run:

python src/schedule_matcher.py -w work_tsl -c contr_tsl -t time_tsl -i ../path/to/UGC_INPUTFILE_summary.csv -s ../path/to/SCHEDULE_INPUTFILE_summary.csv

It searches for matches between the entities annotated in the schedule tweets and in the user-generated tweets, and writes the results to a text file in CoNLL format. The input parameters are the two input summary files and the thresholds:

time_tsl (int): time-distance threshold (in seconds) between schedule tweet and user-generated tweet
work_tsl (float): string similarity threshold for Musical Work entities
contr_tsl (float): string similarity threshold for Contributor entities

The output file is written to results/schedule_matcher_%s_%s_%s.txt, where each %s in the file path is replaced by one of the threshold values used.
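
To illustrate how the three thresholds interact, here is a minimal sketch of a single comparison between a user-generated tweet and a schedule tweet. It is not the code in src/schedule_matcher.py: the dictionary layout is hypothetical, and difflib's SequenceMatcher is only a stand-in for whatever string similarity measure the script actually uses.

```python
from datetime import datetime
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; difflib is a stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_entities(user_tweet, schedule_tweet, work_tsl, contr_tsl, time_tsl):
    """Report which entity types of a user tweet match a schedule tweet.

    Both tweets are hypothetical dicts with the keys 'time' (datetime),
    'work' and 'contributor' (strings).
    """
    # Tweets further apart than time_tsl seconds are never matched.
    delta = abs((user_tweet["time"] - schedule_tweet["time"]).total_seconds())
    if delta > time_tsl:
        return {"work": False, "contributor": False}
    return {
        "work": similarity(user_tweet["work"], schedule_tweet["work"]) >= work_tsl,
        "contributor": similarity(
            user_tweet["contributor"], schedule_tweet["contributor"]
        ) >= contr_tsl,
    }
```

Lowering time_tsl restricts the comparison to tweets posted close to the broadcast, while raising work_tsl or contr_tsl demands a closer string match for the corresponding entity type.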

To evaluate the results of the schedule matching, run:

src/conlleval < results/schedule_matcher_%s_%s_%s.txt > results/score.schedule_matcher_%s_%s_%s.txt
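
For reference, conlleval reads one token per line, takes the last two columns as the gold tag and the predicted tag, and treats an empty line as a sentence boundary. The sketch below serialises tagged tweets in that layout. to_conll is a hypothetical helper, and the tag names in the usage example (B-WORK, B-CONTR) are assumptions, not necessarily the labels used in this repository.

```python
def to_conll(sentences):
    """Serialise sentences as CoNLL-style text.

    `sentences` is a list of sentences, each a list of
    (token, gold_tag, predicted_tag) tuples.
    """
    blocks = []
    for sentence in sentences:
        lines = ["%s %s %s" % (tok, gold, pred) for tok, gold, pred in sentence]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive sentences.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    example = [[("Beethoven", "B-CONTR", "B-CONTR"), ("tonight", "O", "O")]]
    print(to_conll(example))
```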
