Extract COVID Entities

Leveraging Event Specific & Chunk Span features to Extract COVID Events - 1st at the leaderboard for EMNLP 2020 WNUT Shared Task-3.

This Repo contains

Code for Models
Trained models used in the final submission.
Dependencies and steps to replicate results.

Relevant Links: arxiv-pdf, slides, poster

Please cite with the following BiBTeX code:

Author = {Ayush Kaushal and Tejas Vaidhya},
Title = {Leveraging Event Specific and Chunk Span features to Extract COVID Events from tweets},
Year = {2020},
Eprint = {arXiv:2012.10052},
Doi = {10.18653/v1/2020.wnut-1.79},
}

Authors: Ayush Kaushal and Tejas Vaidhya

Overview

Abstract

Twitter has acted as an important source of information during disasters and pandemic, especially during the times of COVID-19. In this paper, we describe our system entry for WNUT 2020 Shared Task-3. The task was aimed at automating the extraction of a variety of COVID-19 related events from Twitter, such as individuals who recently contracted the virus, someone with symptoms who were denied testing and believed remedies against the infection. The system consists of separate multi-task models for slot-filling subtasks and sentence-classification subtasks while leveraging the useful sentence-level information for the corresponding event. The system uses COVID-Twitter-Bert with attention-weighted pooling of candidate slot-chunk features to capture the useful information chunks. The system ranks 1st at the leader-board with F1 of 0.6598, without using any ensembles or additional datasets. The code and trained models are available at this https url.

System overview

Our system contains two models, one for sentence classification and one for slot-filling task, both with the following enhancements:

An event-prediction task as auxiliary subtask
Fuse event-prediction features for all the event-specific subtasks
Weighted pooling over the candidate chunk span enabling the model to attend to subtask specific cues
Domain-specific Covid-Twitter Bert

Refer our paper for complete details.

Slot-Filling

Classification

Dependencies and set-up

Dependency	Version	Installation Command
Python	3.8	`conda create --name covid_entities python=3.8` and `conda activate covid_entities`
PyTorch, cudatoolkit	1.5.0, 10.1	`conda install pytorch==1.5.0 cudatoolkit=10.1 -c pytorch`
Transformers (Huggingface)	2.9.0	`pip install transformers==2.9.0`
Scikit-learn	0.23.1	`pip install scikit-learn==0.23.1`
scipy	1.5.0	`pip install scipy==1.5.0`
ekphrasis	0.5.1	`pip install ekphrasis==0.5.1`
wandb	-	`pip install wandb`

Instructions

Set up the codebase and requirements
- git clone https://github.com/Ayushk4/extract_covid_entity & cd extract_covid_entity.
- Follow the instructions from the Dependencies and set-up above to install the dependencies.
- If you are interested in logging your runs, Set up your wandb. wandb login.
Set up the dataset: Follow instructions given in data/README.md
Recreating the experiments for our final submission:
- Slot-filling: python automate_multitask_bert_entity_classifier_experiments.py --sentence_level.
- Sentence classification: First pre_process by python3 pre_process.py (required only once) and then python3 sent_model.py --data <PREPROCESSED-FILE-LOCATION> --task " + <TASK-NAME>
- You may add the following optional flags depending on which experiment you would like to replicate. Run_name - --run=<YOUR_RUN_NAME>; Use COVID_Twitter BERT - --covid. Track runs on Wandb - --wandb.

Trained Models

Our model weights used in the submission have been released now.

Slot-filling models

Task	Link
Tested Positive	positive.tar.gz
Tested Negative	negative.tar.gz
Denied Testing	can_not_test.tar.gz
Death	death.tar.gz
Cure/Prevention	cure.tar.gz

Sentence classification models

Task	Link
Tested Positive	sent_positive.tar.gz
Tested Negative	sent_negative.tar.gz
Denied Testing	sent_can_not_test.tar.gz
Death	sent_death.tar.gz
Cure/Prevention	sent_cure.tar.gz

Model performances on test set.

We stand first overall as well as on Denied Testing, Death, Cure/Prevention categories.

Task	Micro-F1	Micro-Precision	Micro-Recall
Tested Positive	0.676	0.802	0.584
Tested Negative	0.663	0.659	0.667
Denied Testing	0.652	0.666	0.640
Death	0.694	0.724	0.667
Cure/Prevention	0.621	0.745	0.532
Overall	0.660	0.727	0.604

Miscellanous

You may contact us by opening an issue on this repo. Please allow 2-3 days of time to address the issue.
For the slot-filling model, the starter code was obtained from here
License: MIT

Update: Dec 2020: The dataset is no longer public due to Twitter Privacy Policy. To get access to the dataset, please mail zong.56@osu.edu and cc alan.ritter@cc.gatech.edu.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
images		images
model		model
sent_levl		sent_levl
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
automate_multitask_bert_entity_classifier_experiments.py		automate_multitask_bert_entity_classifier_experiments.py
download_data.py		download_data.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Extract COVID Entities

Overview

Abstract

System overview

Slot-Filling

Classification

Dependencies and set-up

Instructions

Trained Models

Slot-filling models

Sentence classification models

Model performances on test set.

Miscellanous

About

Releases 1

Packages

Contributors 2

Languages

License

Ayushk4/extract_covid_entity

Folders and files

Latest commit

History

Repository files navigation

Extract COVID Entities

Overview

Abstract

System overview

Slot-Filling

Classification

Dependencies and set-up

Instructions

Trained Models

Slot-filling models

Sentence classification models

Model performances on test set.

Miscellanous

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 2

Languages

Packages