COVID-19 event extraction from Twitter via extractive question answering with continuous prompts

The corresponding code for our paper: COVID-19 event extraction from Twitter via extractive question answering with continuous prompts.

Installation

This repository requires Python 3.8 or later.

Installing the library and dependencies

pip install -r requirements.txt

Dataset

To obtain the original dataset, please see Extracting COVID-19 Events from Twitter.

Preprocess the dataset to our required format

python preprocess_dataset.py

Training & evaluation

Here we train our model on the preprocessed data file train_dataset.pkl and evaluate on shared_task-test_set-final, which is a 2020 shared task on COVID-19 event extraction. Simply use run.py file,

python run.py\
          --data_dir train_dataset.pkl\
          --train_batch_size 8\
          --learning_rate 4e-06\
          --num_epoch 8\
          --seed 902\

usage: run.py [-h] --data_dir DATA_DIR --test_data_dir TEST_DATA_DIR
              [--train_batch_size TRAIN_BATCH_SIZE]
              [--learning_rate LEARNING_RATE] [--num_epoch NUM_EPOCH]
              [--model MODEL] [--seed SEED] [--pre_seq_len PRE_SEQ_LEN]
              [--prefix_hidden_size PREFIX_HIDDEN_SIZE]
              [--output_dir OUTPUT_DIR]

optional arguments:
  -h, --help            show this help message and exit
  --data_dir DATA_DIR   path to train dataset
  --test_data_dir TEST_DATA_DIR
                        path to test dataset
  --train_batch_size TRAIN_BATCH_SIZE
                        batch size during training
  --learning_rate LEARNING_RATE
                        learning rate for the RoBERTa encoder
  --num_epoch NUM_EPOCH
                        number of the training epochs
  --model MODEL         the base model name (a huggingface model)
  --seed SEED           the random seed
  --pre_seq_len PRE_SEQ_LEN
                        the length of prefix tokens
  --prefix_hidden_size PREFIX_HIDDEN_SIZE
                        the hidden size of prefix tokens
  --output_dir OUTPUT_DIR
                        output directory to store predictions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model

model

README.md

README.md

preprocess_dataset.py

preprocess_dataset.py

requirements.txt

requirements.txt

run.py

run.py

Repository files navigation

COVID-19 event extraction from Twitter via extractive question answering with continuous prompts

Installation

Installing the library and dependencies

Dataset

Training & evaluation

Our evaluation results for each event and subtask

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
model		model
README.md		README.md
preprocess_dataset.py		preprocess_dataset.py
requirements.txt		requirements.txt
run.py		run.py

bionlproc/twitter-covid-QA-extraction

Folders and files

Latest commit

History

Repository files navigation

COVID-19 event extraction from Twitter via extractive question answering with continuous prompts

Installation

Installing the library and dependencies

Dataset

Training & evaluation

Our evaluation results for each event and subtask

About

Resources

Stars

Watchers

Forks

Languages