DEGREE: A Data-Efficient Generation-Based Event Extraction Model

Code for our NAACL-2022 paper DEGREE: A Data-Efficient Generation-Based Event Extraction Model.

Environment

Python==3.8
PyTorch==1.8.0
transformers==3.1.0
protobuf==3.17.3
tensorboardx==2.4
lxml==4.6.3
beautifulsoup4==4.9.3
bs4==0.0.1
stanza==1.2
sentencepiece==0.1.95
ipdb==0.13.9

Note:

If you run into Rust-related issues when installing transformers through pip, this website might be helpful.
Alternatively, refer to env_reference.yml for a clearer installation reference.
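
To confirm that an existing environment matches the pins listed above, a small check like the following may help. It is only a sketch: the helper name `check_pins` is ours and not part of this repository, it assumes PyTorch is installed under the pip name `torch`, and it compares version strings exactly after dropping any local build tag such as `+cu111`.

```python
from importlib import metadata

# Pins copied from the Environment list; "torch" is the pip name assumed for PyTorch.
PINNED = {
    "torch": "1.8.0",
    "transformers": "3.1.0",
    "protobuf": "3.17.3",
    "tensorboardx": "2.4",
    "lxml": "4.6.3",
    "beautifulsoup4": "4.9.3",
    "stanza": "1.2",
    "sentencepiece": "0.1.95",
    "ipdb": "0.13.9",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Ignore local build tags, so "1.8.0+cu111" counts as "1.8.0".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package was found at the expected version.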

Datasets

We support ace05e, ace05ep, and ere.

Preprocessing

Our preprocessing mainly adapts OneIE's released scripts with minor modifications. We deeply thank the authors of that paper for their contribution.

`ace05e`

Prepare data processed from DyGIE++
Put the processed data into the folder processed_data/ace05e_dygieppformat
Run ./scripts/process_ace05e.sh

`ace05ep`

Download ACE data from LDC
Run ./scripts/process_ace05ep.sh

`ere`

Download ERE English data from LDC, specifically, "LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2", "LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2", "LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2"
Collect all of this data under one directory with the following layout:

ERE
├── LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2
│     ├── data
│     ├── docs
│     └── ...
├── LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2
│     ├── data
│     ├── docs
│     └── ...
└── LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2
      ├── data
      ├── docs
      └── ...

Run ./scripts/process_ere.sh

The above scripts will generate processed data (including the full training set and the low-resource sets) in ./process_data.
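
Before running ./scripts/process_ere.sh, it can be useful to confirm that the ERE directory matches the layout shown earlier. The sketch below checks only what the tree shows (the three release folders, each with `data` and `docs`). The function name `missing_ere_paths` and the root path `ERE` are illustrative assumptions, not something the preprocessing scripts provide or require.

```python
from pathlib import Path

# Release directory names copied from the ERE instructions.
ERE_RELEASES = [
    "LDC2015E29_DEFT_Rich_ERE_English_Training_Annotation_V2",
    "LDC2015E68_DEFT_Rich_ERE_English_Training_Annotation_R2_V2",
    "LDC2015E78_DEFT_Rich_ERE_Chinese_and_English_Parallel_Annotation_V2",
]
REQUIRED_SUBDIRS = ["data", "docs"]  # the two subfolders shown in the tree


def missing_ere_paths(ere_root):
    """List every expected directory that is absent under ere_root."""
    root = Path(ere_root)
    missing = []
    for release in ERE_RELEASES:
        for sub in REQUIRED_SUBDIRS:
            path = root / release / sub
            if not path.is_dir():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # "ERE" is a placeholder; point it at wherever the LDC releases were unpacked.
    for path in missing_ere_paths("ERE"):
        print(f"missing: {path}")
```

No output means all six expected directories were found.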

Training

DEGREE (End2end)

Run ./scripts/train_degree_e2e.sh or use the following commands:

Generate data for DEGREE (End2end)

python degree/generate_data_degree_e2e.py -c config/config_degree_e2e_ace05e.json

Train DEGREE (End2end)

python degree/train_degree_e2e.py -c config/config_degree_e2e_ace05e.json

The model will be stored at ./output/degree_e2e_ace05e/[timestamp]/best_model.mdl by default.
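
Because the checkpoint lands in a timestamped folder, its exact path changes with every run. The following is a minimal sketch for locating the most recently written best_model.mdl under an output directory. `latest_checkpoint` is a hypothetical helper, and it picks the newest file by modification time rather than by parsing the timestamp folder name, whose format is not specified here.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(run_dir, filename="best_model.mdl") -> Optional[Path]:
    """Return the most recently modified checkpoint one level below run_dir."""
    # Matches run_dir/<any folder>/best_model.mdl, mirroring [timestamp]/best_model.mdl.
    candidates = list(Path(run_dir).glob(f"*/{filename}"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Default output location for the End2end model, as stated above.
    print(latest_checkpoint("./output/degree_e2e_ace05e"))
```

The returned path can then be passed wherever a model path is expected. The function returns None when no checkpoint exists yet.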

DEGREE (ED)

Run ./scripts/train_degree_ed.sh or use the following commands:

Generate data for DEGREE (ED)

python degree/generate_data_degree_ed.py -c config/config_degree_ed_ace05e.json

Train DEGREE (ED)

python degree/train_degree_ed.py -c config/config_degree_ed_ace05e.json

The model will be stored at ./output/degree_ed_ace05e/[timestamp]/best_model.mdl by default.

DEGREE (EAE)

Run ./scripts/train_degree_eae.sh or use the following commands:

Generate data for DEGREE (EAE)

python degree/generate_data_degree_eae.py -c config/config_degree_eae_ace05e.json

Train DEGREE (EAE)

python degree/train_degree_eae.py -c config/config_degree_eae_ace05e.json

The model will be stored at ./output/degree_eae_ace05e/[timestamp]/best_model.mdl by default.
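
The three recipes above share one naming pattern: a generate_data script and a train script per task, both reading the same config. The sketch below builds those two commands for a given task. `training_commands` is an illustrative helper of ours. The ace05e config names come from the commands above, while configs following the same pattern for ace05ep and ere are an assumption that should be checked against the config folder.

```python
import shlex

TASKS = ("e2e", "ed", "eae")  # suffixes used by the scripts above


def training_commands(task, dataset="ace05e"):
    """Return the data-generation and training commands for one task."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    # Naming is taken from the ace05e examples; other dataset names are an assumption.
    config = f"config/config_degree_{task}_{dataset}.json"
    return [
        ["python", f"degree/generate_data_degree_{task}.py", "-c", config],
        ["python", f"degree/train_degree_{task}.py", "-c", config],
    ]


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for task in TASKS:
        for cmd in training_commands(task):
            print(shlex.join(cmd))
```

Each command is returned as an argument list, so it can be handed to `subprocess.run(cmd, check=True)` if running the steps from Python is preferred over the shell scripts.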

Evaluation

Evaluate DEGREE (End2end) on Event Extraction task

python degree/eval_end2endEE.py -c config/config_degree_e2e_ace05e.json -e [e2e_model]

Evaluate DEGREE (Pipe) on Event Extraction task

python degree/eval_pipelineEE.py -ced config/config_degree_ed_ace05e.json -ceae config/config_degree_eae_ace05e.json -ed [ed_model] -eae [eae_model]

Evaluate DEGREE (EAE) on Event Argument Extraction task (given gold triggers)

python degree/eval_pipelineEE.py -ceae config/config_degree_eae_ace05e.json -eae [eae_model] -g
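
The three evaluation commands above differ only in which configs and checkpoints they take. As a sketch, they can be assembled from Python as shown below. The helper names `e2e_eval_cmd` and `pipeline_eval_cmd` and the `path/to/...` model paths are placeholders of ours. The script names and flags are the ones used in the commands above.

```python
import shlex


def e2e_eval_cmd(config, model):
    """End2end evaluation: one config, one checkpoint."""
    return ["python", "degree/eval_end2endEE.py", "-c", config, "-e", model]


def pipeline_eval_cmd(eae_config, eae_model, ed_config=None, ed_model=None):
    """ED+EAE pipeline if an ED model is given; otherwise EAE with gold triggers (-g)."""
    cmd = ["python", "degree/eval_pipelineEE.py"]
    if ed_config and ed_model:
        cmd += ["-ced", ed_config, "-ceae", eae_config, "-ed", ed_model, "-eae", eae_model]
    else:
        cmd += ["-ceae", eae_config, "-eae", eae_model, "-g"]
    return cmd


if __name__ == "__main__":
    # Placeholder checkpoint paths; substitute the best_model.mdl files from training.
    eae_cfg = "config/config_degree_eae_ace05e.json"
    ed_cfg = "config/config_degree_ed_ace05e.json"
    print(shlex.join(e2e_eval_cmd("config/config_degree_e2e_ace05e.json", "path/to/e2e.mdl")))
    print(shlex.join(pipeline_eval_cmd(eae_cfg, "path/to/eae.mdl", ed_cfg, "path/to/ed.mdl")))
    print(shlex.join(pipeline_eval_cmd(eae_cfg, "path/to/eae.mdl")))
```

Omitting the ED arguments selects the gold-trigger variant, which mirrors the third command above.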

Pre-Trained Models

Dataset	Model	Model	Model
ace05e	DEGREE (EAE)	DEGREE (ED)	DEGREE (E2E)
ace05ep	DEGREE (EAE)	DEGREE (ED)	DEGREE (E2E)
ere	DEGREE (EAE)	DEGREE (ED)	DEGREE (E2E)

Citation

If you find that the code is useful in your research, please consider citing our paper.

@inproceedings{naacl2022degree,
    author    = {I-Hung Hsu and Kuan-Hao Huang and Elizabeth Boschee and Scott Miller and Prem Natarajan and Kai-Wei Chang and Nanyun Peng},
    title     = {DEGREE: A Data-Efficient Generation-Based Event Extraction Model},
    booktitle = {Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
    year      = {2022},
}

Contact

If you have any issues, please contact I-Hung Hsu (ihunghsu@usc.edu) or Kuan-Hao Huang (khhuang@cs.ucla.edu).
