debaditya473/Event-Extraction-NLP

Joint Event-Relation Extraction using Encoder Decoder Architecture

Natural Language Processing Course Project of Group 19 at IIT Kharagpur (Autumn'21)

Group Members:

  • Abhikhya Tripathy (19EC10085)
  • Debaditya Mukhopadhyay (19IE10036)
  • Aditya Basu (19IE10002)
  • Angana Mondal (19IE10039)

Introduction

Joint event extraction is an emerging application of NLP techniques that involves extracting structured information (i.e., event triggers and event arguments) from unstructured real-world corpora. Existing methods for this task rely on complicated pipelines that are prone to error propagation.

Model Architecture

Tapas Nayak et al. proposed an encoder-decoder architecture for joint entity-relation extraction. We extend that architecture to predict trigger, argument and relation tuples, including the classes of the trigger and the argument. We also use pretrained BERT embeddings to preprocess our data.
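To make the prediction target concrete, here is a minimal sketch of one output tuple. The class name, field names, separator and example labels are all hypothetical illustrations and are not taken from the project code or its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTuple:
    """One predicted tuple (hypothetical field names, for illustration only)."""
    trigger: str        # word or span that signals the event
    trigger_type: str   # class of the trigger
    argument: str       # word or span that participates in the event
    argument_type: str  # class of the argument
    relation: str       # relation between trigger and argument


def format_tuple(t: EventTuple, sep: str = " ; ") -> str:
    """Flatten a tuple into one string; the separator is an assumption."""
    return sep.join(
        [t.trigger, t.trigger_type, t.argument, t.argument_type, t.relation]
    )


# Illustrative values only; the real label set depends on the dataset.
example = EventTuple("attacked", "Conflict.Attack", "troops", "PER", "Attacker")
print(format_tuple(example))
```

A decoder that emits such tuples one after another can represent several events in the same sentence without a separate pipeline stage for each field.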

[Figure: Model Architecture]

Datasets

The data is available at: https://drive.google.com/drive/u/1/folders/1fYP9PUQYRV0JWBa-N3CwuGkOCeBielT9
To obtain the Word2Vec embeddings and BERT embeddings, download 'w2v.txt' and 'BERT_embeddings.txt' from the aforementioned link.
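The downloaded embedding files can be inspected with a small loader such as the sketch below. It assumes the common plain-text layout of one token per line followed by space-separated floats, with an optional "<count> <dim>" header; the actual layout of 'w2v.txt' and 'BERT_embeddings.txt' is not documented here, so check a few lines of each file first. The function names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_embedding_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'token v1 v2 ...' lines into a token -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Skip blank lines and an optional '<count> <dim>' header.
        # Note: this also skips one-dimensional vectors, which is
        # acceptable for this sketch.
        if len(parts) <= 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_embeddings(path: str) -> Dict[str, List[float]]:
    """Read an embedding file from disk (assumes UTF-8 text)."""
    with open(path, encoding="utf-8") as handle:
        return parse_embedding_lines(handle)


# Tiny in-memory demo with made-up vectors.
demo = parse_embedding_lines(["2 3\n", "cat 0.1 0.2 0.3\n", "dog 1 2 3\n"])
print(sorted(demo))
```

For the real files, call load_embeddings with the path where 'w2v.txt' was saved.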

Requirements

  • Python 3.5+
  • PyTorch 1.1.0
  • CUDA 8.0

Running the Code

  • Python3: python3 Joint_Event_Extraction.py gpu_id random_seed source_data_dir target_data_dir train/test w2v/bert

  • IPython: Run individual cells of NLP_Proj6_Grp_19.ipynb

    • To set the data folder and the output folder, assign src_data_folder = path and trg_data_folder = path + 'Model'.

    • To switch between training and testing, set job_mode = 'train' or job_mode = 'test' under if __name__ == "__main__":.

    • To switch between Word2Vec and BERT embeddings, set embedding_type = 'w2v' or embedding_type = 'bert' under if __name__ == "__main__":.

Command Line Arguments

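  • gpu_id: ID of the GPU to use (assigned to CUDA_VISIBLE_DEVICES)
  • random_seed: Random seed
  • source_data_dir: Path to the source data directory
  • target_data_dir: Path to the target data directory
  • train/test: Job mode (choose exactly one)
  • w2v/bert: Embedding type (choose exactly one)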
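The positional arguments above could be validated as in the following sketch. It uses argparse purely for illustration; Joint_Event_Extraction.py may read its arguments differently, and the names job_mode and embedding_type for the last two positions, as well as the example paths, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Positional arguments in the same order as the documented command."""
    parser = argparse.ArgumentParser(
        description="Illustrative parser for the documented arguments."
    )
    parser.add_argument("gpu_id", help="GPU ID for CUDA_VISIBLE_DEVICES")
    parser.add_argument("random_seed", type=int, help="random seed")
    parser.add_argument("source_data_dir", help="path to source data directory")
    parser.add_argument("target_data_dir", help="path to target data directory")
    parser.add_argument("job_mode", choices=["train", "test"])
    parser.add_argument("embedding_type", choices=["w2v", "bert"])
    return parser


# Example invocation with placeholder paths (hypothetical values).
args = build_parser().parse_args(
    ["0", "42", "data/", "data/Model", "train", "bert"]
)
print(args.job_mode, args.embedding_type, args.random_seed)
```

Restricting the last two arguments with choices makes a typo such as 'tran' fail immediately instead of partway through a run.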

Default Command Line Arguments for Google Colaboratory

  • os.environ['CUDA_VISIBLE_DEVICES'] (= gpu_id) = '0'
  • random_seed = 42
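These defaults might be applied at the start of a run as in the sketch below. The helper name configure_run is hypothetical, and the project's own seeding code may differ; NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def configure_run(gpu_id: str = "0", random_seed: int = 42) -> None:
    """Select the visible GPU and seed the available random generators."""
    # Must be set before CUDA is initialised for it to take effect.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
    random.seed(random_seed)
    try:
        import numpy as np
        np.random.seed(random_seed)
    except ImportError:
        pass  # NumPy not installed; skip
    try:
        import torch
        torch.manual_seed(random_seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(random_seed)
    except ImportError:
        pass  # PyTorch not installed; skip


configure_run()  # uses the Colaboratory defaults listed above
first_draw = random.random()
configure_run()
second_draw = random.random()
print(first_draw == second_draw)
```

Re-seeding before each run keeps the Python-level random draws repeatable; GPU kernels can still introduce small nondeterminism.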
