This is the implementation of the paper QA4IE: A Question Answering based Framework for Information Extraction.
This repository follows the implementation of BiDAF.
Please contact Lin Qiu (email@example.com) with questions and suggestions.
To run our code, you need to download GloVe for pre-trained word embeddings and NLTK for tokenization. You can run
download.sh to download these two datasets to
Run the preprocessing code at
squad/prepro_span.py for span datasets and
squad/prepro_seq.py for seq datasets with:
python -m squad.prepro_span
python -m squad.prepro_seq
The preprocessed data files are saved in
$PWD/data/qa4ie. Make sure the input file path argument in the code (line 26) is correct. The default setting is for the small datasets with document length < 400. To try datasets with longer documents, modify the source_dir with
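Before running the preprocessing scripts, it can help to confirm that the input files are where the code expects them. The snippet below is a minimal sketch, not part of this repository: the helper name missing_inputs is hypothetical, and the directory and file name are assumptions taken from the default evaluation command ($HOME/data/span/0-400/test.span.json). Adjust both to match your source_dir.

```python
import os


def missing_inputs(source_dir, filenames):
    """Return the paths under source_dir that do not exist as regular files."""
    source_dir = os.path.expanduser(source_dir)
    paths = [os.path.join(source_dir, name) for name in filenames]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Assumed default location of the small span dataset (document length < 400).
    # The file list is illustrative; add the other files your setup requires.
    for path in missing_inputs("~/data/span/0-400", ["test.span.json"]):
        print("missing input file:", path)
```

An empty result means every listed file was found.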
To train our model in default settings, run:
python -m basic.cli --mode train --noload --len_opt --cluster --run_id default
The default setting is to train a model with a span dataset. Additional configurations can be found at
To test, run:
python -m basic.cli --len_opt --cluster --run_id default
This command loads the most recently saved model during training and begins testing on the test data. You can find the inference results of test data in the output directory
To evaluate, run:
python squad/evaluate-v1.1.py <file dir of groundtruths> <file dir of inference results>
With the default settings, the command is:
python squad/evaluate-v1.1.py $HOME/data/span/0-400/test.span.json $PWD/out/basic/default/answer/test-100000.json
The evaluation results can be found in
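For readers unfamiliar with the metrics, the following sketch shows how exact match and token-level F1 are computed for a single prediction in the style of the standard SQuAD v1.1 evaluation. It is an illustration under the assumption that squad/evaluate-v1.1.py follows that standard script; the function names here are chosen for this example, and the copy in this repository may differ in details.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """True when the two answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1(prediction, ground_truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

In the standard script, a question with several ground-truth answers takes the maximum score over them, and the reported numbers are averages over all questions.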