# HRERE

Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction

Paper published in NAACL 2019: [HRERE](https://arxiv.org/abs/1903.10126)

## Prerequisites

- tensorflow >= r1.2
- hyperopt
- gensim
- sklearn

## Dataset

To download the dataset used in the paper:

```bash
cd ./data
python prepare_data.py
```

## Preprocessing

Construct the knowledge graph:

```bash
python create_kg.py
```

Preprocess the data:

```bash
python preprocess.py -p -g
```

## Complex Embeddings

Copy the directory `./fb3m` into the `data` folder of the `tensorflow-efe` repository, then run the following commands inside `tensorflow-efe` to obtain the complex embeddings:

```bash
python preprocess.py --data fb3m
python train.py --model best_Complex_tanh_fb3m --data fb3m --save
python get_embeddings.py --embed complex --model best_Complex_tanh_fb3m --output <repo_path>/fb3m
```

Then copy `e2id.txt` and `r2id.txt` from `tensorflow-efe/data/fb3m` to `./fb3m` and run:

```bash
python get_embeddings.py
```
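For intuition, ComplEx (Trouillon et al., 2016) scores a triple (s, r, o) as the real part of a trilinear product of complex-valued embeddings. Below is a minimal sketch of the standard ComplEx scoring function, assuming the embeddings are loaded as complex NumPy vectors; the repository's `best_Complex_tanh_fb3m` variant may differ in details, and the names here are illustrative:

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """Standard ComplEx triple score: Re(<e_s, w_r, conj(e_o)>).

    e_s, w_r, e_o are complex-valued embedding vectors of the
    subject entity, relation, and object entity respectively.
    """
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

# Toy example with random embeddings of dimension 50.
rng = np.random.RandomState(0)
dim = 50
e_s = rng.randn(dim) + 1j * rng.randn(dim)
w_r = rng.randn(dim) + 1j * rng.randn(dim)
e_o = rng.randn(dim) + 1j * rng.randn(dim)
print(complex_score(e_s, w_r, e_o))  # higher score => more plausible triple
```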

## Hyperparameter Tuning

```bash
python task.py --model <model_name> --eval <max_number_of_search> --runs <number_of_runs_per_setting>
```

`model_name` can be found in `model_param_space.py`. You can also define the search space yourself; a sketch is given below.
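For example, a search space in hyperopt style might look like the following. The parameter names and ranges here are illustrative assumptions, not the actual entries in `model_param_space.py`:

```python
import numpy as np
from hyperopt import hp

# Hypothetical search space; the real ones live in model_param_space.py.
param_space_my_model = {
    "learning_rate": hp.loguniform("learning_rate", np.log(1e-4), np.log(1e-2)),
    "hidden_size": hp.quniform("hidden_size", 100, 400, 50),
    "dropout": hp.uniform("dropout", 0.1, 0.5),
    "batch_size": hp.quniform("batch_size", 32, 256, 32),
}
```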

## Evaluation

```bash
python eval.py --model <model_name> --prefix <file_prefix> --runs <number_of_runs>
```

`model_name` can be found in `model_param_space.py`. To replicate our results, use `best_complex_hrere` as the `model_name`. This runs the model multiple times and logs the means and standard deviations of P@N in `./log`. The predicted probabilities and labels of the first run are stored in `plot/output` for plotting PR curves.
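As an illustration, a PR curve can be drawn from the stored probabilities and labels with scikit-learn. The file names below are hypothetical, since the exact output format of `eval.py` is not documented here:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Hypothetical file names; adapt to the actual files in plot/output.
probs = np.load("plot/output/probs.npy")    # predicted probabilities
labels = np.load("plot/output/labels.npy")  # binary gold labels

precision, recall, _ = precision_recall_curve(labels.ravel(), probs.ravel())
plt.plot(recall, precision, label="HRERE")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.savefig("plot/pr_curve.png")
```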

## Results

*(Figure: PR curves of the replicated results.)*

After replicating the results, we find that the P@N(%) numbers reported in the paper appear somewhat over-optimistic due to variance across runs. According to our replication based on 5 runs (`./log/replication.log`), the results are P@10% (0.849 ± 0.019), P@30% (0.728 ± 0.019), and P@50% (0.636 ± 0.013). We also reported these scores to NLP Progress based on this replication.
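If P@N% denotes the precision among the top N% highest-scored predictions, which is one common convention in held-out evaluation (check `eval.py` for the exact definition used here), it can be computed as in this minimal sketch with illustrative variable names:

```python
import numpy as np

def precision_at_percent(probs, labels, percent):
    """Precision among the top `percent`% predictions ranked by probability."""
    order = np.argsort(-probs)              # sort predictions, highest first
    k = int(len(probs) * percent / 100.0)   # size of the top-N% slice
    return labels[order[:k]].mean()

# Toy example with synthetic scores and labels.
rng = np.random.RandomState(0)
probs = rng.rand(1000)
labels = (rng.rand(1000) < probs).astype(float)  # labels correlated with scores
for p in (10, 30, 50):
    print("P@%d%%: %.3f" % (p, precision_at_percent(probs, labels, p)))
```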

## Cite

If you find this codebase or our work useful, please cite:

```bibtex
@inproceedings{xu2019connecting,
  author    = {Xu, Peng and Barbosa, Denilson},
  title     = {Connecting Language and Knowledge with Heterogeneous Representations for Neural Relation Extraction},
  booktitle = {The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019)},
  month     = {June},
  year      = {2019},
  publisher = {ACL}
}
```