# Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction

![Overview](overview.png)

TL;DR: We identify shifted label distribution, an important yet long-overlooked issue in DSRE; introduce a simple yet effective bias adjustment to explicitly adapt a trained model to such a shift; and release an RE codebase.

## Table of Contents

- [Introduction](#introduction)
- [Codebase Overview](#codebase-overview)
- [Training Recipes](#training-recipes)
  - [Environment Setup](#environment-setup)
  - [Download and Pre-processing](#download-and-pre-processing)
  - [Running Instructions](#running-instructions)
- [Reference](#reference)

## Introduction

When we compare different sentence-level RE models, we observe a diminishing phenomenon in model performance: neural models outperform feature-based models by a large margin on human-annotated datasets, but the gap diminishes on DS datasets. In addition, we find two heuristic thresholding techniques to be effective on DS datasets. These observations lead us to shifted label distribution, an important yet long-overlooked issue in DSRE. We further introduce bias adjustment to explicitly adapt a trained model to such a shift.

RE=Relation Extraction; DS=Distant Supervision.
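
As a rough illustration of the bias-adjustment idea (a minimal sketch, not the exact procedure from the paper; all names below are hypothetical), one can shift a trained model's output logits by the log-ratio between an estimated target label distribution and the training label distribution:

```python
import numpy as np

def adjust_logits(logits, p_train, p_target, eps=1e-12):
    """Shift logits by the log-ratio of label priors.

    logits   : (num_examples, num_classes) scores from a trained model
    p_train  : (num_classes,) label distribution of the DS training set
    p_target : (num_classes,) label distribution estimated from a small
               clean held-out set
    """
    shift = np.log(p_target + eps) - np.log(p_train + eps)
    return logits + shift  # broadcasts over examples

# Toy usage: the DS training prior is skewed toward class 0;
# adjusting for the shift flips the prediction.
logits   = np.array([[2.0, 1.5]])
p_train  = np.array([0.9, 0.1])
p_target = np.array([0.5, 0.5])
print(adjust_logits(logits, p_train, p_target).argmax(axis=1))  # -> [1]
```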

## Codebase Overview

We release code and provide detailed instructions for all models used in the paper, including feature-based models (ReHession, CoType and Logistic Regression) and neural models (Bi-GRU, Bi-LSTM, PCNN, CNN, PositionAware LSTM, Bi-GRU+ATT, PCNN+ATT).

Our codebase provides an integrated framework for sentence-level neural RE models. You can easily customize the settings (dataset, model, hyper-parameters, etc.) by passing command-line arguments. Check out this example:

```bash
python Neural/train.py --model bgru --data_dir data/neural/KBP --lr 1.0 --in_drop 0.6 --intra_drop 0.1 --out_drop 0.6 --info bgru_kbp --repeat 1
python Neural/eva.py --info bgru_kbp --repeat 1
```
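
Other models and datasets follow the same pattern by swapping arguments; for example (the `pcnn` model name and the TACRED data directory below are illustrative assumptions, not verified flag values):

```bash
python Neural/train.py --model pcnn --data_dir data/neural/TACRED --info pcnn_tacred --repeat 1
python Neural/eva.py --info pcnn_tacred --repeat 1
```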

## Training Recipes

### Environment Setup

We set up our environments in Anaconda3 (version 5.2.0, build py36_3) with the following commands.

```bash
# Environment for feature-based models
conda create --name shifted
conda activate shifted
conda install pytorch=0.3.1
conda deactivate

# Environment for neural models
conda create --name shifted-neural
conda activate shifted-neural
conda install cudnn=7.1.2 pytorch=0.4.0 tqdm
conda deactivate
```
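
As a quick sanity check, you can confirm that each environment picked up the intended PyTorch version:

```bash
conda activate shifted
python -c "import torch; print(torch.__version__)"   # expect 0.3.1
conda deactivate

conda activate shifted-neural
python -c "import torch; print(torch.__version__)"   # expect 0.4.0
conda deactivate
```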

### Download and Pre-processing

Please check the data download and pre-processing instructions in each dataset directory under `./data`. Also, check this to download our processed word embeddings and word2id file.
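
Once downloaded, the embeddings are typically consumed as a vector matrix plus a token-to-row mapping. A minimal loading sketch (the file names and formats below are assumptions for illustration, not the repository's actual layout):

```python
import json
import numpy as np

# Hypothetical file names; substitute the ones from the download.
with open("data/neural/word2id.json") as f:
    word2id = json.load(f)                        # token -> row index
embeddings = np.load("data/neural/word_emb.npy")  # shape: (vocab_size, dim)

# Look up a token, falling back to an <unk> row if out of vocabulary.
vec = embeddings[word2id.get("relation", word2id.get("<unk>", 0))]
print(vec.shape)
```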

### Running Instructions

Click the model name to view the running instructions for each model.

#### Feature-based Models

Run `conda activate shifted` first to activate the environment for feature-based models.

1. [ReHession](ReHession)
2. [CoType](CoType)
3. [Logistic Regression](LogisticRegression)

#### Neural Models

Run `conda activate shifted-neural` first to activate the environment for neural models.

1. [Bi-GRU / Bi-LSTM / PCNN / CNN / PositionAware-LSTM](Neural)
2. [Bi-GRU+ATT / PCNN+ATT](NeuralATT)

## Reference

Please cite the following paper if you find our paper and code useful. :-)

```bibtex
@inproceedings{ye2019looking,
  title={Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction},
  author={Ye, Qinyuan and Liu, Liyuan and Zhang, Maosen and Ren, Xiang},
  booktitle={Proc. of EMNLP-IJCNLP},
  year={2019}
}
```