From 48bdac9ef1d3e57142152a9058a8ea7207e9bfe4 Mon Sep 17 00:00:00 2001
From: Qinyuan Ye
Date: Thu, 24 Oct 2019 00:39:29 -0700
Subject: [PATCH] update README files

---
 CoType/README.md             | 19 +++++++++
 LogisticRegression/README.md | 21 +++++++++
 Neural/README.md             | 32 ++++++++++++++
 NeuralATT/README.md          |  7 +++
 README.md                    | 82 +++++++++---------------------------
 ReHession/README.md          | 44 +++++++++++++++++++
 6 files changed, 144 insertions(+), 61 deletions(-)
 create mode 100644 CoType/README.md
 create mode 100644 LogisticRegression/README.md
 create mode 100644 Neural/README.md
 create mode 100644 NeuralATT/README.md
 create mode 100644 ReHession/README.md

diff --git a/CoType/README.md b/CoType/README.md
new file mode 100644
index 0000000..e0444ee
--- /dev/null
+++ b/CoType/README.md
@@ -0,0 +1,19 @@
+### Example Usage
+
+KBP
+
+```
+CoType/retype-rm -data KBP -mode m -size 50 -negative 3 -threads 3 -alpha 0.0001 -samples 1 -iters 2000 -lr 0.001
+python2 CoType/Evaluation/emb_dev_n_test.py extract KBP retypeRm cosine 0.0
+```
+NYT
+```
+CoType/retype-rm -data NYT -mode m -size 50 -negative 3 -threads 3 -alpha 0.0001 -samples 1 -iters 1000 -lr 0.01
+python2 CoType/Evaluation/emb_dev_n_test.py extract NYT retypeRm cosine 0.0
+```
+
+TACRED
+```
+CoType/retype-rm -data TACRED -mode m -size 50 -negative 3 -threads 3 -alpha 0.0001 -samples 1 -iters 1000 -lr 0.01
+python2 CoType/Evaluation/emb_dev_n_test.py extract TACRED retypeRm cosine 0.0
+```
\ No newline at end of file
diff --git a/LogisticRegression/README.md b/LogisticRegression/README.md
new file mode 100644
index 0000000..469aecd
--- /dev/null
+++ b/LogisticRegression/README.md
@@ -0,0 +1,21 @@
+### Example Usage
+
+First, move to the model directory with `cd LogisticRegression`
+
+KBP (Using default args)
+```
+python2 train.py
+python2 test.py
+```
+
+NYT
+```
+python2 train.py --save_filename result_nyt.pkl --data_dir ../data/intermediate/NYT/rm
+python2 test.py --save_filename result_nyt.pkl --data_dir ../data/intermediate/NYT/rm
+```
+
+TACRED
+```
+python2 train.py --save_filename result_tacred.pkl --data_dir ../data/intermediate/TACRED/rm
+python2 test.py --save_filename result_tacred.pkl --data_dir ../data/intermediate/TACRED/rm
+```
\ No newline at end of file
diff --git a/Neural/README.md b/Neural/README.md
new file mode 100644
index 0000000..2072523
--- /dev/null
+++ b/Neural/README.md
@@ -0,0 +1,32 @@
+### Arguments
+
+You can select dataset, set hyperparameters, choose the way to handle bias term by passing arguments. For simplicity, we're only listing some important arguments here. Check the usage of all available arguments with `python Neural/train.py -h` and `python Neural/test.py -h`
+
+```
+train.py
+--data_dir DATA_DIR   specify dataset with directory.
+--model MODEL         model name, (cnn|pcnn|bgru|lstm).
+--fix_bias            Train model with fix bias (not fixed by default).
+--repeat REPEAT       train the model for multiple times.
+--info INFO           description, also used as filename to save model.
+```
+```
+test.py
+--info INFO           description, also used as filename to save model.
+--repeat REPEAT       test the model for multiple trains.
+--thres_ratio THRES_RATIO
+                      proportion of data to tune thres.
+--bias_ratio BIAS_RATIO
+                      proportion of data to estimate bias.
+--cvnum CVNUM         # samples to tune thres or estimate bias
+--fix_bias            test model with fix bias (not fixed by default).
+```
+
+### Example Usage
+
+KBP (Using default args)
+```
+python Neural/train.py --repeat 1
+python Neural/eva.py --repeat 1
+```
+
diff --git a/NeuralATT/README.md b/NeuralATT/README.md
new file mode 100644
index 0000000..ab7ed9d
--- /dev/null
+++ b/NeuralATT/README.md
@@ -0,0 +1,7 @@
+### Example Usage
+
+KBP (Using default args)
+```
+python Neural/train.py --repeat 1
+python Neural/eva.py --repeat 1
+```
\ No newline at end of file
diff --git a/README.md b/README.md
index aece04c..44dc99d 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,15 @@
 Code for EMNLP 2019 paper "Looking Beyond Label Noise: Shifted Label Distribution Matters in Distantly Supervised Relation Extraction" [[Link]](https://arxiv.org/abs/1904.09331)
 
-### Environment Setup
+_Todo: Briefly introduce our findings here._
+
+### Content
+- [Environment Setup](#environment-setup)
+- [Download and Pre-processing](#download-and-pre-processing)
+- [Models](#models)
+
+### Environment Setup
+We set up our environment in Anaconda (version: 5.2.0, build: py36_3) with the following commands.
 
 ```
 conda create --name shifted
 conda activate shifted
@@ -16,72 +24,24 @@ source deactivate
 
 ### Download and Pre-processing
 
-Please check data download and pre-processing instructions in `/data`. Also, check README in `data/neural/vocab` to download our processed word embeddings and word2id file.
-
-
-### Feature-based Models
-
-For feature-based models, run `conda activate shifted` first to activate the environment.
-
-#### 1. ReHession
-
-KBP (hyper-params are using the default settings)
-```
-python ReHession/run.py --seed 1
-python ReHession/eva.py --seed 1
-```
-
-
-NYT
-```
-python ReHession/run.py --dataset NYT --info NYT-default --input_dropout 0.5 --output_dropout 0.0 --seed 1
-```
-
-Note:
+Please check data download and pre-processing instructions in each data directory in `./data`. Also, check [this](data/neural/vocab/README.md) to download our processed word embeddings and word2id file.
 
-By default, run.py trains the default model. Set "--bias fix" to use "Fix Bias" as said in the paper.
-By default, eva.py evaluates the performance (1) without threshold, (2) with max threshold, (3) with entropy threshold. Set "--bias set" to enable "Set Bias" during evaluation. If you train a model with "--bias fix", you should pass the same flag to eva.py.
 
-#### 2. CoType
+### Models
 
-KBP
+Click on the model name to see the instructions on how to run each model.
 
-```
-CoType/retype-rm -data KBP -mode m -size 50 -negative 3 -threads 3 -alpha 0.0001 -samples 1 -iters 2000 -lr 0.001
-python2 CoType/Evaluation/emb_dev_n_test.py extract KBP retypeRm cosine 0.0
-```
-NYT
-```
-CoType/retype-rm -data NYT -mode m -size 50 -negative 3 -threads 3 -alpha 0.0001 -samples 1 -iters 1000 -lr 0.01
-python2 CoType/Evaluation/emb_dev_n_test.py extract NYT retypeRm cosine 0.0
-```
+#### Feature-based Models
 
-#### 3. Logistic Regression
+Run `conda activate shifted` first to activate the environment for feature-based models.
 
-First, move to the model directory with `cd LogisticRegression`
+1. [ReHession](ReHession/README.md)
+2. [CoType](CoType/README.md)
+3. [Logistic Regression](LogisticRegression/README.md)
 
-KBP (data_dir is using default)
-```
-python2 train.py
-python2 test.py
-```
+#### Neural Models
 
-NYT
-```
-python2 train.py --save_filename result_nyt.pkl --data_dir ../data/intermediate/NYT/rm
-python2 test.py --save_filename result_nyt.pkl
-```
-
-### Neural Models
-
-First activate the environment with `source activate shifted-neural`
-
-#### 1. Bi-GRU / Bi-LSTM / PCNN / CNN
-
-KBP (data_dir is using default)
-```
-python Neural/train.py --repeat 1
-python Neural/eva.py --repeat 1
-```
+Run `conda activate shifted-neural` first to activate the environment for neural models.
 
-You may specify the dataset, save directory, hyperparams (dropout, lr, lr_decay, etc.) by passing arguments. Try `python Neural/train.py -h` to check the usage of each argument.
+1. [Bi-GRU / Bi-LSTM / PCNN / CNN](neural/README.md)
+2. [Bi-GRU + ATT / PCNN + ATT](neuralATT/README.md)
\ No newline at end of file
diff --git a/ReHession/README.md b/ReHession/README.md
new file mode 100644
index 0000000..8e1e5ab
--- /dev/null
+++ b/ReHession/README.md
@@ -0,0 +1,44 @@
+### Arguments
+You can select dataset, set hyperparameters, choose the way to handle bias term by passing arguments. For simplicity, we're only listing some important arguments here. Check the usage of all available arguments with `python ReHession/run.py -h` and `python ReHession/eva.py -h`
+
+```
+run.py
+--dataset DATASET     name of the dataset, (KBP|NYT|TACRED).
+--bias BIAS           ways to handle bias term, (default|fix).
+--info INFO           description, also used as filename to save model.
+```
+
+```
+eva.py
+--dataset DATASET     name of the dataset, (KBP|NYT|TACRED).
+--bias BIAS           ways to handle bias term, (default|fix|set)
+--info INFO           description, also used as filename to load model.
+--thres_ratio THRES_RATIO
+                      proportion of data to tune thres.
+--bias_ratio BIAS_RATIO
+                      proportion of data to estimate bias.
+```
+
+By default, `eva.py` evaluates the performance (1) without threshold, (2) with max threshold, (3) with entropy threshold. Set `--bias set` to enable "Set Bias" during evaluation. If you train a model with `--bias fix`, you should pass the same flag to eva.py.
+
+
+### Example Usage
+KBP (Using default args)
+```
+python ReHession/run.py --seed 1
+python ReHession/eva.py --seed 1
+```
+
+
+NYT
+```
+python ReHession/run.py --dataset NYT --info NYT-default --input_dropout 0.5 --output_dropout 0.0 --seed 1
+python ReHession/eva.py --info NYT-default
+```
+
+
+TACRED
+```
+python ReHession/run.py --dataset TACRED --info TACRED-default --input_dropout 0.2 --output_dropout 0.1 --seed 2
+python ReHession/eva.py --info TACRED-default
+```
\ No newline at end of file
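
The ReHession example commands added in this patch differ per dataset only in their flags, so they can be generated from one table. The following is a minimal sketch and not part of the patch: `EXAMPLE_SETTINGS` and `build_commands` are hypothetical names introduced here, and the flag values are copied as-is from the example usage in `ReHession/README.md`. It only builds and prints the command lines; it does not run them.

```python
import shlex

# Flags copied from the "Example Usage" section of ReHession/README.md.
# EXAMPLE_SETTINGS is a hypothetical name; the repository has no such table.
EXAMPLE_SETTINGS = {
    "KBP": {
        # KBP relies on the default arguments apart from the seed.
        "run": ["--seed", "1"],
        "eva": ["--seed", "1"],
    },
    "NYT": {
        "run": ["--dataset", "NYT", "--info", "NYT-default",
                "--input_dropout", "0.5", "--output_dropout", "0.0",
                "--seed", "1"],
        "eva": ["--info", "NYT-default"],
    },
    "TACRED": {
        "run": ["--dataset", "TACRED", "--info", "TACRED-default",
                "--input_dropout", "0.2", "--output_dropout", "0.1",
                "--seed", "2"],
        "eva": ["--info", "TACRED-default"],
    },
}


def build_commands(dataset, python="python"):
    """Return the train and evaluate argv lists for one dataset (hypothetical helper)."""
    settings = EXAMPLE_SETTINGS[dataset]
    return [
        [python, "ReHession/run.py", *settings["run"]],
        [python, "ReHession/eva.py", *settings["eva"]],
    ]


if __name__ == "__main__":
    # Print each command so it can be reviewed or pasted into a shell.
    for name in EXAMPLE_SETTINGS:
        for argv in build_commands(name):
            print(shlex.join(argv))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(argv, check=True)` from the repository root with the `shifted` environment active.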