Globally Normalized Reader

This repository contains the code used in the following paper:

Jonathan Raiman and John Miller. Globally Normalized Reader. Empirical Methods in Natural Language Processing (EMNLP), 2017.

If you use the dataset/code in your research, please cite the above paper:

```
@inproceedings{raiman2015gnr,
    author={Raiman, Jonathan and Miller, John},
    booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
    title={Globally Normalized Reader},
    year={2017},
}
```

Note: This repository is a reimplementation of the original code used for the above paper. The original used a batch size of 32 and synchronous SGD across multiple GPUs. This code currently runs on a single GPU only, and it splits any batch that runs out of memory into several smaller batches. For this reason, the code does not exactly reproduce the results in the paper (but it should be <2% off). Work is underway to rectify this issue.
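
The batch-splitting fallback mentioned in the note can be pictured with a small sketch. This is not the repository's implementation: the names `run_with_splitting` and `run_step` are hypothetical, and Python's built-in `MemoryError` stands in for whatever out-of-memory exception the deep learning framework raises.

```python
from typing import Callable, List, Sequence


def run_with_splitting(
    run_step: Callable[[Sequence], float],
    batch: Sequence,
    oom_error: type = MemoryError,
) -> List[float]:
    """Run `run_step` on `batch`, halving the batch whenever it runs out of memory.

    Returns one result per sub-batch that was actually executed.
    `run_step` and `oom_error` are hypothetical placeholders; a real training
    loop would catch the framework's own out-of-memory exception instead.
    """
    try:
        return [run_step(batch)]
    except oom_error:
        if len(batch) <= 1:
            # A single example that does not fit cannot be split further.
            raise
        middle = len(batch) // 2
        return (
            run_with_splitting(run_step, batch[:middle], oom_error)
            + run_with_splitting(run_step, batch[middle:], oom_error)
        )
```

If each sub-batch triggers its own parameter update, the effective batch size is no longer 32, which is one plausible way such a fallback can shift results relative to the multi-GPU setup.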

Usage (TensorFlow)

Prerequisites

The following libraries must be installed and available:

CUDA 8.0.61 or higher, with appropriate drivers installed for the available GPUs.
CuDNN v6.0 or higher.

Make sure you know where the aforementioned libraries are located on your system; you will need to adjust the paths you use to point to them.
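
One way to find where these libraries live is to scan the directories on the dynamic linker's search path. The sketch below assumes a Linux system where the shared libraries are named `libcudart*` and `libcudnn*` and where `LD_LIBRARY_PATH` is set; whether `env.sh` sets that variable is an assumption, and the helper `locate_libraries` is hypothetical rather than part of this repository.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List


def locate_libraries(
    prefixes: Iterable[str], search_dirs: Iterable[str]
) -> Dict[str, List[str]]:
    """Return, for each file-name prefix, the matching files found in `search_dirs`."""
    prefixes = list(prefixes)
    found: Dict[str, List[str]] = {prefix: [] for prefix in prefixes}
    for directory in search_dirs:
        path = Path(directory)
        if not path.is_dir():
            # Skip entries that do not exist on this machine.
            continue
        for entry in sorted(path.iterdir()):
            for prefix in prefixes:
                if entry.name.startswith(prefix):
                    found[prefix].append(str(entry))
    return found


if __name__ == "__main__":
    # Assumption: the libraries are reachable through LD_LIBRARY_PATH.
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    for prefix, paths in locate_libraries(["libcudart", "libcudnn"], dirs).items():
        print(prefix, "->", paths or "not found on LD_LIBRARY_PATH")
```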

Set-Up

Set up your environment variables:
```
# Copy this into ~/.zshrc or ~/.bashrc for regular use.
source env.sh
```
If you are not running on the SVAIL cluster, you will need to change these variables.

Create your virtual environment:
```
python3.6 -m venv env
```
Python 3.6 must be on your command-line PATH, which is set up automatically by env.sh above.
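
A quick way to confirm that a `python3.6` executable is visible on PATH before creating the virtual environment is sketched below. It relies only on the standard library's `shutil.which`; the helper name `find_interpreter` is made up for this example and is not part of the repository.

```python
import shutil
from typing import Optional


def find_interpreter(name: str = "python3.6") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_interpreter()
    if location is None:
        print("python3.6 was not found on PATH; check the variables set in env.sh")
    else:
        print("python3.6 found at", location)
```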

Activate your virtual environment:

```
# You will need to do this every time you use the GNR
source env/bin/activate
```

Install numpy separately from the other packages:
```
pip install numpy
```
Install all dependencies from requirements.txt:
```
pip install -r requirements.txt
```

Data

Before training the Globally Normalized Reader, you need to download and featurize the dataset.

Download all the necessary data:

```
cd data && ./download.sh && cd ..
GLOVE_PATH=data/glove.txt
wget http://nlp.stanford.edu/data/glove.840B.300d.zip -O $GLOVE_PATH
```

Featurize all of the data:

```
python featurize.py --datadir data --outdir featurized --glove-path $GLOVE_PATH
```
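
For orientation, GloVe's text format stores one token per line followed by its vector components, separated by single spaces. The sketch below shows how such lines can be parsed; it is an illustration only and makes no claim about how `featurize.py` reads the file. The function name `parse_glove_lines` and the default dimension of 300 (taken from the `300d` in the download URL) are assumptions.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str], dim: int = 300) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines into a token -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split from the right so tokens that themselves contain spaces stay intact.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError("expected a token followed by %d values" % dim)
        token, values = parts[0], parts[1:]
        vectors[token] = [float(value) for value in values]
    return vectors
```

Splitting from the right matters because some tokens in large GloVe vocabularies contain spaces, so a plain `split()` can misread the first few fields.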

Training

Create a new model:

```
python main.py create --name default --vocab-path featurized/
```

Train the model:

```
python main.py train --name default --data featurized/
```

Evaluation

Evaluate the model:

```
python main.py predict --name default --data data/dev.json --vocab-path featurized/ --output predictions.txt
```

Usage (PaddlePaddle)

Install the latest GPU-compatible PaddlePaddle Docker image, as directed on the PaddlePaddle website.
To print the model configuration as text, use paddle_model.py.
To train the model, use paddle_train.py.
To run inference with the model, use paddle_infer.py.
