
Update README to use Castor-models and Instructions for Internal Users (#113)

* Update instructions to use Castor-models
* Consolidate requirements.txt
* Refine README with convenience scripts
* Update internal instructions
* MP-CNN working dir minor edit
tuzhucheng authored and lintool committed May 25, 2018
1 parent 62f8abe commit 5bf33bf8ea4a163ee54799412209ae21c68c3c79
Showing with 89 additions and 16 deletions.
  1. +33 −6 README.md
  2. +49 −0 docs/internal-instructions.md
  3. +0 −2 idf_baseline/requirements.txt
  4. +1 −0 mp_cnn/README.md
  5. +6 −4 requirements.txt
  6. +0 −4 sm_cnn/requirements.txt
@@ -17,39 +17,66 @@ For paraphrase detection, question answering, etc.

+ [SM-CNN](./sm_cnn/): Siamese CNN for ranking texts [(Severyn and Moschitti, SIGIR 2015)](https://dl.acm.org/citation.cfm?id=2767738)
+ [MP-CNN](./mp_cnn/): Multi-Perspective CNN [(He et al., EMNLP 2015)](http://anthology.aclweb.org/D/D15/D15-1181.pdf)
+ [NCE](./nce/): Noise-Contrastive Estimation for answer selection applied on SM-CNN and MP-CNN
+ [NCE](./nce/): Noise-Contrastive Estimation for answer selection applied on SM-CNN and MP-CNN [(Rao et al., CIKM 2016)](https://dl.acm.org/citation.cfm?id=2983872)
+ [IDF Baseline](./idf_baseline/): IDF overlap between question and candidate answers

Each model directory has a `README.md` with further details.

## Setting up PyTorch

Copy and run the command at https://pytorch.org/ for your environment. PyTorch recommends the Anaconda environment, which we use in our lab.
**If you are an internal Castor contributor and are planning to use the Data Systems Group's GPU machines in the lab,
please follow the instructions [here](./docs/internal-instructions.md) instead.**

Copy and run the command at [https://pytorch.org/](https://pytorch.org/) for your environment.
PyTorch recommends the Anaconda environment, which we use in our lab. We are currently targeting PyTorch 0.4 for our codebase.

The typical installation command is

```bash
conda install pytorch torchvision -c pytorch
```

Other Python packages we use can be installed via pip:

```bash
pip install -r requirements.txt
```

Please also run the following inside the `utils` directory to build the `trec_eval` tool for evaluating certain datasets.

```bash
./get_trec_eval.sh
```
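
Since the codebase targets PyTorch 0.4, it can help to confirm which version ended up installed. The following is a minimal sketch and not part of the repository: `in_release_series` is a hypothetical helper, and the check assumes the `python` on your `PATH` is the one from the environment you installed PyTorch into.

```bash
# Hypothetical helper: succeed if version "$1" belongs to release series "$2"
# (for example, 0.4.1 belongs to the 0.4 series, but 0.40.1 does not).
in_release_series() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the installed version only if python and PyTorch are both available.
if installed="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
    if in_release_series "$installed" 0.4; then
        echo "PyTorch $installed matches the 0.4 series"
    else
        echo "PyTorch $installed is outside the 0.4 series"
    fi
else
    echo "PyTorch is not importable from the current python"
fi
```

The prefix match is deliberately simple; it compares strings rather than parsing version numbers.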

## Data and Pre-Trained Models

**If you are an internal Castor contributor and are planning to use the Data Systems Group's GPU machines in the lab,
please follow the instructions [here](./docs/internal-instructions.md) instead.**

Data for use with this repository can be found at: https://git.uwaterloo.ca/jimmylin/Castor-data.git.

Pre-trained models can be found at: https://github.com/castorini/models.git.
Pre-trained models can be found at: https://git.uwaterloo.ca/jimmylin/Castor-models.

Your directory structure should look like
```
.
├── Castor
├── Castor-data
└── models
└── Castor-models
```

For example (if you use HTTPS instead of SSH):

```bash
git clone https://github.com/castorini/Castor.git
git clone https://git.uwaterloo.ca/jimmylin/Castor-data.git
git clone https://github.com/castorini/models.git
git clone https://git.uwaterloo.ca/jimmylin/Castor-models.git
```

Sourcing and pre-processing of input data for each model is described in the respective ```model/README.md```'s.
After cloning the Castor-data repo, you need to unzip embeddings and run data pre-processing scripts. You can choose
to follow instructions under each dataset / embedding directory separately, or just run the following script in Castor-data
to do all of the steps for you:

```bash
./setup.sh
```
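
Before running any model, you may want to confirm that the three checkouts sit side by side as described above. The following is a minimal sketch and not part of the repository: `missing_checkouts` is a hypothetical helper, and it assumes the directories are named `Castor`, `Castor-data`, and `Castor-models`.

```bash
# Hypothetical helper: print the name of each expected checkout that is
# missing from the given parent directory (defaults to the current directory).
missing_checkouts() {
    parent="${1:-.}"
    for name in Castor Castor-data Castor-models; do
        [ -d "$parent/$name" ] || echo "$name"
    done
}

# Run from the directory that should contain all three checkouts.
missing_checkouts .
```

No output means all three directories were found. Symbolic links to directories also pass the `-d` test.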
@@ -0,0 +1,49 @@
# Instructions for DSG Castor Contributors

Please follow these instructions if you are a graduate student or undergrad research assistant working with the group
in the Data Systems Lab and want to run Castor on the lab desktop GPU machine (dragon).

If you have trouble with or questions about the instructions on this page, ping @tuzhucheng on Slack.

## PyTorch Environment

We already have a multi-user Conda environment with PyTorch and all other dependencies installed, so you do not need to
install anything yourself. However, you can create [Conda environments](https://conda.io/docs/user-guide/tasks/manage-environments.html)
if you need to experiment with different library versions etc.

The multi-user Conda environment is located at `/anaconda3/`.
To use this multi-user environment, just add the following to your `.bashrc` or configuration file for your favourite shell.

```bash
export PATH="/anaconda3/bin:$PATH"
export LIBRARY_PATH="/usr/lib/nvidia-375"
```

Please also ensure `/usr/local/cuda-8.0/lib64` is in the `LD_LIBRARY_PATH` environment variable.
If it is not already there, add it in your `.bashrc` in the same way as above.

Please re-login or re-source your shell configuration after `.bashrc` is updated for the updated environment variables
to take effect.
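
One way to add the CUDA directory only when it is absent is sketched below. This is an illustration rather than the lab's actual configuration: `add_to_ld_library_path` is a hypothetical helper name, and the snippet assumes a POSIX-compatible shell such as bash.

```bash
# Hypothetical helper: append a directory to LD_LIBRARY_PATH unless it is
# already listed, so re-sourcing .bashrc does not create duplicates.
add_to_ld_library_path() {
    new_dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$new_dir:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$new_dir" ;;
    esac
    export LD_LIBRARY_PATH
}

add_to_ld_library_path /usr/local/cuda-8.0/lib64
```

Wrapping the value in colons before matching means only an exact path entry matches, not a substring of another entry.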

## Data and Pre-Trained Models

We use shared cloned versions of the Castor-data and Castor-models repositories.
Instead of making your own cloned copies, you can create symbolic links to the shared versions
in your own working directory to save disk space. Assuming you want to put `Castor`, `Castor-data`, and `Castor-models`
under a directory called `castorini` and you are currently in the `castorini` directory, you can enter these commands:

```bash
ln -s /Castor-data Castor-data
ln -s /Castor-models Castor-models
```

So after you clone Castor, you have a directory structure under `castorini` that looks like this:

```
.
├── Castor
├── Castor-data
└── Castor-models
```

where `Castor-data` and `Castor-models` are actually symbolic links to `/Castor-data` and `/Castor-models`.
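
To confirm that the links resolve, a check along the following lines may help. It is a minimal sketch and not part of the repository: `check_shared_links` is a hypothetical helper, and it assumes you run it from the `castorini` directory.

```bash
# Hypothetical helper: for each expected link in the current directory,
# print its target, or report that it is missing or points nowhere.
check_shared_links() {
    for name in Castor-data Castor-models; do
        if [ -L "$name" ] && [ -e "$name" ]; then
            echo "$name -> $(readlink "$name")"
        else
            echo "$name: missing or broken link"
        fi
    done
}

check_shared_links
```

The `-L` test confirms the entry is a symbolic link, and `-e` confirms that its target exists.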

This file was deleted.

@@ -5,6 +5,7 @@ This is a PyTorch implementation of the following paper
* Hua He, Kevin Gimpel, and Jimmy Lin. [Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks](http://aclweb.org/anthology/D/D15/D15-1181.pdf). *Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015)*, pages 1576-1586.

Please ensure you have followed instructions in the main [README](../README.md) doc before running any further commands in this doc.
The commands in this doc assume you are under the root directory of the Castor repo.

## Pre-Trained Models

@@ -1,7 +1,9 @@
Flask==0.12.1
gensim==1.0.1
numpy==1.12.1
nltk==3.2.5
numpy==1.14.0
pandas==0.19.2
Flask==0.12.1
nltk==3.2.2
pyjnius==1.1.1
-e git+https://github.com/castorini/Castor.git#egg=sm-cnn-1.0.0
scikit-learn==0.19.1
scipy==1.0.0
torchtext==0.2.3

This file was deleted.

