HendrikStrobelt committed Apr 24, 2018
2 parents 3395419 + 43b6c9e commit fe5d069
Showing 8 changed files with 151 additions and 18 deletions.
125 changes: 118 additions & 7 deletions README.md
@@ -1,12 +1,123 @@
# Seq2Seq-Vis

### A visual debugging tool for Sequence-to-Sequence models
*by IBM Research AI and Harvard SEAS -- more info at [seq2seq-vis.io](http://seq2seq-vis.io)*

![Seq2Seq-Vis](docs/pics/s2s_teaser.png)











## Install with `conda`

We require [miniconda](https://conda.io/docs/user-guide/install/index.html) to create a virtual environment and install all dependencies via scripts.
Seq2Seq-Vis currently works with a modified version of OpenNMT-py by [Sebastian Gehrmann](https://github.com/sebastianGehrmann/OpenNMT-py/tree/states_in_translation), available on the `states_in_translation` branch. We provide a script to install this branch.

### 1 - Install dependencies (server and client) and create virtual environment



```bash
git clone https://github.com/HendrikStrobelt/Seq2Seq-Vis.git
cd Seq2Seq-Vis
```

Then, inside `/Seq2Seq-Vis`, run:

```bash
source setup_cpu.sh
```

### 2 - Install custom OpenNMT-py version

```bash
cd ..
source Seq2Seq-Vis/setup_onmt_custom.sh
```

### 3 - Download some example data
We provide example data for a character-based dataset that converts date strings (e.g. "March 03, 1999", "03/03/99") into the base form "mm-dd-yyyy". [Download here ~130MB]() and unzip it in `/Seq2Seq-Vis`:

```bash
unzip fakedate.zip
```
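
To make the target format concrete, here is a minimal Python sketch of the mapping the example model is trained to reproduce. It is an illustration only, not the script that generated the dataset: the helper name `to_base_form` and the two input layouts are assumptions derived from the two example strings above, and month names are parsed with the default (English) locale.

```python
from datetime import datetime

# Hypothetical list of input layouts, taken from the two README examples.
# The real dataset may contain other layouts.
INPUT_FORMATS = ("%B %d, %Y", "%m/%d/%y")


def to_base_form(text):
    """Return `text` rewritten as mm-dd-yyyy, trying each known layout in turn."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m-%d-%Y")
        except ValueError:
            continue  # layout did not match, try the next one
    raise ValueError("unrecognised date string: {!r}".format(text))


print(to_base_form("March 03, 1999"))  # 03-03-1999
print(to_base_form("03/03/99"))        # 03-03-1999
```

Both inputs normalise to the same string, which is the kind of source/target pair the character-level model sees.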

## Run the system

```bash
python3 server.py --dir 0316-fakedates/
```
Then open [http://localhost:8080/client/index.html?in=M a r c h _ 0 3 , 1 9 9 9](http://localhost:8080/client/index.html?in=M%20a%20r%20c%20h%20_%200%203%20,%20%201%209%209%209) in your browser.

You should see:

<img src="docs/pics/s2s_dates_01.png" width="400">

Enjoy exploring!
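
The link above passes the input sequence in the `in` query parameter, with spaces percent-encoded. If you want to generate such links for other inputs, a small sketch along these lines may help. The function name `client_url` and its `host`/`port` defaults are assumptions for illustration, and the input is expected to be tokenized already in the same style as the example.

```python
from urllib.parse import quote, urlencode


def client_url(tokens, host="localhost", port=8080):
    """Build a client URL whose `in` parameter carries the tokenized input."""
    # quote_via=quote encodes spaces as %20 (the default would use '+').
    query = urlencode({"in": tokens}, quote_via=quote)
    return "http://{}:{}/client/index.html?{}".format(host, port, query)


print(client_url("M a r c h _ 0 3 , 1 9 9 9"))
```

Note that this sketch also percent-encodes the comma (as `%2C`), which is equivalent to the literal comma in the link above once decoded.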





## Run your own models

### 1 - Prepare your data
To be done.

### 2 - Create an `s2s.yaml` file to describe the project

```yaml
# -- minimal config
model: date_acc_100.00_ppl_1.00_e7.pt  # model file
dicts:
  src: src.dict  # source dictionary file
  tgt: tgt.dict  # target dictionary file
embeddings: embs.h5  # word embeddings for src and tgt
train: train.h5  # training data

# -- OPTIONAL: FAISS indices for Neighborhoods
indexType: faiss  # index type should be 'faiss' (or 'annoy')
indices:
  decoder: decoder.faiss  # index for decoder states
  encoder: encoder.faiss  # index for encoder states

# -- OPTIONAL: model for linear projection
project_model: linear_projection.pkl  # pickled scikit-learn model
```
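
Before starting the server, it can be handy to check that a project file contains the minimal entries. The sketch below works on the dictionary a YAML parser would return for `s2s.yaml` (for example via PyYAML's `yaml.safe_load`, which is not imported here). The list of required keys is an assumption read off the "minimal config" comment above, and `missing_keys` is a hypothetical helper, not part of Seq2Seq-Vis.

```python
# Keys assumed to be required, based on the "minimal config" section above.
# Dotted names refer to nested entries, e.g. dicts -> src.
REQUIRED = ("model", "dicts.src", "dicts.tgt", "embeddings", "train")


def missing_keys(config):
    """Return the required (dotted) keys that are absent from `config`."""
    missing = []
    for dotted in REQUIRED:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


example = {
    "model": "date_acc_100.00_ppl_1.00_e7.pt",
    "dicts": {"src": "src.dict", "tgt": "tgt.dict"},
    "embeddings": "embs.h5",
    "train": "train.h5",
}
print(missing_keys(example))  # []
```

The optional `indexType`, `indices`, and `project_model` entries are deliberately not checked.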

### 3 - Command Line Parameters

```
usage: server.py [-h] [--nodebug NODEBUG] [--port PORT]
                 [--dir DIR]

optional arguments:
  --nodebug   TRUE if not in debug mode
  --port      port to run system (default: 8080)
  --dir       directory with s2s.yaml file
```
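
As a rough illustration of how these flags behave, the following sketch rebuilds the three documented arguments with `argparse` and derives the expected location of `s2s.yaml`. The defaults mirror the `add_argument` calls in `server.py`; the `build_parser` wrapper, the help strings, and the `config_path` variable are assumptions added for the example.

```python
import argparse
import os


def build_parser():
    """Recreate the documented server.py flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="server.py")
    parser.add_argument("--nodebug", default=True,
                        help="TRUE if not in debug mode")
    parser.add_argument("--port", default="8080",
                        help="port to run system")
    parser.add_argument("--dir", type=str,
                        default=os.path.abspath("model_api/data"),
                        help="directory with s2s.yaml file")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["--dir", "0316-fakedates/", "--port", "8080"])
config_path = os.path.join(args.dir, "s2s.yaml")
print(args.port, config_path)
```

Because `--nodebug` is declared without a `type`, any value passed on the command line arrives as a string, so `--nodebug False` would yield the truthy string `"False"`.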

# Cite us

```
BibTeX entry (arXiv) to be added
```

# Contributors

- Hendrik Strobelt (IBM Research & MIT-IBM Watson AI Lab)
- Sebastian Gehrmann (Harvard NLP)
- Alexander M. Rush (Harvard NLP)

- Michael Behrisch (Harvard VCG), Adam Perer (IBM Research), Hanspeter Pfister (Harvard VCG)

# License

Seq2Seq-Vis is licensed under the Apache 2 license.
Binary file added docs/pics/s2s_dates_01.png
Binary file added docs/pics/s2s_teaser.png
2 changes: 1 addition & 1 deletion index/faissVectorIndex.py
@@ -1,7 +1,7 @@
 import numpy as np
 import sys
 
-sys.path.append('faiss')
+# sys.path.append('faiss')
 import faiss


8 changes: 4 additions & 4 deletions server.py
@@ -34,10 +34,10 @@
 parser.add_argument("--nodebug", default=True)
 parser.add_argument("--port", default="8080")
 parser.add_argument("--nocache", default=False)
-parser.add_argument("-dir", type=str, default=os.path.abspath('model_api/data'))
-parser.add_argument('-api', type=str, default='pytorch',
-                    choices=['pytorch', 'lua'],
-                    help="""The API to use.""")
+parser.add_argument("--dir", type=str, default=os.path.abspath('model_api/data'))
+# parser.add_argument('-api', type=str, default='pytorch',
+#                     choices=['pytorch', 'lua'],
+#                     help="""The API to use.""")
 args = parser.parse_args()
 
 print(args)
6 changes: 0 additions & 6 deletions setup.sh

This file was deleted.

16 changes: 16 additions & 0 deletions setup_cpu.sh
@@ -0,0 +1,16 @@
#!/usr/bin/env bash

# Install all essential packages
conda create --yes --name s2sv python=3.6 h5py numpy scikit-learn flask
conda install --name s2sv --yes -c conda-forge connexion nodejs python-annoy
conda install --name s2sv --yes -c pytorch pytorch torchvision faiss-cpu
source activate s2sv


cd client
npm install
npm run wp
cd ..



12 changes: 12 additions & 0 deletions setup_onmt_custom.sh
@@ -0,0 +1,12 @@
#!/usr/bin/env bash

# just to be sure :)
source activate s2sv

# clone modified opennmt repo which exposes internals to Seq2Seq-Vis
git clone https://github.com/sebastianGehrmann/OpenNMT-py.git
cd OpenNMT-py/
git checkout states_in_translation
python setup.py install
pip install torchtext
cd ..
