Commit
Merge branch 'master' of https://github.com/HendrikStrobelt/S2Splay
Showing 8 changed files with 151 additions and 18 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,123 @@ | ||
# S2Splay | ||
# Seq2Seq-Vis | ||
|
||
### A visual debugging tool for Sequence-to-Sequence models | ||
*by IBM Research AI and Harvard SEAS -- more info at [seq2seq-vis.io](http://seq2seq-vis.io)* | |
|
||
![Seq2Seq-Vis](docs/pics/s2s_teaser.png) | ||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
## Install with `conda` | ||
|
||
We require [miniconda](https://conda.io/docs/user-guide/install/index.html) to create a virtual environment and to install all dependencies via scripts. | |
Seq2Seq-Vis currently works with a special version of OpenNMT-py, modified by [Sebastian Gehrmann](https://github.com/sebastianGehrmann/OpenNMT-py/tree/states_in_translation). We provide a script to install this special branch. | |
|
||
### 1 - Install dependencies (server and client) and create virtual environment | ||
|
||
|
||
|
||
```bash | ||
git clone https://github.com/HendrikStrobelt/Seq2Seq-Vis.git | ||
cd Seq2Seq-Vis | ||
``` | ||
|
||
and run in `/Seq2Seq-Vis`: | ||
|
||
1) Install | ||
```bash | ||
git clone git@github.com:HendrikStrobelt/S2Splay.git | ||
cd S2Splay | ||
./setup.sh #runs pip (server-side) and npm (client-side) | ||
source setup_cpu.sh | ||
``` | ||
|
||
- Start Vis Server: `S2SPlay:> python server.py --port 8080` | ||
### 2 - Install custom OpenNMT-py version | ||
|
||
```bash | ||
cd .. | ||
source Seq2Seq-Vis/setup_onmt_custom.sh | ||
``` | ||
|
||
### 3 - Download some example data | ||
Here we provide example data for a character-based dataset that converts date strings (e.g. "March 03, 1999", "03/03/99") into the base form "mm-dd-yyyy". [Download here ~130MB]() and unzip it in `/Seq2Seq-Vis`: | |
|
||
```bash | ||
unzip fakedate.zip | ||
``` | ||
|
||
## Run the system | ||
|
||
```bash | ||
python3 server.py --dir 0316-fakedates/ | ||
``` | ||
Then open: [http://localhost:8080/client/index.html?in=M a r c h _ 0 3 , 1 9 9 9](http://localhost:8080/client/index.html?in=M%20a%20r%20c%20h%20_%200%203%20,%20%201%209%209%209) | |
|
||
You should see: | ||
|
||
<img src="docs/pics/s2s_dates_01.png" width="400"> | ||
|
||
Enjoy exploring! | |
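The example link above can also be built programmatically. This is a minimal sketch, assuming the client reads its input from the `in` query parameter exactly as in that link; `build_client_url` is a hypothetical helper and not part of Seq2Seq-Vis.

```python
from urllib.parse import quote


def build_client_url(text, host="localhost", port=8080):
    """Return a client URL whose `in` parameter carries `text`.

    Hypothetical helper: host, port and path are copied from the example
    link in this README, not read from the server code.
    """
    # Spaces become %20; ',' is kept literal to mirror the example link.
    encoded = quote(text, safe=",")
    return f"http://{host}:{port}/client/index.html?in={encoded}"


if __name__ == "__main__":
    # Characters are separated by spaces, as in the example link.
    print(build_client_url("M a r c h _ 0 3 ,  1 9 9 9"))
```

Percent-encoding the input keeps the URL valid when it is pasted into a terminal or a script rather than a browser address bar.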
|
||
|
||
|
||
|
||
|
||
## Run your own models | |
|
||
### 1 - Prepare your data | ||
to be done. | ||
|
||
### 2 - Create a `s2s.yaml` file to describe the project | |
|
||
```yaml | ||
# -- minimal config | ||
model: date_acc_100.00_ppl_1.00_e7.pt # model file | ||
dicts: | |
  src: src.dict # source dictionary file | |
  tgt: tgt.dict # target dictionary file | |
embeddings: embs.h5 # word embeddings for src and tgt | ||
train: train.h5 # training data | ||
|
||
# -- OPTIONAL: FAISS indices for Neighborhoods | ||
indexType: faiss # index type should be 'faiss' (or 'annoy') | |
indices: | |
  decoder: decoder.faiss # index for decoder states | |
  encoder: encoder.faiss # index for encoder states | |
|
||
# -- OPTIONAL: model for linear projection | ||
project_model: linear_projection.pkl # pickled scikit-learn model | |
``` | ||
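Before starting the server, it can help to check that a parsed `s2s.yaml` contains the entries listed under "minimal config". The sketch below assumes the file has already been loaded into a nested dictionary by a YAML parser, and assumes that every "minimal config" key is required; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of Seq2Seq-Vis.

```python
# Key paths taken from the "minimal config" section above.
# Assumption: all of them are mandatory for the server to start.
REQUIRED_KEYS = [
    ("model",),
    ("dicts", "src"),
    ("dicts", "tgt"),
    ("embeddings",),
    ("train",),
]


def missing_keys(config):
    """Return the dotted paths of required keys absent from `config`."""
    missing = []
    for path in REQUIRED_KEYS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


if __name__ == "__main__":
    example = {
        "model": "date_acc_100.00_ppl_1.00_e7.pt",
        "dicts": {"src": "src.dict", "tgt": "tgt.dict"},
        "embeddings": "embs.h5",
        "train": "train.h5",
    }
    print(missing_keys(example))
```

Reporting every missing path at once, instead of failing on the first one, makes a hand-written config quicker to fix.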
|
||
### 3 - Command Line Parameters | ||
|
||
``` | ||
usage: server.py [-h] [--nodebug NODEBUG] [--port PORT] | |
[--dir DIR] | |
optional arguments: | ||
--nodebug TRUE if not in debug mode | ||
--port port to run system (default: 8080) | ||
--dir directory with s2s.yaml file | ||
``` | ||
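For readers who want to wrap or script the server, the options above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the actual contents of `server.py`: the integer type for `--port` and the defaults for `--nodebug` and `--dir` are guesses, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="server.py")
    # Assumption: any value passed here switches debug mode off.
    parser.add_argument("--nodebug", default=False,
                        help="TRUE if not in debug mode")
    # Assumption: the port is parsed as an integer.
    parser.add_argument("--port", type=int, default=8080,
                        help="port to run system (default: 8080)")
    parser.add_argument("--dir", default=None,
                        help="directory with s2s.yaml file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dir", "0316-fakedates/"])
    print(args.port, args.dir)
```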
|
||
# Cite us | ||
|
||
``` | ||
BIBTEX to arxive | ||
``` | ||
|
||
# Contributors | ||
|
||
- Hendrik Strobelt (IBM Research & MIT-IBM Watson AI Lab) | ||
- Sebastian Gehrmann (Harvard NLP) | ||
- Alexander M. Rush (Harvard NLP) | ||
|
||
- Michael Behrisch (Harvard VCG), Adam Perer (IBM Research), Hanspeter Pfister (Harvard VCG) | ||
|
||
# License | ||
|
||
- Enjoy-- localhost:8080/client/index.html | ||
Seq2Seq-Vis is licensed under the Apache 2 license. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
import numpy as np | ||
import sys | ||
|
||
sys.path.append('faiss') | ||
# sys.path.append('faiss') | ||
import faiss | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
#!/usr/bin/env bash | ||
|
||
# Install all essential packages | ||
conda create --yes --name s2sv python=3.6 h5py numpy scikit-learn flask | ||
conda install --name s2sv --yes -c conda-forge connexion nodejs python-annoy | ||
conda install --name s2sv --yes -c pytorch pytorch torchvision faiss-cpu | ||
source activate s2sv | ||
|
||
|
||
cd client | ||
npm install | ||
npm run wp | ||
cd .. | ||
|
||
|
||
|
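After the setup script above has run, one way to confirm that the `s2sv` environment is complete is to check that each package can be found by Python. This is a minimal sketch: the import names are the usual ones for the conda packages listed in the script (`sklearn` for scikit-learn, `annoy` for python-annoy, `torch` for pytorch, `faiss` for faiss-cpu), and `missing_modules` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

# Import names corresponding to the conda packages installed above.
EXPECTED_MODULES = [
    "h5py", "numpy", "sklearn", "flask",
    "connexion", "annoy", "torch", "torchvision", "faiss",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Run it with the `s2sv` environment activated; `find_spec` only locates the modules and does not import them, so the check stays fast.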
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
#!/usr/bin/env bash | ||
|
||
# just to be sure :) | ||
source activate s2sv | ||
|
||
# clone modified opennmt repo which exposes internals to Seq2Seq-Vis | ||
git clone https://github.com/sebastianGehrmann/OpenNMT-py.git | ||
cd OpenNMT-py/ | ||
git checkout states_in_translation | ||
python setup.py install | ||
pip install torchtext | ||
cd .. |