Commit

updated bin/utils
ZhitingHu committed Sep 21, 2018
1 parent 8908a67 commit 7039403
Showing 2 changed files with 13 additions and 6 deletions.
17 changes: 12 additions & 5 deletions bin/utils/README.md
@@ -1,9 +1,16 @@
### *[Byte Pair Encoding](https://arxiv.org/abs/1508.07909)* (BPE) pipeline

This directory contains several utilities for, e.g., data pre-processing.

Instructions for using BPE and WPM encoding are as follows.
See [examples/transformer](https://github.com/asyml/texar/tree/master/examples/transformer)
for a real example of using these encodings.

### *[Byte Pair Encoding (BPE)](https://arxiv.org/abs/1508.07909)* pipeline

* Add `bin` directory to `PATH` env variable
```bash
TXTGEN=$(pwd)
export PATH=$PATH:$TXTGEN/bin
TEXAR=$(pwd)
export PATH=$PATH:$TEXAR/bin
```

* Learning BPE vocab on source and target combined
@@ -26,12 +33,12 @@ mv test.out test.out.bpe
cat test.out.bpe | sed -E 's/(@@ )|(@@ ?$)//g' > test.out
```
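
The `sed` substitution above strips the `@@ ` continuation markers that BPE leaves between subword units. As a minimal sketch, assuming you would rather do this step in Python, the same pattern can be applied with the standard `re` module. The helper name `remove_bpe` is hypothetical and is not part of this commit or of Texar.

```python
import re

# Same pattern as the sed command: "@@ " inside a line, or a trailing "@@"
# (optionally followed by one space) at the end of the line.
_BPE_MARKER = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line: str) -> str:
    """Merge BPE subword units back into whole words (hypothetical helper)."""
    # Drop the trailing newline first so "$" anchors at the real end of the text.
    return _BPE_MARKER.sub("", line.rstrip("\n"))


if __name__ == "__main__":
    print(remove_bpe("the new@@ est mod@@ el"))
```

Applying `remove_bpe` to each line of `test.out.bpe` should give the same text as the `sed` pipeline, which is the input expected by the evaluation step.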

##### Evaluate Using t2t-Bleu
##### Evaluate Using t2t-bleu
```bash
t2t-bleu --translation=test.out --reference=test.tgt
```

### Word Piece Model (WPM)
### Word Piece Model (WPM) pipeline

* This requires installation of the *[sentencepiece](https://github.com/google/sentencepiece#python-module)* library
```bash
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -255,7 +255,7 @@
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'texar.tex', u'Texar Documentation',
u'TxtGen', 'manual'),
u'Texar', 'manual'),
]

# The name of an image file (relative to this directory) to place at the top of