
New tool: embeddings_to_torch #398

Merged
merged 2 commits into OpenNMT:master on Dec 5, 2017

Conversation
pltrdy (Contributor) commented Nov 29, 2017

Easily plug GloVe pre-trained embeddings into OpenNMT-py.
The script is a slightly modified version of ylhsieh's.

Usage:

usage: embeddings_to_torch.py [-h] -emb_file EMB_FILE -output_file OUTPUT_FILE
                              -dict_file DICT_FILE [-verbose]
  • emb_file: a GloVe-like embedding file, i.e. one line per word: [word] [dim1] ... [dim_d]
  • output_file: a filename prefix under which to save the output as PyTorch serialized tensors
  • dict_file: the dict file produced by OpenNMT-py preprocessing
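
For reference, the core idea of the script is roughly the following (an illustrative sketch only; the real tools/embeddings_to_torch.py reads the *.vocab.pt structure produced by preprocess.py rather than a plain word list):

  import torch

  def load_glove(emb_file):
      """Read a GloVe-style text file into a {word: FloatTensor} dict."""
      vectors = {}
      with open(emb_file, encoding="utf-8") as f:
          for line in f:
              parts = line.rstrip().split(" ")
              vectors[parts[0]] = torch.tensor([float(x) for x in parts[1:]])
      return vectors

  def build_embedding_matrix(words, vectors, dim):
      """Words found in GloVe get their vector; the rest stay at zero."""
      emb = torch.zeros(len(words), dim)
      for i, w in enumerate(words):
          if w in vectors:
              emb[i] = vectors[w]
      return emb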

Example

0) set some variables:

export data="../onmt_merge/sorted_tokens/"
export root="./glove_experiment"
export glove_dir="./glove"

1) get GloVe files:

mkdir "$glove_dir"
wget http://nlp.stanford.edu/data/glove.6B.zip
unzip glove.6B.zip -d "$glove_dir"

2) prepare data:

  mkdir -p $root
  python preprocess.py \
      -train_src $data/train.src.txt \
      -train_tgt $data/train.tgt.txt \
      -valid_src $data/valid.src.txt \
      -valid_tgt $data/valid.tgt.txt \
      -save_data $root/data

3) prepare embeddings:

  ./tools/embeddings_to_torch.py -emb_file "$glove_dir/glove.6B.100d.txt" \
                                 -dict_file "$root/data.vocab.pt" \
                                 -output_file "$root/embeddings" 
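
Optionally, you can sanity-check the result before training (this assumes, as the -pre_word_vecs_* flags in step 4 suggest, that the script writes plain serialized tensors to embeddings.enc.pt and embeddings.dec.pt):

  import torch

  enc = torch.load("glove_experiment/embeddings.enc.pt")
  dec = torch.load("glove_experiment/embeddings.dec.pt")
  # Both should be 2-D tensors: (vocab_size, 100) for glove.6B.100d
  print(enc.shape, dec.shape)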

4) train using pre-trained embeddings:

  python train.py -save_model $root/model \
        -batch_size 64 \
        -layers 2 \
        -rnn_size 200 \
        -word_vec_size 100 \
        -pre_word_vecs_enc "$root/embeddings.enc.pt" \
        -pre_word_vecs_dec "$root/embeddings.dec.pt" \
        -data $root/data
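
Conceptually, the -pre_word_vecs_enc / -pre_word_vecs_dec flags amount to initializing the encoder/decoder embedding weights from the serialized tensors; roughly (an illustration, not OpenNMT-py's actual training code):

  import torch
  import torch.nn as nn

  pretrained = torch.load("glove_experiment/embeddings.enc.pt")  # (vocab_size, 100)
  embedding = nn.Embedding(pretrained.size(0), pretrained.size(1))
  embedding.weight.data.copy_(pretrained)  # start training from GloVe vectors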
srush (Contributor) commented Dec 5, 2017

Perfect. Can you send a PR adding your instructions to the README as well?

srush merged commit 3bc2d7a into OpenNMT:master on Dec 5, 2017

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

pltrdy added a commit to pltrdy/OpenNMT-py that referenced this pull request Dec 6, 2017

README: mention pretrained embeddings tutorial
as suggested in the PR [OpenNMT#398](OpenNMT#398) introducing `embeddings_to_torch.py`

da03 pushed a commit to da03/OpenNMT-py that referenced this pull request Dec 12, 2017

README: mention pretrained embeddings tutorial
as suggested in the PR [OpenNMT#398](OpenNMT#398) introducing `embeddings_to_torch.py`