How to use pretrained word vectors #23
You can read more here: http://opennmt.net/Guide/#pre-trained-embeddings
I would start by looking at the code here: https://github.com/zhexxian/From-Machine-Learning-To-Zero-Day-Exploits/blob/5e619235f2248ddadde26849ba4acf2fda01f925/util/GloVeEmbedding.lua . In https://github.com/OpenNMT/OpenNMT/blob/master/onmt/modules/WordEmbedding.lua, they basically load a torch .t7 file and assign the indices and weights to the weight tensor of an nn.LookupTable. One important thing is to make sure the words in your src and target files are found in whatever embeddings you use. Also, memory usage is much lower if you parse the large GloVe files down to only the words that appear in your training data; a sketch of both steps follows below.
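To make that concrete, here is a minimal sketch of the two steps: filtering a GloVe text file down to a given vocabulary, saving the result as a .t7 tensor, and copying it into an nn.LookupTable. The file names, the dimension, and the `vocab` table are illustrative assumptions on my part, not anything from the OpenNMT codebase.

```lua
-- Minimal sketch (not OpenNMT code): filter a GloVe text file down to the
-- training vocabulary, save it as a .t7 tensor, then load it into a LookupTable.
require('torch')
require('nn')

local dim = 300                                 -- must match the GloVe file
local vocab = { ['the'] = 1, ['cat'] = 2 }      -- word -> index; in practice,
local vocabSize = 2                             -- build this from your src/tgt dicts

-- Words missing from GloVe keep a small random initialization.
local weights = torch.Tensor(vocabSize, dim):uniform(-0.1, 0.1)

for line in io.lines('glove.6B.300d.txt') do    -- hypothetical path
  local parts = {}
  for token in line:gmatch('%S+') do parts[#parts + 1] = token end
  local idx = vocab[parts[1]]                   -- first token is the word
  if idx then
    for j = 1, dim do
      weights[idx][j] = tonumber(parts[j + 1])  -- remaining tokens are the vector
    end
  end
end

torch.save('filtered-embeddings.t7', weights)

-- Later, copy the saved weights into the embedding layer:
local lut = nn.LookupTable(vocabSize, dim)
lut.weight:copy(torch.load('filtered-embeddings.t7'))
```

Because only in-vocabulary rows are kept, the saved tensor is vocabSize x dim rather than the size of the full GloVe vocabulary, which is what keeps memory manageable.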
Probably the biggest reason this has not been incorporated into the codebase by default is that OpenNMT is meant for translation (embeddings are needed on both the source and target sides), and most of the pre-trained embeddings I am familiar with are English-only.
Thanks @jroakes, this is a very nice explanation. A couple of further points.
If either of you wanted to submit a pull request for (2), we would be happy to give credit and include it in the tools/ directory.