Improvements to how pre-trained word embeddings are handled. #15

Closed
1 task done
JohnGiorgi opened this issue Aug 10, 2018 · 0 comments
Assignees
Labels
enhancement New feature or request invalid This doesn't seem right

Comments

Contributor

JohnGiorgi commented Aug 10, 2018

A couple of improvements to how pre-trained embeddings are handled are required.

1. Out-source loading of vectors

Currently, the code that loads pre-trained embeddings was written by me, which means it's likely fragile and slow. Try loading the embeddings with Gensim instead, which is likely to be faster and more reliable.

  • Use Gensim to load word embeddings

Note, this might actually solve the problem below.
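A minimal sketch of what loading through Gensim could look like, assuming the embeddings are in the word2vec format that `KeyedVectors.load_word2vec_format` reads. The helper names `load_embeddings` and `is_binary_extension` are hypothetical and not part of saber; guessing the format from the file extension is only an illustrative default.

```python
import os


def is_binary_extension(path):
    """Guess the format from the extension: '.bin' is treated as binary (assumption)."""
    return os.path.splitext(path)[1].lower() == ".bin"


def load_embeddings(path, binary=None):
    """Load word2vec-format embeddings with Gensim (hypothetical helper).

    If `binary` is None, the format is guessed from the file extension.
    """
    # Imported lazily so this module can be imported without Gensim installed.
    from gensim.models import KeyedVectors

    if binary is None:
        binary = is_binary_extension(path)
    return KeyedVectors.load_word2vec_format(path, binary=binary)
```

With something like this, `load_embeddings("path/to/embeddings.bin")` would return a `KeyedVectors` object, and a word's vector can be looked up with `vectors["word"]`. Because Gensim reads both formats directly, the manual conversion step may no longer be needed at all.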

2. Handle binary or plain text format

Currently, pre-trained embeddings in binary format (.bin) must be manually converted to plain text format (.txt) before they can be used with saber, which imposes an unnecessary extra step on the user. Instead, detect whether the embeddings are in binary or plain text format, and convert from binary to plain text automatically if necessary. To fix:

- [ ] Determine if embeddings are binary or plain text
- [ ] Use Gensim to convert from binary to plain text if necessary
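The two tasks above could be sketched roughly as follows. This is a heuristic, not a definitive implementation: it assumes word2vec-format files, where the first line (`vocab_size dim`) is text in both formats, and treats the file as binary if the bytes after that header contain NUL bytes or are not valid UTF-8. The function names `looks_binary` and `ensure_plain_text` are hypothetical.

```python
import codecs


def looks_binary(path, sample_size=8192):
    """Heuristically decide whether a word2vec-format file is binary."""
    with open(path, "rb") as f:
        f.readline()  # skip the header line, which is text in both formats
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return True
    # An incremental decoder tolerates a multi-byte character cut off at the
    # end of the sample, but still raises on genuinely invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def ensure_plain_text(path, out_path):
    """Return a path to plain text embeddings, converting with Gensim if needed."""
    if not looks_binary(path):
        return path
    # Imported lazily so the detection step works without Gensim installed.
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format(path, binary=True)
    vectors.save_word2vec_format(out_path, binary=False)
    return out_path
```

Sniffing the content rather than trusting the extension means a mislabeled file is still handled, at the cost of a small read up front.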

@JohnGiorgi JohnGiorgi added enhancement New feature or request invalid This doesn't seem right labels Aug 10, 2018
@JohnGiorgi JohnGiorgi self-assigned this Aug 10, 2018