Word2Bits benchmark #1991

Closed
menshikh-iv opened this issue Mar 21, 2018 · 3 comments
Labels
difficulty medium Medium issue: required good gensim understanding & python skills performance Issue related to performance (in HW meaning) testing Issue related with testing (code, documentation, etc)

Comments

@menshikh-iv
Contributor

menshikh-iv commented Mar 21, 2018

Description

Pretty interesting paper: "Word2Bits - Quantized Word Vectors" by Maximilian Lam. It looks possible to apply "quantization" to the current w2v algorithm and get a memory-compact representation without sacrificing quality.

ToDo

  1. Make the needed changes to the current w2v code (following the paper), for testing only
  2. Compare this approach with the current w2v implementation on embedding quality and memory consumption (reproduce the evaluation from the paper)
    • Training corpus: English Wikipedia
    • Benchmark:
      • accuracy method (classical approach)
      • SQuAD task (described in more detail in the paper)

If the benchmark shows good-enough results, this will become part of Gensim.
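
To make the memory argument concrete, here is a minimal sketch of 1-bit quantization applied to an already-trained matrix of word vectors. It is only an illustration: the paper applies quantization during training, which this sketch does not do, and the function names, the `scale` value, and the random stand-in vectors are all assumptions rather than anything from Gensim or the paper's code.

```python
import numpy as np

# Hypothetical helpers for illustration only; not part of Gensim.
# `scale` is an assumed magnitude for the two quantization levels.

def quantize_1bit(vectors, scale=1.0 / 3.0):
    """Map every component to +scale or -scale depending on its sign."""
    return np.where(vectors >= 0, scale, -scale).astype(np.float32)

def pack_1bit(vectors):
    """Keep only the sign bit of each component, 8 components per byte."""
    return np.packbits(vectors >= 0, axis=1)

def unpack_1bit(packed, dim, scale=1.0 / 3.0):
    """Rebuild the quantized float vectors from the packed sign bits."""
    bits = np.unpackbits(packed, axis=1)[:, :dim]  # drop padding bits
    return np.where(bits == 1, scale, -scale).astype(np.float32)

# Random stand-in for a trained embedding matrix: 1000 words, 100 dimensions.
rng = np.random.default_rng(0)
full = rng.standard_normal((1000, 100)).astype(np.float32)

packed = pack_1bit(full)
restored = unpack_1bit(packed, full.shape[1])

# float32 storage vs. packed sign bits, in bytes
print(full.nbytes, packed.nbytes)
```

With float32 input, each 100-dimensional vector takes 400 bytes, while the packed form takes 13 bytes (100 bits rounded up to whole bytes). Whether the quality holds up is exactly what the benchmark above is meant to check.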

@menshikh-iv menshikh-iv added testing Issue related with testing (code, documentation, etc) difficulty medium Medium issue: required good gensim understanding & python skills performance Issue related to performance (in HW meaning) labels Mar 21, 2018
@menshikh-iv
Contributor Author

Looks like a very good task for you @persiyanov :)

@persiyanov
Contributor

I'm posting benchmark results in the related pull request #2011

@menshikh-iv
Contributor Author

Fixed by #2011 (benchmark)

2 participants