streaming embeddings with the space-saving algorithm

athena

Athena is a library and collection of programs implementing streaming embeddings with the space-saving algorithm. Code released for the "Streaming Word Embeddings with the Space-Saving Algorithm" (arXiv:1704.07463) manuscript can be found in the arxiv-1704-07463v1 tag.

Athena comprises a C++ library and a few programs. The source code is located in src/.
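
For readers unfamiliar with the space-saving algorithm mentioned above, the following is a minimal sketch of its counting step. It is an illustration only and is not taken from Athena's source: the class name SpaceSavingSketch and its methods Insert, Estimate, and Size are hypothetical, and the linear scan for the minimum counter is a simplification (practical implementations usually keep the counters in a heap or a stream-summary structure). The sketch tracks at most a fixed number of items; when a new item arrives and the table is full, the item with the smallest count is evicted and the new item inherits that count plus one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical illustration of space-saving counting; not Athena's API.
class SpaceSavingSketch {
 public:
  explicit SpaceSavingSketch(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);  // at least one counter is required
  }

  // Process one item from the stream.
  void Insert(const std::string& item) {
    auto it = counts_.find(item);
    if (it != counts_.end()) {
      ++it->second;  // item is already monitored: increment its counter
      return;
    }
    if (counts_.size() < capacity_) {
      counts_.emplace(item, 1);  // free counter available: start at 1
      return;
    }
    // Table is full: find the monitored item with the smallest count
    // (linear scan, chosen here for brevity rather than speed).
    auto min_it = counts_.begin();
    for (auto jt = counts_.begin(); jt != counts_.end(); ++jt) {
      if (jt->second < min_it->second) min_it = jt;
    }
    // Replace it; the new item inherits the evicted count plus one.
    const std::size_t inherited = min_it->second;
    counts_.erase(min_it);
    counts_.emplace(item, inherited + 1);
  }

  // Stored count for a monitored item (never below its true count),
  // or 0 if the item is not currently monitored.
  std::size_t Estimate(const std::string& item) const {
    auto it = counts_.find(item);
    return it == counts_.end() ? 0 : it->second;
  }

  // Number of items currently monitored.
  std::size_t Size() const { return counts_.size(); }

 private:
  std::size_t capacity_;
  std::unordered_map<std::string, std::size_t> counts_;
};
```

With a capacity of 2 and the stream a, a, b, c, the item b is evicted when c arrives, and c is stored with a count of 2, which overestimates its true count of 1. This overestimation is the price the algorithm pays for bounded memory.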

Usage

To build the library by itself, run make lib. To build the programs, run make main. The compiled library and programs will be placed in build/lib/.

Testing

The test suite requires Google Test and Google Mock. On Linux or OS X, install cmake (e.g., with sudo yum install cmake or sudo apt-get install cmake), then run the following bash commands to build and install Google Test and Google Mock:

git clone https://github.com/google/googletest.git && \
    mkdir gtest-build && \
    pushd gtest-build && \
    cmake ../googletest/googletest && \
    make && \
    sudo mv libgtest.a libgtest_main.a /usr/local/lib/ && \
    sudo mv ../googletest/googletest/include/gtest \
        /usr/local/include/ && \
    popd && \
    mkdir gmock-build && \
    pushd gmock-build && \
    cmake ../googletest/googlemock && \
    make && \
    sudo mv libgmock.a libgmock_main.a /usr/local/lib/ && \
    sudo mv ../googletest/googlemock/include/gmock \
        /usr/local/include/ && \
    popd && \
    rm -rf googletest gtest-build gmock-build

To run the tests, run make test.

References

  1. Walker (1977)
  2. Wikipedia alias method article
  3. Vitter (1985)
  4. Metwally, Agrawal, and El Abbadi (2005)
  5. Cormode (2009) slides
  6. Knoll space saving implementation article
  7. D'Elia (2013) (Section 3)
  8. Mikolov, Sutskever, Chen, Corrado, and Dean (2013)
  9. Levy, Goldberg, and Dagan (2015)
  10. Mikolov, Sutskever, Chen, Corrado, and Dean (2013) word2vec implementation
  11. Řehůřek gensim word2vec implementation article (part 1)
  12. Řehůřek gensim word2vec implementation article (part 2)
  13. Řehůřek gensim word2vec implementation article (part 3)

License

Copyright 2012-2017 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. See LICENSE in this directory for more information.