Python implementation of Word2Vec


Word2VecLite

Word2VecLite is a Python implementation of Word2Vec that makes it easy to understand how Word2Vec works. This package is intended to be used in conjunction with this blog post.

Installation

  • In your target folder, clone the repository with the command:

    git clone https://github.com/cbellei/word2veclite.git
  • Then, from inside the cloned word2veclite folder (as always, it is advisable to use a virtual environment), run:

    pip install .

NOTE: if you get a "ModuleNotFoundError: No module named 'setuptools.wheel'" error, you may need to update pip and setuptools with:

    pip install -U pip setuptools

  • Make sure you install the correct dependencies by typing:

    pip install -r requirements.txt

NOTE: this should work in both a conda environment and a standard virtual environment.

  • To check that the package has been installed, in the Python shell type:

    import word2veclite
  • If everything works correctly, the package will be imported without errors.

Dependencies

  • Word2VecLite is tested on Python 3.6 and depends on NumPy and Keras (see requirements.txt for version information). The unit tests in test/test_word2veclite.py depend on TensorFlow.

How to use Word2VecLite

Input. You need to:

  1. Define a corpus of text; the vocabulary is built from it (corpus).
  2. Choose the method: cbow or skipgram (method).
  3. Decide how many nodes make up the hidden layer (n_hidden).
  4. Define the size of the context window around the center word (window_size).
  5. Define the learning rate (learning_rate).
  6. Decide how many epochs to train the neural network for (n_epochs).
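To make steps 1, 2 and 4 more concrete, here is a minimal pure-Python sketch of how a corpus and a context window can be turned into training examples for CBOW and skip-gram. The helper make_training_pairs is hypothetical and is not part of the Word2VecLite API. It assumes lowercasing, whitespace tokenization and a vocabulary ordered by first appearance, any of which may differ from what the package does internally.

```python
def make_training_pairs(corpus, window_size, method="cbow"):
    """Hypothetical helper (not Word2VecLite's API): build word-index training pairs.

    cbow:     one (context_indices, center_index) example per word.
    skipgram: one (center_index, context_index) example per context word.
    """
    # Assumption: lowercase + whitespace tokenization.
    tokens = corpus.lower().split()
    # Assumption: vocabulary ordered by first appearance.
    vocab = list(dict.fromkeys(tokens))
    index = {word: i for i, word in enumerate(vocab)}

    pairs = []
    for pos, center in enumerate(tokens):
        # Context window: up to window_size words on each side, clipped at the edges.
        start = max(0, pos - window_size)
        stop = min(len(tokens), pos + window_size + 1)
        context = [index[tokens[j]] for j in range(start, stop) if j != pos]
        if method == "cbow":
            # All context words together predict the center word.
            pairs.append((context, index[center]))
        elif method == "skipgram":
            # The center word predicts each context word separately.
            pairs.extend((index[center], c) for c in context)
        else:
            raise ValueError("method must be 'cbow' or 'skipgram'")
    return vocab, pairs


vocab, pairs = make_training_pairs("I like playing football with my friends", window_size=1)
print(vocab)
print(pairs)
```

With window_size=1 the example sentence gives seven CBOW examples, one per word; the first is ([1], 0), i.e. "like" is used to predict "i". Skip-gram on the same sentence gives twelve (center, context) pairs instead.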

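Output. Word2VecLite returns the two weight matrices of the neural network, W1 (input to hidden layer, one row per vocabulary word, i.e. the word embeddings) and W2 (hidden to output layer), together with the history of the loss, one value per epoch.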
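The sketch below illustrates how W1, W2 and a loss value fit together for a single CBOW example. It follows the standard CBOW formulation (average the W1 rows of the context words, multiply by W2, apply a softmax, take the cross-entropy of the true center word); that Word2VecLite computes exactly this, for instance averaging rather than summing the context vectors, is an assumption. The function cbow_forward is hypothetical, and the random weights are placeholders, not values produced by the package.

```python
import math
import random


def cbow_forward(W1, W2, context, target):
    """Hypothetical sketch of a standard CBOW forward pass for one example.

    W1: V x N nested list (input -> hidden); W2: N x V nested list (hidden -> output).
    context: list of context word indices; target: index of the center word.
    Returns the hidden vector h, the softmax probabilities y and the loss.
    """
    n_hidden = len(W1[0])
    # Hidden layer: average of the W1 rows of the context words.
    h = [sum(W1[c][k] for c in context) / len(context) for k in range(n_hidden)]
    # Output scores: u_j = sum_k h_k * W2[k][j] for every vocabulary word j.
    vocab_size = len(W2[0])
    u = [sum(h[k] * W2[k][j] for k in range(n_hidden)) for j in range(vocab_size)]
    # Softmax, shifted by max(u) for numerical stability.
    m = max(u)
    exp_u = [math.exp(x - m) for x in u]
    total = sum(exp_u)
    y = [e / total for e in exp_u]
    # Cross-entropy loss for the true center word.
    loss = -math.log(y[target])
    return h, y, loss


V, N = 7, 2  # vocabulary size and hidden-layer size, matching the example below
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(N)] for _ in range(V)]
W2 = [[rng.uniform(-1, 1) for _ in range(V)] for _ in range(N)]
h, y, loss = cbow_forward(W1, W2, context=[0, 2], target=1)
print(len(h), len(y), loss)
```

With all-zero weights the softmax is uniform over the seven words, so the loss for one example equals log 7 (about 1.95).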

Example

  • In IPython, type:

    from word2veclite import Word2Vec
    
    corpus = "I like playing football with my friends"
    cbow = Word2Vec(method="cbow", corpus=corpus,
                    window_size=1, n_hidden=2,
                    n_epochs=10, learning_rate=0.8)
    W1, W2, loss_vs_epoch = cbow.run()
    
    print(W1)
    #[[ 0.99870389  0.20697257]
    # [-1.01911559  2.26364436]
    # [-0.69737232  0.14131477]
    # [ 3.28315183  1.13801973]
    # [-1.42944927 -0.62142097]
    # [ 0.65359329 -2.21415048]
    # [-0.22343751 -1.17927987]]
    
    print(W2)
    #[[-0.97080793  1.21120331  2.15603796 -1.79083151  3.38445043 -1.65295511
    #   1.36685097]
    # [2.77323464  0.78710269  2.74152617  0.08953005  0.04400675 -1.34149651
    #   -2.19375528]]
    
    print(loss_vs_epoch)
    #[14.328868654443703, 12.290456644464603, 10.366644621637064,
    # 9.1759777684446622, 8.4233626997233895, 7.3952948684910256,
    # 6.1727393307549736, 5.1639476117698191, 4.6333377088153043,
    # 4.2944697259465485]
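Since the rows of W1 are the learned word vectors, a common next step is to compare them with cosine similarity. The following sketch is not part of Word2VecLite: cosine and most_similar are hypothetical helpers, and they work on row indices only, because which word each row corresponds to depends on how the package orders its vocabulary (an assumption not covered here). With a seven-word corpus and two hidden nodes, the similarities are not expected to be linguistically meaningful.

```python
import math

# W1 as printed in the example above (one row per vocabulary word).
W1 = [
    [0.99870389, 0.20697257],
    [-1.01911559, 2.26364436],
    [-0.69737232, 0.14131477],
    [3.28315183, 1.13801973],
    [-1.42944927, -0.62142097],
    [0.65359329, -2.21415048],
    [-0.22343751, -1.17927987],
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(W, i, topn=3):
    """Hypothetical helper: indices of the topn rows most similar to row i (excluding i)."""
    scores = [(cosine(W[i], W[j]), j) for j in range(len(W)) if j != i]
    scores.sort(reverse=True)
    return [j for _, j in scores[:topn]]


print(most_similar(W1, 0))
```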

Licence

Apache License, Version 2.0