
a-paxton/Gensim-LSI-Word-Similarities


Gensim-LSI-Word-Similarities

Simple functions for computing word-word similarities from Gensim's latent semantic indexing (LSI) in Python. The two core functions, wordsim and wordvectsim, each produce a cosine-derived similarity score, inverted so that 0 = low and 1 = high, between two words in a Gensim-generated LSA/LSI space. The score uses all of the dimensions specified when the model was created (i.e., num_topics in gensim.models.LsiModel).

Both core functions require Gensim, pandas, and SciPy.
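The score described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's code: it assumes a word's LSI vector arrives in Gensim's sparse form (a list of (topic_id, weight) pairs), expands it to a dense vector with num_topics entries, and takes the similarity as 1 minus the cosine distance, the same formula used by scipy.spatial.distance.cosine. The helper names `sparse_to_dense`, `cosine_distance`, and `cosine_similarity` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sparse_to_dense(sparse_vec: Sequence[Tuple[int, float]], num_topics: int) -> List[float]:
    """Hypothetical helper: expand (topic_id, weight) pairs into a list of length num_topics."""
    dense = [0.0] * num_topics
    for topic_id, weight in sparse_vec:
        dense[topic_id] = float(weight)  # topics missing from the sparse list stay 0.0
    return dense


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - (u . v) / (|u| |v|), the formula behind scipy.spatial.distance.cosine."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Invert the distance back into a similarity: 1 means the same direction."""
    return 1.0 - cosine_distance(u, v)


u = sparse_to_dense([(0, 0.5), (2, 0.5)], num_topics=3)  # [0.5, 0.0, 0.5]
v = sparse_to_dense([(0, 1.0)], num_topics=3)             # [1.0, 0.0, 0.0]
print(round(cosine_similarity(u, v), 4))  # 0.7071
```

Expanding to all num_topics dimensions keeps the two vectors aligned, since Gensim's sparse output may omit near-zero topics and two words need not list the same topic ids.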

Includes four functions:

  • wordsim: Creates a cosine-derived similarity score (from 0 to 1) between two individual words. Input:
    • word1 (string or string variable)
    • word2 (string or string variable)
    • target_dictionary (Gensim-created LSI dictionary)
    • target_lsi_model (Gensim-created LSI model)

  • wordvectsim: Same as wordsim, but designed to calculate similarity scores (from 0 to 1) for word pairs stored in a 2-dimensional word vector (e.g., applied row by row with numpy.apply_along_axis). Input:
    • word_vector2d (2D string vector or 2D string vector variable)
    • target_dictionary (Gensim-created LSI dictionary)
    • target_lsi_model (Gensim-created LSI model)

  • Two additional functions (or series of functions) have since been added; detailed documentation is available within each function and will be added here soon:
    • word2vec_vect_sim_fun: similarity score function for Gensim's word2vec model
    • word_pair_similarity_matrix: word-word similarity matrix function for Gensim's LSI (LSA) model
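The two core functions can be sketched along these lines. This is a minimal sketch assuming standard Gensim objects, not the repository's actual source: it relies only on `Dictionary.doc2bow`, on indexing the model with a bag-of-words (`lsi_model[bow]`, which yields sparse (topic_id, weight) pairs), and on the model's `num_topics` attribute. The helper `_lsi_dense` is hypothetical, and scoring an all-zero vector as 0.0 is an assumed convention. Plain cosine similarity can fall below 0 when LSI weights have opposite signs; this sketch leaves such values as-is rather than guessing how the original maps scores onto 0 to 1.

```python
import math
from typing import List, Sequence


def _lsi_dense(word: str, target_dictionary, target_lsi_model) -> List[float]:
    """Hypothetical helper: project one word into the LSI space as a dense vector."""
    bow = target_dictionary.doc2bow([word])  # [(token_id, count)], empty if the word is unknown
    if not bow:
        raise KeyError(f"{word!r} is not in the dictionary")
    dense = [0.0] * target_lsi_model.num_topics
    for topic_id, weight in target_lsi_model[bow]:  # sparse (topic_id, weight) pairs
        dense[topic_id] = float(weight)
    return dense


def wordsim(word1: str, word2: str, target_dictionary, target_lsi_model) -> float:
    """Cosine similarity of two words across all num_topics LSI dimensions."""
    u = _lsi_dense(word1, target_dictionary, target_lsi_model)
    v = _lsi_dense(word2, target_dictionary, target_lsi_model)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumed convention: a word with an all-zero LSI vector scores 0.0.
    return dot / norm if norm else 0.0


def wordvectsim(word_pair: Sequence[str], target_dictionary, target_lsi_model) -> float:
    """Score one row (a pair of words) of a 2D word array."""
    word1, word2 = word_pair
    return wordsim(word1, word2, target_dictionary, target_lsi_model)
```

With NumPy, `numpy.apply_along_axis(wordvectsim, 1, pairs, dictionary, lsi_model)` calls wordvectsim once per row of an (n, 2) array `pairs`, forwarding the extra arguments, and returns an array of n scores.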
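The word_pair_similarity_matrix entry describes a word-by-word table of scores. The following is a generic sketch that assumes nothing about that function's real signature: `similarity_matrix` (a hypothetical name) takes a list of words and any two-argument scoring function, and fills a nested dict, computing each unordered pair once. Mirroring each score assumes the scorer is symmetric, as cosine similarity is. For the LSI case the scorer could be a closure such as `lambda a, b: wordsim(a, b, dictionary, lsi_model)` around the README's wordsim; for word2vec, Gensim's `KeyedVectors.similarity(w1, w2)` (reached as `model.wv.similarity`) already has the right shape.

```python
from typing import Callable, Dict, Sequence


def similarity_matrix(
    words: Sequence[str],
    similarity: Callable[[str, str], float],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: symmetric word-by-word table of similarity scores."""
    matrix: Dict[str, Dict[str, float]] = {w: {} for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i:]:  # visit each unordered pair (and the diagonal) once
            score = similarity(w1, w2)
            matrix[w1][w2] = score
            matrix[w2][w1] = score  # assumes a symmetric scorer, so mirror the value
    return matrix


scores = similarity_matrix(["cat", "dog"], lambda a, b: 1.0 if a == b else 0.5)
print(scores)  # {'cat': {'cat': 1.0, 'dog': 0.5}, 'dog': {'cat': 0.5, 'dog': 1.0}}
```

`pandas.DataFrame(scores)` turns the nested dict into a labeled table, with the outer keys as columns and the inner keys as the index.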

