wmd4j is a Java library for calculating Word Mover's Distance (WMD)

crtomirmajer/wmd4j

wmd4j

wmd4j is a Java library for computing Word Mover's Distance (WMD) between two text documents. It provides the same functionality as Word2Vec.wmdistance in Gensim.

wmd4j depends on the deeplearning4j WordVectors interface for word vector manipulation and uses an optimized version of JFastEMD (a solver for the Earth Mover's Distance transportation problem) underneath, which is about 1.8x faster than the original.
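To make the underlying problem more concrete, here is a minimal, self-contained sketch of the inputs WMD works with: normalized bag-of-words weights for each document and Euclidean distances between word vectors. Instead of solving the full transportation problem, it computes the relaxed WMD lower bound, where each word sends all of its weight to the nearest word in the other document. This is not wmd4j's or JFastEMD's implementation; the class name, method names, and the 2-D vectors are all hypothetical and chosen only for illustration, and the sketch assumes every token has a vector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration of the WMD inputs and the relaxed lower bound; not wmd4j's implementation. */
final class RelaxedWmdSketch {

    /** Normalized bag-of-words: each distinct token's count divided by the total token count. */
    static Map<String, Double> nbow(String[] tokens) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String t : tokens) {
            weights.merge(t, 1.0 / tokens.length, Double::sum);
        }
        return weights;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** One-sided relaxation: every source word sends all of its weight to its nearest target word. */
    static double oneSided(Map<String, Double> from, Map<String, Double> to,
                           Map<String, double[]> vectors) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : from.entrySet()) {
            double nearest = Double.POSITIVE_INFINITY;
            for (String target : to.keySet()) {
                // Assumes both words have a vector; a real implementation must handle missing words.
                nearest = Math.min(nearest, euclidean(vectors.get(e.getKey()), vectors.get(target)));
            }
            total += e.getValue() * nearest;
        }
        return total;
    }

    /** Relaxed WMD: the larger of the two one-sided relaxations, a lower bound on the exact WMD. */
    static double relaxedWmd(String[] a, String[] b, Map<String, double[]> vectors) {
        Map<String, Double> da = nbow(a);
        Map<String, Double> db = nbow(b);
        return Math.max(oneSided(da, db, vectors), oneSided(db, da, vectors));
    }

    public static void main(String[] args) {
        // Hypothetical 2-D embeddings, chosen only to make the arithmetic easy to follow.
        Map<String, double[]> vectors = new LinkedHashMap<>();
        vectors.put("obama", new double[] {0.0, 0.0});
        vectors.put("president", new double[] {3.0, 4.0});
        vectors.put("media", new double[] {10.0, 0.0});
        vectors.put("press", new double[] {10.0, 1.0});

        String[] a = {"obama", "media"};
        String[] b = {"president", "press"};
        System.out.println(relaxedWmd(a, b, vectors));
    }
}
```

With these toy vectors, "obama" is closest to "president" (distance 5) and "media" is closest to "press" (distance 1), each with weight 0.5, so the bound is 3.0. The exact WMD adds the constraint that the weight arriving at each target word must match that word's own weight, which is what turns the computation into a transportation problem requiring an EMD solver.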

Usage

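// Load a word2vec model; the second argument is the binary flag (false reads the text format).
WordVectors vectors = WordVectorSerializer.loadGoogleModel(new File(word2vecPath), false);
WordMovers wm = WordMovers.Builder().wordVectors(vectors).build();

// Compute the Word Mover's Distance between two sentences.
wm.distance("obama speaks to the media in illinois", "the president greets the press in chicago");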

Validation

wmd4j is validated against Gensim's wmdistance results on a custom word2vec model.
