Generates the most likely substitutes for words in a given text based on an n-gram language model.
k-means algorithm with (optional) instance weights.
Unsupervised word sense disambiguation.
Unsupervised multilingual part-of-speech induction system (2014 version).
Sphere embedding (S-CODE) is a variation of Euclidean embedding of co-occurrence data (CODE).
Supporting code and data for the langvis project.
Protein dynamics research.
glookup - reads n-gram patterns with wildcards from stdin and prints their counts from the Web1T Google n-gram data.
Paradigmatic approach to CHILDES child-language data.
Demo run of the S-CODE algorithm on a 3D sphere.
Calculates a variety of distances between vectors.
Runs SRILM with different options to find the best language model given the training and test data.
Unsupervised part of speech induction.
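
For readers unfamiliar with the "k-means with instance weights" idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the repository: the function name `weighted_kmeans`, its parameters, and the random initialization are assumptions made for illustration. The only change from standard Lloyd's k-means is that each centroid is the weighted mean of its assigned points.

```python
import random


def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd-style k-means where each point pulls its centroid
    in proportion to its weight (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumption: initialize centers from k distinct input points.
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [-1] * len(points)

    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(
                range(k),
                key=lambda c: sum((p[d] - centers[c][d]) ** 2 for d in range(dim)),
            )
            for p in points
        ]

        # Update step: weighted mean of the points assigned to each center.
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w, c in zip(points, weights, new_assign):
            totals[c] += w
            for d in range(dim):
                sums[c][d] += w * p[d]
        for c in range(k):
            if totals[c] > 0:  # leave an empty cluster's center unchanged
                centers[c] = [s / totals[c] for s in sums[c]]

        # Stop once the assignment no longer changes.
        if new_assign == assign:
            break
        assign = new_assign

    return centers, assign


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    wts = [1.0, 3.0, 1.0, 1.0]
    centers, labels = weighted_kmeans(pts, wts, k=2)
    print(sorted(centers), labels)
```

With all weights equal, the sketch reduces to ordinary k-means; in the demo, the weight of 3 on the point at 1.0 moves its cluster's center from the unweighted mean 0.5 toward that point.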