Paradigmatic approach to CHILDES child data.
Feature Decay Algorithm
Generate most likely substitutes for words in a given text based on an n-gram language model.
k-means algorithm with (optional) instance weights.
Unsupervised word sense disambiguation
Unsupervised multilingual part of speech induction system (2014 version)
Sphere embedding (S-CODE) is a variation of Euclidean embedding of co-occurrence data (CODE).
Supporting code and data for the langvis project.
Protein dynamics research.
glookup - reads n-gram patterns with wildcards from stdin and prints their counts from the Web1T Google n-gram data.
Demo run of the S-CODE algorithm on a 3D sphere
Calculates a variety of distances between vectors.
SemEval-2013 Task 13: WSI and WSD
Run SRILM with different options to find the best language model given the training and test data.
Word Sense Induction
Unsupervised part of speech induction.
CoNLL-X Turkish data set for the upos repository
CoNLL-X Spanish data set for the upos repository
CoNLL-X Swedish data set for the upos repository
CoNLL-X Slovene data set for the upos repository
CoNLL-X German data set for the upos repository
CoNLL-X Portuguese data set for the upos repository
CoNLL-X Dutch data set for the upos repository
CoNLL-X Danish data set for the upos repository
CoNLL-X Czech data set for the upos repository
MULTEXT-East Hungarian data set for the upos repository