Word2Vec + Principal Component Analysis (PCA) + clustering for a low-dimensional semantic representation of a set of words or compositional multiword expressions (MWEs).
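A minimal sketch of the PCA + clustering stages of the pipeline described above. It assumes the word (or MWE) vectors are already available as the rows of a NumPy array, and it uses NumPy only: PCA through an SVD and plain k-means (Lloyd's algorithm). The script itself may rely on a library such as scikit-learn or on a different clustering method. The random 300-dimensional toy vectors, `n_components=2` and `k=2` are illustrative assumptions, not the script's actual settings.

```python
import numpy as np


def pca_project(vectors: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row vectors onto their top principal components (PCA via SVD)."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points picked at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


# Toy stand-ins for word2vec vectors: two well-separated groups of five 300-d vectors.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 0.1, size=(5, 300)) + 1.0
group_b = rng.normal(0.0, 0.1, size=(5, 300)) - 1.0
vectors = np.vstack([group_a, group_b])

coords = pca_project(vectors, n_components=2)  # low-dimensional representation
labels = kmeans(coords, k=2)                    # one cluster id per word/MWE
```

Clustering the PCA coordinates rather than the raw vectors is one possible design choice; clustering in the full embedding space and using PCA only for visualisation is an equally common alternative.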
Make sure you have at least 10 GB of RAM available before running the script.
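For a sense of scale, the Google News model holds 300-dimensional vectors for about 3 million words and phrases. The rough estimate below covers only the raw vector matrix, assuming 4-byte float32 values; the vocabulary strings and loading overhead come on top of it, which plausibly explains why more headroom is requested.

```python
def embedding_matrix_bytes(vocab_size: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense matrix holding one float vector per vocabulary entry."""
    return vocab_size * dims * bytes_per_value


# Google News model: ~3 million entries, 300 dimensions, assumed float32 (4 bytes each).
size = embedding_matrix_bytes(3_000_000, 300)
print(f"{size / 1e9:.1f} GB")  # -> 3.6 GB
```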
Requirements: Python and the following packages:
nltk (plus the English stopwords corpus).
You also need the word2vec model pre-trained on Google News (a large file; decompress it into the same folder as the script): https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
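The following sketch shows one way the requirements above could fit together to get a vector for a word or compositional MWE. The helper `phrase_vector` is hypothetical, not the script's own function: it first looks up the underscore-joined phrase (the Google News vocabulary contains some phrases in that form, e.g. `New_York`), and otherwise averages the vectors of the non-stopword tokens. Averaging is an assumed composition method. Loading the model through gensim is also an assumption, since gensim is not among the listed packages, and the file name `GoogleNews-vectors-negative300.bin` is assumed to be what decompression produces.

```python
import numpy as np


def phrase_vector(model, phrase, stop_words):
    """Hypothetical helper: vector for a single word or a multiword expression.

    `model` is any mapping from token to vector supporting `in` and `[]`,
    e.g. gensim KeyedVectors loaded from the Google News file.
    Returns None when no usable token is found in the vocabulary.
    """
    tokens = phrase.split()
    # Try the phrase as a single vocabulary entry, e.g. "New York" -> "New_York".
    joined = "_".join(tokens)
    if joined in model:
        return np.asarray(model[joined])
    # Otherwise compose: average the in-vocabulary, non-stopword token vectors.
    content = [t for t in tokens if t.lower() not in stop_words and t in model]
    if not content:
        return None
    return np.mean([np.asarray(model[t]) for t in content], axis=0)


# Toy model and stopword set so the sketch runs without the large download.
toy_model = {
    "stock": np.array([1.0, 0.0]),
    "market": np.array([0.0, 1.0]),
    "New_York": np.array([2.0, 2.0]),
}
toy_stop_words = {"the", "of"}
example = phrase_vector(toy_model, "the stock market", toy_stop_words)

# With the real resources (assumed setup, requires gensim and the NLTK data):
#   import nltk
#   from nltk.corpus import stopwords
#   from gensim.models import KeyedVectors
#   nltk.download("stopwords")
#   model = KeyedVectors.load_word2vec_format(
#       "GoogleNews-vectors-negative300.bin", binary=True)
#   vec = phrase_vector(model, "the stock market", set(stopwords.words("english")))
```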