This repo was used to prepare the talk given by Alex Mann at Cognitect's 2016 Conj Conference. It includes a standard implementation of tSNE, examples of data rendered this way, a novel implementation of interop between Clojure and Python, a number of datasets which can be rendered into Clojure objects, and some examples of generative testing.
I want to start by citing the sources that helped me get this far. This list is by no means exhaustive, as there are many blogs and whitepapers I consumed where the information remains and the name has fled.
- Original whitepaper by Hinton and van der Maaten
- Laurens van der Maaten's tSNE resource website
- Joseph Turian's modifications/code for tSNE
I lifted datasets from the following places:
- MNIST from Turian's github repo (link above)
- 130,000 word embeddings from Collobert's SENNA site download (link above)
- Places from hiiamrohit's countries-states-cities-database github repo
- 3000 most common words were copied and pasted from http://www.ef.com/english-resources/english-vocabulary/top-3000-words/
I got sick of starting a headless REPL, so the following will start a session on port 54321.
There are examples of SVG rendering in the comments of the `core` namespace. The gist, though, is to run data through tSNE, then pipe it into `spit-svg`. Pretty straightforward!
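To make the shape of that pipeline concrete, here is a minimal sketch of the last step: taking 2D points and writing them out as SVG. The signatures of the repo's tSNE function and of `spit-svg` are not shown here, so everything below is a hypothetical stand-in: `embedded-points` plays the role of tSNE output, and `scale-points`, `points->svg`, and `write-svg!` are illustrative names, not functions from this repo.

```clojure
(ns example.svg-sketch
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for tSNE output: a seq of [x y] pairs.
(def embedded-points [[0.0 0.0] [1.5 -2.0] [-3.0 4.0]])

(defn scale-points
  "Linearly rescales [x y] pairs so they fit a width x height canvas."
  [points width height]
  (let [xs    (map first points)
        ys    (map second points)
        min-x (apply min xs)
        min-y (apply min ys)
        ;; Guard against a zero span when all values on an axis are equal.
        span  (fn [lo hi] (if (== lo hi) 1.0 (- hi lo)))
        sx    (span min-x (apply max xs))
        sy    (span min-y (apply max ys))]
    (for [[x y] points]
      [(double (* width (/ (- x min-x) sx)))
       (double (* height (/ (- y min-y) sy)))])))

(defn points->svg
  "Builds an SVG document string with one small circle per point."
  [points width height]
  (str "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" width
       "\" height=\"" height "\">"
       (str/join
         (for [[x y] (scale-points points width height)]
           (str "<circle cx=\"" x "\" cy=\"" y "\" r=\"2\"/>")))
       "</svg>"))

(defn write-svg!
  "Writes the points to `path` as SVG (an assumed analogue of spit-svg)."
  [path points]
  (spit path (points->svg points 400 400)))
```

In a REPL, something like `(write-svg! "out.svg" embedded-points)` would produce a file you can open in a browser. With the real code you would substitute the tSNE result for `embedded-points` and call `spit-svg` as shown in the `core` namespace comments.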