
Python BHTSNE

Python module for Barnes-Hut implementation of t-SNE (Cython).

This module is based on the excellent work of Laurens van der Maaten.

Features

  • Better results than the Scikit-Learn BH t-SNE implementation (see: bhtsne vs. Scikit-Learn)
  • Fast (C++/Cython)
  • Ability to set random seed
  • Ability to set pre-defined plot coordinates (allows for smooth transitions between plots)

Installation

From pip:

pip install bhtsne

Examples

Iris Data Set

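Reduce the four-dimensional iris data set to two dimensions:

import matplotlib.pyplot as plt
from bhtsne import tsne
from sklearn.datasets import load_iris

iris = load_iris()
# Embed the four-dimensional samples in two dimensions
Y = tsne(iris.data)
# Color each point by its species label
plt.scatter(Y[:, 0], Y[:, 1], c=iris.target)
plt.show()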

This should result in:

Iris Plot

Transition between two t-SNE results

When new data is added, the t-SNE plot can change dramatically (even with a fixed random seed). This makes it hard to animate between successive plots while the data is changing.
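One rough way to put a number on how far a plot moves between two runs is the mean distance each shared point travels. The sketch below is illustrative only: `mean_displacement` is a hypothetical helper, not part of bhtsne, and it assumes both embeddings list the shared points in the same order. It works on plain sequences of coordinate rows so it runs without extra dependencies.

```python
import math


def mean_displacement(before, after):
    """Average Euclidean distance between matching rows of two embeddings.

    Hypothetical helper, not part of bhtsne. Only the first len(before)
    rows of `after` are compared, so `after` may hold extra, newer points.
    """
    if len(after) < len(before):
        raise ValueError("`after` must have at least as many rows as `before`")
    if len(before) == 0:
        return 0.0
    total = 0.0
    for p, q in zip(before, after):
        # Euclidean distance between the old and new position of one point
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return total / len(before)


# Two shared points; the third row of the second embedding is ignored
shift = mean_displacement([[0, 0], [1, 1]], [[3, 4], [1, 1], [9, 9]])
```

A smaller value means the shared points stayed closer to their earlier positions, which is what a smooth animation needs.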

This problem can be partially solved by setting the start coordinates of the first N vectors. In this example we'll create two t-SNE plots: the first uses the iris data set without its last 10 samples, and the second uses the full data set, seeding the shared samples with their positions from the first plot:

import random

import matplotlib.pyplot as plt
import numpy as np
from bhtsne import tsne
from sklearn.datasets import load_iris

iris = load_iris()
X_a = iris.data[:-10]  # everything except the last 10 items
X_b = iris.data        # the full data set
# First plot: embed the partial data set
Y_a = tsne(X_a)
# Generate small random positions for the last 10 items
remainder_positions = np.array([
    [random.uniform(0, 1) * 0.0001, random.uniform(0, 1) * 0.0001]
    for _ in range(X_b.shape[0] - Y_a.shape[0])
])
# Append them to the previous t-SNE output and use as seed_positions in the next plot
seed_positions = np.vstack((Y_a, remainder_positions))
Y_b = tsne(X_b, seed_positions=seed_positions)
plt.scatter(Y_a[:, 0], Y_a[:, 1], c='b')          # first run
plt.scatter(Y_b[:-10, 0], Y_b[:-10, 1], c='r')    # second run, shared items
plt.scatter(Y_b[-10:, 0], Y_b[-10:, 1], c='g')    # second run, new items
plt.show()

The resulting plot shows the first run in blue, the same items after the second run in red, and the 10 newly added items in green:

Iris Plot
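The seeding step above generalizes to any number of new items, and the two results can then be blended into intermediate frames for an animation. The following is a minimal sketch under stated assumptions: `extend_seed_positions` and `interpolate_frames` are hypothetical helpers, not part of bhtsne, and they operate on plain lists of coordinate rows. The jitter range mirrors the `0.0001` scale used in the example.

```python
import random


def extend_seed_positions(previous, n_total, scale=1e-4, seed=None):
    """Return the `previous` rows followed by small random rows.

    Hypothetical helper, not part of bhtsne. The result has n_total rows;
    each new coordinate is drawn uniformly from [0, scale].
    """
    n_new = n_total - len(previous)
    if n_new < 0:
        raise ValueError("n_total is smaller than the number of previous rows")
    dims = len(previous[0]) if len(previous) else 2
    rng = random.Random(seed)
    jitter = [[rng.uniform(0, 1) * scale for _ in range(dims)]
              for _ in range(n_new)]
    return [list(row) for row in previous] + jitter


def interpolate_frames(start, end, steps):
    """Yield steps + 1 frames moving linearly from `start` to `end`.

    Rows are paired in order; extra rows in the longer input are ignored.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    for i in range(steps + 1):
        t = i / steps
        yield [[a + (b - a) * t for a, b in zip(p, q)]
               for p, q in zip(start, end)]


# Two known positions plus three new items, then a three-frame transition
seeds = extend_seed_positions([[1.0, 2.0], [3.0, 4.0]], 5, seed=0)
frames = list(interpolate_frames([[0.0, 0.0]], [[2.0, 4.0]], 2))
```

To pass the seeded rows to `tsne`, convert them back to an array, for example `np.array(extend_seed_positions(Y_a.tolist(), X_b.shape[0]))`. Each frame from `interpolate_frames(Y_a.tolist(), Y_b.tolist(), steps)` can be drawn with `plt.scatter` in turn.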

Development

Build:

pip install cython
make

To run unit tests:

make test

Running the tests also writes visual plots to the test/plots folder.

Todo

  • Allow more sophisticated control of updates to the t-SNE (streaming/online t-SNE)
  • Allow more control over the number of iterations and error rate thresholds