# Coni63/DOSNES

Implementation of the Doubly Stochastic Neighbor Embedding on Spheres algorithm published by Yao Lu in Sep. 2016 (Source : https://arxiv.org/abs/1609.01977)
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
analysis
dosnes
images
test_digit.py
test_iris.py

# Doubly Stochastic Neighbor Embedding on Spheres

The repository propose a Python implementation of the Doubly Stochastic Neighbor Embedding on Spheres (DOSNES) published by Yao Lu, Zhirong Yang, Jukka Corander on Arxiv in Sep. 2016. It is based on the Matlab implemantion available on Github.

The principle of this model is to embed an high dimensionnal datas on a 3D Sphere. The principle is the same as the TSNE but at every iteration, all embedded points are forced to be on a sphere.

## Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Prerequisites

No prerequisites are needed, you can download the repository or clone it.

``````git clone https://github.com/Coni63/DOSNES.git
``````

### Installing

No pip installation is available, you just have to include the package in you project folder.

## How to use

The model has been build to be similar to sklearn model. As imple example is available below or in test_XX.py files.

```from sklearn import datasets
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from dosnes import dosnes

X, y = datasets.load_digits(return_X_y = True)
metric = "sqeuclidean"

model = dosnes.DOSNES(metric = metric, verbose = 1, random_state=42)
X_embedded = model.fit_transform(X)

fig = plt.figure()
ax = fig.gca(projection='3d')
ax.scatter(X_embedded[:, 0], X_embedded[:, 1], X_embedded[:, 2], c=y, cmap=plt.cm.Set1)
plt.title("Digits Dataset Embedded on a Sphere with metric {}".format(metric))
plt.show()```

You can provide either a distance matrix custom if you set the metric to pre-computed, or provide you dataset and the metric to use (must be part of distances available with the function pdist of scipy)

## Results

An very quick analysis of all parameter is present in the Notebook available in "analysis" folder. You can see below the evolution of the cost function and a gif of the training on the Iris dataset. Values are taken from the quick analysis.

## Contributor

• Nicolas MINE - Initial work - Coni63
• btaba - Implementation of Sinkhorn Knopp Algorithm - btaba
• Paul Panzer - Support on StackOverflow

## Acknowledgments

There may be still some errors. For example on digit dataset, the result is not as good as the one from the official paper of matlab implementation. There is no checks / error handling implemented yet. You should not provide a dataset with missing values.

You can’t perform that action at this time.