## Introduction to nptsne

This Jupyter notebook provides executable documentation for the nptsne package. To run it, install nptsne from PyPI, or pip install the downloaded .whl file for your OS.

#### Demo requirements

nptsne is supported on Python 3.6, 3.7, 3.8 and 3.9. The following packages are required to run this demo:

* numpy
* matplotlib
* six
* scipy
* umap-learn (for the umap examples)
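
Before running the cells below, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and `umap` is assumed to be the import name of the umap-learn distribution (as in the import cell of this notebook).

```python
import importlib.util

# Import names for the packages listed above, plus nptsne itself.
# "umap" is the import name used for umap-learn in this notebook.
REQUIRED = ["numpy", "matplotlib", "six", "scipy", "umap", "nptsne"]


def missing_packages(names):
    """Return the top-level names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can be installed with pip before continuing.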


In [None]:
import os
import sys
import nptsne
import matplotlib.pyplot as plt
from matplotlib import rc
import numpy as np
import umap

from six.moves import urllib
from scipy.io import loadmat
from matplotlib import colors as mcolors
from timeit import default_timer as timer
colors = ['#FF0000', '#FF9900', '#CCFF00', '#33FF00', '#00FF66', '#00FFFF', '#0066FF', '#3300FF', '#CC00FF', '#FF0099']
print("Running python {}.{}".format(sys.version_info.major, sys.version_info.minor))


### <font color=blue>nptsne.TextureTsne API</font>

This class provides a basic interface similar to the [scikit-learn tsne](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html). The API comprises the following methods:
* `__init__`: nptsne.TextureTsne() constructor
* `fit_transform` : Create the tSNE embedding

Full API documentation will be shown in the cell below.

In [None]:
import nptsne
help(nptsne)


In [None]:
from nptsne import KnnAlgorithm
help(KnnAlgorithm)

#### <font color=blue>Download MNIST data for use in the demos</font>

In [None]:
from pathlib import Path
root = Path.cwd().resolve().parent
mnist_path = root / 'data/mnist-original.mat'
#if not os.path.isfile(mnist_path):
#    mnist_alternative_url = 'https://github.com/amplab/datascience-sp14/raw/master/lab7/mldata/mnist-original.mat'
#    response = urllib.request.urlopen(mnist_alternative_url)
#    with open(mnist_path, 'wb') as f:
#        content = response.read()
#        f.write(content)
mnist_raw = loadmat(mnist_path)
mnist = {
    'data': mnist_raw['data'].T,
    'label': mnist_raw['label'][0],
    'COL_NAMES': ['label', 'data']
}
print('Mnist data dimensions: ', mnist['data'].shape)
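
The download step in the cell above is commented out, so `loadmat` fails if the file is not already in place. The following is a minimal sketch of a "download only if missing" helper; the name `fetch_if_missing` is hypothetical, and it also creates the target directory, which the commented-out code does not do.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_if_missing(path, url):
    """Download url to path unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    path = Path(path)
    if path.is_file():
        return False
    # Create the target directory (e.g. the data folder) if needed.
    path.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response, open(path, "wb") as f:
        f.write(response.read())
    return True
```

In this notebook it could be called as `fetch_if_missing(mnist_path, mnist_alternative_url)` with the URL from the commented-out lines; whether that URL is still reachable has not been checked here.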


#### <font color=blue>Create a tSNE embedding of the 70000 MNIST data points & display the elapsed time</font>

In [None]:
tsne = nptsne.TextureTsne(False,1000,2,30,800, nptsne.KnnAlgorithm.Flann)
# Can also be run with HNSW as the knn algorithm: this is faster on very large datasets with lower dimensional data (<40 dimensions)
#tsne = nptsne.TextureTsne(False,1000,2,30,800, nptsne.KnnAlgorithm.HNSW)

embedding = None
try:
   
    for i in range(1):
        start = timer()
        embedding = tsne.fit_transform(mnist['data'])
        end = timer()
        print(f'got embedding in {end - start}')
except Exception as ex:
    print('Error....')
    print(ex)

#### <font color=blue>Display the tSNE Mnist embedding</font>

In [None]:
# norm = mcolors.Normalize(vmin=0, vmax=9)
xyembed = embedding.reshape((70000, 2))
# mcolors.ListedColormap(colors)
rc('lines', linewidth=2)
rc('lines', markersize=1)
plt.scatter(xyembed[..., 0], xyembed[..., 1], c=mnist['label'], cmap=mcolors.ListedColormap(colors), facecolors='None', marker='o')
plt.show()
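
The cell above hard-codes the number of points in `reshape((70000, 2))`. A minimal sketch of a size-independent alternative follows, using `-1` so that NumPy infers the row count (the last cell of this notebook already reshapes this way). The helper name `as_xy` is hypothetical, and the stand-in array assumes the same interleaved x, y layout that the reshape in this notebook implies.

```python
import numpy as np


def as_xy(embedding):
    """View an embedding as an (n_points, 2) array, inferring n_points."""
    return np.asarray(embedding).reshape((-1, 2))


# Stand-in data: 5 points stored as x0, y0, x1, y1, ...
flat = np.arange(10, dtype=np.float32)
xy = as_xy(flat)
print(xy.shape)
```

With this helper the plotting code works unchanged on a subset of the data, such as the 1/10 sample used later for the umap example.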

### <font color=blue>nptsne.TextureTsneExtended API</font>

This class offers a second, more flexible API. It adds a number of features to the basic TextureTsne API, specifically:

* `__init__`: nptsne.TextureTsneExtended() constructor.
* `init_transform`: Initialize the transform with data and an optional initial embedding. Performs the nearest neighbor calculation.
* `run_transform`: Run or restart the transform for a number of iterations (enables display of intermediate results). Verbose output can be enabled or disabled.
* `start_exaggeration_decay`: Explicitly trigger the force exaggeration decay. Permits the typical tSNE cluster expansion. In the basic API this occurs at 250 iterations.
* `close`: Free the GPU resources.

Properties
* `decay_started_at`: The iteration number at which exaggeration decay was started.
* `iteration_count`: The current iteration.

Full API documentation will be shown in the cell below.

In [None]:
print("nptsne version: {}".format(nptsne.__version__))
help(nptsne.TextureTsneExtended)

#### <font color=blue>Create and initialize TextureTsneExtended with Mnist data</font>

In [None]:
tsne = nptsne.TextureTsneExtended(False)
embeddings = []
if tsne.init_transform(mnist['data']):
    print('Init succeeded')

#### <font color=blue>Run the tSNE embedding in blocks of 100 iterations. Reduce the exaggeration force from step 700. Record the intermediate embeddings in a plot.</font>

In [None]:
step_size = 100
plt.figure(2,figsize=(15,10))

for i in range(10):
    
    start = timer()
    # reduce the forces from iteration 700
    if i == 7:
        tsne.start_exaggeration_decay()
        print(f'exaggeration stopping at {tsne.decay_started_at}')
    embedding = tsne.run_transform(verbose=False, iterations=step_size)
    end = timer()
    print(f'got embedding in {end - start}')
    print(f'iteration count {tsne.iteration_count}')
    xyembed = np.copy(embedding.reshape((70000, 2)))
    embeddings.append(xyembed)
    print(f"subplot {i+1}")
    plt.subplot(3,4,i+1)
    plt.gca().set_title('Iter: ' + str(100*(i+1)))
    plt.scatter(xyembed[..., 0], xyembed[..., 1], c=mnist['label'], cmap=mcolors.ListedColormap(colors), facecolors='None', marker='o')

plt.draw()
plt.savefig('testext.png')
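
The block-wise control flow in the cell above (run a block, optionally trigger the decay, keep a copy of the result) can be separated from the plotting. The following is a minimal sketch, not part of nptsne: `run_in_blocks` is a hypothetical helper, and `StubTsne` is a hypothetical stand-in that only mimics the method and property names used in this notebook so that the loop can be tried without a GPU.

```python
import numpy as np


def run_in_blocks(tsne, n_blocks, step_size, decay_block):
    """Run tsne in blocks of step_size iterations.

    Exaggeration decay is triggered just before block number decay_block
    (0-based). A copy of the embedding is kept after every block.
    """
    snapshots = []
    for block in range(n_blocks):
        if block == decay_block:
            tsne.start_exaggeration_decay()
        embedding = tsne.run_transform(verbose=False, iterations=step_size)
        # Copy, as the notebook does, in case the returned buffer is reused.
        snapshots.append(np.copy(embedding))
    return snapshots


class StubTsne:
    """Hypothetical stand-in for illustration only; it computes no tSNE."""

    def __init__(self, n_points):
        self.iteration_count = 0
        self.decay_started_at = -1  # stub convention, not taken from nptsne
        self._embedding = np.zeros(2 * n_points, dtype=np.float32)

    def start_exaggeration_decay(self):
        self.decay_started_at = self.iteration_count

    def run_transform(self, verbose=False, iterations=100):
        self.iteration_count += iterations
        self._embedding += 1.0  # mutate and return the same buffer
        return self._embedding


stub = StubTsne(n_points=3)
snapshots = run_in_blocks(stub, n_blocks=10, step_size=100, decay_block=7)
print(stub.decay_started_at, stub.iteration_count, len(snapshots))
```

Because the stub returns the same buffer on every call, the sketch also shows why each intermediate embedding is copied before it is stored.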
   


#### <font color=blue>Reset the embedding and rerun the transform above. This does not rerun the knn, so the speed of the OpenGL texture-based gradient descent is clear.</font>

Rerun the cell below (using Ctrl-Enter) to demonstrate the stochastic nature of the embedding.

In [None]:
fig, axes = plt.subplots()
tsne.reinitialize_transform()
start = timer()
tsne.run_transform(verbose=False, iterations=700)
tsne.start_exaggeration_decay()
embedding = tsne.run_transform(verbose=False, iterations=300)
end = timer()
print(f'Recalculated embedding in {end - start}')
xyembed = np.copy(embedding.reshape((70000, 2)))
embeddings.append(xyembed)
axes.scatter(xyembed[..., 0], xyembed[..., 1], c=mnist['label'], cmap=mcolors.ListedColormap(colors), facecolors='None', marker='o')

#### <font color=blue>Closing the tsne frees the OpenGL context</font>

In [None]:
tsne.close() 

#### <font color=blue>Make a umap embedding of a randomly selected 1/10 of the Mnist data.</font>

In [None]:
# extract 1 data point in 10: p=[.1, .9]
# generate an index array with approximately 1/10 of the data row numbers
import umap
idx = np.where(np.random.choice([1, 0], size=70000, p=[0.1, 0.9]))
subLabel = np.squeeze(mnist['label'][idx])
subData =  mnist['data'][idx]

print(subLabel.shape, subData.shape)

umap_embed = umap.UMAP().fit_transform(subData)

plt.figure(3,figsize=(15,10))

plt.scatter(umap_embed[..., 0], umap_embed[..., 1], c=subLabel, cmap=mcolors.ListedColormap(colors), facecolors='None', marker='o')
plt.draw()
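
The selection in the cell above draws each row independently with probability 0.1, so the subset size is only approximately 7000 and changes between runs. Where an exact subset size is preferred, one alternative is to sample row indices without replacement. The following is a minimal sketch under that assumption; `sample_indices` is a hypothetical helper, and the seed is fixed only to make the example reproducible.

```python
import numpy as np


def sample_indices(n_total, fraction, seed=None):
    """Return sorted, unique row indices covering exactly int(n_total * fraction) rows."""
    rng = np.random.default_rng(seed)
    n_sample = int(n_total * fraction)
    return np.sort(rng.choice(n_total, size=n_sample, replace=False))


idx = sample_indices(70000, 0.1, seed=0)
print(idx.shape)
# In the notebook this index array would be used as:
#   subData = mnist['data'][idx]
#   subLabel = mnist['label'][idx]
```

Sorting the indices keeps the sampled rows in their original order, which matches the behaviour of the `np.where` approach.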


#### <font color=blue>Use the umap embedding to initialize the TextureTsneExtended. 

This shows that with high exaggeration forces tSNE leaves the umap embedding largely unchanged. Allowing the exaggeration forces to decay produces a more typical tSNE embedding.</font>

This equivalence has been noted elsewhere; see [Attraction-Repulsion Spectrum in Neighbor Embeddings](https://arxiv.org/abs/2007.08902).

In [None]:
tsne = nptsne.TextureTsneExtended(verbose=True)

print(f'Init tSNE from umap, shape: {umap_embed.shape}')
if tsne.init_transform(subData, umap_embed):
    print('Init from umap succeeded')

step_size = 100
plt.figure(4,figsize=(15,10))
for i in range(10):
    start = timer()
    # reduce the forces from iteration 500
    if i == 5:
        tsne.start_exaggeration_decay()
        print(f'exaggeration stopping at {tsne.decay_started_at}')

    embedding = tsne.run_transform(verbose=True, iterations=step_size)
    end = timer()
    print(f'got embedding in {end - start}')
    xyembed = np.copy(embedding.reshape((-1, 2)))
    plt.subplot(4,4,i+1)
    plt.gca().set_title('Iter: ' + str(100*(i+1)))
    plt.scatter(xyembed[..., 0], xyembed[..., 1], c=subLabel, cmap=mcolors.ListedColormap(colors), facecolors='None', marker='o')

plt.draw()

tsne.close()  
