# TSNE Optimizer
A grid search method that minimizes the KL divergence of t-SNE embeddings.

Run `optimize.sh` to run the grid search and optimize t-SNE:
- Save the model parameters and `kl_divergence` of each run
- Keep only the run with the lowest `kl_divergence`

```bash

< param_grid.csv parallel -C , --header , "python optimize.py --model TSNE --params p1 p2 p3 --header h1 h2 h3 >> kl_divergence.csv"

```
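Keeping only the lowest KL divergence amounts to taking a minimum over the rows of `kl_divergence.csv`. A minimal sketch, assuming the file has a header row and a numeric column named `kl_divergence`; the column names, the demo values, and the `best_run` helper are assumptions for illustration and are not part of `optimize.py`:

```python
import csv
import io


def best_run(csv_file):
    """Return the row (as a dict of strings) with the lowest KL divergence."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no results found")
    return min(rows, key=lambda row: float(row["kl_divergence"]))


# Stand-in for open("kl_divergence.csv"); the values are made up for the demo.
demo = io.StringIO(
    "perplexity,init,kl_divergence\n"
    "20,random,1.31\n"
    "30,pca,0.87\n"
    "40,pca,0.95\n"
)
best = best_run(demo)
print(best["perplexity"], best["kl_divergence"])
```

Because the rows are read as strings, the comparison converts `kl_divergence` to `float` first; comparing the raw strings would sort lexicographically.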


Parameters that control the optimization of t-SNE:
- `perplexity`: a larger perplexity takes more nearest neighbours into account and is less sensitive to small structures. A lower perplexity considers a smaller number of neighbours. Noisy data will require larger perplexity values to encompass enough local neighbours to see beyond the noise. Values:
    - default is 30.0, usually between 5 and 50.
- `early_exaggeration`: the larger the factor, the larger the gap between natural clusters. Usually doesn't have to be tuned.
    - default is 12.0
- `learning_rate`: a number, or `learning_rate='auto'`.
    - default is 200.0 in older scikit-learn versions (`'auto'` since 1.2), usually in [10.0, 1000.0]. `auto` is `max(N / early_exaggeration / 4, 50)`, where `N` is the number of samples.
- `n_iter`: maximum number of iterations; the default is usually high enough. Renamed to `max_iter` in scikit-learn 1.5.
- `angle` (not used in the exact method): trade-off between speed and accuracy. Larger angles increase speed but give less accurate results. Usually between 0.2 and 0.8.
- `method`: `exact` or `barnes_hut`. The latter is an approximation that scales better to big datasets.
- `init`: `random` or `pca`. `pca` is usually more globally stable.
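The `auto` learning rate depends only on the number of samples `N` and `early_exaggeration`. A small sketch of the formula quoted above; the helper name `auto_learning_rate` is made up for illustration and is not a scikit-learn function:

```python
def auto_learning_rate(n_samples, early_exaggeration=12.0):
    """Learning rate implied by learning_rate='auto': max(N / early_exaggeration / 4, 50)."""
    return max(n_samples / early_exaggeration / 4, 50)


# With the default early_exaggeration of 12, the floor of 50 applies
# to any dataset with fewer than 2400 samples (50 * 12 * 4).
for n in (1_000, 10_000, 100_000):
    print(n, round(auto_learning_rate(n), 1))
```

This suggests that a fixed learning rate of 200 roughly corresponds to a dataset of about 10,000 samples; much larger datasets get a proportionally larger rate.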

**Note**:
- Failing to see well-separated, homogeneously labeled groups in the visualization does not necessarily imply that the data cannot be classified by a supervised model. It might be that 2 dimensions are not enough to accurately represent the internal structure of the data.
- If t-SNE shows groups that match the labels, while the linear 2D projection of PCA shows groups that overlap, it's a strong clue that the data can be separated by non-linear methods that focus on local structure.
- Scale the features before fitting, e.g. with `StandardScaler`.

In [9]:
import pandas as pd
import numpy as np
import itertools as it

In [6]:
param_dict = {
    'perplexity': list(range(20, 45, 5)),
    'early_exaggeration': [12],
    'learning_rate': ['auto'],
    'n_iter': list(range(1000, 5000, 1000)),
    'angle': np.linspace(0.2, 0.8, 4),
    'method': ['exact', 'barnes_hut'],  # scikit-learn spells it 'barnes_hut'
    'init': ['random', 'pca']
}

In [7]:
def create_param_options(param_dict):
    grid = []
    keys = list(param_dict.keys())
    values = list(param_dict.values())
    value_combs = list(it.product(*values))

    for val in value_combs:
        temp = dict(zip(keys, val))
        grid.append(temp)

    return grid

In [15]:
combinations = create_param_options(param_dict)
param_grid = pd.DataFrame(combinations)
param_grid.head()

   perplexity  early_exaggeration  learning_rate  n_iter  angle      method    init
0          20                  12           auto    1000    0.2       exact  random
1          20                  12           auto    1000    0.2       exact     pca
2          20                  12           auto    1000    0.2  barnes_hut  random
3          20                  12           auto    1000    0.2  barnes_hut     pca
4          20                  12           auto    1000    0.4       exact  random
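Since `angle` is not used by the exact method, the full Cartesian product fits the same `exact` model once per angle value. A possible sketch of a pruned grid that drops those duplicates; `pruned_param_options` is a hypothetical helper, not part of this notebook, and `angle` is written as a plain list here to keep the example free of NumPy:

```python
import itertools as it
import math

param_dict = {
    "perplexity": list(range(20, 45, 5)),
    "early_exaggeration": [12],
    "learning_rate": ["auto"],
    "n_iter": list(range(1000, 5000, 1000)),
    "angle": [0.2, 0.4, 0.6, 0.8],
    "method": ["exact", "barnes_hut"],
    "init": ["random", "pca"],
}

# Size of the full grid: product of the number of options per parameter.
n_full = math.prod(len(values) for values in param_dict.values())


def pruned_param_options(param_dict):
    """Expand param_dict into a list of dicts, without duplicate 'exact' runs."""
    keys = list(param_dict)
    seen = set()
    grid = []
    for values in it.product(*param_dict.values()):
        params = dict(zip(keys, values))
        if params.get("method") == "exact":
            # angle is ignored by the exact method, so leave it at its default
            params.pop("angle", None)
        fingerprint = tuple(sorted(params.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            grid.append(params)
    return grid


pruned = pruned_param_options(param_dict)
print(n_full, len(pruned))  # 320 200
```

For this grid the pruning removes 120 of the 320 fits, all of them from the slower `exact` method.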


In [17]:
# param_grid.to_csv('param_grid.csv', index=False)

In [19]:
param_grid.loc[0]

perplexity                20
early_exaggeration        12
learning_rate           auto
n_iter                  1000
angle                    0.2
method                 exact
init                  random
Name: 0, dtype: object

In [None]:
import multiprocessing as mp

import pandas as pd
import numpy as np
import itertools as it
import json

from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

class TSNEOptimizer:
    
    def __init__(self, param_dict) -> None:
        self.param_grid = param_dict
        self.kl_divergence_ = None
        self.model = None
        self.embedding = None
        
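    def fit_transform(self, X):
        # fit a single t-SNE model with the stored parameters
        self.model = TSNE(**self.param_grid)
        self.embedding = self.model.fit_transform(X)

        # cost function: the fitted attribute has a trailing underscore
        self.kl_divergence_ = self.model.kl_divergence_

        return self.embedding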

class TSNEGridSearch:

    def __init__(self, param_dict) -> None:
        self.best_score = np.inf
        self.best_model = None
        self.embedding = None

        self.param_grid = self.create_param_options(param_dict)

    def create_param_options(self, param_dict):
        grid = []
        keys = list(param_dict.keys())
        values = list(param_dict.values())
        value_combs = list(it.product(*values))

        for val in value_combs:
            temp = dict(zip(keys, val))
            grid.append(temp)

        return grid

    def fit_transform(self, X):
        n_params = len(self.param_grid)

        for i, params in enumerate(self.param_grid):
            print(f'Processing {i+1}/{n_params}')

            model = TSNE(**params)
            embedding = model.fit_transform(X)

            # cost function
            kl_div = model.kl_divergence_
            if kl_div < self.best_score:
                self.__update_model(kl_div, model, embedding)

        return self.embedding

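    def fit_mp(self, parameters):
        # takes a single (X, params) tuple so it can be passed to Pool.map
        X, params = parameters
        print("Training model...")
        model = TSNE(**params)
        # fitting sets model.embedding_ and model.kl_divergence_
        model.fit_transform(X)

        return model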

    def __update_model(self, kl_divergence, model, embedding):
        self.best_score = kl_divergence
        self.best_model = model
        self.embedding = embedding
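`fit_mp` fits one model per call but nothing yet collects the results. One way the reduction could look, sketched with a pluggable `map_fn` so the same code runs serially or with a process pool; `grid_search` and `dummy_fit` are hypothetical names, and `dummy_fit` returns a made-up score instead of fitting t-SNE so that the example runs without data:

```python
def grid_search(fit_one, grid, map_fn=map):
    """Fit every parameter set and return the result with the lowest KL divergence.

    fit_one: callable taking a params dict and returning
             (params, kl_divergence, embedding).
    map_fn:  built-in map for a serial run, or e.g. multiprocessing.Pool().map
             for a parallel one (fit_one must then be a picklable,
             module-level function).
    """
    results = list(map_fn(fit_one, grid))
    if not results:
        raise ValueError("empty parameter grid")
    return min(results, key=lambda result: result[1])


def dummy_fit(params):
    # Stand-in for fitting TSNE(**params): the "KL divergence" is a made-up
    # score that is lowest at perplexity 30.
    return params, abs(params["perplexity"] - 30) / 10, None


grid = [{"perplexity": p} for p in (20, 25, 30, 35, 40)]
best_params, best_kl, _ = grid_search(dummy_fit, grid)
print(best_params, best_kl)
```

Returning plain tuples rather than fitted model objects keeps the data sent back from worker processes small; the best model can be refitted afterwards from `best_params` if it is needed.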