# Compare Dimensionality Reduction


## Run the following dimensionality reduction algorithms on the given fonts
1. Principal Component Analysis (PCA)
1. IsoMap
1. t-SNE

## Compare the results of each by calculating a correlation coefficient
1. || X_hat - X || / || X ||
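
The expression above is a relative error. A 2-D embedding has fewer columns than the original data, so one way to apply it is to compare pairwise distances between rows rather than raw coordinates. The sketch below is an assumption about how such a score could be computed, not the contents of `SKU.calc_dist_cor_score`; the helper names `condensed_distances`, `relative_distance_error`, and `pairwise_distance_pearson` are hypothetical.

```python
import numpy as np


def condensed_distances(X):
    """Return the Euclidean distances between all unique row pairs of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(X.shape[0], k=1)
    return dist[upper]


def relative_distance_error(X, Y):
    """Hypothetical score: ||D_Y - D_X|| / ||D_X|| on pairwise distances (lower is better)."""
    d_x = condensed_distances(X)
    d_y = condensed_distances(Y)
    return np.linalg.norm(d_y - d_x) / np.linalg.norm(d_x)


def pairwise_distance_pearson(X, Y):
    """Hypothetical score: Pearson correlation of pairwise distances (higher is better)."""
    return np.corrcoef(condensed_distances(X), condensed_distances(Y))[0, 1]
```

The two scores answer different questions: the relative error is sensitive to the overall scale of the embedding, while the correlation only asks whether far-apart points stay far apart. That matters for t-SNE, whose output scale is arbitrary.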

## Imports and Globals

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.manifold import TSNE

import font_utils.load_font as LF
import font_utils.upper_lower_numerals as ULN
import sci_kit_learn_utils.utils as SKU

In [3]:
import importlib
importlib.reload(SKU)

<module 'sci_kit_learn_utils.utils' from '/home/digital-tenebrist/ms-data-science/math-637/udel-math-637/utils/sci_kit_learn_utils/utils.py'>

## Read Font
1. Returns a dictionary for each variant with the following fields
    1. df - pandas DataFrame trimmed as follows
        1. Retains only the m_label and r0c0,...,r19c19 columns
        1. No italic
        1. Only a-zA-Z0-9 returned
        1. Only the minimum number of instances of each character, where the minimum is taken across a-zA-Z0-9
    1. min_char_count - number of instances of each character
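
The trimming steps listed above can be sketched with pandas. This is an illustration only, not the code in `LoadFont.get_trimmed_font`: the function name `trim_font_df` is hypothetical, and it assumes the raw frame has an `italic` column coded 0/1 and that `m_label` holds the character's code point (the later cells call `chr()` on it).

```python
import string

import pandas as pd

# The 400 pixel columns r0c0 ... r19c19 of a 20x20 glyph image.
PIXEL_COLS = [f'r{r}c{c}' for r in range(20) for c in range(20)]
# Code points for a-z, A-Z and 0-9.
ALNUM_CODES = [ord(ch) for ch in string.ascii_letters + string.digits]


def trim_font_df(raw_df):
    """Apply the trimming described above to one font variant (hypothetical helper)."""
    # Assumption: italic glyphs are flagged with italic == 1.
    df = raw_df.loc[raw_df['italic'] == 0]
    # Keep only a-zA-Z0-9.
    df = df.loc[df['m_label'].isin(ALNUM_CODES)]
    # Keep only the label and the pixel columns.
    df = df[['m_label'] + PIXEL_COLS]
    # Smallest number of instances among the characters that are present.
    min_count = int(df.groupby('m_label').size().min())
    # Keep the first min_count instances of every character.
    trimmed = df.groupby('m_label', sort=False).head(min_count)
    return {'df': trimmed, 'min_char_count': min_count}
```

Keeping the same number of instances per character is what lets the later cells select "instance `i` of every character" with `.iloc[i]`.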

In [4]:
uln = ULN.UpperLowerNumerals.get_ascii_codes()

lf = LF.LoadFont('garamond')
font_dict = lf.get_trimmed_font()
font_df = font_dict['GARAMOND']['df']

LABEL_AR = None

face_names = ['Normal', 'Bold']

raw_dfs = list()

for i in range(font_dict['GARAMOND']['min_char_count']):
        t_df = pd.DataFrame(data=[font_df.loc[font_df.m_label == x].iloc[i] for x in uln])
        
        if i==0:
            LABEL_AR = [chr(x) for x in t_df.m_label]
            
        t_df = t_df.drop(columns=['m_label'])
        t_df = t_df-t_df.mean(axis=0)

        # Perform PCA and calculate distance score
        pca = PCA(n_components=2)
        pca_y = pca.fit_transform(t_df)
        print(f'{face_names[i]:6s} PCA   Distance Score {SKU.calc_dist_cor_score(t_df,pca_y):0.4f}')
        
        # Perform IsoMap and calculate distance score
        isomap = Isomap(n_neighbors=8, n_components=2)
        iso_y = isomap.fit_transform(t_df)
        print(f'{face_names[i]:6s} Iso   Distance Score {SKU.calc_dist_cor_score(t_df,iso_y):0.4f}')
        
        # Perform t-SNE and calculate distance score
        tsne = TSNE(n_components=2, init='pca',random_state=0)
        tsne_y = tsne.fit_transform(t_df)
        print(f'{face_names[i]:6s} t-SNE Distance Score {SKU.calc_dist_cor_score(t_df,tsne_y):0.4f}')


Normal PCA   Distance Score 0.7940
Normal Iso   Distance Score 0.5091
Normal t-SNE Distance Score 0.8677
Bold   PCA   Distance Score 0.6884
Bold   Iso   Distance Score 0.5629
Bold   t-SNE Distance Score 0.5767
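
To compare the methods per face, the six printed scores can be collected into a small table. The sketch below only re-enters the numbers shown above; treating a higher score as better is an assumption, since the scoring function is not shown here.

```python
import pandas as pd

# Scores copied from the printed output above.
scores = pd.DataFrame(
    [
        ('Normal', 'PCA', 0.7940),
        ('Normal', 'Iso', 0.5091),
        ('Normal', 't-SNE', 0.8677),
        ('Bold', 'PCA', 0.6884),
        ('Bold', 'Iso', 0.5629),
        ('Bold', 't-SNE', 0.5767),
    ],
    columns=['face', 'method', 'score'],
)

# One row per face, one column per method.
table = scores.pivot(index='face', columns='method', values='score')
# Assumption: higher score is better, so the best method is the row-wise argmax.
best = table.idxmax(axis=1)

print(table)
print(best)
```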


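### t-SNE Wins

1. t-SNE scores highest on Normal (0.8677); on Bold, PCA scores higher (0.6884 vs. 0.5767)
1. What are the best parameters for t-SNE?
1. Cell In [8] sweeps n_components from 2 to 49; scikit-learn's TSNE has no neighbors parameter, and perplexity is its closest analogue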
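
If the intent is to tune neighborhood size, a sweep over `perplexity` with `n_components` fixed at 2 is one option. The sketch below is a minimal illustration under stated assumptions: it runs on synthetic data rather than the font frames, and `pairwise_distance_pearson` is a hypothetical stand-in for `SKU.calc_dist_cor_score`, whose definition is not shown here. Note that `perplexity` must be smaller than the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE


def pairwise_distance_pearson(X, Y):
    """Stand-in score (an assumption): Pearson correlation of pairwise distances."""
    def condensed(A):
        A = np.asarray(A, dtype=float)
        diff = A[:, None, :] - A[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))[np.triu_indices(A.shape[0], k=1)]
    return np.corrcoef(condensed(X), condensed(Y))[0, 1]


def sweep_perplexity(X, perplexities):
    """Fit a 2-D t-SNE embedding per perplexity and return {perplexity: score}."""
    results = {}
    for perplexity in perplexities:
        tsne = TSNE(n_components=2, perplexity=perplexity, init='pca', random_state=0)
        results[perplexity] = pairwise_distance_pearson(X, tsne.fit_transform(X))
    return results


# Synthetic stand-in for one centered glyph matrix (62 rows, one per character).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(62, 20))
for perplexity, score in sweep_perplexity(X_demo, [5, 10, 20]).items():
    print(f'perplexity={perplexity:2d} score={score:0.4f}')
```

Holding `n_components` at 2 keeps the scores comparable with the PCA and IsoMap runs, which also embed into two dimensions.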

In [8]:
for i in range(font_dict['GARAMOND']['min_char_count']):
        t_df = pd.DataFrame(data=[font_df.loc[font_df.m_label == x].iloc[i] for x in uln])
        
        if i==0:
            LABEL_AR = [chr(x) for x in t_df.m_label]
            
        t_df = t_df.drop(columns=['m_label'])
        t_df = t_df-t_df.mean(axis=0)

        for n_n in range(2,50):
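            # Perform t-SNE with n_n output dimensions and calculate distance score.
            # method='exact' is needed because Barnes-Hut only supports n_components < 4.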
            tsne = TSNE(n_components=n_n, init='pca',random_state=0, method='exact')
            tsne_y = tsne.fit_transform(t_df)
            print(f'{face_names[i]:6s} t-SNE Distance Score {SKU.calc_dist_cor_score(t_df,tsne_y):0.4f}')

Normal t-SNE Distance Score 0.7329
Normal t-SNE Distance Score 0.6279
Normal t-SNE Distance Score 0.5317
Normal t-SNE Distance Score 0.4602
Normal t-SNE Distance Score 0.4027
Normal t-SNE Distance Score 0.3522
Normal t-SNE Distance Score 0.3108
Normal t-SNE Distance Score 0.2769
Normal t-SNE Distance Score 0.2486
Normal t-SNE Distance Score 0.2232
Normal t-SNE Distance Score 0.2020
Normal t-SNE Distance Score 0.1818
Normal t-SNE Distance Score 0.1660
Normal t-SNE Distance Score 0.1524
Normal t-SNE Distance Score 0.1405
Normal t-SNE Distance Score 0.1295
Normal t-SNE Distance Score 0.1204
Normal t-SNE Distance Score 0.1121
Normal t-SNE Distance Score 0.1048
Normal t-SNE Distance Score 0.0967
Normal t-SNE Distance Score 0.0897
Normal t-SNE Distance Score 0.0841
Normal t-SNE Distance Score 0.0781
Normal t-SNE Distance Score 0.0728
Normal t-SNE Distance Score 0.0680
Normal t-SNE Distance Score 0.0634
Normal t-SNE Distance Score 0.0592
Normal t-SNE Distance Score 0.0551
Normal t-SNE Distanc