# 188.413 Self-Organizing Systems 2020W
## SOM Visualization
#### Lorenz Bacca [01552268] and Davide Sforza [12006440]
**https://github.com/dsforza96/SOM-visualization**

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; } .output { align-items: center; }</style>"))

## 1. Implementation

### 1.1. P-Matrix
Our P-matrix implementation follows [A. Ultsch](https://www.researchgate.net/profile/Alfred-Ultsch/publication/228706090_Maps_for_the_visualization_of_high-dimensional_data_spaces/links/544652950cf2f14fb80f3134/Maps-for-the-visualization-of-high-dimensional-data-spaces.pdf) "Maps for the Visualization of high-dimensional Data Spaces". The radius is specified through the `perc` parameter, which represents the percentile of all pairwise distances between the input data. According to the author, 18 is a good value for this parameter, so we set it as the default.
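As a small illustration of the radius and density computation described above, the sketch below works on random toy data. The helper names `pareto_radius` and `p_heights` and the toy arrays are assumptions made for this example and are not part of the notebook. Unlike the cell that follows, it takes the percentile over each distinct pair once via `scipy.spatial.distance.pdist`, so the zero self-distances on the diagonal do not enter the percentile. This is a variation for comparison, not what our implementation does.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist


def pareto_radius(data, perc=18):
    """Radius as the `perc`-th percentile of the distinct pairwise distances."""
    return np.percentile(pdist(data), perc)


def p_heights(weights, data, radius):
    """Number of data points within `radius` of each unit's weight vector."""
    return np.sum(cdist(weights, data) <= radius, axis=-1)


# Toy data (assumption): 200 random 3-D points and a flattened 2x3 map
rng = np.random.default_rng(0)
toy_data = rng.random((200, 3))
toy_weights = rng.random((6, 3))

toy_radius = pareto_radius(toy_data)
toy_heights = p_heights(toy_weights, toy_data, toy_radius)
print(toy_radius, toy_heights.reshape(2, 3))
```

Because the full distance matrix contains every pair twice plus a diagonal of zeros, the radius obtained from it can only be smaller than or equal to the one computed here.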

In [2]:
import numpy as np
from scipy.spatial import distance_matrix

def pmatrix(self, som_map=None, color='Viridis', idata=[], perc=18, interp='best', title=''):
    in_distmat = distance_matrix(idata, idata)
    radius = np.percentile(in_distmat, perc)
    
    distmat = distance_matrix(self.weights, idata)

    pm = distmat <= radius
    pm = np.sum(pm, axis=-1)

    if som_map == None:
        return self.plot(pm.reshape(self.m, self.n), color=color, interp=interp, title=title)    
    else:
        som_map.data[0].z = pm.reshape(self.m,self.n)

### 1.2. U-Matrix
The implementation of the U-matrix provided with this notebook seems to compute the D-matrix instead. We therefore reimplemented it and reused the given code for the D-matrix visualization (after a minor bug fix). As in the Java SOMToolbox, our implementation computes the distances between adjacent units using the von Neumann neighborhood (four neighbors) and then fills the remaining cells by interpolation, taking the mean of the surrounding values.
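The layout of the resulting (2m-1) x (2n-1) grid is easier to see in a loop-based form. The following is a readable sketch under our own assumptions, not the vectorized code of the next cell: `umatrix_reference` is a hypothetical name, and border cells are averaged over the neighbors that actually exist, whereas the vectorized cell always divides by four.

```python
import numpy as np


def umatrix_reference(weights, m, n):
    """Loop-based U-matrix sketch for an m x n map (weights: (m*n, dim))."""
    w = np.asarray(weights, dtype=float).reshape(m, n, -1)
    mm, nn = 2 * m - 1, 2 * n - 1
    um = np.zeros((mm, nn))

    # Cells between two adjacent units hold the distance of their weights
    for i in range(m):
        for j in range(n):
            if j + 1 < n:  # right neighbor
                um[2 * i, 2 * j + 1] = np.linalg.norm(w[i, j] - w[i, j + 1])
            if i + 1 < m:  # lower neighbor
                um[2 * i + 1, 2 * j] = np.linalg.norm(w[i, j] - w[i + 1, j])

    # Unit cells and diagonal cells get the mean of the distance cells around them
    out = um.copy()
    for i in range(mm):
        for j in range(nn):
            if i % 2 == j % 2:
                around = [um[a, b]
                          for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                          if 0 <= a < mm and 0 <= b < nn]
                out[i, j] = np.mean(around)
    return out


# Toy 2x2 map with one-dimensional weights (assumption for illustration)
toy_um = umatrix_reference(np.array([[0.0], [1.0], [3.0], [6.0]]), 2, 2)
print(toy_um)
```

On this toy map the four distance cells are 1, 3, 5 and 3, and the centre cell is their mean.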

In [3]:
from scipy.ndimage import correlate

def umatrix(self, som_map=None, color='Viridis', interp='best', title=''):
    # Distance computation
    m, n = self.m, self.n
    mm, nn = m * 2 - 1, n * 2 - 1
    
    idx1 = np.kron(np.arange(m * n).reshape(m, n), np.ones((2, 2)))
    
    k1 = np.array([0, 1] * n)[None, :]
    k2 = np.array([0, n] * m)[:, None]
    idx2 = idx1 + k1 + k2
    
    idx1 = idx1[:-1, :-1].astype(np.int32).ravel()
    idx2 = idx2[:-1, :-1].astype(np.int32).ravel()
    
    dist = np.sqrt(np.sum(np.square(self.weights[idx1] - self.weights[idx2]), axis=-1))
    um = dist.reshape(mm, nn)
    
    # Interpolation
    w = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
    
    corr = correlate(um, w) / 4
    
    mask = np.zeros((mm, nn), dtype=bool)  # np.bool was removed in NumPy 1.24
    
    for i in range(mm):
        for j in range(nn):
            mask[i, j] = i & 1 == j & 1
            
    um[mask] = corr[mask]

    if som_map == None:
        return self.plot(um, color=color, interp=interp, title=title)    
    else:
        som_map.data[0].z = um

### 1.3. U*-Matrix
The U\*-matrix visualization combines the U-matrix and the P-matrix via the formula described in [A. Ultsch](https://www.researchgate.net/profile/Alfred-Ultsch/publication/228530835_UMatrix_a_Tool_to_visualize_Clusters_in_high_dimensional_Data/links/544659300cf22b3c14de1c2f/UMatrix-a-Tool-to-visualize-Clusters-in-high-dimensional-Data.pdf) "U\*-Matrix: A Tool to Visualize Clusters in High Dimensional Data". The P-matrix is upscaled to match the size of the U-matrix, which is larger because of its interpolated values.
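The upscaling and scaling step can be tried on toy arrays as follows. This is only a sketch of what the next cell computes, under our own naming: `ustar_combine` and the toy matrices are hypothetical and exist only for this example. `scipy.ndimage.zoom` resamples the P-matrix to the U-matrix shape using spline interpolation.

```python
import numpy as np
from scipy.ndimage import zoom


def ustar_combine(um, pm):
    """Scale U-matrix heights by the upscaled, mean-centred P-matrix."""
    factors = [u / p for u, p in zip(um.shape, pm.shape)]
    pm_up = zoom(pm.astype(float), factors)  # resample to the U-matrix shape
    mean_p, min_p = pm_up.mean(), pm_up.min()
    if mean_p == min_p:  # constant density: nothing to scale
        return um
    return um * (pm_up - mean_p) / (mean_p - min_p)


# Toy input (assumption): a 3x3 density map and a flat 5x5 U-matrix
toy_pm = np.arange(9).reshape(3, 3)
toy_um = np.ones((5, 5))
toy_us = ustar_combine(toy_um, toy_pm)
print(toy_us.shape)
```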

In [4]:
from scipy.ndimage import zoom

def usmatrix(self, som_map=None, color='Viridis', idata=[], perc=10, interp='best', title=''):
    pm = self.pmatrix(idata=idata, perc=perc).data[0].z
    um = self.umatrix().data[0].z
    
    pm = zoom(pm, [u / p for u, p in zip(um.shape, pm.shape)])  # upscaling p-matrix
    
    if np.mean(pm) != np.min(pm):
        usm = um * (pm - np.mean(pm)) / (np.mean(pm) - np.min(pm))
    else:
        usm = um
    
    if som_map == None:
        return self.plot(usm, color=color, interp=interp, title=title)    
    else:
        som_map.data[0].z = usm

### 1.4. Quantization Error (qe) and Mean Quantization Error (mqe)
Quantization error (qe) and mean quantization error (mqe) are implemented here, following the definitions from the lecture.
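To make the two definitions concrete, the sketch below computes both on a hand-checkable toy example. It is an alternative formulation using `np.bincount` rather than the boolean mask of the following cells; the function name `quantization_errors` and the toy arrays are assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import cdist


def quantization_errors(weights, data):
    """Return per-unit (qe, mqe) for weights (units, dim) and data (samples, dim)."""
    dist = cdist(data, weights)                 # (samples, units)
    bmu = dist.argmin(axis=1)                   # best-matching unit per sample
    bmu_dist = dist[np.arange(len(data)), bmu]  # distance of each sample to its BMU
    n_units = len(weights)
    qe = np.bincount(bmu, weights=bmu_dist, minlength=n_units)  # summed distances
    hits = np.bincount(bmu, minlength=n_units)                  # samples per unit
    mqe = np.divide(qe, hits, out=np.zeros_like(qe), where=hits > 0)
    return qe, mqe


# Toy example (assumption): three units, three samples
toy_weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
toy_data = np.array([[0.0, 0.5], [0.0, -1.0], [1.0, 2.0]])
toy_qe, toy_mqe = quantization_errors(toy_weights, toy_data)
print(toy_qe, toy_mqe)
```

The first unit wins two samples at distances 0.5 and 1, so its qe is 1.5 and its mqe is 0.75. The third unit wins no sample and keeps 0 for both.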

In [5]:
def qe(self, som_map=None, color='Viridis', idata=[], interp=None, title=''):
    dist = distance_matrix(self.weights, idata)
    bmu = np.argmin(dist, axis=0)

    idx = np.arange(self.m * self.n)[:, None] == bmu[None, :]
    qe = np.sum(np.where(idx, dist, 0), axis=-1)

    if som_map == None:
        return self.plot(qe.reshape(self.m, self.n), color=color, interp=interp, title=title)    
    else:
        som_map.data[0].z = qe.reshape(self.m,self.n)

In [6]:
def mqe(self, som_map=None, color='Viridis', idata=[], interp=None, title=''):
    dist = distance_matrix(self.weights, idata)
    bmu = np.argmin(dist, axis=0)

    idx = np.arange(self.m * self.n)[:, None] == bmu[None, :]
    qe = np.sum(np.where(idx, dist, 0), axis=-1)

    count = np.sum(idx, axis=-1)
    mqe = np.divide(qe, count, out=np.zeros_like(qe), where=count != 0)

    if som_map == None:
        return self.plot(mqe.reshape(self.m, self.n), color=color, interp=interp, title=title)    
    else:
        som_map.data[0].z = mqe.reshape(self.m,self.n)

### 1.5. Code for SOM training, SOM parsing from file and some basic visualizations (provided together with the notebook)
The existing U-matrix implementation is exploited to compute the D-matrix visualization.
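For comparison with the loop-based `dmatrix` method in the class below, the D-matrix (the mean distance of each unit to its four direct neighbors) can also be written with array slicing. This is a sketch under our own assumptions; `dmatrix_sketch` is a hypothetical name and the function expects a map with at least two units.

```python
import numpy as np


def dmatrix_sketch(weights, m, n):
    """Mean weight distance of each unit to its von Neumann neighbors."""
    w = np.asarray(weights, dtype=float).reshape(m, n, -1)
    total = np.zeros((m, n))
    count = np.zeros((m, n))

    # Distances between vertically adjacent units, credited to both units
    dv = np.linalg.norm(w[1:] - w[:-1], axis=-1)        # shape (m-1, n)
    total[1:] += dv
    total[:-1] += dv
    count[1:] += 1
    count[:-1] += 1

    # Distances between horizontally adjacent units, credited to both units
    dh = np.linalg.norm(w[:, 1:] - w[:, :-1], axis=-1)  # shape (m, n-1)
    total[:, 1:] += dh
    total[:, :-1] += dh
    count[:, 1:] += 1
    count[:, :-1] += 1

    return total / count  # assumes every unit has at least one neighbor


# Toy 2x2 map with one-dimensional weights (assumption for illustration)
toy_dm = dmatrix_sketch(np.array([[0.0], [1.0], [3.0], [6.0]]), 2, 2)
print(toy_dm)
```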

In [7]:
import plotly.graph_objects as go
from ipywidgets import Layout, HBox, Box, widgets, interact


class SomViz:
    def __init__(self, weights=[], m=None, n=None):
        self.weights = weights
        self.m = m
        self.n = n
        
    umatrix = umatrix
    pmatrix = pmatrix
    usmatrix = usmatrix
    qe = qe
    mqe = mqe

    # Exploiting the existing (probably wrong) U-matrix implementation to compute D-matrix
    def dmatrix(self, som_map=None, color="Viridis", interp="best", title=""):
        um = np.zeros((self.m * self.n, 1))
        neuron_locs = list()
        for i in range(self.m):
            for j in range(self.n):
                neuron_locs.append(np.array([i, j]))
        neuron_distmat = distance_matrix(neuron_locs,neuron_locs)

        for i in range(self.m * self.n):
            neighbor_idxs = neuron_distmat[i] <= 1
            neighbor_weights = self.weights[neighbor_idxs]
            # Bug-fix: the unit itself shall not be included in the mean calculation
            um[i] = np.sum(distance_matrix(np.expand_dims(self.weights[i], 0), neighbor_weights))
            um[i] /= (np.sum(neighbor_idxs) - 1)

        if som_map == None: return self.plot(um.reshape(self.m,self.n), color=color, interp=interp, title=title)    
        else: som_map.data[0].z = um.reshape(self.m,self.n)
            

    def hithist(self, som_map=None, idata=[], color="RdBu", interp="best", title=""):
        hist = [0] * self.n * self.m
        for v in idata: 
            position = np.argmin(np.sqrt(np.sum(np.power(self.weights - v, 2), axis=1)))
            hist[position] += 1    
        
        if som_map == None: return self.plot(np.array(hist).reshape(self.m,self.n), color=color, interp=interp, title=title)        
        else: som_map.data[0].z = np.array(hist).reshape(self.m,self.n)


    def component_plane(self, som_map=None, component=0, color="Viridis", interp = "best", title=""):
        if som_map == None: return self.plot(self.weights[:,component].reshape(-1,self.n), color=color, interp=interp, title=title)   
        else: som_map.data[0].z = self.weights[:,component].reshape(-1,self.n)


    def sdh(self, som_map=None, idata=[], sdh_type=1, factor=1, draw=True, color="Cividis", interp = "best", title=""):
        import heapq
        sdh_m = [0] * self.m * self.n

        cs = 0
        for i in range(0, factor): cs += factor - i

        for vector in idata:
            dist = np.sqrt(np.sum(np.power(self.weights - vector, 2), axis=1))
            c = heapq.nsmallest(factor, range(len(dist)), key=dist.__getitem__)
            if (sdh_type == 1): 
                for j in range(0, factor): sdh_m[c[j]] += (factor - j) / cs  # normalized
            if (sdh_type == 2):
                for j in range(0, factor): sdh_m[c[j]] += 1.0 / dist[c[j]]  # based on distance
            if (sdh_type == 3): 
                dmin = min(dist)
                for j in range(0, factor): sdh_m[c[j]] += 1.0 - (dist[c[j]] - dmin) / (max(dist) - dmin)  

        if som_map == None: return self.plot(np.array(sdh_m).reshape(-1,self.n), color=color, interp=interp, title=title)      
        else: som_map.data[0].z = np.array(sdh_m).reshape(-1,self.n)

    
    def project_data(self, som_m=None, idata=[], title=""):
        data_y = []
        data_x = []
        for v in idata:
            position = np.argmin(np.sqrt(np.sum(np.power(self.weights - v, 2), axis=1)))
            x,y = position % self.n, position // self.n
            data_x.extend([x])
            data_y.extend([y])
            
        if som_m != None: som_m.add_trace(go.Scatter(x=data_x, y=data_y, mode="markers", marker_color="rgba(255, 255, 255, 0.8)"))
    

    def time_series(self, som_m=None, idata=[], wsize=50, title=""):
        data_y = []
        data_x = [i for i in range(0,len(idata))]
        
        data_x2 = []
        data_y2 = []
        
        qmin = np.inf
        qmax = 0
        
        step=1
        
        ps = []
        for v in idata:
            matrix = np.sqrt(np.sum(np.power(self.weights - v, 2), axis=1))
            position = np.argmin(matrix)
            qerror = matrix[position]
            if qmin > qerror: qmin = qerror
            if qmax < qerror: qmax = qerror
            ps.append((position, qerror))
       
        markerc=[]    
        for v in ps:
            data_y.extend([v[0]])
            rez = v[1] / qmax
 
            markerc.append('rgba(0, 0, 0, '+str(rez)+')') 
            
            x,y = v[0] % self.n, v[0] // self.n 
            if    x == 0: y = np.random.uniform(low=y, high=y+.1)
            elif  x == self.n - 1: y = np.random.uniform(low=y-.1, high=y)
            elif  y == 0: x = np.random.uniform(low=x, high=x+.1)
            elif  y == self.m - 1: x = np.random.uniform(low=x-.1, high=x)
            else: x, y = np.random.uniform(low=x-.1, high=x+.1), np.random.uniform(low=y-.1, high=y+.1)                           
            
            data_x2.extend([x])
            data_y2.extend([y]) 
    
        ts_plot = go.FigureWidget(go.Scatter(x=[], y=[], mode="markers", marker_color=markerc, marker=dict(colorscale="Viridis", showscale=True, color=np.random.randn(500))))
        ts_plot.update_xaxes(range=[0, wsize])       

        ts_plot.data[0].x, ts_plot.data[0].y = data_x, data_y
        som_m.add_trace(go.Scatter(x=data_x2, y=data_y2, mode="markers"))
  
        som_m.layout.height = 500
        ts_plot.layout.height = 500
        som_m.layout.width = 500
        ts_plot.layout.width = 1300
        
        return HBox([go.FigureWidget(som_m), go.FigureWidget(ts_plot)])


    def plot(self, matrix, color="Viridis", interp = "best", title=""):
        return go.FigureWidget(go.Heatmap(z=matrix, zsmooth=interp, colorscale=color), layout=go.Layout(width=700*self.n/self.m,  height=700, yaxis=dict(autorange='reversed'), title=title, title_x=0.5))

In [8]:
import gzip
import pandas as pd


class SOMToolBox_Parse:
    def __init__(self, filename):
        self.filename = filename


    def read_weight_file(self,):
        df = pd.DataFrame()
        if self.filename[-3:len(self.filename)] == '.gz':
            with gzip.open(self.filename, 'rb') as file:
                df, vec_dim, xdim, ydim = self._read_vector_file_to_df(df, file)
        else:
            with open(self.filename, 'rb') as file:
                df, vec_dim, xdim, ydim = self._read_vector_file_to_df(df, file)

        file.close()            
        return df.astype('float64'), vec_dim, xdim, ydim


    def _read_vector_file_to_df(self, df, file):
        xdim, ydim, vec_dim, position = 0, 0, 0, 0
        for byte in file:
            line = byte.decode('UTF-8')
            if line.startswith('$'):
                xdim, ydim, vec_dim = self._parse_vector_file_metadata(line, xdim, ydim, vec_dim)
                if xdim > 0 and ydim > 0 and len(df.columns) == 0:
                    df = pd.DataFrame(index=range(0, ydim * xdim), columns=range(0, vec_dim))
            else:
                if len(df.columns) == 0 or vec_dim == 0:
                    raise ValueError('Weight file has no correct Dimensional information.')
                position = self._parse_weight_file_data(line, position, vec_dim, df)
        return df, vec_dim, xdim, ydim


    def _parse_weight_file_data(self, line, position, vec_dim, df):
        splitted=line.split(' ')
        try:
            df.values[position] = list(np.array(splitted[0:vec_dim]).astype(float))
            position += 1
        except: raise ValueError('The input-vector file does not match its unit-dimension.') 
        return  position


    def _parse_vector_file_metadata(self, line, xdim, ydim, vec_dim):
        splitted = line.split(' ')
        if splitted[0] == '$XDIM':      xdim = int(splitted[1])
        elif splitted[0] == '$YDIM':    ydim = int(splitted[1])
        elif splitted[0] == '$VEC_DIM': vec_dim = int(splitted[1])
        return xdim, ydim, vec_dim  

## 2. Evaluation

For the evaluation of the different implementations we used two datasets and two SOM sizes: a small one (40x20) and a large one (100x60).

The first dataset (left visualization below) is the so-called chain link dataset, which contains two two-dimensional rings that are intertwined in three-dimensional space. For the evaluation, the rings are projected onto a two-dimensional space, which causes the rings to break.

The second dataset (right visualization below) is the so-called 10 clusters dataset. The clusters were generated from 10-dimensional Gaussian distributions with different densities. For the evaluation it is important that the clusters are still visible in the two-dimensional projection.

<table><tr><td><img src="images/chainlink-info.png" width="350"/></td><td><img src="images/10clusters-info.png" width="350"/></td></tr></table>

In [9]:
import minisom as som
from sklearn.preprocessing import MinMaxScaler

small_m, small_n = 40, 20
large_m, large_n = 100, 60

### 2.1. Chain Link Dataset Small SOM

In [10]:
# Train
chainlink = SOMToolBox_Parse('datasets/chainlink.vec')
chainlink, _, _, _ = chainlink.read_weight_file()
chainlink = MinMaxScaler().fit_transform(chainlink)
chainlink_dim = chainlink.shape[-1]

chainlink_small = som.MiniSom(small_m, small_n, chainlink_dim, sigma=0.5, learning_rate=0.1)
chainlink_small.train_random(chainlink, 30000, verbose=True)

# Visualization
chainlink_small_viz = SomViz(chainlink_small._weights.reshape(-1, chainlink_dim), small_m, small_n)

 [ 30000 / 30000 ] 100% 0.01521 it/s

In [11]:
chainlink_small_pm1 = chainlink_small_viz.pmatrix(idata=chainlink, perc=1, title='P-matrix (1 percentile)')
chainlink_small_pm5 = chainlink_small_viz.pmatrix(idata=chainlink, perc=5, title='P-matrix (5 percentile)')
chainlink_small_pm18 = chainlink_small_viz.pmatrix(idata=chainlink, perc=18, title='P-matrix (18 percentile)')
chainlink_small_pm50 = chainlink_small_viz.pmatrix(idata=chainlink, perc=50, title='P-matrix (50 percentile)')

display(HBox([chainlink_small_pm1, chainlink_small_pm5, chainlink_small_pm18, chainlink_small_pm50]))

In none of the above visualizations are the two rings and the break clearly visible. Increasing the percentile produces a ring-like structure, especially at the bottom of the visualization, but the other ring and the break remain unclear.

In [12]:
chainlink_small_um = chainlink_small_viz.umatrix(title='U-matrix')

chainlink_small_usm1 = chainlink_small_viz.usmatrix(idata=chainlink, perc=1, title='U*-matrix (1 percentile)')
chainlink_small_usm5 = chainlink_small_viz.usmatrix(idata=chainlink, perc=5, title='U*-matrix (5 percentile)')
chainlink_small_usm18 = chainlink_small_viz.usmatrix(idata=chainlink, perc=18, title='U*-matrix (18 percentile)')
chainlink_small_usm50 = chainlink_small_viz.usmatrix(idata=chainlink, perc=50, title='U*-matrix (50 percentile)')

display(HBox([chainlink_small_um, chainlink_small_usm1, chainlink_small_usm5, chainlink_small_usm18, chainlink_small_usm50]))

The U-matrix visualizations show issues similar to those of the P-matrix: ring-like structures can be observed, but nothing more.

In [13]:
chainlink_small_qe = chainlink_small_viz.qe(idata=chainlink, title='Quantization error ')
chainlink_small_mqe = chainlink_small_viz.mqe(idata=chainlink, title='Mean quantization error ')

display(HBox([chainlink_small_qe, chainlink_small_mqe]))

The quantization error visualizations clearly show a ring-like structure for the bottom ring, and a break in the bottom ring is observable. The top ring, however, cannot be observed very clearly. Still, this type of visualization worked best for the problem at hand.

### 2.2. Chain Link Dataset Large SOM

In [16]:
# Train
chainlink_large = som.MiniSom(large_m, large_n, chainlink_dim, sigma=0.8, learning_rate=0.7)
chainlink_large.train_random(chainlink, 5000, verbose=True)

# Visualization
chainlink_large_viz = SomViz(chainlink_large._weights.reshape(-1, chainlink_dim), large_m, large_n)

 [ 5000 / 5000 ] 100% 0.11525 it/s

In [17]:
chainlink_large_pm1 = chainlink_large_viz.pmatrix(idata=chainlink, perc=1, title='P-matrix (1 percentile)')
chainlink_large_pm5 = chainlink_large_viz.pmatrix(idata=chainlink, perc=5, title='P-matrix (5 percentile)')
chainlink_large_pm18 = chainlink_large_viz.pmatrix(idata=chainlink, perc=18, title='P-matrix (18 percentile)')
chainlink_large_pm50 = chainlink_large_viz.pmatrix(idata=chainlink, perc=50, title='P-matrix (50 percentile)')

display(HBox([chainlink_large_pm1, chainlink_large_pm5, chainlink_large_pm18, chainlink_large_pm50]))

In [18]:
chainlink_large_um = chainlink_large_viz.umatrix(title='U-matrix')

chainlink_large_usm1 = chainlink_large_viz.usmatrix(idata=chainlink, perc=1, title='U*-matrix (1 percentile)')
chainlink_large_usm5 = chainlink_large_viz.usmatrix(idata=chainlink, perc=5, title='U*-matrix (5 percentile)')
chainlink_large_usm18 = chainlink_large_viz.usmatrix(idata=chainlink, perc=18, title='U*-matrix (18 percentile)')
chainlink_large_usm50 = chainlink_large_viz.usmatrix(idata=chainlink, perc=50, title='U*-matrix (50 percentile)')

display(HBox([chainlink_large_um, chainlink_large_usm1, chainlink_large_usm5]))
display(HBox([chainlink_large_usm18, chainlink_large_usm50]))

In [19]:
chainlink_large_qe = chainlink_large_viz.qe(idata=chainlink, title='Quantization error ')
chainlink_large_mqe = chainlink_large_viz.mqe(idata=chainlink, title='Mean quantization error ')

display(HBox([chainlink_large_qe, chainlink_large_mqe]))

Increasing the size of the SOM did not yield better results; on the contrary, we observe less structure than before. Increasing the number of training iterations and "playing" with the training parameters did not, in general, improve the results either.

### 2.3. 10 Clusters Dataset Small SOM

In [20]:
# Train
clusters = SOMToolBox_Parse('datasets/clusters.vec')
clusters, _, _, _ = clusters.read_weight_file()
clusters = MinMaxScaler().fit_transform(clusters)
clusters_dim = clusters.shape[-1]

clusters_small = som.MiniSom(small_m, small_n, clusters_dim, sigma=0.8, learning_rate=0.7)
clusters_small.train_random(clusters, 10000, verbose=True)

# Visualization
clusters_small_viz = SomViz(clusters_small._weights.reshape(-1, clusters_dim), small_m, small_n)

 [ 10000 / 10000 ] 100% 0.01527 it/s

In [21]:
clusters_small_pm1 = clusters_small_viz.pmatrix(idata=clusters, perc=1, title='P-matrix (1 percentile)')
clusters_small_pm5 = clusters_small_viz.pmatrix(idata=clusters, perc=5, title='P-matrix (5 percentile)')
clusters_small_pm18 = clusters_small_viz.pmatrix(idata=clusters, perc=18, title='P-matrix (18 percentile)')
clusters_small_pm50 = clusters_small_viz.pmatrix(idata=clusters, perc=50, title='P-matrix (50 percentile)')

display(HBox([clusters_small_pm1, clusters_small_pm5, clusters_small_pm18, clusters_small_pm50]))

As described in the introduction of the evaluation, for the clusters dataset it is important that 10 distinct clusters are visible in the visualization. For the high-percentile P-matrices we can observe distinct clusters, but there are too few of them and the borders are not sharp.

In [22]:
clusters_small_um = clusters_small_viz.umatrix(title='U-matrix')

clusters_small_usm1 = clusters_small_viz.usmatrix(idata=clusters, perc=1, title='U*-matrix (1 percentile)')
clusters_small_usm5 = clusters_small_viz.usmatrix(idata=clusters, perc=5, title='U*-matrix (5 percentile)')
clusters_small_usm18 = clusters_small_viz.usmatrix(idata=clusters, perc=18, title='U*-matrix (18 percentile)')
clusters_small_usm50 = clusters_small_viz.usmatrix(idata=clusters, perc=50, title='U*-matrix (50 percentile)')

display(HBox([clusters_small_um, clusters_small_usm1, clusters_small_usm5, clusters_small_usm18, clusters_small_usm50]))

For the U-matrix we observe issues similar to those of the P-matrix. There are, however, clear borders visible.

In [23]:
clusters_small_qe = clusters_small_viz.qe(idata=clusters, title='Quantization error ')
clusters_small_mqe = clusters_small_viz.mqe(idata=clusters, title='Mean quantization error ')

display(HBox([clusters_small_qe, clusters_small_mqe]))

The quantization error visualization yields very good results. The ten clusters are nicely visible and clearly separated.

### 2.4. 10 Clusters Dataset Large SOM

In [25]:
# Train
clusters_large = som.MiniSom(large_m, large_n, clusters_dim, sigma=0.8, learning_rate=0.7)
clusters_large.train_random(clusters, 5000, verbose=True)

# Visualization
clusters_large_viz = SomViz(clusters_large._weights.reshape(-1, clusters_dim), large_m, large_n)

 [ 5000 / 5000 ] 100% 0.11133 it/s

In [26]:
clusters_large_pm1 = clusters_large_viz.pmatrix(idata=clusters, perc=1, title='P-matrix (1 percentile)')
clusters_large_pm5 = clusters_large_viz.pmatrix(idata=clusters, perc=5, title='P-matrix (5 percentile)')
clusters_large_pm18 = clusters_large_viz.pmatrix(idata=clusters, perc=18, title='P-matrix (18 percentile)')
clusters_large_pm50 = clusters_large_viz.pmatrix(idata=clusters, perc=50, title='P-matrix (50 percentile)')

display(HBox([clusters_large_pm1, clusters_large_pm5, clusters_large_pm18, clusters_large_pm50]))

In [27]:
clusters_large_um = clusters_large_viz.umatrix(title='U-matrix')

clusters_large_usm1 = clusters_large_viz.usmatrix(idata=clusters, perc=1, title='U*-matrix (1 percentile)')
clusters_large_usm5 = clusters_large_viz.usmatrix(idata=clusters, perc=5, title='U*-matrix (5 percentile)')
clusters_large_usm18 = clusters_large_viz.usmatrix(idata=clusters, perc=18, title='U*-matrix (18 percentile)')
clusters_large_usm50 = clusters_large_viz.usmatrix(idata=clusters, perc=50, title='U*-matrix (50 percentile)')

display(HBox([clusters_large_um, clusters_large_usm1, clusters_large_usm5]))
display(HBox([clusters_large_usm18, clusters_large_usm50]))

In [28]:
clusters_large_qe = clusters_large_viz.qe(idata=clusters, title='Quantization error ')
clusters_large_mqe = clusters_large_viz.mqe(idata=clusters, title='Mean quantization error ')

display(HBox([clusters_large_qe, clusters_large_mqe]))

For the large SOM on the clusters dataset we observe behaviour similar to that of the small SOM. However, the different clusters are harder to make out in the quantization error visualization simply because of their size.

### 2.5. Chain Link Dataset Pre-Trained SOM

In [29]:
# Read from SOMToolBox
chainlink = SOMToolBox_Parse('datasets/chainlink.vec')
chainlink, _, _, _ = chainlink.read_weight_file()

chainlink_pretrained = SOMToolBox_Parse('datasets/chainlink.wgt.gz')
chainlink_pretrained, chainlink_dim, chainlink_n, chainlink_m = chainlink_pretrained.read_weight_file()

# Visualization
chainlink_viz = SomViz(chainlink_pretrained.values.reshape(-1, chainlink_dim), chainlink_m, chainlink_n)

In [30]:
chainlink_pm1 = chainlink_viz.pmatrix(color='reds', idata=chainlink, perc=1, title='P-matrix (1 percentile)')
chainlink_pm5 = chainlink_viz.pmatrix(color='reds', idata=chainlink, perc=5, title='P-matrix (5 percentile)')
chainlink_pm18 = chainlink_viz.pmatrix(color='reds', idata=chainlink, perc=18, title='P-matrix (18 percentile)')
chainlink_pm50 = chainlink_viz.pmatrix(color='reds', idata=chainlink, perc=50, title='P-matrix (50 percentile)')

display(HBox([chainlink_pm1, chainlink_pm5, chainlink_pm18, chainlink_pm50]))

<table><tr><td><img src="images/PMatrix__palette=Redscale32_interpolated__perc=1.png" width="350"/></td><td><img src="images/PMatrix__palette=Redscale32_interpolated__perc=5.png" width="350"/></td><td><img src="images/PMatrix__palette=Redscale32_interpolated__perc=18.png" width="350"/></td><td><img src="images/PMatrix__palette=Redscale32_interpolated__perc=50.png" width="350"/></td></tr></table>

In [31]:
chainlink_um = chainlink_viz.umatrix(color='reds', title='U-matrix')

chainlink_usm1 = chainlink_viz.usmatrix(color='reds_r', idata=chainlink, perc=1, title='U*-matrix (1 percentile)')
chainlink_usm5 = chainlink_viz.usmatrix(color='reds_r', idata=chainlink, perc=5, title='U*-matrix (5 percentile)')
chainlink_usm18 = chainlink_viz.usmatrix(color='reds_r', idata=chainlink, perc=18, title='U*-matrix (18 percentile)')
chainlink_usm50 = chainlink_viz.usmatrix(color='reds_r', idata=chainlink, perc=50, title='U*-matrix (50 percentile)')

display(HBox([chainlink_um, chainlink_usm1, chainlink_usm5]))
display(HBox([chainlink_usm18, chainlink_usm50]))

<table><tr><td><img src="images/UMatrix__palette=Redscale32_interpolated.png" width="350"/></td><td><img src="images/UStarMatrix__palette=Redscale32_interpolated__perc=1.png" width="350"/></td><td><img src="images/UStarMatrix__palette=Redscale32_interpolated__perc=5.png" width="350"/></td><td><img src="images/UStarMatrix__palette=Redscale32_interpolated__perc=18.png" width="350"/></td><td><img src="images/UStarMatrix__palette=Redscale32_interpolated__perc=50.png" width="350"/></td></tr></table>

In [32]:
chainlink_qe = chainlink_viz.qe(color='reds', idata=chainlink, title='Quantization error ')
chainlink_mqe = chainlink_viz.mqe(color='reds', idata=chainlink, title='Mean quantization error ')

display(HBox([chainlink_qe, chainlink_mqe]))

<table><tr><td><img src='images/QuantizationErr__palette=Redscale32.png' width='350'></td><td><img src='images/MeanQuantizationErr__palette=Redscale32.png' width='350'></td></tr></table>

All of the above visualizations look the same whether they come from the Java SOMToolbox or from our implementation, which indicates that our implementation is correct. There was only one issue with generating the quantization error visualization through the Java SOMToolbox, as described in the TUWEL forum. We tried to rebuild the Toolbox with the suggested fix but in the end could not run it.