## Example of how to use the HSNE Python parser to read HSNE hierarchies
Includes a limited explanation of how an HSNE hierarchy is structured.

In [7]:
from HSNE_parser import read_HSNE_binary
import pandas as pd
import numpy as np

Inside /High_dimensional_inspector, run:

    ../build/applications/command_line_tools/hsne_cmd MNIST_1000.bin 1000 784 -a mnist
    

This generates mnist.hsne, an HSNE hierarchy created from 1000 samples of the MNIST dataset. The sample file is provided in HSNE-clustering/High_dimensional_inspector/data.

Read the hierarchy from an HSNE binary file and print progress:

In [8]:
hsne = read_HSNE_binary(filename="./sample_data/mnis_aoi.hsne", verbose=True)

Number of scales 3
Start reading first scale of size 1000
Done reading first scale..

Next scale: 1
Scale size: 236
Reading transmatrix..
Reading landmarks of scale to original data..
Reading landmarks to previous scale..
Reading landmark weights..
Reading previous scale to current scale..
Reading area of influence..

Next scale: 2
Scale size: 29
Reading transmatrix..
Reading landmarks of scale to original data..
Reading landmarks to previous scale..
Reading landmark weights..
Reading previous scale to current scale..
Reading area of influence..
Total time spent parsing hierarchy and building objects: 0.068511


In [9]:
# How many scales do I have?
hsne.num_scales

3

That is 1 data scale and 2 landmark scales.

In [10]:
for scale in hsne:
    print(scale)

HSNE datascale 0 with 1000 datapoints
HSNE subscale 1 with 236 datapoints
HSNE subscale 2 with 29 datapoints


Intra-scale similarities are stored in scale.tmatrix, a sparse matrix of size scale size × scale size:

In [11]:
#scale 0 is just a k=30 knn graph
hsne[0].tmatrix

<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
	with 30000 stored elements in COOrdinate format>

In [12]:
hsne[1].tmatrix

<236x236 sparse matrix of type '<class 'numpy.float64'>'
	with 11445 stored elements in COOrdinate format>

Inter-scale similarities are stored in scale.area_of_influence, a sparse matrix of size previous scale size × current scale size (the data scale does not have an area of influence):

In [13]:
hsne[1].area_of_influence

<1000x236 sparse matrix of type '<class 'numpy.float64'>'
	with 37321 stored elements in Compressed Sparse Column format>
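
One way to use this matrix is to assign every point on the previous scale to the landmark that influences it most. The sketch below builds a tiny stand-in matrix with SciPy instead of loading a real hierarchy. The values are made up, and `strongest_landmark` is a hypothetical helper, not part of HSNE_parser. It assumes that rows index points on the previous scale and columns index landmarks on the current scale, as the 1000x236 shape above suggests.

```python
import numpy as np
from scipy.sparse import csc_matrix


def strongest_landmark(aoi):
    """For each row (previous-scale point), return the column (landmark)
    holding the largest influence value."""
    # Densifying is acceptable for small hierarchies such as 1000 x 236.
    return np.asarray(aoi.toarray()).argmax(axis=1)


# Toy area of influence: 4 previous-scale points, 2 landmarks (made-up values).
rows = np.array([0, 0, 1, 2, 3, 3])
cols = np.array([0, 1, 0, 1, 0, 1])
vals = np.array([0.9, 0.1, 1.0, 1.0, 0.3, 0.7])
toy_aoi = csc_matrix((vals, (rows, cols)), shape=(4, 2))

assignment = strongest_landmark(toy_aoi)
print(assignment)  # [0 0 1 1]
```

With a loaded hierarchy, the same helper could be called as `strongest_landmark(hsne[1].area_of_influence)`.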

Landmark information is stored in each scale (not the datascale) of the hsne object:

In [14]:
# Which original datapoint is each landmark?
# Landmark #0 in this scale was originally datapoint #1
hsne[1].lm_to_original[:20]

[1, 3, 8, 14, 17, 23, 26, 39, 40, 42, 43, 57, 72, 74, 95, 96, 97, 98, 102, 103]

In [15]:
# Which landmark in this scale was which landmark in the previous scale?
# Landmark #0 in this scale was landmark #5 in the previous scale
hsne[2].lm_to_previous

[5,
 6,
 13,
 16,
 17,
 30,
 31,
 35,
 42,
 43,
 61,
 75,
 79,
 104,
 106,
 114,
 116,
 117,
 118,
 136,
 139,
 140,
 156,
 182,
 192,
 194,
 197,
 214,
 233]
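
These index lists can be chained. As a small sketch that uses only values printed above, a scale-2 landmark is first mapped to its index on scale 1 and then to its original datapoint. The helper name `to_original` is hypothetical, and the parser may already expose the same information directly on each scale.

```python
# First entries copied from the outputs of hsne[1].lm_to_original
# and hsne[2].lm_to_previous shown above.
lm_to_original_scale1 = [1, 3, 8, 14, 17, 23, 26, 39, 40, 42,
                         43, 57, 72, 74, 95, 96, 97, 98, 102, 103]
lm_to_previous_scale2 = [5, 6, 13, 16, 17]


def to_original(landmark, lm_to_previous, lm_to_original_prev):
    """Map a landmark index to an original datapoint via the previous scale."""
    previous_index = lm_to_previous[landmark]
    return lm_to_original_prev[previous_index]


originals = [
    to_original(lm, lm_to_previous_scale2, lm_to_original_scale1)
    for lm in range(len(lm_to_previous_scale2))
]
print(originals)  # [23, 26, 74, 97, 98]
```

So landmark #0 on scale 2 is landmark #5 on scale 1, which is original datapoint #23.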

### Important!
best_representatives answers the question: for each point on the previous scale, which landmark on this scale best represents it?

For example, original datapoint #0 is best represented by landmark #192 on scale 1, and landmark #0 on scale 1 is best represented by landmark #12 on scale 2:


In [16]:

print(hsne[1].best_representatives[0:10])
print(hsne[2].best_representatives[0:10])

[192 153 106   5 231 229  37  31 104 225]
[12  0 13 13  3  0 22 18 13 22]


In [17]:
hsne[2].best_representatives

array([12,  0, 13, 13,  3,  0, 22, 18, 13, 22, 22, 28, 13,  4, 12, 22,  3,
        2, 13, 23, 13, 13, 28, 20, 12, 13, 24, 18, 13, 28, 15,  8, 16, 28,
       12, 20,  4, 13, 24, 24,  3, 27,  6, 19,  1, 23, 21,  6, 23,  3,  1,
        6, 28, 22, 15, 23,  3,  1, 25,  9,  4, 17,  1, 28, 13,  6,  3, 28,
       23, 12, 23, 23, 13, 18, 22, 27, 21, 28,  8, 12,  6, 28, 26, 23, 21,
        3, 11, 21, 20, 23,  0, 13, 13, 18,  9, 28, 21, 23, 28, 28, 23, 28,
       18, 27, 13, 12, 28,  4, 28, 23,  1, 28, 24, 28, 18, 28, 20, 10, 15,
        8, 22, 27,  3, 25, 18, 13, 13, 21, 24, 23,  0, 13, 23,  0, 25, 24,
       20, 24, 25, 16, 21,  7,  3, 13,  0, 20,  1,  4, 23, 13, 22, 22, 13,
       12,  5, 23, 23, 12, 13, 13, 24, 23,  3, 12,  6, 23, 28, 13,  8, 15,
       23,  3, 28, 22,  6,  5, 12,  0, 23,  1,  3, 25, 22, 27, 13, 12, 27,
        5,  8,  3, 24, 23, 24, 20, 27,  6, 20, 18, 22,  3,  5,  3, 25, 25,
       24, 13, 23,  5, 24, 23, 12,  6, 23,  8, 25, 23,  5, 22, 27,  1,  5,
       24, 27,  3, 12, 28

The previous cell does not show which scale-2 landmark best represents each original datapoint. These best representatives can be propagated through the scales with the get_datascale_mappings() method of the full hsne object.

The argument passed to get_datascale_mappings() is the scale for which you would like the mapping. For scale 1, the result is the same as hsne[1].best_representatives, but as a dictionary. For scale 2, the mapping is propagated down through scale 1's best representatives.
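
The propagation can be pictured as composing two lookups. The following is a minimal sketch, not the parser's actual implementation: `propagate` is a hypothetical helper, and the small tables hold only a few values taken from the best_representatives outputs printed above.

```python
# Datapoint -> best scale-1 landmark
# (first four entries of hsne[1].best_representatives).
data_to_scale1 = {0: 192, 1: 153, 2: 106, 3: 5}

# Scale-1 landmark -> best scale-2 landmark
# (entries 192, 153, 106 and 5 of hsne[2].best_representatives).
scale1_to_scale2 = {192: 24, 153: 12, 106: 28, 5: 0}


def propagate(lower, upper):
    """Compose two mappings: follow `lower` first, then `upper`."""
    return {point: upper[landmark] for point, landmark in lower.items()}


data_to_scale2 = propagate(data_to_scale1, scale1_to_scale2)
print(data_to_scale2)  # {0: 24, 1: 12, 2: 28, 3: 0}
```

The result agrees with the first four values that get_datascale_mappings(2) returns in this notebook.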


In [18]:
list(hsne.get_datascale_mappings(1).values())[:30]

[192,
 153,
 106,
 5,
 231,
 229,
 37,
 31,
 104,
 225,
 188,
 137,
 17,
 216,
 18,
 70,
 50,
 162,
 197,
 29,
 139,
 185,
 77,
 1,
 125,
 42,
 6,
 42,
 182,
 129]

Original datapoint #0 is best represented by landmark #24 on scale 2:

In [19]:
list(hsne.get_datascale_mappings(2).values())[:30]

[24,
 12,
 28,
 0,
 1,
 5,
 13,
 8,
 13,
 28,
 8,
 24,
 2,
 5,
 13,
 23,
 1,
 3,
 18,
 28,
 16,
 12,
 28,
 0,
 13,
 6,
 22,
 6,
 22,
 23]
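
A natural next step is to invert such a mapping, so that each scale-2 landmark lists the datapoints it represents. The sketch below uses only the 30 values printed above. `group_by_landmark` is a hypothetical helper, not part of HSNE_parser, and it assumes that the dictionary keys are the datapoint indices 0, 1, 2, ... in order.

```python
from collections import defaultdict

# First 30 values of hsne.get_datascale_mappings(2), copied from the output above.
first_30 = [24, 12, 28, 0, 1, 5, 13, 8, 13, 28, 8, 24, 2, 5, 13,
            23, 1, 3, 18, 28, 16, 12, 28, 0, 13, 6, 22, 6, 22, 23]

# Assumption: keys are datapoint indices in ascending order.
mapping = dict(enumerate(first_30))


def group_by_landmark(mapping):
    """Invert a datapoint -> landmark mapping into landmark -> [datapoints]."""
    groups = defaultdict(list)
    for point, landmark in mapping.items():
        groups[landmark].append(point)
    return dict(groups)


groups = group_by_landmark(mapping)
print(groups[13])  # [6, 8, 14, 24]
print(groups[28])  # [2, 9, 19, 22]
```

Among the first 30 datapoints, landmarks #13 and #28 each represent four points.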