# Dimensionality Reduction

The reduce function reduces the dimensionality of an array or a list of arrays.

The default reduction method is Principal Component Analysis (PCA), and a variety of other models are also supported.

Supported models: PCA, IncrementalPCA, SparsePCA, MiniBatchSparsePCA, KernelPCA, FastICA, FactorAnalysis, TruncatedSVD, DictionaryLearning, MiniBatchDictionaryLearning, TSNE, Isomap, SpectralEmbedding, LocallyLinearEmbedding, and MDS.

## Import Hypertools

In [2]:
import hypertools as hyp

## Load your data

Here we use one of the sample datasets built into the package.

In [3]:
weights = hyp.load('weights_avg')

We can take a cursory look at the data structure (namely, the type and size of 'weights' and its components).

In [98]:
print("Length: " + str(len(weights)))
print("Type: " + str(type(weights)))
print()
print('Each element contains a ' + str(type(weights[0][0])) + 's'
      + ' consisting of ' + str(type(weights[0][0][0])) + 's')

Length: 2
Type: <class 'list'>

Each element contains a <class 'numpy.ndarray'>s consisting of <class 'numpy.float32'>s


## Reduce array

Let's look at one array from the dataset above.

In [61]:
print('Array shape: ', weights[0].shape)

weights[0]

Array shape:  (300, 100)


array([[ 1.54960787,  0.00836554,  0.53535277, ...,  0.57384634,
         1.17235112,  0.3465665 ],
       [ 0.79020196,  0.06576407,  0.12130573, ..., -0.16467266,
         0.57102633, -0.19887382],
       [-0.02059509,  0.06284615,  0.07327231, ..., -0.16656084,
         0.32707059, -0.13113342],
       ..., 
       [-0.4469595 , -0.00795435,  0.66622794, ...,  0.54065955,
        -0.00273777,  1.2662859 ],
       [-0.26116031, -0.19556259,  0.6995216 , ...,  0.32840028,
        -0.11054993,  1.65495527],
       [-0.32953897, -0.07440869,  0.64839333, ...,  0.04698723,
        -0.23190361,  1.67194223]], dtype=float32)

To reduce this single array, simply pass the array to hyp.reduce, as below.

In [62]:
reduced_array = hyp.reduce(weights[0])

print('Reduced array shape: ',reduced_array.shape)
reduced_array

Reduced array shape:  (300, 3)


array([[  6.48068333e+00,   1.40102780e+00,   1.59104919e+00],
       [  2.84769726e+00,   8.96467865e-02,  -8.98378491e-02],
       [  1.48038733e+00,  -1.26063561e+00,  -7.48025596e-01],
       [  1.05186558e+00,  -1.31629324e+00,  -1.00285459e+00],
       [  7.00260162e-01,  -8.85070622e-01,  -9.83820438e-01],
       [  5.33612728e-01,  -7.32676804e-01,  -5.91460764e-01],
       [  2.62964189e-01,  -6.85500383e-01,  -6.77563190e-01],
       [  1.59012482e-01,  -6.71859443e-01,  -7.72416592e-01],
       [ -8.59968290e-02,  -6.79191232e-01,  -8.61991227e-01],
       [  1.36606023e-02,  -7.69169748e-01,  -1.20105219e+00],
       [  4.61160243e-01,  -1.01595330e+00,  -1.14419591e+00],
       [  6.45726144e-01,  -6.87333822e-01,  -8.25198054e-01],
       [  5.96288681e-01,  -6.04023159e-01,  -4.66835350e-01],
       [  5.54403141e-02,  -3.93152535e-01,  -2.82153070e-01],
       [ -8.27722907e-01,  -5.17250299e-01,  -3.39220464e-01],
       [ -7.25380301e-01,  -2.73191214e-01,  -2.0019578

We can see that the data has been reduced from 100 features to 3 features.
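
To make the default reduction more concrete, here is a minimal NumPy sketch of PCA that projects data onto its top three principal components. It is an illustration under stated assumptions, not the hypertools implementation: `pca_reduce` is a hypothetical helper, and the random `synthetic` array merely stands in for `weights[0]` with the same shape.

```python
import numpy as np

def pca_reduce(x, ndims=3):
    """Project the rows of x onto the top `ndims` principal components."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:ndims].T

rng = np.random.default_rng(0)
synthetic = rng.normal(size=(300, 100))  # hypothetical stand-in for weights[0]
reduced = pca_reduce(synthetic)
print('Reduced array shape: ', reduced.shape)  # shape is (300, 3)
```

The number of rows (datapoints) is preserved; only the number of columns (features) changes.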


## Reduce list of arrays

A list (or numpy array) of multiple arrays can also be reduced by passing it to hyp.reduce. Here we show this with the two arrays in the weights dataset.

First, let's examine the arrays in the weights dataset (below).

In [85]:
print('Weights is a ', type(weights), ' containing ', len(weights), ' elements')
print()

print('Element 0 is an ',type(weights[0]), ' with shape ',weights[0].shape)
print('Element 1 is an ',type(weights[1]), ' with shape ',weights[1].shape)
print()

weights

Weights is a  <class 'list'>  containing  2  elements

Element 0 is an  <class 'numpy.ndarray'>  with shape  (300, 100)
Element 1 is an  <class 'numpy.ndarray'>  with shape  (300, 100)



[array([[ 1.54960787,  0.00836554,  0.53535277, ...,  0.57384634,
          1.17235112,  0.3465665 ],
        [ 0.79020196,  0.06576407,  0.12130573, ..., -0.16467266,
          0.57102633, -0.19887382],
        [-0.02059509,  0.06284615,  0.07327231, ..., -0.16656084,
          0.32707059, -0.13113342],
        ..., 
        [-0.4469595 , -0.00795435,  0.66622794, ...,  0.54065955,
         -0.00273777,  1.2662859 ],
        [-0.26116031, -0.19556259,  0.6995216 , ...,  0.32840028,
         -0.11054993,  1.65495527],
        [-0.32953897, -0.07440869,  0.64839333, ...,  0.04698723,
         -0.23190361,  1.67194223]], dtype=float32),
 array([[ 1.19141841, -0.06547415,  0.44853967, ...,  0.06766402,
          1.00010574,  0.54140496],
        [ 0.63233513,  0.07012342,  0.02000804, ...,  0.00565576,
          0.44873011, -0.11082365],
        [ 0.32301277,  0.31087267,  0.02237374, ..., -0.29169708,
          0.29740322, -0.19899368],
        ..., 
        [-0.22067139, -0.09823959,  0

In [84]:
reduced_arrays = hyp.reduce(weights)

print('Shape of first reduced array: ',reduced_arrays[0].shape)
print('Shape of second reduced array: ',reduced_arrays[1].shape)

reduced_arrays

Shape of first reduced array:  (300, 3)
Shape of second reduced array:  (300, 3)


[array([[  6.57635307e+00,   1.02127421e+00,   7.33844161e-01],
        [  2.81822491e+00,  -2.63239592e-02,  -6.31293178e-01],
        [  1.33429205e+00,  -1.27908039e+00,  -1.07068455e+00],
        [  8.80583942e-01,  -1.26772261e+00,  -1.12553036e+00],
        [  5.68728268e-01,  -8.71274769e-01,  -1.02367878e+00],
        [  4.43763703e-01,  -6.52167678e-01,  -4.61290509e-01],
        [  1.87882468e-01,  -6.41686857e-01,  -5.18207669e-01],
        [  5.73547669e-02,  -6.86729789e-01,  -6.19583011e-01],
        [ -1.80774406e-01,  -6.29606366e-01,  -7.03924298e-01],
        [ -9.42547917e-02,  -7.07274795e-01,  -1.21734357e+00],
        [  3.30025136e-01,  -9.53989089e-01,  -1.11944926e+00],
        [  6.04858220e-01,  -6.67118311e-01,  -8.07462633e-01],
        [  5.50228655e-01,  -6.19037807e-01,  -5.49928904e-01],
        [  2.75523886e-02,  -3.87824208e-01,  -3.63377810e-01],
        [ -9.00222540e-01,  -4.54131037e-01,  -2.56106317e-01],
        [ -7.78993726e-01,  -2.99503654e

We can see that each array has been reduced from 100 features to 3 features, with the number of datapoints unchanged.
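
One plausible way to reduce several arrays into a shared space is to stack them, fit a single model, and split the result back at the original row boundaries. The sketch below shows that idea with plain NumPy PCA. Treat it as an assumption about the general approach rather than a description of hypertools internals; `reduce_list` and the random `pair` data are hypothetical.

```python
import numpy as np

def reduce_list(arrays, ndims=3):
    """Fit one PCA on the stacked arrays, then split the projection back up."""
    lengths = [len(a) for a in arrays]
    stacked = np.vstack(arrays).astype(float)
    centered = stacked - stacked.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:ndims].T
    # Cut the projection at the original row boundaries.
    return np.split(projected, np.cumsum(lengths)[:-1])

rng = np.random.default_rng(0)
pair = [rng.normal(size=(300, 100)), rng.normal(size=(300, 100))]  # hypothetical data
reduced_pair = reduce_list(pair)
print([r.shape for r in reduced_pair])
```

Fitting one model on all arrays together places every array in the same low-dimensional coordinate system, which is why the values for the first array can differ from those obtained when it is reduced on its own.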

## Reduce list of arrays (TSNE)

You can use a different reduction method by passing its name as a string to the reduce argument.

In the example below, we reduce multiple arrays at once, using TSNE.

Supported reduction algorithms: PCA, IncrementalPCA, SparsePCA, MiniBatchSparsePCA, KernelPCA, FastICA, FactorAnalysis, TruncatedSVD, DictionaryLearning, MiniBatchDictionaryLearning, TSNE, Isomap, SpectralEmbedding, LocallyLinearEmbedding, and MDS

In [93]:
reduced_TSNE = hyp.reduce(weights, reduce='TSNE')

print('Shape of first reduced array: ',reduced_TSNE[0].shape)
print('Shape of second reduced array: ',reduced_TSNE[1].shape)

reduced_TSNE

Shape of first reduced array:  (300, 3)
Shape of second reduced array:  (300, 3)


[array([[  1.91547718e+01,  -7.71365128e+01,  -9.49014435e+01],
        [  1.36020164e+01,  -5.41406975e+01,  -9.62781754e+01],
        [  3.72801056e+01,   3.22934189e+01,  -1.20189835e+02],
        [ -9.37718735e+01,   2.64905033e+01,  -1.34823694e+01],
        [  4.10361061e+01,   4.55372543e+01,  -1.04399658e+02],
        [  3.59200096e+01,   3.56045914e+01,  -8.82849731e+01],
        [  3.30247307e+01,   4.51899529e+01,  -7.75100632e+01],
        [  4.30793648e+01,   4.20329819e+01,  -6.89706116e+01],
        [  4.60715370e+01,   5.59333000e+01,  -6.87379532e+01],
        [  2.91411419e+01,   6.08762741e+01,  -6.26061745e+01],
        [  2.55557938e+01,   4.74457207e+01,  -5.08233376e+01],
        [  3.88723221e+01,   4.65474586e+01,  -4.64939156e+01],
        [  5.22211113e+01,   4.26720467e+01,  -4.75367889e+01],
        [  6.36967850e+01,   4.35067978e+01,  -5.51431198e+01],
        [ -6.03790474e+00,   8.42079544e+01,  -1.01008825e+01],
        [ -8.08094215e+00,   8.63507538e

We can see that each array has been reduced from 100 features to 3 features, with the number of datapoints unchanged.

## Reduce to specified dimension

You may prefer to reduce to a specific number of features, rather than the default of three dimensions.

To achieve this, simply pass the number of desired features (as an int) to the ndims argument, as below.

In [95]:
reduced_4 = hyp.reduce(weights, ndims = 4)

print('Shape of first reduced array: ',reduced_4[0].shape)
print('Shape of second reduced array: ',reduced_4[1].shape)

reduced_4

Shape of first reduced array:  (300, 4)
Shape of second reduced array:  (300, 4)


[array([[ 6.56683064,  1.00051618,  0.84644157, -0.90303159],
        [ 2.81139851, -0.03778996, -0.54789877, -0.752294  ],
        [ 1.33057106, -1.28318071, -1.03339791, -0.38748556],
        ..., 
        [ 4.14029932, -2.73296952, -0.53481078,  0.34857911],
        [ 3.8724134 , -2.51208663, -0.82597673,  0.07994185],
        [ 3.02357054, -2.57876945, -1.2606622 ,  0.01703015]], dtype=float32),
 array([[ 5.31391048, -0.51007557,  0.64937019, -0.99470866],
        [ 2.22902942, -0.66501701, -0.31045339, -0.77325559],
        [ 1.85323167, -0.8681494 , -0.23892149, -0.30891752],
        ..., 
        [ 4.83702612, -2.96435165, -0.76967806,  0.20047435],
        [ 3.52211094, -3.43004131, -1.54988134,  0.04829234],
        [ 2.72573423, -3.493891  , -1.93586457, -0.18363315]], dtype=float32)]

We can see that each array has been reduced from 100 features to 4 features, with the number of datapoints unchanged.
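
If you are unsure how many dimensions to request, one common heuristic for PCA is to look at the cumulative fraction of variance explained by the leading components. The NumPy sketch below illustrates this on synthetic data built from four latent directions. The helper `explained_variance_ratio`, the synthetic data, and the 95% threshold are all illustrative assumptions, not part of hypertools.

```python
import numpy as np

def explained_variance_ratio(x):
    """Fraction of total variance captured by each principal component."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Hypothetical data: 4 strong latent directions plus a small amount of noise.
latent = rng.normal(size=(300, 4))
mixing = rng.normal(size=(4, 100))
data = latent @ mixing + 0.01 * rng.normal(size=(300, 100))

cumulative = np.cumsum(explained_variance_ratio(data))
# Smallest number of components whose cumulative variance reaches 95%.
ndims = int(np.searchsorted(cumulative, 0.95) + 1)
print('Suggested ndims: ', ndims)
```

The resulting integer could then be passed as the ndims argument.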

## Reduce list of arrays with specific parameters

For finer control, pass a dictionary to the reduce argument, with a 'model' key naming the reduction method and a 'params' key holding a dictionary of model parameters.

(See the scikit-learn documentation for each model for details on the parameters it supports.)

In [101]:
reduced_params = hyp.reduce(weights, reduce={'model' : 'PCA', 'params' : {'whiten' : True}})
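
To illustrate what a parameter such as whiten changes, the NumPy sketch below compares plain PCA scores with whitened ones, where each retained component is rescaled to unit variance. It follows the usual definition of PCA whitening and is only a sketch under that assumption; `pca_scores` is a hypothetical helper and the data are random.

```python
import numpy as np

def pca_scores(x, ndims=3, whiten=False):
    """Return PCA scores, optionally rescaled to unit variance per component."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    if whiten:
        # Dividing the scores u * s by their standard deviation s / sqrt(n - 1)
        # leaves u * sqrt(n - 1).
        return u[:, :ndims] * np.sqrt(len(x) - 1)
    return u[:, :ndims] * s[:ndims]

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 100)) * np.linspace(1, 5, 100)  # hypothetical data
plain = pca_scores(data)
whitened = pca_scores(data, whiten=True)
print('Plain std:    ', plain.std(axis=0, ddof=1))
print('Whitened std: ', whitened.std(axis=0, ddof=1))
```

Without whitening, the leading components have the largest spread; with whitening, every component has a standard deviation of one.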

In [102]:
reduced_params

[array([[  5.51344490e+00,  -1.02904403e+00,  -1.17412424e+00, ...,
           4.23884600e-01,  -1.36983907e+00,  -8.12435269e-01],
        [  2.35665941e+00,   4.76660300e-03,   7.59431601e-01, ...,
          -4.45827156e-01,   9.79062736e-01,   2.11420918e+00],
        [  1.11892247e+00,   1.23503411e+00,   1.43317711e+00, ...,
           5.63699268e-02,   4.23156530e-01,   1.87854576e+00],
        ..., 
        [  3.51596212e+00,   2.67474270e+00,   7.75908291e-01, ...,
          -1.43955007e-01,   1.26324189e+00,  -9.26563621e-01],
        [  3.28874540e+00,   2.45745897e+00,   1.19165909e+00, ...,
          -1.00185037e+00,  -1.08509517e+00,   9.18616533e-01],
        [  2.57137346e+00,   2.52117801e+00,   1.80269670e+00, ...,
          -1.45784855e+00,  -4.17080522e-01,  -9.61257398e-01]], dtype=float32),
 array([[ 4.46609974,  0.45592675, -0.90053737, ..., -0.3476693 ,
          0.85718226, -0.1351511 ],
        [ 1.87477517,  0.62777382,  0.44412535, ...,  1.55562294,
         