# Normalization

The `normalize` function z-scores (Z-transforms) your data over the columns or rows of an array or a list of arrays.

By default, hypertools z-scores each column *across* all of the lists passed, using that column's mean and standard deviation pooled over every list. It also offers the option to normalize columns *within* each individual list, or to normalize each row instead.

The function returns an array or list of arrays where the columns or rows are z-scored (output type same as input type).
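To make the three modes concrete, here is a minimal NumPy sketch of what each one computes. `zscore_sketch` is a hypothetical helper written for illustration, not hypertools' source; it assumes the population standard deviation (NumPy's default `ddof=0`) and accepts only a list of arrays.

```python
import numpy as np


def zscore_sketch(data, normalize="across"):
    # Hypothetical helper, not part of hypertools. Assumes the population
    # standard deviation (np.std default, ddof=0) and a list-of-arrays input;
    # unlike hyp.normalize it always returns a list.
    arrays = [np.asarray(x, dtype=float) for x in data]
    if normalize == "across":
        # Pool every list, then apply one mean/std per column to all arrays
        stacked = np.vstack(arrays)
        mu, sd = stacked.mean(axis=0), stacked.std(axis=0)
        return [(x - mu) / sd for x in arrays]
    if normalize == "within":
        # Each array is centred and scaled by its own column means/stds
        return [(x - x.mean(axis=0)) / x.std(axis=0) for x in arrays]
    if normalize == "row":
        # Each row is centred and scaled by its own mean/std
        return [
            (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
            for x in arrays
        ]
    raise ValueError(f"unknown normalize option: {normalize!r}")


rng = np.random.default_rng(0)
a = rng.normal(0, 1, size=(100, 3))   # list 1, centred at 0
b = rng.normal(10, 1, size=(100, 3))  # list 2, centred at 10

across = zscore_sketch([a, b], "across")
within = zscore_sketch([a, b], "within")
rows = zscore_sketch([a, b], "row")

# 'across': pooled columns have mean 0, but each list keeps its offset
print(np.vstack(across).mean(axis=0).round(3), across[0].mean(axis=0).round(2))
# 'within': every list's own columns have mean 0
print(within[1].mean(axis=0).round(3))
```

The only difference between `'across'` and `'within'` is which rows contribute to the column statistics; `'row'` switches the axis instead.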

This feature is especially useful for data reduction and machine learning techniques that are sensitive to scaling differences between features.
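As an illustration of that scaling sensitivity, the sketch below uses plain NumPy (no hypertools) on hypothetical data: two independent features, one on a scale 100 times larger than the other. PCA, computed here via an SVD of the centred data, attributes almost all of the variance to the large-scale feature until the columns are z-scored.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: two independent features, the second on a 100x larger scale
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 100, 500)])


def pca_summary(X):
    """First principal axis and its share of total variance, via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[0], explained[0]


raw_axis, raw_share = pca_summary(X)
z_axis, z_share = pca_summary((X - X.mean(axis=0)) / X.std(axis=0))

# Raw: the first axis is essentially the large-scale feature alone
print(np.abs(raw_axis).round(3), round(raw_share, 4))
# Z-scored: both features contribute comparable variance
print(round(z_share, 3))
```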

## Import Packages

```python
import hypertools as hyp
import numpy as np
```

## Generate Synthetic Data

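```python
# 100 samples from a 3D standard normal centred at the origin
cluster1 = np.random.multivariate_normal(np.zeros(3), np.eye(3), size=100)
# 100 samples with the same covariance, centred at (10, 10, 10)
cluster2 = np.random.multivariate_normal(np.zeros(3) + 10, np.eye(3), size=100)

data = [cluster1, cluster2]
```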

## Normalizing (Column Default)

Pass the raw data to `hyp.normalize` to z-score each column across all lists.

```python
hyp.normalize(data)
```

```
[array([[-1.0596598 , -0.82324679, -1.02435295],
        [-0.85707316, -1.32910039, -1.02841027],
        [-0.7476356 , -1.14074642, -1.1830086 ],
        [-1.0835362 , -1.12927051, -0.91428867],
        [-1.03366927, -1.12735244, -0.82135633],
        [-1.0712721 , -0.76427735, -1.33276169],
        [-0.97489473, -1.47180081, -0.73680799],
        [-1.1599593 , -0.9915049 , -0.96914253],
        [-0.96885698, -1.47054313, -0.56438126],
        [-1.19608322, -0.82935202, -1.14319611],
        [-0.95625906, -1.31356653, -0.91983784],
        [-0.84344611, -0.791575  , -0.92011251],
        [-1.20994237, -0.50074137, -1.07482079],
        [-0.95072833, -0.90804156, -0.97581802],
        [-0.64061124, -1.10792348, -1.16009756],
        [-1.2422604 , -1.02417097, -0.67356059],
        [-1.0468053 , -1.01063129, -1.19287608],
        [-1.03566015, -1.06228817, -0.73566981],
        [-0.81074559, -0.90001251, -1.01198538],
        [-1.12028093, -0.91217794, -0.81029196],
        [-0.79985973, ...
```
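Why do the `cluster1` values sit near -1? A back-of-the-envelope check, assuming the population standard deviation: pooled over both equally sized clusters, each column is a 50/50 mixture of unit-variance Gaussians centred at 0 and 10, so its variance is the within-cluster variance plus the variance of the cluster means.

$$
\mu = \tfrac{1}{2}(0 + 10) = 5, \qquad
\sigma^2 = 1 + \tfrac{1}{2}(0 - 5)^2 + \tfrac{1}{2}(10 - 5)^2 = 26, \qquad
z_{\text{cluster1 centre}} = \frac{0 - 5}{\sqrt{26}} \approx -0.98
$$

In expectation, `cluster1` points therefore scatter around -0.98 with a spread of about $1/\sqrt{26} \approx 0.2$, and `cluster2` mirrors them around +0.98.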

## Normalizing (Specified Columns or Rows)

To choose a different normalization, pass one of the following strings as the `normalize` argument, as shown in the examples below.

+ 'across' - columns z-scored across all passed lists (default)
+ 'within' - columns z-scored within each passed list
+ 'row' - each row z-scored

Passing `'across'` explicitly gives the same result as the default:

```python
hyp.normalize(data, normalize='across')
```

```
[array([[-1.0596598 , -0.82324679, -1.02435295],
        [-0.85707316, -1.32910039, -1.02841027],
        [-0.7476356 , -1.14074642, -1.1830086 ],
        [-1.0835362 , -1.12927051, -0.91428867],
        [-1.03366927, -1.12735244, -0.82135633],
        [-1.0712721 , -0.76427735, -1.33276169],
        [-0.97489473, -1.47180081, -0.73680799],
        [-1.1599593 , -0.9915049 , -0.96914253],
        [-0.96885698, -1.47054313, -0.56438126],
        [-1.19608322, -0.82935202, -1.14319611],
        [-0.95625906, -1.31356653, -0.91983784],
        [-0.84344611, -0.791575  , -0.92011251],
        [-1.20994237, -0.50074137, -1.07482079],
        [-0.95072833, -0.90804156, -0.97581802],
        [-0.64061124, -1.10792348, -1.16009756],
        [-1.2422604 , -1.02417097, -0.67356059],
        [-1.0468053 , -1.01063129, -1.19287608],
        [-1.03566015, -1.06228817, -0.73566981],
        [-0.81074559, -0.90001251, -1.01198538],
        [-1.12028093, -0.91217794, -0.81029196],
        [-0.79985973, ...
```

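With `'within'`, each list's columns are z-scored using that list's own means and standard deviations:

```python
hyp.normalize(data, normalize='within')
```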

```
[array([[-0.44790918,  0.7842488 , -0.24265055],
        [ 0.73640722, -1.74688136, -0.26593402],
        [ 1.37617646, -0.80441816, -1.15311801],
        [-0.58749002, -0.74699633,  0.38896864],
        [-0.29596916, -0.73739895,  0.9222738 ],
        [-0.51579439,  1.07931305, -2.01249694],
        [ 0.04762529, -2.46090877,  1.40746622],
        [-1.03425752, -0.05766115,  0.07418216],
        [ 0.08292182, -2.4546157 ,  2.39696099],
        [-1.24543707,  0.75370014, -0.92464852],
        [ 0.15656897, -1.66915485,  0.35712396],
        [ 0.8160706 ,  0.9427243 ,  0.35554773],
        [-1.32645727,  2.39796306, -0.53226724],
        [ 0.18890147,  0.35996276,  0.03587389],
        [ 2.00183819, -0.64018261, -1.02163987],
        [-1.51538768, -0.22111177,  1.77042023],
        [-0.37276206, -0.15336349, -1.20974395],
        [-0.30760783, -0.41183807,  1.41399781],
        [ 1.00723703,  0.40013757, -0.17167752],
        [-0.80229878,  0.33926565,  0.98576822],
        [ 1.07087548, ...
```
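One practical consequence is that `'within'` removes any offset between lists. The NumPy sketch below (independent of hypertools, assuming the population standard deviation) regenerates two clusters like the ones above and compares the distance between their centroids after pooled (`'across'`-style) versus per-list (`'within'`-style) z-scoring.

```python
import numpy as np

rng = np.random.default_rng(2)
c1 = rng.multivariate_normal(np.zeros(3), np.eye(3), size=100)
c2 = rng.multivariate_normal(np.zeros(3) + 10, np.eye(3), size=100)


def centroid_gap(a, b):
    """Euclidean distance between the two cluster means."""
    return np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))


# 'across'-style: one pooled mean/std per column
pooled = np.vstack([c1, c2])
mu, sd = pooled.mean(axis=0), pooled.std(axis=0)
gap_across = centroid_gap((c1 - mu) / sd, (c2 - mu) / sd)

# 'within'-style: each cluster uses its own column mean/std
gap_within = centroid_gap(
    (c1 - c1.mean(axis=0)) / c1.std(axis=0),
    (c2 - c2.mean(axis=0)) / c2.std(axis=0),
)

print(round(gap_across, 2), round(gap_within, 2))
```

With `'within'`, the two clusters land on top of each other, so it suits cases where differences in mean between lists are not what you want to see.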

With `'row'`, each row is z-scored across its features:

```python
hyp.normalize(data, normalize='row')
```

```
[array([[ -1.12812249e+00,   1.30264417e+00,  -1.74521677e-01],
        [  9.77334531e-01,  -1.37388640e+00,   3.96551865e-01],
        [  1.41097972e+00,  -7.88267711e-01,  -6.22712009e-01],
        [ -4.18558929e-01,  -9.60595023e-01,   1.37915395e+00],
        [ -3.09567176e-01,  -1.04025872e+00,   1.34982590e+00],
        [ -1.38453552e-01,   1.28808811e+00,  -1.14963456e+00],
        [  2.38133176e-01,  -1.32632358e+00,   1.08819040e+00],
        [ -1.31190393e+00,   1.98588806e-01,   1.11331512e+00],
        [  6.65053182e-02,  -1.25664253e+00,   1.19013721e+00],
        [ -1.03887405e+00,   1.35043373e+00,  -3.11559684e-01],
        [  4.88069033e-01,  -1.39353070e+00,   9.05461662e-01],
        [ -3.89801018e-02,   1.24376960e+00,  -1.20478950e+00],
        [ -1.00215657e+00,   1.36523201e+00,  -3.63075441e-01],
        [ -1.41415310e+00,   6.95751420e-01,   7.18401679e-01],
        [  1.41322724e+00,  -7.52347396e-01,  -6.60879843e-01],
        [ -1.06128499e+00, ...
```
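A quick sanity check of the `'row'` output: within each row, the three values should have mean 0 and standard deviation 1, which for three columns means their squares sum to 3 when the population standard deviation is used. The sketch below checks this on the first three rows printed above; that hypertools uses NumPy's default `ddof=0` here is an inference from these numbers, not something this tutorial states.

```python
import numpy as np

# First three rows copied from the 'row' output above
z = np.array([
    [-1.12812249e+00, 1.30264417e+00, -1.74521677e-01],
    [9.77334531e-01, -1.37388640e+00, 3.96551865e-01],
    [1.41097972e+00, -7.88267711e-01, -6.22712009e-01],
])

print(z.mean(axis=1).round(6))          # each row averages to ~0
print((z ** 2).sum(axis=1).round(4))    # ~3, the number of columns
print(z.std(axis=1).round(4))           # so the population std is ~1
```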