# Normalization

## Definition  
- Maps the original data into a fixed interval (by default [0,1]) by applying a linear transformation to each feature

## Formula  
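$ X^{\prime}=\frac{x-\min }{\max -\min } $  
$ X^{\prime \prime}=X^{\prime} \cdot (mx-mi)+mi $

Here $\min$ and $\max$ are the minimum and maximum of each column (feature), and $mx$ and $mi$ are the upper and lower bounds of the target interval (by default $mx=1$ and $mi=0$). $X^{\prime \prime}$ is the final result.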
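
The two formulas can be traced by hand with NumPy. This is only a minimal sketch: the array `X` is made-up toy data, and the names `col_min`, `col_max`, `X_prime` and `X_scaled` are illustrative rather than taken from the notebook.

```python
import numpy as np

# Toy data, made up for illustration: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

mi, mx = 0.0, 1.0  # lower and upper bound of the target interval (default [0, 1])

col_min = X.min(axis=0)  # per-feature minimum
col_max = X.max(axis=0)  # per-feature maximum

X_prime = (X - col_min) / (col_max - col_min)  # first formula: rescale to [0, 1]
X_scaled = X_prime * (mx - mi) + mi            # second formula: map to [mi, mx]

print(X_scaled)
```

With the default bounds the second step leaves `X_prime` unchanged; it only matters when a different target interval is requested.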

## API  
- sklearn.preprocessing.MinMaxScaler (feature_range=(0,1)…)
  - MinMaxScaler.fit_transform(X)
    - X: data as a NumPy array of shape [n_samples,n_features]
    - Return value: the transformed array, with the same shape as X
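
As a small illustration of the `feature_range` parameter, the sketch below scales made-up toy data into [-1, 1] instead of the default [0, 1]. The toy array and the chosen range are assumptions for the example, not part of the dating.txt walkthrough.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data, made up for illustration: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 40.0]])

# Ask for the interval [-1, 1] rather than the default [0, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

print(X_scaled.shape)                      # same shape as X
print(scaler.data_min_, scaler.data_max_)  # per-feature min and max seen during fit
print(X_scaled)
```

Each column's smallest value lands on the lower bound and its largest value on the upper bound, whatever the original units were.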

## Data calculation

- We perform the calculations on the following data, saved in dating.txt

In [3]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler



In [34]:
def minmax_demo():
    """
    Demonstrate normalization
    :return: None
    """
    # load the dataset
    
    data = pd.read_csv("./data/dating.txt")
    print("original data: ")
    print(data)
    print()
    
    # 1. instantiate a MinMaxScaler
    transfer = MinMaxScaler()
    
    # 2. fit and transform the three feature columns with fit_transform
    data = transfer.fit_transform(data[["milage", "Liters", "Consumtime"]])
    print("After the normalization:")
    print(data)
    
    return None

In [35]:
minmax_demo()

original data: 
     milage     Liters  Consumtime  target
0     40920   8.326976    0.953952       3
1     14488   7.153469    1.673904       2
2     26052   1.441871    0.805124       1
3     75136  13.147394    0.428964       1
4     38344   1.669788    0.134296       1
..      ...        ...         ...     ...
995   11145   3.410627    0.631838       2
996   68846   9.974715    0.669787       1
997   26575  10.650102    0.866627       3
998   48111   9.134528    0.728045       3
999   43757   7.882601    1.332446       3

[1000 rows x 4 columns]

After the normalization:
[[0.44832535 0.39805139 0.56233353]
 [0.15873259 0.34195467 0.98724416]
 [0.28542943 0.06892523 0.47449629]
 ...
 [0.29115949 0.50910294 0.51079493]
 [0.52711097 0.43665451 0.4290048 ]
 [0.47940793 0.3768091  0.78571804]]


# Standardization

## Definition

- Transforms the original data so that each feature has a mean of 0 and a standard deviation of 1

## Formula

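$ X^{\prime}=\frac{x-\text { mean }}{\sigma} $

Here mean is the mean of each column (feature) and $\sigma$ is the standard deviation of that column.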
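
The formula can be applied directly with NumPy. This is a minimal sketch on made-up toy data; it uses the population standard deviation (`ddof=0`, NumPy's default), which is also the convention StandardScaler follows.

```python
import numpy as np

# Toy data, made up for illustration: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

mean = X.mean(axis=0)   # per-feature mean
sigma = X.std(axis=0)   # per-feature population standard deviation (ddof=0)

X_std = (X - mean) / sigma  # the standardization formula, column by column

print(X_std)
print(X_std.mean(axis=0))  # close to 0 for every feature
print(X_std.std(axis=0))   # close to 1 for every feature
```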

## API

sklearn.preprocessing.StandardScaler()  
After processing, each column is centered on a mean of 0 and has a standard deviation of 1.  
StandardScaler.fit_transform(X)  
X: data as a NumPy array of shape [n_samples,n_features]  
Return value: the transformed array, with the same shape as X
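
A short sketch of the call on made-up toy data, showing the fitted attributes `mean_`, `var_` and `scale_`. The toy array is an assumption for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration: 3 samples, 2 features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(X_std.shape)    # same shape as X
print(scaler.mean_)   # per-feature mean learned during fit
print(scaler.var_)    # per-feature population variance (ddof=0)
print(scaler.scale_)  # per-feature standard deviation, i.e. sqrt(var_)
```

Note that `var_` holds the variance, not the standard deviation; the divisor in the formula is `scale_`.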

## Data calculation

In [25]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

In [32]:
def stand_demo():
    """
    Demonstrate standardization
    :return: None
    """
    # load the dataset
    
    data = pd.read_csv("./data/dating.txt")
    print("original data: ")
    print(data)
    print()
    # 1. Instantiate StandardScaler
    transfer = StandardScaler()
    
    # 2. Fit and transform the three feature columns with fit_transform
    data = transfer.fit_transform(data[["milage", "Liters", "Consumtime"]])
    print("After the standardization:")
    print(data)
    print("the mean of every feature")
    print(transfer.mean_)
    print("the variance of every feature")
    print(transfer.var_)
    
    return None

In [33]:
stand_demo()

original data: 
     milage     Liters  Consumtime  target
0     40920   8.326976    0.953952       3
1     14488   7.153469    1.673904       2
2     26052   1.441871    0.805124       1
3     75136  13.147394    0.428964       1
4     38344   1.669788    0.134296       1
..      ...        ...         ...     ...
995   11145   3.410627    0.631838       2
996   68846   9.974715    0.669787       1
997   26575  10.650102    0.866627       3
998   48111   9.134528    0.728045       3
999   43757   7.882601    1.332446       3

[1000 rows x 4 columns]

After the standardization:
[[ 0.33193158  0.41660188  0.24523407]
 [-0.87247784  0.13992897  1.69385734]
 [-0.34554872 -1.20667094 -0.05422437]
 ...
 [-0.32171752  0.96431572  0.06952649]
 [ 0.65959911  0.60699509 -0.20931587]
 [ 0.46120328  0.31183342  1.00680598]]
the mean of every feature
[3.36354210e+04 6.55996083e+00 8.32072997e-01]
the variance of every feature
[4.81628039e+08 1.79902874e+01 2.46999554e-01]
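
As a sanity check on the printed output, the first standardized row can be reproduced by hand from the printed mean and variance. This is only a sketch: the numbers are copied from the output above (so they carry its rounding), and the variable names are illustrative.

```python
import numpy as np

# First row of the original data (milage, Liters, Consumtime), as printed above.
row0 = np.array([40920.0, 8.326976, 0.953952])

# Per-feature mean and variance, as printed by transfer.mean_ and transfer.var_.
mean = np.array([3.36354210e+04, 6.55996083e+00, 8.32072997e-01])
var = np.array([4.81628039e+08, 1.79902874e+01, 2.46999554e-01])

# Standardization formula: subtract the mean, divide by the standard deviation.
z = (row0 - mean) / np.sqrt(var)

print(z)  # should agree with the first row of the standardized output up to rounding
```

The standard deviation is the square root of the printed variance, which is why `np.sqrt(var)` appears in the denominator.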
