# **Data Normalization in Data Mining**
Normalization rescales the values of an attribute so that they fall within a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It helps certain machine learning algorithms avoid bias in their predictions. Many machine learning algorithms require the input attributes to be scaled, because the cost function used to optimize the weights/parameters is otherwise influenced by values on differing scales. **It is not a prerequisite for all algorithms, though, and should only be used when required. Currently, there is a fashion in the machine learning world to do it as a matter of course. Any transformation approach will reduce the amount of information arising from the variable in question**.

Situations where you may want to standardize are:

* The variables are measuring different physical quantities

* Numeric values are on vastly different scales of magnitude

* There is no evidence that variables with high variation should be considered more important.

Situations where you may not want to standardize:

* The variables measure the same physical quantity and are roughly the same magnitude.

* The variables do not change between samples. You should not standardize these, and it may be worthwhile excluding them.

* If you have such physically related variables, the measurement noise may be roughly the same for all of them while the signal intensity varies much more, i.e. variables with low values have higher relative noise. Standardizing would blow up that noise. In other words, you may have to decide whether you want relative or absolute noise to be standardized.
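
To illustrate why differing scales of magnitude matter, the sketch below compares Euclidean distances before and after rescaling. The data (height in metres, income in dollars) and the helper `euclidean` are hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical samples: column 0 is height in metres, column 1 is income in dollars.
X = np.array([[1.60, 30_000.0],
              [1.90, 30_500.0],
              [1.62, 60_000.0]])

def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances from sample 0: the income column dominates completely.
d_raw_01 = euclidean(X[0], X[1])   # about 500
d_raw_02 = euclidean(X[0], X[2])   # about 30000

# Rescale each column to the range [0, 1] before measuring distance.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
d_scaled_01 = euclidean(X_scaled[0], X_scaled[1])
d_scaled_02 = euclidean(X_scaled[0], X_scaled[2])

print(d_raw_01, d_raw_02)
print(d_scaled_01, d_scaled_02)
```

On the raw data the large height difference between samples 0 and 1 is invisible next to the income difference. After rescaling, both attributes contribute on a comparable footing.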


# **Min-Max Normalization**
Min-Max normalization uses the minimum and maximum values of a series to convert it to a series of values between 0 and 1.

A Min-Max scaling is typically done using the following equation:

>> $X^{'}=\frac{X-X_{min}}{X_{max}-X_{min}}$






The code below demonstrates Min-Max normalisation using both pandas and numpy.


In [None]:
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=(range(6)))
s2 = pd.Series([10, 9, 8, 7, 6, 5], index=(range(6)))
df = pd.DataFrame(s1, columns=['s1'])
df['s2'] = s2
df

In [None]:
from mlxtend.preprocessing import minmax_scaling

minmax_scaling(df, columns=['s1', 's2'])

In [None]:
import numpy as np

X = np.array([[1, 10], [2, 9], [3, 8],
              [4, 7], [5, 6], [6, 5]])
X

In [None]:
from mlxtend.preprocessing import minmax_scaling

minmax_scaling(X, columns=[0, 1])
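
As a cross-check, the Min-Max formula can also be applied directly, without `mlxtend`. This is a minimal sketch that assumes every column has at least two distinct values; a constant column would lead to a division by zero.

```python
import numpy as np
import pandas as pd

X = np.array([[1, 10], [2, 9], [3, 8],
              [4, 7], [5, 6], [6, 5]], dtype=float)

# NumPy: apply (X - X_min) / (X_max - X_min) column by column.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# pandas: the same arithmetic, aligned on column names.
df = pd.DataFrame(X, columns=['s1', 's2'])
df_scaled = (df - df.min()) / (df.max() - df.min())

print(X_scaled)
print(df_scaled)
```

Both versions should agree with each other, with the first column running from 0 to 1 and the second from 1 to 0.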

# **Z-Score Normalisation**

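Z-score normalization rescales a variable so that it has a mean of 0 and a standard deviation of 1. It does not change the shape of the distribution, so the result is only standard normal if the original variable was normally distributed. It uses the following formula: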

>> $X^{'}=\frac{X-\bar{X}}{\sigma}$

where $\bar{X}$ is the series mean and $\sigma$ is its standard deviation. Note that `scipy.stats.zscore`, used below, defaults to the population standard deviation (`ddof=0`); pass `ddof=1` for the sample standard deviation.
One benefit of this technique is that outliers will have less impact than with the other two techniques. It also centers the attribute, so the interpretation of the estimated weights/parameters will change in your analysis.

The following code shows how we can do this.


In [None]:
from scipy import stats
import numpy as np

b = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
# Z-scores rounded to two decimals (population standard deviation, ddof=0)
np.round(stats.zscore(b), 2)
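
The formula can also be written out by hand, which makes the choice of standard deviation explicit. The sketch below computes the z-scores with both the population (`ddof=0`) and the sample (`ddof=1`) standard deviation; the names `z_pop` and `z_sample` are illustrative.

```python
import numpy as np
from scipy import stats

b = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

# Population standard deviation (divide by n): matches the scipy default.
z_pop = (b - b.mean()) / b.std(ddof=0)

# Sample standard deviation (divide by n - 1).
z_sample = (b - b.mean()) / b.std(ddof=1)

print(np.round(z_pop, 2))
print(np.round(z_sample, 2))
print(np.allclose(z_pop, stats.zscore(b)))
print(np.allclose(z_sample, stats.zscore(b, ddof=1)))
```

For small series such as this one the two versions differ noticeably, so it is worth stating which one an analysis uses.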


# **Normalization by Decimal Scaling**

In normalization by decimal scaling, every value is divided by 10 to the power of $j$, where $j$ is the smallest integer such that the largest absolute value in the dataset becomes less than 1. It can be described as follows:

>> $X^{'}=\frac{X}{10^j}$ where $\max |X^{'}| < 1$

In [None]:

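import math
import numpy as np
import pandas as pd

X = pd.Series([11, 2.4, -100])
# Smallest j such that max(|X|) / 10**j < 1. Use floor rather than round:
# round() would overshoot for values such as 50 (giving j = 3 instead of 2).
j = math.floor(math.log10(max(np.abs(X)))) + 1
print(j)
# Divide every value by 10**j
X_new = X.apply(lambda num: num / (10 ** j))
print(X_new)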
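
The same idea can be wrapped in a small reusable function. This is a sketch only: the name `decimal_scale` is hypothetical, and leaving data untouched when all absolute values are already below 1 (or all zero) is an assumption made here, not part of the definition above.

```python
import math
import numpy as np

def decimal_scale(values):
    """Return (scaled_values, j) for normalization by decimal scaling."""
    arr = np.asarray(values, dtype=float)
    max_abs = np.max(np.abs(arr))
    if max_abs == 0:
        # All zeros: nothing to scale (assumption of this sketch).
        return arr, 0
    # Smallest j with max_abs / 10**j < 1; never scale up (assumption).
    j = max(math.floor(math.log10(max_abs)) + 1, 0)
    return arr / 10 ** j, j

scaled, j = decimal_scale([11, 2.4, -100])
print(j)
print(scaled)

# A maximum absolute value of 50 only needs j = 2.
scaled_50, j_50 = decimal_scale([50, -7])
print(j_50)
print(scaled_50)
```

Because the largest absolute value always ends up below 1, every scaled value lies strictly between -1 and 1.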

# **Review**
Scaling or standardization should only be done when it is absolutely necessary. If the algorithm you use requires it, then go ahead. For example, if you are using a neural network then you will have to do it. However, if you are using a linear regression model then there is no need to do it. In fact, it can cause quite a lot of confusion when you are trying to interpret parameter estimates from interaction variables (Preacher, 2003).


Can you point out a major weakness of these scaling processes? Which techniques will be less affected by outliers?