>In statistics and machine learning, data standardization converts each value to a z-score using the mean and standard deviation of the data.<br>
>One of the first steps in feature engineering for many machine learning models is ensuring that the data is scaled properly.<br>
Some models, such as linear regression (particularly when regularized or trained with gradient descent), KNN, and SVM, are heavily affected by features on different scales.<br>
Others, such as decision trees and the bagging and boosting ensembles built on them, generally do not require any data scaling.
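
To see why distance-based models such as KNN are sensitive to scale, here is a minimal sketch. The three customers and the assumed 0 to 350 range for day minutes are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical customers described by [day_minutes, intl_plan_flag]
a = np.array([265.1, 0.0])
b = np.array([161.6, 0.0])  # same plan as a, very different usage
c = np.array([265.1, 1.0])  # same usage as a, different plan


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# On raw data the minutes column dominates: the plan flag barely registers.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Min-max scale each feature to [0, 1] using assumed (hypothetical) ranges.
feature_min = np.array([0.0, 0.0])
feature_max = np.array([350.0, 1.0])


def min_max(v):
    return (v - feature_min) / (feature_max - feature_min)


scaled_ab = euclidean(min_max(a), min_max(b))
scaled_ac = euclidean(min_max(a), min_max(c))

print(f"raw:    d(a,b)={raw_ab:.3f}  d(a,c)={raw_ac:.3f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

Before scaling, `b` looks far from `a` and `c` looks almost identical to it. After scaling, the ordering flips, which would change which neighbours a KNN model picks.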

Two of the most popular feature scaling techniques are:
1. Z-Score Standardization
2. Min-Max Normalization

![image-3.png](attachment:image-3.png) ![image-4.png](attachment:image-4.png)
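
For reference, the two techniques are conventionally defined as follows. Here $x$ is a single value of a feature, $\mu$ and $\sigma$ are that feature's mean and standard deviation, and $x_{\min}$ and $x_{\max}$ are its smallest and largest values.

$$
z = \frac{x - \mu}{\sigma}
$$

$$
x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

Z-scores are centred on 0 with unit standard deviation and are unbounded, while min-max scaled values lie in the range $[0, 1]$.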

In [16]:
import pandas as pd 
df = pd.read_csv(r"C:\Users\visha\OneDrive\Scaler Academy\Datasets\churn_logistic.csv")
df.head()

Unnamed: 0,Account Length,VMail Message,Day Mins,Eve Mins,Night Mins,Intl Mins,CustServ Calls,Intl Plan,VMail Plan,Day Calls,...,Eve Calls,Eve Charge,Night Calls,Night Charge,Intl Calls,Intl Charge,State,Area Code,Phone,Churn
0,128,25,265.1,197.4,244.7,10.0,1,0,1,110,...,99,16.78,91,11.01,3,2.7,KS,415,382-4657,0
1,107,26,161.6,195.5,254.4,13.7,1,0,1,123,...,103,16.62,103,11.45,3,3.7,OH,415,371-7191,0
2,137,0,243.4,121.2,162.6,12.2,0,0,0,114,...,110,10.3,104,7.32,5,3.29,NJ,415,358-1921,0
3,84,0,299.4,61.9,196.9,6.6,2,1,0,71,...,88,5.26,89,8.86,7,1.78,OH,408,375-9999,0
4,75,0,166.7,148.3,186.9,10.1,3,1,0,113,...,122,12.61,121,8.41,3,2.73,OK,415,330-6626,0


In [17]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
## Data preparation: standardization and min-max scaling
# Here we use min-max scaling
mm = MinMaxScaler()

In [18]:
# Try to scale every column; this fails because some columns contain strings
cols = df.columns
df[cols] = mm.fit_transform(df[cols])

ValueError: could not convert string to float: 'KS'
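
The error above comes from passing string columns to the scaler. One possible way to avoid it, sketched here on a tiny hypothetical frame (`demo` is not the churn dataset), is to select only the numeric columns with `select_dtypes` before scaling.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny hypothetical frame mixing numeric and string columns
demo = pd.DataFrame({
    "Day Mins": [265.1, 161.6, 243.4],
    "CustServ Calls": [1, 1, 0],
    "State": ["KS", "OH", "NJ"],
})

# Keep only the columns with a numeric dtype
numeric_cols = demo.select_dtypes(include="number").columns

# Scale the numeric columns and leave the string column untouched
scaled = demo.copy()
scaled[numeric_cols] = MinMaxScaler().fit_transform(demo[numeric_cols])

print(scaled)
```

Note that a numeric dtype does not make a column meaningful to scale: an identifier stored as an integer would still pass this filter, so dropping such columns by name remains a reasonable choice.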

In [19]:
# Drop the State, Area Code, and Phone columns; they are not useful for prediction
df = df.drop(['State', 'Area Code', 'Phone'], axis=1)

In [31]:
mm_scale_df = df.copy()
z_scale_df = df.copy()

In [28]:
cols = df.columns
mm_scale_df[cols] = mm.fit_transform(mm_scale_df[cols])
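
The cell above fits the scaler on the full dataset, which is fine for exploring the transformation. When the scaled data feeds a model, a common practice is to fit the scaler on the training split only and reuse it on the test split. The sketch below uses randomly generated data (`X`, `y` are hypothetical, not the churn data).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 100 rows, 2 features, binary target
rng = np.random.default_rng(0)
X = rng.uniform(0, 300, size=(100, 2))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scaler = MinMaxScaler()
# Learn min and max from the training rows only
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same min and max to the test rows (no refitting)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

Because the test rows are scaled with the training minimum and maximum, a few test values may fall slightly outside the 0 to 1 range. The target column is left unscaled here.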

In [29]:
## Verify scaled data 
mm_scale_df.head()

Unnamed: 0,Account Length,VMail Message,Day Mins,Eve Mins,Night Mins,Intl Mins,CustServ Calls,Intl Plan,VMail Plan,Day Calls,Day Charge,Eve Calls,Eve Charge,Night Calls,Night Charge,Intl Calls,Intl Charge,Churn
0,0.524793,0.490196,0.755701,0.542755,0.59575,0.5,0.111111,0.0,1.0,0.666667,0.755701,0.582353,0.542866,0.408451,0.595935,0.15,0.5,0.0
1,0.438017,0.509804,0.460661,0.537531,0.62184,0.685,0.111111,0.0,1.0,0.745455,0.460597,0.605882,0.53769,0.492958,0.622236,0.15,0.685185,0.0
2,0.561983,0.0,0.693843,0.333242,0.374933,0.61,0.0,0.0,0.0,0.690909,0.69383,0.647059,0.333225,0.5,0.375374,0.25,0.609259,0.0
3,0.342975,0.0,0.853478,0.170195,0.467187,0.33,0.222222,1.0,0.0,0.430303,0.853454,0.517647,0.170171,0.394366,0.467424,0.35,0.32963,0.0
4,0.305785,0.0,0.4752,0.407754,0.44029,0.505,0.333333,1.0,0.0,0.684848,0.475184,0.717647,0.407959,0.619718,0.440526,0.15,0.505556,0.0
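
As a sanity check on what `MinMaxScaler` does, the sketch below applies the min-max formula by hand and compares it with the scaler's output. It uses only the five `Account Length` values shown in `df.head()`, so the numbers differ from the table above, where the minimum and maximum come from the full column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# First five Account Length values from the preview of the dataset
x = np.array([128.0, 107.0, 137.0, 84.0, 75.0])

# Min-max formula applied by hand
manual = (x - x.min()) / (x.max() - x.min())

# Same transformation via scikit-learn (expects a 2-D array)
sk = MinMaxScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(manual)
print(np.allclose(manual, sk))
```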


In [30]:
df.head()

Unnamed: 0,Account Length,VMail Message,Day Mins,Eve Mins,Night Mins,Intl Mins,CustServ Calls,Intl Plan,VMail Plan,Day Calls,Day Charge,Eve Calls,Eve Charge,Night Calls,Night Charge,Intl Calls,Intl Charge,Churn
0,128,25,265.1,197.4,244.7,10.0,1,0,1,110,45.07,99,16.78,91,11.01,3,2.7,0
1,107,26,161.6,195.5,254.4,13.7,1,0,1,123,27.47,103,16.62,103,11.45,3,3.7,0
2,137,0,243.4,121.2,162.6,12.2,0,0,0,114,41.38,110,10.3,104,7.32,5,3.29,0
3,84,0,299.4,61.9,196.9,6.6,2,1,0,71,50.9,88,5.26,89,8.86,7,1.78,0
4,75,0,166.7,148.3,186.9,10.1,3,1,0,113,28.34,122,12.61,121,8.41,3,2.73,0


In [33]:
z_scale_df.head()

Unnamed: 0,Account Length,VMail Message,Day Mins,Eve Mins,Night Mins,Intl Mins,CustServ Calls,Intl Plan,VMail Plan,Day Calls,Day Charge,Eve Calls,Eve Charge,Night Calls,Night Charge,Intl Calls,Intl Charge,Churn
0,128,25,265.1,197.4,244.7,10.0,1,0,1,110,45.07,99,16.78,91,11.01,3,2.7,0
1,107,26,161.6,195.5,254.4,13.7,1,0,1,123,27.47,103,16.62,103,11.45,3,3.7,0
2,137,0,243.4,121.2,162.6,12.2,0,0,0,114,41.38,110,10.3,104,7.32,5,3.29,0
3,84,0,299.4,61.9,196.9,6.6,2,1,0,71,50.9,88,5.26,89,8.86,7,1.78,0
4,75,0,166.7,148.3,186.9,10.1,3,1,0,113,28.34,122,12.61,121,8.41,3,2.73,0


### Implement Z-Score

In [14]:
# Calculate z-scores with SciPy
import scipy.stats as stats
values = [4,5,6,6,6,7,8,12,13,13,14,18]

zscores = stats.zscore(values)
print(zscores)

[-1.2493901  -1.01512945 -0.78086881 -0.78086881 -0.78086881 -0.54660817
 -0.31234752  0.62469505  0.85895569  0.85895569  1.09321633  2.0302589 ]
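
The same numbers can be reproduced by hand, which makes the formula explicit. This sketch assumes the default behaviour of `stats.zscore`, which uses the population standard deviation (`ddof=0`); using the sample standard deviation (`ddof=1`) gives slightly smaller magnitudes.

```python
import numpy as np
import scipy.stats as stats

values = np.array([4, 5, 6, 6, 6, 7, 8, 12, 13, 13, 14, 18], dtype=float)

mu = values.mean()
sigma = values.std()  # NumPy default is ddof=0 (population standard deviation)

# z = (x - mean) / standard deviation
manual = (values - mu) / sigma

# Variant using the sample standard deviation, for comparison
sample = (values - mu) / values.std(ddof=1)

print(np.allclose(manual, stats.zscore(values)))
print(manual[0], sample[0])
```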


In [35]:
# Standardize a single column; the other columns keep their original values
z_scale_df['Account Length'] = stats.zscore(z_scale_df['Account Length'])

In [37]:
z_scale_df

Unnamed: 0,Account Length,VMail Message,Day Mins,Eve Mins,Night Mins,Intl Mins,CustServ Calls,Intl Plan,VMail Plan,Day Calls,Day Charge,Eve Calls,Eve Charge,Night Calls,Night Charge,Intl Calls,Intl Charge,Churn
0,0.672450,25,265.1,197.4,244.7,10.0,1,0,1,110,45.07,99,16.78,91,11.01,3,2.70,0
1,0.141137,26,161.6,195.5,254.4,13.7,1,0,1,123,27.47,103,16.62,103,11.45,3,3.70,0
2,0.900156,0,243.4,121.2,162.6,12.2,0,0,0,114,41.38,110,10.30,104,7.32,5,3.29,0
3,-0.440777,0,299.4,61.9,196.9,6.6,2,1,0,71,50.90,88,5.26,89,8.86,7,1.78,0
4,-0.668482,0,166.7,148.3,186.9,10.1,3,1,0,113,28.34,122,12.61,121,8.41,3,2.73,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5695,3.101309,0,171.5,160.0,212.4,5.0,1,1,0,99,29.16,103,13.60,102,9.56,2,1.35,1
5696,0.748352,0,131.6,179.3,251.2,15.5,1,0,0,95,22.37,109,15.24,129,11.30,3,4.19,1
5697,0.773653,0,291.2,234.2,191.7,8.9,1,0,0,104,49.50,132,19.91,87,8.63,3,2.40,1
5698,-0.035967,0,113.3,197.9,284.5,11.7,4,0,0,96,19.26,89,16.82,93,12.80,2,3.16,1
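
Only `Account Length` was standardized above. To standardize several columns at once, one option is scikit-learn's `StandardScaler`, sketched here on a small hypothetical frame built from the preview rows. Which columns count as continuous (`continuous_cols`) is an assumption made for illustration; the binary flag and the target are left unchanged.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small hypothetical frame based on the first five preview rows
demo = pd.DataFrame({
    "Account Length": [128.0, 107.0, 137.0, 84.0, 75.0],
    "Day Mins": [265.1, 161.6, 243.4, 299.4, 166.7],
    "Intl Plan": [0, 0, 0, 1, 1],
    "Churn": [0, 0, 0, 0, 0],
})

# Assumed choice of columns to standardize
continuous_cols = ["Account Length", "Day Mins"]

z_demo = demo.copy()
z_demo[continuous_cols] = StandardScaler().fit_transform(demo[continuous_cols])

# Each standardized column now has mean 0 and population std 1
print(z_demo[continuous_cols].mean())
print(z_demo[continuous_cols].std(ddof=0))
```

`StandardScaler` divides by the population standard deviation, so its output matches `stats.zscore` with default arguments.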
