# Feature Scaling
## Feature scaling is the process of normalising the range of features in a dataset.
### Feature scaling is important because variables with a larger magnitude or value range can dominate those with a smaller magnitude or value range, for example in distance calculations.
### Gradient descent also tends to converge faster when features are on similar scales.
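To make the magnitude point concrete, here is a small sketch with two hypothetical people described by age and salary. The values are made up for illustration. Without scaling, the Euclidean distance between them is almost entirely determined by salary.

```python
import numpy as np

# Two hypothetical people described by [age in years, salary in currency units].
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])

# Unscaled Euclidean distance: the 35-year age gap barely registers next to
# the 1,000-unit salary gap, because salary has a much larger magnitude.
raw_distance = np.linalg.norm(a - b)

# Share of the squared distance contributed by each feature.
squared_gaps = (a - b) ** 2
salary_share = squared_gaps[1] / squared_gaps.sum()

print(f"distance = {raw_distance:.1f}")
print(f"salary share of squared distance = {salary_share:.4f}")
```

Scaling both features to comparable ranges lets the age difference contribute meaningfully to the distance.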

Some common feature scaling techniques are:

1. Standardisation
2. Mean Normalisation
3. MinMax Scaling
4. MaxAbs Scaling
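As a rough side-by-side sketch, the four techniques can be applied to one hypothetical feature. scikit-learn's `StandardScaler`, `MinMaxScaler` and `MaxAbsScaler` cover three of them; mean normalisation has no dedicated scikit-learn class, so it is computed by hand here using the common definition $(x - \bar{x}) / (x_{max} - x_{min})$.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# A hypothetical single feature as a column vector: shape (n_samples, 1).
x = np.array([[20.0], [30.0], [40.0], [50.0]])

# 1. Standardisation: zero mean, unit variance.
standardised = StandardScaler().fit_transform(x)

# 2. Mean normalisation: centre on the mean, divide by the range.
#    No dedicated scikit-learn class, so it is computed by hand.
mean_normalised = (x - x.mean(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# 3. Min-max scaling: map [min, max] onto [0, 1].
min_max = MinMaxScaler().fit_transform(x)

# 4. Max-abs scaling: divide by the largest absolute value, giving values in [-1, 1].
max_abs = MaxAbsScaler().fit_transform(x)

for name, values in [("standardised", standardised),
                     ("mean normalised", mean_normalised),
                     ("min-max", min_max),
                     ("max-abs", max_abs)]:
    print(f"{name:>16}: {values.ravel().round(3)}")
```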

## Standardisation

Standardisation centres the variable at zero and scales its variance to 1. For each observation, it subtracts the variable's mean and then divides by its standard deviation:

$$z = \frac{x - \bar{x}}{\sigma}$$
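The formula can be checked against `StandardScaler` with a few lines of NumPy. This is a minimal sketch on a hypothetical matrix; note that `StandardScaler` divides by the population standard deviation (`ddof=0`), so the manual version does the same.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 4 samples, 2 features (e.g. age, salary).
X = np.array([[25.0, 40_000.0],
              [32.0, 52_000.0],
              [47.0, 61_000.0],
              [51.0, 83_000.0]])

# z = (x - mean) / std, computed column by column.
# StandardScaler uses the population standard deviation (ddof=0), so match it here.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)
z_manual = (X - mean) / std

z_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn))   # the two agree
print(z_manual.mean(axis=0).round(12))    # approximately 0 for each column
print(z_manual.std(axis=0).round(12))     # 1 for each column
```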

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing 

In [2]:
df = pd.read_csv('Data.csv')
X = df.iloc[:, :-1].values  # features: every column except the last
y = df.iloc[:, -1].values   # target: the last column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [None]:
df.columns

In [3]:
sc = preprocessing.StandardScaler()

In [None]:
# X_train[:, 1:] selects all rows and every column except the first (Country), i.e. Age and Salary.

In [4]:
X_train[:, 1:] = sc.fit_transform(X_train[:, 1:])  # learn the mean and std from the training data only

In [5]:
X_test[:, 1:] = sc.transform(X_test[:, 1:])  # reuse the training mean and std
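The two cells above fit the scaler on the training rows only and then reuse the learned statistics on the test rows. The sketch below, on hypothetical numbers, shows what `fit` stores (`mean_` and `scale_`) and how a missing value is handled: scikit-learn's scalers ignore `NaN` when computing statistics and leave it as `NaN` in the output, which is why `nan` entries survive in the scaled `X_train`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test values for one feature; np.nan marks a missing value.
train = np.array([[20.0], [30.0], [np.nan], [40.0]])
test = np.array([[25.0], [60.0]])

sc = StandardScaler()
train_scaled = sc.fit_transform(train)  # learns mean_ and scale_ from train only
test_scaled = sc.transform(test)        # reuses the training statistics

# The learned statistics match NaN-ignoring NumPy equivalents.
mean = np.nanmean(train, axis=0)
std = np.nanstd(train, axis=0)
print(sc.mean_, sc.scale_)
print(np.allclose(test_scaled, (test - mean) / std))
print(np.isnan(train_scaled[2, 0]))     # the missing value stays missing
```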

In [6]:
X_test, X_train

(array([['Germany', -1.3578475614000274, -0.8277899615678128],
        ['Germany', 1.8641636012441045, 2.02036871975873]], dtype=object),
 array([['Germany', 0.2531580199220386, nan],
        ['France', -0.23014365447458116, 0.44897082661305115],
        ['Spain', -1.841149235796647, -1.4170641714974423],
        ['Spain', nan, -1.0242146982110225],
        ['France', 1.5419624849796914, 1.62751924647231],
        ['Spain', -0.06904309634237459, -0.14030338331657835],
        ['France', 0.8975602524508649, 0.9400326682210757],
        ['France', -0.5523447707389944, -0.4349404882813931]], dtype=object))

In [10]:
# Wrap the scaled arrays in DataFrames with readable column names
X_test_scaled = pd.DataFrame(data=X_test, columns=['Country', 'Age', 'Salary'])
X_train_scaled = pd.DataFrame(data=X_train, columns=['Country', 'Age', 'Salary'])

## MinMax Scaling 
 
Min-max scaling squeezes the values into the range [0, 1]. It subtracts the minimum value from every observation and then divides by the value range (maximum minus minimum):

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
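A minimal NumPy check of the formula against `MinMaxScaler`, on a hypothetical matrix. After fitting, the scaler exposes the per-column minimum and maximum it learned as `data_min_` and `data_max_`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: 3 samples, 2 features (e.g. age, salary).
X = np.array([[20.0, 30_000.0],
              [30.0, 45_000.0],
              [50.0, 60_000.0]])

# X_scaled = (X - X_min) / (X_max - X_min), computed column by column.
x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_manual = (X - x_min) / (x_max - x_min)

scaler = MinMaxScaler()
X_sklearn = scaler.fit_transform(X)

print(np.allclose(X_manual, X_sklearn))             # the two agree
print(scaler.data_min_, scaler.data_max_)           # learned per-column min and max
print(X_manual.min(axis=0), X_manual.max(axis=0))   # 0 and 1 in every column
```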

In [14]:
scaler = preprocessing.MinMaxScaler()

In [16]:
df1 = df.copy()

In [17]:
x1 = df1.iloc[:, :-1].values
y1 = df1.iloc[:, -1].values

In [18]:
x_train, x_test, y_train, y_test = train_test_split(x1, y1, test_size=0.2, random_state=0)  # split the copy (x1, y1)

In [19]:
x_train[:, 1:] = scaler.fit_transform(x_train[:, 1:])  # learn the min and max from the training data only

In [20]:
x_test[:, 1:] = scaler.transform(x_test[:, 1:])  # reuse the training min and max; never refit on test data

In [21]:
x_train

array([['Germany', 0.6190476190476191, nan],
       ['France', 0.4761904761904763, 0.612903225806452],
       ['Spain', 0.0, 0.0],
       ['Spain', nan, 0.12903225806451624],
       ['France', 1.0, 1.0],
       ['Spain', 0.5238095238095237, 0.4193548387096775],
       ['France', 0.8095238095238093, 0.774193548387097],
       ['France', 0.38095238095238093, 0.3225806451612905]], dtype=object)
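Scaling statistics should come from the training data only. Calling `fit_transform` on the test rows instead of `transform` refits the minimum and maximum on those rows alone; with just two test rows, each column then collapses onto exactly 0 and 1, whatever the actual values. A small sketch with hypothetical values contrasts the two approaches:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training rows and two test rows for one feature.
train = np.array([[20.0], [30.0], [40.0]])
test = np.array([[25.0], [60.0]])

# Correct: the min and max come from the training data only.
scaler = MinMaxScaler().fit(train)
test_from_train = scaler.transform(test)

# Leaky: refitting on the test rows discards the training statistics.
test_refit = MinMaxScaler().fit_transform(test)

print(test_from_train.ravel())  # expressed relative to the training range 20..40
print(test_refit.ravel())       # the two rows collapse onto 0 and 1
```

With the correct approach, a test value beyond the training range maps outside [0, 1], which is expected behaviour rather than an error.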