# Data Normalization

Methods covered:
- Feature scaling
- min-max method
- Z-score (standard score)
- Log transformation

In [1]:
# importing dependencies
import numpy as np
import pandas as pd 
import seaborn as sns 

In [2]:
# loading dataset
ds2 = sns.load_dataset("titanic")

## Data Normalization 


In [3]:
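# take independent copies so each method modifies its own DataFrame
# (assigning into a plain slice triggers SettingWithCopyWarning)
dn2 = ds2[['age','fare']].copy()
dn3 = ds2[['age','fare']].copy()
dn4 = ds2[['age','fare']].copy()
dn5 = ds2[['age','fare']].copy()
dn2.head()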

| | age | fare |
|---|---|---|
| 0 | 22.0 | 7.25 |
| 1 | 38.0 | 71.2833 |
| 2 | 26.0 | 7.925 |
| 3 | 35.0 | 53.1 |
| 4 | 35.0 | 8.05 |


### Simple Methods of Normalization
- Feature scaling
    * x(new) = x(old) / x(max)
- min-max method
    * x(new) = (x(old) - x(min)) / (x(max) - x(min))
- Z-score (standard score)
    * x(new) = (x(old) - x(mean)) / x(std)
- Log transformation
    * x(new) = log(x(old))
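
The four formulas above can be sketched as small helper functions. This is a minimal illustration, not the notebook's own code: the function names are hypothetical, and the five fare values are the ones shown by `dn2.head()`, so the maximum, minimum, mean, and standard deviation here belong to this tiny sample rather than to the full dataset.

```python
import numpy as np
import pandas as pd


def feature_scale(s: pd.Series) -> pd.Series:
    """x(new) = x(old) / x(max)"""
    return s / s.max()


def min_max(s: pd.Series) -> pd.Series:
    """x(new) = (x(old) - x(min)) / (x(max) - x(min))"""
    return (s - s.min()) / (s.max() - s.min())


def z_score(s: pd.Series) -> pd.Series:
    """x(new) = (x(old) - x(mean)) / x(std)"""
    return (s - s.mean()) / s.std()


def log_transform(s: pd.Series) -> pd.Series:
    """x(new) = log(x(old)), natural logarithm"""
    return np.log(s)


# First five fares from the notebook output, used as a small sample.
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name="fare")

summary = pd.DataFrame({
    "feature_scaled": feature_scale(fares),
    "min_max": min_max(fares),
    "z_score": z_score(fares),
    "log": log_transform(fares),
})
print(summary)
```

Keeping each method in its own function makes it easy to apply the same transformation to any numeric column.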

## Feature scaling
- x(new) = x(old) / x(max)

In [4]:
# feature scaling
dn2['fare'] = dn2['fare']/dn2['fare'].max()
dn2.head()


| | age | fare |
|---|---|---|
| 0 | 22.0 | 0.014151 |
| 1 | 38.0 | 0.139136 |
| 2 | 26.0 | 0.015469 |
| 3 | 35.0 | 0.103644 |
| 4 | 35.0 | 0.015713 |
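
The same division also works on several columns at once, because `DataFrame.max()` returns one maximum per column and skips missing values by default. A minimal sketch on a hand-made frame follows; the sample values, including the `NaN`, are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the NaN stands in for a missing age.
sample = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1],
})

# Divide every column by its own maximum (max() ignores NaN by default).
scaled = sample / sample.max()

print(scaled)
```

The missing value stays missing after scaling, so any imputation still has to be handled separately.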


## min-max method
- x(new) = (x(old) - x(min)) / (x(max) - x(min))

In [5]:
# min-max Method (fare)
dn3['fare'] = (dn3['fare']-dn3['fare'].min())/(dn3['fare'].max()-dn3['fare'].min())
dn3.head()


| | age | fare |
|---|---|---|
| 0 | 22.0 | 0.014151 |
| 1 | 38.0 | 0.139136 |
| 2 | 26.0 | 0.015469 |
| 3 | 35.0 | 0.103644 |
| 4 | 35.0 | 0.015713 |


In [6]:
# min-max Method (age)
dn3['age'] = (dn3['age']-dn3['age'].min())/(dn3['age'].max()-dn3['age'].min())
dn3.head()


| | age | fare |
|---|---|---|
| 0 | 0.271174 | 0.014151 |
| 1 | 0.472229 | 0.139136 |
| 2 | 0.321438 | 0.015469 |
| 3 | 0.434531 | 0.103644 |
| 4 | 0.434531 | 0.015713 |
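
The min-max values for `fare` coincide with the feature-scaled ones shown earlier. That is what the formula predicts whenever the column minimum is 0: the subtraction drops out and only the division by the maximum remains. A small sketch of this relationship, using made-up values rather than the dataset (the `min_max` helper is a hypothetical name):

```python
import numpy as np
import pandas as pd


def min_max(s: pd.Series) -> pd.Series:
    """x(new) = (x(old) - x(min)) / (x(max) - x(min))"""
    return (s - s.min()) / (s.max() - s.min())


# Illustrative values: one column whose minimum is 0, one whose minimum is not.
zero_min = pd.Series([0.0, 7.25, 71.2833])
nonzero_min = pd.Series([0.42, 22.0, 80.0])

# With a minimum of 0, min-max and feature scaling agree.
same = np.allclose(min_max(zero_min), zero_min / zero_min.max())

# With a nonzero minimum they differ: min-max maps the minimum to exactly 0.
differs = not np.allclose(min_max(nonzero_min), nonzero_min / nonzero_min.max())

print(same, differs)
```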


## Z-score (standard score)
- x(new) = (x(old) - x(mean)) / x(std)

In [7]:
# Z-score Method
dn4['fare'] = (dn4['fare']-dn4['fare'].mean())/dn4['fare'].std()
dn4.head()


| | age | fare |
|---|---|---|
| 0 | 22.0 | -0.502163 |
| 1 | 38.0 | 0.786404 |
| 2 | 26.0 | -0.48858 |
| 3 | 35.0 | 0.420494 |
| 4 | 35.0 | -0.486064 |
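
One detail worth knowing when comparing Z-scores across tools: pandas' `Series.std()` computes the sample standard deviation (`ddof=1`) by default, while NumPy's `np.std` defaults to the population version (`ddof=0`). The sketch below contrasts the two on a handful of illustrative values; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Small illustrative sample (not the full fare column).
s = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05])

# pandas default: sample standard deviation (ddof=1).
z_sample = (s - s.mean()) / s.std()

# Population standard deviation (ddof=0), matching NumPy's default.
z_population = (s - s.mean()) / s.std(ddof=0)

print(z_sample.round(4).tolist())
print(z_population.round(4).tolist())
```

On a column with hundreds of rows the two versions differ only slightly, but on small samples the gap is visible.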


## Log transformation
- x(new) = log(x(old))

In [8]:
# Log Transformation (natural log; a fare of 0 maps to -inf)
dn5['fare'] = np.log(dn5['fare'])
dn5.head()


| | age | fare |
|---|---|---|
| 0 | 22.0 | 1.981001 |
| 1 | 38.0 | 4.266662 |
| 2 | 26.0 | 2.070022 |
| 3 | 35.0 | 3.972177 |
| 4 | 35.0 | 2.085672 |
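
The plain logarithm is undefined at 0, so any zero fare becomes `-inf` and NumPy emits a runtime warning. One common workaround, shown here as an alternative rather than as the method used in this notebook, is `np.log1p`, which computes log(1 + x) and maps 0 to 0. The sample values below are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative fares, including a zero.
fares = pd.Series([0.0, 7.25, 71.2833])

# Plain log: the zero becomes -inf (warning suppressed for this demo).
with np.errstate(divide="ignore"):
    plain = np.log(fares)

# log1p: log(1 + x), defined at zero.
shifted = np.log1p(fares)

print(plain.tolist())
print(shifted.tolist())
```

Note that `log1p` values are not interchangeable with plain `log` values, so the same transform has to be used consistently on every column being compared.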
