# Data Transformation Example

Datafile: "14_Foreign_Exchange_Rates_PureNumeric.csv" - A modified version with only numeric data.

2020-10-20 - Jingwei Liu
<br>2022-10-16 - Jeff Smith

In [None]:
# Import the tools: numpy, pandas, and matplotlib
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Read the data and get the dataframe
fname = "../data/14_Foreign_Exchange_Rates_PureNumeric.csv"
df = pd.read_csv(fname)
df

In [None]:
# Summary statistics for every column
df.describe()

In [None]:
# Before applying the normalizations, let's look at Japan's raw data
df['JAPAN - YEN/US$'].describe()

In [None]:
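# Min-Max Normalization: rescale a column linearly onto [0, 1]
def MinMax(column):
    minv = column.min()
    maxv = column.max()
    # Vectorized Series arithmetic; same result as applying the formula per element
    normcolumn = (column - minv) / (maxv - minv)
    return normcolumn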

In [None]:
# Using Min-Max normalization to transform Japan column
MinMaxJP = MinMax(df['JAPAN - YEN/US$'])
MinMaxJP.describe()
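
As a quick sanity check of the Min-Max formula, here is a minimal sketch on a made-up three-value series (the toy values are not taken from the data file). The helper name `min_max` and its handling of a constant column are assumptions added for illustration; the notebook's own `MinMax` has no such guard.

```python
import pandas as pd

def min_max(column: pd.Series) -> pd.Series:
    """Rescale a numeric Series linearly onto [0, 1]."""
    minv = column.min()
    maxv = column.max()
    span = maxv - minv
    if span == 0:
        # Assumption: map a constant column to zeros rather than divide by zero
        return column * 0.0
    return (column - minv) / span

# Made-up values, chosen so the result is easy to check by hand
toy = pd.Series([10.0, 20.0, 40.0])
scaled = min_max(toy)
print(scaled.tolist())
```

The smallest value maps to 0, the largest to 1, and 20 lands one third of the way along, because (20 - 10) / (40 - 10) = 1/3.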

In [None]:
# Z-Score Normalization: center on the mean and divide by the standard deviation
def ZScore(column):
    meanv = column.mean()
    stdv = column.std()  # sample std (pandas default, ddof=1)
    normcolumn = (column - meanv) / stdv
    return normcolumn

In [None]:
# Using Z-Score normalization to transform Japan column
ZScoreJP = ZScore(df['JAPAN - YEN/US$'])
ZScoreJP.describe()
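
The `# sample std` comment in `ZScore` matters: pandas' `Series.std()` defaults to `ddof=1` (sample standard deviation), while NumPy's `np.std` defaults to `ddof=0` (population standard deviation). The sketch below uses a small made-up series, not the exchange-rate data, to show how the two choices differ.

```python
import numpy as np
import pandas as pd

# Made-up values with mean 5 and population standard deviation exactly 2
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = values.std()            # pandas default: ddof=1
population_std = values.std(ddof=0)  # matches np.std(values.to_numpy())

z_sample = (values - values.mean()) / sample_std
z_population = (values - values.mean()) / population_std

print(sample_std, population_std)
```

Either convention works as long as it is applied consistently; on a column with thousands of rows the difference is very small.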

In [None]:
# Decimal Scaling: divide every value by 10**d
def DScaling(column, d):
    normcolumn = column / (10**d)
    return normcolumn

In [None]:
# Using Decimal Scaling normalization to transform Japan column
DScaleJP = DScaling(df['JAPAN - YEN/US$'],2)
DScaleJP.describe()
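
Decimal scaling is usually defined with the smallest `d` for which every scaled absolute value is below 1. The call above passes `d = 2` by hand, which leaves any value of 100 or more at or above 1 after scaling. As a sketch, the hypothetical helper `choose_d` below (not part of the original notebook) picks `d` from the data; the sample values are made up for illustration.

```python
import pandas as pd

def choose_d(column: pd.Series) -> int:
    """Smallest non-negative integer d such that max(|x|) / 10**d < 1."""
    max_abs = float(column.abs().max())
    d = 0
    # Increase d until the largest magnitude falls strictly below 1
    while max_abs / (10 ** d) >= 1:
        d += 1
    return d

# Made-up values, not taken from the data file
sample = pd.Series([75.7, 101.2, 134.5])
d = choose_d(sample)
scaled = sample / (10 ** d)
print(d, scaled.abs().max())
```

A loop is used instead of `log10` so that exact powers of ten, such as 100, are handled without relying on floating-point rounding.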

### Now we can answer the question from the slides

In [None]:
# Apply Min-Max normalization to the Australia, UK and Japan columns and check the std dev
MMAUS = MinMax(df['AUSTRALIA - AUSTRALIAN DOLLAR/US$'])
MMUK = MinMax(df['UNITED KINGDOM - UNITED KINGDOM POUND/US$'])
MMJP = MinMax(df['JAPAN - YEN/US$'])
# form new dataframe
MMdf = pd.concat([MMAUS, MMUK, MMJP], axis = 1)
MMdf.describe()

With all three columns rescaled to the same range [0, 1], the standard deviations are directly comparable, and Japan's exchange rate does not look more unstable than the others.

In [None]:
# Apply Z-score normalization to the Australia, UK and Japan columns and check the std dev
ZScoreAUS = ZScore(df['AUSTRALIA - AUSTRALIAN DOLLAR/US$'])
ZScoreUK = ZScore(df['UNITED KINGDOM - UNITED KINGDOM POUND/US$'])
ZScoreJP = ZScore(df['JAPAN - YEN/US$'])
# form new dataframe
Zdf = pd.concat([ZScoreAUS, ZScoreUK, ZScoreJP], axis = 1)
Zdf.describe()

Of course, the whole point of Z-score normalization is to rescale each column to a mean of 0 and a standard deviation of 1, so the standard deviations are identical by construction and tell us nothing about relative stability.
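
To see why, here is a minimal sketch on synthetic data (random numbers, not exchange rates): two series whose spreads differ by a factor of 100 both come out with a mean of about 0 and a standard deviation of 1 after Z-scoring. The helper `z_score` and the series names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic series on very different scales
small = pd.Series(rng.normal(loc=1.5, scale=0.1, size=1000))
large = pd.Series(rng.normal(loc=100.0, scale=10.0, size=1000))

def z_score(column: pd.Series) -> pd.Series:
    # Same formula as the notebook's ZScore, using the sample std (ddof=1)
    return (column - column.mean()) / column.std()

normalized = pd.DataFrame({"small": z_score(small), "large": z_score(large)})
summary = normalized.agg(["mean", "std"])
print(summary.round(6))
```

Whatever the original spread, the `std` row is 1 for every column, so after Z-scoring the standard deviation can no longer distinguish a stable series from a volatile one.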

In [None]:
# Comparing to the original data
df[['AUSTRALIA - AUSTRALIAN DOLLAR/US$','UNITED KINGDOM - UNITED KINGDOM POUND/US$','JAPAN - YEN/US$']].describe()