**<h1>Python Machine Learning Project</h1>**

<h2>Created by Brian Chairez</h2>

**<h3><u>Step 1 - Reading Data</u></h3>**

The provided CSV files are read into dataframes with the <b>pandas.read_csv()</b> method. Both files are read with a semicolon as the field separator, so <i>sep=';'</i> is passed.

The <i>"bank.csv"</i> file will be the training dataframe while <i>"bank-full.csv"</i> will be the testing dataframe.

In [10]:
import pandas as pd

trainDF = pd.read_csv('bank.csv', sep=';')
testDF = pd.read_csv('bank-full.csv', sep=';')
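
As a quick sanity check of the <i>sep=';'</i> argument, the sketch below reads a small in-memory sample instead of the real files. The sample rows and the names `sample` and `sampleDF` are made up for illustration; only `pandas.read_csv()` and its `sep` parameter come from the step above.

```python
import io
import pandas as pd

# Hypothetical two-row sample in a semicolon-separated layout
sample = io.StringIO('age;job;y\n30;admin.;no\n45;technician;yes\n')

sampleDF = pd.read_csv(sample, sep=';')

# With the correct separator each field becomes its own column
print(sampleDF.shape)          # (2, 3)
print(list(sampleDF.columns))  # ['age', 'job', 'y']
```

Without `sep=';'` the default comma separator would leave each line as a single column, so checking the shape right after loading is a cheap safeguard.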

**<h3><u>Step 2 - Data Preprocessing</u></h3>**

Several features are categorical and must be converted to numeric indicator columns. This is done with the <b>pandas.get_dummies()</b> method, passing <i>drop_first=True</i> so that the first category of each feature is dropped and serves as the baseline.

In [11]:
trainDF = pd.get_dummies(trainDF, drop_first=True)
testDF = pd.get_dummies(testDF, drop_first=True)
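
The effect of <i>drop_first=True</i> can be seen on a toy example. The frames and values below are hypothetical and not taken from the bank files. The sketch also shows one thing to watch for when two dataframes are encoded separately: if a category is absent from one of them, the resulting columns differ. Whether that happens with the bank files depends on their contents, so the alignment step is only one possible safeguard, not part of the original workflow.

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration
toyTrain = pd.DataFrame({'age': [30, 45, 52],
                         'marital': ['married', 'single', 'divorced']})
toyTest = pd.DataFrame({'age': [28, 61],
                        'marital': ['married', 'single']})

# drop_first=True drops the first (alphabetical) category, 'divorced',
# so a row with zeros in both remaining dummies means 'divorced'
toyTrainEnc = pd.get_dummies(toyTrain, drop_first=True)
print(list(toyTrainEnc.columns))  # ['age', 'marital_married', 'marital_single']

# Encoded on its own, the test frame has no 'divorced' and drops 'married' instead
naiveCols = list(pd.get_dummies(toyTest, drop_first=True).columns)
print(naiveCols)                  # ['age', 'marital_single']

# One way to keep both encodings consistent: declare the training
# categories on the test column before encoding it
categories = sorted(toyTrain['marital'].unique())
toyTest['marital'] = pd.Categorical(toyTest['marital'], categories=categories)
toyTestEnc = pd.get_dummies(toyTest, drop_first=True)
print(list(toyTestEnc.columns))   # ['age', 'marital_married', 'marital_single']
```

Comparing `list(trainDF.columns)` with `list(testDF.columns)` after encoding is a simple way to confirm that the two dataframes ended up with the same features.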

The ['duration'] and ['y_yes'] columns need to be dropped from both the training and testing dataframes. Before it is dropped, ['y_yes'] is saved separately because it becomes the target.

In [12]:
trainTarget = trainDF['y_yes']
trainDF = trainDF.drop(columns=['duration', 'y_yes'])

testTarget = testDF['y_yes']
testDF = testDF.drop(columns=['duration', 'y_yes'])

Non-categorical features should be standardized before using the K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers, because both are sensitive to the scale of their input features.

The specific features are:
    <ul>
        <li>age</li> 
        <li>campaign</li> 
        <li>pdays</li>
        <li>previous</li>
        <li>emp.var.rate</li>
        <li>cons.price.idx</li>
        <li>cons.conf.idx</li>
        <li>euribor3m</li>
        <li>nr.employed</li>
    </ul>
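
To illustrate why scale matters for a distance-based classifier such as KNN, the sketch below compares Euclidean distances before and after standardization. The three rows are invented values for two of the listed features, and `distance` is a hypothetical helper, so this is an illustration under those assumptions rather than a result from the bank data.

```python
import numpy as np
import pandas as pd

# Hypothetical clients: row 1 differs from row 0 mostly in age,
# row 2 differs from row 0 mostly in nr.employed
toyDF = pd.DataFrame({'age': [25.0, 60.0, 26.0],
                      'nr.employed': [5000.0, 5010.0, 5200.0]})

def distance(df, i, j):
    """Euclidean distance between rows i and j of df."""
    return float(np.linalg.norm(df.iloc[i].to_numpy() - df.iloc[j].to_numpy()))

# Raw distances: the feature with the larger numeric range dominates
rawAB = distance(toyDF, 0, 1)
rawAC = distance(toyDF, 0, 2)
print(rawAB, rawAC)

# After standardization both features contribute on a comparable scale
scaledDF = (toyDF - toyDF.mean()) / toyDF.std()
scaledAB = distance(scaledDF, 0, 1)
scaledAC = distance(scaledDF, 0, 2)
print(scaledAB, scaledAC)
```

On the raw values the 35-year age gap counts for far less than a gap of 200 in nr.employed. After standardization the two distances are nearly equal.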


Standardization subtracts the feature's mean from each value and then divides the result by the feature's standard deviation:

x'<sub>n</sub> = ( x<sub>n</sub> - x̅ ) / σ
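
A small worked example of the formula, using made-up ages chosen so the arithmetic is easy to follow (the name `ages` and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical ages, not taken from the bank data
ages = pd.Series([20.0, 30.0, 40.0], name='age')

mean = ages.mean()  # 30.0
std = ages.std()    # 10.0 (pandas defaults to the sample standard deviation, ddof=1)

# Apply x' = (x - mean) / std to the whole column at once
standardized = (ages - mean) / std
print(standardized.tolist())  # [-1.0, 0.0, 1.0]
```

The standardized column has a mean of 0 and a standard deviation of 1, which is the property the formula is meant to produce.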
    

In [13]:
# Non-categorical features to standardize
numericCols = ['age', 'campaign', 'pdays', 'previous', 'emp.var.rate',
               'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed']

# Assigning to the row yielded by iterrows() changes a copy rather than the
# dataframe itself, so the formula is applied to whole columns instead.
# Each column is standardized with its own mean and standard deviation.

# Training data standardization
trainDF[numericCols] = (trainDF[numericCols] - trainDF[numericCols].mean()) / trainDF[numericCols].std()

# Test data standardization
testDF[numericCols] = (testDF[numericCols] - testDF[numericCols].mean()) / testDF[numericCols].std()
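
The cell above standardizes each dataframe with its own mean and standard deviation. A common alternative, which is not what this notebook does, is to compute the statistics on the training data only and reuse them for the test data, so that the same raw value maps to the same standardized value in both. A minimal sketch on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical toy frames; the values are invented for illustration
toyTrain = pd.DataFrame({'age': [20.0, 30.0, 40.0]})
toyTest = pd.DataFrame({'age': [30.0, 50.0]})

# Statistics come from the training data only
trainMean = toyTrain.mean()
trainStd = toyTrain.std()

toyTrainScaled = (toyTrain - trainMean) / trainStd
toyTestScaled = (toyTest - trainMean) / trainStd
print(toyTestScaled['age'].tolist())  # [0.0, 2.0]

# For comparison: scaling the test frame with its own statistics maps
# age 30 to a negative value instead of the 0.0 it gets in the training frame
ownScaled = (toyTest - toyTest.mean()) / toyTest.std()
print(ownScaled['age'].tolist())
```

Which variant is appropriate depends on how the classifiers are evaluated; the sketch only shows how the two choices differ.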
