# Lab-based Group Assignment 1:
## Prediction of Loyalty of Credit Applications for Risky Customers

<div class="LI-profile-badge"  data-version="v1" data-size="large" data-locale="en_US" data-type="horizontal" data-theme="light" data-vanity="drsalihtutun"><a class="LI-simple-link" href='https://www.linkedin.com/in/drsalihtutun/en-us?trk=profile-badge'>Salih Tutun, PhD</a></div>

![Imgur](https://i.imgur.com/r1U9dHD.png)

### Business Problem

<font color='green'>Understand the uncertain loyalty level (i.e., the risk of payment difficulties) of customers, given the background information collected in the application process. </font>

### Business Objectives

This application aims to identify patterns that indicate <font color='green'>whether a customer has difficulty paying their installments</font>. These patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate. They also help ensure that customers who are capable of repaying the loan are not rejected.


![Imgur](https://i.imgur.com/tyqhhLR.png)

**Credit:** Investopedia

In [None]:
# Mount Google Drive so the notebook can read the dataset files
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive




### <font color='red'>**Q1: Please explore the given dataset** </font>

Please read the dataset.

In [None]:
# Import the credit card dataset ('Credit_Card_Dataset.csv')
# The data is fairly large, so load only the first 1000 rows
# The "Car Owner" column is dropped and not used in this analysis
import pandas as pd
data = pd.read_csv('/content/drive/MyDrive/Deep Learning/Credit_Card_Dataset.csv', nrows = 1000).drop("Car Owner",axis=1)
data.head()

# Target = 1 => client with payment difficulties

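|   | Mobile Phone | Work Phone | Correct Phone Number | City Rating | Region Rating | House Owner | Income | Document | Target |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 1 |
| 3 | 1 | 1 | 1 | 2 | 2 | 1 | 0 | 1 | 0 |
| 4 | 1 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 0 |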
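The loading cell relies on two pandas behaviours. First, `nrows` limits parsing to the first N data rows; the header row is not counted. Second, `drop(..., axis=1)` removes a column. The sketch below shows both on a tiny in-memory CSV whose values are made up. It uses `drop(columns=[...])`, an equivalent spelling of `drop("Car Owner", axis=1)`.

```python
import io

import pandas as pd

# A tiny stand-in for the real CSV file; the values are made up for illustration
csv_text = """Mobile Phone,Car Owner,House Owner,Target
1,0,1,0
1,1,0,0
1,0,1,1
1,1,1,0
"""

# nrows=3 parses only the first three data rows (the header line is not counted)
df = pd.read_csv(io.StringIO(csv_text), nrows=3)

# drop(columns=[...]) is equivalent to drop("Car Owner", axis=1)
df = df.drop(columns=["Car Owner"])

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Mobile Phone', 'House Owner', 'Target']
```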


In [None]:
# Show the description of each column
pd.read_csv('/content/drive/MyDrive/Deep Learning/Credit Card Column Description.csv', sep = ';', index_col = 0)

| Column Name | Description |
|---|---|
| Mobile Phone | (0,1) 1: Client provided a mobile phone |
| Work Phone | (0,1) 1: Client provided a work phone |
| Correct Phone Number | (0,1) 1: Phone numbers provided are reachable |
| City Rating | (1,2,3) 1 is the best |
| Region Rating | (1,2,3) 1 is the best |
| House Owner | (0,1) 1: Client owns real estate |
| Car Owner | (0,1) 1: Client owns a car |
| Income | Income of the client |
| Document | (0,1) 1: Client provided an additional document |
| Target | (0,1) 1: Client with payment difficulties |


In [None]:
# Select the House Owner, Income, Document and Target columns, then view the first 10 rows
# We use only three factors (House Owner, Income and Document) as predictors, plus the dependent variable Target
data_adj = data.loc[:,["House Owner", "Income", "Document", "Target"]]
data_adj.head(10)

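|   | House Owner | Income | Document | Target |
|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 | 1 |
| 3 | 1 | 0 | 1 | 0 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 1 | 0 | 1 | 0 |
| 6 | 1 | 0 | 0 | 1 |
| 7 | 1 | 0 | 1 | 1 |
| 8 | 1 | 0 | 1 | 0 |
| 9 | 1 | 0 | 0 | 0 |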


In [None]:
# View the main statistics of the data columns
# You may use the DataFrame.describe() method
data_adj.describe()

|       | House Owner | Income | Document | Target |
|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 0.707 | 0.07 | 0.723 | 0.335 |
| std | 0.455366 | 0.255275 | 0.44774 | 0.472227 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 1.0 | 0.0 | 1.0 | 0.0 |
| 75% | 1.0 | 0.0 | 1.0 | 1.0 |
| max | 1.0 | 1.0 | 1.0 | 1.0 |
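The summary reports a `Target` mean of 0.335, so about 33.5% of these 1000 clients are labeled as having payment difficulties. A natural reference point for any accuracy figure is the majority-class baseline, which is the accuracy of always predicting the more frequent label. The sketch below computes it. The label vector is synthetic and was built only to match the reported mean, and the function name `majority_baseline` is illustrative.

```python
import numpy as np

def majority_baseline(y):
    """Accuracy of always predicting the most frequent label in a 0/1 vector."""
    y = np.asarray(y)
    positive_rate = y.mean()
    return max(positive_rate, 1 - positive_rate)

# Synthetic labels matching the reported Target mean of 0.335 (335 ones out of 1000)
y_demo = np.array([1] * 335 + [0] * 665)
print(round(majority_baseline(y_demo), 3))  # 0.665
```

Here the majority label is 0, so always answering "no payment difficulties" would already be correct about 66.5% of the time on these rows.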


In [None]:
#View the datatype of each column
#You may use the DataFrame.info() method
data_adj.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   House Owner  1000 non-null   int64
 1   Income       1000 non-null   int64
 2   Document     1000 non-null   int64
 3   Target       1000 non-null   int64
dtypes: int64(4)
memory usage: 31.4 KB


### <font color='red'>**Q2: Please prepare your dataset for the model** </font>

In [None]:
# Split the data into a training set (80%) and a testing set (20%)
import numpy as np

# Get all row labels first
all_index = np.array(data_adj.index)


# Randomly select 80% of the labels as the training index
# You may use np.random.choice() to sample from a set without replacement
train_index = np.random.choice(all_index, size=int(len(all_index)*0.8), replace=False)


# Use the remaining labels as the test index
# You may use np.setdiff1d() to get the complement set
test_index = np.setdiff1d(all_index, train_index)


# Split the data using the split index
train_data = data_adj.loc[train_index]
test_data = data_adj.loc[test_index]
test_data

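|   | House Owner | Income | Document | Target |
|---|---|---|---|---|
| 9 | 1 | 0 | 0 | 0 |
| 13 | 0 | 0 | 1 | 1 |
| 24 | 0 | 0 | 1 | 1 |
| 27 | 1 | 0 | 1 | 0 |
| 30 | 1 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... |
| 991 | 1 | 1 | 1 | 0 |
| 992 | 0 | 0 | 1 | 1 |
| 995 | 0 | 0 | 0 | 0 |
| 996 | 1 | 0 | 1 | 0 |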
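The split samples 80% of the row labels without replacement and takes the complement as the test set. Because `np.random.choice` is unseeded, the split changes on every run. The sketch below applies the same idea with a seeded `np.random.default_rng` generator so the split can be repeated. The function name `split_by_index`, the seed value, and the demo frame are illustrative assumptions, not part of the assignment.

```python
import numpy as np
import pandas as pd

def split_by_index(df, train_frac=0.8, seed=0):
    """Random train/test split by sampling row labels without replacement."""
    rng = np.random.default_rng(seed)            # seeded generator -> repeatable split
    all_index = np.array(df.index)
    n_train = int(len(all_index) * train_frac)
    train_index = rng.choice(all_index, size=n_train, replace=False)
    test_index = np.setdiff1d(all_index, train_index)  # complement (returned sorted)
    return df.loc[train_index], df.loc[test_index]

# Hypothetical 1000-row frame standing in for data_adj
demo = pd.DataFrame({"x": np.arange(1000), "Target": np.arange(1000) % 3 == 0})
train_df, test_df = split_by_index(demo, seed=42)
print(len(train_df), len(test_df))                    # 800 200
print(len(set(train_df.index) & set(test_df.index)))  # 0
```

Calling `split_by_index` twice with the same seed returns the same rows, which makes run-to-run accuracy comparisons easier.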


### <font color='red'>**Q3: Please define your activation functions and initialize the weights of the model** </font>

- Sub-Question: Think about what happens when deriv = False vs. when deriv = True.

In [None]:
# Build your own neural network below and print out the MSE
# If you run into difficulty, feel free to refer to "M1.6. Building Neural Networks.ipynb"

# Define a non-linear function first
# Don't forget to define the first-order derivative of your non-linear function
# To combine them, we add a `deriv` parameter to the non-linear function

import numpy as np

def relu(x, deriv=False):
    if deriv:
        # first-order derivative: 1 where x > 0, 0 elsewhere
        return (x > 0).astype(float)

    # the function itself: max(0, x)
    return np.maximum(x, 0)

def sigm(x, deriv=False):
    # logistic sigmoid: 1 / (1 + e^(-x))
    y = 1 / (1 + np.exp(-x))
    if deriv:
        # first-order derivative with respect to x: sigma(x) * (1 - sigma(x))
        return y * (1 - y)

    return y

**If deriv = False:**

relu: returns $\max(0, x)$, i.e. $x$ when $x > 0$ and $0$ otherwise, so the output lies in $[0, \infty)$

sigm: returns $\sigma(x) = \frac{1}{1+e^{-x}}$, with output in $(0, 1)$

**If deriv = True:**

relu: returns 1 if $x > 0$ and 0 otherwise

sigm: returns $\sigma'(x) = \sigma(x)\,\bigl(1-\sigma(x)\bigr)$

**Conclusion:**

"deriv = False" returns the output of the activation function

"deriv = True" returns the first derivative of the activation function, evaluated at the input x

![Imgur](https://i.imgur.com/9TVc4LU.png)
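One way to confirm the `deriv = True` analysis is to compare each derivative branch with a central finite difference. The sketch below does this, assuming the same `relu`/`sigm` definitions. The ReLU derivative is returned as floats here rather than booleans, and `numeric_derivative` is an illustrative helper, not part of the assignment.

```python
import numpy as np

def relu(x, deriv=False):
    if deriv:
        return (x > 0).astype(float)   # slope: 1 where x > 0, else 0
    return np.maximum(x, 0)

def sigm(x, deriv=False):
    y = 1 / (1 + np.exp(-x))
    if deriv:
        return y * (1 - y)             # sigma'(x) = sigma(x) * (1 - sigma(x))
    return y

def numeric_derivative(f, x, h=1e-6):
    """Central finite difference (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Avoid x = 0, where ReLU is not differentiable
x = np.array([-3.0, -0.5, 0.7, 2.0])
print(np.allclose(sigm(x, deriv=True), numeric_derivative(sigm, x)))  # True
print(np.allclose(relu(x, deriv=True), numeric_derivative(relu, x)))  # True
```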

Sub-Question: What would X.shape and y.shape be, and why? Think about the x variables and the y variable of the training dataset.


In [None]:
# What is your X value? House Owner, Income and Document: the three factors used to predict whether a client can pay the installments.
X = train_data.iloc[:,0:-1].values  # X can also be called a0 (the input layer)

# What is your y value? Target: whether the client has payment difficulties (1) or not (0).
y = train_data.iloc[:,-1].values

# Reshape y into a 2-dimensional column vector of shape (m, 1)
y = np.reshape(y,(train_data.shape[0],1))


# Randomly initialize the weights uniformly in [-1, 1) using 2*U[0, 1) - 1
# w0 = 2*np.random.random((input,hidden)) - 1
# w1 = 2*np.random.random((hidden,output)) - 1
w0 = 2*np.random.random((3,10)) - 1   # 3 inputs -> 10 hidden units
w1 = 2*np.random.random((10,1)) - 1   # 10 hidden units -> 1 output

w0, w1

(array([[ 0.09131821, -0.51252902, -0.9178336 , -0.16235933, -0.16899279,
          0.49791022, -0.24615142, -0.00507297, -0.44918249,  0.8859761 ],
        [ 0.87915878,  0.86939396,  0.7552392 ,  0.60263716,  0.11005022,
          0.67361805,  0.5917771 ,  0.0127793 ,  0.67982155, -0.38549709],
        [-0.92327158,  0.01075822,  0.43383449, -0.01109355, -0.96600148,
         -0.03028141,  0.51653825, -0.11038408,  0.73073808,  0.58330962]]),
 array([[-0.18714167],
        [-0.25319273],
        [ 0.86450437],
        [-0.89366038],
        [-0.2071202 ],
        [-0.08704685],
        [-0.79893087],
        [-0.56719678],
        [-0.30899161],
        [ 0.91122307]]))
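The expression `2*np.random.random(shape) - 1` rescales uniform samples from [0, 1) to [-1, 1), so the initial weights are small and take both signs. The first matrix maps the 3 inputs to 10 hidden units, and the second maps those 10 hidden units to the single output. The sketch below uses the same initialization with a seeded `np.random.default_rng` generator. The seed and the names `n_input`, `n_hidden`, `n_output` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the check is repeatable

n_input, n_hidden, n_output = 3, 10, 1   # 3 predictors, 10 hidden units, 1 output

# 2*U[0, 1) - 1 maps uniform samples onto [-1, 1), centred on zero
w0 = 2 * rng.random((n_input, n_hidden)) - 1
w1 = 2 * rng.random((n_hidden, n_output)) - 1

print(w0.shape, w1.shape)            # (3, 10) (10, 1)
print(w0.min() >= -1, w0.max() < 1)  # True True
```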

In [None]:
X.shape, y.shape

((800, 3), (800, 1))

**X.shape: The shape of X is (m, 3)**

train_data.iloc[:, 0:-1] selects all rows and all columns except the last one.

So, X includes: House Owner, Income, and Document.

X is a 2-dimensional array with shape (m, 3): m rows (one per client) and 3 columns (one per feature).

**y.shape: The shape of y is (m, 1)**

train_data.iloc[:, -1] selects all rows and only the last column, Target, which indicates whether the client has payment difficulties.

On its own, .values gives a 1-dimensional array of shape (m,). After np.reshape, y is a 2-dimensional column vector of shape (m, 1), matching the shape of the network output a2.

**m = 800**

m is the number of rows, i.e. the number of training examples.

Thus, m = 800 (1000 × 0.8).

**X.shape = (800, 3), y.shape = (800, 1)**
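The reshape matters because the network output `a2` has shape (m, 1). The sketch below uses a hypothetical 4-row frame with the same columns to show the shapes. It also shows what NumPy broadcasting does if `y` is left 1-dimensional: `y - a2` silently becomes an (m, m) matrix instead of an element-wise error.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame with the same column layout as train_data
df = pd.DataFrame({
    "House Owner": [1, 0, 1, 1],
    "Income":      [1, 0, 0, 0],
    "Document":    [1, 1, 0, 1],
    "Target":      [0, 0, 1, 0],
})

X = df.iloc[:, 0:-1].values                # all columns except the last -> (4, 3)
y_flat = df.iloc[:, -1].values             # last column only -> 1-D, shape (4,)
y = np.reshape(y_flat, (df.shape[0], 1))   # 2-D column vector, shape (4, 1)

a2 = np.full((4, 1), 0.5)                  # stand-in for the network output, shape (4, 1)

print(X.shape, y_flat.shape, y.shape)      # (4, 3) (4,) (4, 1)
print((y - a2).shape)                      # (4, 1): element-wise error
print((y_flat - a2).shape)                 # (4, 4): broadcasting pairs every row with every row
```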


![Imgur](https://i.imgur.com/85DTrAa.png)

### <font color='red'>**Q4: Please train your neural network** </font>

In [None]:

# Write a long enough loop to train your model and minimize the cost

# Learning rate: an assumed value that scales the summed full-batch gradient
# so that each weight update stays small
lr = 0.01

for j in range(60000):

    # forward propagation begins #
    # Feed forward through layers 0, 1, and 2
    a0 = X
    z1 = np.dot(a0, w0)   # hidden-layer pre-activation
    a1 = sigm(z1)
    z2 = np.dot(a1, w1)   # output-layer pre-activation
    a2 = sigm(z2)

    # how much did we miss the target value?
    a2_error = y - a2
    # forward propagation ends #

    # Backpropagation begins #
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    # sigm(..., deriv=True) expects the pre-activation z2, not the output a2
    a2_delta = a2_error * sigm(z2, deriv=True)

    # how much did each a1 value contribute to the a2 error (according to the weights)?
    a1_error = a2_delta.dot(w1.T)

    # in what direction is the target a1?
    # were we really sure? if so, don't change too much.
    a1_delta = a1_error * sigm(z1, deriv=True)

    # update the weights
    w1 += lr * a1.T.dot(a2_delta)
    w0 += lr * a0.T.dot(a1_delta)
    # Backpropagation ends #

# Let's convert the probabilities into binary results
# 1 means a correct prediction; 0 means a mistake
train_result = (np.abs(a2_error) < 0.5).astype(int)

# What is our training accuracy?
np.mean(train_result)


0.32875
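The training loop's update rule is gradient descent on the halved sum of squared errors, $L = \frac{1}{2}\sum (y - a_2)^2$, provided the sigmoid slope is evaluated at the pre-activations $z_2 = a_1 w_1$ and $z_1 = a_0 w_0$. Here $\sigma'(z) = \sigma(z)(1-\sigma(z))$; since $a_2 = \sigma(z_2)$, this slope can equivalently be written $a_2(1-a_2)$. The sketch below is a gradient check on synthetic binary data: it compares the delta-rule updates with central finite differences of $L$. The helper names `loss`, `backprop_update`, and `numeric_grad` are illustrative.

```python
import numpy as np

def sigm(x, deriv=False):
    y = 1 / (1 + np.exp(-x))
    return y * (1 - y) if deriv else y

def loss(X, y, w0, w1):
    """Halved sum of squared errors: 0.5 * sum((y - a2)**2)."""
    a2 = sigm(sigm(X @ w0) @ w1)
    return 0.5 * np.sum((y - a2) ** 2)

def backprop_update(X, y, w0, w1):
    """Weight updates from the delta rule (these equal minus the gradient of loss)."""
    z1 = X @ w0
    a1 = sigm(z1)
    z2 = a1 @ w1
    a2 = sigm(z2)
    a2_delta = (y - a2) * sigm(z2, deriv=True)            # slope at pre-activation z2
    a1_delta = a2_delta.dot(w1.T) * sigm(z1, deriv=True)  # slope at pre-activation z1
    return X.T.dot(a1_delta), a1.T.dot(a2_delta)

def numeric_grad(f, w, h=1e-6):
    """Central-difference gradient of f() w.r.t. array w (perturbed in place, then restored)."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + h
        f_plus = f()
        w[idx] = old - h
        f_minus = f()
        w[idx] = old
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 3)).astype(float)   # synthetic binary features
y = rng.integers(0, 2, size=(20, 1)).astype(float)   # synthetic binary labels
w0 = 2 * rng.random((3, 10)) - 1
w1 = 2 * rng.random((10, 1)) - 1

dw0, dw1 = backprop_update(X, y, w0, w1)
g0 = numeric_grad(lambda: loss(X, y, w0, w1), w0)
g1 = numeric_grad(lambda: loss(X, y, w0, w1), w1)

# The delta-rule updates should equal minus the numerical gradient
print(np.allclose(dw0, -g0, atol=1e-6), np.allclose(dw1, -g1, atol=1e-6))  # True True
```

If the slope were instead evaluated at the already-activated output (i.e. $\sigma'(a_2)$ rather than $\sigma'(z_2)$), this check would generally fail.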

### <font color='red'>**Q5: Please test your model with the test dataset** </font>

In [None]:
#Define a function to get your test error
#w0, w1 represent your trained model, test_data is your test data
def my_net(w0, w1, test_data):

    #Get your test data first
    X = test_data.iloc[:,0:3].values
    y = test_data.iloc[:,3].values
    y = np.reshape(y,(test_data.shape[0],1))

    #Input your data in the first layer
    a0 = X

    #Caculate the second layer
    a1 = sigm(np.dot(a0,w0))

    #Caculate the third layer
    a2 = sigm(np.dot(a1,w1))

    #What is your error?
    a2_error = y - a2

    #Finally return your error
    test_result = [1 if np.abs(e) < 0.5 else 0 for e in a2_error]
    return np.mean(test_result)

#Print the score
my_score = my_net(w0, w1, test_data)
print("Test Accuracy: " + str(my_score))

Test Accuracy: 0.22
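For binary labels, the rule `np.abs(e) < 0.5` is the same as predicting 1 when the network output exceeds 0.5 and comparing that prediction with the label, apart from the tie at exactly 0.5. The sketch below uses made-up probabilities to show that both forms give the same accuracy. The function names are illustrative.

```python
import numpy as np

def accuracy_from_error(y, p):
    """Accuracy as in the notebook: a prediction counts as correct when |y - p| < 0.5."""
    return np.mean(np.abs(y - p) < 0.5)

def accuracy_from_threshold(y, p, threshold=0.5):
    """Accuracy after converting probabilities to 0/1 labels with a cut-off."""
    y_hat = (p > threshold).astype(int)
    return np.mean(y_hat == y)

y = np.array([[0], [1], [1], [0], [0]])
p = np.array([[0.2], [0.9], [0.4], [0.7], [0.1]])   # made-up predicted probabilities

print(accuracy_from_error(y, p), accuracy_from_threshold(y, p))  # 0.6 0.6
```

The thresholded form also yields the predicted labels themselves, which is what a lending decision would act on.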


Note: Accuracy may change from run to run because both the initial weights and the train/test split are random.

If you have questions, please contact me.

Salih Tutun, PhD

salihtutun@wustl.edu