# Exercise 6.03: Classifying Credit Approval

In this exercise, we will use the German credit approval dataset to train a neural network that classifies whether an individual is creditworthy.

The following steps will help you complete the exercise:

1.- Import the `loadtxt` method from `numpy`:

```python
from numpy import loadtxt
```

2.- Create a variable called `url` containing the link to the raw dataset (the `data/german_scaled.csv` file):

```python
url = 'https://raw.githubusercontent.com/applied-data-mining-master/syllabus_intelligencesystems/main/data/german_scaled.csv'
```

3.- Load the data into a variable called `data` using `loadtxt()` and specify the `delimiter=','` parameter. Print its content:

Output:

```
array([[0.        , 0.33333333, 0.02941176, ..., 0.      , 1.      ,
        1.        ],
       [1.        , 0.        , 0.64705882, ..., 0.      , 0.      ,
        1.        ],
       [0.        , 1.        , 0.11764706, ..., 1.      , 0.      ,
        1.        ],
       ...,
       [0.        , 1.        , 0.11764706, ..., 0.      , 0.      ,
        1.        ],
       [1.        , 0.33333333, 0.60294118, ..., 0.      , 1.      ,
        1.        ],
       [0.        , 0.        , 0.60294118, ..., 0.      , 0.      ,
        1.        ]])
```

```python
data = loadtxt(url, delimiter=',')
data
```

```
array([[0.        , 0.33333333, 0.02941176, ..., 0.        , 1.        ,
        1.        ],
       [1.        , 0.        , 0.64705882, ..., 0.        , 0.        ,
        1.        ],
       [0.        , 1.        , 0.11764706, ..., 1.        , 0.        ,
        1.        ],
       ...,
       [0.        , 1.        , 0.11764706, ..., 0.        , 0.        ,
        1.        ],
       [1.        , 0.33333333, 0.60294118, ..., 0.        , 1.        ,
        1.        ],
       [0.        , 0.        , 0.60294118, ..., 0.        , 0.        ,
        1.        ]])
```
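Because loading from the URL needs network access, here is a small offline sketch of what `loadtxt()` returns. It parses a tiny made-up CSV held in memory (the values are illustrative only, not the real file, and `csv_text`/`sample` are hypothetical names): the result is a 2-D `float64` array with one row per line and one column per comma-separated field.

```python
from io import StringIO

import numpy as np

# Tiny, made-up CSV with the same layout idea as german_scaled.csv:
# first column = label, remaining columns = scaled features.
csv_text = StringIO(
    "0,0.33333333,0.02941176\n"
    "1,0.0,0.64705882\n"
    "0,1.0,0.11764706\n"
)

# loadtxt accepts any file-like object and parses every value as a float
sample = np.loadtxt(csv_text, delimiter=",")
print(sample.shape)  # (3, 3)
print(sample.dtype)  # float64
```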

4.- Create a variable called `label` that contains the data only from the first column (this will be our response variable):

```python
label = data[:, 0]
```

5.- Create a variable called `features` that contains all the data except for the first column (which corresponds to the response variable):

```python
features = data[:, 1:]
```
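The two slices in steps 4 and 5 can be checked on a toy array (a made-up example, not the German credit data): `[:, 0]` keeps every row but only column 0 and returns a 1-D array, while `[:, 1:]` keeps every row and all columns from index 1 onward.

```python
import numpy as np

# Toy array: 4 rows, first column is the label, the other 3 are features
toy = np.array([
    [0.0, 0.1, 0.2, 0.3],
    [1.0, 0.4, 0.5, 0.6],
    [0.0, 0.7, 0.8, 0.9],
    [1.0, 1.0, 0.0, 0.5],
])

toy_label = toy[:, 0]      # every row, column 0 only -> shape (4,)
toy_features = toy[:, 1:]  # every row, columns 1 to the end -> shape (4, 3)

print(toy_label)           # [0. 1. 0. 1.]
print(toy_features.shape)  # (4, 3)
```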

6.- Import the `train_test_split` function from `sklearn.model_selection`:

```python
from sklearn.model_selection import train_test_split
```

7.- Split the data into training and testing sets and save the results into four variables called `features_train`, `features_test`, `label_train`, and `label_test`. Use $20\%$ of the data for testing and specify `random_state=7`:

```python
features_train, features_test, label_train, label_test = train_test_split(
    features, label, test_size=0.2, random_state=7
)
```
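The idea behind a shuffled hold-out split can be sketched with NumPy alone. This is not scikit-learn's code: `shuffle_split` is a hypothetical helper that only mirrors the sizing rule scikit-learn's implementation applies to a float `test_size` (the test set size is rounded up), and it will not pick the same rows as `train_test_split(..., random_state=7)`. The 1,000 rows are an assumption, consistent with the 800 training observations reported later in this exercise.

```python
import math

import numpy as np


def shuffle_split(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Sketch only: test size is ceil(test_size * n_samples), the rest is
    training data. Row selection differs from sklearn's train_test_split.
    """
    n_test = math.ceil(test_size * n_samples)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # shuffled row indices
    return perm[n_test:], perm[:n_test]


train_idx, test_idx = shuffle_split(1000, 0.2, seed=7)
print(len(train_idx), len(test_idx))  # 800 200
```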

8.- Import `numpy` as `np`, `tensorflow` as `tf`, and `layers` from `tensorflow.keras`:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
```

9.- Set 1 as the seed for `numpy` and `tensorflow` using `np.random.seed()` and `tf.random.set_seed()`:

```python
np.random.seed(1)
tf.random.set_seed(1)
```
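Why seeding helps reproducibility can be shown with NumPy alone: re-seeding the global generator with the same value replays the same sequence of random draws. `tf.random.set_seed()` plays the analogous role for TensorFlow operations such as the initial layer weights. The variable names below are hypothetical.

```python
import numpy as np

# Seed, draw, then re-seed with the same value and draw again
np.random.seed(1)
first = np.random.rand(3)

np.random.seed(1)
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```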

10.- Instantiate a `tf.keras.Sequential()` class and save it into a variable called `model`:

```python
model = tf.keras.Sequential()
```

11.- Instantiate a `layers.Dense()` class with 16 neurons, `activation='relu'`, and `input_shape=[19]`, then save it into a variable called `layer1`:

```python
layer1 = layers.Dense(16, activation='relu', input_shape=[19])
```

12.- Instantiate a second `layers.Dense()` class with 1 neuron and `activation='sigmoid'`, then save it into a variable called `final_layer`:

```python
final_layer = layers.Dense(1, activation='sigmoid')
```

13.- Add the two layers you just defined to the model using `.add()`:

```python
model.add(layer1)
model.add(final_layer)
```
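To see what the two stacked `Dense` layers compute, here is a NumPy sketch of the same forward pass: `sigmoid(relu(x @ W1 + b1) @ W2 + b2)`. The weights are hypothetical random values with the shapes Keras would create (Keras uses its own initializer, so actual values differ), and `forward` is an illustrative helper, not a Keras API. The output is one probability per observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights with the same shapes as layer1 and final_layer
W1 = rng.normal(size=(19, 16))  # layer1 kernel: 19 inputs -> 16 neurons
b1 = np.zeros(16)               # layer1 bias
W2 = rng.normal(size=(16, 1))   # final_layer kernel: 16 -> 1
b2 = np.zeros(1)                # final_layer bias


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x):
    """Forward pass for a batch x of shape (n, 19)."""
    hidden = relu(x @ W1 + b1)        # shape (n, 16)
    return sigmoid(hidden @ W2 + b2)  # shape (n, 1), values between 0 and 1


batch = rng.uniform(size=(5, 19))  # 5 made-up scaled observations
probs = forward(batch)
print(probs.shape)  # (5, 1)
```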

14.- Instantiate a `tf.keras.optimizers.Adam()` class with 0.001 as the learning rate and save it into a variable called `optimizer`:

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```
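As a rough illustration of what the learning rate controls, here is a sketch of the Adam update rule as written in the original Adam paper, applied to a single scalar parameter. The defaults `beta_1=0.9`, `beta_2=0.999`, and `epsilon=1e-7` match Keras's documented defaults, but Keras places `epsilon` slightly differently internally, so results may differ in the last digits. `adam_minimize` is a hypothetical helper, not part of Keras.

```python
import math


def adam_minimize(grad, x0, learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                  epsilon=1e-7, steps=1):
    """Run `steps` Adam updates on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta_1 * m + (1 - beta_1) * g      # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)          # bias-corrected estimates
        v_hat = v / (1 - beta_2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Minimise f(x) = x**2 (gradient 2x) from x = 1.0; the first step moves x by
# roughly learning_rate, i.e. to about 0.999
print(adam_minimize(lambda x: 2 * x, 1.0, steps=1))
```

A smaller learning rate shrinks each step proportionally, which is why a lower rate typically needs more epochs to make the same progress.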

15.- Compile the neural network using `.compile()` with `loss='binary_crossentropy'`, `optimizer=optimizer`, and `metrics=['accuracy']`, as shown in the following code snippet:

```python
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
```
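The `binary_crossentropy` loss chosen here is the mean of `-(y*log(p) + (1-y)*log(1-p))` over the samples, where `y` is the 0/1 label and `p` the predicted probability from the sigmoid output. Below is a NumPy sketch of that formula; clipping `p` away from 0 and 1 is similar in spirit to what Keras does to keep the logarithm finite, but this helper is hypothetical and not the Keras implementation.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


# Always predicting 0.5 gives log(2), about 0.6931
print(binary_crossentropy([1, 0, 1], [0.5, 0.5, 0.5]))
# Confident, correct predictions give a much smaller loss
print(binary_crossentropy([1, 0, 1], [0.9, 0.1, 0.8]))
```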

16.- Print a summary of the model using `.summary()`:

Output:

![Figure 6.13](img/fig6_13.jpg)

```python
model.summary()
```

```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_2 (Dense)              (None, 16)                320
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 17
=================================================================
Total params: 337
Trainable params: 337
Non-trainable params: 0
_________________________________________________________________
```


This output summarizes the architecture of our neural network. We can see it is composed of two layers, as expected, along with each layer's output shape and number of parameters, which correspond to its weights and biases. For instance, the first layer has 16 neurons and 320 parameters to be learned: 19 × 16 = 304 weights plus 16 biases. The final layer has 16 weights plus 1 bias (17 parameters), for a total of 337.
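The parameter counts in the summary follow a simple rule for fully connected layers: one weight per (input, neuron) pair plus one bias per neuron. A small sketch of that arithmetic, using a hypothetical helper `dense_params`:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a Dense layer: n_inputs * n_units weights
    plus n_units biases."""
    return n_inputs * n_units + n_units


layer1_params = dense_params(19, 16)  # 19*16 + 16
final_params = dense_params(16, 1)    # 16*1 + 1
print(layer1_params, final_params, layer1_params + final_params)  # 320 17 337
```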

17.- Next, fit the neural network on the training set, specifying `epochs=10`:

Output:

![Figure 6.14](img/fig6_14.jpg)

```python
# fit() returns a History object holding the per-epoch loss and accuracy
history = model.fit(features_train, label_train, epochs=10)
history
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

<tensorflow.python.keras.callbacks.History at 0x7f41ec97fdd0>
```

The output provides a lot of information about the training of the neural network. The first line tells us that the training set contains 800 observations. Then, for each epoch, we can see:

- Total processing time in seconds
- Processing time per data sample, in us/sample (microseconds per sample)
- Loss value and accuracy score
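The timing figures in the log are related by simple arithmetic, sketched below. The 800 observations come from this exercise; `batch_size=32` is Keras's default when `fit()` is called without a `batch_size`, which gives the number of gradient updates per epoch. The 0.05-second epoch time is a made-up value, not a measured result.

```python
import math

n_train = 800    # training observations reported for this exercise
batch_size = 32  # Keras default when fit() gets no batch_size

# Number of mini-batches (gradient updates) per epoch
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 25

# Hypothetical timing: 0.05 s for one epoch over all 800 samples
epoch_seconds = 0.05
us_per_sample = epoch_seconds / n_train * 1e6  # seconds -> microseconds per sample
print(us_per_sample)
```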

The final result of this neural network is given by the last epoch (epoch 10), where we achieved an accuracy score of 0.6888. However, the accuracy score was still increasing after each epoch, so we may get better results by training the neural network for longer (increasing the number of epochs) or by tuning the learning rate.
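With a single sigmoid output and `loss='binary_crossentropy'`, the `'accuracy'` metric is resolved by Keras to binary accuracy: the fraction of samples whose predicted probability, thresholded at 0.5 (strictly greater counts as class 1), matches the label. Note that the 0.6888 above is measured on the training set. A NumPy sketch of the metric, with made-up labels and probabilities and a hypothetical helper name:

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(float)
    return float(np.mean(y_pred == np.asarray(y_true, dtype=float)))


labels = [1, 0, 1, 1, 0]
probs = [0.8, 0.3, 0.4, 0.9, 0.6]  # made-up model outputs
print(binary_accuracy(labels, probs))  # 0.6
```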