### Dataset description

The dataset used is available at: https://www.kaggle.com/burak3ergun/loan-data-set

The problem: a lending company wants to automate the eligibility check for customers applying for a loan.

To solve this problem, I built a neural network to perform the classification.

## Packages

In [30]:
!pip install tensorrt



In [35]:
import numpy as np
import tensorflow as tf
from tensorflow import keras  
import pandas as pd 


### Loading the dataset

In [36]:
data = pd.read_csv("loan_data_set.csv")

In [37]:
data.head()

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y


Printing the dataset information

In [38]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             601 non-null    object 
 2   Married            611 non-null    object 
 3   Dependents         599 non-null    object 
 4   Education          614 non-null    object 
 5   Self_Employed      582 non-null    object 
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB


## Checking the dataset integrity

Checking for missing values

In [39]:
data.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

Checking the most frequent value of each attribute that has missing data, so that the gaps can be filled with it.

According to the cell above, these attributes are `Gender`, `Married`, `Dependents`, `Self_Employed`, `LoanAmount`, `Loan_Amount_Term`, and `Credit_History`.


In [40]:
data['Gender'].value_counts()

Gender
Male      489
Female    112
Name: count, dtype: int64

In [41]:
data['Married'].value_counts()

Married
Yes    398
No     213
Name: count, dtype: int64

In [42]:
data['Dependents'].value_counts()

Dependents
0     345
1     102
2     101
3+     51
Name: count, dtype: int64

In [43]:
data['Self_Employed'].value_counts()

Self_Employed
No     500
Yes     82
Name: count, dtype: int64

In [44]:
data['LoanAmount'].value_counts()

LoanAmount
120.0    20
110.0    17
100.0    15
160.0    12
187.0    12
         ..
240.0     1
214.0     1
59.0      1
166.0     1
253.0     1
Name: count, Length: 203, dtype: int64

In [45]:
data['Loan_Amount_Term'].value_counts()

Loan_Amount_Term
360.0    512
180.0     44
480.0     15
300.0     13
240.0      4
84.0       4
120.0      3
60.0       2
36.0       2
12.0       1
Name: count, dtype: int64

In [46]:
data['Credit_History'].value_counts()

Credit_History
1.0    475
0.0     89
Name: count, dtype: int64

In the cell below I fill the categorical attributes with their most frequent value, and the non-categorical attributes `LoanAmount` and `Loan_Amount_Term` with their mean.

In [47]:
data['Gender'] = data['Gender'].fillna('Male')
data['Married'] = data['Married'].fillna('Yes')
data['Dependents'] = data['Dependents'].fillna('0')
data['Self_Employed'] = data['Self_Employed'].fillna('No')
data['LoanAmount'] = data['LoanAmount'].fillna(data['LoanAmount'].mean())
data['Loan_Amount_Term'] = data['Loan_Amount_Term'].fillna(data['Loan_Amount_Term'].mean())
data['Credit_History'] = data['Credit_History'].fillna(1.0)
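
The fill values above are hard-coded from the `value_counts()` outputs. As a minimal sketch, the same rule (mode for categorical columns, mean for numeric ones) can be derived programmatically. The toy DataFrame and the two column lists below are made up for illustration and are not the loan dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one gap in each column.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", None],
    "LoanAmount": [100.0, 200.0, np.nan, 300.0],
})

categorical_cols = ["Gender"]   # filled with the most frequent value
numeric_cols = ["LoanAmount"]   # filled with the column mean

for col in categorical_cols:
    toy[col] = toy[col].fillna(toy[col].mode()[0])
for col in numeric_cols:
    toy[col] = toy[col].fillna(toy[col].mean())

print(toy)
```

Deriving the values this way keeps the imputation in sync with the data if the CSV file changes.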

Checking the missing values again

In [49]:
data.isnull().sum()

Loan_ID              0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

## Transforming categorical data

Several columns of the dataframe are categorical, that is, they hold characters or strings, so they need to be converted to numeric values. They are: `Gender`, `Married`, `Education`, `Self_Employed`, and `Property_Area`. The cell below also maps `Dependents` and the target `Loan_Status` to integers.

In [50]:
# Explicit value -> integer mappings, one dictionary per categorical column
gender_values = {'Female' : 0, 'Male' : 1} 
married_values = {'No' : 0, 'Yes' : 1}
education_values = {'Graduate' : 0, 'Not Graduate' : 1}
employed_values = {'No' : 0, 'Yes' : 1}
dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1}
loan_values = {'Y':1,'N':0}
area_values = {'Semiurban':2,'Rural':1,'Urban':0}
data.replace({'Gender': gender_values,
              'Married': married_values, 
              'Education': education_values,
              'Self_Employed': employed_values, 
              'Dependents': dependent_values,
              'Loan_Status': loan_values,
              'Property_Area': area_values,
              }, inplace=True)
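
`DataFrame.replace` leaves any value that is missing from the dictionary unchanged, so an unexpected category would silently remain a string. A minimal sketch of one way to check for that, using `Series.map`, which yields `NaN` for unmapped values. The toy column and the label "Suburb" are made up for illustration.

```python
import pandas as pd

area_values = {"Semiurban": 2, "Rural": 1, "Urban": 0}

# Hypothetical column containing one label that the dictionary does not cover.
area = pd.Series(["Urban", "Rural", "Semiurban", "Suburb"])

encoded = area.map(area_values)            # unmapped labels become NaN
unmapped = area[encoded.isna()].unique()   # labels missing from the dictionary

print(encoded.tolist())
print("Unmapped labels:", list(unmapped))
```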



Dropping the `Loan_ID` column: since it is only an identifier code, it must not influence the model.

In [51]:
data.drop(['Loan_ID'],axis=1,inplace=True)

Checking that all categorical data has been replaced by numeric values

In [52]:
data.head()

Unnamed: 0,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,1,0,0,0,0,5849,0.0,146.412162,360.0,1.0,0,1
1,1,1,1,0,0,4583,1508.0,128.0,360.0,1.0,1,0
2,1,1,0,0,1,3000,0.0,66.0,360.0,1.0,0,1
3,1,1,0,1,0,2583,2358.0,120.0,360.0,1.0,0,1
4,1,0,0,0,0,6000,0.0,141.0,360.0,1.0,0,1


## Splitting the dataset into training and test sets

In [53]:
X = data.drop("Loan_Status", axis=1)
Y = data["Loan_Status"]

In [54]:
from sklearn.model_selection import train_test_split
train_set_X, test_set_X, train_set_Y, test_set_Y = train_test_split(X, Y, test_size=0.10, random_state=7)

Standardizing the data (zero mean and unit variance per attribute) so that all attributes are on a comparable scale.

In [55]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

train_set_X = scaler.fit_transform(train_set_X)
test_set_X  = scaler.fit_transform(test_set_X)
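
Note that the cell above calls `fit_transform` on the test set as well, which re-estimates the mean and standard deviation from the test data. A common alternative is to estimate the statistics on the training set only and reuse them for the test set (with `StandardScaler`, that is `fit_transform` on the training set and `transform` on the test set). A minimal NumPy sketch of the idea on made-up arrays:

```python
import numpy as np

# Made-up feature matrices (rows = examples, columns = attributes).
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Statistics are estimated on the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(test_scaled)
```

With this approach the test set is transformed exactly as unseen data would be at prediction time. The outputs recorded in this notebook were produced with the original cell.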

In [56]:

print("train_set_X:\n", train_set_X[:2,:])
print("\ntest_set_X:\n", test_set_X[:2,:])

train_set_X:
 [[-2.02758751  0.73321515  1.2801035  -0.51861886 -0.39652579  1.48071539
  -0.54393797 -0.9157385   0.28778621  0.41169348 -1.22874108]
 [ 0.49319696  0.73321515  0.27922937 -0.51861886 -0.39652579  0.40852018
  -0.46477947  1.21341834  0.28778621  0.41169348 -1.22874108]]

test_set_X:
 [[ 0.26261287  0.69006556 -0.89016684 -0.6146363  -0.35675303 -0.74673396
  -0.82746285 -1.10113851  0.20754185 -2.4267033  -1.474686  ]
 [ 0.26261287  0.69006556  1.91612183  1.62697843 -0.35675303 -0.79206856
  -0.37295524 -0.53369795  2.14251848  0.41208169  0.93138063]]


In [57]:
n = train_set_X.shape[1]
m = train_set_X.shape[0]

print ("Number of attributes: n = " + str(n))
print ("Number of training examples: m = " + str(m))
print ("Train set X shape: " + str(train_set_X.shape))
print ("Train set Y shape: " + str(train_set_Y.shape))
print ("Test set X shape: " + str(test_set_X.shape))
print ("Test set Y shape: " + str(test_set_Y.shape))

Number of attributes: n = 11
Number of training examples: m = 552
Train set X shape: (552, 11)
Train set Y shape: (552,)
Test set X shape: (62, 11)
Test set Y shape: (62,)
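
The shapes above follow from the 614 rows and `test_size=0.10`. A minimal arithmetic sketch, assuming the number of test examples is rounded up and the remainder goes to training (which is consistent with the 552/62 split printed above):

```python
import math

n_samples = 614
test_size = 0.10

# Assumption: the test size is rounded up; the rest is used for training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 552 62
```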


## Building the model

I tried several configurations of layers, neurons, and activation functions, and kept the one that performed best.

In [58]:
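inputs = keras.Input(shape=(train_set_X.shape[1],))  # shape is passed as a tuple
# Note: each hidden layer below is applied to `inputs`, so `x` is overwritten
# and only the last 8-unit layer feeds the output (see model.summary()).
x = keras.layers.Dense(units=8, activation="tanh")(inputs)
x = keras.layers.Dense(units=12, activation="tanh")(inputs)
x = keras.layers.Dense(units=8, activation="tanh")(inputs)
outputs = keras.layers.Dense(units=1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)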





In [59]:
processed_data = model(train_set_X)
print(processed_data.shape)

(552, 1)


In [60]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 11)]              0         
                                                                 
 dense_2 (Dense)             (None, 8)                 96        
                                                                 
 dense_3 (Dense)             (None, 1)                 9         
                                                                 
Total params: 105 (420.00 Byte)
Trainable params: 105 (420.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
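
The parameter counts in the summary can be checked by hand: a `Dense` layer with bias has `inputs * units + units` parameters. A minimal sketch (the helper `dense_params` is hypothetical, not a Keras function):

```python
def dense_params(n_inputs: int, units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * units + units


hidden = dense_params(11, 8)  # 11 attributes -> 8 tanh units
output = dense_params(8, 1)   # 8 units -> 1 sigmoid unit

print(hidden, output, hidden + output)  # 96 9 105
```

The total of 105 matches the summary, which lists a single hidden layer of 8 units followed by the output layer.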


## Compilation

I tried other metrics and optimizers, but they produced worse results than the ones presented in class.

In [61]:
model.compile(optimizer="RMSprop", loss="mean_absolute_error", metrics=["accuracy","Precision","Recall"])
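
The model is compiled with mean absolute error as the loss. For a sigmoid output with 0/1 labels, binary cross-entropy is the more common choice, because it penalizes confident wrong predictions much more strongly. A minimal sketch comparing the two per-example losses in plain Python; the probabilities are made up and the two helper functions are illustrative, not Keras APIs.

```python
import math


def mae(y_true: float, p: float) -> float:
    """Absolute error between the label and the predicted probability."""
    return abs(y_true - p)


def binary_cross_entropy(y_true: float, p: float) -> float:
    """Negative log-likelihood of the label under probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# True label is 1; compare a mildly wrong and a confidently wrong prediction.
for p in (0.4, 0.01):
    print(f"p={p}: MAE={mae(1, p):.2f}  BCE={binary_cross_entropy(1, p):.2f}")
```

In Keras this loss can be selected with `loss="binary_crossentropy"` in `model.compile`; whether it improves the results on this dataset would need to be checked experimentally.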

## Training the model

In [62]:
history = model.fit(train_set_X, train_set_Y, batch_size=64, epochs=1000)

print(history.history)

Epoch 1/1000
Epoch 2/1000
...
Epoch 72/1000
...

## Model evaluation

Evaluating the network's performance on the test set.

In [63]:
loss, acc, prec, rec = model.evaluate(test_set_X, test_set_Y)
print("Loss: %.2f" % loss,  "\nAccuracy: %.2f" % acc, "\nPrecision: %.2f" % prec, "\nRecall: %.2f" % rec)

Loss: 0.19 
Accuracy: 0.81 
Precision: 0.80 
Recall: 0.95


## Model predictions

In [64]:
predictions = model.predict(test_set_X)
print("Predictions: ", [round(x[0]) for x in predictions])
print("\nCorrect:     ", [round(x) for x in test_set_Y])

Predictions:  [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]

Correct:      [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1]
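
The metrics reported by `model.evaluate` can be reproduced from the two lists printed above. A minimal sketch in plain Python, with both lists copied from that output:

```python
predictions = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
correct = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1]

# Count the four cells of the confusion matrix.
pairs = list(zip(predictions, correct))
tp = sum(p == 1 and c == 1 for p, c in pairs)  # approved and predicted approved
fp = sum(p == 1 and c == 0 for p, c in pairs)  # rejected but predicted approved
fn = sum(p == 0 and c == 1 for p, c in pairs)  # approved but predicted rejected
tn = sum(p == 0 and c == 0 for p, c in pairs)  # rejected and predicted rejected

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  Recall: {recall:.2f}")
```

The counts show that most errors are false positives (10 of the 12 mistakes), which is why recall is much higher than precision.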


## Comparison with other models

Unfortunately, I did not find another deep learning model for this dataset. However, compared with a machine learning model presented by the user [Rakshmitha Madhevan](https://www.kaggle.com/rakshmithamadhevan) and with my own machine learning model, this deep learning model performs better: the machine learning models reach an accuracy of around 78%, while this model reaches approximately 81%.

Machine learning model posted on Kaggle by Rakshmitha Madhevan: https://www.kaggle.com/rakshmithamadhevan/are-you-getting-the-loan-loan-status-prediction

## Acknowledgments

I would like to thank Professor Denilson for offering this introductory course on Neural Networks and Deep Learning. It has certainly helped a great deal on my path to acquiring knowledge in artificial intelligence.

## References

GITHUB. **An overview of activation functions used in neural networks**. Available at: https://adl1995.github.io/an-overview-of-activation-functions-used-in-neural-networks.html. Accessed: 3 Aug. 2021.

MACHINELEARNINGMASTERY. **How to Choose an Activation Function for Deep Learning**. Available at: https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/. Accessed: 3 Aug. 2021.

TOWARDSDATASCIENCE. **Deep Learning: Which Loss and Activation Functions should I use?**. Available at: https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8. Accessed: 3 Aug. 2021.

PLURALSIGHT. **A Deep Learning Model to Perform Binary Classification**. Available at: https://www.pluralsight.com/guides/deep-learning-model-perform-binary-classification. Accessed: 3 Aug. 2021.