# Course: Neural Networks and Deep Learning

Prof. Denilson Alves Pereira 
https://sites.google.com/ufla.br/denilsonpereira/ 
Departamento de Ciência da Computação - 
Instituto de Ciências Exatas e Tecnológicas - 
Universidade Federal de Lavras

# Problem

Build an artificial neural network (ANN) to predict whether a water sample is potable, based on the measurements in the dataset available at https://www.kaggle.com/datasets/adityakadiwal/water-potability

The input features are:
- pH value
- hardness, caused by calcium and magnesium salts
- total dissolved solids
- chloramine concentration
- sulfate concentration
- conductivity
- organic carbon concentration
- trihalomethane concentration
- turbidity

The output is binary: whether the water is potable or not.

# Packages

In [1]:
import numpy as np # scientific computing
import tensorflow as tf # numerical computation on the data
from tensorflow import keras # deep learning
import pandas as pd # data analysis

# Preprocessing the training and test data

## Reading the dataset

In [2]:
data = pd.read_csv('potabilidade.csv') # reads the dataset
data.head(10) # shows the first 10 rows of the dataset

Unnamed: 0,ph,Hardness,Solids,Chloramines,Sulfate,Conductivity,Organic_carbon,Trihalomethanes,Turbidity,Potability
0,,204.890455,20791.318981,7.300212,368.516441,564.308654,10.379783,86.99097,2.963135,0
1,3.71608,129.422921,18630.057858,6.635246,,592.885359,15.180013,56.329076,4.500656,0
2,8.099124,224.236259,19909.541732,9.275884,,418.606213,16.868637,66.420093,3.055934,0
3,8.316766,214.373394,22018.417441,8.059332,356.886136,363.266516,18.436524,100.341674,4.628771,0
4,9.092223,181.101509,17978.986339,6.5466,310.135738,398.410813,11.558279,31.997993,4.075075,0
5,5.584087,188.313324,28748.687739,7.544869,326.678363,280.467916,8.399735,54.917862,2.559708,0
6,10.223862,248.071735,28749.716544,7.513408,393.663396,283.651634,13.789695,84.603556,2.672989,0
7,8.635849,203.361523,13672.091764,4.563009,303.309771,474.607645,12.363817,62.798309,4.401425,0
8,,118.988579,14285.583854,7.804174,268.646941,389.375566,12.706049,53.928846,3.595017,0
9,11.180284,227.231469,25484.508491,9.0772,404.041635,563.885481,17.927806,71.976601,4.370562,0


## Removing rows with missing data

Missing data is handled by removing every row that contains one or more null (NaN) values. The consequence is a large reduction in the size of the available dataset.

In [3]:
data.dropna(axis="index", how='any', inplace=True) # removes every row that contains a null value
data.head(10) # shows the first 10 rows of the dataset after the removal

Unnamed: 0,ph,Hardness,Solids,Chloramines,Sulfate,Conductivity,Organic_carbon,Trihalomethanes,Turbidity,Potability
3,8.316766,214.373394,22018.417441,8.059332,356.886136,363.266516,18.436524,100.341674,4.628771,0
4,9.092223,181.101509,17978.986339,6.5466,310.135738,398.410813,11.558279,31.997993,4.075075,0
5,5.584087,188.313324,28748.687739,7.544869,326.678363,280.467916,8.399735,54.917862,2.559708,0
6,10.223862,248.071735,28749.716544,7.513408,393.663396,283.651634,13.789695,84.603556,2.672989,0
7,8.635849,203.361523,13672.091764,4.563009,303.309771,474.607645,12.363817,62.798309,4.401425,0
9,11.180284,227.231469,25484.508491,9.0772,404.041635,563.885481,17.927806,71.976601,4.370562,0
10,7.36064,165.520797,32452.614409,7.550701,326.624353,425.383419,15.58681,78.740016,3.662292,0
12,7.119824,156.704993,18730.813653,3.606036,282.34405,347.715027,15.929536,79.500778,3.445756,0
15,6.347272,186.732881,41065.234765,9.629596,364.487687,516.743282,11.539781,75.071617,4.376348,0
17,9.18156,273.813807,24041.32628,6.90499,398.350517,477.974642,13.387341,71.457362,4.503661,0
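
To see how much data this strategy discards, the row counts before and after `dropna` can be compared. The sketch below uses a small made-up DataFrame (`toy`) rather than the real `potabilidade.csv`, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "ph": [7.0, np.nan, 8.1, 6.5, np.nan],
    "Sulfate": [350.0, 330.0, np.nan, 310.0, 300.0],
    "Potability": [0, 1, 0, 1, 0],
})

missing_per_column = toy.isna().sum()  # number of NaN values in each column
rows_before = len(toy)
rows_after = len(toy.dropna(axis="index", how="any"))  # same call as above, not in place
dropped_fraction = 1 - rows_after / rows_before

print(missing_per_column)
print(f"rows kept: {rows_after} of {rows_before} ({dropped_fraction:.0%} dropped)")
```

If too many rows are lost, imputing the missing values (for example with the column median) is a common alternative, at the cost of introducing estimated measurements.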


## Separating the inputs and outputs

### Inputs

In [4]:
X = data.drop("Potability", axis=1)
X.head(10)

Unnamed: 0,ph,Hardness,Solids,Chloramines,Sulfate,Conductivity,Organic_carbon,Trihalomethanes,Turbidity
3,8.316766,214.373394,22018.417441,8.059332,356.886136,363.266516,18.436524,100.341674,4.628771
4,9.092223,181.101509,17978.986339,6.5466,310.135738,398.410813,11.558279,31.997993,4.075075
5,5.584087,188.313324,28748.687739,7.544869,326.678363,280.467916,8.399735,54.917862,2.559708
6,10.223862,248.071735,28749.716544,7.513408,393.663396,283.651634,13.789695,84.603556,2.672989
7,8.635849,203.361523,13672.091764,4.563009,303.309771,474.607645,12.363817,62.798309,4.401425
9,11.180284,227.231469,25484.508491,9.0772,404.041635,563.885481,17.927806,71.976601,4.370562
10,7.36064,165.520797,32452.614409,7.550701,326.624353,425.383419,15.58681,78.740016,3.662292
12,7.119824,156.704993,18730.813653,3.606036,282.34405,347.715027,15.929536,79.500778,3.445756
15,6.347272,186.732881,41065.234765,9.629596,364.487687,516.743282,11.539781,75.071617,4.376348
17,9.18156,273.813807,24041.32628,6.90499,398.350517,477.974642,13.387341,71.457362,4.503661


### Outputs

In [5]:
Y = data["Potability"]
Y.head(10)

3     0
4     0
5     0
6     0
7     0
9     0
10    0
12    0
15    0
17    0
Name: Potability, dtype: int64

## Creating training and test sets for the inputs and outputs

In [6]:
from sklearn.model_selection import train_test_split
train_set_X, test_set_X, train_set_Y, test_set_Y = train_test_split(X, Y, test_size=0.20)

## Standardizing the features

In [7]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train_set_X = scaler.fit_transform(train_set_X) # learns mean and std from the training set only
test_set_X  = scaler.transform(test_set_X) # reuses the training statistics on the test set
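
Standardization statistics should come from the training set only and then be reused on the test set, so that no information from the test data leaks into preprocessing. The NumPy sketch below mirrors what `StandardScaler` computes (per-column mean and population standard deviation); the arrays `train` and `test` are made up for illustration.

```python
import numpy as np

# Made-up data: 3 training rows and 1 test row, 2 features each.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

mean = train.mean(axis=0)  # per-column mean of the training set
std = train.std(axis=0)    # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # test set reuses the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After this step the training columns have zero mean and unit variance, while the test columns generally do not, which is expected.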

## Getting the number of features and training examples

In [8]:
n = train_set_X.shape[1] # number of features
m = train_set_X.shape[0] # number of training examples

print("Number of features: n = " + str(n))
print("Number of training examples: m = " + str(m))

Number of features: n = 9
Number of training examples: m = 1608


# Model definition
- Input layer: matches the 9 inputs of the problem
- Layer 1: 7 neurons, *ReLU* activation function
- Layer 2 (output): 1 neuron, *Sigmoid* activation function

In [9]:
inputs = keras.Input(shape=(n,)) # shape is passed as a tuple
x = keras.layers.Dense(units=7, activation="relu")(inputs)
outputs = keras.layers.Dense(units=1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 9)]               0         
                                                                 
 dense (Dense)               (None, 7)                 70        
                                                                 
 dense_1 (Dense)             (None, 1)                 8         
                                                                 
Total params: 78
Trainable params: 78
Non-trainable params: 0
_________________________________________________________________
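
The parameter counts in the summary can be checked by hand: a dense layer with `n_in` inputs and `units` neurons has `n_in * units` weights plus `units` biases. A small sketch (the helper `dense_params` is a hypothetical name, not part of Keras):

```python
def dense_params(n_in: int, units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return n_in * units + units

hidden = dense_params(9, 7)  # 9*7 weights + 7 biases
output = dense_params(7, 1)  # 7*1 weights + 1 bias
total = hidden + output

print(hidden, output, total)
```

These values match the 70, 8 and 78 parameters listed in the summary.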



# Model compilation

In [10]:
model.compile(optimizer="RMSprop", loss="mean_absolute_error", metrics=["accuracy", "Precision", "Recall"])
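
For a sigmoid output in a binary classification task, binary cross-entropy is a more common loss choice than mean absolute error, because it penalizes confident wrong predictions much more strongly. The NumPy sketch below compares the two on made-up labels and probabilities; it is an illustration only and does not change the experiment reported in this notebook. In Keras the corresponding option would be `loss="binary_crossentropy"`.

```python
import numpy as np

# Made-up labels and predicted probabilities (illustrative only).
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.1, 0.01, 0.6])  # the third prediction is confidently wrong

# Mean absolute error contribution of each sample.
mae_per_sample = np.abs(y_true - y_prob)

# Binary cross-entropy contribution of each sample.
bce_per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print("MAE per sample:", mae_per_sample.round(3))
print("BCE per sample:", bce_per_sample.round(3))
```

The absolute error of any single sample is bounded by 1, whereas the cross-entropy term grows without bound as a wrong prediction becomes more confident.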

# Model training

In [11]:
model.fit(train_set_X, train_set_Y, batch_size=32, epochs=500)

Epoch 1/500
Epoch 2/500
Epoch 3/500
...

<keras.callbacks.History at 0x7f2b68972830>

# Model evaluation

In [12]:
loss, acc, prec, rec = model.evaluate(test_set_X, test_set_Y)
print("Loss: %.2f" % loss, "\nAccuracy: %.2f" % acc, "\nPrecision: %.2f" % prec, "\nRecall: %.2f" % rec)

Loss: 0.29 
Accuracy: 0.71 
Precision: 0.67 
Recall: 0.43
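
Accuracy, precision and recall are all derived from the confusion-matrix counts at a 0.5 decision threshold. The sketch below recomputes them with NumPy on made-up labels (`y_true` and `y_pred` are illustrative, not this notebook's test set). A recall of 0.43, as reported above, means that fewer than half of the truly potable samples were identified as potable.

```python
import numpy as np

# Made-up ground truth and predicted labels (illustrative only).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 0, 1, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the samples predicted potable, how many are
recall = tp / (tp + fn)     # of the truly potable samples, how many were found

print(f"Accuracy: {accuracy:.2f}  Precision: {precision:.2f}  Recall: {recall:.2f}")
```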


# Prediction

In [13]:
predictions = model.predict(test_set_X)
print("Prediction: ", [round(x[0]) for x in predictions])
print()
print("Actual: ", [round(x) for x in test_set_Y])

Prediction:  [0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0
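
Rounding each sigmoid output is equivalent to applying a decision threshold of 0.5. A vectorized sketch of the same step with NumPy, using made-up probabilities (`probs`) in place of the output of `model.predict`:

```python
import numpy as np

# Made-up probabilities with shape (4, 1), the shape model.predict returns here.
probs = np.array([[0.12], [0.87], [0.50], [0.51]])

threshold = 0.5  # assumed decision threshold; matches what round() does above
labels = (probs[:, 0] > threshold).astype(int)

print(labels.tolist())
```

Lowering the threshold would classify more samples as potable, which typically raises recall at the expense of precision.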