# Artificial Neural Networks


Illustrates how the neural network algorithm works.

Prof. Hugo de Paula

-------------------------------------------------------------------------------

### Dataset: Sonar, Mines vs. Rocks

https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Sonar,+Mines+vs.+Rocks%29

208 instances

60 attributes

2 classes (rock, mine), labeled `Rocha` and `Mina` in the CSV



In [1]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
import numpy as np

np.set_printoptions(precision=2)



### Loading the data


In [3]:
sonar = pd.read_csv('./data/sonar.csv', sep=";", decimal=",")
sonar

| | Atributo_1 | Atributo_2 | Atributo_3 | Atributo_4 | Atributo_5 | Atributo_6 | Atributo_7 | Atributo_8 | Atributo_9 | Atributo_10 | ... | Atributo_52 | Atributo_53 | Atributo_54 | Atributo_55 | Atributo_56 | Atributo_57 | Atributo_58 | Atributo_59 | Atributo_60 | Classe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0200 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.0180 | 0.0084 | 0.0090 | 0.0032 | Rocha |
| 1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.0140 | 0.0049 | 0.0052 | 0.0044 | Rocha |
| 2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.2280 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.0180 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | Rocha |
| 3 | 0.0100 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.0150 | 0.0085 | 0.0073 | 0.0050 | 0.0044 | 0.0040 | 0.0117 | Rocha |
| 4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.0590 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.0110 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | Rocha |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 203 | 0.0187 | 0.0346 | 0.0168 | 0.0177 | 0.0393 | 0.1630 | 0.2028 | 0.1694 | 0.2328 | 0.2684 | ... | 0.0116 | 0.0098 | 0.0199 | 0.0033 | 0.0101 | 0.0065 | 0.0115 | 0.0193 | 0.0157 | Mina |
| 204 | 0.0323 | 0.0101 | 0.0298 | 0.0564 | 0.0760 | 0.0958 | 0.0990 | 0.1018 | 0.1030 | 0.2154 | ... | 0.0061 | 0.0093 | 0.0135 | 0.0063 | 0.0063 | 0.0034 | 0.0032 | 0.0062 | 0.0067 | Mina |
| 205 | 0.0522 | 0.0437 | 0.0180 | 0.0292 | 0.0351 | 0.1171 | 0.1257 | 0.1178 | 0.1258 | 0.2529 | ... | 0.0160 | 0.0029 | 0.0051 | 0.0062 | 0.0089 | 0.0140 | 0.0138 | 0.0077 | 0.0031 | Mina |
| 206 | 0.0303 | 0.0353 | 0.0490 | 0.0608 | 0.0167 | 0.1354 | 0.1465 | 0.1123 | 0.1945 | 0.2354 | ... | 0.0086 | 0.0046 | 0.0126 | 0.0036 | 0.0035 | 0.0034 | 0.0079 | 0.0036 | 0.0048 | Mina |



### Data transformation

The class attribute is converted into sequential integer labels (0, 1, ...), one per distinct class, where `dados` stands for the column of class names:

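<code>
 from sklearn import preprocessing
 le = preprocessing.LabelEncoder()
 le.fit(dados)               # learns the distinct class names, stored in le.classes_
 y = le.transform(dados)     # maps each class name to an integer from 0 to n_classes - 1
</code>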
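A minimal runnable sketch of the encoding step, using a hypothetical toy list with the same class names as the CSV (`Rocha`, `Mina`). It shows that `LabelEncoder` sorts the class names alphabetically, so `Mina` becomes 0 and `Rocha` becomes 1; this is the order stored in `le.classes_`, which the notebook later passes to `classification_report` as `target_names`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy class column, using the same class names as the Sonar CSV
classes = ["Rocha", "Rocha", "Mina", "Rocha", "Mina"]

le = LabelEncoder()
y = le.fit_transform(classes)   # fit + transform in a single call

print(le.classes_)                    # ['Mina' 'Rocha'] (sorted alphabetically)
print(y)                              # [1 1 0 1 0]
print(le.inverse_transform([0, 1]))   # ['Mina' 'Rocha']
```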


### Splitting the dataset

<code>train_test_split(X, y)</code> splits the original dataset into a training set and a test set.

In the code below, 10% of the data is used for testing and 90% for training; since the test fraction is rounded up, the 208 instances yield 21 test and 187 training instances.


In [4]:
X = sonar.iloc[:,0:(sonar.shape[1] - 1)]

le = LabelEncoder()
y = le.fit_transform(sonar.iloc[:,(sonar.shape[1] - 1)])

class_names = le.classes_


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
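A sketch of the resulting split sizes, assuming a synthetic stand-in array with the Sonar shape (random values and a hypothetical class balance, not the real data). The `stratify=y` variant is an optional alternative that this notebook does not use; it keeps the class proportions of both splits close to those of the full dataset, which can matter when the test set has only 21 rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the Sonar shape: 208 rows, 60 attributes
rng = np.random.default_rng(0)
X = rng.random((208, 60))
y = np.array([0] * 100 + [1] * 108)   # hypothetical class balance, not the real one

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
print(X_train.shape, X_test.shape)   # (187, 60) (21, 60)

# Optional: stratify=y keeps the class proportions similar in both splits
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)
print(np.bincount(ys_te))   # number of test rows per class
```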



### MLP with one hidden layer

The following cell tests a neural network with one hidden layer of 100 neurons, which is the `MLPClassifier` default (`hidden_layer_sizes=(100,)`).

In total there are 6,100 connection weights (60 × 100 + 100 × 1) to be adjusted during training, plus 101 bias terms.
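The count above can be checked with a small helper. `count_parameters` is a hypothetical function written for this illustration: it multiplies the sizes of consecutive layers to count connection weights and, separately, counts one bias per non-input neuron. The last lines cross-check the result against the `coefs_` and `intercepts_` attributes of an `MLPClassifier` fitted on random data with the Sonar shape (the fit itself is meaningless; only the array sizes matter).

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier


def count_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Hypothetical helper: return (weights, biases) of a fully connected MLP."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    biases = sum(sizes[1:])   # one bias per neuron in every non-input layer
    return weights, biases


print(count_parameters(60, [100], 1))       # (6100, 101)
print(count_parameters(60, [100, 60], 1))   # (12060, 161)
print(count_parameters(60, [31], 1))        # (1891, 32)

# Cross-check against scikit-learn on random binary data with 60 attributes
warnings.filterwarnings("ignore", category=ConvergenceWarning)
rng = np.random.default_rng(0)
X = rng.random((50, 60))
y = rng.integers(0, 2, size=50)
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=5, random_state=0).fit(X, y)
print(sum(w.size for w in mlp.coefs_), sum(b.size for b in mlp.intercepts_))   # 6100 101
```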

The `solver='lbfgs'` parameter was chosen because it is better suited to training on small datasets (fewer than a few thousand records), where it tends to converge faster than the default `adam` solver.
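A sketch of how the two solvers can be compared, assuming a synthetic dataset from `make_classification` with the same size as Sonar (208 rows, 60 attributes). The numbers it prints say nothing about the real data; the point is the comparison pattern, reading the test accuracy and the number of iterations (`n_iter_`) for each solver. Convergence warnings are silenced because `adam` may stop at `max_iter` on such a small problem.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in with the Sonar size; not the real Sonar data
X, y = make_classification(n_samples=208, n_features=60, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

scores = {}
for solver in ("lbfgs", "adam"):
    clf = MLPClassifier(solver=solver, random_state=0, max_iter=200)
    clf.fit(X_train, y_train)
    scores[solver] = (clf.score(X_test, y_test), clf.n_iter_)
    print(solver, "test accuracy:", round(scores[solver][0], 2), "iterations:", clf.n_iter_)
```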

In [5]:
# Multilayer Perceptron (MLP): default configuration (except for the solver), optimizing
# the log-loss function, with one hidden layer of 100 neurons.

mlp = MLPClassifier(solver='lbfgs', random_state=0)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)


print("Number of layers: {}".format(mlp.n_layers_))
print("Hidden layer sizes: {}".format(mlp.hidden_layer_sizes))
print("Neurons in the output layer: {}".format(mlp.n_outputs_))
print("Weights from the input layer: {}".format(mlp.coefs_[0].shape))
print("Weights from the 1st hidden layer: {}".format(mlp.coefs_[1].shape))

print("Training set accuracy: {:.2f}".format(mlp.score(X_train, y_train)))
print("Test set accuracy: {:.2f}".format(mlp.score(X_test, y_test)))


print(classification_report(y_test, y_pred, target_names=class_names))

# Compute the confusion matrix
cnf_matrix = confusion_matrix(y_test, y_pred)
print(cnf_matrix)

Number of layers: 3
Hidden layer sizes: (100,)
Neurons in the output layer: 1
Weights from the input layer: (60, 100)
Weights from the 1st hidden layer: (100, 1)
Training set accuracy: 1.00
Test set accuracy: 0.86
              precision    recall  f1-score   support

        Mina       0.78      0.88      0.82         8
       Rocha       0.92      0.85      0.88        13

    accuracy                           0.86        21
   macro avg       0.85      0.86      0.85        21
weighted avg       0.86      0.86      0.86        21

[[ 7  1]
 [ 2 11]]
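The per-class figures in the report can be recomputed from the confusion matrix printed above. In scikit-learn, rows are the true classes and columns the predicted classes, in the order of `le.classes_` (index 0 = `Mina`, index 1 = `Rocha`). The sketch below hard-codes that matrix and reproduces recall, precision and accuracy.

```python
import numpy as np

# Confusion matrix printed above (rows = true class, columns = predicted class),
# in LabelEncoder order: index 0 = Mina, index 1 = Rocha
cm = np.array([[7, 1],
               [2, 11]])

recall = np.diag(cm) / cm.sum(axis=1)      # correct / all true instances of the class
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predictions of the class
accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions

print(np.round(recall, 2))      # [0.88 0.85]
print(np.round(precision, 2))   # [0.78 0.92]
print(round(accuracy, 2))       # 0.86
```

For example, the single `Mina` instance in the top-right cell was predicted as `Rocha`, which is why `Mina` recall is 7 / 8 rather than 1.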


### MLP with two hidden layers

The following cell tests a neural network with two hidden layers.

The first layer has 100 neurons, while the second has 60 neurons.

In total there are 12,060 connection weights (60 × 100 + 100 × 60 + 60 × 1) to be adjusted during training, plus 161 bias terms.

This network shows that arbitrarily increasing the size of a neural network does not guarantee a corresponding improvement in model performance: the test accuracy stays at 0.86.

In [6]:
# While the default 'adam' solver generally performs well on large datasets, 'lbfgs' is often more suitable and converges faster on smaller datasets
mlp = MLPClassifier(solver='lbfgs', random_state=0, hidden_layer_sizes=[100, 60])
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

print("Number of layers: {}".format(mlp.n_layers_))
print("Hidden layer sizes: {}".format(mlp.hidden_layer_sizes))
print("Neurons in the output layer: {}".format(mlp.n_outputs_))
print("Weights from the input layer: {}".format(mlp.coefs_[0].shape))
print("Weights from the 1st hidden layer: {}".format(mlp.coefs_[1].shape))

print("Training set accuracy: {:.2f}".format(mlp.score(X_train, y_train)))
print("Test set accuracy: {:.2f}".format(mlp.score(X_test, y_test)))

print(classification_report(y_test, y_pred, target_names=class_names))

cnf_matrix = confusion_matrix(y_test, y_pred)
print(cnf_matrix)


Number of layers: 4
Hidden layer sizes: [100, 60]
Neurons in the output layer: 1
Weights from the input layer: (60, 100)
Weights from the 1st hidden layer: (100, 60)
Training set accuracy: 1.00
Test set accuracy: 0.86
              precision    recall  f1-score   support

        Mina       0.86      0.75      0.80         8
       Rocha       0.86      0.92      0.89        13

    accuracy                           0.86        21
   macro avg       0.86      0.84      0.84        21
weighted avg       0.86      0.86      0.86        21

[[ 6  2]
 [ 1 12]]


### *Overfitting* due to too many neurons

The most effective way to determine the number of neurons in the hidden layer is a systematic search over candidate sizes. An interesting article that discusses several heuristics for this problem is:

D. Stathakis (2009) *How many hidden layers and nodes?*, **International Journal of Remote Sensing**, 30:8, 2133-2147, DOI: 10.1080/01431160802549278

However, a widely used starting point is:

(num_inputs + num_outputs) / 2

For this dataset, (60 + 1) / 2 = 30.5, which is rounded to the 31 neurons used in the next cell.

Note that the network keeps the same test accuracy (0.86) as the previous topologies, even though it has far fewer weights (1,891 versus 6,100).
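A minimal sketch of such a systematic search with `GridSearchCV`, assuming a synthetic stand-in dataset from `make_classification` with the Sonar shape; the candidate sizes in `param_grid` are arbitrary choices for illustration, including the 31-neuron heuristic. On the real data, the search should be run on `X_train` only, so that the test set stays untouched for the final evaluation.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in with the Sonar shape (208 rows, 60 attributes, 2 classes)
X, y = make_classification(n_samples=208, n_features=60, random_state=0)

# Candidate topologies (arbitrary, for illustration), including the 31-neuron heuristic
param_grid = {"hidden_layer_sizes": [(10,), (31,), (60,), (100,), (100, 60)]}

search = GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=1000, random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation on the data passed to fit
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)          # topology with the best mean validation accuracy
print(round(search.best_score_, 2))
```

Cross-validation averages the accuracy over five validation folds, which is a steadier basis for choosing a topology than a single 21-row test set.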

In [7]:
mlp = MLPClassifier(solver='lbfgs', random_state=0, hidden_layer_sizes=[31])
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)


print("Number of layers: {}".format(mlp.n_layers_))
print("Hidden layer sizes: {}".format(mlp.hidden_layer_sizes))
print("Neurons in the output layer: {}".format(mlp.n_outputs_))
print("Weights from the input layer: {}".format(mlp.coefs_[0].shape))
print("Weights from the 1st hidden layer: {}".format(mlp.coefs_[1].shape))

print("Training set accuracy: {:.2f}".format(mlp.score(X_train, y_train)))
print("Test set accuracy: {:.2f}".format(mlp.score(X_test, y_test)))

print(classification_report(y_test, y_pred, target_names=class_names))

cnf_matrix = confusion_matrix(y_test, y_pred)
print(cnf_matrix)


Number of layers: 3
Hidden layer sizes: [31]
Neurons in the output layer: 1
Weights from the input layer: (60, 31)
Weights from the 1st hidden layer: (31, 1)
Training set accuracy: 1.00
Test set accuracy: 0.86
              precision    recall  f1-score   support

        Mina       0.73      1.00      0.84         8
       Rocha       1.00      0.77      0.87        13

    accuracy                           0.86        21
   macro avg       0.86      0.88      0.86        21
weighted avg       0.90      0.86      0.86        21

[[ 8  0]
 [ 3 10]]


### Data scaling

The following networks test the hypothesis that an MLP is robust to non-normalized data.

The data will be standardized with the Z-score.

Normalizing the data brings no improvement to this network: training and test accuracy remain 1.00 and 0.86. A likely reason is that the Sonar attributes already lie in the range 0.0 to 1.0.

In [8]:
# Compute the mean and standard deviation of each attribute on the training set
mean_on_train = X_train.mean(axis=0)
std_on_train = X_train.std(axis=0)

# Standardize the attributes with the Z-score: Z = (X - mean) / standard deviation
# afterwards, each attribute has mean=0 and std=1 on the training set
X_train_scaled = (X_train - mean_on_train) / std_on_train
# apply the same transformation (training-set statistics) to the test data
X_test_scaled = (X_test - mean_on_train) / std_on_train


# The network reaches the maximum number of iterations without converging.
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[31], random_state=0)
mlp.fit(X_train_scaled, y_train)
print("Training set accuracy: {:.2f}".format(mlp.score(X_train_scaled, y_train)))
print("Test set accuracy: {:.2f}".format(mlp.score(X_test_scaled, y_test)))

# Increase the maximum number of iterations
mlp = MLPClassifier(solver='lbfgs', hidden_layer_sizes=[31], max_iter=1000, random_state=0)
mlp.fit(X_train_scaled, y_train)
print("Training set accuracy: {:.2f}".format(mlp.score(X_train_scaled, y_train)))
print("Test set accuracy: {:.2f}".format(mlp.score(X_test_scaled, y_test)))


Training set accuracy: 1.00
Test set accuracy: 0.86
Training set accuracy: 1.00
Test set accuracy: 0.86
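scikit-learn's `StandardScaler` implements the same Z-score standardization. One detail worth knowing, shown in the sketch below on small synthetic frames (stand-ins for `X_train` and `X_test`, not the real data): pandas `DataFrame.std()` uses the sample standard deviation (`ddof=1`) by default, while `StandardScaler` uses the population standard deviation (`ddof=0`), so the two results differ slightly unless `ddof=0` is passed to pandas.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small synthetic frames standing in for X_train / X_test
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((10, 3)))
X_test = pd.DataFrame(rng.random((4, 3)))

# Manual Z-score as in the cell above (pandas .std() defaults to ddof=1)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

# StandardScaler learns mean and population std (ddof=0) from the training set only
scaler = StandardScaler().fit(X_train)
scaled = scaler.transform(X_test)
manual_ddof0 = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0, ddof=0)

print(np.allclose(scaled, manual_ddof0.to_numpy()))   # True
print(np.allclose(scaled, manual.to_numpy()))         # False: the std estimates differ
```

In practice, wrapping `StandardScaler` and `MLPClassifier` with `sklearn.pipeline.make_pipeline` keeps the training-set statistics attached to the model, so the same transformation is applied automatically at prediction time.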
