<a href="https://colab.research.google.com/github/Kaiziferr/Deep_Learning_Workshop/blob/master/multilayer_perceptron/03_workshop_binary_clasification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import LabelEncoder


import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning) 

In [2]:
data = pd.read_csv('https://raw.githubusercontent.com/Kaiziferr/Deep_Learning_Workshop/master/workshop_clasification_binary_multiclass/sonar_csv.csv')
data_values = data.values

In [3]:
def base_model():
  model = Sequential()
  model.add(Dense(60, input_dim = 60, kernel_initializer='glorot_uniform', activation='relu'))
  model.add(Dense(1, kernel_initializer='glorot_uniform', activation='relu'))
  model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
  return model

# **Split**

---




In [4]:
X = data_values[:, :60].astype('float32')
y = data_values[:,60]

# **Encoder**

---



In [5]:
encoder = LabelEncoder()
encoder_y = encoder.fit_transform(y)
encoder_y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# **Wrapper**

---



In [6]:
model = KerasClassifier(build_fn=base_model, epochs = 100, batch_size = 5, verbose = 0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
score = cross_val_score(model, X, encoder_y, cv = kfold)

In [7]:
score

array([0.85714287, 0.95238096, 0.76190478, 0.90476191, 0.90476191,
       0.71428573, 0.85714287, 0.5714286 , 0.55000001, 0.55000001])

In [8]:
print(" Line base: %.2f%% (%.2f%%)" % (score.mean()*100, score.std()*100))

 Line base: 76.24% (14.95%)


La dispersión es muy alta

# **Optimizar el rendimiento con procesamiento de datos**

---



In [9]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

In [10]:
estimator = []
estimator.append(('standarize', StandardScaler()))
estimator.append(('mlp', KerasClassifier(build_fn=base_model, epochs = 100, batch_size = 5, verbose = 0)))

In [11]:
pipeline = Pipeline(estimator)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
score = cross_val_score(pipeline, X, encoder_y, cv = kfold)

In [12]:
score

array([0.90476191, 0.80952382, 0.80952382, 0.85714287, 0.61904764,
       0.85714287, 0.71428573, 0.76190478, 0.94999999, 0.80000001])

In [13]:
print("Modelo Estandarizado: %.2f%% (%.2f%%)" % (score.mean()*100, score.std()*100))

Modelo Estandarizado: 80.83% (9.00%)


La dispersión se redujo, es mas confiable la media que la proporcionada por el modelo sin estandarizar.

# **Topologia Pequeña**

---



In [14]:
def model_smaller():
  model = Sequential()
  model.add(Dense(30, input_dim=60,kernel_initializer = 'glorot_uniform',activation='relu'))
  model.add(Dense(1, kernel_initializer = 'glorot_uniform', activation='sigmoid'))

  model.compile(loss= 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

In [15]:
estimator = []
estimator.append(('standarize', StandardScaler()))
estimator.append(('mlp', KerasClassifier(build_fn=model_smaller, epochs = 100, batch_size = 5, verbose = 0)))

In [16]:
pipeline = Pipeline(estimator)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
score = cross_val_score(pipeline, X, encoder_y, cv = kfold)

In [17]:
print("Modelo Estandarizado Topologia pequeña: %.2f%% (%.2f%%)" % (score.mean()*100, score.std()*100))

Modelo Estandarizado Topologia pequeña: 86.05% (7.53%)


La media aumento, por lo que el modelo tiene un mejor rendimiento, sin que la desviación creciera de manera desproporcional

# **Topologia grande**

---



In [18]:
def model_large():
  model = Sequential()
  model.add(Dense(60, input_dim = 60, kernel_initializer = 'glorot_uniform', activation = 'relu'))
  model.add(Dense(30, kernel_initializer = 'glorot_uniform', activation = 'relu'))
  model.add(Dense(1, activation = 'sigmoid'))

  model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
  return model

In [19]:
estimators = []
estimators.append(('standarize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=model_large, epochs=100, batch_size = 5, verbose=0)))

In [20]:
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoder_y, cv = kfold)
print("Modelo Estandarizado Topologia grande: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Modelo Estandarizado Topologia grande: 87.07% (6.43%)


La topología grande es la que mejor resultado tubo, con respecto al resto de estructuras probadas, ya que su desviación es la más cercana a la media, ademas de que tubo una puntuación alta en la media.

***Mucha dispersión frente a la media en los resultados no es recomendado.***