# Deep Learning with Keras

This notebook trains several models on the two data sets `banknote_authentication.csv` and `spambase.csv`.

The `banknote_authentication.csv` data set is examined first.

## Detecting spam

**Import libraries and format the data**

In [16]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

from IPython.display import display, HTML

import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Activation
from keras.activations import relu, elu, sigmoid
from keras.layers import Dense
from keras.optimizers import Adam, SGD
from keras.metrics import categorical_crossentropy

In [17]:
dat = pd.read_csv("./spambase.csv")

In [18]:
print(dat.shape)

(4601, 59)


To feed the data into a Keras model conveniently later on, it helps to split the data set into the samples themselves and the corresponding labels.

In [19]:
dat_new = dat.dropna()

In [20]:
# Labels: the "Target" column as a NumPy array
train_lables = np.array(dat_new["Target"])

# Samples: every column except "Target"
train_samples = np.array(dat_new.drop(['Target'], axis=1))


print(train_lables.shape)
print(train_samples.shape)

(4456,)
(4456, 58)
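
The split described above can be tried out on a small frame first. The following is a minimal sketch; the column names other than `Target` and all values are made up for illustration and do not come from `spambase.csv`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for spambase.csv (feature names and values are illustrative)
toy = pd.DataFrame({
    "word_freq_free": [0.10, 0.00, np.nan, 0.32],
    "capital_run_length": [12.0, 3.0, 7.0, 40.0],
    "Target": [1, 0, 0, 1],
})

toy_clean = toy.dropna()                                  # removes the row with the missing value
labels = toy_clean["Target"].to_numpy()                   # 1-D array of labels
samples = toy_clean.drop(columns=["Target"]).to_numpy()   # 2-D array of features

print(labels.shape, samples.shape)
```

Labels and samples keep the same row order, so row `i` of `samples` still belongs to entry `i` of `labels`.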


To speed up training, it helps to scale all values in the `np.array`s created above. The `sklearn` package provides a handy class named `MinMaxScaler` for this. In our case, every feature is scaled to the range 0 to 1.

In [21]:
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train_samples = scaler.fit_transform(train_samples)

# Additional note
# For 1-D data you also have to append this call: .reshape(-1, 1)
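
To show what the scaler computes, here is a small sketch that writes the column-wise min-max formula `(x - min) / (max - min)` out by hand and compares it with `MinMaxScaler`. The arrays `X` and `v` are invented example data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented example data: two features on very different scales
X = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 1000.0]])

# Column-wise min-max scaling written out by hand
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# The same transformation via scikit-learn
X_sklearn = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
print(np.allclose(X_manual, X_sklearn))

# A 1-D array has to be turned into a single column first
v = np.array([10.0, 20.0, 40.0])
v_scaled = MinMaxScaler().fit_transform(v.reshape(-1, 1))
print(v_scaled.ravel())
```

Each column is scaled independently, so the smallest value of every feature becomes 0 and the largest becomes 1.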

In [22]:
print(len(scaled_train_samples))
len(train_lables)
print(scaled_train_samples.shape)
print(train_lables.shape)
print(train_lables)
print(train_samples)

4456
(4456, 58)
(4456,)
[1 1 1 ... 0 0 0]
[[1.000e+00 0.000e+00 6.400e-01 ... 3.756e+00 6.100e+01 2.780e+02]
 [2.000e+00 2.100e-01 2.800e-01 ... 5.114e+00 1.010e+02 1.028e+03]
 [3.000e+00 6.000e-02 0.000e+00 ... 9.821e+00 4.850e+02 2.259e+03]
 ...
 [4.599e+03 3.000e-01 0.000e+00 ... 1.404e+00 6.000e+00 1.180e+02]
 [4.600e+03 9.600e-01 0.000e+00 ... 1.147e+00 5.000e+00 7.800e+01]
 [4.601e+03 0.000e+00 0.000e+00 ... 1.250e+00 5.000e+00 4.000e+01]]
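
The printed `train_samples` start with 1, 2, 3 and end with 4601 in the first column, which suggests (but does not prove) that this column is a running row number from the CSV export rather than a real feature. The sketch below shows one possible way to spot and drop such a column before training. The helper `looks_like_row_index` and the column name `row_id` are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: "row_id" is a hypothetical name for an exported index column
toy = pd.DataFrame({
    "row_id": [1, 2, 3, 4, 5, 6],
    "word_freq_free": [0.9, 0.7, 0.8, 0.0, 0.1, 0.0],
    "Target": [1, 1, 1, 0, 0, 0],
})

def looks_like_row_index(col: pd.Series) -> bool:
    """True if the column holds unique, strictly increasing integers."""
    values = col.to_numpy()
    return bool(
        np.issubdtype(values.dtype, np.integer)
        and col.is_unique
        and np.all(np.diff(values) > 0)
    )

suspects = [c for c in toy.columns
            if c != "Target" and looks_like_row_index(toy[c])]
features = toy.drop(columns=suspects + ["Target"])

print(suspects, features.columns.tolist())
```

If the rows are sorted by label, as in this toy frame, a row number alone would be enough to separate the classes, so it is worth checking before trusting any accuracy figure.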


**Train the model / neural network:**

**Talos:**  
Talos is a library that makes it possible to test many hyperparameter combinations in a single run.

In [23]:
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers, initializers, callbacks

In [24]:
p = {'activation':['relu', 'elu'],
        'optimizer': ['Nadam', 'Adam'],
        'losses': ['logcosh'],
        'hidden_layers':[1, 2],
        'batch_size': [20,30,40],
        'epochs': [10,20]}
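
To see how many configurations this dictionary describes, the value lists can be expanded with `itertools.product`. This is only a sketch of a full grid search, assuming every combination is tried once. It is not Talos's internal code, but the count matches the 48 rounds reported by the scan further down.

```python
from itertools import product

p = {'activation': ['relu', 'elu'],
     'optimizer': ['Nadam', 'Adam'],
     'losses': ['logcosh'],
     'hidden_layers': [1, 2],
     'batch_size': [20, 30, 40],
     'epochs': [10, 20]}

# Cartesian product of all value lists: one dict per candidate configuration
combinations = [dict(zip(p.keys(), values)) for values in product(*p.values())]

print(len(combinations))   # 2 * 2 * 1 * 2 * 3 * 2 = 48
print(combinations[0])
```

Every additional value in any list multiplies the number of training runs, so the grid grows quickly.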

In [25]:
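def spambase(x_train, y_train, x_val, y_val, params):
    # One hidden layer with 32 units; params['hidden_layers'] is part of the
    # search grid but is not used when building the network.
    model = Sequential()
    model.add(Dense(32, input_shape=(58,), activation=params['activation']))
    model.add(Dense(1, activation='sigmoid'))  # single output for the binary target
    model.compile(optimizer=params['optimizer'], loss=params['losses'], metrics=['accuracy'])

    # Train with the batch size and epoch count of the current parameter set
    out = model.fit(x_train, y_train,
                    batch_size=params['batch_size'],
                    epochs=params['epochs'],
                    validation_data=(x_val, y_val),
                    verbose=0)

    # Talos expects the training history and the model
    return out, model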
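
To make the architecture concrete, the forward pass of this network (58 inputs, one hidden layer with 32 units, one sigmoid output) and the log-cosh loss can be sketched in plain NumPy. The weights here are random stand-ins, not trained values, and the function names are chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Random stand-ins for trained weights (illustrative only)
W1, b1 = rng.normal(scale=0.1, size=(58, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

def forward(x):
    """58 inputs -> Dense(32, relu) -> Dense(1, sigmoid)."""
    hidden = relu(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)

def logcosh_loss(y_true, y_pred):
    """Mean of log(cosh(prediction error))."""
    return np.mean(np.log(np.cosh(y_pred - y_true)))

x_batch = rng.random((4, 58))                       # four samples scaled to [0, 1]
y_batch = np.array([[1.0], [0.0], [1.0], [0.0]])
y_hat = forward(x_batch)

n_params = W1.size + b1.size + W2.size + b2.size    # 58*32 + 32 + 32 + 1
print(y_hat.shape, n_params, logcosh_loss(y_batch, y_hat))
```

The sigmoid output lies strictly between 0 and 1 and can be read as the predicted probability of the positive class.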

In [26]:
import talos as ta

scan_object = ta.Scan(scaled_train_samples, train_lables, model=spambase, params=p)

100%|██████████| 48/48 [01:11<00:00,  1.29s/it]


In [27]:
from pprint import pprint

result_spambase = pd.DataFrame(scan_object.data)
result_spambase.head()
result_spambase.sort_values(by='acc', ascending=True).head()

| Unnamed: 0 | round_epochs | val_loss | val_acc | loss | acc | activation | optimizer | losses | hidden_layers | batch_size | epochs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | 10 | 0.008452 | 0.987285 | 0.011245 | 0.979481 | elu | Adam | logcosh | 1 | 40 | 10 |
| 38 | 10 | 0.007411 | 0.990277 | 0.009918 | 0.980442 | relu | Adam | logcosh | 2 | 40 | 10 |
| 31 | 10 | 0.007933 | 0.988033 | 0.010539 | 0.981084 | elu | Adam | logcosh | 2 | 40 | 10 |
| 8 | 10 | 0.007398 | 0.988781 | 0.00977 | 0.982046 | relu | Adam | logcosh | 1 | 40 | 10 |
| 15 | 10 | 0.006093 | 0.990277 | 0.008171 | 0.983328 | relu | Adam | logcosh | 2 | 30 | 10 |


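**Result:**  
Our model is probably badly overfitted, BUT we reach an accuracy of about 99% on the validation data. This figure should be read with caution: the first column of the `train_samples` printed above looks like a running row number (1 to 4601), and the labels appear to be sorted, so the network may largely be learning the row position instead of the spam features.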
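
A common first check for overfitting is the gap between training and validation metrics per run. The sketch below applies it to the five rows shown in the table above; the column names `acc_gap` and `loss_gap` are introduced here for illustration.

```python
import pandas as pd

# The five rows shown in the result table above
runs = pd.DataFrame({
    "val_loss": [0.008452, 0.007411, 0.007933, 0.007398, 0.006093],
    "val_acc":  [0.987285, 0.990277, 0.988033, 0.988781, 0.990277],
    "loss":     [0.011245, 0.009918, 0.010539, 0.009770, 0.008171],
    "acc":      [0.979481, 0.980442, 0.981084, 0.982046, 0.983328],
}, index=[42, 38, 31, 8, 15])

# Positive gaps mean training looks better than validation,
# which is the usual overfitting signal
runs["acc_gap"] = runs["acc"] - runs["val_acc"]
runs["loss_gap"] = runs["val_loss"] - runs["loss"]

print(runs[["acc_gap", "loss_gap"]])
```

For these five runs both gaps are negative, so this check on its own does not flag overfitting. A held-out test set that the search never sees would give a more reliable picture.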