# Getting started with Deep Learning

Tutors: Fabian Eitel (Fabian.Eitel@charite.de) and Talia Kimber (talia.kimber@charite.de)

# 1. Aims of this session

Get a rough idea of how artificial neural networks (ANNs) work, what an implementation in Keras looks like, and how suitable they are for tabular data.

# Learning goals


## Theory

* Building blocks of ANNs
* Model training

## Practical

* Learn to understand the basics using the TensorFlow Playground
* Learn to read a model definition in Python using Keras
* Run a pipeline of an ANN on the ADNI tabular data
* Investigate what filters learn at different layers

# References

* Stanford Course on Deep Learning http://cs231n.github.io/

## Theory


### Building blocks of artificial neural networks
This section lists some of the building blocks used when training neural networks, with widely used examples of each.

__Layer types__
* Fully connected/linear/dense layers
* Convolutional layers
* Pooling layers and other down/upsampling layers
* Utility layers like input and output layers
* Batch normalization

__Activation types__
* Sigmoid
* Linear
* Tanh
* ReLU
* Leaky ReLU and other variants
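
The activation functions listed above can be written in a few lines of NumPy. This is only a sketch for intuition: the function names and the `alpha` slope of the leaky ReLU are chosen here for illustration and are not taken from Keras.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Keep positive values, set negative values to zero."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative values keep a small slope alpha."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
activations = {
    "sigmoid": sigmoid(x),
    "linear": x,            # the linear activation returns its input unchanged
    "tanh": np.tanh(x),
    "relu": relu(x),
    "leaky_relu": leaky_relu(x),
}
for name, values in activations.items():
    print(name, values)
```

Comparing the printed rows shows how each function treats negative inputs differently, which is the main practical difference between them.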

__Regularizers__
* L1 regularization (used in LASSO)
* L2 regularization (used in Ridge regression); almost the same as weight decay
* Dropout
* Early stopping
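
To make the L1 and L2 entries concrete, here is a minimal sketch of how the two penalties are computed and added to a loss. The weights, the data loss and the strength `lam` are made-up numbers used only for illustration.

```python
import numpy as np

weights = np.array([0.5, -1.0, 2.0])  # hypothetical model weights
data_loss = 0.3                        # hypothetical loss on the training data
lam = 0.01                             # hypothetical regularization strength

# L1 penalizes the sum of absolute weights, which pushes weights to exactly zero
l1_penalty = lam * np.sum(np.abs(weights))
# L2 penalizes the sum of squared weights, which shrinks large weights
l2_penalty = lam * np.sum(weights ** 2)

print("loss with L1:", data_loss + l1_penalty)
print("loss with L2:", data_loss + l2_penalty)
```

In both cases the optimizer minimizes the data loss plus the penalty, so larger weights are only kept if they reduce the data loss enough to pay for themselves.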

__Data functions__
* Normalization (e.g. using mean and standard deviation)
* Data augmentation
* Feature reduction (e.g. Principal Component Analysis [PCA])

__Cost functions__
Cost functions depend on your type of analysis, i.e. regression, binary classification, multi-class classification etc.
* Softmax (strictly an output activation; it is usually paired with cross-entropy)
* Cross-entropy
* Binary cross-entropy
* Kullback-Leibler Divergence
* Smooth losses
* Mean-squared error
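
Two of the cost functions above, written out in NumPy as a rough sketch. The function names, the clipping constant `eps` and the example arrays are assumptions made for this illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood for labels in {0, 1}."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def mean_squared_error(y_true, y_pred):
    """Average squared difference, typically used for regression."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1.0, 0.0, 1.0, 0.0])
confident = np.array([0.9, 0.1, 0.8, 0.2])  # correct and fairly sure
unsure = np.array([0.6, 0.4, 0.6, 0.4])     # correct but hesitant

print("BCE, confident predictions:", binary_cross_entropy(y_true, confident))
print("BCE, unsure predictions:   ", binary_cross_entropy(y_true, unsure))
print("MSE example:", mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
```

Both sets of predictions would give the same accuracy after thresholding at 0.5, but the cross-entropy rewards the more confident one with a lower loss.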

For more information on each topic, see the course link in the references.

https://scs.ryerson.ca/~aharley/vis/conv/

https://www.youtube.com/watch?v=AgkfIQ4IGaM

# 2. Playground exercises

__Introduction__


https://playground.tensorflow.org

TensorFlow Playground is a small neural network simulator that runs in your browser. Despite its name, it is not built on the popular TensorFlow library. It lets you build some intuition for how neural networks work.

__2.1 Exercise__

Use the XOR dataset with 1 hidden layer and try out different activation functions:

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4&seed=0.82689&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

__2.2 Exercise__

What happens if you add more features one by one?
Start with X₁². You can also try adding extra layers and neurons.

https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4&seed=0.82689&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=true&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

__2.3 Exercise__

Let's try a different dataset. Investigate the effects of the learning rate on the training results:

https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.001&regularizationRate=0&noise=0&networkShape=4,2&seed=0.19504&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

Real data is never this clean; it is usually very noisy. Now use the same model as above and add some noise to the data distribution (middle slider on the bottom left). How does it affect the data (shown on the right) and your model's performance?

__2.4 Exercise__

After you have added the noise, try out L1 and L2 regularization. When does it help?

__2.5 Exercise__

What does it mean for a model to _converge_? Here is an example where it does not converge. Can you fix it?

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.3&regularizationRate=0&noise=25&networkShape=4,2&seed=0.84469&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
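
As a toy illustration of what converging means, the sketch below runs gradient descent on the one-dimensional function f(w) = w². The function, the starting point and the two step sizes are assumptions picked for this example and have nothing to do with the playground's internals.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 and return the final value of w."""
    for _ in range(steps):
        gradient = 2 * w                  # derivative of w**2
        w = w - learning_rate * gradient  # step against the gradient
    return w

# With a small step, w shrinks by a factor of 0.8 per update and approaches 0
small_step = gradient_descent(learning_rate=0.1)
# With a large step, w is multiplied by -1.2 per update: it oscillates and grows
large_step = gradient_descent(learning_rate=1.1)

print("final w with learning rate 0.1:", small_step)
print("final w with learning rate 1.1:", large_step)
```

The first run settles near the minimum, which is what convergence looks like; the second keeps jumping across the minimum with growing amplitude and never settles.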

__2.6 Exercise__

Use everything you have learned so far on the more challenging spiral data:

https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=25&networkShape=4,2&seed=0.07992&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

Can you beat my quick experiments?

https://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0.03&noise=25&networkShape=5,4,2&seed=0.16124&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false

## Practical part

In [1]:
# Import required packages
import numpy as np
import pandas as pd

from sklearn.svm import SVC, LinearSVC
from sklearn.metrics import balanced_accuracy_score

In [2]:
# Load data table
df = pd.read_csv("data/alzheimers_disease_rand.csv")
# Print first 5 rows
df.head()



Unnamed: 0,RID,VISCODE,SITE,EXAMDATE,DX_bl,AGE,PTGENDER,PTEDUCAT,WORK,PTETHCAT,...,PTAU_bl,FDG_bl,PIB_bl,AV45_bl,Years_bl,Month_bl,Month,M,update_stamp,Unnamed: 109
0,128,bl,164,2005-09-08,CN,74.2,Male,16,technical writer and editor,Not Hisp/Latino,...,,1.36665,,,0.0,0.0,0.0,0.0,2019-12-04 04:19:56.0,
1,129,bl,164,2005-09-12,AD,82.4,Male,18,Secretary,Not Hisp/Latino,...,22.83,1.08355,,,0.0,0.0,0.0,0.0,2019-12-04 04:19:56.0,
2,129,m06,164,2006-03-13,AD,81.4,Male,18,Elementary school teacher,Not Hisp/Latino,...,22.83,1.08355,,,0.498289,5.96721,6.0,6.0,2019-12-04 04:19:56.0,
3,129,m12,164,2006-09-12,AD,81.3,Male,18,Communication,Not Hisp/Latino,...,22.83,1.08355,,,0.999316,11.9672,12.0,12.0,2019-12-04 04:19:56.0,
4,129,m24,164,2007-09-12,AD,80.5,Male,18,Accounting,Not Hisp/Latino,...,22.83,1.08355,,,1.99863,23.9344,24.0,24.0,2019-12-04 04:19:56.0,


In [3]:
list(df.keys())

['RID',
 'VISCODE',
 'SITE',
 'EXAMDATE',
 'DX_bl',
 'AGE',
 'PTGENDER',
 'PTEDUCAT',
 'WORK',
 'PTETHCAT',
 'PTRACCAT',
 'PTMARRY',
 'APOE4',
 'FDG',
 'PIB',
 'AV45',
 'ABETA',
 'TAU',
 'PTAU',
 'CDRSB',
 'ADAS11',
 'ADAS13',
 'ADASQ4',
 'MMSE',
 'RAVLT_immediate',
 'RAVLT_learning',
 'RAVLT_forgetting',
 'RAVLT_perc_forgetting',
 'LDELTOTAL',
 'DIGITSCOR',
 'TRABSCOR',
 'FAQ',
 'MOCA',
 'EcogPtMem',
 'EcogPtLang',
 'EcogPtVisspat',
 'EcogPtPlan',
 'EcogPtOrgan',
 'EcogPtDivatt',
 'EcogPtTotal',
 'EcogSPMem',
 'EcogSPLang',
 'EcogSPVisspat',
 'EcogSPPlan',
 'EcogSPOrgan',
 'EcogSPDivatt',
 'EcogSPTotal',
 'FLDSTRENG',
 'IMAGEUID',
 'Ventricles',
 'Hippocampus',
 'WholeBrain',
 'Entorhinal',
 'Fusiform',
 'MidTemp',
 'ICV',
 'DX',
 'mPACCdigit',
 'mPACCtrailsB',
 'EXAMDATE_bl',
 'CDRSB_bl',
 'ADAS11_bl',
 'ADAS13_bl',
 'ADASQ4_bl',
 'MMSE_bl',
 'RAVLT_immediate_bl',
 'RAVLT_learning_bl',
 'RAVLT_forgetting_bl',
 'RAVLT_perc_forgetting_bl',
 'LDELTOTAL_BL',
 'DIGITSCOR_bl',
 'TRABSCOR_b

In [4]:
df = df[df.VISCODE == "m12"]
df = df[df.DX != "MCI"]

In [5]:
df = df.dropna(subset=["Hippocampus", "DX", "Ventricles"])

### Data splitting

In [6]:
# Get an array with one index per sample
indices = np.arange(len(df))
print("Order before shuffling: %s" % indices[:5])

# Shuffle that array
np.random.seed(42) # fix a seed so each random event can be repeated
np.random.shuffle(indices)
print("Order after shuffling: %s"  % indices[:5])

Order before shuffling: [0 1 2 3 4]
Order after shuffling: [499 587 195 165 543]


In [7]:
# Take first 80% as a training set
len_training = int(len(indices) * 0.8) # use int() function to remove decimals
print("Number of samples for training set: %i" % len_training)

# Select the first 80% indices
train_idx = indices[0:len_training] # pick 0 to the value of len_training from the indices array

Number of samples for training set: 473


In [8]:
# Take the remaining data and split it 50/50
remaining_samples = len(indices) - len_training
len_validation = int(np.ceil(remaining_samples/2)) # round up once
len_test = int(np.floor(remaining_samples/2)) # round down once

# Select from the indices array the individual groups
validation_idx = indices[len_training:len_training+len_validation]
test_idx = indices[len_training+len_validation:len(indices)]

In [9]:
print("Number of training samples: %i" % len(train_idx))
print("Number of validation samples: %i" % len(validation_idx))
print("Number of test samples: %i" % len(test_idx))
print("Total number of samples: %i" % (len(train_idx) + len(validation_idx) + len(test_idx)))

Number of training samples: 473
Number of validation samples: 60
Number of test samples: 59
Total number of samples: 592
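
The three splitting cells above can also be wrapped into one helper, which makes it easy to check that the parts do not overlap. This is a sketch only: `split_indices` is a hypothetical name, and the sample count of 592 is taken from the output above.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.8, seed=42):
    """Shuffle 0..n_samples-1 and cut it into train/validation/test parts."""
    indices = np.arange(n_samples)
    np.random.seed(seed)  # fix the seed so the split is repeatable
    np.random.shuffle(indices)

    len_training = int(n_samples * train_fraction)
    remaining = n_samples - len_training
    len_validation = int(np.ceil(remaining / 2))  # validation gets the extra sample

    train = indices[:len_training]
    validation = indices[len_training:len_training + len_validation]
    test = indices[len_training + len_validation:]
    return train, validation, test

train_idx, validation_idx, test_idx = split_indices(592)
print(len(train_idx), len(validation_idx), len(test_idx))  # 473 60 59

# Every sample must land in exactly one of the three sets
all_idx = np.concatenate([train_idx, validation_idx, test_idx])
print(len(np.unique(all_idx)) == 592)
```

Because the three slices are taken from one shuffled array, no sample can end up in two sets, which is what keeps the validation and test scores honest.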


In [10]:
X = df[["Hippocampus", "AGE", "Ventricles"]]
#X.insert(column="hipp_diff" , value=(df["Hippocampus"] - df["Hippocampus_bl"]), loc=2)
#X.insert(column="SEX", value=(df["PTGENDER"]=="Male"), loc=2)

y = df["DX"]

X = X.reset_index(drop=True)
y = y.reset_index(drop=True)

In [11]:
from sklearn.preprocessing import StandardScaler

In [12]:
scaler = StandardScaler()
scaler.fit(X.loc[train_idx])

StandardScaler(copy=True, with_mean=True, with_std=True)
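
What the scaler does can be reproduced by hand, which also shows why it is fitted on the training samples only. The sketch below uses small made-up arrays (`X_train`, `X_val`) rather than the ADNI data.

```python
import numpy as np

# Hypothetical feature matrices: rows are samples, columns are features
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_val = np.array([[4.0, 40.0]])

# The statistics come from the training set only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population standard deviation, as StandardScaler uses

# The same training statistics are applied to both sets
X_train_scaled = (X_train - mean) / std
X_val_scaled = (X_val - mean) / std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_val_scaled)
```

The scaled training data has zero mean and unit standard deviation per feature. The validation sample does not, and it should not: using its own statistics would leak information about held-out data into the preprocessing.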

In [13]:
classifier = SVC(C=10, kernel='poly')

In [14]:
classifier.fit(X=scaler.transform(X.loc[train_idx]), y=y.loc[train_idx])

SVC(C=10, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='poly',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [15]:
scaler.transform(X.loc[train_idx])

array([[-0.08525704, -2.5468523 ,  0.23374712],
       [ 0.46067324, -0.73130483, -1.13330459],
       [-1.11172159,  1.52299995, -0.80895342],
       ...,
       [ 0.16151356,  0.88755834, -0.35729376],
       [-0.63732949,  1.17502002, -0.83601901],
       [ 0.40828417, -2.75866617, -0.85659059]])

In [16]:
# Training prediction
y_pred = classifier.predict(X=scaler.transform(X.loc[train_idx]))
print(balanced_accuracy_score(y_true=y.loc[train_idx], y_pred=y_pred))

# Validation prediction
y_pred = classifier.predict(X=scaler.transform(X.loc[validation_idx]))
print(balanced_accuracy_score(y_true=y.loc[validation_idx], y_pred=y_pred))

0.9012849380326771
0.8
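
Balanced accuracy is the average of the recall on each class, so a classifier cannot score well by always predicting the majority class. A minimal hand-written version is sketched below; the helper name and the label values are hypothetical and chosen only for illustration.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Average the recall obtained on each class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [
        np.mean(y_pred[y_true == label] == label)  # fraction of this class found
        for label in np.unique(y_true)
    ]
    return float(np.mean(recalls))

# Hypothetical imbalanced labels and a classifier that always predicts "CN"
y_true = ["CN"] * 8 + ["Dementia"] * 2
y_pred = ["CN"] * 10

accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
balanced = balanced_accuracy(y_true, y_pred)
print("accuracy:", accuracy)           # 0.8
print("balanced accuracy:", balanced)  # 0.5
```

Plain accuracy looks decent here even though one class is never detected, while balanced accuracy drops to chance level.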


## Artificial Neural Network

_This is an unfinished example meant as an optional, advanced exercise_

In [17]:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout
from keras.regularizers import l2
from keras.optimizers import Adam
from keras.models import load_model
from keras.callbacks import EarlyStopping

Using TensorFlow backend.


In [18]:
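def SimpleNet(drop_rate=0., weight_decay=0.):
    # Small fully connected network for binary classification
    model = Sequential()

    # Hidden layer with 8 ReLU units and an L2 penalty on its weights
    model.add(Dense(units=8, activation='relu', kernel_regularizer=l2(weight_decay)))
    # Dropout has no effect at the default drop_rate of 0
    model.add(Dropout(rate=drop_rate))
    # A single sigmoid unit outputs the probability of the positive class
    model.add(Dense(units=1, activation='sigmoid'))
    return model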

In [None]:
model = SimpleNet()
# Assumed values, not given in the source: Keras' defaults for Adam
lr, lr_decay = 0.001, 0.
opti = Adam(lr=lr, decay=lr_decay)
model.compile(optimizer=opti, loss='binary_crossentropy', metrics=['accuracy'])
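
To see what the two `Dense` layers of `SimpleNet` compute, here is a rough NumPy sketch of a forward pass through the same shape of network (3 input features, 8 hidden ReLU units, 1 sigmoid output). The weights are random and the input batch is made up, so the outputs are meaningless; only the shapes and operations matter. It assumes the three features selected for `X` above and ignores dropout and regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 8

# Randomly initialized parameters (a trained model would have learned these)
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Dense(8, relu) followed by Dense(1, sigmoid)."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU hidden layer
    logits = hidden @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid output

X_batch = rng.normal(size=(4, n_features))  # hypothetical batch of 4 scaled samples
probabilities = forward(X_batch)
n_parameters = W1.size + b1.size + W2.size + b2.size

print(probabilities.shape)  # (4, 1)
print(n_parameters)         # 24 + 8 + 8 + 1 = 41
```

Each sample ends up as a single number between 0 and 1, which `binary_crossentropy` compares against a 0/1 label. The string labels in `y` would therefore need to be converted to 0 and 1 before training.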