# Deep learning in python
Deep learning does have a lot of fascinating math under the covers, but you do not need to know it to be able to pick it up as a tool and wield it on important projects and deliver real value. From the applied perspective, deep learning is quite a shallow field and a motivated developer can quickly pick it up and start making very real and impactful contributions. 

## Outline
* Neural network review
* Intro to tensorflow
* Intro to keras
* Machine learning demo

## Neural Networks
* The goal is not to create realistic models of the brain, but instead to develop robust algorithms and data structures that we can use to model dificult problems.
* Mathematically, they are capable of learning any mapping function and have been proven to be a universal approximation algorithm
* Neural networks are made up of neurons, weights and activation functions

## Neurons
* These are simple computational units that have weighted input signals and produce an output signal using an activation function
* You may be familiar with linear regression, in which case the weights on the inputs are very much like the coefficients used in a regression equation

<br>
<img src="./images/perceptron.jpeg" alt="drawing" width="300px"/>

* The weighted inputs are summed and passed through an activation function, sometimes called a transfer function. 
* An activation function is a simple mapping of summed weighted input to the output of the neuron. 
* It is called an activation function because it governs the threshold at which the neuron is activated and the strength of the output signal. 

## Networks of neurons
Neurons are arranged into networks of neurons. A row of neurons is called a layer and one network can have multiple layers. The architecture of the neurons in the network is often called the network topology.

<img src="./images/network.jpeg" width="300px">


* The bottom layer that takes input from your dataset is called the visible layer, because it is the exposed part of the network
* Layers after the input layer are called hidden layers because they are not directly exposed to the input
* The final hidden layer is called the output layer and it is responsible for outputting a value or vector of values that correspond to the format required for the problem
* The choice of activation function in the output layer is strongly constrained by the type of problem that you are modeling for example
    * A regression problem may have a single output neuron and the neuron may have no activation function.
    * A binary classification problem may have a single output neuron and use a sigmoid activation function to output a value between 0 and 1 to represent the probability of predicting a value for the primary class. This can be turned into a crisp class value by using a threshold of 0.5 and snap values less than the threshold to 0 otherwise to 1.
    
    <img src="./images/network-weights.jpeg" width="400px">


## Stochastic gradient descent
* preferred training algorithm for neural networks is called stochastic gradient descent
* exposed to the network to an input
* The network processes the input 
* This is called a forward pass on the network 
* The output of the network is compared to the expected output and an error is calculated. 
* This error is then propagated back through the network, one layer at a time, and the weights are updated according to the amount that they contributed to the error. 
* The process is repeated for all of the examples in your training data. One round of updating the network for the entire training dataset is called an epoch. 
* Alternatively, the errors can be saved up across all of the training examples and the network can be updated at the end. This is called batch learning and is often more stable.

<img src="./images/gradient.jpeg" width="400px">

## Prediction
* Once a neural network has been trained it can be used to make predictions. 
* You can make predictions on test or validation data in order to estimate the skill of the model on unseen data. 

## Tensorflow
TensorFlow™ is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices.
https://www.tensorflow.org/

* Installing with python is easy
```bash
pipenv install tensorflow
```
* Api is "low level"
* provides flexibility for building data structures for numerical computation (not just neural nets)

## Keras
* Provides a high level api to build neural networks with tensorflow
* Officially reccomended api

```bash
pipenv install keras
```

## Demo (Pima indians dataset)
*  It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years. 
* It is a binary classification problem (onset of diabetes as 1 or not as 0)


In [13]:
# Create your first MLP in Keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

In [14]:
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

## Let's build the neural network
* How do we know the number of layers to use and their types? 
* This is a very hard question. There are heuristics that we can use and often the best network structure is found through a process of trial and error experimentation. 
* Generally, you need a network large enough to capture the structure of the problem

<img src="./images/sparse-nn.png" width="800px">


In [15]:
# create model
# Here we are using a sequential model (simplest neural network)
model = Sequential()

# The first layer is the input layer
# we have to specify the input dimension of our data
model.add(Dense(12, input_dim=8, init='uniform', activation='relu'))

# First hidden layer
model.add(Dense(8, init='uniform', activation='relu'))

# last hidden layer (output)
# here was use the sigmoid activation function to give a binary classification output
model.add(Dense(1, init='uniform', activation='sigmoid'))

# Compile model
# Uses tensorflow on the backend to find the most efficient way to represent the network on your hardware
model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['accuracy'])



In [16]:
# Fit the model
model.fit(X, Y, nb_epoch=150, batch_size=10)

  from ipykernel import kernelapp as app


Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78

<keras.callbacks.History at 0x1213bdeb8>

In [17]:
# evaluate the model
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))

acc: 78.39%


In [None]:
# Saving 33% of data for validation/testing
model.fit(X, Y, validation_split=0.33, nb_epoch=150, batch_size=10)

In [10]:
# Multiclass Classification with the Iris Flowers Dataset
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("iris.csv", header=None)
dataset = dataframe.values
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(4, input_dim=4, init='normal', activation='relu'))
    model.add(Dense(3, init='normal', activation='sigmoid'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=baseline_model, nb_epoch=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))




Accuracy: 33.33% (13.00%)
