# Introduction to Neural Networks with TensorFlow and Keras layers

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt

Populating the interactive namespace from numpy and matplotlib


In [3]:
import pandas as pd
print(pd.__version__)

1.1.3


In [4]:
import sys
!{sys.executable} -m pip install tensorflow    # ==1.15
#
import tensorflow as tf
# tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
print(tf.__version__)

You should consider upgrading via the '/usr/local/Cellar/jupyterlab/2.1.5/libexec/bin/python3.8 -m pip install --upgrade pip' command.
2.3.1


In [5]:
# TF 1.x way of listing the available compute devices (hopefully a GPU);
# disabled because tf.Session no longer exists in TF 2.x
if False:
    sess = tf.Session()
    #
    devices = sess.list_devices()
    for d in devices:
        print(d.name)
        

In [6]:
physical_devices = tf.config.list_physical_devices()
print("Devices:", len(physical_devices))
for dev in physical_devices:
    print(dev.name)

Devices: 2
/physical_device:CPU:0
/physical_device:XLA_CPU:0


In [7]:
# a small sanity check, does tf seem to work ok?
# hello = tf.compat.v1.constant('Hello TF!')
hello = tf.constant('Hi there!')
# sess = tf.compat.v1.Session()
#
# print(sess.run(hello))
print(hello)

tf.Tensor(b'Hi there!', shape=(), dtype=string)


In [8]:
from tensorflow import keras
print(keras.__version__)

2.4.0


## Loading and preparing our data set for classification

In [9]:
!curl -O https://raw.githubusercontent.com/DJCordhose/deep-learning-crash-course-notebooks/master/data/insurance-customers-1500.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 26783  100 26783    0     0  64074      0 --:--:-- --:--:-- --:--:-- 64074


In [10]:
df = pd.read_csv('./insurance-customers-1500.csv', sep=';')

In [11]:
df.head()

|   | speed | age  | miles | group |
|---|-------|------|-------|-------|
| 0 | 98.0  | 44.0 | 25.0  | 1     |
| 1 | 118.0 | 54.0 | 24.0  | 1     |
| 2 | 111.0 | 26.0 | 34.0  | 0     |
| 3 | 97.0  | 25.0 | 10.0  | 2     |
| 4 | 114.0 | 38.0 | 22.0  | 1     |


In [12]:
df.describe()

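|       | speed      | age       | miles     | group    |
|-------|------------|-----------|-----------|----------|
| count | 1500.0     | 1500.0    | 1500.0    | 1500.0   |
| mean  | 122.492667 | 44.980667 | 30.434    | 0.998667 |
| std   | 17.604333  | 17.1304   | 15.250815 | 0.816768 |
| min   | 68.0       | 16.0      | 1.0       | 0.0      |
| 25%   | 108.0      | 32.0      | 18.0      | 0.0      |
| 50%   | 120.0      | 42.0      | 29.0      | 1.0      |
| 75%   | 137.0      | 55.0      | 42.0      | 2.0      |
| max   | 166.0      | 100.0     | 84.0      | 2.0      |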


## First important concept: You train a machine with your data to make it learn the relationship between some input data and a certain label - this is called supervised learning

<img src='https://raw.githubusercontent.com/DJCordhose/deep-learning-crash-course-notebooks/master/img/encoding3.jpg'>

In [14]:
# we deliberately decide "group" is going to be our label, 
# it is often named lower case y (or ordinate)
y = df['group']

In [15]:
# since 'group' is now the label we want to predict, 
# we need to remove it from the training data 
df.drop('group', axis='columns', inplace=True)

In [16]:
# input data is often named upper case X;
# the upper case indicates a matrix in which each row is one sample vector
# X = df.as_matrix()   # as_matrix() was removed in pandas 1.0
X = df.to_numpy()

## Neural Networks using TensorFlow and Keras layers
* Neural networks consist of artificial neurons that you organize in layers
* each neuron is very simple, but, theoretically, a single layer with enough of them can approximate any continuous function
* in practice, we use 2 or 3 layers, as this has turned out to work well
* the more neurons and the more layers you use, the longer the network takes to train
* neural networks often take too long to train for cross validation and grid search to remain practical ways of finding suitable hyperparameters

## Neuron (aka node or unit)

A neuron takes a number of numerical inputs, multiplies each by a weight, sums up all weighted inputs and adds a bias (a constant) to that sum. From this it creates a single numerical output. For one input (one dimension) this describes a line. For more dimensions it describes a hyperplane that can serve as a decision boundary. Typically, this output is then transformed by an activation function, which compresses it to a value between 0 and 1 (sigmoid) or between -1 and 1 (tanh), or sets all negative values to zero (relu).
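The computation described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not how Keras implements it; the function names `neuron`, `sigmoid` and `relu` as well as the weights and the bias are made up for this example.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

def sigmoid(z):
    # compresses any number to a value between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # sets negative values to zero, leaves positive values unchanged
    return max(0.0, z)

# hypothetical inputs, weights and bias, chosen only for illustration
inputs = [0.5, -1.0, 2.0]
weights = [0.1, 0.4, -0.2]
bias = 0.3

out_tanh = neuron(inputs, weights, bias)              # between -1 and 1
out_sigmoid = neuron(inputs, weights, bias, sigmoid)  # between 0 and 1
out_relu = neuron(inputs, weights, bias, relu)        # never negative
print(out_tanh, out_sigmoid, out_relu)
```

The weighted sum here is negative, so tanh returns a negative value, sigmoid a value below 0.5, and relu exactly zero.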

It is not essential to understand every detail of a neuron. In practice, how you configure neurons to form something more powerful is much more important. This, however, is still a very experimental domain, and there is no concise explanation or full understanding of how such networks work.

<img src='https://raw.githubusercontent.com/DJCordhose/deep-learning-crash-course-notebooks/master/img/neuron.jpg'>

### We use a sequential model, that means data flows without junctions from input to output

In [17]:
model = keras.Sequential()

### We start with a single fully connected layer having 50 neurons
* we have three inputs
  * speed
  * age
  * miles
* activation function is tanh
* why these parameters: an arbitrary choice for now

In [18]:
from tensorflow.keras.layers import Dense

model.add(Dense(50, name='hidden1', activation='tanh', input_dim=3))

### The final layer just transforms to a probability for each of our 3 classes

In [19]:
num_categories = 3
model.add(Dense(num_categories, name='softmax', activation='softmax'))
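What the softmax activation does to the raw outputs of the final layer can be sketched as follows. This is a plain Python illustration under the assumption of three made-up scores; the function name `softmax` and the values are not taken from the model above.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    # subtracting the maximum avoids overflow in exp() and does not change the result
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical raw scores for our 3 groups
scores = [-1.0, 0.5, 2.0]
probabilities = softmax(scores)
print(probabilities)
print(sum(probabilities))
```

The largest score receives the largest probability, and all probabilities add up to 1, which is why the output can be read as one probability per group.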

### First, let us have a look at what the input and output of this model look like

* this model has not been trained, so do not expect the outputs to be reasonable
* we are only interested in the format of input and output
* note that the format of the predictions does not match the format of our known labels
* we will fix this in the next step

In [20]:
input = X[0:10]

In [21]:
# combinations of customer data
input

array([[ 98.,  44.,  25.],
       [118.,  54.,  24.],
       [111.,  26.,  34.],
       [ 97.,  25.,  10.],
       [114.,  38.,  22.],
       [130.,  55.,  34.],
       [118.,  40.,  51.],
       [143.,  42.,  34.],
       [120.,  41.,  42.],
       [148.,  33.,  53.]])

In [22]:
# predicted output: probabilities for groups
model.predict(input)

array([[0.05204791, 0.21763855, 0.7303135 ],
       [0.05584538, 0.16135797, 0.7827967 ],
       [0.02877889, 0.17047876, 0.8007423 ],
       [0.05008142, 0.23259573, 0.7173228 ],
       [0.04908125, 0.24614482, 0.7047739 ],
       [0.0493986 , 0.24666113, 0.7039402 ],
       [0.05960433, 0.15291464, 0.78748095],
       [0.04820875, 0.25942487, 0.6923663 ],
       [0.06183838, 0.22208914, 0.71607256],
       [0.02674145, 0.13844317, 0.8348154 ]], dtype=float32)

In [23]:
# true, known output
y[0:10]

0    1
1    1
2    0
3    2
4    1
5    0
6    0
7    1
8    2
9    0
Name: group, dtype: int64
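To compare the two formats by hand, one option is to pick the index of the highest probability in each prediction row. The sketch below is only an illustration: the rows are copied from the untrained output above, the labels from `y[0:3]`, and the helper name `predicted_class` is an assumption made for this example.

```python
# first three rows of the untrained model's output shown above
predictions = [
    [0.05204791, 0.21763855, 0.7303135],
    [0.05584538, 0.16135797, 0.7827967],
    [0.02877889, 0.17047876, 0.8007423],
]
# matching known labels, taken from y[0:3]
true_labels = [1, 1, 0]

def predicted_class(row):
    """Index of the largest probability in one prediction row."""
    return max(range(len(row)), key=lambda i: row[i])

predicted_labels = [predicted_class(row) for row in predictions]
matches = [p == t for p, t in zip(predicted_labels, true_labels)]
print(predicted_labels)
print(matches)
```

With NumPy arrays, `np.argmax(predictions, axis=1)` performs the same reduction. As expected for an untrained model, the predicted groups do not agree with the known labels.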

### These are the parameters of the model that need to be learned

In [24]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
hidden1 (Dense)              (None, 50)                200       
_________________________________________________________________
softmax (Dense)              (None, 3)                 153       
Total params: 353
Trainable params: 353
Non-trainable params: 0
_________________________________________________________________
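The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input per neuron plus one bias per neuron. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    """One weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden1 = dense_params(n_inputs=3, n_units=50)        # 3 * 50 + 50
softmax_layer = dense_params(n_inputs=50, n_units=3)  # 50 * 3 + 3
total = hidden1 + softmax_layer
print(hidden1, softmax_layer, total)
```

This reproduces the 200, 153 and 353 parameters reported by `model.summary()`.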


### Bringing it all together
* _sparse_categorical_crossentropy_
  * _crossentropy_: the loss is the cross entropy (https://en.wikipedia.org/wiki/Cross_entropy)
  * _categorical_: we are comparing categorical data
  * _sparse_: allows us to leave our labels as integers without explicitly turning them into a one-hot encoding
* _adam_: the least tedious algorithm for minimizing the loss (http://cs231n.github.io/neural-networks-3/#ada)
  * adapts the learning rate for each parameter during training, so it needs little manual tuning
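The idea behind the sparse variant of the loss can be sketched in plain Python: for each sample, take the negative logarithm of the probability the model assigned to the true group, then average. This is a simplified illustration and not Keras's own implementation; all function names are assumptions for this sketch, and the probabilities and labels are copied from the untrained prediction output and `y[0:3]` above.

```python
import math

def sparse_categorical_crossentropy(labels, probabilities):
    """Mean of -log(probability assigned to the true class), labels as integers."""
    losses = [-math.log(row[label]) for label, row in zip(labels, probabilities)]
    return sum(losses) / len(losses)

def one_hot(label, num_categories):
    """Encode an integer label as a vector with a single 1."""
    return [1.0 if i == label else 0.0 for i in range(num_categories)]

def categorical_crossentropy(one_hot_labels, probabilities):
    """Same loss, but with labels given as one-hot vectors."""
    losses = [-sum(t * math.log(p) for t, p in zip(target, row))
              for target, row in zip(one_hot_labels, probabilities)]
    return sum(losses) / len(losses)

labels = [1, 1, 0]
probabilities = [
    [0.05204791, 0.21763855, 0.7303135],
    [0.05584538, 0.16135797, 0.7827967],
    [0.02877889, 0.17047876, 0.8007423],
]

sparse_loss = sparse_categorical_crossentropy(labels, probabilities)
dense_loss = categorical_crossentropy([one_hot(l, 3) for l in labels], probabilities)
print(sparse_loss, dense_loss)
```

Both variants give the same value; the sparse one simply saves the explicit one-hot encoding step. The loss shrinks towards zero as the probability assigned to the true group approaches 1.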

In [25]:
model.compile(loss='sparse_categorical_crossentropy',
             optimizer='adam')

# Caution: we have not trained our model yet; the parameters are still initialized randomly