# Chapter 10 Notes: Introduction to Artificial Neural Networks with Keras

# From Biological to Artificial Neurons
 - Warren McCulloch and Walter Pitts (1943) proposed a simplified computational model of how biological neurons might work together
## Logical Computations with Neurons
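 - Uses artificial neurons with binary (on/off) inputs and a binary output
 - Networks of these neurons can perform basic logical computations: identity (C = A), AND, OR, and A AND NOT B
 - A single neuron cannot compute XOR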
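The logical computations above can be sketched in a few lines of Python. This assumes the simplified rule that a neuron fires when at least two of its binary inputs are active, and models "A AND NOT B" by treating B as an inhibitory input that blocks firing. The function names are illustrative only.

```python
def mcp_neuron(inputs, threshold=2):
    """McCulloch-Pitts style neuron: fires (returns 1) when at least
    `threshold` of its binary inputs are active (assumed threshold of 2)."""
    return int(sum(inputs) >= threshold)

def identity(a):
    # C = A: A reaches C through two connections, so A alone can fire C
    return mcp_neuron([a, a])

def logical_and(a, b):
    # One connection from each input: both must be active to reach the threshold
    return mcp_neuron([a, b])

def logical_or(a, b):
    # Two connections from each input: either one alone is enough
    return mcp_neuron([a, a, b, b])

def a_and_not_b(a, b):
    # B is an inhibitory input: if B is active, the neuron cannot fire
    if b:
        return 0
    return mcp_neuron([a, a])

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([logical_and(a, b) for a, b in pairs])  # [0, 0, 0, 1]
print([logical_or(a, b) for a, b in pairs])   # [0, 1, 1, 1]
print([a_and_not_b(a, b) for a, b in pairs])  # [0, 0, 1, 0]
```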
## The Perceptron 
 - Frank Rosenblatt 1957
 - Threshold Logic Units (TLUs) make up the perceptron
 - inputs and output are numbers rather than binary on/off values
 - each input is associated with a weight
 - each TLU computes a weighted sum of its inputs
    - $z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \mathbf{x}^\top \mathbf{w}$
 - The TLU then applies a step function to that sum and outputs the result
    - $h_\mathbf{w}(\mathbf{x}) = \text{step}(z)$, where $z = \mathbf{x}^\top \mathbf{w}$
 - Common step functions used in perceptrons:
    - Heaviside step function: 0 if $z < 0$, 1 if $z \ge 0$
    - Sign function: -1 if $z < 0$, 0 if $z = 0$, +1 if $z > 0$
 - A single TLU can perform linear binary classification. 
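 - A perceptron is simply a single layer of TLUs, with each TLU connected to all the inputs (a fully connected, or dense, layer); the inputs are fed in through an input layer
 - The input layer also contains a bias neuron which always outputs 1
 - A perceptron with several TLUs can classify each instance into several binary classes simultaneously (multioutput classification)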
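 - Computing the outputs of a fully connected layer for several instances at once:
    - $h_{\mathbf{W},\mathbf{b}}(\mathbf{X}) = \phi(\mathbf{X}\mathbf{W} + \mathbf{b})$
    - $\mathbf{X}$: matrix of input features, one row per instance and one column per feature
    - $\mathbf{W}$: weight matrix with one row per input neuron and one column per artificial neuron (TLU); it holds all connection weights except those from the bias neuron
    - $\mathbf{b}$: bias vector holding the connection weights between the bias neuron and each TLU (length = number of TLUs)
    - $\phi$: the activation function (a step function when the artificial neurons are TLUs)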
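 - Perceptron learning rule: reinforce the connections that help reduce the error. The perceptron is fed one training instance at a time; for each output neuron that made a wrong prediction, it strengthens the weights from the inputs that would have contributed to the correct prediction
 - $w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\,x_i$
     - $w_{i,j}$: connection weight between the ith input neuron and the jth output neuron
     - $x_i$: ith input value of the current training instance
     - $\hat{y}_j$: output of the jth output neuron for the current instance
     - $y_j$: target output of the jth output neuron for the current instance
     - $\eta$: learning rate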
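 - Each output neuron's decision boundary is linear, so perceptrons can only solve linearly separable problems; the training algorithm converges only if the training instances are linearly separable
 - Perceptrons do not output a class probability; they make predictions based on a hard threshold
 - A single-layer perceptron cannot solve the XOR problem
 - MLPs (Multilayer Perceptrons, i.e., stacks of perceptrons) can solve XOR and many other problems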
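To make the learning rule and its limits concrete, here is a minimal from-scratch sketch of a single TLU trained with the rule above. It assumes a Heaviside step (outputting 1 when $z \ge 0$), zero initial weights, $\eta = 1$, and 20 epochs; these are illustrative choices, not the only valid ones. The bias neuron is modeled as an extra input fixed at 1. The same code learns AND (linearly separable) but cannot fit XOR.

```python
import numpy as np

def heaviside(z):
    # Step function used by the TLU: 0 for z < 0, 1 for z >= 0
    return (np.asarray(z) >= 0).astype(int)

def train_perceptron(X, y, eta=1.0, n_epochs=20):
    """Train one TLU with the perceptron learning rule."""
    X_b = np.c_[np.ones(len(X)), X]   # prepend the bias input (always 1)
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X_b, y):  # one training instance at a time
            y_hat = heaviside(x_i @ w)
            # w <- w + eta * (y - y_hat) * x  (no change if the prediction is right)
            w += eta * (y_i - y_hat) * x_i
    return w

def predict(w, X):
    X_b = np.c_[np.ones(len(X)), X]
    return heaviside(X_b @ w)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

w_and = train_perceptron(X, y_and)
w_xor = train_perceptron(X, y_xor)
print(predict(w_and, X))  # [0 0 0 1]
print(predict(w_xor, X))  # never equals [0 1 1 0]: XOR is not linearly separable
```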

In [11]:
import numpy as np 
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 for Iris setosa, 0 otherwise

iris.keys()


dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

In [12]:
per_clf = Perceptron()
per_clf.fit(X,y)
y_pred = per_clf.predict([[2, 0.5]])
print(y_pred)

[0]
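scikit-learn's `Perceptron` stores the learned weights in `coef_` and the bias term in `intercept_`, so the TLU formula can be checked by hand. In this sketch `random_state=42` is an arbitrary choice; scikit-learn predicts the positive class when the decision score is strictly greater than 0, so the hand-written step uses `z > 0`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]                 # petal length, petal width
y = (iris.target == 0).astype(int)       # 1 for Iris setosa

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)

# Learned weights (one per feature) and bias term of the single TLU
w = per_clf.coef_[0]
b = per_clf.intercept_[0]

# Recompute by hand: weighted sum z = x^T w + b, then a step function
z = X @ w + b
manual_pred = (z > 0).astype(int)

print(np.array_equal(manual_pred, per_clf.predict(X)))
```

The scikit-learn documentation notes that `Perceptron` is equivalent to an `SGDClassifier` with `loss="perceptron"`, `eta0=1`, `learning_rate="constant"`, and `penalty=None`.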


## The Multilayer Perceptron and Backpropagation
- MLPs are composed of an input layer, one or more hidden layers of TLUs, and a final output layer of TLUs.
    - Every layer except the output layer includes a bias neuron, which receives no input from the previous layer, and is fully connected to the next layer.
- The input layer consists of pass-through neurons; the hidden and output layers consist of TLUs.
- Rumelhart, Hinton, and Williams 1986 introduced backpropagation
- **Backpropagation**: computes the gradient of the network's error with respect to every single model parameter, across all layers, in just two passes through the network (one forward, one backward). These gradients show how to tweak each connection weight and bias term in order to reduce the error.
    - handles one mini-batch at a time (e.g., 32 instances each)
    - it goes through the full training set multiple times; each full pass is called an epoch
    - On the *forward pass*, each mini-batch is passed through the network and all the intermediate outputs are saved.
    - the error is measured with a loss function that compares the network's actual output with the desired output
    - the chain rule is used to compute how much each output connection contributed to the error
    - the algorithm then works backwards, measuring how much of these error contributions came from each connection in the next lower layer, and so on down to the input layer. This propagates the error gradient backwards through the network.
    - finally, a gradient descent step tweaks all the connection weights in the network using the error gradients just computed.
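- The step function is replaced by the logistic (sigmoid) function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, because the step function contains only flat segments and so gives no gradient to follow, while the sigmoid has a well-defined nonzero derivative everywhere.
    - Other popular activation functions include the hyperbolic tangent, $\tanh(z)$ (outputs between -1 and 1), and the Rectified Linear Unit, $\text{ReLU}(z) = \max(0, z)$
- Nonlinear activation functions are essential: a chain of linear transformations is itself linear, so without them even a deep stack of layers behaves like a single layer. With nonlinear activations, a large enough MLP can approximate any continuous function.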
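The forward pass, loss, backward pass, and gradient descent step described above can be sketched in NumPy for a tiny 2-4-1 sigmoid MLP on the XOR data. Assumptions for this sketch: MSE as the loss, the whole 4-instance dataset as one batch (so one step per epoch), $\eta = 0.5$, and 2,000 epochs. The backprop gradients are first checked against finite differences, then used for training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass: compute and keep every layer's output."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(X @ W1 + b1)        # hidden layer outputs
    y_hat = sigmoid(a1 @ W2 + b2)    # output layer outputs
    return a1, y_hat

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def backward(params, X, y):
    """Reverse pass: chain rule from the loss back to every weight and bias."""
    W1, b1, W2, b2 = params
    a1, y_hat = forward(params, X)
    n = X.shape[0]
    # dLoss/dy_hat for MSE, times the sigmoid derivative s * (1 - s)
    d_out = (2.0 / n) * (y_hat - y) * y_hat * (1.0 - y_hat)
    grad_W2 = a1.T @ d_out
    grad_b2 = d_out.sum(axis=0)
    # Propagate the error gradient one layer down, through the hidden sigmoid
    d_hidden = (d_out @ W2.T) * a1 * (1.0 - a1)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)
    return [grad_W1, grad_b1, grad_W2, grad_b2]

def numerical_grad(params, X, y, eps=1e-6):
    """Central finite differences, used only to check the backprop gradients."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            original = p[idx]
            p[idx] = original + eps
            loss_plus = mse(y, forward(params, X)[1])
            p[idx] = original - eps
            loss_minus = mse(y, forward(params, X)[1])
            p[idx] = original            # restore the parameter
            g[idx] = (loss_plus - loss_minus) / (2 * eps)
        grads.append(g)
    return grads

rng = np.random.default_rng(42)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])            # XOR targets
params = [rng.normal(size=(2, 4)), np.zeros(4),   # hidden layer: 4 neurons
          rng.normal(size=(4, 1)), np.zeros(1)]   # output layer: 1 neuron

# 1) Backprop gradients should match the finite-difference estimates
analytic = backward(params, X, y)
numeric = numerical_grad(params, X, y)
max_diff = max(np.abs(a - b).max() for a, b in zip(analytic, numeric))
print(f"max |backprop - numerical| = {max_diff:.2e}")

# 2) Plain gradient descent using the backprop gradients
initial_loss = mse(y, forward(params, X)[1])
eta = 0.5
for epoch in range(2000):
    grads = backward(params, X, y)
    params = [p - eta * g for p, g in zip(params, grads)]
final_loss = mse(y, forward(params, X)[1])
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

MSE with a sigmoid output keeps the derivative arithmetic short; for classification tasks cross-entropy is usually the better loss.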

## Regression MLPs
- you need one output neuron per value you are trying to predict.
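    - e.g., house price: 1 output neuron; width and height of a rectangle: 2 output neurons
- Usually you do not use an activation function for the output neurons, so they are free to output any range of values.
    - If the output must be positive, use ReLU or softplus, $\log(1 + e^{z})$; if it must lie in a bounded range, use the logistic or tanh function and scale the labels to that range
- MSE is the typical loss function:
    - $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - \hat{y}^{(i)}\right)^2$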
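A small NumPy sketch of the regression pieces above: the MSE formula, plus ReLU and softplus as output activations that keep predictions non-negative (ReLU) or strictly positive (softplus). The sample values are made up purely for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    # (1/n) * sum over instances of (y - y_hat)^2
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def relu(z):
    return np.maximum(0.0, z)       # clips negative outputs to 0

def softplus(z):
    return np.log1p(np.exp(z))      # smooth version of ReLU, always > 0

z = np.array([-2.0, 0.0, 3.0])      # raw (linear) output-neuron values
print(relu(z))                      # [0. 0. 3.]
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3, about 0.4167
```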

## Classification MLPs
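- For binary classification, a single output neuron using the logistic activation function outputs the estimated probability of the positive class
- For multiclass classification, you need one output neuron per class
    - The softmax activation function ensures all the estimated probabilities are between 0 and 1 and sum to 1, which is what you need when the classes are exclusive
    - Cross-entropy (log loss) is a good loss function here, since the model predicts probability distributions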
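The softmax output layer and cross-entropy loss can be sketched directly in NumPy. Assumptions: the true labels are given as class indices, and the max logit is subtracted before exponentiating for numerical stability (this does not change the result). The logits are arbitrary example values.

```python
import numpy as np

def softmax(logits):
    # Subtract each row's max for numerical stability, then normalize
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def cross_entropy(y_true, probs):
    # Average of -log(estimated probability of the true class)
    n = len(y_true)
    return -np.mean(np.log(probs[np.arange(n), y_true]))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.sum(axis=1))                       # each row sums to 1
print(cross_entropy(np.array([0, 2]), probs))
```

With equal logits every class gets probability 1/3, so the cross-entropy for that row is $\ln 3 \approx 1.0986$.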

# Implementing MLPs with Keras
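- Keras: a high-level deep learning API, released in 2015
- relies on a computation backend to do the heavy lifting (TensorFlow, Microsoft Cognitive Toolkit (CNTK), or Theano)
- TensorFlow bundles its own Keras implementation, `tf.keras`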
## Installing TensorFlow 2

In [1]:
import tensorflow as tf
from tensorflow import keras
keras.__version__

'2.8.0'

In [2]:
tf.__version__

'2.8.0-dev20211217'

## Building an Image Classifier
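- Using the Fashion MNIST dataset: 70,000 grayscale images of clothing items in 10 classes
### Using Keras to Load the Dataset
- each image is represented as a 28x28 array of pixel intensities (integers from 0 to 255)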

In [3]:
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test_full, y_test_full) = fashion_mnist.load_data()


In [4]:
X_train_full.shape

(60000, 28, 28)

In [6]:
X_train_full.dtype

dtype('uint8')

In [8]:
import numpy as np
print(np.max(X_train_full))
print(np.min(X_train_full))

255
0


In [17]:
#split and scale the data (0 to 1)
#data already shuffled
X_valid, X_train = X_train_full[:5000]/255.0, X_train_full[5000:]/255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

In [18]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
"Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
[class_names[int(this_item)] for this_item in y_train[0:10]]

['Coat',
 'T-shirt/top',
 'Sneaker',
 'Ankle boot',
 'Ankle boot',
 'Ankle boot',
 'Coat',
 'Coat',
 'Dress',
 'Coat']

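### Creating the Model Using the Sequential API
- The Sequential API is the simplest kind of Keras model: a single stack of layers connected sequentially
- First add a `Flatten` layer: it has no parameters and simply reshapes each 28x28 input image into a 1D array of 784 values; `input_shape` gives the shape of one instance (without the batch size)
- Add a `Dense` hidden layer with 300 neurons and ReLU activation; each `Dense` layer manages its own weight matrix plus a vector of bias terms (one per neuron)
- Add a second `Dense` hidden layer with 100 neurons, also using ReLU
- Add a `Dense` output layer with 10 neurons (one per class) using softmax, because the classes are exclusive
- Alternatively, you can pass a list of layers when creating the `Sequential` model

- Dense layers often have a lot of parameters, which gives the model the flexibility to fit the training data
- More parameters also mean a greater risk of overfitting, especially when there is not much training data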
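The parameter counts that appear in `model.summary()` follow from simple arithmetic: a `Dense` layer has one weight per input-to-neuron connection plus one bias term per neuron. A plain-Python sketch for the 784-300-100-10 architecture used here (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_inputs, n_neurons):
    # One weight per input-to-neuron connection, plus one bias per neuron
    return n_inputs * n_neurons + n_neurons

layer_sizes = [28 * 28, 300, 100, 10]   # flattened input, two hidden layers, output
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [235500, 30100, 1010]
print(sum(per_layer))  # 266610
```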

In [23]:
#initialize and build the model
model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[28,28]))
model.add(keras.layers.Dense(300, activation='relu'))
model.add(keras.layers.Dense(100, activation='relu'))
model.add(keras.layers.Dense(10, activation='softmax'))
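The same architecture can also be built with the alternative syntax mentioned in the notes, passing a list of layers to `Sequential` instead of calling `add()` repeatedly. This sketch assumes TensorFlow 2 with its bundled Keras, as in the cells above.

```python
from tensorflow import keras

# Same 784-300-100-10 network, defined by passing a list of layers
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),     # 28x28 image -> 784 values
    keras.layers.Dense(300, activation="relu"),     # hidden layer 1
    keras.layers.Dense(100, activation="relu"),     # hidden layer 2
    keras.layers.Dense(10, activation="softmax"),   # one output per class
])
print(model.count_params())
```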

In [24]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 dense_3 (Dense)             (None, 300)               235500    
                                                                 
 dense_4 (Dense)             (None, 100)               30100     
                                                                 
 dense_5 (Dense)             (None, 10)                1010      
                                                                 
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________


In [25]:
model.layers

[<keras.layers.core.flatten.Flatten at 0x257ab94e970>,
 <keras.layers.core.dense.Dense at 0x257ac3d0eb0>,
 <keras.layers.core.dense.Dense at 0x257ac60db20>,
 <keras.layers.core.dense.Dense at 0x257ac089a90>]

In [26]:
model.layers[1].name

'dense_3'

In [28]:
weights, biases = model.layers[1].get_weights()

In [30]:
weights.shape

(784, 300)

In [31]:
biases.shape

(300,)

In [32]:
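# Dense layers initialize connection weights randomly and biases to zeros by default;
# pass kernel_initializer / bias_initializer to use a different initialization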
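A minimal sketch of overriding the default initialization of a `Dense` layer. The choice of `"he_normal"` is only an illustrative example of a built-in Keras initializer, not a requirement; building the layer for 784 inputs creates its weights so they can be inspected with `get_weights()`.

```python
import numpy as np
from tensorflow import keras

# Defaults: kernel_initializer="glorot_uniform", bias_initializer="zeros"
default_layer = keras.layers.Dense(300, activation="relu")

# Override the kernel initializer, e.g. He initialization (often used with ReLU)
he_layer = keras.layers.Dense(300, activation="relu",
                              kernel_initializer="he_normal",
                              bias_initializer="zeros")

# Build both layers for 784 inputs so their weights are created
for layer in (default_layer, he_layer):
    layer.build((None, 784))
    weights, biases = layer.get_weights()
    print(weights.shape, biases.shape, np.all(biases == 0))
```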

### Compiling the model