# Test the Model
This notebook works out the layers for the CNN (evaluating image size and memory requirements).
a) It experiments with the Keras "Model" and "Layers" using examples from the documentation.
b) It looks at examples from the book (Geron) and the internet on approaches to structuring a CNN model.
c) It then takes a deep dive into AlexNet.

Keras References:
https://keras.io/models/model/
https://keras.io/layers/about-keras-layers/

In [27]:
# Sequential Model
# https://keras.io/layers/core/#dense

from keras.models import Sequential
from keras.layers import Dense

# without bias...
# the first (input) stage is 5 inputs * 3 nodes = 15
# the second stage is 3 inputs * 2 nodes = 6
model = Sequential()
model.add(Dense(units=3, input_shape=(5,), use_bias=False))
model.add(Dense(units=2, use_bias=False))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_27 (Dense)             (None, 3)                 15        
_________________________________________________________________
dense_28 (Dense)             (None, 2)                 6         
Total params: 21
Trainable params: 21
Non-trainable params: 0
_________________________________________________________________


In [34]:
# with bias...
# Each node adds one bias term to its weighted sum, so every layer gains one parameter per node.

# the first (input) stage is 5 inputs * 3 nodes + 3 biases = 18
# the second stage is 3 inputs * 2 nodes + 2 biases = 8
model = Sequential()
model.add(Dense(units=3, input_shape=(5,), use_bias=True))
model.add(Dense(units=2, use_bias=True))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_39 (Dense)             (None, 3)                 18        
_________________________________________________________________
dense_40 (Dense)             (None, 2)                 8         
Total params: 26
Trainable params: 26
Non-trainable params: 0
_________________________________________________________________
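
The parameter counts in the two summaries above can be reproduced by hand. This is a small sketch in plain Python (no Keras needed); the helper name `dense_param_count` is my own and is not part of any library.

```python
def dense_param_count(n_inputs, n_units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    weights = n_inputs * n_units              # one weight per (input, node) pair
    biases = n_units if use_bias else 0       # one bias per node, if enabled
    return weights + biases

# The two-layer network from the cells above: 5 inputs -> 3 nodes -> 2 nodes.
for use_bias in (False, True):
    first = dense_param_count(5, 3, use_bias)
    second = dense_param_count(3, 2, use_bias)
    print(use_bias, first, second, first + second)
```

Without bias this gives 15 + 6 = 21, and with bias 18 + 8 = 26, matching the "Param #" columns.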


In [55]:
# categorical output
# https://keras.io/getting-started/sequential-model-guide/

# For a single-input model with 10 classes (categorical classification):

# The questions are... why:
# "relu"
# "softmax"
# conversion to one-hot encoding
# "cross-entropy"

# And the answers are:
# To what is "cross-entropy" applied?   Answer:  cross-entropy is applied
# during training as the distance measure between the network's predicted
# output for input X and the label y.  Gradient descent uses this to
# determine how to adjust all the weights in the network.
#
# Conversion to one-hot encoding?
# Answer: the author converts Y into a one-hot encoded value before using it.
# The reason behind this - we want Y to be categorical - and the easiest
# way to generate random categorical data is to first generate
# a random integer and then convert that to a one-hot-binary representation.
# I think this is missing an important aspect - that categorical representation
# implies a probabilistic intent.  If so then the label data need not be one-hot-binary
# encoded; it could be one-hot-probability encoded (a probability per class).
# That is for the labels as given to "fit" for training - note that at runtime
# ("predict") the network outputs floats, and the output function is "softmax",
# which guarantees each value is between 0 and 1 and that the values sum to 1
# (up to floating-point rounding), which is what we would expect for a
# probability distribution.  So if your goal is just to classify you can do an
# argmax() on the output, and if your goal is to output a distribution the
# output can be used as is.
#
# Experiment 1: Confirm the network outputs values between 0 and 1 which sum to 1.
# Experiment 2: There is no reason we could not input labels which are one-hot-probability (would
# be a good experiment)

import keras

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(10, size=(1000, 1))

# Convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)


Epoch 1/1


<keras.callbacks.History at 0x7f44ac8fc3c8>

In [56]:
# - OK so it is confirmed that the network output sums to 1.0
# predict(x, batch_size=None, verbose=0, steps=None)

out = model.predict(data[0:4])

print(out[0].sum())
print(out[1].sum())
print(out[2].sum())
print(out[3].sum())



<class 'numpy.ndarray'>
(1000, 100)
(1, 100)
1.0
1.0000001
0.99999994
1.0000001
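
The sums printed above differ from 1 only by float32 rounding. To see why softmax output always sums to 1, and how cross-entropy behaves with "one-hot-probability" labels, here is a minimal sketch in plain Python. The function names, the raw scores, and the soft label values are all made up for illustration; this is not the Keras implementation.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that are each in (0, 1) and sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-7):
    """Cross-entropy of a predicted distribution against a target distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

probs = softmax([2.0, -1.0, 0.5])             # hypothetical raw scores for 3 classes
print(probs, sum(probs))

one_hot = [1.0, 0.0, 0.0]                     # "one-hot-binary" label
soft = [0.8, 0.1, 0.1]                        # hypothetical "one-hot-probability" label
print(cross_entropy(one_hot, probs))          # equals -log(probs[0])
print(cross_entropy(soft, probs))
print(probs.index(max(probs)))                # argmax: index of the most likely class
```

The formula only needs the target to be a distribution, so a soft label works the same way as a one-hot label. Whether soft labels actually help training for this project is still Experiment 2.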


In [None]:
# next example - from https://becominghuman.ai/building-an-image-classifier-using-deep-learning-in-python-totally-from-a-beginners-perspective-be8dbaf22dd8

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# binary image classifier: one conv/pool stage, then two fully connected layers
classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (64, 64, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))



## Trying to architect the layers for my CNN...

Image size is 480x640.
The site where the car is running will determine the color of the surface and
the color of the guideline. For example - Gene's track is red tape on a carpet (dark on light)
while mine is white tape on a concrete floor (light on dark).  The point
is we cannot use color as an indicator, nor can we assume that the line is light
relative to the surface.  Rather - 
the concept of an "edge" (any light-to-dark transition) will be used.  This will
allow the car to run at either site.
Ambient lighting will vary - the overall image may be dark or light.
The line width varies due to perspective and due to the width of the material (tape vs rope).
At its furthest distance the tape is maybe 10 pixels wide; at its nearest point it is 100 pixels.
The line may extend into the distance; at its longest it is 3/4 the height of the image.

As such:

Early Processing Stages - here we are removing common mode signals
    Remove color (convert to Black/White)
    Normalize
    Augmentation -> invert the image (swap black and white)
    (At this point we could try a threshold detect (convert to binary) but we will leave
    that as a future exercise.)
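
A rough sketch of these early stages in NumPy. The function names are hypothetical, the grayscale conversion is a plain channel average, and the normalization is a min-max stretch; both are assumptions rather than settled choices for this project.

```python
import numpy as np

def preprocess(rgb):
    """Convert an H x W x 3 uint8 image to a normalized float32 grayscale image."""
    gray = rgb.astype(np.float32).mean(axis=2)    # remove color: average the channels
    lo, hi = gray.min(), gray.max()
    if hi > lo:
        gray = (gray - lo) / (hi - lo)            # stretch to [0, 1] to reduce lighting differences
    else:
        gray = np.zeros_like(gray)                # flat image: nothing to stretch
    return gray

def invert(gray):
    """Augmentation: swap dark and light."""
    return 1.0 - gray

# Stand-in for a camera frame (random pixels, 480 rows x 640 columns).
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

gray = preprocess(frame)
augmented = invert(gray)
print(gray.shape, gray.min(), gray.max())
```

Training on both `gray` and `augmented` is one way to keep the network from depending on whether the line is lighter or darker than the floor.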

First stage of processing - here we are performing localized Feature Detection (detecting
the edge of the line) and then a pair of edges which constitute a 'line crossing'
    Single edge detection - the smallest feature in our 'signal' is about 10 pixels wide. Nyquist says
    we need to sample at at least twice the highest frequency. But neural net nodes look at a set of pixels
    (multiple samples)...  At the end of the day we want the edge detect filter to be wide enough
    to detect the fuzziest edge transition.  Additional width is OK - the weights on those pixels will be
    'trained out' (driven toward zero), and the additional computation is nominal.
    
    The plan is to start small
    Width=6  Stride=3.
    We can later experiment with larger width/strides.

    We should be able to 'pool' - get rid of the original image.
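
To get a feel for Width=6 / Stride=3, here is a hand-built edge filter run along a single image row in NumPy. The kernel is fixed by hand purely for illustration (a CNN would learn its own kernels), and taking the absolute value is my stand-in for "any light-to-dark transition counts". The row data is synthetic.

```python
import numpy as np

def edge_response(row, width=6, stride=3):
    """Slide a step-shaped kernel along one image row and return |response|."""
    half = width // 2
    # Mean of the right half minus mean of the left half of the window.
    kernel = np.concatenate([-np.ones(half), np.ones(half)]) / half
    starts = range(0, len(row) - width + 1, stride)
    responses = np.array([np.dot(row[s:s + width], kernel) for s in starts])
    return np.abs(responses)    # polarity is discarded: both edge directions count

row = np.zeros(640)
row[300:310] = 1.0              # a 10-pixel-wide light line on a dark floor

resp = edge_response(row)
resp_inverted = edge_response(1.0 - row)    # dark line on a light floor

print(len(resp))                            # number of filter positions along the row
print(np.flatnonzero(resp > 0.5) * 3)       # window start positions with a strong response
print(np.allclose(resp, resp_inverted))     # same response for either polarity
```

A 640-pixel row yields 212 filter positions at this width and stride, so the first stage already shrinks the horizontal dimension by about a factor of three.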
    
Mid-Stages of processing
    Here we are combining features by convolution.  First to get an 'edge-pair',
    and then to combine edge-pairs to get continuity in the long dimension and the overall
    position in the image.

We then go to fully-connected layers.

The final (output) layer will need to be categorical - our data ("y") is
    one-of-three signals (left,center,right).
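
A minimal sketch of the three-class label encoding in plain Python. The class order (left, center, right) and the helper names are assumptions; whatever order is chosen just has to be the same for training and for decoding predictions.

```python
CLASSES = ("left", "center", "right")   # assumed order of the output nodes

def to_one_hot(label):
    """Encode a steering label as a one-hot list."""
    if label not in CLASSES:
        raise ValueError("unknown label: " + label)
    return [1.0 if name == label else 0.0 for name in CLASSES]

def from_output(probabilities):
    """Decode a 3-element network output by taking the most likely class."""
    best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
    return CLASSES[best]

print(to_one_hot("center"))
print(from_output([0.1, 0.2, 0.7]))
```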

=+=========

The images from the camera are mid-sized (480x640).
Examples such as the LeNet architecture (Geron - Hands-On Machine Learning - page 370)
use tiny images - 32x32.
Will I be able to train/run it (ballpark)?

a) Memory Use (during training)
Referring to:
https://ai.stackexchange.com/questions/3938/how-to-handle-images-of-large-sizes-in-cnn
480 * 640 * 3 channels * 4 bytes/ch * batch size of 10 == 37MB
So the memory requirement is 37MB for the input images of one batch during training.
The parameters and model itself are probably not significant.
Summary - we think we will be OK for memory during training.
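
The arithmetic above as a small plain-Python helper (the name `batch_bytes` is my own). It counts only the input tensors of one batch; intermediate activations and gradients are not included and can be larger, so treat it as a lower bound.

```python
def batch_bytes(height, width, channels, bytes_per_value, batch_size):
    """Bytes needed to hold one batch of input images."""
    return height * width * channels * bytes_per_value * batch_size

rgb = batch_bytes(480, 640, 3, 4, 10)    # float32 RGB, batch of 10
gray = batch_bytes(480, 640, 1, 4, 10)   # same batch after removing color

print(rgb / 1e6, "MB")    # 36.864
print(gray / 1e6, "MB")   # 12.288
```

Converting to grayscale first cuts the input memory to a third.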

Compute cycles during training - this depends on batch size/epoch and number of
parameters, and how fast it reaches a minimum (threshold).  Hard to say.

b) If the feature is localized, use selective search to pick a region of interest.
https://www.researchgate.net/post/How_to_modify_a_Convolutional_Neural_Network_architecture_to_deal_with_large_input_images
This is not my case but a good approach.

c) CUDA to offload training to a GPU. This can also be applied at runtime if you have a local GPU.
d) during training - split the load across cloud compute nodes - e.g. TensorFlow!

======================================
THE QUESTION IS - how do I get from 480x640 down to "3"
It appears from the LeNet example that the pooling layers divide
each spatial dimension in half - because stride = 2
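
To see how quickly the image shrinks, here is a sketch that tracks the spatial size through a stack of layers. The size formulas are the ones Keras uses for 'valid' and 'same' padding, as far as I understand them; the layer stack itself (one width-6/stride-3 convolution followed by four 2x2 poolings) is hypothetical, not a decided architecture.

```python
import math

def output_size(size, kernel, stride, padding):
    """Output length along one spatial dimension of a conv or pooling layer."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError("unknown padding: " + padding)

h, w = 480, 640
# Hypothetical first stage: convolution with kernel 6, stride 3.
h, w = output_size(h, 6, 3, "valid"), output_size(w, 6, 3, "valid")
print(h, w)

# Hypothetical later stages: four 2x2 max-pooling layers with stride 2.
for _ in range(4):
    h, w = output_size(h, 2, 2, "valid"), output_size(w, 2, 2, "valid")
    print(h, w)
```

Each pooling layer roughly halves both dimensions, so a handful of them brings 480x640 down to a grid small enough to flatten and feed into fully-connected layers, ending in 3 outputs.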





## AlexNet
Here, for my own education, I am re-documenting "AlexNet" (Geron page 371).

Notes and code to implement AlexNet follow.

Notes:
This application of CNN focuses on image processing.  As such the
data is expressed in 2-d.  The information extracted is therefore kept
in the third dimension (depth), which is the number of maps.
And we observe that as the layers progress the "image" size (x,y dimensions) decreases, while the third
dimension increases.  That is - as "information" is extracted from the "data"
there is more contained in the third dimension and less in the first two.

We believe there are three "strata" here: RGB extraction, information extraction, and application.
RGB extraction is the first (input) stratum - this eliminates the RGB content
to vacate the third dimension so it can be used for information extraction.
Information extraction is the middle stratum - extracting all information possible from the
2-d image, but preserving the "location" of that data in the 2-d representation.
Application is the third and final stratum.  It is specific to the
application - creating a one-hot detector.

```

Name    Type          #maps stride/pad kernel activation resulting dimensions    

                                                         1000 x 1
Out  Fully Connected  1000                    Softmax
                                                         4096 x 1
F9   Fully Connected  4096                    ReLU
                                                         4096 x 1
F8   Fully Connected  4096                    ReLU
                                                          256  13 x 13
C7   Convolution       256  1 "SAME"     3x3  ReLU
                                                          384  13 x 13
C6   Convolution       384  1 "SAME"     3x3  ReLU
                                                          384  13 x 13
C5   Convolution       384  1 "SAME"     3x3  ReLU
                                                          256  13 x 13
S4   Max Pooling            2 "VALID"    3x3
                                                          256  27 x 27
C3   Convolution       256  1 "SAME"     5x5  ReLU
                                                           96  27 x 27
S2   Max Pooling            2 "VALID"    3x3
                                                           96  55 x 55
C1   Convolution        96  4 "SAME"    11x11 ReLU
in                                                     3(RGB) 224 x 224
```
The code follows

In [66]:
#Here is the code for AlexNet:
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten



fout = Dense(units=1000, activation='softmax')
f9 = Dense(units=4096, activation='relu')
f8 = Dense(units=4096, activation='relu')

c7 = Conv2D(filters=256, strides=(1, 1), padding='same', kernel_size=(3,3), activation='relu')

c6 = Conv2D(filters=384, strides=(1, 1), padding='same', kernel_size=(3,3), activation='relu')

c5 = Conv2D(filters=384, strides=(1, 1), padding='same', kernel_size=(3,3), activation='relu')

# 3x3 pooling window, stride 2: 27x27 -> 13x13
s4 = MaxPooling2D(strides=2, padding='valid', pool_size=(3, 3))
c3 = Conv2D(filters=256, strides=(1, 1), padding='same', kernel_size=(5,5), activation='relu')

s2 = MaxPooling2D(strides=2,  padding='valid', pool_size=(3, 3))
c1 = Conv2D(filters=96, strides=(4, 4), padding='same', kernel_size=(11,11), activation='relu', input_shape=(224, 224, 3))


model = Sequential()
model.add(c1)
model.add(s2)
model.add(c3)
model.add(s4)
model.add(c5)
model.add(c6)
model.add(c7)
model.add(Flatten())  # collapse the feature maps into a vector before the dense layers
model.add(f8)
model.add(f9)
model.add(fout)

model.summary()




_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_12 (Conv2D)           (None, 56, 56, 96)        34944     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 27, 27, 96)        0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 27, 27, 256)       614656    
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 13, 13, 256)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 13, 13, 384)       885120    
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 13, 13, 384)       1327488   
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 13, 13, 256)       884992    
__________
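
The "Param #" values for the convolution layers can be checked by hand: each filter has kernel_height * kernel_width * input_channels weights plus one bias. A small plain-Python sketch (the helper name `conv2d_param_count` is my own):

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Number of trainable parameters in a 2-d convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# (name, kernel size, input channels, number of maps) taken from the AlexNet table
layers = [
    ("C1", 11, 3, 96),
    ("C3", 5, 96, 256),
    ("C5", 3, 256, 384),
    ("C6", 3, 384, 384),
    ("C7", 3, 384, 256),
]
for name, k, cin, cout in layers:
    print(name, conv2d_param_count(k, k, cin, cout))
```

Note that the count does not depend on the image size, only on the kernel size and the number of input and output maps. The pooling layers have no parameters at all.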

In [57]:
from metrowestcar_file_io import FileReader
from metrowestcar_display import Displayer

from os import getcwd
from os.path import abspath
from os.path import join
from os.path import exists


file_reader = FileReader()
displayer = Displayer()

fullpath = join(abspath(getcwd()), "../data/pictures_test")
image_array = file_reader.read_images_from_directory(fullpath)
for i in image_array:
    displayer.display_image(i)

# filename must be defined before it is used to read the image and the steering value
filename = "control100"
image = file_reader.read_image_from_file(join(fullpath,filename))
steering = file_reader.read_steering_from_file(join(fullpath,filename))

# displaying an image



displayer.display_thumbnail(image)
