# 1. Cloud Computing

### 3. Get Access to GPU Instances 

**Elastic Computing** - the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.



# 2. Convolutional Neural Networks


In [1]:
from keras.datasets import mnist

Using TensorFlow backend.


Categorical cross entropy:
* looks at the ground truth label and the prediction, and outputs a large value if the prediction is far off from the ground truth, and a small value if the prediction is close to the ground truth 
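
As a rough illustration of the bullet above, here is a minimal plain-Python sketch of categorical cross entropy for a single example. The function name `categorical_cross_entropy`, the `eps` clipping value, and the sample predictions are all made up for illustration; Keras computes this loss internally when `loss='categorical_crossentropy'` is used.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross entropy between a one-hot label and a predicted distribution."""
    # clip predictions away from 0 so that log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

label = [0.0, 1.0, 0.0]          # ground truth: class 1
good_pred = [0.05, 0.90, 0.05]   # close to the ground truth
bad_pred = [0.70, 0.10, 0.20]    # far from the ground truth

good_loss = categorical_cross_entropy(label, good_pred)
bad_loss = categorical_cross_entropy(label, bad_pred)
print(good_loss)  # small value, about 0.105
print(bad_loss)   # large value, about 2.303
```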

In [9]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()


In [2]:
# import matplotlib.pyplot as plt
# %matplotlib inline
# import matplotlib.cm as cm 
# import numpy as np

# fig = plt.figure(figsize=(20,20))
# for i in range(6):
#     ax = fig.add_subplot(1, 6, i+1, xticks=[], yticks=[])
#     ax.imshow(X_train[i], cmap='gray')
#     ax.set_title(str(y_train[i]))

Multilayer Perceptron vs. Convolutional Neural Networks when dealing with images
* MLPs take vectors as input; they have no knowledge that the input vector was originally an image (i.e. a 2D array) 
* CNNs make use of the fact that the input data is an image (a 2D matrix of pixel values)
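
A tiny sketch of what flattening does to an image, using a made-up 3x3 "image" in plain Python lists: pixels that are vertical neighbours in the 2D array end up far apart in the vector an MLP receives.

```python
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
width = len(image[0])

# what an MLP sees: one long vector, row after row
flat = [pixel for row in image for pixel in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# pixels 2 and 5 are vertical neighbours in the image,
# but they sit `width` positions apart in the vector
gap = flat.index(5) - flat.index(2)
print(gap)  # 3
```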



In [1]:
# def visualize_input(img, ax):
#     ax.imshow(img, cmap='gray')
#     width, height = img.shape
#     thresh = img.max()/2.5
#     for x in range(width):
#         for y in range(height):
#             ax.annotate(str(round(img[x][y],2)), xy=(y,x),
#                         horizontalalignment='center',
#                         verticalalignment='center',
#                         color='white' if img[x][y]<thresh else 'black')

# fig = plt.figure(figsize = (12,12)) 
# ax = fig.add_subplot(111)
# visualize_input(X_train[0], ax)

In [12]:
# rescale [0,255] --> [0,1]
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255


In [13]:

from keras.utils import np_utils

# print first ten (integer-valued) training labels
print('Integer-valued labels:')
print(y_train[:10])

# one-hot encode the labels
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

# print first ten (one-hot) training labels
print('One-hot labels:')
print(y_train[:10])

Integer-valued labels:
[5 0 4 1 9 2 1 3 1 4]
One-hot labels:
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]


In [14]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

# define the model
model = Sequential()
model.add(Flatten(input_shape=X_train.shape[1:]))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(10, activation='softmax'))

# summarize the model
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________


In [15]:
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])

In [16]:

# evaluate test accuracy before training (the weights are still random, so expect roughly chance-level accuracy)
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)

Test accuracy: 15.0300%


In [19]:
from keras.callbacks import ModelCheckpoint   

# train the model, saving the weights with the best validation loss
checkpointer = ModelCheckpoint(filepath='mnist.model.best.hdf5', 
                               verbose=1, save_best_only=True)
hist = model.fit(X_train, y_train, batch_size=128, epochs=10,
          validation_split=0.2, callbacks=[checkpointer],
          verbose=1, shuffle=True)

Train on 48000 samples, validate on 12000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.load_weights('mnist.model.best.hdf5')

In [None]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)

### Local Connectivity 

MLPs
* only use fully connected layers (lots of parameters, even for small 28x28 images)
* only accept vectors as input (no knowledge of where the pixels are located relative to one another)

CNNs
* use sparsely connected layers 
* accept matrices as input 

**locally connected layers** use far fewer parameters than a densely connected layer 
* less prone to overfitting 
* better at teasing out patterns contained in image data
* parameter sharing
    
weight sharing 
* common weights for different regions of an image 
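
To put rough numbers on local connectivity and weight sharing, the sketch below counts weights (biases ignored) for three ways of producing the same number of outputs from a 28x28 image. The 3x3 filter size, stride 1, and no padding are assumptions chosen for illustration.

```python
in_h, in_w = 28, 28   # MNIST image size
k = 3                 # filter height and width (assumed for illustration)

# stride 1, no padding: one output per position where the filter fits
out_h, out_w = in_h - k + 1, in_w - k + 1
n_out = out_h * out_w

dense_weights = in_h * in_w * n_out   # every output connects to every pixel
local_weights = n_out * k * k         # every output sees only its own 3x3 patch
shared_weights = k * k                # one 3x3 filter reused at every position

print(n_out)           # 676
print(dense_weights)   # 529984
print(local_weights)   # 6084
print(shared_weights)  # 9
```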

### Convolutional Layers

CNN single layer 
* layer_i --> $\text{CONV}(\text{layer}_i)$ --> $g(\text{CONV}(\text{layer}_i))$ --> output, where $g$ is the activation function (e.g. ReLU)

* It is common to have 10s to 100s of filters/kernels in each CONV layer

* edge-detector filters are especially important for CNNs   
* for images with depth (e.g. RGB), the filters are 3D as well, with the same depth as the input 

* filters are initialized randomly 
    * as a result, the patterns the filters detect at first are random 
    * the filter weights are updated during training to minimize the loss function (here, categorical cross entropy) 
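
The single-layer pipeline above (CONV, then an activation) can be sketched in plain Python. This is a minimal illustration, not how Keras implements it: the helper names `conv2d` and `relu`, the 4x4 image, and the hand-made vertical edge filter are all assumptions. In a real CNN the filter values are learned rather than written by hand.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Activation g: negative responses are set to 0."""
    return [[max(0, v) for v in row] for row in feature_map]

# 4x4 image: dark left half (0), bright right half (1)
image = [[0, 0, 1, 1] for _ in range(4)]

# vertical edge detector: responds where brightness increases from left to right
kernel = [[-1, 1],
          [-1, 1]]

output = relu(conv2d(image, kernel))
print(output)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The only non-zero column in the output is the one where the filter straddles the dark-to-bright boundary.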
    
### Striding and padding 

* striding: how many pixels/indices the filter moves each time it slides over the image 

* padding: a border (of zeros) added around the 2D matrix so that the filter can cover the edges and the shape can be maintained in the next layer  
    * 'VALID' = no padding
    * 'SAME' = pad so that, with a stride of 1, the convolved layer has the same height and width as the input layer 

### Convolutional Layers in Keras

In [None]:
from keras.layers import Conv2D

# main arguments (the values below are example values only)
Conv2D(filters=16, kernel_size=2, strides=1, padding='valid',
       activation='relu', input_shape=(28, 28, 1))

# if the Conv layer comes right after the input layer, you must specify input_shape=(height, width, depth)

In [54]:
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=2, strides=2, padding='valid', 
    activation='relu', input_shape=(200, 200, 2)))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_35 (Conv2D)           (None, 100, 100, 16)      144       
Total params: 144
Trainable params: 144
Non-trainable params: 0
_________________________________________________________________


* output shape changes with the stride and padding type (and with kernel_size if padding='VALID')
* param # changes with the number of filters, the kernel size, and the depth of the input (the last value of input_shape) 

* in a convolutional layer:
    * recall that there is one bias variable for each filter, thus num_biases = num_filters 
    * num_weights = num_filters $*$ filter_height $*$ filter_width $*$ depth of input_layer
    * num_params = num_weights $+$ num_biases
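
The three formulas above translate directly into a small helper. The function name `conv2d_param_count` is made up for illustration; the first call reproduces the `Param #` of the summary above (16 filters of size 2x2 on an input of depth 2).

```python
def conv2d_param_count(num_filters, filter_height, filter_width, input_depth):
    """Number of trainable parameters in a convolutional layer."""
    num_weights = num_filters * filter_height * filter_width * input_depth
    num_biases = num_filters  # one bias per filter
    return num_weights + num_biases

print(conv2d_param_count(16, 2, 2, 2))  # 144
print(conv2d_param_count(32, 3, 3, 3))  # 896
```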

### Dimensionality 

* The shape of a convolutional layer depends on the supplied values of kernel_size, input_shape, padding, and stride. Let's define a few variables:

* K - the number of filters in the convolutional layer
* F - the height and width of the convolutional filters
* S - the stride of the convolution
* H_in - the height of the previous layer
* W_in - the width of the previous layer

Notice that K = filters, F = kernel_size, and S = strides. Likewise, H_in and W_in are the first and second values of the input_shape tuple, respectively.

The depth of the convolutional layer will always equal the number of filters K.

If padding = 'same', then the spatial dimensions of the convolutional layer are the following:

`height = ceil(float(H_in) / float(S))`

`width = ceil(float(W_in) / float(S))`

If padding = 'valid', then the spatial dimensions of the convolutional layer are the following:

`height = ceil(float(H_in - F + 1) / float(S))`

`width = ceil(float(W_in - F + 1) / float(S))`
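
The formulas above can be wrapped in a small helper to check a layer's shape by hand. The function name `conv_output_shape` is an assumption made for this sketch; it only mirrors the formulas in these notes and does not call Keras.

```python
import math

def conv_output_shape(h_in, w_in, k, f, s, padding):
    """Return (height, width, depth) of a conv layer, following the formulas above."""
    if padding == 'same':
        height = math.ceil(h_in / s)
        width = math.ceil(w_in / s)
    elif padding == 'valid':
        height = math.ceil((h_in - f + 1) / s)
        width = math.ceil((w_in - f + 1) / s)
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return (height, width, k)  # depth always equals the number of filters

print(conv_output_shape(200, 200, k=16, f=2, s=2, padding='valid'))  # (100, 100, 16)
print(conv_output_shape(5, 5, k=32, f=3, s=2, padding='valid'))      # (2, 2, 32)
print(conv_output_shape(5, 5, k=32, f=3, s=2, padding='same'))       # (3, 3, 32)
```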

In [68]:
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=3, strides=2, padding='valid', 
    activation='relu', input_shape=(5, 5, 3)))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_49 (Conv2D)           (None, 2, 2, 32)          896       
Total params: 896
Trainable params: 896
Non-trainable params: 0
_________________________________________________________________


### Pooling Layers

* a convolutional layer with too high a dimensionality has too many parameters, which can lead to overfitting; pooling layers reduce that dimensionality 
    * two common types: max pooling and global average pooling 
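
A minimal plain-Python sketch of both pooling types on a single made-up 4x4 feature map. The helper names `max_pool2d` and `global_average_pool` are assumptions for illustration; in Keras the corresponding layers are `MaxPooling2D` and `GlobalAveragePooling2D`.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the maximum of each pool_size x pool_size window (no padding)."""
    out_h = (len(feature_map) - pool_size) // stride + 1
    out_w = (len(feature_map[0]) - pool_size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(pool_size) for b in range(pool_size))
             for j in range(out_w)]
            for i in range(out_h)]

def global_average_pool(feature_map):
    """Collapse the whole feature map to a single number: its mean."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 0],
        [7, 2, 9, 8],
        [0, 1, 3, 4]]

print(max_pool2d(fmap))           # [[6, 4], [7, 9]]
print(global_average_pool(fmap))  # 3.5
```

Max pooling shrinks the 4x4 map to 2x2, while global average pooling reduces it to one value per feature map.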

In [78]:
# Keras MaxPooling example 

from keras.layers import MaxPooling2D

# example values; strides must be positive (strides=None would default to pool_size)
pool_size, strides, padding = (2, 2), (2, 2), 'valid'

MaxPooling2D(pool_size, strides, padding)

<keras.layers.pooling.MaxPooling2D at 0x18eaf74c1d0>

In [86]:
from keras.models import Sequential

model = Sequential()
model.add(MaxPooling2D(pool_size=2, strides=4, input_shape=(100, 100, 15)))
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
max_pooling2d_16 (MaxPooling (None, 25, 25, 15)        0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


### CNNs for Image Classification 

* Conv layers detect regional patterns in an image
* pooling layers reduce the dimensionality of our arrays 
* CNNs input images have to be all the same size (preprocessing must be done for this)

As an image flows through a CNN, we
* go from a pixel-by-pixel representation to a content-based one ("are there ears in this photo?") 
    * **spatial representation** decreases and **content representation** increases 

### Image Augmentation in Keras

* invariant representation - the object is still recognized no matter its size, angle, or location in the image
    * scale invariance 
    * rotation invariance 
    * translation invariance (embedded within CNNs)
* **data augmentation** - adding transformed training examples (the object is rotated or shifted in the image) so that the model learns these invariances and generalizes better 
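
The idea of data augmentation can be sketched without any library: take one training image and generate flipped and shifted copies that keep the same label. The helpers `flip_horizontal`, `shift_right`, and `augment` are hypothetical names used only here, and the 3x3 image is made up. In Keras the same job is usually handed to `ImageDataGenerator`, which applies random shifts, rotations, and flips while training.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0):
    """Translate the image to the right, filling the vacated columns with `fill`."""
    width = len(image[0])
    return [([fill] * pixels + row)[:width] for row in image]

def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, flip_horizontal(image), shift_right(image, 1)]

small_image = [[0, 1, 0],
               [1, 1, 0],
               [0, 1, 0]]

augmented = augment(small_image)
print(len(augmented))  # 3 training examples from 1
print(augmented[1])    # [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
print(augmented[2])    # [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
```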

#### Fun Fact
ssh is a software package that enables secure system administration and file transfers over insecure networks.


### Mini Project: Image Augmentation
* If you have too many kernels open, you might be using too much memory! Shut down the other running kernels so you can focus on one kernel on the AWS server.

### Visualizing CNNs 

* Visualizing activation and convolutional maps 
* taking filters from conv layers and constructing images that maximize their activations 



### Transfer Learning

* take the learned understanding from one NN model and transfer it to another NN  
    * for transfer learning: keep the earlier (pre-trained) layers, then add and train new layers, ending with a new output layer 

let good_dataset = the large, well-known dataset that the pre-trained model was trained on

* General process of transfer learning:
    * randomly initialize the weights of the new layers 
    * initialize the rest of the weights with the pre-trained weights (from the model already trained on good_dataset)
    * re-train the whole NN 
    
* if the new dataset is small and similar to good_dataset:
    * apply transfer learning at the end of the NN (replace and train only the final layers)
    
* if the new dataset is large and very different from good_dataset:
    * fine-tune or retrain the whole NN 
    
OR:

| dataset size     | similar          | different         |
|------------------|------------------|-------------------|
| small            | end of convNet   | start of convNet  |
| large            | fine-tune        | fine-tune/retrain |
    
    
    
If the new data set is **small** and **similar** to the original training data:

* slice off the end of the neural network
* add a new fully connected layer that matches the number of classes in the new data set
* randomize the weights of the new fully connected layer; **freeze** all the weights from the pre-trained network (to prevent overfitting)
* train the network to update the weights of the new fully connected layer

If the new data set is **small** and **different** from the original training data:

* slice off most of the pre-trained layers, keeping only the ones near the beginning of the network
* add to the remaining pre-trained layers a new fully connected layer that matches the number of classes in the new data set
* randomize the weights of the new fully connected layer; **freeze** all the weights from the pre-trained network
* train the network to update the weights of the new fully connected layer

If the new data set is **large** and **similar** to the original training data:

* remove the last fully connected layer and replace with a layer matching the number of classes in the new data set
* randomly initialize the weights in the new fully connected layer
* initialize the rest of the weights using the pre-trained weights
* re-train the **entire** neural network

If the new data set is **large** and **different** from the original training data:

* remove the last fully connected layer and replace with a layer matching the number of classes in the new data set
* retrain the network from scratch with randomly initialized weights
* alternatively, you could just use the same strategy as the "large and similar" data case
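
The four cases above can be summarized as a small lookup. This is only a sketch of the decision logic in these notes: the names `Plan` and `transfer_plan` and the string labels are made up, and no actual model is built or trained.

```python
from collections import namedtuple

# which pre-trained layers to keep, whether to freeze them, and how weights start out
Plan = namedtuple('Plan', ['keep_layers', 'freeze_pretrained', 'initial_weights'])

def transfer_plan(size, similarity):
    """Map (new dataset size, similarity to the original data) to a strategy."""
    plans = {
        ('small', 'similar'):   Plan('all but the last layer', True, 'pre-trained'),
        ('small', 'different'): Plan('early layers only', True, 'pre-trained'),
        ('large', 'similar'):   Plan('all but the last layer', False, 'pre-trained'),
        ('large', 'different'): Plan('all but the last layer', False, 'random'),
    }
    return plans[(size, similarity)]

print(transfer_plan('small', 'similar'))
print(transfer_plan('large', 'different'))
```

In every case a new fully connected layer matching the number of classes in the new data set is added at the end, with randomly initialized weights.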


### Dog-Breed Classifier

* **Cascade of Classifiers**
    * instead of applying all features (6000 in the example) to a window, the features are grouped into different stages of classifiers and applied one by one.
        * if the window fails a stage, discard it immediately and skip the remaining stages
        * otherwise, if the window passes every classifier stage, the object (e.g. a face) is detected in that window.
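
A minimal sketch of the cascade idea, assuming each stage is just a function that returns `True` or `False` for a window. The function `cascade_detect`, the three toy stages, and the sample windows are all hypothetical; a real Haar cascade (such as the one OpenCV ships) uses trained feature groups instead.

```python
def cascade_detect(window, stages):
    """Run the stages in order; stop at the first one the window fails.

    Returns (detected, number_of_stages_evaluated).
    """
    for n, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, n   # rejected early, later stages never run
    return True, len(stages)

# toy stages on a window given as a flat list of pixel intensities in [0, 1]
stages = [
    lambda w: sum(w) / len(w) > 0.2,   # cheap check: bright enough?
    lambda w: max(w) - min(w) > 0.5,   # enough contrast?
    lambda w: w[0] < w[-1],            # brightness increases across the window?
]

background = [0.0, 0.1, 0.0, 0.1]
candidate = [0.1, 0.4, 0.8, 0.9]

print(cascade_detect(background, stages))  # (False, 1)
print(cascade_detect(candidate, stages))   # (True, 3)
```

Most windows in an image are background, so rejecting them after one cheap stage saves most of the work.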

In [None]:
from sklearn.datasets import load_files
import numpy as np 

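# load_files returns a Bunch, a dictionary-like object (keys include 'data', 'filenames', 'target')
data = load_files('some_madeup_path')   # placeholder path

# np.vstack stacks a sequence of arrays vertically (row-wise)
np.vstack([np.array([1, 2]), np.array([3, 4])])   # -> array([[1, 2], [3, 4]])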

# Python Imaging Library
from PIL import ImageFile

# keras.preprocessing imports module image, which builds on top of PIL library 
from keras.preprocessing import image

# loading a model with the best weights saved by the checkpointer 
VGG16_model.load_weights('saved_models/weights.best.VGG16.hdf5')

# get index of predicted dog breed for each image in test set
VGG16_predictions = [np.argmax(VGG16_model.predict(np.expand_dims(feature, axis=0))) for feature in test_VGG16]

# report test accuracy
test_accuracy = 100*np.sum(np.array(VGG16_predictions)==np.argmax(test_targets, axis=1))/len(VGG16_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

import cv2

# returns "True" if face is detected in image stored at img_path
# (face_cascade is assumed to be a cv2.CascadeClassifier loaded elsewhere)
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

### returns "True" if a dog is detected in the image stored at img_path
# (ResNet50_predict_labels is assumed to return the predicted ImageNet class index;
#  indices 151-268 are the dog breeds)
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151)) 
