# Convolutional Neural Networks

* So far we've built fully connected feed-forward (FF) NNs, where each input node is connected to each hidden node.
* A fully connected NN with a 30-unit hidden layer that accepts 28x28 images has $28 \times 28 \times 30 + 30 = 23,550$ parameters between the input and hidden layer ($23,520$ weights plus $30$ biases).
* The number of parameters grows quickly once we deal with larger color images, which have more pixels and where each pixel has three numbers associated with it (red, green, and blue).
* With that many free parameters, the network is prone to overfitting.
* For images, the standard way to handle this problem is via the convolutional neural network which consists of convolutional layers, pooling layers, and the usual fully-connected layers.
* Those networks have drastically fewer parameters than FF NNs.
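
To make the growth concrete, here is a small sketch that counts the parameters between a flattened image and a single fully connected hidden layer. The function name `dense_layer_params` and the 224x224 color image size are assumptions chosen for illustration only.

```python
def dense_layer_params(height, width, channels, hidden_units):
    """Weights plus biases between a flattened image and one hidden layer."""
    weights = height * width * channels * hidden_units
    biases = hidden_units
    return weights + biases

# the 28x28 grayscale case from the list above
small = dense_layer_params(28, 28, 1, 30)
# a hypothetical 224x224 color image with the same 30-unit hidden layer
large = dense_layer_params(224, 224, 3, 30)

print(small)  # 23550
print(large)  # 4515870
```

Even with only 30 hidden units, the larger color image needs roughly 190 times as many parameters in the first layer.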

## Convolution Layer

The convolution layer makes two assumptions about its inputs:

1. Inputs that are nearby are related.
2. A detector that can detect a pattern at location (x, y) can be used to detect the same pattern at other locations in the image.

Both assumptions are very reasonable for images: pixels that are nearby are likely to share statistical properties, and a detector that can detect edges at the top of the image can also detect edges at the bottom.

### One Dimensional Convolution
Let's see how the basic convolution operator works in the 1D case (recall that images are 2D). A single neuron in a convolution layer defines a receptive field that operates over a limited range of inputs. The neuron slides this receptive field over the input to produce the final outputs:

![alt text](figures/1dconv.jpg)

In the figure, the neuron (opaque magenta) defines a simple operation over the sequence: it computes the average of each pair of neighbors. At the first step, the neuron computes the average of the first two elements. At the second step, it computes the average of the second and third elements, and so on.

The neuron has two weights, 0.5 and 0.5, which means that both inputs are weighted equally, so it computes a straightforward average. But we can change the weights to anything and, as you might have guessed, those weights are the free parameters of a convolutional neural network.

It is important to note that the *same weights* are applied across the entire input.
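
The sliding receptive field can be sketched directly as a loop. This is a minimal illustration, assuming stride 1 and no padding; the helper name `slide_neuron` is hypothetical. Like convolution layers in deep learning libraries, it does not flip the kernel, which makes no difference for a symmetric kernel such as `[0.5, 0.5]`.

```python
import numpy as np

def slide_neuron(inputs, weights):
    """Apply the same weights to every window of the input (stride 1, no padding)."""
    inputs = np.asarray(inputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = len(weights)
    windows = range(len(inputs) - m + 1)
    # each output is the weighted sum of one window; the weights never change
    return np.array([np.dot(inputs[i:i + m], weights) for i in windows])

pair_averages = slide_neuron([1, 2, 3, 4, 5, 6, 7, 8, 9], [0.5, 0.5])
print(pair_averages)  # [1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
```

Nine inputs and a two-weight neuron give eight outputs, one per pair of neighbors.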

Here's an example of 1D convolution in NumPy using `np.convolve`.

In [27]:
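import numpy as np

# here's a 1D "image"
input_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# a convolutional layer neuron defines a kernel (its weights); with three weights
# of 0.5, each output is half the sum of an input and its two neighbors
convolution_kernel = np.array([0.5, 0.5, 0.5])

y_same = np.convolve(input_data, convolution_kernel, "same")

y_same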

array([ 1.5,  3. ,  4.5,  6. ,  7.5,  9. , 10.5, 12. ,  8.5])

The `"same"` argument tells NumPy to pad the sequence with zeros at both ends so that the output of the convolution has the same size as the input. For an odd kernel size $m$ (3 in this case), this amounts to $(m-1)/2$ zeros on each side, which is why the first output is $0.5 \times (0 + 1 + 2) = 1.5$ and the last is $0.5 \times (8 + 9 + 0) = 8.5$. There are other padding options (with $n$ the input length):

* `"full"`: adds $m-1$ zeros at both ends of the sequence. The final output has size $n + m - 1$.
* `"valid"`: adds no padding and computes only the entries where the kernel and the input sequence fully overlap. The final output has size $n - m + 1$.

In [36]:
x = [6, 2]
h = [1, 2, 5, 4]

y = np.convolve(h, x, "full")
print(y)

y = np.convolve(h, x, "valid")
print(y)


[ 6 14 34 34  8]
[14 34 34]
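
The relationship between the modes can be checked by padding by hand. The sketch below reuses the `h` and `x` values from the cell above and assumes only `np.pad` and `np.convolve`; the variable names `padded` and `full_by_hand` are introduced here for illustration.

```python
import numpy as np

h = np.array([1, 2, 5, 4])
x = np.array([6, 2])
n, m = len(h), len(x)

# "full" is equivalent to adding m-1 zeros at both ends, then keeping
# only the positions where kernel and padded input fully overlap
padded = np.pad(h, m - 1)
full_by_hand = np.convolve(padded, x, "valid")

print(full_by_hand)                 # [ 6 14 34 34  8]
print(np.convolve(h, x, "full"))    # [ 6 14 34 34  8]

# output sizes for each mode
print(len(np.convolve(h, x, "full")) == n + m - 1)   # True
print(len(np.convolve(h, x, "valid")) == n - m + 1)  # True
print(len(np.convolve(h, x, "same")) == n)           # True
```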


We call the convolution operation with a specific kernel (weights) a filter.

### Two Dimensional Convolution

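See the Stanford CS 230 convolutional neural networks cheatsheet for illustrations of 2D convolution: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks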
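
The 1D sliding-window idea carries over to images: the kernel becomes a small 2D patch of weights that slides across both rows and columns. The following is a minimal sketch, assuming stride 1, no padding, and a single channel; the function name `conv2d_valid` is hypothetical, and the loop is written for clarity rather than speed. As in deep learning libraries, the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # the same kernel weights are reused at every position
            output[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return output

image = np.arange(16).reshape(4, 4)   # a tiny 4x4 "image"
kernel = np.full((2, 2), 0.25)        # averages each 2x2 patch
feature_map = conv2d_valid(image, kernel)

print(feature_map.shape)   # (3, 3)
print(feature_map[0, 0])   # 2.5
```

With a 3x3 kernel, the same function maps a 28x28 input to a 26x26 output, since each spatial dimension shrinks by the kernel size minus one.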

# Example (Simple MNIST convnet from Keras Documentation)

In [37]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers



In [38]:
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
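
The `to_categorical` call turns each integer label into a one-hot row of length `num_classes`. A rough NumPy equivalent is sketched below as an illustration; it is not how Keras implements the function, and the three labels are made-up values.

```python
import numpy as np

num_classes = 10
labels = np.array([5, 0, 4])  # hypothetical integer class labels

# row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```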


In [39]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)             
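
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in this model (no padding, stride 1 for convolutions, pooling stride equal to the pool size); the helper names are hypothetical. The dense layer's row is cut off in the output above, so its count here comes from the formula, not from the log.

```python
def conv_out(size, kernel):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Each filter has kernel*kernel*in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

s1 = conv_out(28, 3)        # 26
s2 = pool_out(s1, 2)        # 13
s3 = conv_out(s2, 3)        # 11
s4 = pool_out(s3, 2)        # 5
flat = s4 * s4 * 64         # 1600

print(s1, s2, s3, s4, flat)      # 26 13 11 5 1600
print(conv_params(3, 1, 32))     # 320
print(conv_params(3, 32, 64))    # 18496
print(dense_params(flat, 10))    # 16010
```

Note that the first convolution layer needs only 320 parameters, compared with the 23,550 of the fully connected layer from the introduction.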



In [40]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)



Epoch 1/15



Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x2a3207e20>

In [41]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.025478513911366463
Test accuracy: 0.9922000765800476
