# Chapter 14: Deep Computer Vision Using Convolutional Neural Networks

## 14.1 The Architecture of the Visual Cortex

**Convolutional neural networks** (CNNs) - Evolved from the neocognitron, which was itself inspired by studies of the visual cortex. Those studies showed that:

- Many neurons in the visual cortex have a small **local receptive field**, meaning they react only to visual stimuli located in a limited region of the visual field.

- Some neurons react only to images of horizontal lines, while others react only to lines with different orientations.

- Some neurons have larger receptive fields, and they react to more complex patterns that are combinations of the lower-level patterns.

- Higher-level neurons are based on the outputs of neighboring lower-level neurons.

## 14.2 Convolutional Layers

**Convolution** - A mathematical operation that slides one function over another and measures the integral of their pointwise multiplication.
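
As a reference for the definition above, the continuous convolution of two functions can be written as follows. The symbols $f$, $g$, $t$, and $\tau$ are notation chosen here for illustration; they do not come from the notes.

$$
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
$$

Here $g$ is the function being slid (shifted by $t$), and the integral sums the pointwise product of the two functions at each shift.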

**Convolutional layer** - Neurons in the first convolutional layer are not connected to every single pixel in the input image, but only to pixels in their receptive fields.

This architecture allows the network to concentrate on small low-level features in the first hidden layer, then assemble them into larger higher-level features in the next hidden layer, and so on.

**Zero padding** - In order for a layer to have the same height and width as the previous layer, it is common to add zeros around the inputs.

**Stride** - The shift (in rows or columns) from one receptive field to the next. A stride greater than 1 produces an output layer smaller than its input.
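
The effect of zero padding and stride on the output size can be sketched with a small helper. This is a minimal sketch, assuming the `"same"` / `"valid"` padding conventions used by TensorFlow and Keras; the function name `conv_output_size` is hypothetical and introduced only for illustration.

```python
import math


def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: str = "valid") -> int:
    """Return the output height (or width) of a convolutional layer.

    Assumes TensorFlow-style padding modes:
    - "same":  zeros are added as needed, so output = ceil(input / stride)
    - "valid": no padding, so output = floor((input - kernel) / stride) + 1
    """
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        if input_size < kernel_size:
            raise ValueError("kernel is larger than the input")
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding mode: {padding!r}")


# With stride 1, "same" padding keeps the size; "valid" shrinks it.
same_size = conv_output_size(28, kernel_size=3, stride=1, padding="same")
valid_size = conv_output_size(28, kernel_size=3, stride=1, padding="valid")

# A larger stride reduces the output size in both modes.
strided_same = conv_output_size(13, kernel_size=6, stride=5, padding="same")
strided_valid = conv_output_size(13, kernel_size=6, stride=5, padding="valid")
```

With stride 1, `"same"` padding preserves the input size, which matches the definition of zero padding above.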

### 14.2.1 Filters

A neuron's weights can be represented as a small image the size of the receptive field.

**Filters** (also called convolutional kernels) - A neuron's set of weights, viewed as a small image the size of the receptive field.

**Figure 14-5** - Applying two different filters to get two feature maps

- A vertical filter is a black square with a vertical white line in the middle (a 7×7 matrix with a central column of 1s and 0s everywhere else).

    - When applied to an input image, the vertical white lines get enhanced while the rest gets blurred.

- A horizontal filter is similar to the vertical filter, except that it has a horizontal white line in the middle.

    - When applied, the horizontal white lines get enhanced while the rest gets blurred.

**Feature map** - The output of a layer full of neurons that all use the same filter. It highlights the areas in an image that activate the filter the most.
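
The two filters described above can be tried on a toy image with plain NumPy. This is a minimal sketch, not the code behind the figure: the 15×15 test image, the helper name `apply_filter`, and the choice of zero padding with stride 1 are all assumptions made for illustration.

```python
import numpy as np


def apply_filter(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over a zero-padded `image` (stride 1) and return
    a feature map with the same height and width as the image.

    The kernel is not flipped (this is cross-correlation); both filters
    below are symmetric, so flipping would not change the result.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    feature_map = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Weighted sum over the receptive field centered on (i, j).
            feature_map[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return feature_map


# 7x7 filters: a central column (or row) of 1s, 0s everywhere else.
vertical = np.zeros((7, 7))
vertical[:, 3] = 1.0
horizontal = np.zeros((7, 7))
horizontal[3, :] = 1.0

# Toy image (assumed): black background with one vertical and one
# horizontal white line.
image = np.zeros((15, 15))
image[:, 4] = 1.0
image[10, :] = 1.0

v_map = apply_filter(image, vertical)    # strongest along column 4
h_map = apply_filter(image, horizontal)  # strongest along row 10
```

In `v_map` the pixels on the vertical line get the largest responses, while pixels on the horizontal line away from the crossing get only a weak one; `h_map` behaves the opposite way.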

### 14.2.2 Stacking Multiple Feature Maps

A convolutional layer has multiple filters and outputs one feature map per filter, and is more accurately represented in 3D (as opposed to 2D). It has one neuron per pixel in each feature map, and all neurons within a given feature map share the same parameters.

Sharing the same parameters dramatically reduces the number of parameters in the model. In addition, once a CNN has learned to recognize a pattern in one location, it can recognize it in any other location, whereas a regular DNN that has learned a pattern in one location can recognize it only in that location.

> Note: All neurons located in the same row $i$ and column $j$ but in different feature maps are connected to the outputs of the exact same neurons in the previous layer.
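
To make the parameter savings concrete, the counts for a convolutional layer and a fully connected layer of the same output size can be compared. This is a rough sketch; the layer sizes (5×5 filters, 3 input channels, 200 feature maps, a 150×100 image) and the function names are assumptions chosen for illustration.

```python
def conv_params(kernel_h: int, kernel_w: int,
                in_channels: int, n_filters: int) -> int:
    """Parameters of a convolutional layer: each filter has one weight
    per input in its receptive field (across all input channels) plus
    one bias, shared by every neuron in that feature map."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters


def dense_params(n_inputs: int, n_outputs: int) -> int:
    """Parameters of a fully connected layer: every output neuron has
    its own weight for every input, plus one bias."""
    return (n_inputs + 1) * n_outputs


# Assumed example: RGB image of 150x100 pixels, 200 feature maps of
# the same height and width, 5x5 filters.
height, width, channels, n_maps = 150, 100, 3, 200

shared = conv_params(5, 5, channels, n_maps)
unshared = dense_params(height * width * channels, height * width * n_maps)

print(f"convolutional layer: {shared:,} parameters")
print(f"fully connected layer: {unshared:,} parameters")
```

The convolutional count does not depend on the image size at all, which is a direct consequence of every neuron in a feature map sharing one filter.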

### 14.2.3 TensorFlow Implementation

### 14.2.4 Memory Requirements

## 14.3 Pooling Layers

### 14.3.1 TensorFlow Implementation

## 14.4 CNN Architectures

### 14.4.1 LeNet-5

### 14.4.2 AlexNet

### 14.4.3 GoogLeNet

### 14.4.4 VGGNet

### 14.4.5 ResNet

### 14.4.6 Xception

### 14.4.7 SENet

## 14.5 Implementing a ResNet-34 CNN Using Keras

## 14.6 Using Pretrained Models from Keras

## 14.7 Pretrained Models for Transfer Learning

## 14.8 Classification and Localization

## 14.9 Object Detection

### 14.9.1 Fully Convolutional Networks

### 14.9.2 You Only Look Once (YOLO)

## 14.10 Semantic Segmentation