# Intro to Deep Learning with OpenCV
**Instructor**: Jonathan Fernandes

* OpenCV $\Rightarrow$ **BGR** format
    * Why?
    * Years ago, when OpenCV was created, many cameras output pixels in BGR order rather than RGB, and OpenCV has kept that convention
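
A practical consequence is that an image loaded by OpenCV needs its channels reversed before it is handed to code that expects RGB. The sketch below shows the idea on a tiny image held as nested Python lists. `bgr_to_rgb` is a hypothetical helper name used only for illustration; with real OpenCV arrays the same effect is usually obtained with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`.

```python
def bgr_to_rgb(image):
    """Return a copy of `image` with every pixel reordered from (B, G, R) to (R, G, B).

    `image` is a list of rows, each row a list of 3-tuples. It stands in for
    the NumPy array that OpenCV would normally return.
    """
    return [[(r, g, b) for (b, g, r) in row] for row in image]


# A 1x2 "image": one pure-blue pixel and one pure-red pixel, in BGR order.
bgr_image = [[(255, 0, 0), (0, 0, 255)]]
rgb_image = bgr_to_rgb(bgr_image)
print(rgb_image)  # [[(0, 0, 255), (255, 0, 0)]]
```

Applying the same swap twice returns the original image, which is why the conversion is symmetric.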
    
* OpenCV's deep learning module is called **dnn** (short for deep neural networks)
* The dnn module is **not** a full-fledged deep learning framework
    * It cannot be used to **train** a neural network
    * **No** backpropagation (and so, no "learning")
    
<img src='data/opencv1.png' width="500" height="250" align="center"/>

#### Popular uses of OpenCV:

<img src='data/opencv2.png' width="200" height="100" align="center"/>

#### Expert Systems
* Rules determine actions
    
#### Machine learning
* Data determines the rules
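
The contrast can be illustrated with a toy example that is not from the course: deciding whether a photo was taken in daylight from its mean brightness. In the expert-system version a person picks the cutoff; in the machine-learning version the cutoff is chosen from labelled examples. All names and numbers here are hypothetical.

```python
def expert_is_daytime(brightness):
    """Expert system: the rule (cutoff 128) was written by hand."""
    return brightness > 128


def learn_cutoff(examples):
    """Machine learning: choose the cutoff that classifies the labelled data best.

    `examples` is a list of (brightness, is_daytime) pairs.
    """
    candidates = sorted({value for value, _ in examples})

    def correct(cutoff):
        # Count how many examples the rule "value > cutoff" gets right.
        return sum((value > cutoff) == label for value, label in examples)

    return max(candidates, key=correct)


training_data = [(20, False), (45, False), (60, False),
                 (140, True), (180, True), (220, True)]
cutoff = learn_cutoff(training_data)
print(cutoff)  # 60
```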
    
<img src='data/opencv3.png' width="600" height="300" align="center"/>

<img src='data/opencv4.png' width="600" height="300" align="center"/>


### OpenCV
* Makes code simpler
* Inference runs on standard CPUs (no GPU necessary)
* As of OpenCV 4, the dnn module supports:
    * Caffe 
    * TensorFlow
    * Torch 
    * Darknet
    * ONNX (format)
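
Each of these formats has its own loader in `cv2.dnn`, and `cv2.dnn.readNet` picks one based on the model file's extension. The lookup below is a minimal sketch of that idea; `LOADER_BY_EXTENSION` and `loader_name_for` are hypothetical names, and the table covers only the formats listed above.

```python
import os

# Model file extensions mapped to the matching cv2.dnn loader function name.
LOADER_BY_EXTENSION = {
    ".caffemodel": "readNetFromCaffe",
    ".pb": "readNetFromTensorflow",
    ".t7": "readNetFromTorch",
    ".net": "readNetFromTorch",
    ".weights": "readNetFromDarknet",
    ".onnx": "readNetFromONNX",
}


def loader_name_for(model_path):
    """Return the name of the cv2.dnn loader suited to `model_path`."""
    extension = os.path.splitext(model_path)[1].lower()
    try:
        return LOADER_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unrecognised model file extension: {extension!r}")


print(loader_name_for("yolov3.weights"))  # readNetFromDarknet
```

In practice it is usually simpler to call `cv2.dnn.readNet(model, config)` and let OpenCV do this dispatch itself.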

#### Advantages
* OpenCV is framework-independent
    * It has no framework-specific limitations
* Models are represented internally in OpenCV
    * code can be optimized
* OpenCV has its own deep learning implementation
    * external dependencies are kept to a minimum
    
<img src='data/opencv5.png' width="600" height="300" align="center"/>

* A basic inference engine simply passes the input data through the network and outputs the result
    * However, many optimizations can be applied to make inference faster
    * For instance:
        * prune parts of the network that are never activated
        * combine multiple layers into a single computational step
* Inference can be done on standard CPUs
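
To make "combine multiple layers into a single computational step" concrete, the sketch below folds a batch-normalization step into the linear step that precedes it, using single numbers instead of tensors. This is a generic illustration of layer fusion under simplifying assumptions, not a description of what OpenCV does internally; all function names are hypothetical.

```python
import math


def linear(x, w, b):
    """First layer: y = w * x + b."""
    return w * x + b


def batch_norm(y, mean, var, gamma, beta, eps=1e-5):
    """Second layer: normalise with stored statistics, then scale and shift."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fuse(w, b, mean, var, gamma, beta, eps=1e-5):
    """Fold the batch-norm parameters into the linear layer's weight and bias."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta


# Parameters are fixed after training, so the fusion is done once, ahead of time.
w, b = 2.0, 0.5
mean, var, gamma, beta = 1.0, 4.0, 1.5, -0.25
fused_w, fused_b = fuse(w, b, mean, var, gamma, beta)

x = 3.0
two_steps = batch_norm(linear(x, w, b), mean, var, gamma, beta)
one_step = linear(x, fused_w, fused_b)
print(math.isclose(two_steps, one_step))  # True
```

Because the parameters do not change at inference time, the fused layer gives the same result with one multiply-add instead of two separate layer evaluations.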

#### OpenVINO
* Open Visual Inference and Neural Network Optimization
    * designed to speed up neural networks for tasks like image classification and object detection
    
#### Support for:
   * AlexNet
   * GoogLeNet
   * VGG
   * ResNet
   * SqueezeNet
   * DenseNet
   * ShuffleNet
   
### Person identification: OpenFace (Torch)

#### Overview of the dnn process

* To read from a webcam instead of a saved image or video file (jpg/mov), pass the device index `0` to `cv2.VideoCapture` in place of the file path
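
A small sketch of how a script might accept either a file path or a camera index on the command line. `parse_source` and `open_capture` are hypothetical helper names; `cv2.VideoCapture` and `isOpened` are the real OpenCV calls.

```python
def parse_source(arg):
    """Turn a command-line string into something cv2.VideoCapture accepts.

    "0", "1", ... become integer camera indices; anything else is kept as a path.
    """
    return int(arg) if arg.isdigit() else arg


def open_capture(arg):
    """Open a webcam or video file and fail loudly if it cannot be opened."""
    import cv2  # imported here so parse_source works without OpenCV installed

    capture = cv2.VideoCapture(parse_source(arg))
    if not capture.isOpened():
        raise IOError(f"Could not open video source: {arg!r}")
    return capture


print(parse_source("0"))              # 0
print(parse_source("data/clip.mov"))  # data/clip.mov
```

With this in place, `open_capture("0")` would read from the default webcam and `open_capture("data/clip.mov")` from a file (the path is only an example).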

<img src='data/opencv6.png' width="600" height="300" align="center"/>
    
    
<img src='data/opencv7.png' width="600" height="300" align="center"/>

#### Create a Four-Dimensional Blob

```
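import cv2

# Load a pre-trained Caffe network from its architecture (.prototxt) and weights (.caffemodel) files
net = cv2.dnn.readNetFromCaffe(prototxt, caffeModel)

# Preprocess the image into a 4D blob; arguments shown in [brackets] are optional
blob = cv2.dnn.blobFromImage(image, [scalefactor], [size], [mean], [swapRB], [crop], [ddepth])

# Feed the blob to the network and run a forward pass (inference)
net.setInput(blob)
outp = net.forward()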
```

### Working with Blobs
* **Blob**: One or more images with the same width, height, and number of channels that have all been preprocessed in the same way.

<img src='data/opencv8.png' width="600" height="300" align="center"/>

* **Note**: if you have more than one image, use **`blobFromImages()`**
    * The output of `blobFromImage()` is a 4D tensor in NCHW order
        * **NCHW**: **N**umber of images, number of **C**hannels, **H**eight, **W**idth
        * This blob is then passed to a trained model, which runs inference on the image or video frame
        * For images and videos, the inference task could be:
            * **Object detection**
            * **Semantic segmentation**
            * Much, much more
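
The NCHW layout can be sketched in plain Python. The hypothetical `image_to_blob` below mimics only part of what `blobFromImage` does (optional R/B swap, mean subtraction, scaling, and the reordering from height x width x channels to 1 x channels x height x width). It omits resizing and cropping, and uses nested lists where OpenCV would use NumPy arrays.

```python
def image_to_blob(image, scalefactor=1.0, mean=(0.0, 0.0, 0.0), swap_rb=False):
    """Convert one H x W x C image (nested lists) into a 1 x C x H x W blob."""
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    order = list(range(channels))
    if swap_rb:
        if channels != 3:
            raise ValueError("swap_rb needs a 3-channel image")
        order[0], order[2] = order[2], order[0]
    planes = []
    for out_channel, src_channel in enumerate(order):
        # One full H x W plane per channel: subtract the mean, then scale.
        plane = [[(image[y][x][src_channel] - mean[out_channel]) * scalefactor
                  for x in range(width)]
                 for y in range(height)]
        planes.append(plane)
    return [planes]  # the outer list is the N (batch) dimension, here N = 1


def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# A 2 x 3 image (height 2, width 3) with 3 channels per pixel.
image = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)],
         [(11, 21, 31), (41, 51, 61), (71, 81, 91)]]
blob = image_to_blob(image)
print(shape(blob))  # (1, 3, 2, 3)
```

The printed shape reads as N = 1 image, C = 3 channels, H = 2 rows, W = 3 columns.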
            
#### blobFromImage parameters
* `blobFromImage` creates a 4D blob

<img src='data/opencv9.png' width="600" height="300" align="center"/>

### Image Classification
* The ImageNet image database is organized according to the WordNet hierarchy
* Each meaningful concept in WordNet, which may be described by multiple words, is called a **synonym set** or **synset**
* The 1,000 ImageNet classes used for classification are stored in a synset file; opening it shows the 1,000 different categories.
    * Each row identifies a category, followed by one or more words describing it
    * 1,000 rows, one per class
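
A minimal sketch of turning such a file into a list of class labels. It assumes each row starts with a category ID, followed by a space and the descriptive words; the two sample rows illustrate that assumed format, and `parse_synset_lines` is a hypothetical helper.

```python
def parse_synset_lines(lines):
    """Return the class descriptions, one per non-empty row, in file order."""
    labels = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything after the first space is the human-readable description.
        _synset_id, _, description = line.partition(" ")
        labels.append(description)
    return labels


sample_rows = [
    "n01440764 tench, Tinca tinca\n",
    "n01443537 goldfish, Carassius auratus\n",
]
labels = parse_synset_lines(sample_rows)
print(labels)  # ['tench, Tinca tinca', 'goldfish, Carassius auratus']
```

Because the order of rows is preserved, the index of a label in this list lines up with the index of the corresponding score in the model's output.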
    
### Classification for an image: inference

#### Output of Image Classification
* Now we'll use the OpenCV dnn module as an inference engine
* We'll pass an image through a pre-trained model that has been trained on the 1,000 classes of ImageNet
* The model will then output, for each of the 1,000 classes, the probability that the image belongs to that class
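
Once the probabilities are available, the usual next step is to report the most likely classes. The sketch below assumes the model output has already been flattened into a plain list of probabilities aligned with a list of labels; `top_k` and the four-class example are hypothetical.

```python
def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: probabilities[i],
                    reverse=True)
    return [(labels[i], probabilities[i]) for i in ranked[:k]]


# Toy example with 4 classes instead of 1,000.
class_labels = ["tench", "goldfish", "shark", "ray"]
class_probs = [0.04, 0.70, 0.20, 0.06]
print(top_k(class_probs, class_labels, k=2))  # [('goldfish', 0.7), ('shark', 0.2)]
```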

### Classification for a video
* Here we'll use OpenCV's `dnn` module as an inference engine for a video file
* We can reuse many of the concepts used in image classification when working with video files
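
A video is handled as a sequence of images, so the per-image steps are repeated for every frame. The sketch below separates reading frames from classifying them. `read_frames` and `classify_frames` are hypothetical names, and `classify` stands for whatever per-image function is already in use (for example, one that builds a blob and calls `net.forward()`).

```python
def read_frames(source):
    """Yield frames from a video file path or camera index until the stream ends."""
    import cv2  # imported lazily so classify_frames can be tried without OpenCV

    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # no more frames, or the source could not be read
                break
            yield frame
    finally:
        capture.release()


def classify_frames(frames, classify):
    """Apply `classify` to every frame, yielding (frame_index, result) pairs."""
    for index, frame in enumerate(frames):
        yield index, classify(frame)


# Stand-in frames and classifier, just to show the control flow.
fake_frames = ["a", "bb", "ccc"]
print(list(classify_frames(fake_frames, len)))  # [(0, 1), (1, 2), (2, 3)]
```

With OpenCV installed, the real loop would look like `for index, result in classify_frames(read_frames(path), classify_image): ...`, where `path` and `classify_image` are supplied by the caller.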

### YOLOv3
* "You Only Look Once"
* Earlier detection algorithms applied a model to an image multiple times, at different locations and scales
* **YOLO v3 applies the model to the whole image only once.**
    * The network divides the image into regions and predicts bounding boxes and probabilities for each region
    * The network only needs to view the image one time; the bounding boxes are then weighted by the predicted probabilities
    * We can also set a confidence threshold (say, 80%): a bounding box is drawn only when the YOLO algorithm is more than 80% sure that it has detected a particular class
    * YOLO v3 has been trained on the **COCO dataset, which has 80 different classes of objects**
* If objects are not being detected, lowering the confidence threshold may allow the algorithm to pick them up, but it may also introduce incorrect classifications.
* The YOLO v3 algorithm is a very powerful object-detection algorithm
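
The confidence threshold described above can be sketched as a filter over the network's output rows. The row layout assumed here is the one commonly seen when YOLOv3 is run through OpenCV: four box values normalised to 0-1 (centre x, centre y, width, height), one objectness score, then one score per class (80 for COCO; the toy rows below use 3). `filter_detections` is a hypothetical helper.

```python
def filter_detections(rows, frame_width, frame_height, threshold=0.8):
    """Keep detections whose best class score exceeds `threshold`.

    Returns (class_id, confidence, (left, top, width, height)) tuples in pixels.
    """
    kept = []
    for row in rows:
        scores = row[5:]
        class_id = max(range(len(scores)), key=lambda i: scores[i])
        confidence = scores[class_id]
        if confidence > threshold:
            # Convert the normalised centre-based box to a pixel top-left box.
            width = int(row[2] * frame_width)
            height = int(row[3] * frame_height)
            left = int(row[0] * frame_width - width / 2)
            top = int(row[1] * frame_height - height / 2)
            kept.append((class_id, confidence, (left, top, width, height)))
    return kept


rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.05, 0.85, 0.02],    # confident: class 1
    [0.25, 0.25, 0.125, 0.25, 0.6, 0.55, 0.10, 0.05],  # uncertain: class 0
]
strict = filter_detections(rows, 400, 200, threshold=0.8)
relaxed = filter_detections(rows, 400, 200, threshold=0.5)
print(strict)        # [(1, 0.85, (150, 50, 100, 100))]
print(len(relaxed))  # 2
```

Lowering the threshold from 0.8 to 0.5 lets the second, less certain detection through, which is the trade-off noted in the list above.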