<a href="https://colab.research.google.com/github/huanfachen/DSSS/blob/main/Week_6/Practical_06.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div style="float:left">
    <h1 style="width:600px">CASA0006 Practical 6: Deep learning applications</h1>
</div>
<div style="float:right"><img width="100" src="https://github.com/jreades/i2p/raw/master/img/casa_logo.jpg" /></div>

## Introduction
In this practical, we will use TensorFlow and [YOLOv5](https://github.com/ultralytics/yolov5) to:

1. Build a convolutional neural network (CNN) for Fashion-MNIST image classification;
2. Conduct object detection in satellite imagery, using the pre-trained deep learning model YOLOv5.

## Setting up Google Colab

As installing and configuring TensorFlow and YOLO on local machines can be a pain, we recommend using Google Colab for this practical. Click [here](https://colab.research.google.com/github/huanfachen/DSSS/blob/main/Week_6/Practical_06.ipynb) to run this practical on Google Colab, which requires a Google account.

Resource limits of Google Colab under the free plan:

- Memory: up to 12 GB.
- Maximum duration of running a notebook: at most **12 hours**, depending on availability and your usage patterns; the notebook is stopped after that.
- GPU duration: dynamic, up to a few hours. If you use the GPU regularly, runtime durations become shorter and disconnections more frequent.

*Very Important* - we will use the GPU on Google Colab to accelerate the model training. To do this, go to 'Runtime' -> 'Change runtime type' -> Select 'T4 GPU' -> Save. See below.

![](https://github.com/huanfachen/DSSS/blob/main/Figures/Colab_GPU_setting.jpg?raw=true)

If you are following along in your own development environment, rather than Colab, see the [install guide](https://www.tensorflow.org/install) for setting up TensorFlow for development.

Note: Make sure you have upgraded to the latest `pip` to install the TensorFlow 2 package if you are using your own development environment.

## CNN Overview

![CNN](https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/cnn.png)

A convolutional neural network (CNN) is a special type of neural network designed for images or image-like data.

CNN is a complex subject and we could do a 10-week module on it. Here, we aim to cover the basics of CNN with loads of illustrations and examples.

In [None]:
# Common imports
import numpy as np
import os
import tensorflow as tf
from tensorflow import keras

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# to make this notebook's output reproducible across runs
def reset_state(seed=42):
    tf.keras.backend.clear_session()
    tf.random.set_seed(seed)
    np.random.seed(seed)

def plot_image(image):
    plt.imshow(image, cmap="gray", interpolation="nearest")
    plt.axis("off")

def plot_color_image(image):
    plt.imshow(image.astype(np.uint8), interpolation="nearest")
    plt.axis("off")

def crop(images):
    return images[150:220, 130:250]

import warnings
warnings.filterwarnings('ignore')

## Convolutional Layers in NN 

* Convolutional neural networks (CNNs) emerged from the study of the brain’s visual cortex, and they have been used in image recognition since the 1980s.
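* Used not only in image recognition but also in voice recognition and natural language processing
* Based on a local field of view: each neuron responds only to a small region of its input
* Used to extract local features from an image, compressing it into a smaller set of informative values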

<img src="https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/localFov.jpeg" alt="Drawing" style="width: 500px;"/>

### Why CNN?

* A way of encoding images using a small(er) number of features upon which training (e.g. for classifying images) can take place. 
* The visual differences that enable humans to classify images are hypothesised to be extracted in a similar way.

## Convolutional Layers 

* Encode convolutions as layers of a NN

* Each neuron is connected only to the neurons within a small rectangle of the previous layer, called its _receptive field_. Zero padding around the input is used so that a layer can have the same height and width as the previous one.

* A large input layer can also be connected to a much smaller layer by spacing out the receptive fields (the distance between two consecutive receptive fields is called the _stride_)
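The effect of padding and stride on the size of the next layer can be worked out directly. A minimal sketch, assuming TensorFlow's conventions: with `padding="SAME"` the output size along an axis is `ceil(n / stride)`, and with `padding="VALID"` it is `ceil((n - kernel_size + 1) / stride)`. The helper `conv_output_size` is hypothetical.

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding="SAME"):
    """Output height (or width) of a convolution along one axis.

    Follows TensorFlow's convention:
      SAME  -> zero padding is added as needed, so out = ceil(n / stride)
      VALID -> no padding, so out = ceil((n - kernel_size + 1) / stride)
    """
    if padding == "SAME":
        return math.ceil(n / stride)
    if padding == "VALID":
        return math.ceil((n - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# A 28-pixel-wide input with a 3x3 kernel
print(conv_output_size(28, 3, stride=1, padding="SAME"))   # -> 28 (same width as the input)
print(conv_output_size(28, 3, stride=1, padding="VALID"))  # -> 26 (loses one pixel on each side)
print(conv_output_size(28, 3, stride=2, padding="SAME"))   # -> 14 (stride 2 halves the width)
```

Zero padding keeps the layer the same size, while the stride controls how much it shrinks.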

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/CC.jpeg" alt="Drawing" style="width: 700px;"/>

### Feature Maps 

* Neuron weights can look like small images (w/ size = receptor field)
* Examples below: 1) vertical filter (single vertical bar, mid-image, all other cells zero) 2) horizontal filter (single horizontal bar, mid-image, all other cells zero)
* Both return _feature maps_, which highlight the areas of the image most similar to the filter
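To illustrate how a filter produces a feature map, the sketch below uses plain NumPy on a toy 6x6 image rather than TensorFlow. The helper `feature_map` and the toy image are illustrative. Like `tf.nn.conv2d`, it computes each output value as the dot product of the filter with the patch under it (stride 1, no padding here).

```python
import numpy as np

def feature_map(image, kernel):
    """Slide a 2D kernel over a 2D image (stride 1, no padding).

    Each output value is the dot product of the kernel with the image patch
    under it. Like tf.nn.conv2d, the kernel is not flipped, so strictly this
    is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and kernel
    return out

# Toy 6x6 image with one bright vertical line (column 2)
image = np.zeros((6, 6))
image[:, 2] = 1.0

# 3x3 filters: a vertical bar and a horizontal bar through the middle
vertical = np.zeros((3, 3))
vertical[:, 1] = 1.0
horizontal = np.zeros((3, 3))
horizontal[1, :] = 1.0

print(feature_map(image, vertical))    # strong response along the line
print(feature_map(image, horizontal))  # weak response everywhere
```

The vertical filter gives a value of 3 wherever it sits exactly on the line and 0 elsewhere, while the horizontal filter never exceeds 1. This is why the feature map "lights up" where the image resembles the filter.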


<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/featuremap.jpeg" alt="Drawing" style="width: 800px;"/>

### Stacking Feature Maps 

* In practice, a convolutional layer has several feature maps of the same size, so it is better represented in 3D
* A convolutional layer can thereby apply multiple filters to its input and be capable of detecting multiple features 

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/stacked.jpeg" alt="Drawing" style="width: 400px;"/>
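Stacking works the same way when the input has several channels and the layer has several filters: each filter spans all input channels and produces one feature map, and the maps are stacked along the last axis. A minimal NumPy sketch, assuming NumPy 1.20 or later (for `sliding_window_view`) and the same `[height, width, in_channels, n_filters]` filter layout as `tf.nn.conv2d`; `conv2d_valid` and the random inputs are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, filters):
    """Multi-channel, multi-filter convolution (stride 1, no padding).

    image:   array of shape (height, width, in_channels)
    filters: array of shape (fh, fw, in_channels, n_filters), as in tf.nn.conv2d
    returns: array of shape (height - fh + 1, width - fw + 1, n_filters),
             i.e. one feature map per filter, stacked along the last axis
    """
    fh, fw = filters.shape[:2]
    # windows[i, j, c, a, b] == image[i + a, j + b, c]
    windows = sliding_window_view(image, (fh, fw), axis=(0, 1))
    # For each position (i, j) and filter f, sum over rows a, cols b and channels c
    return np.einsum("ijcab,abcf->ijf", windows, filters)

rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3))           # random 32x32 colour "image"
six_filters = rng.random((5, 5, 3, 6))  # six 5x5 filters, each spanning all 3 channels
maps = conv2d_valid(rgb, six_filters)
print(maps.shape)  # -> (28, 28, 6)
```

Six filters give six stacked 28x28 feature maps, which together form one 3D layer.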


### Example 

* The following code loads two sample images using Scikit-Learn’s `load_sample_image()`: a color image of a Chinese temple and one of a flower
* Then it creates two 7 × 7 filters (one with a vertical white line in the middle, and the other with a horizontal white line in the middle), and applies them to both images with TensorFlow’s `tf.nn.conv2d()` function, using zero padding and a stride of 1 (the first plot uses a stride of 8 instead, to show how a larger stride shrinks the feature map)
* Finally, it plots the resulting feature maps, one per image and filter



In [None]:
#simple example 
from sklearn.datasets import load_sample_image #load images

# Load sample images
china = load_sample_image("china.jpg") / 255
flower = load_sample_image("flower.jpg") / 255
images = np.array([china, flower], dtype=np.float32)  # same dtype as the filters below
batch_size, height, width, channels = images.shape

# Create 2 filters that are 7x7xchannelsx2 arrays
filters = np.zeros(shape=(7, 7, channels, 2),  dtype=np.float32)
filters[:, 3, :, 0] = 1  # vertical line
filters[3, :, :, 1] = 1  # horizontal line

In [None]:
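# tf.nn.conv2d() arguments:
# input   = 4D tensor [batch, height, width, channels]
# filters = 4D tensor [filter_height, filter_width, in_channels, out_channels]
# strides = int, or list of 4 ints (1, vstride, hstride, 1)
# padding = "VALID" = no zero padding, may ignore edge rows/cols
# padding = "SAME"  = zero padding used if needed
# In a real CNN the filters are learned rather than hand-crafted.
# The equivalent trainable Keras layer (not used in the cells below):
conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,
                           padding="SAME", activation="relu")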

In [None]:
outputs = tf.nn.conv2d(images, filters, strides=8, padding="SAME")

plt.imshow(outputs[0, :, :, 0], cmap="gray") # plot 1st image's 1st feature map (vertical filter, stride 8)

plt.show()

In [None]:
outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

plt.imshow(outputs[0, :, :, 1], cmap="gray") # plot 1st image's 2nd feature map

plt.show()

In [None]:
outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

plt.imshow(outputs[1, :, :, 0], cmap="gray") # plot 2nd image's 1st feature map

plt.show()

In [None]:
outputs = tf.nn.conv2d(images, filters, strides=1, padding="SAME")

plt.imshow(outputs[1, :, :, 1], cmap="gray") # plot 2nd image's 2nd feature map

plt.show()

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/dotproduct-slides.png" alt="Drawing" style="width: 900px;"/>

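Note the relation to the dot product: each value in a feature map is the dot product of the filter weights with the patch of the input underneath the filter (plus a bias term, in a trainable layer).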

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/example-slides.png" alt="Drawing" style="width: 900px;"/>



<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/6filter-slides.png" alt="Drawing" style="width: 900px;"/>

The convolution layer comprises a set of independent filters (6 in the example shown). Each filter is independently convolved with the image, and we end up with 6 feature maps of shape 28x28x1.

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/simpleseries-slides.png" alt="Drawing" style="width: 900px;"/>

All these filters are initialised randomly, and their weights are the parameters that the network learns during training.
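Because the filter weights are the learned parameters, the size of a convolutional layer follows directly from the filter shape. A small sketch of the count (the helper `conv_params` is hypothetical): each filter has one weight per position per input channel, plus an optional bias.

```python
def conv_params(kernel_h, kernel_w, in_channels, n_filters, use_bias=True):
    """Number of trainable parameters in a 2D convolutional layer.

    Each filter has kernel_h * kernel_w * in_channels weights (it spans all
    input channels), plus one bias term if use_bias is True.
    """
    per_filter = kernel_h * kernel_w * in_channels + (1 if use_bias else 0)
    return per_filter * n_filters

# Hypothetical example: six 5x5 filters on an RGB (3-channel) input
print(conv_params(5, 5, 3, 6))                    # -> 456
# 64 filters of size 3x3 on 64 input channels, without bias
# (the DefaultConv2D layers defined later in this notebook set use_bias=False)
print(conv_params(3, 3, 64, 64, use_bias=False))  # -> 36864
```

The count does not depend on the image's height or width, which is one reason convolutional layers need far fewer parameters than fully connected layers on the same input.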

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/padding.jpeg" alt="Drawing" style="width: 900px;"/>



## Pooling Layers 

* Pooling layers aim to subsample (i.e., shrink) the input image in order to reduce the computational load, the memory usage, and the number of parameters (thereby limiting the risk of overfitting). A pooling layer operates on each feature map independently.

* Just like in convolutional layers, each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a small rectangular receptive field. You must define its size, the stride, and the padding type, just like before. 

* However, a pooling neuron has no weights; all it does is aggregate the inputs using an aggregation function such as the max or mean.
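A minimal NumPy sketch of 2x2 pooling with stride 2 on a single feature map. The helper `pool2d` and the toy values are illustrative; a Keras model would use layers such as `keras.layers.MaxPool2D` instead.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2D feature map with a size x size window (no padding).

    There are no weights: each window is simply aggregated with max or mean.
    """
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    agg = np.max if mode == "max" else np.mean
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = agg(window)
    return out

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]], dtype=float)
print(pool2d(fm, mode="max"))   # max of each 2x2 window: 6, 5, 7, 9
print(pool2d(fm, mode="mean"))  # mean of each 2x2 window: 3.5, 2.0, 2.5, 6.0
```

The 4x4 map shrinks to 2x2, a quarter of the values, while the strongest response in each region survives.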

<img src="https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/pool.jpeg" alt="Drawing" style="width: 900px;"/>

<img src="https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/maxpool-slides.png" alt="Drawing" style="width: 900px;"/>



## CNN Architectures 

* Typical CNN architectures stack a few convolutional layers (each one generally followed by a ReLU activation, max(0, x)), then a pooling layer, then another few convolutional layers (+ ReLU), then another pooling layer, and so on.

* The image gets smaller and smaller as it progresses through the network, but it also typically gets deeper and deeper (i.e., with more feature maps) thanks to the convolutional layers.

* At the top of the stack, a regular feedforward neural network is added, composed of a few fully connected layers, and the final layer outputs the prediction.

* There are many successful and popular CNN architectures, including LeNet, AlexNet and ResNet.
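To see the image shrink in height and width while growing in depth, the sketch below traces the output shape through a small LeNet-5-style feature extractor (32x32 grayscale input, 5x5 convolutions without padding, 2x2 pooling). The layer list and the helper `trace_shapes` are illustrative assumptions, not code used elsewhere in this practical.

```python
def trace_shapes(input_shape, layers):
    """Return the (height, width, depth) after each layer of a simple CNN stack.

    layers is a list of tuples:
      ("conv", kernel_size, n_filters) -> stride 1, no zero padding
      ("pool", size)                   -> size x size window, stride = size
    """
    h, w, d = input_shape
    shapes = [(h, w, d)]
    for layer in layers:
        if layer[0] == "conv":
            _, k, n_filters = layer
            h, w, d = h - k + 1, w - k + 1, n_filters  # slightly smaller, deeper
        elif layer[0] == "pool":
            _, size = layer
            h, w = h // size, w // size                 # smaller, same depth
        else:
            raise ValueError(f"unknown layer type: {layer[0]}")
        shapes.append((h, w, d))
    return shapes

# LeNet-5-style stack: conv(6 filters) -> pool -> conv(16 filters) -> pool
lenet_like = [("conv", 5, 6), ("pool", 2), ("conv", 5, 16), ("pool", 2)]
shapes = trace_shapes((32, 32, 1), lenet_like)
for name, shape in zip(["input"] + [layer[0] for layer in lenet_like], shapes):
    print(name, shape)
```

The final 5x5x16 volume (400 values) would then be flattened and passed to the fully connected layers at the top of the stack.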

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/aCNN.jpeg" alt="Drawing" style="width: 1000px;"/>

<img src="https://raw.githubusercontent.com/astro-informatics/course_mlbd_images/master/Lecture19_Images/excnn.jpeg" alt="Drawing" style="width: 1500px;"/>



### LeNet

LeNet is a family of CNN architectures proposed by LeCun et al. and widely used for MNIST handwritten digit recognition. The first LeNet, LeNet-1, was trained in 1989.

<img src="https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/lenet.jpeg" alt="Drawing" style="width: 500px;"/>



### AlexNet

AlexNet is another CNN structure, designed by *Alex Krizhevsky* in collaboration with *Ilya Sutskever* and *Geoffrey Hinton*. 

AlexNet was submitted to the 2012 ImageNet Challenge [http://www.image-net.org/challenges/LSVRC/](http://www.image-net.org/challenges/LSVRC/). It won the competition with a top-5 error of 15.3%, much better than the runner-up.

<img src="https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/alexnet.jpeg" alt="Drawing" style="width: 500px;"/>

### ResNet-34

Residual neural network (also referred to as a residual network or ResNet) is a CNN architecture in which the layers learn residual functions with reference to the layer inputs and some connections skip two or more layers.

![ResBlock](https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/ResBlock.jpg)

It was first proposed by [Kaiming He](https://people.csail.mit.edu/kaiming/) et al. in [this ground-breaking paper](https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html), which had 260,766 citations on Google Scholar at the time of writing.

It was developed in 2015 for image recognition, and won the ImageNet Large Scale Visual Recognition Challenge of that year.

The residual connection stabilises the training and convergence of CNNs with hundreds of layers, and has become a common motif in later deep learning architectures, including Transformer models such as BERT and GPT.

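ResNet-34 is one type of ResNet with 34 weight layers: 33 convolutional layers plus one fully connected output layer (the 1x1 convolutions on the skip connections are not counted). The original ResNet-34 was trained on the ImageNet dataset; the code below builds the same architecture from scratch, with a 28x28x1 input and 10 output classes, so that it can be trained on Fashion-MNIST.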
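The "34" can be checked by counting the weight layers in the configuration used by the model-building code in this notebook (the list of filters per residual unit). A small sketch of that arithmetic; the variable names are illustrative.

```python
# Number of filters in each residual unit, as in the ResNet-34 model code
unit_filters = [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3

n_units = len(unit_filters)   # 16 residual units
n_conv = 1 + 2 * n_units      # initial 7x7 conv + two 3x3 convs per unit
n_dense = 1                   # final fully connected (softmax) layer
print(n_units, n_conv, n_conv + n_dense)  # -> 16 33 34

# Units where the number of filters changes use strides=2 and a 1x1 conv on
# the skip path; these 1x1 convs are conventionally not counted in the "34"
previous = [64] + unit_filters[:-1]
n_projections = sum(1 for prev, cur in zip(previous, unit_filters) if cur != prev)
print(n_projections)  # -> 3
```

The three downsampling units are where the feature maps halve in height and width while the number of filters doubles.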

In [None]:
from functools import partial  
# What is partial? See https://chriskiehl.com/article/Cleaner-coding-through-partially-applied-functions

DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,
                        padding="SAME", use_bias=False)

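class ResidualUnit(keras.layers.Layer):
    """Residual unit: output = activation(main_path(x) + skip_path(x))."""
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
        # Main path: 3x3 conv -> batch norm -> activation -> 3x3 conv -> batch norm
        self.main_layers = [
            DefaultConv2D(filters, strides=strides),
            keras.layers.BatchNormalization(),
            self.activation,
            DefaultConv2D(filters),
            keras.layers.BatchNormalization()]
        # Skip path: identity by default. When the unit downsamples (strides > 1),
        # a 1x1 conv + batch norm makes the inputs match the main path's output
        # shape (smaller height/width, more filters)
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                DefaultConv2D(filters, kernel_size=1, strides=strides),
                keras.layers.BatchNormalization()]

    def call(self, inputs):
        # Main path
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        # Skip path (inputs unchanged if skip_layers is empty)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        # Add the two paths, then apply the activation
        return self.activation(Z + skip_Z)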

In [None]:
model = keras.models.Sequential()
model.add(DefaultConv2D(64, kernel_size=7, strides=2,
                        input_shape=[28, 28, 1]))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation("relu"))
model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding="same"))
prev_filters = 64
for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:
    strides = 1 if filters == prev_filters else 2
    model.add(ResidualUnit(filters, strides=strides))
    prev_filters = filters
model.add(keras.layers.GlobalAvgPool2D())
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(10, activation="softmax"))

In [None]:
model.summary()

### ResNet-34 for tackling Fashion-MNIST task

Here we will try ResNet-34 for [the Fashion-MNIST task](https://keras.io/api/datasets/fashion_mnist/). We don't want to use the classic MNIST, as MNIST is now considered too easy for modern deep learning models.

Fashion-MNIST is a dataset with a training set of 60,000 28x28 grayscale images of 10 fashion categories, along with a test set of 10,000 images. This dataset can be used as a drop-in replacement for MNIST.

The 10 classes are:

| Label | Description   |
|-------|---------------|
| 0     | T-shirt/top   |
| 1     | Trouser       |
| 2     | Pullover      |
| 3     | Dress         |
| 4     | Coat          |
| 5     | Sandal        |
| 6     | Shirt         |
| 7     | Sneaker       |
| 8     | Bag           |
| 9     | Ankle boot    |
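When reading the model's outputs, it helps to map the numeric labels in the table above back to names. A small sketch, assuming the model ends in a 10-way softmax as in the ResNet-34 model here; `class_names` and `label_to_name` are illustrative helpers, not part of the Keras API.

```python
import numpy as np

# Fashion-MNIST class names, indexed by label (from the table above)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def label_to_name(probabilities):
    """Turn one softmax output vector (10 class probabilities) into a class name."""
    return class_names[int(np.argmax(probabilities))]

# Hypothetical softmax output that puts most of the probability on label 7
probs = np.array([0.01, 0.0, 0.02, 0.0, 0.01, 0.05, 0.01, 0.85, 0.03, 0.02])
print(label_to_name(probs))  # -> Sneaker
```

For instance, `label_to_name(predictions[i])` could be printed alongside the numeric prediction in the prediction cell further down.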

![Fashion-MNIST](https://raw.githubusercontent.com/huanfachen/DSSS/main/Figures/fashion-mnist-sprite.jpg)

More info: https://github.com/zalandoresearch/fashion-mnist/tree/master

In [None]:
# import Fashion-MNIST data
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

X_mean = X_train.mean(axis=0, keepdims=True)
X_std = X_train.std(axis=0, keepdims=True) + 1e-7
X_train = (X_train - X_mean) / X_std
X_valid = (X_valid - X_mean) / X_std
X_test = (X_test - X_mean) / X_std

X_train = X_train[..., np.newaxis]
X_valid = X_valid[..., np.newaxis]
X_test = X_test[..., np.newaxis]

In [None]:
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])

Beware - the cell below could take a long time to run!

In [None]:
history = model.fit(X_train, y_train, epochs=5, validation_data=(X_valid, ??))

In [None]:
score = model.evaluate(??, y_test)

In [None]:
# Predict 5 images from test set
n_images = 5
test_images = X_test[:n_images]
predictions = model.predict(??)

# Display image and model prediction.
for i in range(n_images):
    plt.imshow(np.reshape(test_images[i], [28, 28]), cmap='gray')
    plt.show()
    print("True label: %i" % y_test[i])
    print("Model prediction: %i" % np.argmax(predictions[i]))

## YOLOv5 for Object Detection in Satellite Imagery

Having explored CNN and ResNet, we are now ready to apply this to a more explicitly spatial use case: object detection in satellite imagery.

Read the [following guide](https://bellingcat.github.io/RS4OSINT/C5_Object_Detection.html), and then complete the rest of this workbook.

### What is YOLOv5
Object detection is a fairly complicated task, and there are a number of different approaches to it. In this tutorial, we’ll be using a model called YOLOv5. YOLO stands for **You Only Look Once**. The original YOLO model was developed by [Joseph Redmon et al.](https://pjreddie.com/), and the full paper detailing it can be found [here](https://arxiv.org/abs/1506.02640); YOLOv5 is a later version of the model released by Ultralytics.

The YOLOv5 model is a convolutional neural network (CNN), which is a type of deep learning model. CNNs are very good at identifying patterns in images, particularly in small regions of images. This is important for object detection, because we want to be able to identify objects even if they’re partially obscured by other objects.

YOLO works by chopping an image up into a grid, and then predicting the location and size of objects in each grid cell:

![](https://bellingcat.github.io/RS4OSINT/images/yolo.jpg)
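In the original YOLO formulation, the image is divided into an S x S grid, and the grid cell that contains the centre of an object is responsible for predicting it. A minimal sketch of that assignment, assuming S = 7 and a 448x448 input (the values used in the original YOLO paper) and a hypothetical helper `responsible_cell`. YOLOv5 itself predicts at several grid scales with anchor boxes, so this is a simplification.

```python
def responsible_cell(box_center_x, box_center_y, img_width, img_height, S=7):
    """Return (row, col) of the S x S grid cell containing a box centre (in pixels)."""
    col = int(box_center_x / img_width * S)
    row = int(box_center_y / img_height * S)
    # A centre lying exactly on the right/bottom edge belongs to the last cell
    return min(row, S - 1), min(col, S - 1)

# Object centred at pixel (300, 100) in a 448x448 image
print(responsible_cell(300, 100, 448, 448))  # -> (1, 4)
```

Because every cell makes its predictions in a single forward pass over the whole image, the model only "looks once".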

It learns the locations of these objects by training on a dataset of images in which each object is indicated by a bounding box. Then, when it’s shown a new image, it attempts to predict bounding boxes around the objects in that image. The standard YOLO model is trained on the COCO dataset, which contains over 200,000 labelled images of 80 object categories, ranging from people to cars to dogs. YOLO models pre-trained on this dataset work well out of the box for detecting objects in videos, photographs, and live streams. But the nature of the objects we’re interested in is a bit different.
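For YOLOv5, the bounding boxes of a training image are stored in a plain text file with one line per object: the class index followed by the box centre x, centre y, width and height, all normalised by the image size to the range 0 to 1. The sketch below converts a pixel box given by its corners into that format; the helper name and example numbers are hypothetical.

```python
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_width, img_height):
    """Convert a pixel bounding box (corner coordinates) to a YOLO label line:
    'class x_center y_center width height', each coordinate normalised to [0, 1]."""
    x_center = (x_min + x_max) / 2 / img_width
    y_center = (y_min + y_max) / 2 / img_height
    width = (x_max - x_min) / img_width
    height = (y_max - y_min) / img_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# An object of class 0 occupying pixels (100, 50) to (200, 150) in a 640x480 image
print(to_yolo_label(0, 100, 50, 200, 150, 640, 480))
# -> 0 0.234375 0.208333 0.156250 0.208333
```

Normalised coordinates make the labels independent of the image resolution, so the same file stays valid when images are resized for training.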

Luckily, we can simply **re-train** the YOLOv5 model on datasets of labelled satellite imagery. We will walk through the process of re-training YOLOv5 on a custom dataset, and then using it to identify objects in satellite imagery pulled from Google Earth Engine or Google Maps.

In [None]:
!git clone https://github.com/huanfachen/yolov5_RS  # clone repo
#%cd yolov5_RS
%pip install -qr yolov5_RS/requirements.txt # install dependencies
%pip install -q roboflow

import torch
import os
from IPython.display import Image, clear_output  # to display images

print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")

We will use a public satellite imagery dataset from Roboflow, specifically the one at [this link](https://universe.roboflow.com/gdit/aerial-airport/dataset/1).

What is Roboflow? It is a computer vision developer platform that covers data collection, preprocessing and model training. Roboflow hosts public datasets that are readily available to users, and also lets users upload their own custom data.

You can explore a gallery of public datasets on [Roboflow Universe](https://universe.roboflow.com/). These datasets are frequently updated.

Note that you need a Roboflow API key to access the datasets. To get your own API key, visit [this link](https://docs.roboflow.com/api-reference/authentication).

Below is my API key on the free tier, which is subject to usage limits. You can use it for testing, but it might be slow or exceed its limit. It is always a good idea to get your own API key so that you control it.

In [None]:
from roboflow import Roboflow
API_KEY = 'aywzIBJkeuu2TcHztYSq'
rf = Roboflow(api_key=API_KEY)
project = rf.workspace("gdit").project("aerial-airport")
dataset = project.version(1).download("yolov5")

Check the datasets have been downloaded.

In [None]:
!ls
!ls yolov5_RS

The next step is to copy the data folder *Aerial-Airport-1* into the yolov5_RS folder, as the YOLOv5 Python scripts expect the dataset to be in the same folder.

In [None]:
!cp -r Aerial-Airport-1 yolov5_RS

In [None]:
!ls
!ls yolov5_RS

We will train the model via running the *train.py* file.

The settings, including batch size, are important. The general rule is: the larger the batch size, the more memory is required. The batch size is set to 16 here after trial and error. If we set **--batch 32**, Colab might run out of memory and stop the training (a '^C' appears in the output). Note that Google Colab provides 12 GB of RAM by default.

Besides batch size, other factors influencing the memory required by YOLO include image size and model size.
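As a rough first-order estimate (ignoring the model weights and optimiser state), the memory taken by intermediate activations grows in proportion to the batch size and to the number of pixels per image, i.e. to the square of `--img`. The sketch below compares settings relative to the `--batch 16 --img 320` used here; the helper is hypothetical and the ratios are only indicative, not measured.

```python
def relative_activation_memory(batch, img, ref_batch=16, ref_img=320):
    """Activation memory relative to a reference setting, assuming it scales
    linearly with batch size and with the number of pixels (img * img)."""
    return (batch * img * img) / (ref_batch * ref_img * ref_img)

for batch, img in [(16, 320), (32, 320), (16, 640)]:
    ratio = relative_activation_memory(batch, img)
    print(f"--batch {batch} --img {img}: about {ratio:.0f}x the activation memory")
```

Under this approximation, doubling the batch size roughly doubles activation memory, while doubling the image size roughly quadruples it.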

See [Discussion on Github](https://github.com/ultralytics/yolov5/issues/3847) or [Stackoverflow](https://stackoverflow.com/a/63797661/4667568).

The following model training takes quite a long time, around 3 hours.

In [None]:
!less yolov5_RS/train.py

In [None]:
%%time
!python yolov5_RS/train.py --data {dataset.location}/data.yaml --img 320 --batch 16 --cache

Then, we will apply a YOLOv5 model, using the pre-trained weights in `yolov5_RS/weights/general.pt`, to detect objects in a remote sensing image from Google Maps.

In [None]:
img='gatwick.jpg'
!python yolov5_RS/detect.py --weights yolov5_RS/weights/general.pt --img 2000 --conf 0.4 --source {os.path.join('yolov5_RS',img)} --line-thickness 2 --exist-ok #--hide-labels --exist-ok

The detection results are saved in the `yolov5_RS/runs/detect/exp` folder, as shown below.

In [None]:
!ls yolov5_RS/runs/detect/exp

In [None]:
img='gatwick.jpg'

In [None]:
out_dir='yolov5_RS/runs/detect/exp'
Image(filename=os.path.join(out_dir,img))

## References and recommendations:

1. Some materials are from Machine Learning with Big Data (SPCE0038) module at UCL.