# 2. Literature Review

[index](../Index.ipynb) | [prev](./01.Introduction.ipynb) | [next](./03.SystemDesign.ipynb)

It is important to acknowledge and highlight the work of other researchers, without which my concept would not have been possible and a project like this could not have been delivered by a single person.

This Literature Review follows the flow of data through the system. There are three main building blocks in this research:
- Data Collection
- Forecasting
- Anomaly Detection

Each of them is very complex and has been studied by scientists for decades.

After two years of my own research and testing, I have narrowed down the list to a handful of useful tools, for which I will provide the theoretical background below.

## 2.1. Data Collection

The first aspect of my project is the Data Collection phase.

It involves a mini-computer (Raspberry Pi), which streams the data to the central unit (an Ubuntu-based desktop PC with a GPU). The central unit runs an infinite loop with two key algorithms:
- Background subtraction
- Yolo Object Recognition

Both of these algorithms are extremely useful in image processing applications, and they are the foundation of how data is collected in the system.

Together they have significantly reduced the amount of data that would otherwise be required to store six months of video footage.

### 2.1.1. Background subtraction

Let’s consider how we can detect objects in video streams efficiently.

For example, we could supply a stream of 30 frames per second to a Python script that runs an object detection algorithm (like Yolo) on the GPU for every frame. This would work, but it would be extremely inefficient in terms of GPU utilisation and speed. Why waste the GPU detecting objects non-stop if nothing in the scene actually changes for around 90% of the time? Can we instead detect a change in the images (which I will refer to as motion detection) and only then use the GPU to run the object detection step?

It turns out that the Computer Vision domain already offers a set of well-established algorithms that can do this for us quite easily. They are not 100% bulletproof, but they do not need to be: if they help to reject the roughly 90% of uninteresting frames, we can dedicate the expensive GPU to other tasks or simply extend its lifespan. A bonus of not using the GPU all the time is that the machine is much quieter and generates less heat (which is always bad for the components).
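A minimal sketch of this gating idea is shown below, assuming grayscale frames held as NumPy arrays. It uses the crudest possible change measure (the difference between consecutive frames) rather than the background subtraction method discussed next; `run_detector` is a hypothetical placeholder for the GPU detector, and both thresholds are illustrative values, not the ones used in this project.

```python
import numpy as np


def changed_fraction(prev_frame, frame, pixel_threshold=25):
    """Share of pixels whose grayscale intensity changed by more than pixel_threshold."""
    # Cast to a signed type so that subtracting uint8 images cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(np.mean(diff > pixel_threshold))


def run_detector(frame):
    """Hypothetical placeholder for the expensive GPU object detector (e.g. Yolo)."""
    return []


def process_stream(frames, min_changed_fraction=0.01):
    """Run the detector only on frames that differ noticeably from the previous frame."""
    detector_calls = 0
    prev = None
    for frame in frames:
        if prev is not None and changed_fraction(prev, frame) > min_changed_fraction:
            run_detector(frame)
            detector_calls += 1
        prev = frame
    return detector_calls


# Synthetic 100-frame stream: an empty scene with a bright square moving across frames 50-59.
static = np.full((120, 160), 100, dtype=np.uint8)
frames = [static.copy() for _ in range(100)]
for i in range(50, 60):
    x = 10 + 5 * (i - 50)
    frames[i][40:80, x:x + 40] = 200
print(process_stream(frames))  # 11 detector calls instead of 100
```

In this synthetic stream the detector runs only while the square is appearing, moving and disappearing; the other 89 frames never reach the GPU.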

One of the most popular and successful methods for image-based motion detection is called **Background Subtraction**. At a high level, the concept is simple: we start with a static image without any moving objects, which we call the **background**. Every consecutive frame is then compared against the background to detect changes. Those changes should correspond to new objects that have appeared in the image, and they are called the foreground.
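The naive version of this idea fits in a few lines. The sketch below assumes grayscale NumPy frames and an arbitrary per-pixel threshold of 30 intensity levels; it marks as foreground every pixel that differs enough from a fixed background image.

```python
import numpy as np


def naive_foreground_mask(background, frame, threshold=30):
    """Boolean mask that is True where the frame differs from a fixed background image."""
    # Signed arithmetic avoids uint8 wrap-around when the frame is darker than the background.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold


background = np.full((120, 160), 100, dtype=np.uint8)  # empty, static scene
frame = background.copy()
frame[40:80, 60:100] = 200                              # a new bright object appears
mask = naive_foreground_mask(background, frame)
print(mask.sum())  # 1600: exactly the object's 40x40 area
```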

Unfortunately, there are many challenges in this optimistic approach. What if:
- the initial background already contains moving objects?
- the next frames do not actually contain any moving objects, but the illumination has changed over time? What if this in turn causes shadows to appear?
- the camera is indoors and we turn the light on and off?
- there are moving tree branches in the background?
- the weather has changed and we have to deal with rain or snow?
- we are not interested in small objects, but only objects of a certain size?
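To make the illumination case concrete, the same naive per-pixel comparison can be applied to a synthetic frame in which nothing has moved but a light has been switched on (all values and the threshold of 30 are illustrative):

```python
import numpy as np


def naive_foreground_mask(background, frame, threshold=30):
    """Boolean mask that is True where the frame differs from a fixed background image."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold


background = np.full((120, 160), 100, dtype=np.uint8)
# Simulate switching a light on: every pixel gets brighter, but nothing has moved.
lights_on = np.clip(background.astype(np.int16) + 60, 0, 255).astype(np.uint8)
mask = naive_foreground_mask(background, lights_on)
print(mask.mean())  # 1.0: the entire frame is reported as foreground
```

Every pixel is flagged, even though the scene contains no new object.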

These cases show that we need a more sophisticated approach than simply subtracting the background from each new frame.

Luckily, such solutions have already been developed by others. One of them was proposed by Zoran Zivkovic in 2004 in his paper "Improved Adaptive Gaussian Mixture Model for Background Subtraction" (ref. 1 below). The aim of Zivkovic’s work was to address challenges such as those above and to improve processing time compared with earlier models from other researchers.

In this model (implemented in OpenCV under the name MOG2), the background is not static but constantly updated. As the author describes it, the model uses recursive equations to update its parameters continuously and to select the appropriate number of Gaussian components for each pixel. At a high level, the decision is based on a Bayesian ratio $R$, which follows the formula:

$$
R=\frac{p(BG|\overrightarrow{x}^{(t)})}{p(FG|\overrightarrow{x}^{(t)})}=\frac{p(\overrightarrow{x}^{(t)}|BG)p(BG)}{p(\overrightarrow{x}^{(t)}|FG)p(FG)}
$$

where $R$ is the ratio between the probability that the pixel value $\overrightarrow{x}^{(t)}$ at time $t$ belongs to the background ($BG$) and the probability that it belongs to the foreground ($FG$).

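In general, we do not have any prior information about the foreground objects, so we assume that their appearance follows a uniform distribution, $p(\overrightarrow{x}^{(t)}|FG) = c_{FG}$, and that $p(FG) = p(BG)$. We then decide that the pixel belongs to the background if the probability of $\overrightarrow{x}^{(t)}$ given $BG$ is greater than a threshold value $c_{thr}$: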

$$
p(\overrightarrow{x}^{(t)}|BG) > c_{thr} \quad (= R\,c_{FG})
$$

The left-hand side of this inequality, $p(\overrightarrow{x}^{(t)}|BG)$, is referred to as the background model. It is estimated from a training set denoted as $X$.
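The recursive update and the decision rule can be sketched together for a single grayscale pixel. This is a simplified reading of the ideas in Zivkovic (2004), not the OpenCV MOG2 code: the ownership rule (the heaviest component within three standard deviations owns the new value), the default parameter values, the minimum variance `var_min` and the uniform foreground density $c_{FG} = 1/256$ behind `c_thr` are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing weight (pi_m)
    mean: float    # mean intensity (mu_m)
    var: float     # variance (sigma_m^2)


class PixelGMM:
    """Simplified per-pixel (grayscale) sketch of an adaptive Gaussian mixture background model."""

    def __init__(self, alpha=0.01, c_t=0.05, c_f=0.1, var0=15.0 ** 2,
                 var_min=4.0, max_components=4, c_thr=1.0 / 256):
        self.alpha = alpha              # learning rate, roughly 1/T for a history of T frames
        self.c_t = c_t                  # complexity-reduction constant that prunes weak components
        self.c_f = c_f                  # maximum share of the data allowed to be foreground
        self.var0 = var0                # variance given to a newly created component
        self.var_min = var_min          # floor that keeps variances from collapsing to zero
        self.max_components = max_components
        self.c_thr = c_thr              # threshold in the rule p(x|BG) > c_thr (assumed R = 1)
        self.components = []

    def _background_components(self):
        """Heaviest components whose cumulative weight first exceeds 1 - c_f."""
        selected, total = [], 0.0
        for comp in sorted(self.components, key=lambda c: c.weight, reverse=True):
            selected.append(comp)
            total += comp.weight
            if total > 1.0 - self.c_f:
                break
        return selected

    def is_background(self, x):
        """Bayesian decision: background if p(x|BG) > c_thr."""
        density = sum(
            c.weight * math.exp(-0.5 * (x - c.mean) ** 2 / c.var) / math.sqrt(2 * math.pi * c.var)
            for c in self._background_components()
        )
        return density > self.c_thr

    def update(self, x):
        """Recursively update weights, means and variances with a new pixel value x."""
        # Ownership: among components within 3 standard deviations, the heaviest one owns x.
        close = [c for c in self.components if (x - c.mean) ** 2 < 9.0 * c.var]
        owner = max(close, key=lambda c: c.weight) if close else None
        for c in self.components:
            o = 1.0 if c is owner else 0.0
            c.weight += self.alpha * (o - c.weight) - self.alpha * self.c_t
            if c is owner:
                delta = x - c.mean
                k = self.alpha / c.weight
                c.mean += k * delta
                c.var = max(c.var + k * (delta * delta - c.var), self.var_min)
        # Drop components whose weight became negative (the pruning effect of c_t).
        self.components = [c for c in self.components if c.weight > 0]
        if owner is None:
            # No close component: create a new one centred on x, replacing the weakest if full.
            if len(self.components) >= self.max_components:
                self.components.remove(min(self.components, key=lambda c: c.weight))
            self.components.append(Component(self.alpha, x, self.var0))
        # Renormalise the weights so they sum to one.
        total = sum(c.weight for c in self.components)
        for c in self.components:
            c.weight /= total
```

Feeding the pixel a value that persists for long enough (for example a parked car) gradually turns that value into background, while weak components are pruned by the $c_T$ term. In practice, OpenCV exposes its implementation as `cv2.createBackgroundSubtractorMOG2(history, varThreshold, detectShadows)`, whose `apply(frame)` method returns a per-pixel foreground mask.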

### 2.1.2. Yolo - Object Recognition

TODO: provide theoretical background here

## 2.2. Forecasting

The next theme in this project is Forecasting. One of the research questions is whether we can accurately predict the expected number of objects in hourly intervals using Machine Learning.
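The forecasting target can be pictured as a series of hourly counts. The sketch below assumes that each detection is logged with a `datetime` timestamp (the actual storage format is not described in this chapter, and the timestamps are made up); hours without any detection are kept as explicit zeros, so the series has no gaps.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(detection_times, start, end):
    """Return (hour, count) pairs for every hour in [start, end), including empty hours."""
    # Truncate each timestamp to the start of its hour and count occurrences.
    counts = Counter(t.replace(minute=0, second=0, microsecond=0) for t in detection_times)
    series = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:
        series.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return series


detections = [datetime(2020, 5, 1, 8, 5), datetime(2020, 5, 1, 8, 40), datetime(2020, 5, 1, 9, 10)]
for hour, count in hourly_counts(detections, datetime(2020, 5, 1, 7), datetime(2020, 5, 1, 10)):
    print(hour.strftime("%H:00"), count)
# 07:00 0
# 08:00 2
# 09:00 1
```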

In my work I have utilised only two Machine Learning algorithms, which will be described in detail below:
- Linear Regression
- Random Forest

I also tried many other algorithms, such as:
- Support Vector Machines (TODO: provide a very brief theory and reference here)
- Poisson Gaussian Process (TODO: provide a very brief theory and reference here)
- Feed-Forward Neural Network (TODO: provide a very brief theory and reference here)
- Long Short-Term Memory Recurrent Neural Network (TODO: provide a very brief theory and reference here)

None of the additional algorithms improved my results; they only introduced computational overhead and complexity. Therefore, I decided to remove them from the theoretical analysis, and their code samples from the methodology section.

### 2.2.1. Linear Regression
TODO: provide theoretical background here

### 2.2.2. Random Forest
TODO: provide theoretical background here

## 2.3. Anomaly Detection

The core idea of this system is anomaly detection. This makes it very useful, as owners are often unaware of events in the observed area and it is impractical to monitor the environment continuously.

The anomaly detection solution described below uses **Probabilistic Programming** to identify a threshold for the number of observations in a given hour, above which the system can send an alert to the owners.

But what if there is only a single unusual event in an hour? It turns out that we do not need to limit the system to a single anomaly detection algorithm. Below I describe an **Auto-Encoder**, a Neural Network based method that can detect an anomaly from a single raw image frame.

### 2.3.1. Probabilistic Programming
TODO: provide theoretical background here
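As a much simpler stand-in for the probabilistic model, the sketch below assumes that the number of objects seen in a given hour follows a Poisson distribution with a known rate, and places the alert threshold at the smallest count whose upper-tail probability is at most 1%. In the probabilistic-programming approach the rate would itself be inferred from the data rather than passed in; both the Poisson assumption and the 1% level are illustrative choices.

```python
import math


def poisson_alert_threshold(rate, tail_probability=0.01):
    """Smallest count k such that P(X >= k) <= tail_probability, for X ~ Poisson(rate)."""
    if not (0.0 <= rate < 700.0 and 1e-9 <= tail_probability < 1.0):
        # exp(-rate) underflows for very large rates, and tiny tail levels hit float rounding.
        raise ValueError("rate must be in [0, 700) and tail_probability in [1e-9, 1)")
    k = 0
    pmf = math.exp(-rate)  # P(X = 0)
    cdf_below = 0.0        # P(X < k)
    while 1.0 - cdf_below > tail_probability:
        cdf_below += pmf   # now P(X < k + 1)
        k += 1
        pmf *= rate / k    # now P(X = k)
    return k


print(poisson_alert_threshold(4.0))  # 10
```

For an hour that averages four objects, the threshold is 10, so an alert would be raised when 10 or more objects are observed in that hour.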

### 2.3.2. Auto-Encoders
TODO: provide theoretical background here
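The usual way an auto-encoder flags an anomaly is through its reconstruction error: a model trained only on normal frames reconstructs similar frames well and unusual frames poorly. The sketch below illustrates that principle with PCA, which behaves like a linear auto-encoder, in place of the Neural Network; the synthetic 8x8 frames, the two-dimensional code and the "mean plus three standard deviations" threshold are all assumptions made for the example.

```python
import numpy as np


def fit_linear_autoencoder(frames, n_components):
    """Fit a linear 'auto-encoder' via PCA on flattened frames (one frame per row)."""
    mean = frames.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(frame, mean, components):
    """Mean squared error between a frame and its encode/decode reconstruction."""
    code = (frame - mean) @ components.T   # encode: project onto the learned subspace
    recon = mean + code @ components       # decode: map the code back to pixel space
    return float(np.mean((frame - recon) ** 2))


def fit_threshold(frames, mean, components, k=3.0):
    """Alert threshold: mean + k standard deviations of the errors on normal frames."""
    errors = np.array([reconstruction_error(f, mean, components) for f in frames])
    return float(errors.mean() + k * errors.std())


# Synthetic "normal" 8x8 frames (flattened to 64 values) that vary along only two directions.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 64))
normal = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 64))
mean, components = fit_linear_autoencoder(normal, n_components=2)
threshold = fit_threshold(normal, mean, components)
unusual = normal[0] + rng.normal(size=64)  # a frame with content never seen in training
print(reconstruction_error(unusual, mean, components) > threshold)  # True
```

A neural auto-encoder replaces the linear projection with learned non-linear encoder and decoder networks, but the decision step (compare the reconstruction error of a single frame against a threshold learned from normal data) stays the same.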

## 2.4. Conclusion

TODO: Here provide a high level conclusion and describe the next Chapter.
