# 1. Introduction

[index](../Index.ipynb) | [next](./02.LiteratureReview.ipynb)

## 1.1. Background

Computers and vision have been linked since the 1960s.

In his 1963 Ph.D. thesis, Larry Roberts (Roberts 1963) notes that machine understanding of *pictorial data* had already been a challenge for quite a while.

Since then, research in the area has had its ups and downs. The most recent major breakthrough, currently seen as the beginning of the modern era, can be credited to the paper by Alex Krizhevsky and colleagues: *ImageNet Classification with Deep Convolutional Neural Networks* (Krizhevsky 2012).

Since 2012, progress in Object Recognition and Detection has been rapid. Hand-crafted feature detectors, like SIFT (Lowe 2004), and prediction algorithms, like SVM (Vapnik et al. 1995), have been challenged by the automated feature detection and prediction capabilities offered by Convolutional Neural Networks.

Modern Computer Vision and prediction algorithms are now accessible through flexible, high-level libraries, online articles, books and video tutorials. To the Larry Roberts of 1963, this progress would have looked like something from an alien civilization.

These "intelligent software" ideas have been accompanied by substantial innovation in hardware and data availability. GPUs ([Graphics Processing Units](https://en.wikipedia.org/wiki/Graphics_processing_unit)) significantly speed up matrix computations, CPU performance has kept improving, and highly affordable small form factor IoT devices (like the Raspberry Pi) have become widespread. Together, these make it possible to operate efficiently on large volumes of image data and solve interesting problems.

## 1.2. This Research

My research uses modern Computer Vision, Machine Learning and hardware to create a Camera Monitoring System capable of showing a live video stream with real-time object detection and recognition.

The questions I will try to answer are:
- How complex is it to build a fast and reliable object detection pipeline using *Computer Vision*?
- Given collected data with object detections, can future object counts be predicted using *Machine Learning*?
- Does the object detection data contain anomalous signals, which can be recognized with *Anomaly Detection* algorithms and used to alert users?

If the research goals are achieved, the final product should be generic enough to be applied in other households and to other use cases, like predicting traffic, tourist congestion or animal behaviour, and finding unusual events or even security threats in the video stream.

From a data protection and security perspective, the aim is to keep the data local, which means that an internet connection should not be required and a data breach is much less likely.

Lastly, I would like this thesis to be distributed as an open-source project, so that anyone curious can see how all these pieces fit together, make their own improvements and build their own datasets and algorithms.

## 1.3. Limitations

Like any software in the real world, the system I am proposing here has its limitations.

The camera used for video capture is very basic. The default Raspberry Pi camera ([PiCam](https://picamera.readthedocs.io/en/release-1.13/)) does not have Night Vision capability, which somewhat limits its usage as a security device. However, according to the FBI, as reported by many home alarm companies online (alarmnewengland 2020), most burglaries occur between 10AM and 3PM, when most people are at work or school.

The next limitation is prediction accuracy. Due to the heavily stochastic nature of the world around us and potential gaps in the data collection process, predictions are not flawless. However, the <span style="background-color: yellow;">main objective of this research is less about accuracy and metrics and more about usefulness.</span>

## 1.4. Guidelines for the reader

The whole dissertation has been written in Jupyter Notebooks. This means that code samples can be easily copied, and the project can be cloned ([GitHub link](https://github.com/Alchemication/cvdl-for-home)) and the code executed on another machine.

Code samples are kept out of the main chapters, because they would make the chapters difficult to follow.

**Guidelines to be aware of:**
- Chapters contain links to the previous and next chapters at the top and bottom of each
- <span style="background-color: yellow;">Key points</span> are highlighted with a yellow background
- Important concepts, areas or terminology are written in *italics*
- Some chapters provide a reference to an in-depth study (called *Extras*) with well-documented code samples and additional commentary and plots
- Chapters are structured as a hierarchy with at most two levels of nesting below the chapter level (for example $6.$ -> $6.1.$ -> $6.1.1.$)
- Clickable [links](https://en.wikipedia.org/wiki/Artificial_intelligence) are used throughout to improve the flow
- All mathematical notation is written in [LaTeX](https://en.wikibooks.org/wiki/LaTeX/Mathematics)
- Each reference to code or a function is formatted like `this_function`
- Some paragraphs are separated by a title in **bold font** to improve text spacing

The next chapter contains the Literature Review, a study of the theoretical framework related to this research.

[index](../Index.ipynb) | [next](./02.LiteratureReview.ipynb)