# 1. Introduction

[index](../Index.ipynb) | [next](./02.LiteratureReview.ipynb)

## 1.1. Background

Computers and vision were already linked together as far back as the 1960s.

In 1963, Larry Roberts noted in his Ph.D. thesis <cite data-cite="roberts:1963:perception">(Roberts, 1963)</cite> that getting machines to understand *pictorial data* had already been a challenge for quite a while.

Since then, research in the area has had its ups and downs. However, the most recent major breakthrough, currently seen as the beginning of the modern era, can be credited to the paper by Alex Krizhevsky and colleagues: *ImageNet Classification with Deep Convolutional Neural Networks* <cite data-cite="krizhevsky:2012:imagenet">(Krizhevsky et al., 2012)</cite>.

Since 2012, progress in the area of Object Recognition and Detection has been rapid. Hand-crafted feature detectors, like SIFT <cite data-cite="lowe:2004:keypoints">(Lowe, 2004)</cite>, and prediction algorithms, like SVM <cite data-cite="vapnik:1995:svm">(Vapnik et al., 1995)</cite>, have been challenged by the more automated feature detection and prediction capabilities offered by Convolutional Neural Networks.

Modern Computer Vision and Machine Learning algorithms can now be put to use with the help of flexible, high-level libraries, online articles, books and video tutorials. To Larry Roberts in 1963, this progress would have looked like something from an alien civilization.

These "intelligent software" ideas have been accompanied by substantial innovation in hardware and in data availability. With GPUs ([Graphics Processing Units](https://en.wikipedia.org/wiki/Graphics_processing_unit)) significantly speeding up matrix computations, steady gains in CPU performance, and a surge of highly affordable small form factor IoT devices (like the Raspberry Pi), it is now possible to operate efficiently on large volumes of image data to solve interesting problems.

## 1.2. This Research

This research combines modern Computer Vision, Machine Learning and hardware components to create a Camera Monitoring System capable of showing a live video stream with real-time object detection and alerts.

The questions I will try to answer are:

- How complex is it to build a fast and reliable object detection pipeline using *IoT* devices and *Computer Vision*?
- Given the collected image data with object detections, can future object counts be accurately predicted using *Machine Learning*?
- Does object detection data contain anomalous patterns that can be recognized with *Anomaly Detection* algorithms?

If the research goals are achieved, the final product should be generic enough to be applied in other households and to other use cases, for instance predicting traffic or tourist congestion, animal behaviour, or security threats in video streams.

From a data protection and security perspective, the aim is to keep the data local, which means that an internet connection should not be required and a data breach is much less likely.

Lastly, this thesis will be distributed as an open source project on GitHub, serving as a starting point for those who want to build their own datasets and algorithms, or to contribute improvements to this project.

## 1.3. Limitations

Like any software in the real world, the proposed system has its limitations.

The camera used for video capture is very basic. The default Raspberry Pi camera ([PiCam](https://picamera.readthedocs.io/en/release-1.13/)) does not have Night Vision capability, which somewhat limits its usage as a security device. However, according to the FBI, as reported by many home alarm companies in online sources <cite data-cite="crippin:2016:burglars">(Crippin, 2020)</cite>, most burglaries occur between 10AM and 3PM, when most adults are at work or school.

Another limitation is prediction accuracy, which is expected to be modest given the heavily stochastic nature of the *Person* and *Vehicle* count datasets.
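To illustrate why stochastic counts cap the achievable accuracy, here is a minimal sketch. It assumes, purely for illustration, that hourly object counts follow a Poisson distribution with a known mean rate; this assumption is not taken from the datasets used in this research, and the names `poisson_pmf` and `best_case_rmse` are hypothetical. Under that assumption, even a forecaster that always predicts the true mean is left with an irreducible error, because the variance of a Poisson variable equals its mean.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean rate is lam (lam > 0).

    Computed in log space so large k does not overflow.
    """
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def best_case_rmse(lam: float, max_k: int = 100) -> float:
    """RMSE of always predicting the true mean lam for Poisson counts.

    Assumption: counts are Poisson distributed, which is only an
    illustrative model. The sum is truncated at max_k, which is
    negligible for the small rates used here.
    """
    mse = sum(poisson_pmf(k, lam) * (k - lam) ** 2 for k in range(max_k + 1))
    return math.sqrt(mse)


# The error grows with the square root of the mean rate.
for rate in (1.0, 4.0, 9.0):
    print(f"mean count per hour = {rate}, best-case RMSE = {best_case_rmse(rate):.3f}")
```

In this simplified model the best-case error grows with the square root of the mean count, so no prediction algorithm can be expected to forecast such counts exactly.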

## 1.4. Guidelines for reader

Even though this dissertation has been exported to a PDF file, it has been written in Jupyter Notebooks.

This means that the project can be cloned from [GitHub](https://github.com/Alchemication/cvdl-for-home), the code samples executed and the plots reproduced.

The following guidelines should make reading a more enjoyable experience:

- Chapters contain links to the previous and next chapters at the top and bottom of each Notebook for easy navigation
- Citations, important concepts, areas or terminology will be written in *italics*
- Some chapters provide references to in-depth study Notebooks (called the *Extras*), which contain well documented code samples, additional commentary and plots
- Chapters are structured as a hierarchy with a maximum of two levels of depth below the chapter number (example: $6.$ -> $6.1.$ -> $6.1.1.$)
- There are often clickable [links](https://en.wikipedia.org/wiki/Artificial_intelligence) to create a better flow
- All mathematical notation is written in [LaTeX](https://en.wikibooks.org/wiki/LaTeX/Mathematics)
- Each reference to code or a function will be formatted like `this_function`
- Some paragraphs will be divided by a title in **bold font** to improve readability

The next chapter contains a Literature Review, which studies the theoretical framework related to this research.

[index](../Index.ipynb) | [next](./02.LiteratureReview.ipynb)