# Lab Assignment Seven: Convolutional Neural Networks (CNNs)

__*Austin Chen, Luke Hansen, Oscar Vallner*__

## 1. Preparation and Overview


In this report, we return to the CIFAR-10 dataset from our Lab 3 report. Its training set consists of 50,000 32x32-pixel color images spread across 10 unique classes, with 5,000 images each. From the <a href="https://www.cs.toronto.edu/~kriz/cifar.html">dataset's website</a>, the classification guide is as follows:

<br><br>

<img src="dataset_overview.png" width="475px" >

<br><br>


### 1.1 Business Understanding



Though there are 10 unique classifications within our image dataset, the CIFAR-10 dataset is merely a small labelled subset of the <a href="http://groups.csail.mit.edu/vision/TinyImages/">80 million tiny images</a> dataset, a far larger collection of images. The "80 million tiny images" dataset was assembled as part of a large public initiative to build an expansive and accurate visual dictionary to aid the field of computer image recognition. The dataset now contains a visualization of 53,464 English nouns arranged by meaning, collected by scraping millions of images from all across the web.


### _But why is all this data practical?_

In the ever-growing field of AI, image recognition is paramount. While there already exist several examples of applied image recognition in industry (such as face detection), a more comprehensive and sophisticated AI would have to leverage broader image recognition algorithms in order to expand beyond these narrow applications.

According to the dataset's website:

_"computers have difficulty recognizing objects in images. While practical solutions exist for a few simple classes such as human faces or cars, the more general problem of recognizing all different classes of objects in the world (e.g. guitars, bottles, telephones) remains unsolved."_


At Facebook's 2016 annual developer conference, Mark Zuckerberg outlined the social network's AI plans to "build systems that are better than people at perception: seeing, hearing, language, and so on." Plans like these help justify collecting so much image data in the first place. For example, image recognition technology built for blind users can "see" what is going on in a picture and describe it out loud (in fact, Facebook has been working on accessibility features like this). Supporting the visually impaired is a noble use case of image classification, extending the tools of technology to everyone.

An extensive catalogue of images is useful well beyond any single application. Though it may not seem practical at present, there are effectively an infinite number of possible images a visual system can be confronted with, and an equally open-ended set of applications it can serve.


### The CIFAR-10 Subset

Focusing back in on the CIFAR-10 dataset, we will be narrowing the focus of our classification problem. When looking at the image categories, the two most compelling categories that stood out to us were automobiles and deer. 

In 2017, one of the most relevant applications of image recognition falls under the transportation domain in the form of self-driving/autonomous vehicles. One of the primary requirements of self-driving vehicles is accurate, live image recognition. In order for a self-driving car to be as safe as possible, it must be able to classify objects and animals in the road at least as quickly and as well as humans can. In the United States, <a href="http://cultureofsafety.thesilverlining.com/driving/deer-vs-car-collisions">the National Highway Traffic Safety Administration (NHTSA) conducted a study</a> concerning the increasing dangers from deer-related vehicle accidents. Their findings are as follows:

- There are approximately 1.5 million deer-related car accidents annually
- These accidents cause over 1 billion dollars in vehicle damage
- There are around 175-200 fatalities and 10,000 injuries every year

The fact that deer collisions cause over 1 billion dollars worth of vehicle damage a year provides us with ample business incentive to create an accurate classification model. Once the additional monetary costs of fatalities and injuries are accounted for on top of the 1 billion, we have an even more compelling reason to create a classification model to resolve this issue. 

Therefore, for our classification problem, we will be attempting to create an image recognition model that can distinguish oncoming vehicles on the road from deer. If a car can determine whether an oncoming object is a deer or a vehicle, it could automatically brake, potentially saving drivers from a fatal collision with a deer. Within the CIFAR-10 dataset, we will be taking all 5,000 images from the "automobile" category and combining them with all 5,000 images from the "truck" category to create an overarching "vehicle" category. We will then build a convolutional neural network that will distinguish between these road vehicles and deer. With a reliable enough model, our neural network would ideally be deployed in autonomous driving systems in consumer vehicles across America.

Although we have mostly discussed autonomous vehicle deployment thus far, autonomous vehicles are still in development in 2017 and have a long way to go before being perfected. A far more feasible intermediate deployment would be in roadside cameras on highways. If a roadside camera setup detects that a deer is in the road, it can flash several bright warning lights to alert the drivers on the road. If the roadside camera only detects a car, it would not flash its lights. A reliable model could benefit several stakeholders, ranging from car manufacturers to drivers, and even taxpayers in America who pay for road repairs made necessary by deer-related automobile collisions.


### Measure of Success

Our classification task is binary, so even by plain accuracy we should aim far higher than the 50% expected from random guessing (although accuracy will not be our primary metric). We must remind ourselves that with our business case, lives are at risk. A roadside camera and light system should be as accurate as possible when determining whether deer are obstructing the road. As we will discuss later, it is also worth analyzing the implications of false positives and false negatives; that is, when the alert light fires although no deer is actually there, or fails to trigger when a deer is there.

---

Link to dataset: https://www.cs.toronto.edu/~kriz/cifar.html

---

### 1.2 Class Variables

### 1.3 Data Preparation
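
A minimal sketch of how the vehicle/deer subset described in Section 1.1 could be assembled, assuming the training images and labels are already loaded as NumPy arrays. The function name `make_vehicle_deer_subset` and the 1 = deer / 0 = vehicle encoding are illustrative choices, not taken from the original notebook; the class indices follow the label order published on the CIFAR-10 website (automobile = 1, deer = 4, truck = 9).

```python
import numpy as np

# CIFAR-10 label indices, per the class ordering on the dataset's website.
AUTOMOBILE, DEER, TRUCK = 1, 4, 9


def make_vehicle_deer_subset(images, labels):
    """Keep only automobile, truck, and deer images and relabel them.

    Returns (subset_images, binary_labels) where 1 = deer (positive class)
    and 0 = vehicle. This encoding is an assumption made for illustration.
    """
    labels = np.asarray(labels).ravel()  # some loaders return labels with shape (N, 1)
    keep = np.isin(labels, [AUTOMOBILE, DEER, TRUCK])
    binary = (labels[keep] == DEER).astype(np.int64)
    return np.asarray(images)[keep], binary


if __name__ == "__main__":
    # Synthetic stand-in for CIFAR-10: 20 random images, labels 0..9 twice over.
    rng = np.random.default_rng(0)
    fake_images = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
    fake_labels = np.tile(np.arange(10), 2)
    X, y = make_vehicle_deer_subset(fake_images, fake_labels)
    print(X.shape, y.tolist())
```

Applied to the full training set, this should leave 15,000 images: 10,000 vehicles and 5,000 deer.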

### 1.4 Evaluation Metrics

Deer vs. vehicles (automobiles and trucks), with deer as the positive class:

- True Positive: Classifies: Deer. Reality: Deer.
- False Positive: Classifies: Deer. Reality: Vehicle.
- False Negative: Classifies: Vehicle. Reality: Deer.
- True Negative: Classifies: Vehicle. Reality: Vehicle.

For this dataset, we will be using the F1 score, which balances precision and recall on the deer class.
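
As a sketch of how the four outcomes above feed into the F1 score, the snippet below counts them with deer as the positive class and computes precision, recall, and F1. The function name `deer_f1` and the 1 = deer / 0 = vehicle encoding are assumptions for illustration; a library routine such as scikit-learn's `f1_score` should give the same number.

```python
def deer_f1(y_true, y_pred):
    """Return (precision, recall, f1) with deer (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: 4 deer and 6 vehicles. The model misses one deer
    # (false negative) and raises one false alarm (false positive).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(deer_f1(y_true, y_pred))
```

Because true negatives never enter the formula, F1 focuses on how well deer are detected rather than rewarding the model for the more numerous vehicle class.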

### 1.5 Cross Validation Methods

_[Class distribution pie chart or histogram to be inserted here]_

For our dataset, we narrowed our cross validation techniques down to two: 10-fold cross validation and **stratified** 10-fold cross validation. However, we ultimately decided against a stratified cross validation. Here's why.

As we can see in the chart above, roughly 67% of the instances in our dataset are vehicles, while 33% are deer. While there is a definite class imbalance here, we do not believe it is extreme enough to mandate a stratified cross validation. Vehicles outnumber deer 2:1, but both classes are well represented, with 5,000 instances of deer (the less-represented of the two). A class imbalance may even be advantageous for our classification model.
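
The 2:1 split can be checked with a quick count. This is only a sketch: the helper `class_proportions` and the string labels are hypothetical, and the counts are the ones stated in the text rather than values read from the dataset files.

```python
from collections import Counter


def class_proportions(labels):
    """Map each class label to its share of the dataset (between 0 and 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    # Counts from the text: 5,000 automobiles + 5,000 trucks vs. 5,000 deer.
    labels = ["vehicle"] * 10_000 + ["deer"] * 5_000
    for name, share in class_proportions(labels).items():
        print(f"{name}: {share:.1%}")
```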

If we step back and think about a real-life scenario, many vehicle collisions with deer happen after sundown and at night, on lower-traffic single- or double-lane highways (according to the NHTSA study referenced above). More often than not, these collisions involve only one car and one deer, as opposed to one car colliding with an entire herd of deer. Therefore, when considering our methods for cross validation, we must remember what is realistic in a real-world scenario. In standard scenarios, automobiles outnumber deer, perhaps by even more than the 2:1 ratio in our dataset.

Therefore, while it might be nice to ensure equal class representation across all the folds, our goal is to use a cross-validation method that realistically mirrors real-world practice. We cannot guarantee that the ratio of cars to deer on the road will always be constant, nor can we guarantee that a new batch of data fed into our classifier will contain the same ratio. We believe that stratifying our data might detract from a realistic mirroring.

Therefore, we will be using 10-fold cross validation as our validation method, as opposed to stratified 10-fold cross validation.
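
To illustrate what plain (unstratified) 10-fold cross validation does to the class ratio, the sketch below shuffles 15,000 stand-in labels, splits them into 10 folds with NumPy, and reports the deer fraction in each held-out fold. This is a hand-rolled illustration, not the notebook's actual code; the helper `kfold_indices`, the seed, and the 1 = deer / 0 = vehicle encoding are assumptions. scikit-learn's `KFold` and `StratifiedKFold` provide the two strategies as library routines.

```python
import numpy as np


def kfold_indices(n_samples, n_splits=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled, unstratified k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i, test_idx in enumerate(folds):
        # Training indices are every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


if __name__ == "__main__":
    # Stand-in labels matching the text: 10,000 vehicles (0) and 5,000 deer (1).
    y = np.array([0] * 10_000 + [1] * 5_000)
    for fold, (train_idx, test_idx) in enumerate(kfold_indices(len(y)), start=1):
        print(f"fold {fold:2d}: deer fraction in test split = {y[test_idx].mean():.3f}")
```

Without stratification, the deer fraction drifts slightly from fold to fold around one third, which is the kind of variation the argument above accepts as closer to real-world conditions.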

## 2. Modeling

### 2.1 Convolutional Neural Network Model Implementation

### 2.2 Network Architecture Experimentation

### 2.3 Performance Analysis Compared to Scikit-Learn MLP

## 3. Data Expansion