## Intel Image Classification using Neural Networks
by Allen Wang

Convolutional neural networks (CNNs) are a class of neural networks used for image learning and analysis. They are a core part of computer vision, using stacked convolutional layers to detect features in images. Computer vision matters in many industries, including autonomous driving, search engines, facial recognition, augmented reality, and healthcare. Many CNN architectures have been proposed over the years, and they are commonly benchmarked on the ILSVRC ImageNet challenge. Notable examples include AlexNet, VGG, and ResNet, which were top entries in ILSVRC 2012, 2014, and 2015 respectively. In this project, we train neural networks to identify natural scenes with good accuracy, a task that could be useful for autonomous driving or camera technology. We use ResNet50, VGG16, and a basic CNN to sort images into 6 classes: sea, street, mountains, glacier, forest, and buildings.

![Intel background](vgg.png)

### 1. Data

The data consists of images of natural scenes from around the world, separated into 6 classes: sea, street, mountains, glacier, forest, and buildings. The dataset is hosted on Kaggle and was originally published by Intel on https://datahack.analyticsvidhya.com for an image classification challenge. The Train, Test, and Prediction data come in separate zip files. There are around 14k images in Train, 3k in Test, and 7k in Prediction. To view or download the dataset, click the link below:

* [Kaggle](https://www.kaggle.com/puneet6060/intel-image-classification)

### 2. Training Model 1

For the first neural network, we use the basic convolutional neural network from a tutorial on the official TensorFlow website. To load the data into the CNN, the provided images first have to be assembled into a dataset in Python. The images are organized into folders named after their labels, and TensorFlow can build a labeled dataset directly from such a folder tree. Using the function `image_dataset_from_directory` from `tf.keras.preprocessing`, I created a dataset with an image size of 150x150 and a batch size of 32. Here are the first 9 images from the training dataset.
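The loading step described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the helper name `load_split`, the seed, and the example folder paths are assumptions, and it calls the function through `tf.keras.utils`, where recent TensorFlow releases also expose it.

```python
import tensorflow as tf

IMG_SIZE = (150, 150)  # image size stated in the text
BATCH_SIZE = 32        # batch size stated in the text


def load_split(directory, shuffle=True, seed=42):
    """Build a batched (image, label) dataset from a folder tree.

    Each sub-folder of `directory` is treated as one class, and the
    integer labels follow the alphabetical order of the folder names.
    `load_split` and `seed` are hypothetical choices for this sketch.
    """
    return tf.keras.utils.image_dataset_from_directory(
        directory,
        labels="inferred",
        label_mode="int",
        image_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        shuffle=shuffle,
        seed=seed,
    )


# Example usage (paths are assumptions about how the zip files were unpacked):
# train_ds = load_split("seg_train/seg_train")
# test_ds = load_split("seg_test/seg_test", shuffle=False)
# print(train_ds.class_names)
```

Every image is resized to 150x150 on load, so the source files do not need to share one resolution.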

![image dataset](imagedataset.png)

The layers of the first model are as follows:

![model1](model1layers.jpg)
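For readers who cannot view the image, a small CNN in the style of the TensorFlow image classification tutorial might look like the sketch below. The filter counts, the dense width, and the optimizer are assumptions modeled on that tutorial; the layer list in the image above is the authoritative version.

```python
import tensorflow as tf


def build_basic_cnn(input_shape=(150, 150, 3), num_classes=6):
    """Small tutorial-style CNN (layer sizes are assumptions, not the author's)."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),  # map pixel values from [0, 255] to [0, 1]
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes),  # raw logits, one per scene class
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

The final layer returns logits, which is why the loss is created with `from_logits=True`.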

After several rounds of training and tuning, I found that the model reaches a somewhat decent accuracy but tends to overfit. After the third epoch, a large gap opened between training loss and validation loss, which indicates that the model had overfitted to the training data. I did not tune the parameters further, since the model's accuracy was not that good even at its peak. It was time to try different architectures.
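The check described here, looking for the epoch where validation loss pulls away from training loss, can be written as a small helper. This is only an illustrative sketch: the function name, the `max_gap` threshold, and the loss values in the usage lines are made up and are not the project's results.

```python
def first_divergence_epoch(train_loss, val_loss, max_gap=0.2):
    """Return the 1-based epoch where val_loss first exceeds train_loss
    by more than `max_gap`, or None if that never happens."""
    for epoch, (t, v) in enumerate(zip(train_loss, val_loss), start=1):
        if v - t > max_gap:
            return epoch
    return None


# Made-up loss curves, for illustration only.
example_train = [1.0, 0.7, 0.5, 0.3, 0.2]
example_val = [1.0, 0.75, 0.6, 0.7, 0.9]
print(first_divergence_epoch(example_train, example_val))
```

With a Keras `History` object, the two lists would typically come from `history.history["loss"]` and `history.history["val_loss"]`.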

![model1](model1.jpg)

![model1](model1test.jpg)

### 3. Training Model 2

The second neural network is ResNet50, a 50-layer member of the ResNet family. A ResNet won ILSVRC 2015 with a top-5 error rate of less than 3.6%, using networks up to 152 layers deep. ResNet is a prominent convolutional architecture because of its skip connections. It is built from residual blocks, in which the block's input is added to its output, so a block can easily pass its input through unchanged and only has to learn a residual correction. This makes very deep networks much easier to train and improves performance.
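To make the skip-connection idea concrete, here is a toy residual block in NumPy. It is a simplified illustration with two dense layers, not ResNet50's actual convolutional block, and all names and shapes are invented for the example.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x), with F(x) = relu(x @ w1) @ w2.

    The `+ x` term is the skip connection. If the weights contribute
    nothing (F(x) = 0), the block simply passes its input through.
    """
    fx = relu(x @ w1) @ w2
    return relu(fx + x)


x = np.array([[1.0, 2.0, 0.5]])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # same values as x
```

Because the identity mapping is the default behavior, extra blocks do not have to hurt the network, which is one common explanation for why very deep ResNets remain trainable.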

For this model, I wrote a function that separates the labels from the images in the dataset and then concatenates them back together. The purpose is to have the labels available as their own variable to feed into the ResNet network.
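One way to read this step is as a helper that walks over the batched dataset and joins the per-batch pieces into two arrays. The sketch below is a guess at that logic, with a hypothetical function name, not the author's actual function.

```python
import numpy as np
import tensorflow as tf


def dataset_to_arrays(dataset):
    """Collect a batched (images, labels) tf.data.Dataset into two NumPy arrays.

    Hypothetical helper: each batch is converted to NumPy and the batches
    are concatenated along the first (sample) axis.
    """
    image_batches, label_batches = [], []
    for images, labels in dataset:
        image_batches.append(images.numpy())
        label_batches.append(labels.numpy())
    return (
        np.concatenate(image_batches, axis=0),
        np.concatenate(label_batches, axis=0),
    )


# Example usage, assuming `train_ds` is a batched dataset of images and labels:
# x_train, y_train = dataset_to_arrays(train_ds)
```

This pulls the whole split into memory at once, which is workable for roughly 14k images at 150x150 but would not scale to much larger datasets.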

I used a plain ResNet50 network and appended a few more layers after its output to fine-tune the model.

![model2](resnet.jpg)
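A ResNet50 base with a small classification head on top could be assembled along these lines. Treat it as a sketch under assumptions: the head (one 256-unit dense layer), the frozen base, the ImageNet weights, and the argument names are all illustrative choices, and the image above shows the layers that were actually used.

```python
import tensorflow as tf


def build_resnet50_classifier(
    input_shape=(150, 150, 3),
    num_classes=6,
    weights="imagenet",  # assumption; pass None for random initialization
    pooling="avg",       # "avg" or "max" global pooling after the base
    dropout_rate=0.0,    # set > 0 to add a dropout layer in the head
    train_base=False,    # assumption: the base is frozen by default
):
    base = tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=input_shape,
        pooling=pooling,
    )
    base.trainable = train_base

    inputs = tf.keras.Input(shape=input_shape)
    # Input preprocessing (for example
    # tf.keras.applications.resnet50.preprocess_input) is left out of this
    # sketch; apply it to the dataset when using ImageNet weights.
    x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    if dropout_rate > 0:
        x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Exposing `pooling` and `dropout_rate` as arguments makes it easy to compare head variants without rewriting the model.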

The training and validation accuracy, as well as the losses, stay fairly consistent with each other, showing no sign of overfitting, and they start to level off after 20 epochs. However, the accuracy of the network caps out at around 70%. I adjusted some of the parameters by changing average pooling to max pooling and adding a dropout layer. The accuracy stayed about the same, so I decided to try a different architecture.

![model2](model2.jpg)

![model2](model2test.jpg)

### 4. Training Model 3

The last architecture I used was VGG16, a CNN with 13 convolutional layers and 3 fully-connected layers that uses the 'relu' activation function.

Here are the layers of the VGG16 architecture:

![model3](vgg16.jpg)
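The 13 convolutional plus 3 fully-connected layout can also be written out in code. The sketch below builds a VGG16-style network from scratch for 150x150 inputs and 6 classes. Whether the project used a from-scratch model or pretrained weights is not stated, so the rescaling layer, the softmax head, and the `dense_units` argument are assumptions.

```python
import tensorflow as tf

# Filters per convolutional layer; "M" marks a 2x2 max-pooling step.
VGG16_CONFIG = [
    64, 64, "M",
    128, 128, "M",
    256, 256, 256, "M",
    512, 512, 512, "M",
    512, 512, 512, "M",
]


def build_vgg16(input_shape=(150, 150, 3), num_classes=6, dense_units=4096):
    """VGG16-style network: 13 conv layers followed by 3 dense layers."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Rescaling(1.0 / 255),  # assumption: simple [0, 1] scaling
    ])
    for item in VGG16_CONFIG:
        if item == "M":
            model.add(layers.MaxPooling2D(pool_size=2, strides=2))
        else:
            model.add(layers.Conv2D(item, 3, padding="same", activation="relu"))
    model.add(layers.Flatten())
    model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(dense_units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

The five pooling steps shrink a 150x150 input to 4x4 before the dense layers, so the flattened feature vector has 4 * 4 * 512 = 8192 values.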

After several trials, I determined that this was the best model and decided to add two callbacks: early stopping and learning-rate reduction.

Reducing the learning rate helps the optimizer settle into a minimum of the loss function, and early stopping avoids unnecessary computation. The early stopping callback stopped the training at the 11th epoch, and the learning-rate reduction improved the accuracy by at least 5%. With this architecture, I was able to achieve ~85% accuracy.
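In Keras, these two behaviors are commonly implemented with the `EarlyStopping` and `ReduceLROnPlateau` callbacks. The text does not say which classes or settings were used, so the sketch below is an assumption throughout: the monitored metric, the patience values, the reduction factor, and the minimum learning rate are illustrative defaults.

```python
import tensorflow as tf


def make_callbacks(stop_patience=5, lr_patience=2, lr_factor=0.2, min_lr=1e-6):
    """Return [early stopping, learning-rate reduction] callbacks.

    All settings are hypothetical defaults for this sketch.
    """
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=stop_patience,     # epochs without improvement before stopping
        restore_best_weights=True,  # roll back to the best epoch's weights
    )
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",
        factor=lr_factor,      # new_lr = old_lr * factor
        patience=lr_patience,  # epochs without improvement before reducing
        min_lr=min_lr,
    )
    return [early_stopping, reduce_lr]


# Example usage, assuming `model`, `train_ds`, and `val_ds` already exist:
# model.fit(train_ds, validation_data=val_ds, epochs=30,
#           callbacks=make_callbacks())
```

Giving the learning-rate callback a shorter patience than early stopping lets the model try a smaller step size before training is halted.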

![model3](model3.jpg)

The training and validation accuracy and loss are fairly consistent and change at about the same rate. Both level out around the 11th epoch, as the graph shows.

VGG16 is the final model I decided to go with, since it achieved the best training accuracy and the lowest training loss while keeping validation accuracy and validation loss at similar levels. The model did not overfit, as is evident in the graphs above.

![model3](model3test.jpg)