<h1><center> About the MNIST dataset</center></h1>

![d5222c6e3d15770a.png](attachment:d5222c6e3d15770a.png)

In our previous notebook, we looked at how to read the MNIST dataset from the zip files provided on the MNIST website. With that in place, we can move on to the next step of the project: <b>classifying</b> the .png images we extracted from those .zip files.

 * Training Set - 55,000 images
 * Test Set - 10,000 images

<h2>What is a classifier?</h2>

In the most basic terms, a classifier is an algorithm that maps <b>input data</b> to some kind of <b>output</b> (a category, class, target, etc.).


Look at the following example.

![Untitled-Diagram.png](attachment:Untitled-Diagram.png)

* Regression involves estimating or predicting a response.

* Classification is identifying group membership.

In this example, we can consider the 'news articles' as our <b>input data</b> and 'food', 'sport' and 'politics' as our <b>output</b>. Suppose we give the algorithm an article about food and tell it that this is a food article. When we later give the algorithm an unclassified article, it should be able to use the examples it has already seen to guess what the article is about (food, sport or politics). Classification belongs to the category of <i>supervised learning</i> because the output is provided for the training data.
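The news-article example can be sketched in a few lines of Python. This is only a toy illustration: the training sentences, the `train` and `classify` helpers, and the word-counting approach are all assumptions made for the example, not a method used elsewhere in this project.

```python
from collections import Counter, defaultdict


def train(labelled_articles):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in labelled_articles:
        word_counts[label].update(text.lower().split())
    return word_counts


def classify(word_counts, text):
    """Return the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical labelled examples: input data paired with its known output.
training_data = [
    ("the chef served pasta with fresh tomato sauce", "food"),
    ("the striker scored a late goal in the match", "sport"),
    ("the minister announced a new election policy", "politics"),
]

model = train(training_data)
prediction = classify(model, "a recipe for tomato pasta")
print(prediction)
```

The labelled pairs play the role of the supervised training data, and the final call is the "unclassified article" the model has to guess.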

<h2>Which classifier should I use?</h2>

There is no right or wrong answer to this question, as different classifiers suit different situations. Ideally, the <i>best</i> classifier is the one that produces the highest accuracy when trained and set up properly. However, we can use common sense and a bit of guidance to decide on a classifier that suits our needs. <i>Scikit-learn</i> provides the helpful cheat sheet below, which suggests what kind of model or classifier to use for a given problem and dataset.

![ml_map.png](attachment:ml_map.png)

Although this can be helpful for beginners, reducing the complexity of choosing an algorithm down to a simple image like this is not recommended. Every situation warrants a detailed examination of many more factors than those presented in this image.

<h3>Example classifiers</h3>

* [Baseline Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/BaselineClassifier)

* [Linear Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearClassifier)

* [DNN Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier)

* [DNN/Linear Combined Classifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedClassifier)


<h1><center> MNIST - Linear Classifier</center></h1>

![linsep_new.png](attachment:linsep_new.png)

 * Inputs = images
 * Outputs = digit (0..9)

My first attempt at classifying the MNIST dataset used a linear classifier, as it is one of the simplest models to understand. A linear classifier makes its decision using a linear function of the inputs. The best way to explain a linear model is the following: think of each <b>input</b> as a point in <i>d</i>-dimensional space. Each of these <b>inputs</b> corresponds to one <b>output</b>. The classifier defines a line (or plane, or hyperplane) in this space which separates <b>positive</b> from <b>negative</b> values. Every point is on one side of the line or the other. This is known as binary classification, as there are only 2 classes. With more than 2 classes, as with the 10 MNIST digits, the same idea extends by scoring each class with its own linear function and choosing the class with the highest score.
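As a minimal sketch of this decision rule, the code below evaluates a linear function `w . x + b` and reports which side of the line a point falls on. It also shows one common way to handle several classes: keep one linear function per class and pick the highest score. The weights, biases and function names are hypothetical values chosen by hand for illustration, not learned from MNIST.

```python
def linear_score(weights, bias, point):
    """Compute the linear function w . x + b for one input point."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify_binary(weights, bias, point):
    """Report which side of the line (or hyperplane) the point lies on."""
    return "positive" if linear_score(weights, bias, point) > 0 else "negative"


def classify_multiclass(class_weights, class_biases, point):
    """Score the point with one linear function per class; return the best class index."""
    scores = [
        linear_score(weights, bias, point)
        for weights, bias in zip(class_weights, class_biases)
    ]
    return scores.index(max(scores))


# Hand-picked line x1 + x2 - 1 = 0 in 2-dimensional space.
weights, bias = [1.0, 1.0], -1.0
print(classify_binary(weights, bias, [2.0, 2.0]))
print(classify_binary(weights, bias, [0.0, 0.0]))

# Three hypothetical classes, each with its own weight vector and bias.
class_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
class_biases = [0.0, 0.0, 0.0]
print(classify_multiclass(class_weights, class_biases, [3.0, 1.0]))
```

For MNIST, each point would have one coordinate per pixel and there would be ten weight vectors, one per digit.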

The overall goal of the learning process is to come up with a <i>good</i> line that fits the data. The algorithm then uses this line to make decisions when given an <b>input</b> with no <b>output</b>. The slope and position of the line are determined by the weights and bias learned by the algorithm.
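To make "learning the line" concrete, here is a small sketch using the classic perceptron update rule on a toy 2-dimensional dataset. The perceptron is used purely for illustration and is an assumption of this example; it is not necessarily the training procedure behind the linear classifier in this notebook, and the data points are made up.

```python
def predict(weights, bias, point):
    """Return +1 or -1 depending on which side of the line the point lies."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score > 0 else -1


def train_perceptron(points, labels, epochs=20, learning_rate=0.1):
    """Adjust weights and bias whenever a training point is misclassified."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(epochs):
        for point, label in zip(points, labels):
            if predict(weights, bias, point) != label:
                # Nudge the line towards classifying this point correctly.
                weights = [
                    w + learning_rate * label * x for w, x in zip(weights, point)
                ]
                bias += learning_rate * label
    return weights, bias


# Hypothetical, linearly separable toy data with labels +1 and -1.
points = [[2.0, 3.0], [3.0, 3.5], [2.5, 4.0], [-2.0, -1.0], [-3.0, -2.5], [-1.5, -3.0]]
labels = [1, 1, 1, -1, -1, -1]

weights, bias = train_perceptron(points, labels)
print(weights, bias)
print(predict(weights, bias, [4.0, 4.0]))
```

The weights control the slope of the line and the bias shifts its position, which is exactly what the training loop is adjusting.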

Using this algorithm, I achieved a prediction accuracy of <b>92 percent</b>. Although this was good, I thought it could be improved, so I researched how to do better. The recommendation I found was to use <i>deep neural network classification</i>.
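For reference, accuracy is simply the fraction of predictions that match the true labels. The sketch below shows the arithmetic on a made-up set of 25 labels with 2 mistakes; the data and the `accuracy` helper are hypothetical and are not the results from this project.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical toy data: 25 digit labels, with 2 wrong predictions.
true_labels = [i % 10 for i in range(25)]
predicted = list(true_labels)
predicted[3] = 8   # a 3 mistaken for an 8
predicted[17] = 1  # a 7 mistaken for a 1

toy_accuracy = accuracy(predicted, true_labels)
print(f"{toy_accuracy:.0%}")  # 23 of 25 correct -> 92%
```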

<h2> Performance </h2>

As this script is quite simple, it was easy to run and finishes nearly instantaneously on my machine. We will see, however, that this isn't the case with the DNN classifier below.

<h1><center> MNIST - Deep Neural Network Classifier</center></h1>

![OH3gI.png](attachment:OH3gI.png)

The deep neural network we created to classify the MNIST dataset has 2 hidden layers. Our first model, the linear classifier, had none: its inputs connect directly to its outputs. We can call the new model a <i>'deep'</i> neural network because it contains <b>more</b> than <b>1</b> hidden layer. Let's look at some code to demonstrate the difference.

![1.PNG](attachment:1.PNG)

![2.PNG](attachment:2.PNG)

If we compare the two examples above, we can see that when creating the <b>deep</b> neural network we define two hidden layers in the <i>hidden_units</i> variable: the first with 256 nodes and the second with 32 nodes.
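To get a feel for how much larger the deep model is, the sketch below counts trainable parameters for both architectures. It assumes each 28x28 MNIST image is flattened into 784 inputs and that every layer is fully connected with one bias per node; the `count_parameters` helper is a hypothetical name introduced for this example.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with these layer sizes."""
    total = 0
    for inputs, outputs in zip(layer_sizes, layer_sizes[1:]):
        total += inputs * outputs + outputs  # weight matrix + bias vector
    return total


# 784 inputs (28x28 pixels) straight to 10 outputs: no hidden layers.
linear_params = count_parameters([784, 10])

# 784 inputs, hidden layers of 256 and 32 nodes, then 10 outputs.
dnn_params = count_parameters([784, 256, 32, 10])

print(linear_params)  # 7850
print(dnn_params)     # 209514
```

Under these assumptions the deep network has roughly 27 times as many parameters to learn, which helps explain why it takes longer to train.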

![inputs_to_model_bridge.jpg](attachment:inputs_to_model_bridge.jpg)

* feature_columns = <i>describes the input features given to the model</i>
* hidden_units = <i>the hidden layers and the number of nodes in each</i>
* optimizer = <i>the optimization function to apply to the model</i>
* n_classes = <i>the number of classes, in our case 10 (0..9)</i>
* dropout = <i>the dropout rate, which helps prevent overfitting</i>
* model_dir = <i>the directory to save the model into</i>
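Of the parameters above, dropout is probably the least self-explanatory. The sketch below illustrates the general idea with the widely used "inverted dropout" formulation: during training, each activation is zeroed with probability `rate` and the survivors are rescaled. This is a generic illustration under that assumption, not TensorFlow's implementation, and `apply_dropout` is a hypothetical helper.

```python
import random


def apply_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in the range [0, 1)")
    keep_probability = 1.0 - rate
    return [
        value / keep_probability if rng.random() < keep_probability else 0.0
        for value in activations
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [1.0] * 10     # stand-in for the outputs of one hidden layer
dropped = apply_dropout(hidden, rate=0.5, rng=rng)
print(dropped)
```

Because a different random subset of nodes is silenced on each training step, the network cannot rely too heavily on any single node, which is the intuition behind dropout reducing overfitting.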

<h2>Which optimizer should I use?</h2>

A common problem in deep learning projects is choosing an optimizer. In the diagram below, [David Mack](https://medium.com/octavian-ai/which-optimizer-and-learning-rate-should-i-use-for-deep-learning-5acb418f9b2) tested a variety of commonly used optimizers on the MNIST dataset with the same model. The diagram shows that different optimizers take different amounts of time to reach a given accuracy, and that some optimizers perform worse than others even when given extra time. It's worth noting that this graph only applies to the MNIST data.

![1%203mbLR7aSgbg_UoueBymw5g.png](attachment:1%203mbLR7aSgbg_UoueBymw5g.png)
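The same effect can be reproduced on a much smaller scale. The sketch below minimizes the toy loss `f(x) = x ** 2` with plain gradient descent at two learning rates and with a momentum variant, giving each the same budget of 100 steps. The loss function, learning rates and helper names are assumptions chosen for illustration and have nothing to do with the MNIST experiment in the diagram.

```python
def gradient(x):
    """Gradient of the toy loss f(x) = x ** 2."""
    return 2.0 * x


def sgd(learning_rate):
    """Plain gradient descent: step against the gradient."""
    def update(x, state):
        return x - learning_rate * gradient(x), state
    return update


def momentum(learning_rate, beta=0.9):
    """Gradient descent with momentum: keep a running velocity."""
    def update(x, velocity):
        velocity = beta * velocity - learning_rate * gradient(x)
        return x + velocity, velocity
    return update


def run(update, steps=100, start=5.0):
    """Apply the update rule for a fixed number of steps and return the final x."""
    x, state = start, 0.0
    for _ in range(steps):
        x, state = update(x, state)
    return x


results = {
    "sgd, lr=0.01": run(sgd(0.01)),
    "sgd, lr=0.1": run(sgd(0.1)),
    "momentum, lr=0.01": run(momentum(0.01)),
}
for name, final_x in results.items():
    print(f"{name}: loss after 100 steps = {final_x ** 2:.3g}")
```

Even on this trivial problem, the three configurations end up at very different losses after the same number of steps, which is why the choice of optimizer and learning rate is worth testing for each model.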

<h2> Performance </h2>

At first, this classifier ran extremely slowly: it took about 2-3 minutes to compile and train the model, whereas the linear classifier took only about 5 seconds. After a long time spent figuring out why, I realized that the script was trying to classify/train all 10,000 images from the test set at once rather than in steps. I changed the step value from 10,000 to 1,000, and this significantly improved the performance of the classifier.
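One way to picture the fix is as feeding the model smaller chunks of data instead of everything at once. The sketch below is a generic illustration of that idea, assuming the change amounts to working through the images in chunks of 1,000; the `iterate_batches` helper and the integer stand-ins for images are hypothetical and are not the code from this project.

```python
def iterate_batches(examples, batch_size):
    """Yield consecutive slices of `examples`, each at most `batch_size` long."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(10_000))  # stand-ins for 10,000 images
batches = list(iterate_batches(examples, batch_size=1_000))

print(len(batches))     # number of chunks
print(len(batches[0]))  # size of each chunk
```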

<h3> References</h3>

* [Classifying Handwritten Digits with TF.Learn - Machine Learning Recipes](https://www.youtube.com/watch?v=Gj0iyo265bc)

* [MNIST Website](http://yann.lecun.com/exdb/mnist/)

* [Handling the MNIST Dataset](https://github.com/datapythonista/mnist)

* [Writing images using OpenCV](https://docs.opencv.org/2.4/doc/tutorials/introduction/load_save_image/load_save_image.html)

* [Classifying the MNIST Dataset using a DNN](https://codeburst.io/use-tensorflow-dnnclassifier-estimator-to-classify-mnist-dataset-a7222bf9f940)

* [What is the difference between a neural network and a deep neural network, and why do the deep ones work better?](https://stats.stackexchange.com/questions/182734/what-is-the-difference-between-a-neural-network-and-a-deep-neural-network-and-w)

* [TensorFlow Optimizer Documentation](https://www.tensorflow.org/api_docs/python/tf/train/Optimizer)

* [How to pick an optimizer and learning rate for a model](https://medium.com/octavian-ai/which-optimizer-and-learning-rate-should-i-use-for-deep-learning-5acb418f9b2)