## Softmax Regression (Multinomial Logistic Regression)

By convention, the first program you write in a new programming language prints "Hello World".
Just as programming has Hello World, machine learning has MNIST.

### What is MNIST?
MNIST is a standard dataset used for computer vision. It consists of images of handwritten digits like the following:

![Figure 1. Sample images from MNIST dataset (from [TensorFlow](https://www.tensorflow.org/get_started/mnist/beginners)).](figures/MNIST.png)

Each image comes with a label indicating which digit it shows. For instance, the labels for the above images are 5, 0, 4, and 1.

In this tutorial, we are going to implement a simple model and train it to look at images and predict their labels. Note, however, that the model implemented here will not achieve state-of-the-art performance; we'll get to that in the next tutorial. For now, we start with a quite simple model called ***Softmax Regression***.

We shall accomplish the following in this tutorial:
* Learn about the MNIST data and softmax regression
* Implement a model for recognizing MNIST digits, based on looking at every pixel of each image
* Use TensorFlow to train the model to recognize the handwritten digits by having it "look" at thousands of examples
* Check the model's accuracy with the test data

### The MNIST Data
The MNIST data is hosted on [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/). We shall be loading the dataset using the following two lines of code:

```python
# Load MNIST from the given directory, downloading the files first if they are missing.
# one_hot=True returns each label as a one-hot vector instead of a single integer.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/home/darth/MNIST_data', one_hot=True)
```

```
Extracting /home/darth/MNIST_data/train-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-labels-idx1-ubyte.gz
```


The MNIST data is split into two parts, a training set and a test set, each stored as one file of images and one file of labels:

|Filename|Category|File Size|
|--------|--------|---------|
|[train-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz)|training set images|9912422 bytes|
|[train-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz)|training set labels|28881 bytes|
|[t10k-images-idx3-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz)|test set images|1648877 bytes|
|[t10k-labels-idx1-ubyte.gz](http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz)|test set labels|4542 bytes|

Splitting the data is essential in machine learning: keeping a separate test set that the model never sees during training lets us determine whether the model actually generalizes, rather than merely memorizing the training data.

Every MNIST data point has two parts: (1) an image of a handwritten digit, which we'll call `x`, and (2) a corresponding label, which we'll call `y`. Both the training set and the test set contain images and their corresponding labels, e.g. `x = mnist.train.images` and `y = mnist.train.labels`.

Each image is 28 pixels by 28 pixels, and can be interpreted as a big array of numbers:

![](figures/MNIST-Matrix.png)

This array can then be flattened into a vector of 28 x 28 = 784 numbers. From this perspective, the MNIST images are just points in a 784-dimensional vector space.
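As a minimal sketch of this flattening, assuming NumPy and a synthetic 28x28 array standing in for a real MNIST image (the names `image` and `flat` are illustrative and not part of the tutorial's code):

```python
import numpy as np

# A synthetic 28x28 "image" whose entries are 0..783, so each pixel
# records the position it should end up at after flattening.
image = np.arange(28 * 28).reshape(28, 28)

# Flatten row by row (NumPy's default C order) into a 784-dimensional vector.
flat = image.reshape(-1)

print(flat.shape)  # (784,)

# The pixel at row r, column c lands at index r * 28 + c.
r, c = 3, 5
print(flat[r * 28 + c] == image[r, c])  # True
```

The particular flattening order does not matter, as long as the same order is applied to every image.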

Hence, with every image flattened, `mnist.train.images` is a tensor (an n-dimensional array) with a shape of `[55000, 784]`. The first dimension indexes the images in the dataset, while the second dimension indexes the pixels in each image. Each entry in the tensor is a pixel intensity between 0 and 1, for a particular pixel in a particular image.

![](figures/mnist-train-xs.png)
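The shape and value range described above can be reproduced on stand-in data. The sketch below assumes NumPy and uses randomly generated 8-bit images rather than the real dataset. `raw_images` and `xs` are hypothetical names, and dividing by 255 is an assumption about how raw byte intensities are mapped to the 0-to-1 range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a stack of raw images: n images of 28x28 bytes (values 0..255).
n = 100
raw_images = rng.integers(0, 256, size=(n, 28, 28), dtype=np.uint8)

# Flatten each image to 784 values and rescale intensities to [0, 1].
xs = raw_images.reshape(n, 28 * 28).astype(np.float32) / 255.0

print(xs.shape)  # (100, 784)

# xs[i, j] is the intensity of pixel j in image i.
print(0.0 <= xs.min() and xs.max() <= 1.0)  # True
```

With the real training set, the first dimension would be 55000 instead of 100.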

As mentioned before, each image in MNIST has a corresponding label, a number between 0 and 9, representing the digit written in the image.

For this tutorial, the labels will be _one-hot vectors_, i.e. a vector with 1 in a single dimension (index of the label for the image) and 0 in the rest. For instance, a label 3 would be `[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]`. Consequently, `mnist.train.labels` is a `[55000, 10]` array of floats.

![](figures/mnist-train-ys.png)
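To make the one-hot encoding concrete, here is a small sketch assuming NumPy. The helper `one_hot` is a hypothetical function written for illustration. It is not the routine the MNIST loader uses internally.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a [len(labels), num_classes] float array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

ys = one_hot([5, 0, 4, 1, 3])
print(ys.shape)  # (5, 10)
print(ys[4])     # one-hot vector for the label 3

# np.argmax recovers the original digit from each one-hot row.
print(np.argmax(ys, axis=1))  # [5 0 4 1 3]
```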

Now that you are familiar with the MNIST dataset, we can write the model.

### Softmax Regression (Multinomial Logistic Regression)

What is **regression**? It is a technique for estimating the relationships among variables. In other words, it determines the mapping from input to output: ![](figures/mapping.png)

There are many regression techniques available, but for this tutorial, we are going to focus on _multinomial logistic regression_, more commonly known as _softmax regression_.

Every image in MNIST is of a handwritten digit from 0 to 9, so there are only 10 possible classes a given image can belong to. With softmax regression, we can look at an image and assign a probability to each digit. For instance, the model might look at a picture of a 9 and be 90% sure it is a 9, give a 5% chance to it being an 8, and spread the remaining probability among the other digits because it isn't 100% sure.

Hence, whenever the problem is to assign probabilities to an object being one of several different things (classes), softmax is the natural choice, because it gives a list of values between 0 and 1 that add up to 1 (a valid probability distribution). Even in more sophisticated models, the most common final step is a softmax layer.
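For reference, the standard definition of the softmax function is given below for a vector $z$ of $K$ real-valued scores. The symbols $z$ and $K$ are chosen here for illustration; for MNIST, $K = 10$.

$$
\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}, \qquad i = 1, \dots, K
$$

Each output is positive because the exponential is positive, and the outputs sum to 1 because every term is divided by the same total.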

A softmax regression has two steps:
* Add up the evidence of the input being in a certain class
* Convert the evidence to probabilities
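The two steps above can be sketched in NumPy. This is an illustration, not the tutorial's TensorFlow model: it assumes the common formulation in which the evidence for each class is a weighted sum of pixel intensities plus a per-class bias. The parameters `W` and `b` are hypothetical, untrained values, and the input `x` is random stand-in data rather than real MNIST images.

```python
import numpy as np

def softmax(evidence):
    """Convert evidence scores to probabilities along the last axis."""
    # Subtracting the row maximum leaves the result unchanged
    # but avoids overflow in exp.
    shifted = evidence - np.max(evidence, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical, untrained parameters: one weight per (pixel, class) pair
# and one bias per class.
W = rng.normal(scale=0.01, size=(784, 10))
b = np.zeros(10)

# A stand-in batch of 3 flattened images with intensities in [0, 1).
x = rng.random((3, 784))

# Step 1: add up the evidence for each class.
evidence = x @ W + b

# Step 2: convert the evidence to probabilities.
probs = softmax(evidence)

print(probs.shape)           # (3, 10)
print(probs.sum(axis=1))     # each row sums to 1, up to rounding
print(probs.argmax(axis=1))  # most probable digit for each image
```

Because the parameters are untrained, the predicted digits are meaningless here; training would adjust `W` and `b` so that the highest probability falls on the correct label.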