# Day 2: January 31st
<u>Introduction to TensorFlow</u>

What is the MNIST data set?

There are several MNIST-style data sets, but this project uses the original MNIST data set of handwritten digits. It contains a training set of 60,000 handwritten digit images and a test set of 10,000. Recognizing these digits is a basic image-recognition task and is often called the 'Hello World' of machine learning.

__Fig 1. Examples of handwritten digits from the MNIST data set__

![MNIST data set](http://neuralnetworksanddeeplearning.com/images/mnist_100_digits.png)

Linear regression is one of the simplest forms of supervised learning. In supervised learning, both the inputs and the expected outputs of the training set are known, and the model is fitted so that it maps those inputs to the outputs. The equation for simple linear regression and a corresponding graph are shown below.

$Y_i = \beta_0 + \beta_1X_i + \epsilon_i$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is the error term for observation $i$.

![Linear regression graph](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/1200px-Linear_regression.svg.png)
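As a small illustration of the equation above (not part of the MNIST code), the least-squares estimates have a closed form: $\hat\beta_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. The function name `fit_simple_linear_regression` and the sample data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared errors for Y = beta0 + beta1*X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Hypothetical data lying exactly on Y = 2 + 3X, so every epsilon_i is 0
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)  # → 2.0 3.0
```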

In this problem, the outputs of a linear model are passed through a softmax function, which turns them into class probabilities and generalizes the sigmoid (logistic) function from two classes to many. The sigmoid function is

$f(x) = \frac{1}{1+e^{-x}}$
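For $C$ classes with scores $z_1, \dots, z_C$, softmax is $\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$. The pure-Python sketch below (helper names are illustrative, not TensorFlow's) computes it and checks the "generalization" claim: with two classes and scores $(x, 0)$, the first softmax output equals the sigmoid of $x$.

```python
import math

def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow in exp() and does not change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # three probabilities summing to 1

# With two classes and scores (x, 0), softmax reduces to the sigmoid of x
x = 1.5
print(softmax([x, 0.0])[0], sigmoid(x))
```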

TensorFlow provides softmax as a built-in function, `tf.nn.softmax`, so it doesn't need to be implemented by hand:

```python
# Softmax over the linear model x·W + b gives a probability for each of the 10 digit classes
y = tf.nn.softmax(tf.matmul(x, W) + b)
```
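To see what `tf.matmul(x, W) + b` followed by softmax computes for a single image, here is a plain-Python sketch with hypothetical toy sizes (3 "pixels" and 2 classes instead of MNIST's 784 pixels and 10 classes) and made-up weights. Each class score is a weighted sum of the inputs plus a bias, and softmax turns the scores into probabilities.

```python
import math

def linear(x, W, b):
    """Compute x·W + b for one input row x (length n), W (n x k) and b (length k)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def softmax(scores):
    """Convert scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy input, weights and biases
x = [0.0, 0.5, 1.0]
W = [[0.1, -0.2],
     [0.4, 0.3],
     [-0.5, 0.6]]
b = [0.05, -0.05]

logits = linear(x, W, b)  # unnormalized scores, one per class
y = softmax(logits)       # class probabilities
print(logits, y)
```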

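Cross-entropy loss measures how far the predicted probability distribution is from the true label of the image. It is never negative: it equals 0 when the model assigns probability 1 to the correct class and grows without bound as that probability approaches 0. It is defined as

$$CE = -\sum_{i=1}^{C} t_i \log\big(f(s)_i\big)$$

where $C$ is the number of classes (10 for MNIST digits), $t_i$ is 1 for the correct class and 0 otherwise, $s$ is the vector of scores (logits), and $f$ is the softmax function. In the two-class case ($C = 2$), where $t_2 = 1 - t_1$ and $f(s_2) = 1 - f(s_1)$, this reduces to

$$CE = -\sum_{i=1}^{2} t_i \log\big(f(s_i)\big) = -t_1 \log\big(f(s_1)\big) - (1-t_1)\log\big(1-f(s_1)\big)$$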
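A minimal plain-Python sketch of both formulas, using hypothetical probability vectors. It shows that a confident correct prediction has a small loss, a wrong one a larger loss, and that the general formula with $C = 2$ matches the binary form.

```python
import math

def cross_entropy(t, p):
    """CE = -sum_i t_i * log(p_i), where t is a one-hot label and p a probability vector."""
    # Terms with t_i = 0 contribute nothing, so skip them (this also avoids log(0))
    return -sum(ti * math.log(pi) for ti, pi in zip(t, p) if ti > 0)

def binary_cross_entropy(t1, p1):
    """Two-class form: -t1*log(p1) - (1 - t1)*log(1 - p1)."""
    return -t1 * math.log(p1) - (1 - t1) * math.log(1 - p1)

# Hypothetical 3-class example where the true class is index 1
label = [0, 1, 0]
good = cross_entropy(label, [0.1, 0.8, 0.1])  # confident and correct: small loss
bad = cross_entropy(label, [0.6, 0.2, 0.2])   # mostly wrong: larger loss
print(good, bad)

# With C = 2, the general formula matches the binary form
p1 = 0.7
print(cross_entropy([1, 0], [p1, 1 - p1]), binary_cross_entropy(1, p1))
```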

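In the code, TensorFlow's `tf.nn.softmax_cross_entropy_with_logits` computes this loss for each training example, and `tf.reduce_mean` averages it over the batch:

```python
# The op applies softmax internally, so it takes the raw logits (x·W + b), not the softmax output y
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=tf.matmul(x, W) + b))
```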

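We train the model with stochastic gradient descent (SGD) until it reaches a high accuracy. Gradient descent starts from an initial point on the loss function (here, the initial values of the weights and biases) and repeatedly takes a step in the direction opposite to the gradient, moving downhill until the slope (the gradient, i.e. the tangent to the function) is close to 0. It is called *stochastic* because each step estimates the gradient from a small random batch of training examples instead of the whole training set.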
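To make the idea concrete without TensorFlow, the sketch below applies mini-batch SGD to the simple linear regression model from earlier rather than to the MNIST model. The function name, the zero starting point, the learning rate, the batch size and the noise-free data are all hypothetical choices for illustration.

```python
import random

def sgd_linear_regression(xs, ys, learning_rate=0.5, batch_size=10, epochs=200, seed=0):
    """Fit Y = beta0 + beta1*X by mini-batch SGD on half the mean squared error."""
    rng = random.Random(seed)
    beta0, beta1 = 0.0, 0.0  # starting point (zeros here; a random start also works)
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Prediction errors on this mini-batch only
            errors = [(beta0 + beta1 * xs[i]) - ys[i] for i in batch]
            # Gradient of 0.5 * mean(error^2) with respect to beta0 and beta1
            grad0 = sum(errors) / len(batch)
            grad1 = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            # Step downhill, against the gradient
            beta0 -= learning_rate * grad0
            beta1 -= learning_rate * grad1
    return beta0, beta1

# Hypothetical noise-free data on the line Y = 2 + 3X
xs = [i / 100 for i in range(100)]
ys = [2.0 + 3.0 * x for x in xs]
beta0, beta1 = sgd_linear_regression(xs, ys)
print(beta0, beta1)  # should end up close to 2.0 and 3.0
```

Because every mini-batch gradient is zero at the exact line, the noisy steps still settle on it here; with noisy data, SGD would instead hover near the least-squares solution unless the learning rate is reduced.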

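The learning rate, number of training steps and batch size are set as follows, and the optimizer is told to minimize the cross-entropy:

```python
# Hyperparameters
learning_rate = 0.5      # step size for each gradient-descent update
number_of_steps = 10000  # number of mini-batch updates to run
batch_size = 100         # training examples per mini-batch
# GradientDescentOptimizer performs plain gradient descent; it becomes stochastic
# because each step is fed a random mini-batch rather than the whole training set
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cross_entropy)
```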

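We then train the model by repeatedly running the optimizer on mini-batches drawn from the training examples:

```python
# Assumes an active default session (e.g. tf.InteractiveSession) with the variables initialized
for _ in range(number_of_steps):
    batch = mnist.train.next_batch(batch_size)  # (images, one-hot labels)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
```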
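`mnist.train.next_batch` comes from the MNIST helper used in this notebook, and its internals are not shown here. As a rough stand-in (the class `MiniBatcher` is hypothetical and is not that helper's code), the idea of serving successive shuffled mini-batches and reshuffling after each pass over the data can be sketched as:

```python
import random

class MiniBatcher:
    """Hypothetical helper that serves (inputs, labels) mini-batches, reshuffling each epoch."""

    def __init__(self, inputs, labels, seed=0):
        if len(inputs) != len(labels):
            raise ValueError("inputs and labels must have the same length")
        self.inputs = inputs
        self.labels = labels
        self.rng = random.Random(seed)
        self.order = list(range(len(inputs)))
        self.rng.shuffle(self.order)
        self.pos = 0

    def next_batch(self, batch_size):
        """Return the next batch_size examples as (inputs, labels)."""
        if batch_size > len(self.order):
            raise ValueError("batch_size is larger than the data set")
        if self.pos + batch_size > len(self.order):
            # Epoch finished: reshuffle and start over (leftover examples are skipped in this sketch)
            self.rng.shuffle(self.order)
            self.pos = 0
        idx = self.order[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return [self.inputs[i] for i in idx], [self.labels[i] for i in idx]

# Toy data: 100 "images" (just integers) with labels 0-9
data = list(range(100))
labels = [d % 10 for d in data]
batcher = MiniBatcher(data, labels)
seen = []
for _ in range(10):
    xb, yb = batcher.next_batch(10)
    seen.extend(xb)
print(sorted(seen) == data)  # one full epoch visits every example exactly once
```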

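Finally, we evaluate the model's accuracy on the test set:

```python
# Evaluate the model: a prediction is correct when the class with the highest
# predicted probability matches the labelled class
correct_pred = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Print out the accuracy on the 10,000 test images as a percentage
acc_eval = accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels})
print(f"Current accuracy: {acc_eval * 100:.2f}%")
```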