# Logistic Regression 

For this example I am using the TensorFlow library and the MNIST database of handwritten digits.

The logistic regression algorithm finds a linear function that predicts the class $y$ given an input vector $x$:

$$
\hat{y}=\sum_{j=1}^{n_x} w_jx_j+b=w^Tx+b
$$
where $x$ is the explanatory variable (the input) and $y$ is the dependent variable (the class label). The vector $w$ holds the weights (the slope of the line in the one-dimensional case), and $b$ is the intercept. Typically, the output should be a probability ($0<\hat{y}<1$), and therefore the sigmoid function $\sigma$ (or a similar function) is applied to the predicted output.

$$\hat{y}=P(y=1|x) \longrightarrow \hat{y}=\sigma(w^Tx+b)$$
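
To make the mapping from a linear score to a probability concrete, here is a minimal sketch in plain Python. The helper names `sigmoid` and `predict_proba`, as well as the weights, bias, and input, are hypothetical and chosen only for illustration; they are not part of the TensorFlow code used later.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    # Branch on the sign of z so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x, b):
    """Return sigma(w^T x + b) for a weight list w, feature list x, and bias b."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical values, for illustration only.
w = [0.5, -0.25]
b = 0.1
x = [2.0, 4.0]
p = predict_proba(w, x, b)  # linear score: 1.0 - 1.0 + 0.1 = 0.1
```

A score of zero maps to a probability of exactly 0.5, and larger scores move the output toward 1.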

The final goal here is to find the optimal weights $w$ that best approximate the output $y$ by minimizing the prediction error. A simple loss (error) function is one half of the squared error, which training tries to drive toward zero:
$$
L(\hat{y},y)=\dfrac{1}{2}(\hat{y}-y)^2
$$

---

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

### Load data
First, download the MNIST dataset from [here](http://yann.lecun.com/exdb/mnist/). TensorFlow allows us to easily load the MNIST data. If you are using different data, you might need to write your own pipeline.
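
If you do need your own pipeline, the MNIST files use the IDX format: a big-endian header (magic number 2051 for image files, 2049 for label files) followed by one unsigned byte per pixel or label. The following is a rough sketch of a reader, not the loader TensorFlow uses; the function names are hypothetical, and scaling pixels to `[0, 1]` is an assumption made here for convenience.

```python
import gzip
import struct

def parse_idx_images(raw):
    """Parse the bytes of an uncompressed IDX image file into flat pixel lists."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    size = rows * cols
    pixels = bytearray(raw[16:16 + count * size])
    # Scale each byte from [0, 255] to a float in [0, 1] (an assumption of this sketch).
    return [[p / 255.0 for p in pixels[i * size:(i + 1) * size]]
            for i in range(count)]

def parse_idx_labels(raw):
    """Parse the bytes of an uncompressed IDX label file into a list of integers."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file")
    return list(bytearray(raw[8:8 + count]))

def load_gz(path):
    """Read and decompress a .gz file into bytes."""
    with gzip.open(path, "rb") as f:
        return f.read()
```

With the files in place, a call such as `parse_idx_images(load_gz("../data/MNIST/train-images-idx3-ubyte.gz"))` would return one flat list of 784 floats per image.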

---

In [2]:
# With one_hot=True, each class number is converted from a single integer to a vector whose length
# equals the number of possible classes. For zero, the one-hot vector is [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]
mnist = input_data.read_data_sets("../data/MNIST/", one_hot=True)
train_x = mnist.train.images
test_x = mnist.test.images
print(test_x.shape[1])
fv_size = train_x.shape[1]  # size of the feature vector (28 * 28 = 784 pixels)
num_cls = 10  # number of classes (digits 0-9)

Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
Instructions for updating:
Please write your own downloading logic.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../data/MNIST/train-images-idx3-ubyte.gz
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ../data/MNIST/train-labels-idx1-ubyte.gz
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting ../data/MNIST/t10k-images-idx3-ubyte.gz
Extracting ../data/MNIST/t10k-labels-idx1-ubyte.gz
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
784
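
The one-hot encoding mentioned in the comments of the loading cell can be sketched in a few lines of plain Python. The helper names `one_hot` and `from_one_hot` are hypothetical and only illustrate the idea; TensorFlow performs this conversion internally when `one_hot=True`.

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at position `label`."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def from_one_hot(vec):
    """Recover the integer class as the index of the largest entry."""
    return max(range(len(vec)), key=lambda k: vec[k])

zero = one_hot(0)            # 1.0 in the first position, 0.0 elsewhere
seven = from_one_hot(one_hot(7))
```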


### Build TensorFlow Graph

In [3]:
# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
# Graph inputs: flattened images, one-hot labels, and integer class labels
x     = tf.placeholder(tf.float32, [None, fv_size])
y     = tf.placeholder(tf.float32, [None, num_cls])
y_cls = tf.placeholder(tf.int64, [None])
# Model parameters, initialized to zero
w     = tf.Variable(tf.zeros([fv_size, num_cls]))  # weights
b     = tf.Variable(tf.zeros([num_cls]))  # biases

### Construct model 
Here I use softmax regression, a generalization of logistic regression to multi-class classification.
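
As a rough illustration of what softmax computes, the sketch below implements it in plain Python for a single vector of class scores. The function name and the example scores are hypothetical; subtracting the maximum score is a common numerical-stability trick and an assumption of this sketch, not something taken from the TensorFlow code.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    m = max(scores)  # shifting by the maximum avoids overflow in math.exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])

# With two classes and the second score fixed at 0, softmax reduces to the sigmoid.
two_class = softmax([0.3, 0.0])[0]
```

The last line shows why softmax regression is called a generalization: in the two-class case the first probability equals $\sigma(z)$ for the score $z$.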

---


In [4]:
y_hat = tf.nn.softmax(tf.matmul(x, w) + b)

### Minimize error
With a sigmoid output, the half squared error makes the optimization problem non-convex, so gradient descent may get stuck in one of multiple local optima. To avoid this, the loss function can be written as the cross-entropy,
$$
L(\hat{y},y)=-\big(y\log \hat{y}+(1-y)\log(1-\hat{y}) \big)
$$
The above equation measures the error for a single training example. The overall performance over $m$ training examples is given by the cost function $J$:
$$
J(w,b)=\dfrac{1}{m} \sum_{i=1}^{m} L(\hat{y}^i,y^i)
$$
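
The loss above is written for two classes. With $K$ classes and a one-hot label, it generalizes to $L(\hat{y},y)=-\sum_{k=1}^{K} y_k\log \hat{y}_k$, which is the form the next code cell computes. The sketch below evaluates this loss and the cost $J$ in plain Python; the function names, the example predictions, and the small `eps` clamp that guards against `log(0)` are assumptions made for illustration.

```python
import math

def cross_entropy(y_hat, y, eps=1e-12):
    """Loss for one example: -sum_k y_k * log(y_hat_k), with y one-hot."""
    return -sum(y_k * math.log(max(p_k, eps)) for p_k, y_k in zip(y_hat, y))

def cost(batch_y_hat, batch_y):
    """Cost J: the mean of the per-example losses over the batch."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_y_hat, batch_y)]
    return sum(losses) / len(losses)

# Hypothetical predictions and one-hot labels for two examples and three classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
J_value = cost(preds, labels)  # equals -(log 0.7 + log 0.8) / 2
```

Only the probability assigned to the true class contributes to each example's loss, because every other term is multiplied by zero.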

---

In [5]:
# Cross-entropy per example (sum over the class axis), averaged over the batch
J = tf.reduce_mean(-tf.reduce_sum(y*tf.log(y_hat), axis=1))

### Define optimizer

In [6]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(J)
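
Conceptually, gradient descent repeatedly moves each parameter a small step against the gradient of the cost: $w \leftarrow w - \alpha\,\partial J/\partial w$. The sketch below shows this for a one-feature binary logistic regression in plain Python, using the cross-entropy cost and its gradient $(\hat{y}-y)x$. The toy data and all names are hypothetical, and this is only an illustration of the update rule, not of how TensorFlow implements its optimizer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grads(w, b, xs, ys):
    """Return the cross-entropy cost J and its gradients with respect to w and b."""
    m = len(xs)
    J = dw = db = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(w * x + b)
        J += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)) / m
        dw += (y_hat - y) * x / m
        db += (y_hat - y) / m
    return J, dw, db

# Hypothetical, linearly separable toy data.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 1.0]

w, b, learning_rate = 0.0, 0.0, 0.01
costs = []
for step in range(100):
    J, dw, db = cost_and_grads(w, b, xs, ys)
    costs.append(J)
    # Gradient descent update: step against the gradient.
    w -= learning_rate * dw
    b -= learning_rate * db
```

Starting from zero parameters, the cost begins at $\log 2$ and decreases as `w` grows in the direction that separates the two classes.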

### Run TensorFlow

In [7]:
init = tf.global_variables_initializer()

In [8]:
sess = tf.InteractiveSession()
sess.run(init)
for epch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(mnist.train.num_examples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run the optimization op (backprop) and the cost op (to get the loss value)
        _, c = sess.run([optimizer, J], feed_dict={x: batch_x, y: batch_y})
        # Accumulate the average loss over the epoch
        avg_cost += c / total_batch
    # A modulus of 1 prints every epoch; raise it to print less often
    if (epch+1) % 1 == 0:
        print("Epoch:", '%04d' % (epch+1), "cost=", "{:.9f}".format(avg_cost))

print("Optimization Finished!")

('Epoch:', '0001', 'cost=', '1.183184105')
('Epoch:', '0002', 'cost=', '0.665269974')
('Epoch:', '0003', 'cost=', '0.552796931')
('Epoch:', '0004', 'cost=', '0.498691439')
('Epoch:', '0005', 'cost=', '0.465480853')
('Epoch:', '0006', 'cost=', '0.442601104')
('Epoch:', '0007', 'cost=', '0.425521235')
('Epoch:', '0008', 'cost=', '0.412171783')
('Epoch:', '0009', 'cost=', '0.401396890')
('Epoch:', '0010', 'cost=', '0.392414389')
('Epoch:', '0011', 'cost=', '0.384699149')
('Epoch:', '0012', 'cost=', '0.378190482')
('Epoch:', '0013', 'cost=', '0.372417551')
('Epoch:', '0014', 'cost=', '0.367313159')
('Epoch:', '0015', 'cost=', '0.362695270')
('Epoch:', '0016', 'cost=', '0.358600007')
('Epoch:', '0017', 'cost=', '0.354862887')
('Epoch:', '0018', 'cost=', '0.351458699')
('Epoch:', '0019', 'cost=', '0.348322929')
('Epoch:', '0020', 'cost=', '0.345416409')
('Epoch:', '0021', 'cost=', '0.342750701')
('Epoch:', '0022', 'cost=', '0.340278062')
('Epoch:', '0023', 'cost=', '0.337937795')
('Epoch:', 

### Performance measures

In [9]:
correct_prediction = tf.equal(tf.argmax(y_hat, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

('Accuracy:', 0.9144)
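
The accuracy computation above compares the index of the largest predicted probability with the index of the 1 in the one-hot label, then averages the matches. A plain-Python sketch of the same idea follows; the helper names and the four example predictions are hypothetical.

```python
def argmax(values):
    """Return the index of the largest entry."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(batch_y_hat, batch_y):
    """Fraction of examples whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(batch_y_hat, batch_y)]
    return sum(correct) / float(len(correct))

# Hypothetical predictions and labels: the third example is misclassified.
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1], [0.2, 0.2, 0.6]]
labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
acc = accuracy(preds, labels)  # 3 of 4 correct
```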
