![alt text](https://miro.medium.com/max/1250/1*Z-UWDCFfm9KQ_Xu9OoQadw.png)

Logistic regression uses probabilities to distinguish inputs and thereby puts them into separate bags of output classes.

Consider a case where you want to sketch a relation between your basketball shot’s accuracy and the distance you shoot from. On the whole, it’s about predicting whether you make the basket or not. 

Let’s suppose you’re going to predict the answer using linear regression. The relation between the win (y) and distance (x) is given by a linear equation,
 
 y = mx + c. 
 
As a prerequisite, you play for a month and jot down the x and y values for every shot. Fitting the line to these recorded values, that is, finding the m and c that best match them, is the training phase.

Later, you want to estimate the chance of making the shot from a specific distance. You note the distance x and pass it to the trained equation described above. Its parameters are now fixed, i.e.

 y = (trained_m)x + (trained_c). 
 
The equation returns a value for `y (win)`. You could discretize y to predict either win or lose, but this isn’t a great technique. Although it technically works, it isn’t a sound approach because y isn’t a probability: a straight line can return values below 0 or above 1.

What about using an activation function in the final stage to squash the output into the range between 0 and 1, so that it can be read as the probability of the win class?

This indeed seems like a fix to our problem because it takes the concept of probability into consideration. This technique is what’s meant by logistic regression. Yet this isn’t the whole story, so let’s get a detailed overview of the fix.
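
To make the contrast concrete, here is a minimal sketch in plain Python. The slope and intercept are made-up values chosen for illustration, not fitted to real shots, and `sigmoid` is the standard logistic function. In actual logistic regression the parameters are fitted with the sigmoid already in place; the sketch only shows the squashing effect.

```python
import math

# Hypothetical "trained" parameters, chosen only for illustration.
trained_m = -0.15   # the output drops as the distance grows
trained_c = 1.2

def linear_output(x):
    """Raw linear output; nothing keeps it inside [0, 1]."""
    return trained_m * x + trained_c

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for distance in (1, 5, 10):
    y = linear_output(distance)
    p = sigmoid(y)
    print(f"distance={distance:>2}  linear y={y:+.2f}  sigmoid(y)={p:.3f}")
```

With these values the linear output is above 1 at distance 1 and below 0 at distance 10, while the sigmoid output always stays strictly between 0 and 1.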

**TYPES OF LOGISTIC REGRESSION**

Logistic regression can be one of three types based on the output values:


1.   Binary Logistic Regression, in which the target variable has only two possible values (e.g., pass/fail or win/lose).
2.   Multinomial Logistic Regression, in which the target variable has three or more possible values that are not ordered (e.g., sweet/sour/bitter or cat/dog/fox).
3.   Ordinal Logistic Regression, in which the outputs are ordered in some way (e.g., bad/good/better/best or low/medium/high).


# **Building Logistic Regression Using TensorFlow 2.0.**
STEP 1: IMPORTING NECESSARY MODULES


In [1]:
from __future__ import absolute_import, division, print_function
import tensorflow as tf
import numpy as np

STEP 2: LOADING THE MNIST DATA SET & SETTING UP HYPERPARAMETERS AND DATA SET PARAMETERS

For the logistic regression model that we’re building, we will be using the MNIST data set. MNIST data is a collection of hand-written digits that contains 60,000 examples for training and 10,000 examples for testing. The digits have been size-normalized and centered in a fixed-size image (28x28 pixels) with values from 0 to 255.

To import the MNIST data set to our program, we use `tensorflow.keras.datasets`. Next, we load the training data set and testing data set in the variables (x_train,y_train) and (x_test,y_test) using the `mnist.load_data()` function.

Then we initialize the model parameters. `num_classes` denotes the number of outputs, which is 10 as we have digits from 0 to 9 in the data set. `num_features` defines the number of input features, which is 784 since each image contains 784 pixels.

`learning_rate` defines the step size the model takes at each update as it moves toward a minimum of the loss. `training_steps` defines the number of update steps performed during training, and `batch_size` denotes the number of samples in each batch.

`display_step` is the interval, in training steps, at which we print the loss and accuracy during training.

In [2]:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
# MNIST dataset parameters
num_classes = 10                                     # 0 to 9 digits
num_features = 784                                   # 28*28

In [4]:
# Training parameters
learning_rate = 0.01
training_steps = 1000
batch_size = 256
display_step = 50
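
As a quick sanity check on these settings, the sketch below works out how much of the training set 1,000 steps actually cover. The values are copied from the parameters above, and 60,000 is the MNIST training-set size mentioned earlier; the variable names other than `batch_size` and `training_steps` are introduced here for illustration.

```python
# Values copied from the parameters above.
num_train_examples = 60_000   # size of the MNIST training set
batch_size = 256
training_steps = 1000

steps_per_epoch = num_train_examples / batch_size
examples_seen = training_steps * batch_size
epochs_covered = examples_seen / num_train_examples

print(f"steps per epoch: {steps_per_epoch:.1f}")
print(f"examples seen:   {examples_seen}")
print(f"epochs covered:  {epochs_covered:.2f}")
```

So with this configuration the model sees each training image roughly four times.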

STEP 3: PREPARING THE MNIST DATA SET

In the code below, each image is converted to float32, flattened to a 1-D array of 784 features (28*28) using the reshape method, and normalized so that its pixel intensities lie between 0 and 1, which we do by dividing them by 255.

In [5]:
# Convert to float32
x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)

In [6]:
# Flatten images to 1-D vector of 784 features (28*28)
x_train, x_test = x_train.reshape([-1, num_features]), x_test.reshape([-1, num_features])

In [7]:
# Normalize images value from [0, 255] to [0, 1].
x_train, x_test = x_train / 255., x_test / 255.
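
To see what flattening and normalizing do to a single image, here is a tiny sketch on a made-up 2x2 "image". It uses plain Python lists instead of NumPy arrays so that each step is visible; the pixel values are invented for illustration.

```python
# A made-up 2x2 "image" with raw pixel intensities in [0, 255].
image = [
    [0, 128],
    [255, 64],
]

# Flatten row by row into a 1-D list, as reshape does for each 28x28 image.
flat = [pixel for row in image for pixel in row]

# Scale every intensity into [0, 1] by dividing by the maximum value, 255.
normalized = [pixel / 255.0 for pixel in flat]

print(flat)
print(normalized)
```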

STEP 4: SHUFFLING AND BATCHING THE DATA

We need to shuffle and batch the data before we start the actual training so that the order of the examples does not bias the model. Randomizing the order generally helps the model reach higher accuracy on the test data.

With the help of `tf.data.Dataset.from_tensor_slices`, we create a data set whose elements are slices of the given arrays, here one (image, label) pair per element. The function `shuffle(5000)` randomizes the order of the data set’s examples.

Here, 5000 is the `buffer_size` argument: the data set fills a buffer with the first 5000 samples and picks one of them at random. After that, there are only 4999 samples left in the buffer, so sample 5001 gets added to the buffer, and the process repeats. The `prefetch` method makes the input pipeline more efficient by letting input processing run in parallel with downstream GPU operations.

In [8]:
# Use tf.data API to shuffle and batch data.
train_data = tf.data.Dataset.from_tensor_slices((x_train,y_train))
train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)
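
The buffer mechanism described above can be imitated in a few lines of plain Python. This is only a simulation of the idea under simple assumptions, not how `tf.data` is implemented; `buffered_shuffle` and its `seed` parameter are hypothetical names introduced for this sketch.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # refill the freed slot with the new item
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3))
print(shuffled)
```

Because an element can only be emitted once it has entered the buffer, a small buffer shuffles only locally. A buffer of size 1 returns the data in its original order, and a buffer at least as large as the data set gives a full shuffle.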

STEP 5: INITIALIZING WEIGHTS AND BIASES

We now initialize the weight matrix with ones and the bias vector with zeros, using `tf.ones` and `tf.zeros`, respectively. We use `tf.Variable` to define them as we will be changing the values of the weights and biases during the course of training.



In [9]:
# Weight of shape [784, 10], the 28*28 image features, and a total number of classes.
W = tf.Variable(tf.ones([num_features, num_classes]), name="weight")


In [10]:
# Bias of shape [10], the total number of classes.
b = tf.Variable(tf.zeros([num_classes]), name="bias")

STEP 6: DEFINING LOGISTIC REGRESSION AND COST FUNCTION

We define the `logistic_regression` function below, which converts the inputs into a probability distribution proportional to the exponents of the inputs using the `softmax` function. The softmax function, which is implemented using the function `tf.nn.softmax`, also makes sure that the outputs for each example sum to one.

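In the next piece of code, we encode the labels as one-hot vectors using the function `tf.one_hot`. We also define the cross-entropy function as the loss function, which for a single example is given as 

cross-entropy loss = -Σ ytrue*(log(ypred))

where the sum runs over the classes. We compute it using `tf.reduce_mean` and `tf.reduce_sum`, which are analogous to the NumPy functions `np.mean` and `np.sum`. Note that `tf.reduce_sum` is called without an `axis` argument in the code below, so it adds up the loss over every example in the batch, and the outer `tf.reduce_mean` then receives a single number and leaves it unchanged. This is why the loss values printed during training are totals over a batch of 256 examples rather than per-example averages.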

In [11]:
# Logistic regression (Wx + b)

def logistic_regression(x):
  # Apply softmax to normalize the logits to a probability distribution
  return tf.nn.softmax(tf.matmul(x, W) + b)
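
For intuition, here is a minimal plain-Python sketch of what softmax computes for one example. It is not the TensorFlow implementation. The helper name `softmax` and the sample scores are chosen for illustration, and subtracting the maximum is a common numerical-stability trick assumed here.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs], sum(probs))

# Equal scores give a uniform distribution over the classes.
uniform = softmax([5.0] * 10)
print(uniform[0])
```

The second call mirrors the starting point of this model: with `W` initialized to ones and `b` to zeros, every class receives the same score for a given image, so the first predictions are uniform over the 10 classes.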


In [12]:
# Cross-Entropy loss function

def cross_entropy(y_pred, y_true):
  # Encode label to a one hot vector
  y_true = tf.one_hot(y_true, depth=num_classes)

  # Clip prediction values to avoid log(0) error
  y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)

  # Compute cross-entropy
  return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred)))
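
The following plain-Python sketch works through the same computation on a made-up batch of two examples with three classes. The helper names and the probabilities are invented for illustration. It reports both the batch total and the per-example mean so the two can be compared.

```python
import math

def one_hot(label, depth):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(depth)]

def cross_entropy_single(probs, label, eps=1e-9):
    """Cross-entropy of one example: -sum(y_true * log(y_pred))."""
    y_true = one_hot(label, len(probs))
    clipped = [min(max(p, eps), 1.0) for p in probs]   # avoid log(0)
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))

# Made-up predictions for a batch of two examples.
batch_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
batch_labels = [0, 2]

per_example = [cross_entropy_single(p, y) for p, y in zip(batch_probs, batch_labels)]
batch_sum = sum(per_example)
batch_mean = batch_sum / len(per_example)

print(per_example, batch_sum, batch_mean)
```

Only the probability assigned to the true class contributes to each example's loss, so a confident wrong prediction (the second example) costs far more than a fairly confident correct one (the first).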

STEP 7: DEFINING OPTIMIZERS AND ACCURACY METRICS

Now we define a function to measure how often the model predicts correctly. For each input, the model's output gives the probability that the input belongs to each class.

We take the class with the highest probability as the prediction, which we find using the function `tf.argmax`, and compare it with the true label. We also choose stochastic gradient descent from the optimizers available in TensorFlow, using the function `tf.optimizers.SGD`. This function takes the learning rate as its input, which sets the size of each update step.

In [13]:
# Accuracy metric

def accuracy(y_pred, y_true):
  # Predicted class is the index of the highest score in prediction vector (i.e. argmax)
  correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))
  return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
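
The same logic can be traced by hand on a small made-up batch. The sketch below uses plain Python, and `argmax` and `toy_accuracy` are hypothetical helpers written for this illustration.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def toy_accuracy(pred_probs, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    correct = [argmax(p) == y for p, y in zip(pred_probs, labels)]
    return sum(correct) / len(correct)

# Made-up predictions for four examples and three classes.
preds = [
    [0.1, 0.7, 0.2],   # predicts class 1
    [0.6, 0.3, 0.1],   # predicts class 0
    [0.2, 0.2, 0.6],   # predicts class 2
    [0.5, 0.4, 0.1],   # predicts class 0
]
labels = [1, 0, 2, 1]

print(toy_accuracy(preds, labels))   # 0.75: three of the four predictions match
```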


In [14]:
# Stochastic gradient descent optimizer
optimizer = tf.optimizers.SGD(learning_rate)

STEP 8: OPTIMIZATION PROCESS AND UPDATING WEIGHTS AND BIASES

Now we define the `run_optimization()` method where we update the weights of our model.

We calculate the predictions by passing the inputs to the `logistic_regression(x)` method, and then compute the loss by comparing the predicted values with the true labels from the data set. 

Next, we compute the gradients of the loss with respect to `W` and `b` using `tf.GradientTape` and update the weights of the model with our stochastic gradient descent optimizer.

In [15]:
# Optimization process 

def run_optimization(x, y):
  # Wrap computation inside a GradientTape for automatic differentiation
  with tf.GradientTape() as g:
    pred = logistic_regression(x)
    loss = cross_entropy(pred, y)

  # Compute gradients
  gradients = g.gradient(loss, [W, b])

  # Update W and b following gradients
  optimizer.apply_gradients(zip(gradients, [W, b]))
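
To show what one such update does numerically, here is a hand-written sketch of a single gradient-descent step for softmax regression on one tiny example. It is written in plain Python and is not what `tf.GradientTape` does internally. It relies on the standard result that, for softmax with cross-entropy, the gradient of the loss with respect to each class score is the predicted probability minus the one-hot label. The input, the label, the learning rate of 0.1 and the helper names are all assumptions made for illustration.

```python
import math

def softmax(logits):
    shift = max(logits)                          # for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W, b):
    """Class probabilities for one example: softmax(xW + b)."""
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    return softmax(logits)

def sgd_step(x, label, W, b, lr):
    """Update W and b in place; return the loss measured before the update."""
    probs = forward(x, W, b)
    # Gradient of the loss w.r.t. each class score: p_j - y_j.
    grad_logits = [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]
    for i in range(len(x)):
        for j in range(len(b)):
            W[i][j] -= lr * x[i] * grad_logits[j]
    for j in range(len(b)):
        b[j] -= lr * grad_logits[j]
    return -math.log(probs[label])

x = [1.0, 0.5]                        # two made-up input features
label = 2                             # true class, out of three
W = [[1.0] * 3 for _ in range(2)]     # ones, as in the tutorial
b = [0.0] * 3                         # zeros, as in the tutorial

loss_before = sgd_step(x, label, W, b, lr=0.1)
loss_after = -math.log(forward(x, W, b)[label])
print(loss_before, loss_after)
```

After the step, the bias and weights of the true class have increased and those of the other classes have decreased, so the loss on this example goes down.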

STEP 9: THE TRAINING LOOP

In [16]:
# Run training for the given number of steps

for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):
  # Run the optimization to update W and b values.
  run_optimization(batch_x, batch_y)
  
  if step % display_step == 0:
    pred = logistic_regression(batch_x)
    loss = cross_entropy(pred, batch_y)
    acc = accuracy(pred, batch_y)
    print("step: %i, loss: %f, accuracy: %f" % (step, loss, acc))

step: 50, loss: 918.834900, accuracy: 0.730469
step: 100, loss: 95.151543, accuracy: 0.906250
step: 150, loss: 117.975586, accuracy: 0.871094
step: 200, loss: 136.754883, accuracy: 0.875000
step: 250, loss: 209.071350, accuracy: 0.816406
step: 300, loss: 90.575401, accuracy: 0.929688
step: 350, loss: 159.829681, accuracy: 0.843750
step: 400, loss: 279.425171, accuracy: 0.820312
step: 450, loss: 80.711960, accuracy: 0.929688
step: 500, loss: 73.576675, accuracy: 0.910156
step: 550, loss: 69.959778, accuracy: 0.906250
step: 600, loss: 184.948959, accuracy: 0.843750
step: 650, loss: 79.176666, accuracy: 0.925781
step: 700, loss: 67.128937, accuracy: 0.953125
step: 750, loss: 35.166481, accuracy: 0.941406
step: 800, loss: 63.041100, accuracy: 0.929688
step: 850, loss: 48.798599, accuracy: 0.941406
step: 900, loss: 52.511875, accuracy: 0.953125
step: 950, loss: 37.899361, accuracy: 0.953125
step: 1000, loss: 80.842911, accuracy: 0.910156


STEP 10: TESTING MODEL ACCURACY USING THE TEST DATA

Finally, we check the model accuracy by passing the test data set through our model and computing the accuracy using the accuracy function that we defined earlier.

In [17]:
# Test model on the test set
pred = logistic_regression(x_test)
print("Test Accuracy: %f" % accuracy(pred, y_test))

Test Accuracy: 0.884600
