# Logistic Regression

Logistic Regression is a classification algorithm that, under the hood, uses a simple linear regression model and passes its output prediction (continuous values) to a sigmoid function, which maps those values to the corresponding binary class probabilities.

So, in a nutshell:

$$\text{Linear Regression} + \text{Sigmoid Function} \Rightarrow \text{Logistic Regression}$$

### **Note:**

Logistic Regression in its simplest form is a binary classification algorithm. By extension, it can be used for multiclass classification using the "One vs Rest" approach, in which a separate classifier is trained for each class.
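
The One vs Rest idea can be sketched in a few lines. This is only an illustration under assumptions: the helper names `fit_binary`, `fit_ovr` and `predict_ovr`, the hyperparameters, and the toy three-cluster dataset are all made up for this example and are not part of the implementation later in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, n_iterations=2000, learning_rate=0.1):
    """Fit one binary logistic regression with plain gradient descent."""
    n_samples, n_features = X.shape
    weights, intercept = np.zeros(n_features), 0.0
    for _ in range(n_iterations):
        error = sigmoid(X @ weights + intercept) - y
        weights -= learning_rate * (X.T @ error) / n_samples
        intercept -= learning_rate * error.mean()
    return weights, intercept

def fit_ovr(X, y):
    """Train one 'this class vs everything else' classifier per class."""
    classes = np.unique(y)
    models = [fit_binary(X, (y == c).astype(float)) for c in classes]
    return classes, models

def predict_ovr(X, classes, models):
    """Pick the class whose binary classifier is most confident."""
    scores = np.column_stack([sigmoid(X @ w + b) for w, b in models])
    return classes[np.argmax(scores, axis=1)]

# Hypothetical toy data: three well separated 2-D clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
X_toy = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
y_toy = np.repeat([0, 1, 2], 30)

classes, models = fit_ovr(X_toy, y_toy)
ovr_preds = predict_ovr(X_toy, classes, models)
ovr_accuracy = np.mean(ovr_preds == y_toy)
print("Number of binary classifiers:", len(models))
print("Training accuracy:", ovr_accuracy)
```

Each binary model only answers "is it class c or not?"; the multiclass prediction comes from comparing their probabilities.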

# Math Behind Logistic Regression

There is nothing new behind the scenes in logistic regression except the inclusion of the sigmoid function, so let's look at the function closely.

The Sigmoid Function:

$$S(z) = \frac{1}{1+e^{-z}}$$

The function itself is simple enough: there is only one variable to deal with, $z$.   
What makes the sigmoid function useful is its ability to map any input into the interval $(0, 1)$. This behavior can be understood by looking at its curve.

Sigmoid Function Curve:

![sigmoid](https://ai-master.gitbooks.io/logistic-regression/content/assets/sigmoid_function.png)

So, the sigmoid function takes any real number and returns a number in the interval $(0, 1)$. Because of this, we can place the sigmoid function at the end of the linear regression: instead of continuous values, we obtain the corresponding sigmoid outputs, which can be interpreted as class probabilities in a binary classification problem, and we can use these to make probabilistic predictions.
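
To make the mapping concrete, here is a small sketch that evaluates the sigmoid at a few hand-picked inputs (the input values are arbitrary and chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of them) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z_values = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
probabilities = sigmoid(z_values)

for z, p in zip(z_values, probabilities):
    print(f"z = {z:6.1f}  ->  sigmoid(z) = {p:.5f}")
```

Large negative inputs land close to 0, large positive inputs land close to 1, and $z = 0$ maps to exactly 0.5.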

So, using a general linear regression model:

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots +\beta_kx_k$$

The Logistic Regression Model will become:

$$P(y = 1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1x_1+ \dots +\beta_kx_k)}}$$
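
As a worked example of this formula, the sketch below plugs in made-up numbers. The coefficients `beta_0` and `betas` and the sample `x` are hypothetical, not fitted values.

```python
import numpy as np

# Hypothetical coefficients and a single sample with k = 2 features
beta_0 = -1.0
betas = np.array([2.0, -0.5])
x = np.array([1.5, 1.0])

# Linear part: beta_0 + beta_1*x_1 + beta_2*x_2 = -1 + 3 - 0.5 = 1.5
z = beta_0 + betas @ x

# Logistic part: squash the linear output into a probability
p_positive = 1.0 / (1.0 + np.exp(-z))
p_negative = 1.0 - p_positive

print(f"z = {z:.2f}, P(y=1|x) = {p_positive:.4f}, P(y=0|x) = {p_negative:.4f}")
```

With these numbers the linear output is 1.5, which the sigmoid turns into a probability of about 0.82 for class 1.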

Since the values of the $\beta$ vector (the coefficients of the linear regression model) will be different for every problem, the exact shape of the sigmoid curve will also differ from problem to problem, but the function will always retain its general S-shape and properties.

### **Bonus:**

Since logistic regression only introduces the sigmoid function at the end of the linear regression pipeline, we can take any linear regression algorithm and convert it to its corresponding logistic regression algorithm, for example lasso logistic regression or least squares logistic regression.
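
As one possible reading of the lasso variant, the sketch below adds an L1 penalty term to a gradient descent update. Everything here is an assumption made for illustration: the function name `fit_logistic`, the `l1_strength` parameter, the synthetic data, and the use of a plain subgradient step (which shrinks weights but, unlike dedicated lasso solvers, does not set them exactly to zero).

```python
import numpy as np

def fit_logistic(X, y, l1_strength=0.0, n_iterations=500, learning_rate=0.1):
    """Gradient descent logistic regression with an optional L1 penalty."""
    n_samples, n_features = X.shape
    weights, intercept = np.zeros(n_features), 0.0
    for _ in range(n_iterations):
        probs = 1.0 / (1.0 + np.exp(-(X @ weights + intercept)))
        error = probs - y
        # Data gradient plus the subgradient of l1_strength * sum(|weights|)
        grad_w = (X.T @ error) / n_samples + l1_strength * np.sign(weights)
        weights -= learning_rate * grad_w
        intercept -= learning_rate * error.mean()  # intercept is not penalized
    return weights, intercept

# Hypothetical data: only the first two of five features carry signal
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
logits = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1]
y_demo = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

plain_weights, _ = fit_logistic(X_demo, y_demo, l1_strength=0.0)
lasso_weights, _ = fit_logistic(X_demo, y_demo, l1_strength=0.05)

print("Unpenalized weights:", plain_weights.round(3))
print("L1-penalized weights:", lasso_weights.round(3))
```

The penalized fit should show smaller coefficients overall, with the uninformative features pulled toward zero.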

# Implementing Logistic Regression

For this implementation, a gradient descent linear regression will be used and converted to a gradient descent logistic regression.
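
The text does not name the loss being minimized. Assuming the usual average log loss (an assumption, though it is consistent with the gradient expressions used in the code), and writing $p_i = P(y_i = 1|x_i)$ for the predicted probability of sample $i$ out of $n$ samples, the quantities behind the update step can be sketched as:

$$J(\beta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

Using the sigmoid's derivative $S'(z) = S(z)(1 - S(z))$, the gradients simplify to:

$$\frac{\partial J}{\partial \beta_j} = -\frac{1}{n}\sum_{i=1}^{n}(y_i - p_i)\,x_{ij}, \qquad \frac{\partial J}{\partial \beta_0} = -\frac{1}{n}\sum_{i=1}^{n}(y_i - p_i)$$

Each iteration then moves the parameters against the gradient, with $\alpha$ denoting the learning rate:

$$\beta_j \leftarrow \beta_j - \alpha\,\frac{\partial J}{\partial \beta_j}$$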

In [14]:
#First we need a sample dataset on which we can test our algorithms
#Using sklearn to create a random binary classification problem
from sklearn.datasets import make_classification

X,y = make_classification(n_samples=200,
                      n_features=12, 
                      n_classes=2, 
                      random_state=42)

#Splitting the dataset into test and training sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

#looking at the generated Data
X_train[:5,:] #First Five rows

array([[-1.18759842,  0.30982071, -1.08801577,  1.58463429,  2.14514913,
         1.24774207,  0.27899416, -0.96510344, -0.67349062, -0.29494968,
        -0.83534705,  0.81794088],
       [-0.13744851,  0.95287455, -0.35988191,  0.77000611,  1.37365855,
         0.67787532, -1.8306329 ,  0.24554453, -0.65407568, -2.70323229,
         0.5112026 ,  0.19476642],
       [-1.11057585,  1.75227044,  1.18137586, -2.34174533, -0.77781669,
        -2.07339023, -0.37144087, -0.37892304, -0.34268759,  1.24608519,
        -1.40751169, -0.69666772],
       [-0.43449623, -0.30917212,  0.47282467, -1.28206838, -0.46227529,
         0.02975614, -0.51604473, -0.94377212,  0.93828381,  0.02831838,
         0.09612078, -0.17253989],
       [ 0.77836108, -0.55118572, -0.18053048,  0.77127784,  0.59252695,
         2.00609289,  1.20836623,  1.00760453,  2.06150358,  1.54210995,
         1.02406253, -0.02097391]])

In [2]:
#Creating the Gradient Descent Logistic Regression Algorithm
import numpy as np

class MyLogisticRegression():

  def __init__(self, n_iterations=1000, learning_rate=0.001):

    self.weights = None #Initializing weights vector
    self.intercept = None #Initializing intercept
    self.n_iterations = n_iterations #setting number of iterations
    self.lr = learning_rate #setting learning rate

  def _Gradient_Descent(self, n_samples, X, y_probs, y_act):
    
    #Compute Gradient
    self._Dw = (-1) * (1/n_samples) * np.dot(X.transpose(), (y_act - y_probs))
    self._Di = (-1) * (1/n_samples) * np.sum(y_act - y_probs)

    #Update Model Parameters
    self.weights = self.weights - (self.lr * self._Dw)
    self.intercept = self.intercept - (self.lr * self._Di)

  def _sigmoid(self, y_cont): #A method for the sigmoid function

    return 1 / (1 + np.exp(-y_cont))
  
  def fit(self, X, y):

    n_samples, n_features = X.shape

    #Initializing Parameters
    self.weights = np.zeros(n_features)
    self.intercept = 0

    for _ in range(self.n_iterations):

      #Calculating the continuous prediction (regression)
      y_cont = self.intercept + X.dot(self.weights)

      #Converting the continuous prediction to class probabilities using the sigmoid function
      y_probs = self._sigmoid(y_cont)

      #Gradient Descent
      self._Gradient_Descent(n_samples, X, y_probs, y)

  def predict(self, x, threshold=0.5):

    #Accept lists as well as arrays (comparing type(x) to a string never matches)
    x = np.asarray(x)

    y_cont = self.intercept + x.dot(self.weights)
    y_probs = self._sigmoid(y_cont)
    #Label a sample 1 when its probability exceeds the threshold, else 0
    y_preds = [1 if y_prob > threshold else 0 for y_prob in y_probs]
    return y_preds
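
One way to gain confidence in gradient expressions of this form is a finite-difference check. The sketch below is standalone and rests on assumptions: it takes the average log loss as the objective, and the names `log_loss` and `analytic_gradient` and the random test data are introduced only for this check.

```python
import numpy as np

def predict_probs(weights, intercept, X):
    return 1.0 / (1.0 + np.exp(-(X @ weights + intercept)))

def log_loss(weights, intercept, X, y):
    """Average negative log-likelihood of the labels."""
    probs = predict_probs(weights, intercept, X)
    return -np.mean(y * np.log(probs) + (1 - y) * np.log(1 - probs))

def analytic_gradient(weights, intercept, X, y):
    """Same form as the weight gradient in the class above."""
    probs = predict_probs(weights, intercept, X)
    return -(X.T @ (y - probs)) / len(y)

# Hypothetical random problem used only for the check
rng = np.random.default_rng(0)
X_check = rng.normal(size=(50, 3))
y_check = rng.integers(0, 2, size=50).astype(float)
w_check = rng.normal(size=3)
b_check = 0.3

# Central differences: perturb one weight at a time
eps = 1e-6
numeric_gradient = np.zeros_like(w_check)
for j in range(len(w_check)):
    step = np.zeros_like(w_check)
    step[j] = eps
    loss_plus = log_loss(w_check + step, b_check, X_check, y_check)
    loss_minus = log_loss(w_check - step, b_check, X_check, y_check)
    numeric_gradient[j] = (loss_plus - loss_minus) / (2 * eps)

max_difference = np.max(
    np.abs(numeric_gradient - analytic_gradient(w_check, b_check, X_check, y_check))
)
print("Largest gap between numeric and analytic gradient:", max_difference)
```

A gap near floating-point noise suggests the analytic gradient matches the loss it is supposed to descend.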


In [24]:
#Now, we can test our Logistic Regression Algorithm
model = MyLogisticRegression(n_iterations=1000, learning_rate=0.005) #Creating an instance
model.fit(X_train,y_train) #Fitting the model

#Let's see the weights and the intercept
print('The Calculated Model:')
print('The Coefficients = {}'.format(model.weights.round(4)))
print('The Intercept {:.4f}'.format(model.intercept))

The Calculated Model:
The Coefficients = [-0.0384  0.1237  0.1698  0.1423  0.0547  0.1277  0.1024  1.0456  0.0565
  0.0339 -0.011  -0.2477]
The Intercept -0.0483


In [33]:
#Since, this is a classification problem the weights of the regression will not
#be very useful but we can calculate the accuracy of model

#Helper function to calculate accuracy
def accuracy(y_true,y_pred):
  accuracy = np.sum(y_true == y_pred)/len(y_true)
  return accuracy

#Getting predictions
y_test_preds = model.predict(X_test)

#Evaluating Performance on Test set
acc = accuracy(y_test,y_test_preds)
print('The Accuracy gained by our Algorithm on Test Set = {:.2f}%'.format(acc*100))

The Accuracy gained by our Algorithm on Test Set = 81.67%
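
The accuracy above is tied to the default `threshold=0.5` in `predict`. The sketch below shows how moving the threshold changes the outcome; the probabilities and labels are hypothetical and were not produced by the fitted model.

```python
import numpy as np

# Hypothetical predicted probabilities and true labels
probs_demo = np.array([0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.90])
labels_demo = np.array([0, 0, 1, 1, 0, 1, 1])

def accuracy_at(threshold):
    """Fraction of samples labelled correctly at this threshold."""
    preds = (probs_demo > threshold).astype(int)
    return np.mean(preds == labels_demo)

def recall_at(threshold):
    """Fraction of actual positives that are predicted positive."""
    preds = (probs_demo > threshold).astype(int)
    return np.sum((preds == 1) & (labels_demo == 1)) / np.sum(labels_demo == 1)

for threshold in (0.3, 0.5, 0.7):
    print(f"threshold={threshold}: accuracy={accuracy_at(threshold):.2f}, "
          f"recall={recall_at(threshold):.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 recovers the positives that were missed, which is the kind of trade-off the `threshold` argument exposes.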


In [53]:
#The algorithm managed to achieve >80% accuracy without any regularization
#Let's look at 10 random predictions from the test data

choices = np.random.choice(len(X_test), size=10, replace=False)
batch_test = X_test[choices, :]
y_actual = y_test[choices]
y_actual = y_actual.reshape(-1,1)

#Extracting predictions
y_preds = np.array(model.predict(batch_test))

#Making a dataframe of actual and predicted results
y_preds = y_preds.reshape(-1,1)
import pandas as pd
df = pd.DataFrame(np.concatenate((y_actual,y_preds), axis = 1), columns = ['Actual', 'Predicted'])
df['Remark'] = 'Incorrect Prediction'
df.loc[df['Actual'] == df['Predicted'], 'Remark'] = 'Correct Prediction'
df

|   | Actual | Predicted | Remark |
|---|--------|-----------|--------|
| 0 | 0 | 0 | Correct Prediction |
| 1 | 0 | 0 | Correct Prediction |
| 2 | 0 | 0 | Correct Prediction |
| 3 | 1 | 1 | Correct Prediction |
| 4 | 1 | 1 | Correct Prediction |
| 5 | 0 | 0 | Correct Prediction |
| 6 | 1 | 0 | Incorrect Prediction |
| 7 | 0 | 0 | Correct Prediction |
| 8 | 0 | 0 | Correct Prediction |
| 9 | 1 | 0 | Incorrect Prediction |
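
Accuracy alone hides which kind of mistake was made. As a rough sketch, the ten sampled rows above can be broken down into confusion-matrix counts; the helper name `confusion_counts` is introduced here for illustration, and the two lists simply copy the Actual and Predicted columns of this one random sample.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

# Actual and Predicted columns from the ten sampled rows
sample_actual = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]
sample_predicted = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(sample_actual, sample_predicted)
sample_accuracy = (tp + tn) / len(sample_actual)
sample_recall = tp / (tp + fn)

print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
print(f"accuracy={sample_accuracy:.2f}, recall={sample_recall:.2f}")
```

In this sample both errors are false negatives: every predicted 1 is correct, but half of the actual 1s are missed. Ten rows are far too few to generalize from.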
