<h2 align="center"> Logistic Regression from Scratch (Code) </h2> 
<h3 align="center"> Author: Ibrahim O Alabi, PhDc </h3>

This notebook is part of my series on Introduction to Python for Data Science. This is my way of contributing to open source knowledge. If you find this content useful, please consider leaving a **star** on this repository.

## Batch Gradient Ascent Algorithm for Logistic Regression

1. $\textbf{w}^0 \gets [0, 0, 0, ..., 0] $ (initialize the weights with 0s)
2. $b^0 \gets 0 $ (initialize the bias with 0)
3. **for** epoch = 1 to Maxiter **do**
    - Compute gradient w.r.t. $\textbf{w}$ using the training data ($\nabla_{\textbf{w}} \mathcal{L}(\textbf{w}, b \mid \mathcal{D}_{tr})$)
    - Compute gradient w.r.t. $b$ using the training data ($\nabla_{b} \mathcal{L}(\textbf{w}, b \mid \mathcal{D}_{tr})$)
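    - Update $\textbf{w}$ by stepping along its gradient ($\textbf{w}^{epoch} \gets \textbf{w}^{epoch-1} + \eta \nabla_{\textbf{w}} \mathcal{L}$, where $\eta$ is the learning rate)
    - Update $b$ by stepping along its gradient ($b^{epoch} \gets b^{epoch-1} + \eta \nabla_{b} \mathcal{L}$)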
    
Our code always runs for exactly Maxiter epochs (`max_iters` in the code); it does not check for convergence or stop early.
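
The algorithm above does not write the objective out. Reading it off the updates in the `fit` method, it appears to be the L2-penalized log-likelihood. The formulas below are a sketch under that assumption: the symbol $\sigma_i$ and the exact form of the penalty are inferred from the code, not stated in the text.

$$
\mathcal{L}(\textbf{w}, b \mid \mathcal{D}_{tr}) = \sum_{i=1}^{n} \left[ y_i \log \sigma_i + (1 - y_i) \log (1 - \sigma_i) \right] - \frac{1}{2C} \lVert \textbf{w} \rVert^2, \qquad \sigma_i = \frac{1}{1 + e^{-(\textbf{w}^\top \textbf{x}_i + b)}}
$$

$$
\nabla_{\textbf{w}} \mathcal{L} = \sum_{i=1}^{n} (y_i - \sigma_i)\, \textbf{x}_i - \frac{\textbf{w}}{C}, \qquad \nabla_{b} \mathcal{L} = \sum_{i=1}^{n} (y_i - \sigma_i)
$$

In matrix form these are `np.dot(X.T, y - sigma) - weights/C` and `np.sum(y - sigma)`. The bias is not penalized, and a smaller $C$ gives a stronger pull of the weights toward zero.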

In [18]:
import numpy as np


class LogisticRegression(object):
    def __init__(self):
        self.weights = None
        self.bias = None
        
    def _logistic(self, z):
        sigmoid = 1/(1 + np.exp(-z))    ### the logistic (sigmoid) function
        return sigmoid
    
    def fit(self, X, y, max_iters=1000, lr = 1e-3, C = 1):
        
        """
        fit: A function that trains the Logistic Regression classifier using full-batch gradient ascent
             on the L2-penalized log-likelihood.

        Args:
            X (numpy array):      Numpy array of shape (n, p) containing the training data, where n is the sample size 
                                  and p is the number of explanatory variables.
                             
            y (numpy array):      Numpy array of shape (n, 1) containing the corresponding class labels (0 or 1).
            
            max_iters (integer):  Number of gradient ascent iterations. Defaults to 1000
            
            lr (float):           Learning rate used in the gradient ascent updates. Defaults to 0.001
            
            C (float) :           Regularization hyperparameter (smaller values = stronger regularization). Defaults to 1 
                             

        Returns:
            self:                 The fitted model. The learned coefficients are stored in self.coefs,
                                  an array of shape (p + 1, 1) with the bias first.
        """
        
        # initialize weights
        p = X.shape[1]
        
        if self.weights is None:
            self.weights = np.zeros((p, 1))
            self.bias = np.zeros((1,1))
        
        # Maximize the L2-penalized log-likelihood by full-batch gradient ascent
        for epoch in range(max_iters):
            z = np.dot(X,self.weights) + self.bias 
            sigma = self._logistic(z) 
            dw = np.dot(X.T, (y - sigma))
            db = np.sum((y - sigma))
            self.weights += lr*(dw - self.weights/C)
            self.bias += lr*db
        self.coefs = np.concatenate((self.bias,self.weights), axis=0)
        return self
    
    def predicted_prob(self, X_test):
        
        """
        predicted_prob: Uses the learned weights and bias from fit to predict probabilities for the test (or train) set. 

        Args:
            X_test (numpy array):   Numpy array of shape (n, p) containing the testing (or training) data, where n is the sample size 
                                    and p is the number of explanatory variables.
                             
        Returns:
            y_prob:                  predicted probabilities of an instance belonging to the class labeled 1
        """
        
        z = np.dot(X_test,self.weights) + self.bias
        y_prob = self._logistic(z)
        return y_prob
    
    def predict(self, X_test, threshold = 0.5):
        
        """
        
        predict: Uses the optimal weights and bias from fit to predict labels for the test set. 
                 Note that training data may also serve as testing data.

        Args:
            X_test (numpy array):   Numpy array of shape (n, p) containing the testing (or training) data, where n is the sample size 
                                    and p is the number of explanatory variables.
                                    
            threshold:              0 <= threshold <= 1, threshold for making decision
                             
        Returns:
            prediction:             predicted class of all instances.
        """
        
        if threshold < 0 or threshold > 1:
            raise ValueError("Threshold must be a probability value") 
        else:
            prediction = (self.predicted_prob(X_test) > threshold).astype(int)
        return prediction
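
Before trusting the updates in `fit`, it can help to check the gradient expressions numerically. The sketch below assumes the objective is the L2-penalized log-likelihood (an inference from the `dw - self.weights/C` update, not something the notebook states) and compares the analytic gradients with central finite differences on synthetic data. The helper names `penalized_log_likelihood` and `analytic_gradients` are hypothetical and not part of the class.

```python
import numpy as np


def penalized_log_likelihood(w, b, X, y, C):
    """Assumed objective: Bernoulli log-likelihood minus ||w||^2 / (2C)."""
    z = X @ w + b
    # log(sigma) = -log(1 + e^{-z}) and log(1 - sigma) = -log(1 + e^{z})
    log_lik = np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))
    return log_lik - np.sum(w ** 2) / (2 * C)


def analytic_gradients(w, b, X, y, C):
    """Same expressions as dw (with the -w/C term) and db inside fit."""
    sigma = 1 / (1 + np.exp(-(X @ w + b)))
    dw = X.T @ (y - sigma) - w / C
    db = np.sum(y - sigma)
    return dw, db


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=(20, 1)).astype(float)
w = rng.normal(size=(3, 1))
b = 0.3
C = 0.5

dw, db = analytic_gradients(w, b, X, y, C)

# Central finite differences, one weight at a time
eps = 1e-6
dw_numeric = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j] = eps
    dw_numeric[j] = (penalized_log_likelihood(w + step, b, X, y, C)
                     - penalized_log_likelihood(w - step, b, X, y, C)) / (2 * eps)
db_numeric = (penalized_log_likelihood(w, b + eps, X, y, C)
              - penalized_log_likelihood(w, b - eps, X, y, C)) / (2 * eps)

max_error = max(np.max(np.abs(dw - dw_numeric)), abs(db - db_numeric))
print(f"largest gap between analytic and numeric gradients: {max_error:.2e}")
```

A gap many orders of magnitude smaller than the gradients themselves suggests the update in `fit` is ascending the assumed objective.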

## Let's implement our code on the iris dataset

The iris dataset ships with `sklearn.datasets`, so let's import it from there. We will also compare our implementation with the `LogisticRegression` class from sklearn.

### Load Libraries

In [21]:
import numpy as np
from sklearn import metrics
from sklearn import datasets
import matplotlib.pyplot as plt
import sklearn.linear_model as slm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, StratifiedKFold, KFold

### Import dataset

We will do binary classification (setosa against the others).

In [12]:
iris = datasets.load_iris()
X = iris.data
y = (np.where(iris.target > 0, 1, iris.target)).reshape(-1,1)
y = 1- y  ## setosa = 1, others = 0
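
The two-step recoding above can be checked on a small array. This sketch uses a toy `target` vector in place of `iris.target` (where setosa is coded 0 and the other two species 1 and 2) and compares the notebook's approach with a one-step alternative; `y_two_step` and `y_direct` are illustrative names only.

```python
import numpy as np

# Toy stand-in for iris.target: 0 = setosa, 1 and 2 = the other two species
target = np.array([0, 0, 1, 2, 1, 0])

# Two-step version used in the notebook: collapse 1 and 2 into 1, then flip
y_two_step = 1 - np.where(target > 0, 1, target).reshape(-1, 1)

# One-step alternative: setosa (code 0) becomes 1, everything else 0
y_direct = (target == 0).astype(int).reshape(-1, 1)

print(y_direct.ravel())                        # [1 1 0 0 0 1]
print(np.array_equal(y_two_step, y_direct))    # True
```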

### Train-test Split

The training data is 2/3 of the whole dataset and the testing data is the remaining 1/3.

In [13]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3,
    shuffle=True, stratify=y,  ## to keep the class proportion in the split
    random_state= 10  ## For reproducible results
)
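
To see what `stratify=y` is aiming for, here is a rough NumPy sketch that samples the test set separately within each class. It is an illustration of the idea only, not sklearn's actual algorithm, and the label vector simply mimics the 50 setosa / 100 other split of iris.

```python
import numpy as np

rng = np.random.default_rng(10)

# Labels shaped like the iris problem: 50 setosa (1) and 100 others (0)
y_all = np.array([1] * 50 + [0] * 100)

test_idx = []
for label in np.unique(y_all):
    idx = np.flatnonzero(y_all == label)   # positions of this class
    rng.shuffle(idx)                       # random order within the class
    n_test = round(len(idx) / 3)           # one third of each class goes to test
    test_idx.extend(idx[:n_test])
test_idx = np.array(test_idx)

train_mask = np.ones(len(y_all), dtype=bool)
train_mask[test_idx] = False

print("share of setosa overall:", y_all.mean())
print("share of setosa in train:", y_all[train_mask].mean())
print("share of setosa in test:", y_all[test_idx].mean())
```

Because each class contributes about a third of its rows, the class shares in the two parts stay close to the overall share.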

### Standardizing

Let's standardize so that all input variables have zero mean and unit standard deviation. This speeds up training and prevents variables with larger scales from dominating it.

In [14]:
scaler = StandardScaler()
X_train=scaler.fit_transform(X_train)
X_test=scaler.transform(X_test)

* **Note**

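`fit_transform` and `transform` are not interchangeable.

`fit_transform` estimates the mean and standard deviation of each variable from the training data and then applies them. `transform` only applies those stored training-set parameters, so the test data is scaled without using its own statistics.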
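
The difference can be spelled out by hand. The following sketch standardizes synthetic data with plain NumPy, assuming the population standard deviation (`ddof=0`), which is what `StandardScaler` uses. The arrays `X_tr` and `X_te` are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X_tr = rng.normal(loc=5.0, scale=2.0, size=(100, 4))   # synthetic "training" data
X_te = rng.normal(loc=5.0, scale=2.0, size=(50, 4))    # synthetic "test" data

# What fit learns: per-column mean and standard deviation of the training data
mu = X_tr.mean(axis=0)
sd = X_tr.std(axis=0)          # ddof=0, the population standard deviation

# What transform does: apply the stored training statistics
X_tr_std = (X_tr - mu) / sd    # equivalent to fit_transform on the training data
X_te_std = (X_te - mu) / sd    # equivalent to transform on the test data

print("train mean:", X_tr_std.mean(axis=0).round(3))
print("test mean: ", X_te_std.mean(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1, while the test columns are only approximately so, since they were scaled with the training statistics.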

### Fit the Model

In [23]:
model = LogisticRegression()
model.fit(X_train, y_train, max_iters=5000, lr=0.01, C = 0.05)
model.coefs

array([[-1.10932086],
       [-0.41956247],
       [ 0.4754382 ],
       [-0.6209608 ],
       [-0.59230536]])

### Let's Compare with sklearn's Logistic Regression

In [30]:
clf=slm.LogisticRegression(random_state=0, C=0.05).fit(X_train, y_train.ravel())

In [41]:
np.concatenate((clf.intercept_.reshape(-1,1),clf.coef_.T))

array([[-1.10932086],
       [-0.4195625 ],
       [ 0.47543818],
       [-0.62096099],
       [-0.59230516]])

Our coefficients agree with sklearn's to about six decimal places! We will continue with our own implementation of the Logistic Regression.
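
One way to quantify how close the two fits are is to compare the printed coefficient vectors numerically. The sketch below copies the two outputs shown above into arrays (`ours` and `sk` are illustrative names) and checks the largest absolute gap; the tolerance of `1e-6` is an arbitrary choice.

```python
import numpy as np

# Coefficients printed above: bias first, then the four weights
ours = np.array([-1.10932086, -0.41956247, 0.4754382, -0.6209608, -0.59230536])
sk = np.array([-1.10932086, -0.4195625, 0.47543818, -0.62096099, -0.59230516])

max_gap = np.max(np.abs(ours - sk))
print(f"largest absolute difference: {max_gap:.1e}")
print("equal within 1e-6:", np.allclose(ours, sk, atol=1e-6))
```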

Model: 

$$
P(y_i = \text{setosa} | \textbf{x}_i)  = \frac{1}{1 + e^{-(-1.109\ -\ 0.420x_{1i}\ +\ 0.475x_{2i}\ -\ 0.621x_{3i}\ -\ 0.592x_{4i})}}
$$

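where $x_1, \dots, x_4$ are the standardized versions of the following measurements, so each coefficient is per training-set standard deviation rather than per cm: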

- $x_1$ = sepal length (cm)
- $x_2$ = sepal width (cm)
- $x_3$ = petal length (cm)
- $x_4$ = petal width (cm)
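
To make the fitted equation concrete, the sketch below plugs two inputs into it using the rounded coefficients from the model above. The inputs are on the standardized scale; `x_example` is a made-up flower, not a row from the dataset, and `setosa_probability` is a hypothetical helper.

```python
import numpy as np

bias = -1.109
weights = np.array([-0.420, 0.475, -0.621, -0.592])   # x1..x4, rounded as in the equation


def setosa_probability(x_std):
    """Evaluate the fitted model at a standardized feature vector."""
    z = bias + x_std @ weights
    return 1 / (1 + np.exp(-z))


# A flower sitting exactly at the training mean of every feature
x_average = np.zeros(4)

# Hypothetical flower: short sepals, wide sepals, short and narrow petals
x_example = np.array([-1.0, 1.0, -1.0, -1.0])

p_average = setosa_probability(x_average)
p_example = setosa_probability(x_example)
print(f"P(setosa) at the training mean: {p_average:.3f}")
print(f"P(setosa) for the example:      {p_example:.3f}")
```

The signs of the coefficients explain the direction of the change: wider sepals and smaller petals push the probability of setosa up.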

In [44]:
y_hat = model.predict(X_test)

print(f"Testing accuracy: {metrics.accuracy_score(y_true = y_test, y_pred = y_hat)}")
print(f"Testing F1 Score: {metrics.f1_score(y_true = y_test, y_pred = y_hat)}")
print(f"Testing precision: {metrics.precision_score(y_true = y_test, y_pred = y_hat)}")
print(f"Testing recall: {metrics.recall_score(y_true = y_test, y_pred = y_hat)}")

Testing accuracy: 1.0
Testing F1 Score: 1.0
Testing precision: 1.0
Testing recall: 1.0
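
Since every score above is 1.0, it is hard to see how the four metrics differ. The sketch below computes them from confusion counts on a small made-up pair of label vectors (not results from the model), using the standard definitions.

```python
import numpy as np

# Toy labels for illustration only
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 0, 1, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy}, precision={precision}, recall={recall}, f1={f1}")
```

With 3 true positives, 3 true negatives, 1 false positive and 1 false negative, all four metrics come out to 0.75 here.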
