# Binary Digits Classification With Logistic Regression

Notebook by Anthony Rodriguez

## Introduction

This notebook explores binary classification of handwritten digits (0 versus 1) using logistic regression. The logistic regression algorithm is built from scratch with Python and NumPy. The data comes from scikit-learn's [toy digits dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits), which is a partial copy (the test set) of the [Optical Recognition of Handwritten Digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits). The images are 8×8 pixels, and only the digits 0 and 1 are used.

**Some EDA on this dataset can be seen in this [notebook](../../ExploratoryDataAnalysis/Digits.ipynb).**

In the [Multiple Linear Regression notebook](../Regression/MultipleLinearRegression.ipynb) we compared the runtime of non-matrix and matrix calculations with NumPy, so in this notebook we use only matrix operations in our own logistic regression model.
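
For reference, the standard logistic regression quantities that such a model is built on can be written as follows. The symbols ($m$ examples, weights $\mathbf{w}$, bias $b$, cost $J$) follow the naming used in the code below; the formulas are the textbook ones and are not a transcription of the class.

```latex
\begin{aligned}
f_{\mathbf{w},b}(\mathbf{x}) &= \sigma(\mathbf{w}\cdot\mathbf{x} + b),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
J(\mathbf{w},b) &= -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log f_{\mathbf{w},b}(\mathbf{x}^{(i)})
 + \big(1 - y^{(i)}\big)\log\big(1 - f_{\mathbf{w},b}(\mathbf{x}^{(i)})\big) \Big] \\
\frac{\partial J}{\partial w_j} &= \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)\, x_j^{(i)},
\qquad
\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\big(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\big)
\end{aligned}
```

A prediction of class 1 corresponds to $f_{\mathbf{w},b}(\mathbf{x}) \ge 0.5$, which is the same as $\mathbf{w}\cdot\mathbf{x} + b \ge 0$.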

In [1]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

import numpy as np
import math

## Load Data

In [2]:
digits_X, digits_y = load_digits(n_class=2, return_X_y=True)

In [3]:
digits_X

array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
       [ 0.,  0.,  0., ..., 10.,  0.,  0.],
       [ 0.,  0.,  1., ...,  3.,  0.,  0.],
       ...,
       [ 0.,  0.,  5., ...,  8.,  1.,  0.],
       [ 0.,  0.,  6., ...,  4.,  0.,  0.],
       [ 0.,  0.,  6., ...,  6.,  0.,  0.]])
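
Each row printed above is one 8×8 image flattened into 64 features. The sketch below shows how such a row maps back to a pixel grid. It uses a made-up row (`flat_row` is hypothetical, not taken from `digits_X`) and assumes row-major ordering.

```python
import numpy as np

# Hypothetical flattened image: 64 values standing in for one row of digits_X
flat_row = np.arange(64, dtype=float)

# Recover the 8x8 pixel grid (row-major order is assumed here)
image = flat_row.reshape(8, 8)

# Flattening the grid again gives back the original feature vector
restored = image.reshape(-1)

print(image.shape, restored.shape)
```

With real data, the same `reshape(8, 8)` call could be applied to `digits_X[0]` before plotting it.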

In [4]:
digits_y

array([0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
       0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0,

## Train-Test Split

In [5]:
test_size = 0.2
random_state = 37

X_train, X_test, y_train, y_test = train_test_split(digits_X, digits_y, test_size=test_size, random_state=random_state)

In [6]:
val_size = 0.15
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=val_size, random_state=random_state)
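
The two calls above carve the data into three parts: 20% is held out for testing, and 15% of the remaining 80% (12% of the total) becomes the validation set. The sketch below works out the resulting counts. It assumes 360 samples in total (an assumption, though it is consistent with the validation accuracy of 43/44 reported later) and that fractional held-out sizes are rounded up, as `train_test_split` does. `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

def split_sizes(n_samples, test_size=0.2, val_size=0.15):
    """Return (n_train, n_val, n_test) for a test split followed by a validation split."""
    # First split: hold out the test set, rounding the fractional size up
    n_test = math.ceil(test_size * n_samples)
    n_remaining = n_samples - n_test

    # Second split: hold out the validation set from what is left
    n_val = math.ceil(val_size * n_remaining)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

# 360 is an assumed sample count for the two-class digits data
n_train, n_val, n_test = split_sizes(360)
print(n_train, n_val, n_test)
```

Under these assumptions the split is 244 training, 44 validation, and 72 test examples.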

## Create Model

In [7]:
class LogisticRegression:
    """
    The logistic regression model.
    """
    def __init__(self, alpha=1.0, max_iters=1000):
        """
        Creates LogisticRegression object.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        alpha (float): The learning rate.
        max_iters (int): The maximum number of iterations of the gradient descent algorithm.
        """
        self.__alpha = alpha
        self.__max_iters = max_iters
        self.__w = None
        self.__b = None
        # Initialized here so get_cost_history() also works before train() is called
        self.__J_history = []
        
    def get_parameters(self):
        '''
        self (LogisticRegression): A LogisticRegression object.
        
        Returns
        --------
        (ndarray(n,), ndarray(1,)) : The model parameters, weights and bias.
        '''
        return self.__w, self.__b
    
    def get_cost_history(self):
        '''
        self (LogisticRegression): A LogisticRegression object.
        
        Returns
        --------
        (list) : The history of cost when training the model.
        '''
        return self.__J_history
        
    def train(self, X_train, y_train, print_iterations=False):
        '''
        Trains the logistic regression model.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        X_train (ndarray (m, n)): The training examples.
        y_train (ndarray(m,)): The training targets.
        print_iterations (bool): True if wanting information to be printed on every iteration, false otherwise.
        '''        
        self.__w = np.zeros_like(X_train[0])
        self.__b = np.array([0])
        
        # An array to store cost J and w's at each iteration primarily for graphing later
        self.__J_history = []
        self.__p_history = []
        
        self.__gradient_descent(X_train, y_train, print_iterations)

    def compute_accuracy(self, predictions, targets):
        '''
        Computes the accuracy of the predictions.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        predictions (ndarray (m,)): The predictions.
        targets (ndarray (m,)): The targets.
        
        Returns
        --------
        (float): The fraction of predictions that match the targets, between 0 and 1.
        '''
        return sum(predictions == targets) / len(targets)
        
    def predict(self, X):
        '''
        Predicts the categorical value using X, model weights, bias and the Sigmoid function.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        X (ndarray (m, n)): The data to use for predictions.
        
        Returns
        --------
        (ndarray(m,)): The predictions.
        '''
        z = X @ self.__w + self.__b
        g = self.__sigmoid(z)
        g_categorical = np.where(g < 0.5, 0, 1)
        return g_categorical
        
    def __sigmoid(self, z):
        '''
        Returns the sigmoid of the parameter value.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        z (float): The value to use in the sigmoid function.
        
        Returns
        ----------
        (float) the sigmoid value of z.
        '''
        return 1 / (1 + np.exp(-z))
        
    def __gradient_descent(self, X_train, y_train, print_iterations):
        '''
        Runs the gradient descent algorithm.
        
        Parameters
        ----------
        self (LogisticRegression): A LogisticRegression object.
        X_train (ndarray (m, n)): The training examples.
        y_train (ndarray(m,)): The training targets.
        print_iterations (bool): True if wanting information to be printed on every iteration, false otherwise.
        '''
        if print_iterations:
            self.__print_header()

        # Print about 10 times over the run, or on every iteration if max_iters < 10
        print_every = math.ceil(self.__max_iters / 10)

        for i in range(self.__max_iters):
            # Calculate the gradient at the current parameters
            dj_dw, dj_db = self.__compute_gradient(X_train, y_train)

            # Record old parameters
            last_b, last_w = self.__b, self.__w

            # Update parameters
            self.__b = self.__b - self.__alpha * dj_db
            self.__w = self.__w - self.__alpha * dj_dw

            # Record the cross-entropy cost at the updated parameters
            # (probabilities are clipped to avoid log(0))
            p = np.clip(self.__sigmoid(X_train @ self.__w + self.__b), 1e-15, 1 - 1e-15)
            cost = -np.mean(y_train * np.log(p) + (1 - y_train) * np.log(1 - p))
            self.__J_history.append(cost)

            # Check if the new parameters are equal to the last iteration's parameters
            converged = self.__is_convergence(last_w, last_b)

            if print_iterations and (converged or i % print_every == 0):
                if converged:
                    print('Convergence')

                info = f'{i}\t    {self.__J_history[-1]:.2}\t'

                for w_i in self.__w:
                    info += f'{w_i:.1e}  '

                info += f'{self.__b[0]:.2e}  '

                for dj_dw_i in dj_dw:
                    info += f' {dj_dw_i:.1e}  '

                info += f'{dj_db:.2e}'
                print(info)

            if converged:
                break

                
    def __compute_gradient(self, X, y):
        '''
        Computes the gradient.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        X (ndarray (m, n)): The training examples.
        y (ndarray(m,)): The training targets.
        
        Returns
        --------
        (tuple (ndarray(n,), float)): the derivative of the cost w.r.t. the weights,
                                      the derivative of the cost w.r.t. the bias.
        '''
        m = X.shape[0]

        # Per-example error (prediction minus target), shape (m,)
        error = self.__compute_loss(X, y)

        # Average the error-weighted features over the m examples
        dj_dw = X.T @ error / m
        dj_db = np.sum(error) / m
        
        
        return dj_dw, dj_db
    
    def __compute_loss(self, X, y):
        '''
        Computes the per-example error (prediction minus target) used by the gradient.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        X (ndarray (m, n)): The training examples.
        y (ndarray(m,)): The training targets.
        
        Returns
        --------
        (ndarray(m,)): the errors.
        '''
        # predict returns thresholded 0/1 labels, so each error is -1, 0 or 1
        f_X = self.predict(X)
        return f_X - y.reshape(-1)
    
    def __is_convergence(self, last_w, last_b):
        '''
        Determines whether or not the model weights and bias have converged.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        last_w (ndarray(n,)): The previous model weights.
        last_b (ndarray(1, )): The previous bias.
        
        Returns
        --------
        (bool): True if previous model weights are equal to the current model weights and
                if previous model bias is equal to the current model bias, false otherwise.
        '''
        return self.__arrays_are_equal(self.__w, last_w) and self.__arrays_are_equal(self.__b, last_b)
    
    def __are_equal(self, val_a, val_b, epsilon = 1.0e-6):
        '''
        Determines whether or not the difference of two ints/floats is within a tolerance (epsilon).
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        val_a (int/float): A value to be compared.
        val_b (int/float): A value to be compared.
        epsilon (float): The tolerance; the values are considered equal when
                         their absolute difference is smaller than this.
        '''
        return abs(val_a - val_b) < epsilon
    
    def __arrays_are_equal(self, arr_a, arr_b, epsilon=1.0e-6):
        '''
        Determines whether or not the element-wise differences between
        two 1-D arrays are all within a tolerance (epsilon).
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        arr_a (ndarray(n,)): An array to be compared.
        arr_b (ndarray(n,)): An array to be compared.
        epsilon (float): The tolerance; the arrays are considered equal when every
                         absolute element-wise difference is smaller than this.
        '''
        are_equal = abs(arr_a - arr_b) < epsilon
        return are_equal.all()
    
    def __print_header(self):
        '''
        Prints a header for the output of the gradient descent algorithm.
        
        Parameters
        -----------
        self (LogisticRegression): A LogisticRegression object.
        '''
        header = 'Iteration | Cost      '
        
        for i in range(self.__w.shape[0]):
            header += f' | w_{i}   '

        header += '| b      '

        for i in range(self.__w.shape[0]):
            header += f' | dj_dw_{i} '

        header += ' | dj_db  |'    
        print(header)
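
In the class above, `__compute_loss` calls `predict`, so the error term fed to the gradient is built from thresholded 0/1 labels. The textbook logistic regression gradient uses the sigmoid probabilities instead. The following is a minimal standalone sketch of that variant, not the class's implementation. All names (`sigmoid`, `cross_entropy_cost`, `logistic_gradient`) and the tiny dataset are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(X, y, w, b):
    """Mean cross-entropy cost for weights w and bias b."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def logistic_gradient(X, y, w, b):
    """Gradient of the mean cross-entropy cost with respect to w and b."""
    m = X.shape[0]
    error = sigmoid(X @ w + b) - y  # probabilities minus targets, shape (m,)
    dj_dw = X.T @ error / m         # shape (n,)
    dj_db = np.sum(error) / m       # scalar
    return dj_dw, dj_db

# Tiny made-up dataset, for illustration only
X_demo = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]])
y_demo = np.array([0, 1, 0, 1])
w_demo = np.array([0.1, -0.2])
b_demo = 0.05

dj_dw, dj_db = logistic_gradient(X_demo, y_demo, w_demo, b_demo)
print(dj_dw, dj_db)
```

Because the probabilities vary smoothly with the parameters, this gradient only approaches zero as the cost is minimized, whereas an error built from thresholded labels becomes exactly zero as soon as every training example is classified correctly.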

## Find Best Hyperparameter (Learning Rate)

In [8]:
%%time
learning_rates = [1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0]
all_cost_histories = {}
all_params = {}
highest_val_acc = float('-inf')
best_learning_rate = None

for learning_rate in learning_rates:
    # Create a LogisticRegression object with the given learning rate.
    lr = LogisticRegression(alpha=learning_rate, max_iters=10000)

    # Train the logistic regression model on the training set.
    lr.train(X_train, y_train)
    
    # Get predictions for the validation set.
    y_preds = lr.predict(X_val)
    
    # Get the accuracy on the validation set.
    curr_val_acc = lr.compute_accuracy(y_preds, y_val)
    
    # Keep the learning rate with the highest validation accuracy so far.
    if curr_val_acc > highest_val_acc:
        highest_val_acc = curr_val_acc
        best_learning_rate = learning_rate
    
    # Get the parameters from the trained model that used the given learning rate.
    all_params[learning_rate] = lr.get_parameters()
    
    # Get the cost history from training the model with the given learning rate.
    all_cost_histories[learning_rate] = lr.get_cost_history()

CPU times: total: 0 ns
Wall time: 8.23 ms
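
The sigmoid in the class is written as `1 / (1 + np.exp(-z))`. With unscaled pixel features and learning rates as large as 1.0, `z` can become strongly negative, in which case `np.exp(-z)` overflows and NumPy emits a `RuntimeWarning`. One common way around this is to split the computation by the sign of `z`. The sketch below assumes array input, and `stable_sigmoid` is a hypothetical name, not something the notebook defines.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in exp by branching on the sign of z."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0

    # For z >= 0, exp(-z) <= 1, so the usual form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))

    # For z < 0, use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```

Both branches compute the same function; only the algebraic form differs, so swapping this in would not change the predicted labels.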




In [9]:
print(f'Best learning rate: {best_learning_rate}')
print(f'Highest accuracy: {highest_val_acc}')

Best learning rate: 0.0001
Highest accuracy: 0.9772727272727273
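
The search loop replaces the current best only when the new validation accuracy is strictly greater, so if several learning rates tie, the one tried first (here the smallest) is kept. A minimal sketch of that selection rule follows. The accuracies are made up and are not results from this notebook, and `pick_best` is a hypothetical helper.

```python
def pick_best(val_accuracies):
    """Return the first (rate, accuracy) pair with the highest accuracy."""
    best_rate, best_acc = None, float('-inf')
    for rate, acc in val_accuracies.items():
        # Strict '>' means a later tie never replaces an earlier winner
        if acc > best_acc:
            best_rate, best_acc = rate, acc
    return best_rate, best_acc

# Hypothetical validation accuracies, listed in the order the rates are tried
demo_accuracies = {1.0e-4: 0.95, 1.0e-3: 0.95, 1.0e-2: 0.90}
best_rate, best_acc = pick_best(demo_accuracies)
print(best_rate, best_acc)
```

If ties should instead favor a larger learning rate, iterating over the rates in descending order would be one way to do it.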


## Predict Test Set

Now that we have the best learning rate, let's retrain with it and raise the maximum number of iterations to 100000 to see whether the cost can be driven to 0, then predict the test set.

In [10]:
%%time

lr = LogisticRegression(best_learning_rate, 100000)
lr.train(X_train, y_train)
y_test_preds = lr.predict(X_test)

CPU times: total: 0 ns
Wall time: 3.2 ms


In [11]:
lr.compute_accuracy(y_test_preds, y_test)

1.0
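
Accuracy is a single number. With two classes it can also help to see which kinds of mistakes, if any, were made. The sketch below counts them for made-up predictions (the arrays are hypothetical, not this notebook's outputs), and `binary_confusion_counts` is an illustrative helper.

```python
import numpy as np

def binary_confusion_counts(predictions, targets):
    """Count true/false positives and negatives for 0/1 labels."""
    predictions = np.asarray(predictions)
    targets = np.asarray(targets)
    return {
        'tp': int(np.sum((predictions == 1) & (targets == 1))),
        'tn': int(np.sum((predictions == 0) & (targets == 0))),
        'fp': int(np.sum((predictions == 1) & (targets == 0))),
        'fn': int(np.sum((predictions == 0) & (targets == 1))),
    }

# Made-up labels, for illustration only
demo_preds = np.array([1, 0, 1, 1, 0, 0])
demo_targets = np.array([1, 0, 0, 1, 0, 1])

counts = binary_confusion_counts(demo_preds, demo_targets)
demo_accuracy = (counts['tp'] + counts['tn']) / len(demo_targets)
print(counts, demo_accuracy)
```

An accuracy of 1.0 corresponds to `fp` and `fn` both being zero.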

## Using Scikit-Learn

In [12]:
from sklearn.linear_model import LogisticRegression as SKLogisticRegression

lr_sk = SKLogisticRegression()
lr_sk.fit(X_train, y_train)


In [13]:
print("Scikit-learn Logistic Regression accuracy on test set:", lr_sk.score(X_test, y_test))

Scikit-learn Logistic Regression accuracy on test set: 1.0


## Conclusion

Our own logistic regression model and scikit-learn's `LogisticRegression` model both classified every example in the test set correctly (accuracy of 1.0)!