# Applying hyperparameter tuning to already created logistic regression

### Note:

In the logistic regression from scratch file, I had implemented a model which gave a linear decision boundary and after that I had also implemented the model that gave a quadratic boundary which helped increase accuracy further

In this file we will try to improve upon the linear logistic regression model by implementing it again but this time with hyperparameter tuning using random search

---

Lets start

First import the libraries

In [33]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

defining the class

In [34]:
class Logistic_Regression():

    def __init__(self, learning_rate=0.01, iter_num=1000, reg_strength=0, early_stopping=False, early_stopping_tol=1e-4, early_stopping_patience=5, weight_init='zeros'):
        self.learning_rate = learning_rate
        self.iter_num = iter_num
        self.reg_strength = reg_strength
        self.early_stopping = early_stopping
        self.early_stopping_tol = early_stopping_tol
        self.early_stopping_patience = early_stopping_patience
        self.weight_init = weight_init
        self.losses = []
        self.best_weights = None

    def fit(self, X, Y, X_val=None, Y_val=None):
        self.m, self.n = X.shape
        self.w, self.b = self.initialize_weights()
        self.X = X
        self.Y = Y

        if self.early_stopping:
            best_loss = np.inf
            patience_count = 0

        for i in range(self.iter_num):
            self.update_weights()
            loss = self.compute_loss()
            self.losses.append(loss)

            if self.early_stopping and X_val is not None and Y_val is not None:
                val_loss = self.compute_loss(X_val, Y_val)
                if val_loss < best_loss - self.early_stopping_tol:
                    best_loss = val_loss
                    patience_count = 0
                    self.best_weights = self.w, self.b
                else:
                    patience_count += 1
                    if patience_count >= self.early_stopping_patience:
                        break

    def initialize_weights(self):
        if self.weight_init == 'zeros':
            return np.zeros(self.n), 0
        elif self.weight_init == 'random':
            return np.random.randn(self.n), 0

    def update_weights(self):
        Y_hat = self.predict_proba(self.X)
        dw = (1 / self.m) * np.dot(self.X.T, (Y_hat - self.Y))
        db = (1 / self.m) * np.sum(Y_hat - self.Y)

        dw += (self.reg_strength / self.m) * self.w

        self.w = self.w - self.learning_rate * dw
        self.b = self.b - self.learning_rate * db

    def predict_proba(self, X):
        return 1 / (1 + np.exp(-(X.dot(self.w) + self.b)))

    def predict(self, X):
        return np.where(self.predict_proba(X) > 0.5, 1, 0)

    def compute_loss(self, X=None, Y=None):
        if X is None and Y is None:
            X, Y = self.X, self.Y

        Y_hat = self.predict_proba(X)
        loss = (-Y * np.log(Y_hat) - (1 - Y) * np.log(1 - Y_hat)).mean()

        loss += 0.5 * (self.reg_strength / self.m) * np.sum(self.w**2)

        return loss

defining the 'man of the match': the random search code snippet is shown as follows

Its a pretty simple implementation

Detailed comments have been provided in this code cell to make it crystal clear

In [35]:
    """
    Parameters of the function:
        X_train ndarray: Training feature data.
        Y_train ndarray: Training label data.
        X_val ndarray: Validation/test feature data.
        Y_val ndarray: Validation/testing label data.
        hyperparameters dict: Dictionary containing hyperparameter names as keys and their possible values as lists.
        num_combinations (int, optional): Number of random hyperparameter combinations to try. Default is 10.

    Return type of the function:
        tuple: A tuple containing the best set of hyperparameters and the corresponding accuracy on the validation set.
    """

def random_search(X_train, Y_train, X_val, Y_val, hyperparameters, num_combinations=10):
    best_hyperparams = None
    best_accuracy = -np.inf
    
    # Iterate over num_combinations random hyperparameter combinations
    for _ in range(num_combinations):
        # Randomly sample hyperparameters from the hyperparameter space
        params = {k: np.random.choice(v) for k, v in hyperparameters.items()}
        
        # Create a logistic regression model with the sampled hyperparameters
        model = Logistic_Regression(**params)
        
        # Train the model on the training dataset
        model.fit(X_train, Y_train, X_val, Y_val)
        
        # Evaluate the model's accuracy on the validation dataset
        accuracy = np.mean(model.predict(X_val) == Y_val)
        
        # Update the best hyperparameters and accuracy if the current model performs better
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_hyperparams = params

    return best_hyperparams, best_accuracy

This point onwards the code is pretty simple and straightforward

In [36]:
# Load the train dataset
train_dataset = pd.read_csv('ds1_train.csv')
train_dataset.head()

# Extract the feature columns (X) and the output label column (Y)
X_train = train_dataset[['x_1', 'x_2']].values
Y_train = train_dataset['y'].values

Note: Unlike last time in the scratch file, here i have normalised using standard deviation, for better normalisation

In [37]:
# Normalize the features using standardization 
# standardization is normalisation with use of standard deviation and not Xmax and Xmin in the denominator
def standardize_features(X):
    X_mean = np.mean(X, axis=0)
    X_std = np.std(X, axis=0)
    X_standardized = (X - X_mean) / X_std
    return X_standardized

X_train = standardize_features(X_train)

In [38]:
# Load the validation dataset 
val_dataset = pd.read_csv('ds1_test.csv')
X_val = val_dataset[['x_1', 'x_2']].values
Y_val = val_dataset['y'].values

# Normalize the features for validation data using the same standardization parameters from training data
X_val = standardize_features(X_val)

The code block below is the main block where all the magic happens

Hyperparameter choosable values are set which are then passed to random search to find their possible best combination

Then that combination gets fit to the model at the end

In [39]:
# Hyperparameters to tune being set to be chosen by the random search model
hyperparameters = {
    'learning_rate': [0.01, 0.1,1],
    'iter_num': [20000, 25000,10000,100000,50000],
    'reg_strength': [0, 0.01, 0.1, 1],
    'early_stopping': [False,True],
    'weight_init': ['zeros', 'random']
}

# Perform random search to find the best hyperparameters
best_hyperparams, best_accuracy = random_search(X_train, Y_train, X_val, Y_val, hyperparameters, num_combinations=20)

# we have set a big variety of different hyperparameter values to check for random search
# also we have chosen 20 combinations to be random searched through
# hence the model takes some time to train

print("Best Hyperparameters:", best_hyperparams)
print("Best Accuracy:", best_accuracy)

# Fit the model with the best hyperparameters on the entire training dataset
best_model = Logistic_Regression(**best_hyperparams)
best_model.fit(X_train, Y_train)


Best Hyperparameters: {'learning_rate': 0.01, 'iter_num': 50000, 'reg_strength': 1.0, 'early_stopping': True, 'weight_init': 'random'}
Best Accuracy: 0.85


This is the best output as you can see: 

"Best Hyperparameters: {'learning_rate': 0.01, 'iter_num': 50000, 'reg_strength': 1.0, 'early_stopping': True, 'weight_init': 'random'}
Best Accuracy: 0.85 "

Hence a 2% increase has occured in the linear decision boundary model after applying random search and tuning the hyperparameters

Quite a lot of reasons have caused the increment to be not quite big. Some of them which I suspect are the case here:

* the amount of training data is not big enough.
* avoiding using feature engineering to increase accuracy as it technically is not a hyperparameter, but rather a technique to enhance the model by adding more levels of complexity in the decision boundary and hence in the overall model. Also utilizing polynomial logistic regression is not so easy as there is a risk of overfitting quite easily. Hence proper preprocessing, and technique usage like regularisation has to be executed in order to ensure that technique's success.
* also the model is a simple logistic regression which isnt the best classifier model out there
* the accuracy for the simple model was already near its peak value due to me manually setting good values by hit and trial to get best output in the scratch code, and so not much increment could be achieved due to the sheer less room left for improvement
* I am not a professional, as I am also currently learning and am fairly new to the topics of ML and DL, so I am not qualified enough or skillful enough to write better code than the one I am currently providing

### Note:

Since the random search uses a different seed each time it's code block is run, there was one instance where the accuracy i achieved was 89% with certain parameter combinations. but i didnt copy paste that output. Hence after realising that everytime randomsearch will give different best outputs, I decided to just copy paste the best output I currently got, as shown in the markdown cells above.

# Thank you

---