## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
from libs.dataloader import load_and_split_data
from libs.utils import find_optimal_hyperparameters, load_model_from_json, fit_and_evaluate

### Load and split the dataset

In [2]:
X_train, X_test, y_train, y_test = load_and_split_data('data/training_data_preprocessed.csv',
                                                       target_column='increase_stock',
                                                       class_zero='low_bike_demand',
                                                       test_size=0.2,
                                                       random_state=0)

### Create, fit and evaluate the initial model

In [3]:
model = LogisticRegression(max_iter=5000)

results = fit_and_evaluate(model,
                           X_train,
                           y_train,
                           X_test,
                           y_test,
                           verbose=True)

Evaluating LogisticRegression
Accuracy: 0.8625
Precision: 0.6400
Recall: 0.5517
F1: 0.5926
ROC AUC: 0.8952
Confusion Matrix: 
[[244  18]
 [ 26  32]]



We focus on recall for the high_bike_demand class because misclassifying high_bike_demand as low_bike_demand (a false negative) is more costly than the reverse. Predicting low_bike_demand during a high-demand period leaves too few bikes available for users, whereas overestimating demand only produces surplus bikes, which is less disruptive.
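To make the link between the two error types and the reported scores concrete, the sketch below recomputes the metrics from the confusion matrix printed above. It assumes scikit-learn's layout (rows are true labels, columns are predictions) and that class 1 is high_bike_demand; the variable names are illustrative. Missed high-demand periods are the false negatives, which lower recall, while surplus-bike periods are the false positives, which lower precision.

```python
import numpy as np

# Confusion matrix of the initial model, assuming scikit-learn's layout:
# rows = true class, columns = predicted class, class 1 = high_bike_demand.
cm = np.array([[244, 18],
               [26, 32]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # penalizes false positives (surplus bikes)
recall = tp / (tp + fn)      # penalizes false negatives (too few bikes)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1:        {f1:.4f}")
```

The 26 false negatives are the high-demand periods the model missed, so they are the quantity to watch when comparing models.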

### Find optimal hyperparameters

In [4]:
# Define the hyperparameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],   # Inverse regularization strength (smaller C = stronger)
    'penalty': ['l1', 'l2'],        # Regularization type
    'solver': ['liblinear', 'saga'] # Solver options compatible with L1/L2
}

find_optimal_hyperparameters(LogisticRegression,
                             param_grid,
                             X_train,
                             y_train,
                             cv=5,
                             scoring='accuracy',
                             save_dir='output/best_params',
                             save_file='logreg_best_params.json')

Best parameters found:  {'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}
Saving best parameters to 'output/best_params/logreg_best_params.json'


{'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}
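The source of `find_optimal_hyperparameters` in `libs.utils` is not shown here. As a rough sketch of how such a search could be run directly with scikit-learn, the example below uses `GridSearchCV` with the same grid, fold count, and scoring string. It is an assumption about the approach, not the helper's actual implementation, and it runs on synthetic data from `make_classification` rather than the bike-demand dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in data (NOT the bike-demand dataset).
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],    # Inverse regularization strength
    'penalty': ['l1', 'l2'],         # Regularization type
    'solver': ['liblinear', 'saga']  # Both solvers support L1 and L2
}

# Exhaustive search over the grid with 5-fold cross-validation.
search = GridSearchCV(LogisticRegression(max_iter=5000),
                      param_grid,
                      cv=5,
                      scoring='accuracy')
search.fit(X, y)

best_params = search.best_params_
print("Best parameters found: ", best_params)
print(f"Best cross-validated score: {search.best_score_:.4f}")
```

`scoring` accepts other scikit-learn scorer names as well. Passing `scoring='recall'` would select the parameters that miss the fewest positive-class samples instead of those with the highest overall accuracy.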

### Use optimal hyperparameters to train and evaluate

In [5]:
opt_model = load_model_from_json(LogisticRegression, 'output/best_params/logreg_best_params.json')

opt_results = fit_and_evaluate(opt_model, 
                               X_train, 
                               y_train, 
                               X_test, 
                               y_test,
                               verbose=True)

Evaluating LogisticRegression
Accuracy: 0.8688
Precision: 0.6538
Recall: 0.5862
F1: 0.6182
ROC AUC: 0.9025
Confusion Matrix: 
[[244  18]
 [ 24  34]]
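The `load_model_from_json` call used in this cell also comes from `libs.utils`, whose source is not shown. A minimal sketch of what such a helper might do is given below, assuming the JSON file holds a flat dictionary of constructor keyword arguments. The function name `build_model_from_json` and the temporary file are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile

from sklearn.linear_model import LogisticRegression


def build_model_from_json(model_class, path):
    """Hypothetical helper: read keyword arguments from a JSON file
    and use them to instantiate the given estimator class."""
    with open(path) as f:
        params = json.load(f)
    return model_class(**params)


# Parameters reported by the search above, written to a temporary file
# so the example does not depend on the project's output directory.
best_params = {'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'logreg_best_params.json')
    with open(path, 'w') as f:
        json.dump(best_params, f)
    model = build_model_from_json(LogisticRegression, path)

loaded = model.get_params()
print(loaded['C'], loaded['penalty'], loaded['solver'])
```

Storing only the parameters, rather than a fitted model, keeps the file human-readable and means the model is always refit on the current training split.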

