# Logistic Regression with Implementation

## 1. Introduction

Logistic regression is one of the most fundamental machine learning models for binary classification. I will summarize its methodology and implement it from scratch using NumPy.

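### Binary classification

For example, a doctor may want to use a patient's tumor features, such as mean radius and mean texture, to classify a breast tumor into one of two cases:

- "malignant": cancerous, the serious case
- "benign": non-cancerous, the mild case

In scikit-learn's copy of this dataset, the labels are encoded as $y = 0$ for "malignant" and $y = 1$ for "benign" (`load_breast_cancer().target_names` is `['malignant', 'benign']`).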

We load the breast cancer dataset from scikit-learn as a toy dataset and split it into training and test sets.

## 2. Logistic Regression Model
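Here is a minimal statement of the standard model, written in its usual form. Whether `np_logistic_regression` averages the loss over each mini-batch exactly like this is an assumption. With weights $w \in \mathbb{R}^d$ and intercept $b$, the model passes a linear score through the sigmoid function. It is trained by minimizing the average binary cross-entropy, whose gradient has a simple closed form:

$$
\begin{aligned}
z &= w^\top x + b, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \hat{p}(x) = P(y = 1 \mid x) = \sigma(w^\top x + b) \\
L(w, b) &= -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \hat{p}(x_i) + (1 - y_i) \log\big(1 - \hat{p}(x_i)\big) \Big] \\
\nabla_w L &= \frac{1}{n} \sum_{i=1}^{n} \big(\hat{p}(x_i) - y_i\big)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \big(\hat{p}(x_i) - y_i\big)
\end{aligned}
$$

A sample is labeled 1 when $\hat{p}(x) > 0.5$, which is equivalent to $w^\top x + b > 0$.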

[To be continued.]

## 3. Data Preparation and Preprocessing

In [1]:
```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression as LogisticRegressionSklearn

# The two helper modules below come from the author's repository and must be
# importable (e.g. placed in the working directory).

# https://github.com/bowen0701/machine-learning/blob/master/np_logistic_regression.py
from np_logistic_regression import LogisticRegression

# https://github.com/bowen0701/machine-learning/blob/master/np_metrics.py
from np_metrics import accuracy
```

In [2]:
```python
# IPython magic: automatically reload imported modules before running code.
%load_ext autoreload
%autoreload 2
```

In [3]:
```python
# Load the breast cancer features X and labels y.
X, y = load_breast_cancer(return_X_y=True)
```

In [4]:
```python
X.shape, y.shape
```

((569, 30), (569,))

In [5]:
```python
X[:3]
```

array([[1.799e+01, 1.038e+01, 1.228e+02, 1.001e+03, 1.184e-01, 2.776e-01,
        3.001e-01, 1.471e-01, 2.419e-01, 7.871e-02, 1.095e+00, 9.053e-01,
        8.589e+00, 1.534e+02, 6.399e-03, 4.904e-02, 5.373e-02, 1.587e-02,
        3.003e-02, 6.193e-03, 2.538e+01, 1.733e+01, 1.846e+02, 2.019e+03,
        1.622e-01, 6.656e-01, 7.119e-01, 2.654e-01, 4.601e-01, 1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, 1.326e+03, 8.474e-02, 7.864e-02,
        8.690e-02, 7.017e-02, 1.812e-01, 5.667e-02, 5.435e-01, 7.339e-01,
        3.398e+00, 7.408e+01, 5.225e-03, 1.308e-02, 1.860e-02, 1.340e-02,
        1.389e-02, 3.532e-03, 2.499e+01, 2.341e+01, 1.588e+02, 1.956e+03,
        1.238e-01, 1.866e-01, 2.416e-01, 1.860e-01, 2.750e-01, 8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, 1.203e+03, 1.096e-01, 1.599e-01,
        1.974e-01, 1.279e-01, 2.069e-01, 5.999e-02, 7.456e-01, 7.869e-01,
        4.585e+00, 9.403e+01, 6.150e-03, 4.006e-02, 3.832e-02, 2.058e-02,
        2.250e-02, 4.571e-03, 2.357e

In [6]:
```python
y[:5]
```

array([0, 0, 0, 0, 0])

In [7]:
```python
# Split data into training (75%) and test (25%) sets, stratified by class label.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=71, shuffle=True, stratify=y)
```

In [8]:
```python
print(X_train_raw.shape, y_train.shape)
print(X_test_raw.shape, y_test.shape)
```

(426, 30) (426,)
(143, 30) (143,)
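Because `stratify=y` was passed, both splits should keep roughly the dataset's class ratio. The scikit-learn documentation lists 212 malignant and 357 benign samples. A quick NumPy check can be sketched with a hypothetical helper `class_proportions`; here it runs on toy label arrays rather than the real split:

```python
import numpy as np

def class_proportions(labels):
    """Return the fraction of samples in each class 0..k-1."""
    counts = np.bincount(labels)
    return counts / counts.sum()

# Hypothetical toy labels standing in for y_train and y_test.
toy_train = np.array([0, 0, 1, 1, 1, 1, 0, 1, 1, 1])
toy_test = np.array([0, 1, 1, 1, 0])
print(class_proportions(toy_train))  # [0.3 0.7]
print(class_proportions(toy_test))   # [0.4 0.6]
```

On the notebook's data, the same check would be `class_proportions(y_train)` and `class_proportions(y_test)`.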


In [9]:
```python
# Rescale each feature to [0, 1] with a min-max scaler.
# Fit on the training data only, then apply the same scaling to the test data.
min_max_scaler = MinMaxScaler()

X_train = min_max_scaler.fit_transform(X_train_raw)
X_test = min_max_scaler.transform(X_test_raw)
```
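`MinMaxScaler` maps each feature with $x' = (x - \min) / (\max - \min)$, taking the minimum and maximum per column from the training data only. The test set reuses those statistics, which is why it calls `transform` rather than `fit_transform`. Below is a NumPy sketch of the same idea using hypothetical helper names. Constant columns keep a range of 1 here to avoid division by zero.

```python
import numpy as np

def minmax_fit(X_train):
    """Learn the per-column minimum and range from the training data."""
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # Avoid division by zero for constant columns.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def minmax_transform(X, col_min, col_range):
    """Apply the training-set statistics to any dataset."""
    return (X - col_min) / col_range

X_tr = np.array([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])
X_te = np.array([[6.0, 15.0]])

col_min, col_range = minmax_fit(X_tr)
print(minmax_transform(X_tr, col_min, col_range))
print(minmax_transform(X_te, col_min, col_range))  # [[1.25 0.25]]
```

Test values can land outside $[0, 1]$ (1.25 above). This is expected, because the test set never influences the scaling statistics.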

## 4. Fitting Logistic Regression from Scratch

In [10]:
```python
# Fit our from-scratch logistic regression.
clf = LogisticRegression(batch_size=100, lr=5, n_epochs=100)

clf.fit(X_train, y_train)
```

epoch 1: loss 1.7624097476465403
epoch 11: loss 0.08531405379884446
epoch 21: loss 0.07546780421763014
epoch 31: loss 0.06061107482508217
epoch 41: loss 0.07066675347132022
epoch 51: loss 0.06624984449545909
epoch 61: loss 0.07946156456165038
epoch 71: loss 0.04244837018695555
epoch 81: loss 0.04599355450292453
epoch 91: loss 0.20816019466056923


<np_logistic_regression.LogisticRegression at 0x7f3e902949b0>
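The constructor arguments `batch_size`, `lr`, and `n_epochs` suggest mini-batch gradient descent. Below is a minimal NumPy sketch of that procedure, built from the cross-entropy gradient of the standard model. It is not the code in `np_logistic_regression.py`. Reshuffling every epoch and averaging the gradient over each batch are assumptions, and the function name `fit_minibatch_gd` is hypothetical. To stay self-contained, it runs on hypothetical synthetic data with its own hyperparameters.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def cross_entropy(y, p, eps=1e-12):
    """Average binary cross-entropy; p is clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_minibatch_gd(X, y, batch_size=100, lr=0.5, n_epochs=100, seed=0):
    """Hypothetical mini-batch gradient descent for logistic regression."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_epochs):
        idx = rng.permutation(n)  # reshuffle the samples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            err = sigmoid(Xb @ w + b) - yb  # (p - y) for each sample
            w -= lr * (Xb.T @ err) / len(batch)
            b -= lr * err.mean()
    return w, b

# Hypothetical synthetic data: the label is 1 when x0 + x1 > 0.
rng = np.random.default_rng(42)
X_demo = rng.uniform(-1, 1, size=(400, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

w, b = fit_minibatch_gd(X_demo, y_demo, batch_size=100, lr=0.5, n_epochs=200)
p_demo = sigmoid(X_demo @ w + b)
print("loss:", cross_entropy(y_demo, p_demo))
print("accuracy:", np.mean((p_demo > 0.5) == y_demo))
```

The gradient is averaged over the batch, so `lr` does not need retuning when `batch_size` changes.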

In [11]:
```python
# Get the fitted coefficients.
clf.get_coeff()
```

(array([[12.58238886]]),
 array([-1.44828804, -2.4451783 , -1.54606817, -2.38209972, -0.53998014,
         0.1065096 , -3.59740319, -5.51118006, -0.75081283,  2.81550995,
        -4.66326531, -0.2845859 , -3.66005788, -2.97214053,  1.01134402,
         2.87572042,  1.54450324,  0.94991713,  2.08574918,  2.22362632,
        -4.17370632, -4.0605423 , -3.81021779, -3.88435588, -2.68032624,
        -0.96638358, -2.6955358 , -4.65629093, -2.06641196, -0.60069727]))
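From the shapes, the first array appears to be the intercept and the second the 30 feature weights. This reading of `get_coeff`'s return order is an assumption. Under that reading, each predicted probability is the sigmoid of the linear score. The sketch below shows this with a hypothetical `predict_proba` helper and toy coefficients, not the fitted values above:

```python
import numpy as np

def predict_proba(X, intercept, weights):
    """P(y = 1 | x) = sigmoid(intercept + X @ weights)."""
    z = intercept + X @ weights
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy coefficients and inputs.
intercept = 0.5
weights = np.array([2.0, -1.0])
X_toy = np.array([[0.0, 0.5], [1.0, 0.0], [0.0, 3.0]])
p_toy = predict_proba(X_toy, intercept, weights)
print(p_toy)
```

The scores here are 0, 2.5, and -2.5, so the first probability is exactly 0.5 and the other two are symmetric around it. A negative weight lowers the probability of class 1 as its feature grows.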

In [12]:
```python
# Predicted probabilities for training data.
p_pred_train = clf.predict(X_train)
p_pred_train
```

array([9.81739936e-01, 1.76616481e-10, 3.40336632e-04, 9.89404781e-01,
       9.94418978e-01, 9.96694044e-01, 6.72956664e-03, 2.70643618e-03,
       9.99589118e-01, 1.62120998e-02, 9.95502282e-01, 9.71949915e-01,
       7.44820761e-01, 2.85686599e-05, 5.11605598e-07, 7.95056599e-03,
       1.24678732e-02, 7.98166225e-07, 9.99651468e-01, 5.50698011e-10,
       9.63022033e-01, 9.98426830e-01, 1.06572570e-08, 8.14663624e-01,
       9.98933453e-01, 1.73137781e-01, 3.26778515e-01, 9.99991195e-01,
       3.98096368e-02, 7.14831513e-01, 1.17517045e-04, 2.95048101e-02,
       4.30810933e-06, 9.99835054e-01, 9.74804842e-02, 2.51839219e-03,
       9.97317151e-02, 9.72539417e-01, 1.27184203e-02, 2.85987048e-02,
       9.68963510e-01, 1.25343284e-02, 9.98863311e-01, 6.40478274e-01,
       1.13779740e-03, 5.84987403e-01, 9.97077869e-01, 1.34710237e-02,
       1.57891711e-05, 8.72927595e-08, 9.77424996e-01, 9.41670357e-01,
       3.85596191e-03, 9.99063057e-01, 2.64019581e-06, 9.94782397e-01,
      

In [13]:
```python
# Predicted labels for training data: class 1 if the probability exceeds 0.5.
y_pred_train = (p_pred_train > 0.5).astype(int)
y_pred_train
```

array([1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
       1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1,
       0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1,
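In exact arithmetic, thresholding the probability at 0.5 is equivalent to checking the sign of the linear score, because $\sigma(z) > 0.5$ exactly when $z > 0$. The sketch below uses hypothetical toy scores. It also shows `astype(int)` as a more explicit alternative to multiplying a boolean array by 1:

```python
import numpy as np

z = np.array([-3.0, -0.1, 0.0, 0.2, 4.0])   # hypothetical linear scores
p = 1.0 / (1.0 + np.exp(-z))                 # sigmoid probabilities

labels_from_p = (p > 0.5).astype(int)
labels_from_z = (z > 0).astype(int)
print(labels_from_p)                                 # [0 0 0 1 1]
print(np.array_equal(labels_from_p, labels_from_z))  # True
```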

In [14]:
```python
# Predicted label correctness for training data.
y_pred_train == y_train
```

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,

In [15]:
```python
# Predicted label correctness for test data.
p_pred_test = clf.predict(X_test)
y_pred_test = (p_pred_test > 0.5).astype(int)
y_pred_test == y_test
```

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,
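Scanning a long boolean array by eye is error-prone, especially when the printed output is cut off. `np.flatnonzero` returns the positions where a condition holds, so the misclassified samples can be listed directly. The arrays below are hypothetical stand-ins for `y_test` and `y_pred_test`:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

wrong = np.flatnonzero(y_pred != y_true)
print(wrong)  # [2 5]
print(len(wrong), "errors out of", len(y_true))
```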

In [16]:
```python
# Prediction accuracy for test data.
accuracy(y_test, y_pred_test)
```

0.9790209790209791
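Accuracy is the fraction of matching labels. With 143 test samples, 0.9790209790209791 equals 140/143, i.e. 3 misclassified samples. `np_metrics.accuracy` lives in the author's repository. The one-line NumPy equivalent below, named `accuracy_np` (hypothetical), is an assumption about its behavior, not its actual source:

```python
import numpy as np

def accuracy_np(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Hypothetical labels: 143 samples with exactly 3 mistakes.
y_true = np.zeros(143, dtype=int)
y_pred = y_true.copy()
y_pred[[10, 50, 100]] = 1
print(accuracy_np(y_true, y_pred))  # 140/143, about 0.979
```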

## 5. Fitting Sklearn's Logistic Regression

In [17]:
```python
# Fit sklearn's Logistic Regression; a large C means very weak L2 regularization.
clf2 = LogisticRegressionSklearn(C=1e4, solver='lbfgs', max_iter=500)

clf2.fit(X_train, y_train)
```

LogisticRegression(C=10000.0, class_weight=None, dual=False,
          fit_intercept=True, intercept_scaling=1, max_iter=500,
          multi_class='warn', n_jobs=None, penalty='l2', random_state=None,
          solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)
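In scikit-learn, `C` is the inverse of the regularization strength. For the L2 penalty, the objective in scikit-learn's documentation is $\tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \log\big(1 + e^{-y_i (x_i^\top w + b)}\big)$, with labels $y_i \in \{-1, +1\}$. With the `lbfgs` solver the intercept is not penalized. So `C=1e4` makes the penalty nearly negligible, which brings the fit closer to the from-scratch model, whose constructor above exposes no penalty parameter. A NumPy sketch of that objective, under a hypothetical function name:

```python
import numpy as np

def l2_logistic_objective(w, b, X, y01, C):
    """L2-penalized logistic loss in the form scikit-learn documents.

    y01 holds 0/1 labels; they are mapped to -1/+1 for the formula.
    The intercept b is not penalized.
    """
    y_pm = 2 * y01 - 1
    margins = y_pm * (X @ w + b)
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow.
    data_term = np.sum(np.logaddexp(0.0, -margins))
    return 0.5 * np.dot(w, w) + C * data_term

X_toy = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
y_toy = np.array([1, 0, 0])
w0, b0 = np.zeros(2), 0.0
# At w = 0, b = 0 every sample contributes log 2 to the data term.
print(l2_logistic_objective(w0, b0, X_toy, y_toy, C=1e4))
print(3 * 1e4 * np.log(2))
```

A large `C` lets the data term dominate, so large coefficients are only weakly discouraged.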

In [18]:
```python
# Get the intercept and coefficients.
clf2.intercept_, clf2.coef_
```

(array([56.05003313]),
 array([[  53.58643008,  -27.26324946,   48.30813261,   10.51429012,
          -14.72140714,   98.48590155,  -52.54093157,  -52.1529564 ,
           -5.09714283,  -53.94912949,  -33.90682397,   -5.47905937,
          -19.38082761,  -44.01004736,   38.82361457,  -51.40837833,
           83.34837589,  -21.99821834,   14.90317319,   80.03225554,
          -59.03462026,   -3.91874979,  -63.60095116, -103.93267556,
           -8.01066718,   20.06509642,  -22.04450625,  -21.2526363 ,
          -21.51142977,  -11.72401291]]))

In [19]:
```python
# Predicted labels for training data.
y_pred_train = clf2.predict(X_train)
y_pred_train
```

array([1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
       1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1,
       0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
       1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1,

In [20]:
```python
# Predicted label correctness for training data.
y_pred_train == y_train
```

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,

In [21]:
```python
# Predicted label correctness for test data.
y_pred_test = clf2.predict(X_test)
y_pred_test == y_test
```

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True, False,  True,  True,  True,  True,
        True, False,  True,  True,  True,  True,  True,  True, False,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,

In [22]:
```python
# Prediction accuracy for test data.
accuracy(y_test, y_pred_test)
```

0.965034965034965
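Here 0.965034965034965 equals 138/143, i.e. 5 misclassified test samples. Accuracy alone does not show which class the errors fall in. That matters here, because missing a malignant tumor is costlier than a false alarm. Below is a minimal NumPy confusion matrix under a hypothetical name, using hypothetical labels. Rows are true labels and columns are predicted labels, the same layout `sklearn.metrics.confusion_matrix` uses.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes=2):
    """Count (true label i, predicted label j) pairs into an n_classes x n_classes table."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 0, 1, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1])
print(confusion_matrix_np(y_true, y_pred))
# [[2 1]
#  [1 3]]
```

`np.add.at` is used instead of `cm[y_true, y_pred] += 1` because the unbuffered version correctly counts repeated index pairs.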