# Logistic Regression with Implementation

## Introduction

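Logistic regression is one of the most fundamental machine learning models for binary classification. I will summarize its methodology and implement it from scratch using NumPy, then repeat the implementation in PyTorch and TensorFlow and benchmark all three against scikit-learn.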

The problem we solve is **binary classification**. For example, a doctor would like to use a patient's features, such as mean radius and mean texture, to classify a breast tumor into one of the following two cases:

- "malignant": $y = 0$
- "benign": $y = 1$

which correspond to the serious and the mild case, respectively. This is the label encoding used by scikit-learn's breast cancer dataset.

We will load the breast cancer data from scikit-learn as a toy dataset, and split the data into the training and test datasets.

## Logistic Regression Model

[To be continued.]
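
For reference, the implementations below appear to assume the standard formulation of logistic regression. The following is a brief sketch of that formulation, not the author's own derivation; the symbols $w$, $b$, $z_i$, and $\sigma$ are notation introduced here to match the `w`, `b`, `logit`, and `_sigmoid` names used in the code.

```latex
\begin{aligned}
z_i &= w^\top x_i + b, \qquad
p(y_i = 1 \mid x_i) = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}, \\
L(w, b) &= -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \log \sigma(z_i) + (1 - y_i) \log\big(1 - \sigma(z_i)\big) \Big], \\
\frac{\partial L}{\partial w} &= \frac{1}{n} \sum_{i=1}^{n} \big(\sigma(z_i) - y_i\big)\, x_i, \qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \big(\sigma(z_i) - y_i\big).
\end{aligned}
```

The two gradient expressions are what `_optimize` computes as `dw` and `db` on each mini-batch.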

## NumPy Implementation of Logistic Regression

In [2]:
import random
import numpy as np

# PyTorch imports.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# TensorFlow import.
import tensorflow as tf

In [3]:
np.random.seed(71)
random.seed(71)  # get_data() shuffles with Python's random module, not NumPy.

In [4]:
class LogisticRegression(object):
    """Numpy implementation of Logistic Regression."""

    def __init__(self, batch_size=64, lr=0.01, n_epochs=1000):
        self.batch_size = batch_size
        self.lr = lr
        self.n_epochs = n_epochs

    def get_data(self, X_train, y_train, shuffle=True):
        """Get dataset and information."""
        self.X_train = X_train
        self.y_train = y_train

        # Get the numbers of examples and inputs.
        self.n_examples, self.n_inputs = self.X_train.shape

        if shuffle:
            idx = list(range(self.n_examples))
            random.shuffle(idx)
            self.X_train = self.X_train[idx]
            self.y_train = self.y_train[idx]

    def _create_weights(self):
        """Create model weights and bias."""
        self.w = np.zeros(self.n_inputs).reshape(self.n_inputs, 1)
        self.b = np.zeros(1).reshape(1, 1)

    def _logit(self, X):
        """Logit: unnormalized log probability."""
        return np.matmul(X, self.w) + self.b

    def _sigmoid(self, logit):
        """Sigmoid function by stabilization trick.

        sigmoid(z) = 1 / (1 + exp(-z)) 
                   = exp(z) / (1 + exp(z)) * exp(-z_max) / exp(-z_max)
                   = exp(z - z_max) / (exp(-z_max) + exp(z - z_max)),
        where z is the logit, and z_max = max(0, z).
        """
        logit_max = np.maximum(0, logit)
        logit_stable = logit - logit_max
        return np.exp(logit_stable) / (np.exp(-logit_max) + np.exp(logit_stable))
    
    def _model(self, X):
        """Logistic regression model."""
        logit = self._logit(X)
        return self._sigmoid(logit)

    def _loss(self, y, logit):
        r"""Cross entropy loss by stabilization trick.

        cross_entropy_loss(y, z)
          = - 1/n * \sum_{i=1}^n [y_i * log p(y_i = 1|x_i) + (1 - y_i) * log p(y_i = 0|x_i)]
          = - 1/n * \sum_{i=1}^n [y_i * (z_i - log(1 + exp(z_i))) + (1 - y_i) * (-log(1 + exp(z_i)))],
        where z is the logit, z_max = max(0, z),
          log p(y = 1|x)
            = log (1 / (1 + exp(-z)))
            = log (exp(z) / (1 + exp(z)))
            = z - log(1 + exp(z)),
          log p(y = 0|x)
            = log (1 / (1 + exp(z)))
            = -log(1 + exp(z)),
        and
          log(1 + exp(z)) := logsumexp(z)
            = log(exp(0) + exp(z))
            = log((exp(-z_max) + exp(z - z_max)) * exp(z_max))
            = z_max + log(exp(-z_max) + exp(z - z_max)).
        """
        logit_max = np.maximum(0, logit)
        logit_stable = logit - logit_max
        logsumexp_stable = logit_max + np.log(np.exp(-logit_max) + np.exp(logit_stable))
        self.cross_entropy = -(y * (logit - logsumexp_stable) + (1 - y) * (-logsumexp_stable))
        return np.mean(self.cross_entropy)

    def _optimize(self, X, y):
        """Optimize by stochastic gradient descent."""
        m = X.shape[0]

        y_ = self._model(X) 
        dw = 1 / m * np.matmul(X.T, y_ - y)
        db = np.mean(y_ - y)

        for (param, grad) in zip([self.w, self.b], [dw, db]):
            param[:] = param - self.lr * grad

    def _fetch_batch(self):
        """Fetch batch dataset."""
        idx = list(range(self.n_examples))
        for i in range(0, self.n_examples, self.batch_size):
            idx_batch = idx[i:min(i + self.batch_size, self.n_examples)]
            yield (self.X_train.take(idx_batch, axis=0), self.y_train.take(idx_batch, axis=0))

    def fit(self):
        """Fit model."""
        self._create_weights()

        for epoch in range(1, self.n_epochs + 1):
            total_loss = 0
            for X_train_b, y_train_b in self._fetch_batch():
                y_train_b = y_train_b.reshape((y_train_b.shape[0], -1))
                self._optimize(X_train_b, y_train_b)
                train_loss = self._loss(y_train_b, self._logit(X_train_b))
                total_loss += train_loss * X_train_b.shape[0]

            if epoch % 100 == 0:
                print('epoch {0}: training loss {1}'.format(epoch, total_loss / self.n_examples))

        return self

    def get_coeff(self):
        return self.b, self.w.reshape((-1,))

    def predict(self, X):
        return self._model(X).reshape((-1,))
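
The numerical pieces of the class can be sanity-checked in isolation. The sketch below re-implements the stabilized sigmoid and loss as standalone functions (`stable_sigmoid` and `stable_loss` are names introduced here, not part of the class), evaluates them on extreme logits, and compares the analytic gradient used in `_optimize` against a central finite difference on small random data. The toy data and the tolerance are assumptions chosen only for illustration.

```python
import numpy as np


def stable_sigmoid(z):
    """Sigmoid computed with the same max(0, z) shift as `_sigmoid`."""
    z_max = np.maximum(0, z)
    return np.exp(z - z_max) / (np.exp(-z_max) + np.exp(z - z_max))


def stable_loss(y, z):
    """Mean cross entropy from logits, mirroring `_loss`."""
    z_max = np.maximum(0, z)
    logsumexp = z_max + np.log(np.exp(-z_max) + np.exp(z - z_max))
    return np.mean(-(y * (z - logsumexp) + (1 - y) * (-logsumexp)))


# Extreme logits: the naive 1 / (1 + exp(-z)) overflows inside exp for z = -1000.
z_extreme = np.array([-1000.0, 0.0, 1000.0])
p_extreme = stable_sigmoid(z_extreme)

# Toy data for a finite-difference check of dw = X^T (p - y) / m.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.integers(0, 2, size=(8, 1)).astype(np.float64)
w = rng.normal(size=(3, 1))
b = 0.1

dw_analytic = X.T @ (stable_sigmoid(X @ w + b) - y) / X.shape[0]

eps = 1e-6
dw_numeric = np.zeros_like(w)
for j in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[j, 0] += eps
    w_minus[j, 0] -= eps
    dw_numeric[j, 0] = (
        stable_loss(y, X @ w_plus + b) - stable_loss(y, X @ w_minus + b)
    ) / (2 * eps)

max_abs_diff = np.max(np.abs(dw_analytic - dw_numeric))
print(p_extreme, max_abs_diff)
```

If the gradient formula and the loss agree, `max_abs_diff` should be close to zero, limited only by floating-point error in the finite difference.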

## Data Preparation and Preprocessing

In [5]:
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression as LogisticRegressionSklearn

# https://github.com/bowen0701/machine-learning/blob/master/metrics.py
from metrics import accuracy

In [6]:
%load_ext autoreload
%autoreload 2

In [7]:
# Read breast cancer data.
X, y = load_breast_cancer(return_X_y=True)

In [8]:
X.shape, y.shape

((569, 30), (569,))

In [9]:
X[:3]

array([[1.799e+01, 1.038e+01, 1.228e+02, 1.001e+03, 1.184e-01, 2.776e-01,
        3.001e-01, 1.471e-01, 2.419e-01, 7.871e-02, 1.095e+00, 9.053e-01,
        8.589e+00, 1.534e+02, 6.399e-03, 4.904e-02, 5.373e-02, 1.587e-02,
        3.003e-02, 6.193e-03, 2.538e+01, 1.733e+01, 1.846e+02, 2.019e+03,
        1.622e-01, 6.656e-01, 7.119e-01, 2.654e-01, 4.601e-01, 1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, 1.326e+03, 8.474e-02, 7.864e-02,
        8.690e-02, 7.017e-02, 1.812e-01, 5.667e-02, 5.435e-01, 7.339e-01,
        3.398e+00, 7.408e+01, 5.225e-03, 1.308e-02, 1.860e-02, 1.340e-02,
        1.389e-02, 3.532e-03, 2.499e+01, 2.341e+01, 1.588e+02, 1.956e+03,
        1.238e-01, 1.866e-01, 2.416e-01, 1.860e-01, 2.750e-01, 8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, 1.203e+03, 1.096e-01, 1.599e-01,
        1.974e-01, 1.279e-01, 2.069e-01, 5.999e-02, 7.456e-01, 7.869e-01,
        4.585e+00, 9.403e+01, 6.150e-03, 4.006e-02, 3.832e-02, 2.058e-02,
        2.250e-02, 4.571e-03, 2.357e

In [10]:
y[:3]

array([0, 0, 0])

In [11]:
# Split data into training and test datasets.
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=71, shuffle=True, stratify=y)

In [12]:
print(X_train_raw.shape, y_train.shape)
print(X_test_raw.shape, y_test.shape)

(426, 30) (426,)
(143, 30) (143,)


In [45]:
# Rescale each feature to [0, 1] with a min-max scaler fitted on the training data only.
min_max_scaler = MinMaxScaler()

X_train = min_max_scaler.fit_transform(X_train_raw)
X_test = min_max_scaler.transform(X_test_raw)
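
To make the fit/transform split explicit, here is a simplified NumPy sketch of min-max scaling. The helpers `min_max_fit` and `min_max_transform` are hypothetical names introduced for illustration and are not scikit-learn APIs; the toy arrays are made up. The point is that the minimum and range come from the training data alone, so scaled test values may fall outside [0, 1].

```python
import numpy as np


def min_max_fit(X):
    """Return the per-feature minimum and range of the training data."""
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # Guard against constant features to avoid dividing by zero.
    x_range[x_range == 0] = 1.0
    return x_min, x_range


def min_max_transform(X, x_min, x_range):
    """Scale X with statistics that were computed on the training data."""
    return (X - x_min) / x_range


X_tr = np.array([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
X_te = np.array([[4.0, 5.0]])

x_min, x_range = min_max_fit(X_tr)
X_tr_scaled = min_max_transform(X_tr, x_min, x_range)
X_te_scaled = min_max_transform(X_te, x_min, x_range)
print(X_tr_scaled)
print(X_te_scaled)
```

Fitting on the training split only keeps information about the test split out of preprocessing.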

In [47]:
# Convert arrays to float32.
X_train, X_test, y_train, y_test = (
    np.float32(X_train), np.float32(X_test), np.float32(y_train), np.float32(y_test))

In [48]:
X_train.dtype, y_train.dtype

(dtype('float32'), dtype('float32'))

## Fitting Logistic Regression in NumPy

In [12]:
# Fit our Logistic Regression.
logreg = LogisticRegression(batch_size=64, lr=1, n_epochs=1000)

In [13]:
# Load (and shuffle) the training data.
logreg.get_data(X_train, y_train, shuffle=True)

In [14]:
logreg.fit()

epoch 100: training loss 0.10471912386231665
epoch 200: training loss 0.08454669375629945
epoch 300: training loss 0.07554912526419114
epoch 400: training loss 0.07015805509606268
epoch 500: training loss 0.06646531267573902
epoch 600: training loss 0.06373787850011073
epoch 700: training loss 0.06162297443169833
epoch 800: training loss 0.05992532367850267
epoch 900: training loss 0.05852633216647383
epoch 1000: training loss 0.057349146010503435


<__main__.LogisticRegression at 0x7f844e186f60>

In [15]:
# Get coefficient.
logreg.get_coeff()

(array([[16.52129462]]),
 array([-1.40672925, -3.62553084, -1.42597638, -2.83856207, -1.53105707,
         1.69560885, -4.60904947, -7.00227206, -1.55716025,  2.80913001,
        -8.67203252, -0.33897257, -6.70854376, -5.02197434,  1.41026049,
         4.40956172,  1.93940382,  0.83835797,  3.44227294,  3.46665622,
        -5.4506232 , -4.78411994, -4.81910325, -5.12896557, -3.27458594,
        -0.90580351, -4.48998728, -4.91530489, -3.62362622, -1.15632904]))

In [16]:
# Predicted probabilities for training data.
p_train_ = logreg.predict(X_train)
p_train_[:10]

array([9.94886718e-01, 6.66396278e-13, 7.77193847e-05, 9.98286083e-01,
       9.99107123e-01, 9.99467033e-01, 3.61089969e-03, 1.01959963e-03,
       9.99965184e-01, 1.18665034e-02])

In [17]:
# Predicted labels for training data.
y_train_ = (p_train_ > 0.5) * 1
y_train_[:3]

array([1, 0, 0])

In [18]:
# Prediction accuracy for training data.
accuracy(y_train, y_train_)

0.9882629107981221
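
The `accuracy` helper is imported from an external `metrics.py` that is not shown here. A minimal stand-in, assuming it simply reports the fraction of matching labels, could look like the sketch below; `accuracy_sketch` is a hypothetical name and the real implementation in the linked repository may differ.

```python
import numpy as np


def accuracy_sketch(y_true, y_pred):
    """Fraction of predictions equal to the true labels (hypothetical helper)."""
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    return float(np.mean(y_true == y_pred))


# Made-up probabilities and float32 labels, thresholded at 0.5 as in the cells above.
p = np.array([0.99, 0.02, 0.40, 0.70])
y_true = np.array([1.0, 0.0, 1.0, 1.0], dtype=np.float32)
y_pred = (p > 0.5) * 1

acc = accuracy_sketch(y_true, y_pred)
print(acc)  # 0.75: the third example is predicted 0 but labeled 1
```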

In [19]:
# Predicted probabilities and labels for test data.
p_test_ = logreg.predict(X_test)
print(p_test_[:10])
y_test_ = (p_test_ > 0.5) * 1

[1.52413246e-04 9.99934493e-01 2.51019009e-03 9.97170130e-01
 4.25204590e-05 9.99858929e-01 2.60675238e-06 9.99999604e-01
 4.53260387e-01 4.70845180e-09]


In [20]:
# Prediction accuracy for test data.
accuracy(y_test, y_test_)

0.972027972027972

## PyTorch Implementation of Logistic Regression

In [64]:
class LogisticRegressionTorch(nn.Module):
    """PyTorch implementation of Logistic Regression."""

    def __init__(self, batch_size=64, lr=0.01, n_epochs=1000):
        super(LogisticRegressionTorch, self).__init__()
        self.batch_size = batch_size
        self.lr = lr
        self.n_epochs = n_epochs

    def get_data(self, X_train, y_train, shuffle=True):
        """Get dataset and information."""
        self.X_train = X_train
        self.y_train = y_train

        # Get the numbers of examples and inputs.
        self.n_examples, self.n_inputs = self.X_train.shape

        if shuffle:
            idx = list(range(self.n_examples))
            random.shuffle(idx)
            self.X_train = self.X_train[idx]
            self.y_train = self.y_train[idx]

    def _create_model(self):
        """Create logistic regression model."""
        self.model = nn.Sequential(
            nn.Linear(self.n_inputs, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.model(x)
        return y

    def _create_loss(self):
        """Create (binary) cross entropy loss."""
        self.criterion = nn.BCELoss()

    def _create_optimizer(self):
        """Create optimizer by stochastic gradient descent."""
        self.optimizer = optim.SGD(self.model.parameters(), lr=self.lr)

    def build_graph(self):
        """Build computational graph."""
        self._create_model()
        self._create_loss()
        self._create_optimizer()

    def _fetch_batch(self):
        """Fetch batch dataset."""
        idx = list(range(self.n_examples))
        for i in range(0, self.n_examples, self.batch_size):
            idx_batch = idx[i:min(i + self.batch_size, self.n_examples)]
            yield (self.X_train.take(idx_batch, axis=0), 
                   self.y_train.take(idx_batch, axis=0))

    def fit(self):
        """Fit model."""
        for epoch in range(1, self.n_epochs + 1):
            total_loss = 0
            for X_train_b, y_train_b in self._fetch_batch():
                # Convert to Tensor from NumPy array and reshape ys.
                X_train_b, y_train_b = (
                    torch.from_numpy(X_train_b), 
                    torch.from_numpy(y_train_b).view(-1, 1))

                y_pred_b = self.model(X_train_b)
                batch_loss = self.criterion(y_pred_b, y_train_b)
                # Accumulate a Python float so the autograd graph is not kept alive.
                total_loss += batch_loss.item() * X_train_b.shape[0]

                # Zero grads, perform backward pass, and update weights.
                self.optimizer.zero_grad()
                batch_loss.backward()
                self.optimizer.step()

            if epoch % 100 == 0:
                print('Epoch {0}: training loss: {1}'
                      .format(epoch, total_loss / self.n_examples))

    def get_coeff(self):
        """Get model coefficients."""
        # Detach parameters from the autograd graph before converting to NumPy.
        return (self.model[0].bias.detach().numpy(),
                self.model[0].weight.detach().numpy())

    def predict(self, X):
        """Predict for new data."""
        with torch.no_grad():
            X_ = torch.from_numpy(X)
            return self.model(X_).numpy().reshape((-1,))
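
One design difference from the NumPy version is worth noting: this class applies `nn.Sigmoid` and then `nn.BCELoss`, so the loss is computed from probabilities rather than from logits. In float32 a probability can round to exactly 0 or 1, and `nn.BCELoss` is documented to clamp its log terms at -100 in that case. The NumPy sketch below imitates that behavior (`bce_from_probs` and `bce_from_logits` are names introduced here, and the logit value 30 is an arbitrary example) to show how the two formulations can diverge for a confidently wrong prediction.

```python
import numpy as np


def bce_from_probs(y, p, log_floor=-100.0):
    """Cross entropy from probabilities, with log terms clamped at log_floor."""
    with np.errstate(divide="ignore"):
        log_p = np.maximum(np.log(p), log_floor)
        log_not_p = np.maximum(np.log(1.0 - p), log_floor)
    return -(y * log_p + (1.0 - y) * log_not_p)


def bce_from_logits(y, z):
    """Cross entropy from logits, using the same logsumexp shift as `_loss`."""
    z_max = np.maximum(0, z)
    logsumexp = z_max + np.log(np.exp(-z_max) + np.exp(z - z_max))
    return -(y * (z - logsumexp) + (1.0 - y) * (-logsumexp))


# A confidently wrong prediction in float32: label 0, logit 30.
z = np.array([30.0], dtype=np.float32)
y = np.array([0.0], dtype=np.float32)

p = 1.0 / (1.0 + np.exp(-z))  # rounds to exactly 1.0 in float32
loss_probs = bce_from_probs(y, p)
loss_logits = bce_from_logits(y, z)
print(p, loss_probs, loss_logits)
```

If this matters in practice, one option would be to drop the `nn.Sigmoid` layer during training and use `nn.BCEWithLogitsLoss` on the linear output instead; that is a suggestion and not what the notebook does.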

## Fitting Logistic Regression in PyTorch

In [77]:
# Fit PyTorch Logistic Regression.
logreg_torch = LogisticRegressionTorch(batch_size=64, lr=0.5, n_epochs=1000)

In [78]:
logreg_torch.get_data(X_train, y_train, shuffle=True)

In [79]:
logreg_torch.build_graph()

In [80]:
logreg_torch.model

Sequential(
  (0): Linear(in_features=30, out_features=1, bias=True)
  (1): Sigmoid()
)

In [81]:
logreg_torch.fit()

Epoch 100: training loss: 0.1344248503446579
Epoch 200: training loss: 0.10645044595003128
Epoch 300: training loss: 0.09378331899642944
Epoch 400: training loss: 0.08620325475931168
Epoch 500: training loss: 0.08102358877658844
Epoch 600: training loss: 0.07719012349843979
Epoch 700: training loss: 0.07419849187135696
Epoch 800: training loss: 0.07177480310201645
Epoch 900: training loss: 0.06975651532411575
Epoch 1000: training loss: 0.06804037094116211


In [82]:
# Get coefficient.
logreg_torch.get_coeff()

(array([13.545173], dtype=float32),
 array([[-1.3439738 , -2.3745942 , -1.2252212 , -2.306892  , -1.0910149 ,
          0.473653  , -3.6592252 , -5.858444  , -0.9620514 ,  2.6239612 ,
         -5.7088504 , -0.78457713, -4.560005  , -3.5590298 ,  1.3276924 ,
          3.3612664 ,  1.8847641 ,  0.8996072 ,  2.5249946 ,  2.2905066 ,
         -4.4273787 , -4.0534863 , -4.046817  , -4.318798  , -2.7103667 ,
         -0.9162895 , -2.9415145 , -5.005501  , -2.149271  , -0.6811218 ]],
       dtype=float32))

In [83]:
# Predicted probabilities for training data.
p_train_ = logreg_torch.predict(X_train)
p_train_[:10]

array([9.8857111e-01, 7.6673147e-11, 3.9149617e-04, 9.9429590e-01,
       9.9696511e-01, 9.9811780e-01, 8.3037931e-03, 3.4119210e-03,
       9.9978751e-01, 2.1330371e-02], dtype=float32)

In [84]:
# Predicted labels for training data.
y_train_ = (p_train_ > 0.5) * 1
y_train_[:3]

array([1, 0, 0])

In [85]:
# Prediction accuracy for training data.
accuracy(y_train, y_train_)

0.9835680751173709

In [86]:
# Predicted probabilities and labels for test data.
p_test_ = logreg_torch.predict(X_test)
print(p_test_[:10])
y_test_ = (p_test_ > 0.5) * 1

[7.6844194e-04 9.9963987e-01 1.4914670e-02 9.9478203e-01 3.2207184e-04
 9.9945217e-01 3.5559508e-05 9.9999511e-01 5.2174652e-01 1.5732945e-07]


In [87]:
# Prediction accuracy for test data.
accuracy(y_test, y_test_)

0.972027972027972

## TensorFlow Implementation of Logistic Regression

In [21]:
def reset_tf_graph(seed=71):
    """Reset default TensorFlow graph."""
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)


class LogisticRegressionTF(object):
    """A TensorFlow implementation of Logistic Regression."""

    def __init__(self, batch_size=64, learning_rate=0.01, n_epochs=1000):
        self.batch_size = batch_size
        self.n_epochs = n_epochs
        self.learning_rate = learning_rate

    def get_data(self, X_train, y_train, shuffle=True):
        """Get dataset and information."""
        self.X_train = X_train
        self.y_train = y_train

        # Get the numbers of examples and inputs.
        self.n_examples, self.n_inputs = self.X_train.shape

        idx = list(range(self.n_examples))
        if shuffle:
            random.shuffle(idx)
        self.X_train = self.X_train[idx]
        self.y_train = self.y_train[idx]

    def _create_placeholders(self):
        """Create placeholder for features and labels."""
        self.X = tf.placeholder(tf.float32, shape=(None, self.n_inputs), name='X')
        self.y = tf.placeholder(tf.float32, shape=(None, 1), name='y')

    def _create_weights(self):
        """Create and initialize model weights and bias."""
        self.w = tf.get_variable(shape=[self.n_inputs, 1],
                                 initializer=tf.random_normal_initializer(),
                                 name='weights')
        self.b = tf.get_variable(shape=[1],
                                 initializer=tf.zeros_initializer(),
                                 name='bias')

    def _logit(self, X):
        """Logit: unnormalized log probability."""
        return tf.matmul(X, self.w) + self.b

    def _model(self, X):
        """Logistic regression model."""
        logits = self._logit(X)
        return tf.math.sigmoid(logits)

    def _create_model(self):
        # Create logistic regression model.
        self.logits = self._logit(self.X)

    def _create_loss(self):
        # Create cross entropy loss.
        self.cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
            labels=self.y,
            logits=self.logits,
            name='cross_entropy')
        self.loss = tf.reduce_mean(self.cross_entropy, name='loss')

    def _create_optimizer(self):
        # Create gradient descent optimization.
        self.optimizer = (
            tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate)
            .minimize(self.loss))

    def build_graph(self):
        """Build computational graph."""
        self._create_placeholders()
        self._create_weights()
        self._create_model()
        self._create_loss()
        self._create_optimizer()

    def _fetch_batch(self): 
        """Fetch batch dataset."""
        idx = list(range(self.n_examples))
        for i in range(0, self.n_examples, self.batch_size):
            idx_batch = idx[i:min(i + self.batch_size, self.n_examples)]
            yield (self.X_train[idx_batch, :], self.y_train[idx_batch].reshape(-1, 1))

    def fit(self):
        """Fit model."""
        saver = tf.train.Saver()

        with tf.Session() as sess:            
            sess.run(tf.global_variables_initializer())

            for epoch in range(1, self.n_epochs + 1):
                total_loss = 0
                for X_train_b, y_train_b in self._fetch_batch():
                    feed_dict = {self.X: X_train_b, self.y: y_train_b}
                    _, batch_loss = sess.run([self.optimizer, self.loss],
                                             feed_dict=feed_dict)
                    total_loss += batch_loss * X_train_b.shape[0]

                if epoch % 100 == 0:
                    print('Epoch {0}: training loss: {1}'
                          .format(epoch, total_loss / self.n_examples))

            # Save model.
            saver.save(sess, 'checkpoints/logreg')

    def get_coeff(self):
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            # Load model.
            saver = tf.train.Saver()
            saver.restore(sess, 'checkpoints/logreg')
            return self.b.eval(), self.w.eval().reshape((-1,))

    def predict(self, X):
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            # Load model.
            saver = tf.train.Saver()
            saver.restore(sess, 'checkpoints/logreg')
            return self._model(X).eval().reshape((-1,))

## Fitting Logistic Regression in TensorFlow

In [22]:
reset_tf_graph()
logreg_tf = LogisticRegressionTF(batch_size=64, learning_rate=0.5, n_epochs=1000)

In [23]:
logreg_tf.get_data(X_train, y_train, shuffle=True)

In [24]:
logreg_tf.build_graph()


In [25]:
logreg_tf.fit()

Epoch 100: training loss: 0.13793940541330077
Epoch 200: training loss: 0.10872595124121563
Epoch 300: training loss: 0.09537097629806805
Epoch 400: training loss: 0.08737332209455015
Epoch 500: training loss: 0.08191518744392574
Epoch 600: training loss: 0.07788145919920693
Epoch 700: training loss: 0.07473775576537764
Epoch 800: training loss: 0.07219409795714096
Epoch 900: training loss: 0.07007863250136935
Epoch 1000: training loss: 0.06828221622767024


In [26]:
logreg_tf.get_coeff()

INFO:tensorflow:Restoring parameters from checkpoints/logreg


(array([13.305871], dtype=float32),
 array([-0.48871428, -3.111531  , -1.0498405 , -1.9807501 , -0.7382942 ,
         0.17857291, -3.6536336 , -7.2938232 , -1.1253657 ,  2.6964319 ,
        -4.308619  , -0.7267858 , -5.074624  , -3.93839   ,  0.9107544 ,
         3.1807373 ,  2.4267256 ,  1.1122545 ,  2.5476625 ,  2.9644372 ,
        -4.529931  , -3.8162556 , -4.2280235 , -4.247452  , -2.2962825 ,
        -0.90618277, -3.2818813 , -4.6058207 , -2.6785274 , -0.06868351],
       dtype=float32))

In [27]:
# Predicted probabilities for training data.
p_train_ = logreg_tf.predict((tf.cast(X_train, dtype=tf.float32)))
print(p_train_[:10])

# Predicted labels for training data.
y_train_ = (p_train_ > 0.5) * 1
print(y_train_[:10])

# Prediction accuracy for training data.
accuracy(y_train, y_train_)

INFO:tensorflow:Restoring parameters from checkpoints/logreg
[9.8821372e-01 0.0000000e+00 4.0054321e-04 9.9521744e-01 9.9722141e-01
 9.9857461e-01 7.6086819e-03 3.6685169e-03 9.9977803e-01 2.2061288e-02]
[1 0 0 1 1 1 0 0 1 0]


0.9788732394366197

In [28]:
# Predicted probabilities for test data.
p_test_ = logreg_tf.predict((tf.cast(X_test, dtype=tf.float32)))
print(p_test_[:10])

# Predicted labels for test data.
y_test_ = (p_test_ > 0.5) * 1
y_test_[:3]

# Prediction accuracy for test data.
accuracy(y_test, y_test_)

INFO:tensorflow:Restoring parameters from checkpoints/logreg
[1.1076927e-03 9.9964797e-01 1.5742362e-02 9.9530858e-01 3.5384297e-04
 9.9942517e-01 3.5881996e-05 9.9999422e-01 4.2618003e-01 1.7881393e-07]


0.965034965034965

## Benchmark with Sklearn's Logistic Regression

In [29]:
# Fit sklearn's Logistic Regression.
logreg_sk = LogisticRegressionSklearn(C=1e4, solver='lbfgs', max_iter=500)

logreg_sk.fit(X_train, y_train)

LogisticRegression(C=10000.0, max_iter=500)

In [30]:
# Get coefficients.
logreg_sk.intercept_, logreg_sk.coef_

(array([56.06250509]),
 array([[  53.5460616 ,  -27.2575739 ,   48.30697654,   10.5636878 ,
          -14.75837806,   98.5009966 ,  -52.51936527,  -52.16906591,
           -5.08742246,  -53.96348797,  -33.97198842,   -5.48905184,
          -19.38885928,  -43.89981909,   38.75665922,  -51.43678914,
           83.21007672,  -21.89925037,   14.96797392,   79.99757062,
          -59.04206865,   -3.91791317,  -63.58395555, -103.96747709,
           -7.9699581 ,   20.04904076,  -21.96650031,  -21.30939901,
          -21.55187209,  -11.69936363]]))

In [31]:
# Predicted labels for training data (predict() returns class labels, not probabilities).
p_train_ = logreg_sk.predict(X_train)
p_train_[:3]

array([1, 0, 0])

In [32]:
y_train_ = (p_train_ > 0.5) * 1

In [33]:
# Predicted label correctness for training data.
# y_pred_train == y_train

In [34]:
# Prediction accuracy for training data.
accuracy(y_train, y_train_)

1.0

In [35]:
# Predicted labels for test data.
p_test_ = logreg_sk.predict(X_test)
y_test_ = (p_test_ > 0.5) * 1

In [36]:
# Prediction accuracy for test data.
accuracy(y_test, y_test_)

0.965034965034965