- Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable; many more complex extensions exist. In regression analysis, logistic regression (or logit regression) means estimating the parameters of a logistic model, which is a form of binary regression.

****

# Making Logistic Regression Predictions

$$y = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}$$

β0 is the intercept term.

β1 is the coefficient for xi.

y is the predicted output, a real value between 0 and 1. To convert it to a binary output of 0 or 1, either round it to the nearest integer (which amounts to a cutoff of 0.5) or choose an explicit cutoff point that separates the two classes.
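The prediction formula and the cutoff step can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's own code: the helper names `predict_probability` and `predict_class` are made up for this example, and the coefficient values are hypothetical placeholders rather than fitted estimates.

```python
import math

def predict_probability(x, b0, b1):
    """Apply the logistic function to the linear term b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict_class(x, b0, b1, cutoff=0.5):
    """Turn the probability into a 0/1 label using a cutoff point."""
    return 1 if predict_probability(x, b0, b1) >= cutoff else 0

# Hypothetical coefficients, chosen only for illustration (not fitted values).
b0, b1 = -4.0, 1.0

for x in (-2.0011, 3.0641, 7.6275):
    p = predict_probability(x, b0, b1)
    print(f"x={x:8.4f}  p={p:.4f}  class={predict_class(x, b0, b1)}")
```

Raising or lowering `cutoff` trades one kind of misclassification for the other; 0.5 is simply the value that matches rounding.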

In [4]:
dataset = [[-2.0011, 0],
           [-1.4654, 0],
           [0.0965, 0],
           [1.3881, 0],
           [3.0641, 0],
           [7.6275, 1],
           [5.3324, 1],
           [6.9225, 1],
           [8.6754, 1],
           [7.6737, 1]]

# Using scikit-learn to Estimate Coefficients

In [7]:
from sklearn.linear_model import LogisticRegression
import numpy as np

In [8]:
dataset

[[-2.0011, 0],
 [-1.4654, 0],
 [0.0965, 0],
 [1.3881, 0],
 [3.0641, 0],
 [7.6275, 1],
 [5.3324, 1],
 [6.9225, 1],
 [8.6754, 1],
 [7.6737, 1]]

In [9]:
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

In [10]:
X

array([[-2.0011],
       [-1.4654],
       [ 0.0965],
       [ 1.3881],
       [ 3.0641],
       [ 7.6275],
       [ 5.3324],
       [ 6.9225],
       [ 8.6754],
       [ 7.6737]])

In [13]:
clf_LR = LogisticRegression(C=1.0, penalty='l2', tol=0.0001, solver='lbfgs')

In [14]:
clf_LR.fit(X, y)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
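Once the model is fitted, the estimated β0 and β1 can be read from the `intercept_` and `coef_` attributes. The sketch below repeats the data and fit so that it runs on its own; it relies on scikit-learn's default L2 penalty and tolerance rather than passing them explicitly, and the variable names `b0`, `b1`, and `boundary` are introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]
X = np.array(dataset)[:, 0:1]   # single feature, kept 2-D for scikit-learn
y = np.array(dataset)[:, 1]     # class labels

# Defaults already use an L2 penalty and tol=0.0001.
clf_LR = LogisticRegression(C=1.0, solver='lbfgs')
clf_LR.fit(X, y)

b0 = clf_LR.intercept_[0]   # intercept term (β0)
b1 = clf_LR.coef_[0, 0]     # coefficient for the single feature (β1)
boundary = -b0 / b1         # x value where the predicted probability is 0.5

print("intercept (β0):", b0)
print("coefficient (β1):", b1)
print("decision boundary x =", boundary)
```

The exact numbers depend on the regularization strength `C` and on the scikit-learn version, so none are quoted here.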

In [17]:
y_pred = clf_LR.predict(X)
y_pred

array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])

In [16]:
clf_LR.predict_proba(X)

array([[0.99853453, 0.00146547],
       [0.99740804, 0.00259196],
       [0.98643844, 0.01356156],
       [0.94830288, 0.05169712],
       [0.75430207, 0.24569793],
       [0.02307807, 0.97692193],
       [0.21456501, 0.78543499],
       [0.04771639, 0.95228361],
       [0.00766657, 0.99233343],
       [0.02199284, 0.97800716]])
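In the output above, the first column is the probability of class 0 and the second the probability of class 1. As a rough check, the second column should match the logistic formula evaluated with the fitted coefficients, and a 0.5 cutoff on it should reproduce `predict`. The sketch below refits the model so that it is self-contained; `manual_p1` and `labels_from_cutoff` are names made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

clf_LR = LogisticRegression(C=1.0, solver='lbfgs').fit(X, y)
proba = clf_LR.predict_proba(X)   # columns: P(class 0), P(class 1)

# Recompute P(class 1) by hand: logistic function of β0 + β1 * x.
z = clf_LR.intercept_[0] + clf_LR.coef_[0, 0] * X[:, 0]
manual_p1 = 1.0 / (1.0 + np.exp(-z))
print("formula matches predict_proba:", np.allclose(manual_p1, proba[:, 1]))

# Apply a 0.5 cutoff to P(class 1) and compare with predict().
labels_from_cutoff = (proba[:, 1] >= 0.5).astype(float)
print("cutoff matches predict:", np.array_equal(labels_from_cutoff, clf_LR.predict(X)))
```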

In [18]:
np.column_stack((y_pred, y))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])
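The side-by-side comparison above can also be condensed into summary numbers. The following sketch uses `accuracy_score` and `confusion_matrix` from `sklearn.metrics`, which the original cells do not import. It evaluates on the training data only, so it says nothing about performance on unseen points.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

dataset = [[-2.0011, 0], [-1.4654, 0], [0.0965, 0], [1.3881, 0], [3.0641, 0],
           [7.6275, 1], [5.3324, 1], [6.9225, 1], [8.6754, 1], [7.6737, 1]]
X = np.array(dataset)[:, 0:1]
y = np.array(dataset)[:, 1]

clf_LR = LogisticRegression(C=1.0, solver='lbfgs').fit(X, y)
y_pred = clf_LR.predict(X)

# Fraction of rows where the predicted label equals the true label.
acc = accuracy_score(y, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, y_pred)

print("training accuracy:", acc)
print(cm)
```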