# Logistic Regression

## Training a Binary Classifier

Problem: You need to train a simple classifier model.

In [1]:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

In [2]:
# Load data with only two classes
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]

In [4]:
# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logistic_regression = LogisticRegression(solver='lbfgs', random_state=0)

# Train model
model = logistic_regression.fit(features_standardized, target)

In [5]:
model.C

1.0

In [6]:
model.classes_

array([0, 1])

In [7]:
model.coef_

array([[ 0.82463184, -1.15668147,  1.52895672,  1.53842497]])

In [8]:
features.shape

(100, 4)

In [9]:
model.intercept_

array([0.16650499])

In [46]:
# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
model.predict(new_observation)

array([1])

In [47]:
# View predicted probabilities
model.predict_proba(new_observation)

array([[0.17738424, 0.82261576]])

Our observation had a 17.7% chance of being class 0 and an 82.3% chance of being class 1.
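The probabilities above can be reproduced by hand from `coef_` and `intercept_`. The following is a minimal sketch, assuming the same two-class iris setup as the cells above; the names `z`, `prob_class_1`, and `manual_proba` are introduced here for illustration only.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Rebuild the two-class model from the cells above
iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data[:100, :])
target = iris.target[:100]
model = LogisticRegression(solver="lbfgs", random_state=0).fit(features_standardized, target)

new_observation = np.array([[0.5, 0.5, 0.5, 0.5]])

# Linear score z = w . x + b, then the logistic (sigmoid) function
z = new_observation @ model.coef_.T + model.intercept_
prob_class_1 = (1.0 / (1.0 + np.exp(-z))).ravel()

# Stack P(class 0) and P(class 1) in the same column order as predict_proba
manual_proba = np.column_stack([1.0 - prob_class_1, prob_class_1])
print(manual_proba)
print(model.predict_proba(new_observation))
```

Both printed arrays should agree, which shows that in the binary case `predict_proba` is the sigmoid of the linear score.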

In [40]:
import numpy as np

np.arange(0,30,.1)

array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ,
        1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9,  2. ,  2.1,
        2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9,  3. ,  3.1,  3.2,
        3.3,  3.4,  3.5,  3.6,  3.7,  3.8,  3.9,  4. ,  4.1,  4.2,  4.3,
        4.4,  4.5,  4.6,  4.7,  4.8,  4.9,  5. ,  5.1,  5.2,  5.3,  5.4,
        5.5,  5.6,  5.7,  5.8,  5.9,  6. ,  6.1,  6.2,  6.3,  6.4,  6.5,
        6.6,  6.7,  6.8,  6.9,  7. ,  7.1,  7.2,  7.3,  7.4,  7.5,  7.6,
        7.7,  7.8,  7.9,  8. ,  8.1,  8.2,  8.3,  8.4,  8.5,  8.6,  8.7,
        8.8,  8.9,  9. ,  9.1,  9.2,  9.3,  9.4,  9.5,  9.6,  9.7,  9.8,
        9.9, 10. , 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9,
       11. , 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12. ,
       12.1, 12.2, 12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13. , 13.1,
       13.2, 13.3, 13.4, 13.5, 13.6, 13.7, 13.8, 13.9, 14. , 14.1, 14.2,
       14.3, 14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15

In [42]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [43]:
np.logspace(-3, 2, 6)

array([1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02])

In [44]:
np.logspace(-7, -2, 6)

array([1.e-07, 1.e-06, 1.e-05, 1.e-04, 1.e-03, 1.e-02])

In [45]:
np.linspace(1, 2, 6)

array([1. , 1.2, 1.4, 1.6, 1.8, 2. ])

## Training a Multiclass Classifier

Problem: Given more than two classes, you need to train a classifier model.

Use LogisticRegression with one-vs-rest or multinomial methods.

In [49]:
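# Load all three iris classes; the cells above kept only the first two
iris = datasets.load_iris()
features_multiclass = StandardScaler().fit_transform(iris.data)
target_multiclass = iris.target

# Create one-vs-rest logistic regression object ("ovr" was the default before scikit-learn 0.22)
logistic_regression = LogisticRegression(random_state=0, solver='lbfgs', multi_class="ovr")

# Train model
model = logistic_regression.fit(features_multiclass, target_multiclass)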

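When using LogisticRegression, the `multi_class` argument selects which of the two techniques is used: `ovr` for one-vs-rest (OVR) or `multinomial` for multinomial logistic regression (MNL). OVR was the default in scikit-learn releases before 0.22; later releases default to `auto`, which picks `ovr` for binary data or the `liblinear` solver and `multinomial` otherwise. Recent releases also deprecate the `multi_class` argument, so check the documentation for your installed version.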
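One-vs-rest can also be made explicit without the `multi_class` argument. The sketch below assumes the `OneVsRestClassifier` wrapper from `sklearn.multiclass`, which does not appear in the original cells, and compares it with a single LogisticRegression fit on all three iris classes; `features_all` and `target_all` are names chosen for this example.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

# Use all three iris classes so the problem is genuinely multiclass
iris = datasets.load_iris()
features_all = StandardScaler().fit_transform(iris.data)
target_all = iris.target

# One-vs-rest: one binary logistic regression is trained per class
ovr_model = OneVsRestClassifier(LogisticRegression(solver="lbfgs", random_state=0))
ovr_model.fit(features_all, target_all)

# A single LogisticRegression on the same data, for comparison
# (which multiclass strategy it uses depends on the scikit-learn version)
single_model = LogisticRegression(solver="lbfgs", random_state=0)
single_model.fit(features_all, target_all)

print(len(ovr_model.estimators_))  # number of binary models, one per class
print(ovr_model.score(features_all, target_all))
print(single_model.score(features_all, target_all))
```

The scores are training accuracy only, so they indicate that both models fit, not how well they generalize.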

## Reducing Variance Through Regularization

Problem: You need to reduce the variance of your logistic regression model.

In [51]:
from sklearn.linear_model import LogisticRegressionCV

# Create logistic regression object that tunes C by cross-validation
logistic_regression = LogisticRegressionCV(penalty='l2',
                                           Cs=10,
                                           random_state=0,
                                           n_jobs=-1)

# Train model
model = logistic_regression.fit(features_standardized, target)



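Regularization adds a penalty on large parameter values to the loss. Higher values of the penalty strength α increase the penalty for larger parameter values (i.e., more complex models). scikit-learn expresses this through `C`, the inverse of the regularization strength, so a smaller `C` means stronger regularization. `Cs=10` tells LogisticRegressionCV to try 10 candidate values of `C` spaced on a logarithmic scale between 1e-4 and 1e4.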
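The effect of the penalty can be checked directly. This is a rough sketch, assuming the same two-class iris data as above: it fits plain LogisticRegression at three hand-picked values of `C` (chosen only for illustration) and records the coefficient norm, then inspects the grid that LogisticRegressionCV searched.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data[:100, :])
target = iris.target[:100]

# Smaller C means a stronger L2 penalty, so the coefficient norm shrinks
coef_norms = []
for C in [0.01, 1.0, 100.0]:
    clf = LogisticRegression(C=C, solver="lbfgs", max_iter=1000, random_state=0)
    clf.fit(features_standardized, target)
    coef_norms.append(np.linalg.norm(clf.coef_))
print(coef_norms)

# LogisticRegressionCV picks C from a grid using cross-validation
cv_model = LogisticRegressionCV(Cs=10, random_state=0, max_iter=1000)
cv_model.fit(features_standardized, target)
print(cv_model.Cs_)  # the candidate values of C that were tried
print(cv_model.C_)   # the value selected by cross-validation
```

The norms should grow as `C` grows, which is the inverse relationship between `C` and the penalty strength described above.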

## Training a Classifier on Very Large Data

Problem: You need to train a simple classifier model on a very large set of data.

Use LogisticRegression with the stochastic average gradient (SAG) solver:

In [52]:
# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0, solver="sag")

# Train model
model = logistic_regression.fit(features_standardized, target)
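SAG is sensitive to feature scale, so standardizing first matters more here than with other solvers. The sketch below is one way to check how many passes the solver needed; the `max_iter=1000` setting is an assumption made for this example, not a value from the original cell.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
target = iris.target[:100]

# Standardize so every feature has a similar scale before using SAG
features_standardized = StandardScaler().fit_transform(iris.data[:100, :])

# Allow more iterations than the default so the solver has room to converge
sag_model = LogisticRegression(random_state=0, solver="sag", max_iter=1000)
sag_model.fit(features_standardized, target)

print(sag_model.n_iter_)  # passes over the data actually used
print(sag_model.score(features_standardized, target))  # training accuracy
```

On a dataset this small SAG offers no speed advantage; the cell only demonstrates the API that would be used on large data.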

## Handling Imbalanced Classes

Problem: You need to train a simple classifier model when the classes are highly imbalanced.

In [56]:
# Make class highly imbalanced by removing first 40 observations
features = features[40:,:]
target = target[40:]

# Create target vector indicating if class 0, otherwise 1
target = np.where((target == 0), 0, 1)

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object that weights classes inversely to their frequency
logistic_regression = LogisticRegression(random_state=0, class_weight="balanced")

# Train model
model = logistic_regression.fit(features_standardized, target)



In [57]:
features.shape

(60, 4)

In [76]:
target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [75]:
np.count_nonzero(target == 1)

50

In [79]:
(target == 0).sum()

10
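With 10 observations of class 0 and 50 of class 1, `class_weight="balanced"` weights each class by `n_samples / (n_classes * class_count)`. The sketch below recomputes those weights by hand and compares them with scikit-learn's `compute_class_weight` helper; the variable names are chosen for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.utils.class_weight import compute_class_weight

# Rebuild the imbalanced target: 10 observations of class 0, 50 of class 1
iris = datasets.load_iris()
target = iris.target[:100][40:]
target = np.where(target == 0, 0, 1)

classes = np.array([0, 1])
class_counts = np.bincount(target)

# Balanced weight per class: n_samples / (n_classes * count of that class)
manual_weights = len(target) / (len(classes) * class_counts)

sklearn_weights = compute_class_weight("balanced", classes=classes, y=target)
print(manual_weights)   # [3.  0.6]
print(sklearn_weights)
```

The minority class gets a weight of 3.0 and the majority class 0.6, so errors on class 0 count five times as much during training.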