# Logistic Regression

Logistic regression fits a logistic model to data and predicts the probability of an event, 
a value between 0 and 1. 
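
As a minimal sketch of how a logistic model turns a weighted sum of features into a probability, the snippet below applies the logistic (sigmoid) function to a linear score. The weights, bias, and the helper name predict_probability are hypothetical and chosen only for illustration; they are not coefficients fitted on any dataset.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample under a logistic model."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical weights and bias (assumptions, not fitted values).
p = predict_probability([5.1, 3.5], weights=[0.4, -0.6], bias=0.1)
print(round(p, 3))
```

A score of 0 maps to a probability of exactly 0.5; larger positive scores approach 1 and larger negative scores approach 0.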

This recipe fits a logistic regression model to the iris dataset. Because this is 
a multiclass classification problem and a logistic regression model predicts a single probability 
between 0 and 1, a one-vs-all scheme is used (one model is created per class).
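
The one-vs-all idea can be sketched in plain Python: each class gets its own binary logistic model, every model scores the sample, and the class whose model reports the highest probability wins. The function one_vs_rest_predict and the per-class weights below are hypothetical stand-ins for illustration, not what scikit-learn computes internally for iris.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_rest_predict(features, models):
    """Score a sample with each per-class binary model and pick the best class.

    `models` maps a class label to a (weights, bias) pair.
    """
    probabilities = {}
    for label, (weights, bias) in models.items():
        score = sum(w * x for w, x in zip(weights, features)) + bias
        probabilities[label] = sigmoid(score)
    # The predicted class is the one whose binary model is most confident.
    best = max(probabilities, key=probabilities.get)
    return best, probabilities

# Hypothetical (weights, bias) per class; assumptions for illustration only.
models = {
    0: ([1.0, -2.0], 0.0),
    1: ([-0.5, 0.5], 0.2),
    2: ([-1.0, 1.5], -0.5),
}
label, probs = one_vs_rest_predict([1.4, 0.2], models)
print(label, probs)
```

Note that the per-class probabilities come from independent binary models, so they need not sum to 1.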

In [2]:
from sklearn import datasets 
from sklearn import metrics 
from sklearn.linear_model import LogisticRegression

Load the Iris dataset

Iris flower dataset (150 samples x 4 features, real-valued, multiclass classification)

    sepal length in cm
    sepal width in cm
    petal length in cm
    petal width in cm
    class: Iris Setosa = 0, Iris Versicolour = 1, Iris Virginica = 2

In [3]:
dataset = datasets.load_iris()
# Show the first 10 feature rows and their class labels
print(dataset.data[:10])
print(dataset.target[:10])

[[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]
 [ 5.4  3.9  1.7  0.4]
 [ 4.6  3.4  1.4  0.3]
 [ 5.   3.4  1.5  0.2]
 [ 4.4  2.9  1.4  0.2]
 [ 4.9  3.1  1.5  0.1]]
[0 0 0 0 0 0 0 0 0 0]


Fit a logistic regression model to the data 

In [4]:
model = LogisticRegression() 
model.fit(dataset.data, dataset.target) 
print(model)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)


Make predictions 

In [5]:
expected = dataset.target 
predicted = model.predict(dataset.data)

Summarize the fit of the model 

In [39]:
print(metrics.classification_report(expected, predicted)) 
print(metrics.confusion_matrix(expected, predicted))

             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.98      0.90      0.94        50
          2       0.91      0.98      0.94        50

avg / total       0.96      0.96      0.96       150

[[50  0  0]
 [ 0 45  5]
 [ 0  1 49]]


The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.
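
To make the two ratios concrete, the sketch below recomputes per-class precision and recall from the confusion matrix printed above, assuming scikit-learn's convention that rows are true classes and columns are predicted classes. The helper name precision_recall is hypothetical.

```python
# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
confusion = [
    [50, 0, 0],
    [0, 45, 5],
    [0, 1, 49],
]

def precision_recall(confusion, k):
    """Return (precision, recall) for class k, treating k as the positive class."""
    tp = confusion[k][k]
    # False positives: other classes predicted as k (column k minus the diagonal).
    fp = sum(row[k] for row in confusion) - tp
    # False negatives: class k predicted as something else (row k minus the diagonal).
    fn = sum(confusion[k]) - tp
    return tp / (tp + fp), tp / (tp + fn)

for k in range(len(confusion)):
    precision, recall = precision_recall(confusion, k)
    print(k, round(precision, 2), round(recall, 2))
```

For class 1 this gives 45 / 46 for precision and 45 / 50 for recall, which round to the 0.98 and 0.90 shown in the report.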

The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.

The F-beta score weights recall more than precision by a factor of beta. beta == 1.0 means recall and precision are equally important.
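
A small sketch of the F-beta formula, F = (1 + beta^2) * P * R / (beta^2 * P + R), applied to class 1 from the confusion matrix above (precision 45 / 46, recall 45 / 50). The function name fbeta and the zero-denominator fallback are choices made for this illustration.

```python
def fbeta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        # Assumption for this sketch: report 0.0 when both inputs are 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator

# Class 1 values taken from the confusion matrix above.
precision, recall = 45 / 46, 45 / 50

f1 = fbeta(precision, recall)            # beta = 1: equal weight
f2 = fbeta(precision, recall, beta=2.0)  # beta = 2: recall weighted more
print(round(f1, 2), round(f2, 2))
```

Because recall is lower than precision for this class, the recall-heavy F2 score comes out below F1.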

The support is the number of occurrences of each class in y_true (here, expected).
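
Support can be checked by counting labels directly. The sketch below builds a stand-in label list with the same class counts as the report above (50 samples per class) rather than loading the dataset; the variable name y_true mirrors the description and is otherwise an assumption.

```python
from collections import Counter

# Stand-in for the true labels: 50 samples of each class, matching the report above.
y_true = [0] * 50 + [1] * 50 + [2] * 50

# Support is the number of occurrences of each class among the true labels.
support = Counter(y_true)
for label in sorted(support):
    print(label, support[label])
```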