# Logistic Regression: Application

In this notebook, we apply logistic regression in Python to a sample dataset that records whether an individual made a purchase, along with their age and estimated salary.

Sources:
1. <a href='https://www.udemy.com/course/machinelearning/'>Machine Learning A-Z™: Hands-On Python & R In Data Science</a>
2. <a href='https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html'>sklearn.linear_model.LogisticRegression</a>
3. <a href='https://kiwidamien.github.io/are-you-sure-thats-a-probability.html'>Are you sure that's a probability?</a>

In [10]:
# Import analytical libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')

# Import machine learning libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Import other support
import os

## Load & Preview Data

In [2]:
# Define purchase data file path
purchase_data_file_path = os.path.join('..', 'Data', 'Social_Network_Ads.csv')

# Load purchase data
purchases = pd.read_csv(purchase_data_file_path)

In [3]:
# Preview data
display(purchases.shape)
display(purchases.head())
display(purchases.describe())
display(purchases.isna().sum())

(400, 3)

|   | Age | EstimatedSalary | Purchased |
|---|-----|-----------------|-----------|
| 0 | 19 | 19000 | 0 |
| 1 | 35 | 20000 | 0 |
| 2 | 26 | 43000 | 0 |
| 3 | 27 | 57000 | 0 |
| 4 | 19 | 76000 | 0 |


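|       | Age | EstimatedSalary | Purchased |
|-------|-----|-----------------|-----------|
| count | 400.0 | 400.0 | 400.0 |
| mean | 37.655 | 69742.5 | 0.3575 |
| std | 10.482877 | 34096.960282 | 0.479864 |
| min | 18.0 | 15000.0 | 0.0 |
| 25% | 29.75 | 43000.0 | 0.0 |
| 50% | 37.0 | 70000.0 | 0.0 |
| 75% | 46.0 | 88000.0 | 1.0 |
| max | 60.0 | 150000.0 | 1.0 |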


Age                0
EstimatedSalary    0
Purchased          0
dtype: int64

## Prepare Data

In [49]:
# Define features and labels
X = purchases.drop(['Purchased'], axis=1).values
y = purchases['Purchased'].values

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

In [50]:
# Initialize scaler
scaler = StandardScaler()

# Fit scaler to the training data only, so no test-set information leaks into it
scaler.fit(X_train)

# Scale features
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
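
To make the scaling step concrete, the following minimal sketch compares `StandardScaler` with a manual standardization, assuming the usual definition z = (x - mean) / std computed from the training rows only. The toy arrays and the names `X_train_toy`, `X_test_toy`, and `toy_scaler` are hypothetical and are not taken from the purchase data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features (age, salary); not taken from the purchase dataset
X_train_toy = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
X_test_toy = np.array([[50.0, 80000.0]])

# Fit on the training rows only, as in the cell above
toy_scaler = StandardScaler().fit(X_train_toy)

# Manual standardization using the training mean and population standard deviation
manual = (X_test_toy - X_train_toy.mean(axis=0)) / X_train_toy.std(axis=0)

# Check that the scaler and the manual formula agree on the test row
print(np.allclose(toy_scaler.transform(X_test_toy), manual))
```

Because the test row is scaled with the training statistics, its scaled values need not have mean 0 or standard deviation 1.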

## Prepare Classifier

In [51]:
# Initialize classifier object
classifier = LogisticRegression(random_state=0)

# Fit classifier to the training data
classifier.fit(X_train, y_train)

# Evaluate classifier accuracy on the test set
print(classifier.score(X_test, y_test))

0.89
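
For a classifier, `.score()` reports mean accuracy, that is, the fraction of test observations whose predicted class matches the true class. The sketch below illustrates this on synthetic data generated with `make_classification`; the data and all `*_demo` names are assumptions standing in for the purchase dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the purchase dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

demo_clf = LogisticRegression(random_state=0).fit(X_tr, y_tr)

# Accuracy computed by hand: share of predictions that equal the true label
accuracy_manual = np.mean(demo_clf.predict(X_te) == y_te)

# Accuracy reported by the estimator
accuracy_score_method = demo_clf.score(X_te, y_te)

print(accuracy_manual, accuracy_score_method)
```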


## Use Classifier

At this stage we have "built" our machine learning classifier. Next, we use it on two example individuals: one who did not make a purchase and one who did. From our original data, an individual who did not make the purchase is a 27-year-old with a salary of 84,000, while the first individual who did make the purchase is a 32-year-old with a salary of 150,000.

In [186]:
# Create example data points as [age, estimated salary]
example_0 = np.array([27, 84000])
example_1 = np.array([32, 150000])

examples = np.array([example_0, example_1])

# Scale example data points with the scaler fitted on the training data
examples = scaler.transform(examples)

# Predict class for example cases
print(classifier.predict(examples))

[0 1]
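
New observations must be scaled with the same fitted scaler before prediction. One way to avoid forgetting that step is to bundle the scaler and classifier with scikit-learn's `make_pipeline`; the sketch below is an illustration on synthetic data, not the approach used in this notebook, and the `*_demo` and `manual_*` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the purchase dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
# Put the two features on very different scales, loosely mimicking age vs. salary
X_demo = X_demo * np.array([10.0, 30000.0]) + np.array([40.0, 70000.0])

# The pipeline scales its input and then classifies it
demo_pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
demo_pipeline.fit(X_demo, y_demo)

# Raw, unscaled rows can be passed directly to the pipeline
raw_examples = X_demo[:2]
pipeline_pred = demo_pipeline.predict(raw_examples)

# Equivalent manual two-step version, for comparison
manual_scaler = StandardScaler().fit(X_demo)
manual_clf = LogisticRegression(random_state=0).fit(manual_scaler.transform(X_demo), y_demo)
manual_pred = manual_clf.predict(manual_scaler.transform(raw_examples))

print(pipeline_pred, manual_pred)
```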


In addition to the .predict() method, we can use the .predict_proba() method, which returns the estimated probability that an observation falls into each class.

In [187]:
print(classifier.predict_proba(examples))

[[0.94139491 0.05860509]
 [0.40700336 0.59299664]]


Above, we see a 94% probability that the first example falls into class 0 (will not purchase) and a 6% probability that it falls into class 1 (will purchase). For the second example, there is a 41% probability of class 0 and a 59% probability of class 1. The probabilities in each row always sum to 1. Reading such a probability matrix, we can not only predict a data point's class but also gauge the confidence in that prediction. In the first case, we predict class 0 with 94% confidence, while in the second case we predict class 1 with only 59% confidence.
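
For a binary problem fitted with default settings, these probabilities can be reproduced by passing the model's linear score through the sigmoid function. The sketch below checks this on synthetic data; the data and the names `demo_clf`, `z`, and `manual_proba` are assumptions for illustration, not part of the analysis above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the purchase dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
demo_clf = LogisticRegression(random_state=0).fit(X_demo, y_demo)

# Linear score z = w . x + b for each observation
z = X_demo @ demo_clf.coef_.ravel() + demo_clf.intercept_[0]

# The sigmoid maps the score to the estimated probability of class 1
p_class_1 = 1.0 / (1.0 + np.exp(-z))
manual_proba = np.column_stack([1.0 - p_class_1, p_class_1])

proba = demo_clf.predict_proba(X_demo)

# Compare with predict_proba and confirm that each row sums to 1
print(np.allclose(proba, manual_proba))
print(np.allclose(proba.sum(axis=1), 1.0))

# The predicted class is the one with the larger probability
pred_from_proba = demo_clf.classes_[proba.argmax(axis=1)]
print(np.array_equal(demo_clf.predict(X_demo), pred_from_proba))
```

In other words, `.predict()` amounts to thresholding the class 1 probability at 0.5, so a prediction made at 59% and one made at 94% receive the same label despite very different confidence.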


In [129]:
# Predict purchase class for test points
y_predict = classifier.predict(X_test)

In [185]:
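# Reshape predictions and true labels into column vectors
length_of_predictions = len(y_predict)

y_predict = y_predict.reshape(length_of_predictions, 1)
y_test = y_test.reshape(length_of_predictions, 1)

# Show predicted (left column) and actual (right column) labels side by side
np.concatenate([y_predict, y_test], axis=1)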

array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [1, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [1, 1],
       [0, 0],
       [1, 1],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 1],
       [1, 1],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [1, 1],
       [1, 1],
       [0, 0],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 1],
       [0, 0],
       [0, 0],
       [0, 1],
       [0, 0],
       [0, 0],
       [1, 1],
       [0, 0],
       [0, 1],
       [0, 0],
       [1, 1],
       [0,
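
A long side-by-side listing of predicted and actual labels is hard to scan. A confusion matrix condenses the same comparison into four counts. The sketch below uses scikit-learn's `confusion_matrix` on small hypothetical label arrays (`y_true_demo`, `y_pred_demo`), not on the predictions listed above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration; not the notebook's test-set results
y_true_demo = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred_demo = np.array([0, 1, 1, 0, 0, 1, 0, 1])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)

# For binary labels, ravel() yields: true neg., false pos., false neg., true pos.
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions (the diagonal of the matrix)
accuracy_demo = (tn + tp) / cm.sum()

print(cm)
print(accuracy_demo)
```

With the notebook's own arrays, the call would take the flattened `y_test` and `y_predict` in that order (true labels first).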

## Visualize Classifier