**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [2]:
import pandas as pd
import numpy as np

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(x) = \frac{1}{1 + e^{-x}}$.
</details>

In [3]:
# Import the data set
df = pd.read_csv("cleaned_data.csv")

In [27]:
import math

# Define the sigmoid function (math.exp accepts one scalar at a time)
def sigmoid(x):
    return 1 / (1 + math.exp(-x))
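
The `sigmoid` defined above relies on `math.exp`, so it handles one number per call. A minimal sketch of an element-wise variant follows, assuming NumPy is available as in the second cell. The name `sigmoid_vec` is hypothetical and not part of the exercise.

```python
import numpy as np

def sigmoid_vec(x):
    """Element-wise logistic function 1 / (1 + exp(-x)).

    Hypothetical helper: accepts a scalar, list, or NumPy array.
    """
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid of 0 is exactly 0.5, and the function is symmetric around it.
print(sigmoid_vec(0.0))
print(sigmoid_vec([-2.0, 0.0, 2.0]))
```

With a helper like this, the whole vector of linear combinations can be passed in one call rather than looping row by row.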

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as the response variable. Use a random state of 24.**

In [6]:
# Create a train/test split
X = df[['LIMIT_BAL','PAY_1']]
Y = df['default payment next month']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20, random_state = 24)

______________________________________________________________________
**Next, import `LogisticRegression` and instantiate it with the default options, but set the solver to `'liblinear'`.**

In [7]:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(solver = 'liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [8]:
# Fit the logistic regression model on training data
classifier.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [9]:
# Make predictions using `.predict()`
y_pred = classifier.predict(X_test)

In [10]:
# Find class probabilities using `.predict_proba()`
y_prob = classifier.predict_proba(X_test)

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features, to multiply by the intercept.**

In [15]:
# Add column of 1s to features
x_1 = X_test.copy()
x_1['one'] = 1
x_1.head()

       LIMIT_BAL  PAY_1  one
14306     160000      2    1
2978       50000      1    1
16641     200000     -1    1
18580     200000      3    1
131        50000      1    1


In [24]:
# Get coefficients and intercepts from trained model
coef = classifier.coef_
intercept = classifier.intercept_
print(coef)
print(intercept)

[[-6.80876727e-06  8.27451187e-11]]
[-6.57647457e-11]


In [30]:
# Manually calculate predicted probabilities
y_prob1=[]

for i in range(len(x_1)):
    z = coef[0][0]*x_1.iloc[i,0] + coef[0][1]*x_1.iloc[i,1] + intercept[0]*x_1.iloc[i,2]
    y_prob1.append(sigmoid(z))
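
The row-by-row loop above can also be written as a single matrix-vector product. The sketch below is an illustration only: it copies the coefficient and intercept values printed earlier and the five rows shown by `x_1.head()`, and the names `theta` and `x_1_head` are hypothetical.

```python
import numpy as np

# Values copied from the printed coef_ and intercept_ output above.
coef = np.array([[-6.80876727e-06, 8.27451187e-11]])
intercept = np.array([-6.57647457e-11])

# The five rows displayed by x_1.head(); columns are LIMIT_BAL, PAY_1, one.
x_1_head = np.array([
    [160000, 2, 1],
    [50000, 1, 1],
    [200000, -1, 1],
    [200000, 3, 1],
    [50000, 1, 1],
], dtype=float)

# Stack coefficients and intercept in the same column order as the features,
# so the column of 1s lines up with the intercept.
theta = np.concatenate([coef[0], intercept])

# One matrix-vector product gives the linear combination for every row.
z = x_1_head @ theta

# Apply the sigmoid element-wise to get predicted probabilities.
probs = 1.0 / (1.0 + np.exp(-z))
print(probs)
```

In the notebook itself, the same idea would apply to the full `x_1` via `x_1.values @ theta`, assuming the column order is unchanged.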


______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [32]:
# Manually calculate predicted classes
y_pred1 = []
for i in y_prob1:
    if i < 0.5:
        y_pred1.append(0)
    else:
        y_pred1.append(1)
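
The same thresholding rule (below `0.5` maps to class 0, otherwise class 1) can be expressed as a single array comparison. This is a minimal sketch with made-up probabilities; `example_probs` is hypothetical and stands in for `y_prob1`.

```python
import numpy as np

# Hypothetical probabilities for illustration; in the notebook use y_prob1.
example_probs = np.array([0.10, 0.49, 0.50, 0.75])

# Values >= 0.5 become class 1, matching the else branch of the loop above.
example_classes = (example_probs >= 0.5).astype(int)
print(example_classes)  # → [0 0 1 1]
```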


In [41]:
# Compare to scikit-learn's predicted classes
from sklearn.metrics import confusion_matrix

skl = confusion_matrix(y_test,y_pred)
manual = confusion_matrix(y_test,y_pred1)

print(skl)
print(manual)

[[4178    0]
 [1155    0]]
[[4178    0]
 [1155    0]]
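
Identical confusion matrices show that the counts agree, but not that every individual prediction agrees. A stricter check is an element-wise comparison. The sketch below uses small made-up arrays, and `predictions_match` is a hypothetical helper rather than a scikit-learn function.

```python
import numpy as np

def predictions_match(pred_a, pred_b):
    """Return (all_equal, number_of_mismatches) for two label sequences."""
    a = np.asarray(pred_a)
    b = np.asarray(pred_b)
    n_diff = int(np.sum(a != b))
    return n_diff == 0, n_diff

# Hypothetical example: against y_true = [0, 0, 1, 1], both arrays give one
# true negative, one false positive, one false negative, and one true
# positive, yet they disagree on every sample.
pred_a = [0, 1, 1, 0]
pred_b = [1, 0, 0, 1]
print(predictions_match(pred_a, pred_b))  # → (False, 4)
```

In the notebook, calling it as `predictions_match(y_pred, y_pred1)` would confirm whether the manual classes agree with scikit-learn's sample by sample.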


______________________________________________________________________
**Finally, calculate the ROC AUC using both scikit-learn's predicted probabilities and your manually calculated probabilities, and compare the two.**

In [37]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
from sklearn.metrics import roc_auc_score

ROC_AUC = roc_auc_score(y_test, y_prob[:,1])
print(ROC_AUC)

0.627207450280691


In [39]:
# Use manually calculated predicted probabilities to calculate ROC AUC
ROC_AUC1 = roc_auc_score(y_test, y_prob1)
print(ROC_AUC1)

0.627207450280691
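
As a sanity check on what the score means, the ROC AUC equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below computes that directly, assuming binary labels coded 0/1; `pairwise_auc` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))

# Small made-up example: 3 of the 4 positive/negative pairs are ordered correctly.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Because the score depends only on how samples are ranked, two sets of probabilities that order the test samples identically will produce the same ROC AUC.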
