**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [6]:
import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(x) = \frac{1}{1 + e^{-x}}$.
</details>

In [3]:
# Import the data set
df = pd.read_csv('cleaned_data.csv')

In [4]:
# Define the sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))
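
As a quick sanity check on the definition, the sketch below uses plain Python (`math` instead of NumPy) and a hypothetical helper name, `stable_sigmoid`. It is one possible way to avoid evaluating a huge exponential for large negative inputs, not part of the exercise solution.

```python
import math

def stable_sigmoid(z):
    """Sigmoid of a single float, written to avoid overflow in exp().

    Hypothetical helper for illustration; the notebook's own `sigmoid`
    uses np.exp and works on whole arrays.
    """
    if z >= 0:
        # exp(-z) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite 1/(1+e^{-z}) as e^{z}/(1+e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)

# The sigmoid is 0.5 at zero and satisfies f(z) + f(-z) = 1.
print(stable_sigmoid(0.0))
print(stable_sigmoid(2.0) + stable_sigmoid(-2.0))
```

Both branches compute the same function as the hint's formula; only the arithmetic is rearranged.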

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as the response variable. Use a random state of 24.**

In [12]:
# Create a train/test split
X_train, X_test, y_train, y_test = train_test_split(
    df[['PAY_1', 'LIMIT_BAL']].values,
    df['default payment next month'].values,
    test_size=0.2, random_state=24)

______________________________________________________________________
**Next, import `LogisticRegression` and instantiate it with the default options, except set the solver to `'liblinear'`.**

In [13]:
from sklearn.linear_model import LogisticRegression

In [14]:
lr = LogisticRegression(solver='liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [16]:
# Fit the logistic regression model on training data
lr.fit(X_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [17]:
# Make predictions using `.predict()`
test_pred = lr.predict(X_test)

In [18]:
# Find class probabilities using `.predict_proba()`
test_pred_prob = lr.predict_proba(X_test)

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features, to multiply by the intercept.**

In [23]:
# Add column of 1s to features
ones_and_features = np.hstack([np.ones((X_test.shape[0],1)), X_test])

In [24]:
# Get coefficients and intercepts from trained model
intercept_and_coefs = np.concatenate([lr.intercept_.reshape(1,1), lr.coef_], axis=1)
intercept_and_coefs

array([[-6.57647457e-11,  8.27451187e-11, -6.80876727e-06]])

In [31]:
# Manually calculate predicted probabilities
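# Linear combination of intercept and coefficients with the features, shape (1, n_samples)
linear_combination = np.dot(intercept_and_coefs, np.transpose(ones_and_features))
# Pass the linear combination through the sigmoid to get probabilities
manual_pred_prob = sigmoid(linear_combination)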
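
To make the matrix product concrete, here is a minimal plain-Python sketch of the same arithmetic for a single row. The intercept and coefficients are copied from the array printed above; the feature values `pay_1` and `limit_bal` are hypothetical and were chosen only for illustration.

```python
import math

# Intercept and coefficients copied from the printed array above.
intercept = -6.57647457e-11
coef_pay_1 = 8.27451187e-11
coef_limit_bal = -6.80876727e-06

# Hypothetical feature values for one account (not taken from the data set).
pay_1 = 2
limit_bal = 20000

# Linear combination: the intercept multiplies the column of 1s,
# and each coefficient multiplies its feature.
z = intercept * 1 + coef_pay_1 * pay_1 + coef_limit_bal * limit_bal

# The sigmoid maps the linear combination to a probability of class 1.
prob_default = 1 / (1 + math.exp(-z))
print(z, prob_default)
```

For this row the `LIMIT_BAL` term dominates the sum, so `z` is negative and the resulting probability is below 0.5.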

______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [33]:
# Manually calculate predicted classes
manual_predict = manual_pred_prob >= 0.5

In [36]:
# Compare to scikit-learn's predicted classes
np.array_equal(test_pred.reshape(1, -1), manual_predict)

True
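
One way to see why the two sets of predictions agree: thresholding the probability at 0.5 is the same as checking the sign of the linear combination, because the sigmoid equals 0.5 exactly at zero and is increasing. The sketch below illustrates this with a few made-up scores (the list `scores` is hypothetical).

```python
import math

def sigmoid(z):
    # Scalar version of the sigmoid defined earlier in the notebook.
    return 1 / (1 + math.exp(-z))

# Hypothetical linear-combination values, including the boundary case 0.
scores = [-3.0, -0.5, 0.0, 0.5, 3.0]

# Class from thresholding the probability at 0.5 ...
by_probability = [sigmoid(z) >= 0.5 for z in scores]
# ... and class from the sign of the linear combination.
by_sign = [z >= 0 for z in scores]

print(by_probability == by_sign)
```

With the coefficients printed earlier, the `LIMIT_BAL` coefficient is negative and far larger in magnitude than the other two terms, so if every `LIMIT_BAL` value is positive, the linear combination is negative for every row and both methods would predict class 0 throughout.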

______________________________________________________________________
**Finally, calculate the ROC AUC using both scikit-learn's predicted probabilities and your manually calculated probabilities, and compare the two.**

In [38]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
roc_auc_score(y_test,test_pred_prob[:,1])

0.627207450280691

In [42]:
# Use manually calculated predicted probabilities to calculate ROC AUC
roc_auc_score(y_test, manual_pred_prob.reshape(-1))

0.627207450280691
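
The two values match because the ROC AUC depends only on how the scores rank the samples. As a rough illustration, the sketch below computes the AUC from its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counted as half. The function name `pairwise_auc` and the toy data are assumptions made for this example; this is not how scikit-learn computes the metric internally.

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the share of positive/negative pairs ranked correctly.

    Hypothetical helper for illustration; quadratic in the number of
    samples, so only suitable for small inputs.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute AUC.")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy labels and scores, made up for this example.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_labels, toy_scores))
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75. Any strictly increasing transformation of the scores leaves the ordering, and therefore this value, unchanged.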