**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(x) = \frac{1}{1 + e^{-x}}$.
</details>

In [3]:
# Import the data set
df=pd.read_csv('cleaned_data.csv')

In [4]:
# Define the sigmoid function
def sigmoid(z):
    # Logistic function: maps any real z to a value in (0, 1)
    return 1 / (1 + np.exp(-z))
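As a standalone sanity check (it redefines the same `sigmoid` so it runs on its own), the sketch below confirms two properties that matter once predictions are thresholded at 0.5: the sigmoid of 0 is exactly 0.5, and `sigmoid(-z)` equals `1 - sigmoid(z)`. The input values are arbitrary illustrations.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z to a value in (0, 1)
    return 1 / (1 + np.exp(-z))

# Arbitrary example inputs, sorted from negative to positive
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
p = sigmoid(z)
print(p)

# sigmoid(0) = 1 / (1 + 1) = 0.5, the usual decision threshold
print(sigmoid(0.0))

# Symmetry around 0: sigmoid(-z) == 1 - sigmoid(z)
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))
```

Because the function is strictly increasing, larger linear scores always map to larger probabilities.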

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as the features and `default payment next month` as the response variable. Use a random state of 24.**

In [5]:
# Create a train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df[['LIMIT_BAL', 'PAY_1']], df['default payment next month'].values,
    test_size=0.2, random_state=24)

print(X_train.shape)

(21331, 2)
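The split sizes follow from `test_size=0.2`: scikit-learn rounds the test count up, so the 21331 training rows printed above correspond to 26664 rows in total, with 5333 held out for testing. The sketch below shows the same behavior on a hypothetical 100-row synthetic array, and that reusing `random_state=24` reproduces the identical split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 100 rows, 2 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

# 80/20 split with a fixed seed, as in the exercise
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=24)
print(X_train.shape, X_test.shape)  # (80, 2) (20, 2)

# Repeating the call with the same random_state selects the same rows
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=24)
print(np.array_equal(X_train, X_train2))  # True
```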


______________________________________________________________________
**Next, import `LogisticRegression` and create a model with the default options, except set the solver to `'liblinear'`.**

In [6]:
from sklearn.linear_model import LogisticRegression
my_lr = LogisticRegression(solver='liblinear')



______________________________________________________________________
**Now, train on the training data and obtain predicted classes and class probabilities for the test data.**

In [7]:
# Fit the logistic regression model on training data
my_lr.fit(X_train, y_train)


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [8]:
# Make predictions using `.predict()`
y_pred=my_lr.predict(X_test)
y_pred


array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [9]:
# Find class probabilities using `.predict_proba()`
y_pred1=my_lr.predict_proba(X_test)
y_pred1

array([[0.74826924, 0.25173076],
       [0.584297  , 0.415703  ],
       [0.79604453, 0.20395547],
       ...,
       [0.584297  , 0.415703  ],
       [0.82721498, 0.17278502],
       [0.66393435, 0.33606565]])
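To see how `.predict()` and `.predict_proba()` relate, here is a sketch on hypothetical synthetic data from `make_classification` (standing in for `LIMIT_BAL` and `PAY_1`). Each probability row is `[P(y=0), P(y=1)]` in the order given by `classes_`, which is why the positive-class probability is taken from column 1, and the predicted class is the one with the larger probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data with two features and a binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
lr = LogisticRegression(solver='liblinear').fit(X, y)

proba = lr.predict_proba(X)  # shape (n_samples, 2)

# Each row holds [P(y=0), P(y=1)] and sums to 1
print(np.allclose(proba.sum(axis=1), 1))

# Column order follows lr.classes_
print(lr.classes_)

# .predict() returns the class whose probability is larger
print(np.array_equal(lr.predict(X), lr.classes_[proba.argmax(axis=1)]))
```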

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features to multiply by the intercept.**

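In [11]:
# Add column of 1s to features (the 1s multiply the intercept)
ones_and_features = np.hstack([np.ones((X_test.shape[0], 1)), X_test.values])


In [10]:
# Get coefficients and intercepts from trained model
b = my_lr.intercept_
w = my_lr.coef_
print(w)
print(b)

[[-6.80876727e-06  8.27451187e-11]]
[-6.57647457e-11]


In [79]:
# Manually calculate predicted probabilities
# Stack the parameters as [b, w1, w2] so the intercept lines up with the column of 1s
theta = np.concatenate([b, w[0]])
def predict(theta, X):
    z = np.dot(X, theta)     # z = b*1 + w1*LIMIT_BAL + w2*PAY_1
    a = sigmoid(z)
    return a.reshape(-1, 1)  # column vector: one probability per test row
pred_prob = predict(theta, ones_and_features)
print(pred_prob.shape)

(5333, 1)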
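The column-of-1s recipe can be checked end to end on hypothetical synthetic data: placing the intercept in front of the coefficients as `[b, w1, w2]` and multiplying by a feature matrix with a leading column of 1s reproduces `predict_proba(X)[:, 1]`. The names `ones_and_features` and `theta` are illustrative choices, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical synthetic data with two features and a binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
lr = LogisticRegression(solver='liblinear').fit(X, y)

# Prepend a column of 1s so the intercept is multiplied by 1 in every row
ones_and_features = np.hstack([np.ones((X.shape[0], 1)), X])
# Parameter vector [b, w1, w2]: intercept first, to match the 1s column
theta = np.concatenate([lr.intercept_, lr.coef_[0]])

# Linear predictor z = b + w1*x1 + w2*x2, then squash with the sigmoid
manual_proba = sigmoid(ones_and_features @ theta)

print(np.allclose(manual_proba, lr.predict_proba(X)[:, 1]))
```

Leaving out the column of 1s (or the `+ b` term) drops the intercept from z; the result then only matches scikit-learn when the fitted intercept happens to be near zero.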


______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [80]:
# Manually calculate predicted classes
# Flatten the (n, 1) probabilities to 1-D, then apply the 0.5 threshold to every row
man_pred = (np.ravel(pred_prob) > 0.5).astype(int)


In [81]:
# Compare to scikit-learn's predicted classes
print(sum(y_pred))
print(sum(man_pred))
print("test accuracy: {} %".format(100 - np.mean(np.abs(y_pred - y_test)) * 100))

0
0
test accuracy: 78.34239639977498 %
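Thresholding does not need a Python loop. This sketch, again on hypothetical synthetic data, applies the 0.5 cutoff with a vectorized comparison and checks the result against `.predict()` element by element, which is a stricter test than checking that the two sums agree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data with two features and a binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
lr = LogisticRegression(solver='liblinear').fit(X, y)
p1 = lr.predict_proba(X)[:, 1]

# Vectorized thresholding: 1 where P(y=1) > 0.5, else 0
manual_pred = (p1 > 0.5).astype(int)

# Element-wise agreement with scikit-learn's predicted classes
print(np.array_equal(manual_pred, lr.predict(X)))

# Accuracy as the fraction of predictions that match the true labels
acc = np.mean(manual_pred == y)
print(acc)
```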


______________________________________________________________________
**Finally, calculate ROC AUC using both scikit-learn's predicted probabilities, and your manually predicted probabilities, and compare.**

In [82]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
from sklearn.metrics import roc_auc_score
print(roc_auc_score(y_test,y_pred1[:,1]))

0.627207450280691


In [85]:
# Use manually calculated predicted probabilities to calculate ROC AUC
print(roc_auc_score(y_test, np.ravel(pred_prob)))


0.627207450280691
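ROC AUC depends only on how the scores rank the samples, so any strictly increasing transformation of the scores, such as the sigmoid, leaves it unchanged. A quick way to see this, sketched on hypothetical synthetic data, is to compare the AUC computed from `predict_proba` with the AUC computed from the raw linear scores returned by `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical synthetic data with two features and a binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)
lr = LogisticRegression(solver='liblinear').fit(X, y)

# AUC from the positive-class probabilities
auc_proba = roc_auc_score(y, lr.predict_proba(X)[:, 1])

# AUC from the raw scores z; the sigmoid preserves their order
auc_score = roc_auc_score(y, lr.decision_function(X))

print(auc_proba, auc_score)
print(np.isclose(auc_proba, auc_score))
```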
