**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(x) = \frac{1}{1 + e^{-x}}$.
</details>

In [3]:
# Import the data set
df=pd.read_csv("Task 1 Data Set/cleaned_data.csv")
df.head(5)

Unnamed: 0,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_1,PAY_2,PAY_3,PAY_4,...,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default payment next month,EDUCATION_CAT,graduate school,high school,others,university
0,798fc410-45c1,20000,2,2,1,24,2,2,-1,-1,...,0,0,0,0,1,university,0,0,0,1
1,8a8c8f3b-8eb4,120000,2,2,2,26,-1,2,0,0,...,1000,1000,0,2000,1,university,0,0,0,1
2,85698822-43f5,90000,2,2,2,34,0,0,0,0,...,1000,1000,1000,5000,0,university,0,0,0,1
3,0737c11b-be42,50000,2,2,1,37,0,0,0,0,...,1200,1100,1069,1000,0,university,0,0,0,1
4,3b7f77cc-dbc0,50000,1,2,1,57,-1,0,-1,0,...,10000,9000,689,679,0,university,0,0,0,1


In [4]:
def sigmoid_func(q):
    p=1/(1+np.exp(-q))
    return p
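
As a side note on the sigmoid definition above, the direct formula can overflow in plain Python when the input is a large negative number. The sketch below is one common way to avoid that; the name `stable_sigmoid` is hypothetical and is not used elsewhere in this notebook.

```python
import math

def stable_sigmoid(q):
    """Logistic sigmoid 1 / (1 + e^(-q)), arranged so math.exp never receives a large positive argument."""
    if q >= 0:
        # e^(-q) is at most 1 here, so the direct formula is safe
        return 1.0 / (1.0 + math.exp(-q))
    # For negative q, use the equivalent form e^q / (1 + e^q); e^q is below 1
    e = math.exp(q)
    return e / (1.0 + e)

print(stable_sigmoid(0.0))                         # the midpoint of the curve
print(stable_sigmoid(2.0) + stable_sigmoid(-2.0))  # symmetry: f(q) + f(-q) is 1 up to rounding
```

Because the sigmoid equals 0.5 at zero and is increasing, a probability threshold of 0.5 corresponds to a threshold of 0 on the linear combination.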

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as the response variable. Use a random state of 24.**

In [5]:
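# Create a train/test split
# Note: this call uses test_size=0.25 and random_state=1, which produced the outputs shown below;
# the task statement asks for test_size=0.2 and random_state=24.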
x=df[['PAY_1','LIMIT_BAL']]
y=df['default payment next month']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=1)
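
To illustrate what a seeded 80/20 split does conceptually, here is a minimal standard-library sketch. The helper `simple_split` is hypothetical and is not scikit-learn's implementation (which handles rounding and shuffling differently); it only shows the idea of shuffling once with a fixed seed and cutting the data in two.

```python
import random

def simple_split(rows, test_fraction=0.2, seed=24):
    """Shuffle row indices with a fixed seed, then hold out test_fraction of them as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)       # same seed gives the same shuffle every run
    n_test = round(len(rows) * test_fraction)  # number of held-out rows
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

train, test = simple_split(list(range(10)))
print(len(train), len(test))  # 8 2
```

Fixing the seed is what makes the split reproducible: rerunning the cell returns the same training and testing rows.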

______________________________________________________________________
**Next, instantiate `LogisticRegression` with the default options, but set the solver to `'liblinear'`.**

In [6]:
Logistic=LogisticRegression(solver='liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [7]:
# Fit the logistic regression model on training data
Logistic.fit(x_train,y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [8]:
# Make predictions using `.predict()`
y_pred=Logistic.predict(x_test)
y_pred

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [9]:
# Find class probabilities using `.predict_proba()`
y_pred_prob=Logistic.predict_proba(x_test)
y_pred_prob

array([[0.83835089, 0.16164911],
       [0.96170667, 0.03829333],
       [0.83835089, 0.16164911],
       ...,
       [0.79765047, 0.20234953],
       [0.79765047, 0.20234953],
       [0.92194077, 0.07805923]])

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features, to multiply by the intercept.**

In [10]:
# Create the column of 1s (it is stacked onto the features when the linear combination is computed)
np.ones((x_test.shape[0],1))

array([[1.],
       [1.],
       [1.],
       ...,
       [1.],
       [1.],
       [1.]])

In [11]:
# Get coefficients and intercepts from trained model
Logistic_coeff=Logistic.coef_
Logistic_interc=Logistic.intercept_
print("Coefficients -->",Logistic_coeff,"\nIntercepts -->",Logistic_interc)

Coefficients --> [[ 8.27642915e-11 -6.85836968e-06]] 
Intercepts --> [-6.69409999e-11]


In [12]:
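# Manually calculate the linear combination (log-odds); the sigmoid is applied afterwards to get probabilities
theta=np.concatenate([Logistic_interc.reshape(1,1),Logistic_coeff],axis=1)
ones_and_features=np.hstack([np.ones((x_test.shape[0],1)),x_test])
y_manu_pred_prob=np.dot(theta,np.transpose(ones_and_features))
y_manu_pred_prob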

array([[-1.64600872, -3.22343375, -1.64600872, ..., -1.37167393,
        -1.37167394, -2.46901308]])
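
The column-of-1s trick can be seen on a tiny example. The sketch below uses made-up parameter values and feature rows (not the fitted model's) and plain Python lists, purely to show that prepending a 1 to each row lets the intercept be handled as just another coefficient.

```python
import math

# Hypothetical parameters and feature rows, chosen only for illustration
intercept = -1.0
coefs = [0.5, -0.25]
rows = [[2.0, 4.0], [0.0, 8.0]]

theta = [intercept] + coefs             # intercept first, then the coefficients
augmented = [[1.0] + r for r in rows]   # prepend a 1 to every feature row

# Linear combination (log-odds): the 1 multiplies the intercept
log_odds = [sum(t * v for t, v in zip(theta, row)) for row in augmented]

# The sigmoid maps log-odds to predicted probabilities
probs = [1.0 / (1.0 + math.exp(-z)) for z in log_odds]

print(log_odds)  # [-1.0, -3.0]
print(probs)
```

Both log-odds are negative, so both probabilities fall below 0.5 and both rows would be assigned to class 0 at a 0.5 threshold.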

______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [13]:
# Apply the sigmoid to the manually calculated linear combination to get predicted probabilities
sig=sigmoid_func(y_manu_pred_prob)

In [14]:
# Manually calculate predicted classes
manu=sig>=0.5
manu.shape

(1, 6666)

In [15]:
# Compare to scikit-learn's predicted classes
np.array_equal(manu,y_pred.reshape(1,-1))

True

______________________________________________________________________
**Finally, calculate ROC AUC using both scikit-learn's predicted probabilities, and your manually predicted probabilities, and compare.**

In [16]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
print(roc_auc_score(y_test,y_pred_prob[:,1]))

0.63023130868276


In [17]:
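# Use manually calculated predicted probabilities to calculate ROC AUC
# Note: `manu` holds the thresholded classes (all False for this model), not probabilities, so the score is 0.5.
# Passing the probabilities instead, roc_auc_score(y_test,sig.reshape(-1)), should reproduce the value from In [16].
print(roc_auc_score(y_test,manu.reshape(manu.shape[1])))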

0.5
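
A score of exactly 0.5 is what ROC AUC gives when every sample receives the same score, which is the case when hard class predictions are all identical. The sketch below uses a hypothetical helper, `pairwise_auc`, and made-up labels and probabilities to show the difference between scoring probabilities and scoring thresholded classes. It computes AUC as the fraction of positive/negative pairs ranked correctly, which is a standard interpretation of the metric rather than scikit-learn's implementation.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
probs = [0.10, 0.40, 0.35, 0.45]     # made-up probabilities, all below 0.5
classes = [p >= 0.5 for p in probs]  # thresholding turns every score into False

print(pairwise_auc(y_true, probs))    # 0.75
print(pairwise_auc(y_true, classes))  # 0.5
```

The probabilities still rank most positives above negatives, while the thresholded classes carry no ranking information at all.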
