**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [44]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(x) = \frac{1}{1 + e^{-X}}$.
</details>

In [10]:
# Import the data set
df = pd.read_csv("cleaned_data.csv")
df.head()

Unnamed: 0,ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_1,PAY_2,PAY_3,PAY_4,...,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default payment next month,EDUCATION_CAT,graduate school,high school,others,university
0,798fc410-45c1,20000,2,2,1,24,2,2,-1,-1,...,0,0,0,0,1,university,0,0,0,1
1,8a8c8f3b-8eb4,120000,2,2,2,26,-1,2,0,0,...,1000,1000,0,2000,1,university,0,0,0,1
2,85698822-43f5,90000,2,2,2,34,0,0,0,0,...,1000,1000,1000,5000,0,university,0,0,0,1
3,0737c11b-be42,50000,2,2,1,37,0,0,0,0,...,1200,1100,1069,1000,0,university,0,0,0,1
4,3b7f77cc-dbc0,50000,1,2,1,57,-1,0,-1,0,...,10000,9000,689,679,0,university,0,0,0,1


In [11]:
# Define the sigmoid function
def sigmoid_function(x):
    sigmoid = 1/(1 + np.exp(-x)) 
    return sigmoid

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as values. Use a random state of 24.**

In [12]:
X = df[['PAY_1','LIMIT_BAL']].values
y = df['default payment next month'].values
X_train,X_test,y_train,y_test=train_test_split(X, y, test_size=0.2,random_state=42)

______________________________________________________________________
**Next, import LogisticRegression, with the default options, but set the solver to `'liblinear'`.**

In [14]:
model=LogisticRegression(solver='liblinear')


LogisticRegression(solver='liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [None]:
# Fit the logistic regression model on training data
model.fit(X_train, y_train)

In [15]:
# Make predictions using `.predict()`
predictions = model.predict(X_test)
predictions

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [17]:
# Find class probabilities using `.predict_proba()`
predictions_probabilities=model.predict_proba(X_test)
predictions_probabilities

array([[0.86916481, 0.13083519],
       [0.94481988, 0.05518012],
       [0.66290875, 0.33709125],
       ...,
       [0.72047061, 0.27952939],
       [0.8053691 , 0.1946309 ],
       [0.87666522, 0.12333478]])

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features, to multiply by the intercept.**

In [18]:
# Add column of 1s to features
np.ones((X_test.shape[0],1))
one_value_feature=np.hstack([np.ones((X_test.shape[0],1)),X_test])
one_value_feature

array([[ 1.0e+00, -2.0e+00,  2.8e+05],
       [ 1.0e+00, -1.0e+00,  4.2e+05],
       [ 1.0e+00, -1.0e+00,  1.0e+05],
       ...,
       [ 1.0e+00,  0.0e+00,  1.4e+05],
       [ 1.0e+00,  0.0e+00,  2.1e+05],
       [ 1.0e+00,  2.0e+00,  2.9e+05]])

In [23]:
# Get coefficients and intercepts from trained model
print(f'The coefficients are {model.coef_[0][0]}, {model.coef_[0][1]}  and the intercept is {model.intercept_[0]}')

The coefficients are 8.229765737070355e-11, -6.762836883034355e-06  and the intercept is -6.5925549683243e-11


In [41]:
# Manually calculate predicted probabilities
coefficients_intercept_array = np.array([[model.intercept_[0], model.coef_[0][0], model.coef_[0][1]]])
manual_predicted_probability=np.dot(coefficients_intercept_array,np.transpose(one_value_feature))
#Passing it through sigmoid function
manual_predicted_probability_sigmoid=sigmoid_function(manual_predicted_probability)
manual_predicted_probability_sigmoid

array([[0.13083519, 0.05518012, 0.33709125, ..., 0.27952939, 0.1946309 ,
        0.12333478]])

______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [42]:
# Manually calculate predicted classes
manual_predictions=manual_predicted_probability_sigmoid>=0.5
manual_predictions

array([[False, False, False, ..., False, False, False]])

In [43]:
# Compare to scikit-learn's predicted classes
np.array_equal(predictions.reshape(1,-1),manual_predictions)

True

______________________________________________________________________
**Finally, calculate ROC AUC using both scikit-learn's predicted probabilities, and your manually predicted probabilities, and compare.**

In [45]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
roc_auc_score(y_test,predictions_probabilities[:,1])

0.6374912949931919

In [46]:
# Use manually calculated predicted probabilities to calculate ROC AUC
roc_auc_score(y_test,manual_predicted_probability.reshape(manual_predicted_probability.shape[1],))

0.6374912949931919