### Fitting Logistic Regression

In this first notebook, you will fit a logistic regression model to a dataset in order to predict whether a transaction is fraudulent.

To get started, let's import the libraries and take a quick look at the dataset.

In [166]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv('./fraud_dataset.csv')
df.head()

Unnamed: 0,transaction_id,duration,day,fraud
0,28891,21.3026,weekend,False
1,61629,22.932765,weekend,False
2,53707,32.694992,weekday,False
3,47812,32.784252,weekend,False
4,43455,17.756828,weekend,False


`1.` As you can see, two columns (`day` and `fraud`) need to be changed to dummy variables.  Replace each of these columns with its dummy version, using 1 for `weekday` and `True`, and 0 otherwise.  Use the first quiz to answer a few questions about the dataset.

In [167]:
df_new = df.copy() 

In [168]:
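# get_dummies on the boolean column yields the columns False, True (in that order),
# which are assigned positionally to 'no_fraud' and 'fraud'
df_new[['no_fraud', 'fraud']] = pd.get_dummies(df['fraud'])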

In [169]:
df_new.head()

Unnamed: 0,transaction_id,duration,day,fraud,no_fraud
0,28891,21.3026,weekend,0,1
1,61629,22.932765,weekend,0,1
2,53707,32.694992,weekday,0,1
3,47812,32.784252,weekend,0,1
4,43455,17.756828,weekend,0,1


In [170]:
df_new[df_new.fraud==1].head(2)

Unnamed: 0,transaction_id,duration,day,fraud,no_fraud
15,32057,4.909117,weekday,1,0
80,33212,3.931617,weekday,1,0


In [171]:
df_new = df_new.join(pd.get_dummies(df_new['day']))

In [172]:
df_new.head(2)

Unnamed: 0,transaction_id,duration,day,fraud,no_fraud,weekday,weekend
0,28891,21.3026,weekend,0,1,0,1
1,61629,22.932765,weekend,0,1,0,1


In [173]:
df_new.drop('day', axis=1, inplace=True)

In [174]:
df_new.head(2)

Unnamed: 0,transaction_id,duration,fraud,no_fraud,weekday,weekend
0,28891,21.3026,0,1,0,1
1,61629,22.932765,0,1,0,1


> We drop `no_fraud` and `weekend` so that they serve as the baseline categories.

In [175]:
df_new.drop(['no_fraud','weekend'], axis=1, inplace=True)

In [176]:
df_new.head()

Unnamed: 0,transaction_id,duration,fraud,weekday
0,28891,21.3026,0,0
1,61629,22.932765,0,0
2,53707,32.694992,0,1
3,47812,32.784252,0,0
4,43455,17.756828,0,0
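
The same 0/1 encoding can also be produced without creating and then dropping the baseline columns. The following is a minimal sketch on a small hand-made frame; the `toy` DataFrame and its values are illustrative assumptions, and only the column names `day` and `fraud` come from the dataset.

```python
import pandas as pd

# Illustrative data only: three made-up rows using the dataset's column names.
toy = pd.DataFrame({
    "day": ["weekend", "weekday", "weekend"],
    "fraud": [False, True, False],
})

# 1 for 'weekday', 0 otherwise ('weekend' becomes the baseline).
toy["weekday"] = (toy["day"] == "weekday").astype(int)

# 1 for True, 0 for False (no fraud becomes the baseline).
toy["fraud"] = toy["fraud"].astype(int)

# The original text column is no longer needed.
toy = toy.drop(columns="day")
print(toy)
```

Comparing against the baseline category directly avoids the intermediate `no_fraud` and `weekend` columns, at the cost of hard-coding which category is coded as 1.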


In [177]:
df_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8793 entries, 0 to 8792
Data columns (total 4 columns):
transaction_id    8793 non-null int64
duration          8793 non-null float64
fraud             8793 non-null uint8
weekday           8793 non-null uint8
dtypes: float64(1), int64(1), uint8(2)
memory usage: 154.6 KB


In [178]:
fraud_prop = df_new.query('fraud == 1').shape[0]/df_new.shape[0]

f'Proportion of Fraudulent Transactions: {fraud_prop}'

'Proportion of Fraudulent Transactions: 0.012168770612987604'

In [179]:
avg_fraud_duration = df_new.query('fraud == 1').duration.mean()
avg_fraud_duration

4.6242473706156568

In [180]:
weekday_prop = df_new.query('weekday == 1').shape[0]/df_new.shape[0]

f'Proportion of weekday transactions: {weekday_prop}'

'Proportion of weekday transactions: 0.3452746502900034'

In [181]:
avg_nonfraud_duration = df_new.query('fraud == 0').duration.mean()
avg_nonfraud_duration

30.013583132522555
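
Because the dummy columns contain only 0s and 1s, their mean equals the proportion of 1s, so the summary figures above can also be computed with `mean` and `groupby`. This is a small sketch on made-up values; the `toy` frame is an assumption for illustration, not the fraud dataset.

```python
import pandas as pd

# Illustrative data only: four made-up transactions.
toy = pd.DataFrame({
    "duration": [30.0, 4.0, 28.0, 6.0],
    "fraud": [0, 1, 0, 1],
    "weekday": [0, 1, 1, 1],
})

# The mean of a 0/1 column is the share of rows equal to 1.
fraud_prop = toy["fraud"].mean()
weekday_prop = toy["weekday"].mean()

# Average duration for each fraud class, indexed by the fraud value (0 or 1).
avg_duration = toy.groupby("fraud")["duration"].mean()

print(fraud_prop, weekday_prop)
print(avg_duration)
```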

`2.` Now that you have dummy variables, fit a logistic regression model to predict whether a transaction is fraudulent using both day and duration.  Don't forget an intercept!  Use the second quiz below to ensure you fit the model correctly.

In [182]:
df_new['intercept'] = 1
response = df_new['fraud']
predictors = df_new[['intercept', 'duration', 'weekday']]

In [183]:
fraud_model = sm.Logit(response, predictors)

result = fraud_model.fit()

result.summary2()

Optimization terminated successfully.
         Current function value: inf
         Iterations 16


  return 1/(1+np.exp(-X))
  return np.sum(np.log(self.cdf(q*np.dot(X,params))))
  return 1 - self.llf/self.llnull


Model:,Logit,No. Iterations:,16.0
Dependent Variable:,fraud,Pseudo R-squared:,
Date:,2020-05-27 08:01,AIC:,inf
No. Observations:,8793,BIC:,inf
Df Model:,2,Log-Likelihood:,-inf
Df Residuals:,8790,LL-Null:,-inf
Converged:,1.0000,Scale:,1.0
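
The `inf` function value and `-inf` log-likelihood reported above deserve a note. One plausible reading, which is an assumption and not something the notebook confirms, is that the fitted probabilities saturate at exactly 0 or 1 in floating point, so the log-likelihood takes `log(0)`. The sketch below reproduces that effect with NumPy alone; `naive_sigmoid` and the input value are hypothetical and chosen only to trigger the overflow.

```python
import numpy as np

def naive_sigmoid(x):
    """Logistic function written the direct way; overflows for very negative x."""
    return 1 / (1 + np.exp(-x))

x = np.array([-800.0])  # hypothetical extreme linear predictor

# Silence the overflow / divide-by-zero warnings that this deliberately triggers.
with np.errstate(over="ignore", divide="ignore"):
    p = naive_sigmoid(x)           # exp(800) overflows to inf, so p is 0.0
    log_lik = np.sum(np.log(p))    # log(0.0) is -inf

# A numerically stable form of log(sigmoid(x)) stays finite.
stable_log_lik = np.sum(-np.logaddexp(0, -x))

print(log_lik, stable_log_lik)
```

If this is what happened here, the coefficient table can still be finite while the likelihood-based statistics (AIC, BIC, pseudo R-squared) are unusable.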

,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
intercept,9.8709,1.9438,5.0783,0.0000,6.0613,13.6806
duration,-1.4637,0.2905,-5.0389,0.0000,-2.0331,-0.8944
weekday,2.5465,0.9043,2.8160,0.0049,0.7741,4.3188
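
The coefficients in this table are on the log-odds scale, so exponentiating them gives odds ratios, which are usually easier to read. The sketch below copies the two slope coefficients from the table by hand; the `coefs` dictionary is an illustrative stand-in for the fitted result object.

```python
import math

# Coefficients copied from the summary table above (log-odds scale).
coefs = {"duration": -1.4637, "weekday": 2.5465}

# exp(coefficient) is the multiplicative change in the odds of fraud
# for a one-unit increase in that predictor, holding the other fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

Read this way, a weekday transaction has roughly 12.8 times the odds of fraud of a weekend transaction, and each additional unit of duration multiplies the odds by roughly 0.23. In the notebook itself, `np.exp(result.params)` should give the same values for all three terms.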

