### Fitting Logistic Regression

In this first notebook, you will fit a logistic regression model to a dataset in order to predict whether a transaction is fraudulent.

To get started, let's import the libraries and take a quick look at the dataset.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm


df = pd.read_csv('data/fraud_dataset.csv')
df[df['fraud']==True].head()

Unnamed: 0,transaction_id,duration,day,fraud
15,32057,4.909117,weekday,True
80,33212,3.931617,weekday,True
134,24194,3.273424,weekend,True
179,29647,2.433195,weekend,True
193,33493,3.679496,weekday,True


`1.` As you can see, two columns need to be converted to dummy variables.  Replace each of these columns with its dummy version, using 1 for `weekday` and `True`, and 0 otherwise.  Use the first quiz to answer a few questions about the dataset.

In [2]:
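# One-hot encode `day`; dtype=int keeps the dummies as 0/1 on newer pandas versions
new_df = pd.get_dummies(df, dtype=int)
# Keep the weekday indicator and drop the redundant weekend column
new_df.rename(columns={'day_weekday': 'weekday'}, inplace=True)
del new_df['day_weekend']
# Encode the boolean target as 1/0
new_df['fraud'] = new_df['fraud'].map({True: 1, False: 0})
new_df[new_df['fraud'] == 1].head()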

Unnamed: 0,transaction_id,duration,fraud,weekday
15,32057,4.909117,1,1
80,33212,3.931617,1,1
134,24194,3.273424,1,0
179,29647,2.433195,1,0
193,33493,3.679496,1,1
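
The same 1/0 encoding can also be written out explicitly, which makes it easier to see which category maps to 1. The snippet below is only a minimal sketch on a hypothetical three-row frame (`toy` and its values are made up for illustration, not taken from `fraud_dataset.csv`); it assumes the same `day` and `fraud` column names as the dataset.

```python
import pandas as pd

# Hypothetical toy frame that mirrors the relevant columns of the fraud dataset
toy = pd.DataFrame({
    "duration": [4.9, 31.2, 3.3],
    "day": ["weekday", "weekend", "weekend"],
    "fraud": [True, False, True],
})

# Explicit 0/1 encoding: 1 for weekday / True, 0 otherwise
encoded = toy.assign(
    weekday=(toy["day"] == "weekday").astype(int),
    fraud=toy["fraud"].astype(int),
).drop(columns="day")

print(encoded)
```

Comparing a column to a single category and casting to `int` avoids creating (and then deleting) the redundant `day_weekend` column.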


`2.` Now that you have dummy variables, fit a logistic regression model to predict whether a transaction is fraud using both day and duration.  Don't forget an intercept!  Use the second quiz below to check that you fit the model correctly. Also remember to use the `.summary2()` method to get your summary results.

In [9]:
y = new_df['fraud']
x1 = new_df[['weekday', 'duration']]
x = sm.add_constant(x1)  # adds the intercept column `const`
log_mod = sm.Logit(y, x)
results = log_mod.fit()
results.summary2()

Optimization terminated successfully.
         Current function value: 0.002411
         Iterations 16


Model:,Logit,Pseudo R-squared:,0.963
Dependent Variable:,fraud,AIC:,48.4009
Date:,2022-01-19 21:31,BIC:,69.646
No. Observations:,8793,Log-Likelihood:,-21.2
Df Model:,2,LL-Null:,-578.1
Df Residuals:,8790,LLR p-value:,1.39e-242
Converged:,1.0000,Scale:,1.0
No. Iterations:,16.0000,,

,Coef.,Std.Err.,z,P>|z|,[0.025,0.975]
const,9.8709,1.9438,5.0783,0.0000,6.0613,13.6806
weekday,2.5465,0.9043,2.8160,0.0049,0.7741,4.3188
duration,-1.4637,0.2905,-5.0389,0.0000,-2.0331,-0.8944
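
Reading the coefficients off the table above, the fitted model can be written as a log-odds equation. This is only a restatement of the reported estimates, where $p$ denotes the predicted probability that a transaction is fraud (the symbol $p$ is introduced here for illustration):

$$
\log\left(\frac{p}{1-p}\right) = 9.8709 + 2.5465 \cdot \text{weekday} - 1.4637 \cdot \text{duration}
$$

Equivalently, solving for the probability:

$$
p = \frac{1}{1 + \exp\left(-\left(9.8709 + 2.5465 \cdot \text{weekday} - 1.4637 \cdot \text{duration}\right)\right)}
$$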


In [4]:
new_df['fraud'].mean()

0.012168770612987604

In [5]:
new_df.query('fraud == 1')['duration'].mean()

4.624247370615658

In [6]:
new_df['weekday'].mean()

0.3452746502900034

In [7]:
new_df.query('fraud == 0')['duration'].mean()

30.013583132522587

In [10]:
np.exp(2.5465),np.exp(-1.4637)

(12.762357271496972, 0.2313785882117941)
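
Exponentiating a logistic regression coefficient gives an odds ratio, which is what the cell above computes. The sketch below repeats that calculation with the coefficients copied by hand from the summary table, and also takes the reciprocal of the `duration` ratio, since a ratio below 1 is often easier to read as the effect of a one-unit decrease. The names `coefs`, `odds_ratios`, and `inverse_duration` are illustrative only.

```python
import numpy as np

# Coefficients copied from the summary table above
coefs = {"weekday": 2.5465, "duration": -1.4637}

# Odds ratio for each predictor: exp(coefficient)
odds_ratios = {name: float(np.exp(beta)) for name, beta in coefs.items()}

# Reciprocal of a ratio below 1: multiplicative change in the odds
# for a one-unit *decrease* in duration
inverse_duration = 1 / odds_ratios["duration"]

print(odds_ratios)
print(inverse_duration)
```

Under this reading, and holding duration constant, the odds of fraud on a weekday are roughly 12.76 times the odds on a weekend. Holding the day constant, each one-unit increase in duration multiplies the odds of fraud by about 0.23, or equivalently each one-unit decrease multiplies them by about 4.32.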