### Interpreting Results of Logistic Regression

In this notebook (and the accompanying quizzes), you will practice interpreting the coefficients of a logistic regression model.  The material from the previous video should help.

The dataset contains four variables: `admit`, `gre`, `gpa`, and `prestige`:

* `admit` is a binary variable. It indicates whether a candidate was admitted to UCLA (admit = 1) or not (admit = 0).
* `gre` is the GRE score. GRE stands for Graduate Record Examination.
* `gpa` stands for Grade Point Average.
* `prestige` is the prestige of an applicant's alma mater (the school attended before applying), with 1 being the highest (most prestigious) and 4 the lowest (least prestigious).

To start, let's import the necessary libraries and read in the data.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("./admissions.csv")
df.head()


| Unnamed: 0 | admit | gre | gpa | prestige |
|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 |
| 1 | 1 | 660 | 3.67 | 3 |
| 2 | 1 | 800 | 4.0 | 1 |
| 3 | 1 | 640 | 3.19 | 4 |
| 4 | 0 | 520 | 2.93 | 4 |


There are a few different ways you might choose to work with the `prestige` column in this dataset.  Here, we want the change from prestige 1 to prestige 2 to be able to have a different effect on the acceptance rate than the change from prestige 3 to prestige 4.

`1.` With the above idea in place, create the dummy variables needed to treat `prestige` as a categorical variable rather than a quantitative one, then answer quiz 1 below.

In [2]:
# One indicator column per prestige level (levels come out in sorted order, 1 to 4)
df[['prest_1', 'prest_2', 'prest_3', 'prest_4']] = pd.get_dummies(df['prestige'])
df.head(1)

| Unnamed: 0 | admit | gre | gpa | prestige | prest_1 | prest_2 | prest_3 | prest_4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 | 0 | 0 | 1 | 0 |


In [3]:
# Drop prest_1 so that prestige 1 becomes the baseline category
df = df.drop('prest_1', axis=1)
df.head(1)

| Unnamed: 0 | admit | gre | gpa | prestige | prest_2 | prest_3 | prest_4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 | 0 | 1 | 0 |
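
As an alternative sketch of the two steps above, `pd.get_dummies` can name the columns and drop the baseline level itself through its `prefix` and `drop_first` parameters. The small `toy` frame below is made-up data standing in for `admissions.csv`, so the block runs on its own.

```python
import pandas as pd

# Made-up stand-in for the admissions data (illustrative values only).
toy = pd.DataFrame({"prestige": [3, 3, 1, 4, 4, 2]})

# prefix="prest" yields prest_1 ... prest_4; drop_first=True removes the
# first level (prestige 1), which then acts as the baseline category.
dummies = pd.get_dummies(toy["prestige"], prefix="prest", drop_first=True, dtype=int)
toy = toy.join(dummies)

print(toy.columns.tolist())
```

A row with all three dummies equal to 0 then corresponds to a prestige 1 applicant.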


In [4]:
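from scipy import stats

# Compatibility shim: older statsmodels releases call scipy.stats.chisqprob,
# which newer SciPy versions removed. chi2.sf gives the same upper-tail probability.
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)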



`2.` Now, fit a logistic regression model to predict whether an individual is admitted using `gre`, `gpa`, and `prestige`, with prestige `1` as the baseline.  Use the results to answer quizzes 2 and 3 below.  Don't forget an intercept.

In [5]:
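df['intercept'] = 1
# Note: 'intercept' is created above but is not in the column list below,
# so the summary that follows is for a model without an intercept.
# Add 'intercept' to the list to include one.
logit = sm.Logit(df['admit'], df[['gre', 'gpa', 'prest_2', 'prest_3', 'prest_4']])
results = logit.fit()
results.summary()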

Optimization terminated successfully.
         Current function value: 0.589121
         Iterations 5


|  |  |  |  |
|---|---|---|---|
| Dep. Variable: | admit | No. Observations: | 397.0 |
| Model: | Logit | Df Residuals: | 392.0 |
| Method: | MLE | Df Model: | 4.0 |
| Date: | Wed, 03 Feb 2021 | Pseudo R-squ.: | 0.05722 |
| Time: | 01:52:01 | Log-Likelihood: | -233.88 |
| converged: | True | LL-Null: | -248.08 |
|  |  | LLR p-value: | 1.039e-05 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| gre | 0.0014 | 0.001 | 1.308 | 0.191 | -0.001 | 0.003 |
| gpa | -0.1323 | 0.195 | -0.680 | 0.497 | -0.514 | 0.249 |
| prest_2 | -0.9562 | 0.302 | -3.171 | 0.002 | -1.547 | -0.365 |
| prest_3 | -1.5375 | 0.332 | -4.627 | 0.000 | -2.189 | -0.886 |
| prest_4 | -1.8699 | 0.401 | -4.658 | 0.000 | -2.657 | -1.083 |


In [6]:
results.summary2()

|  |  |  |  |
|---|---|---|---|
| Model: | Logit | Pseudo R-squared: | 0.057 |
| Dependent Variable: | admit | AIC: | 477.7621 |
| Date: | 2021-02-03 01:53 | BIC: | 497.6817 |
| No. Observations: | 397 | Log-Likelihood: | -233.88 |
| Df Model: | 4 | LL-Null: | -248.08 |
| Df Residuals: | 392 | LLR p-value: | 1.0387e-05 |
| Converged: | 1.0000 | Scale: | 1.0 |
| No. Iterations: | 5.0000 |  |  |

|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| gre | 0.0014 | 0.0010 | 1.3085 | 0.1907 | -0.0007 | 0.0034 |
| gpa | -0.1323 | 0.1946 | -0.6800 | 0.4965 | -0.5137 | 0.2490 |
| prest_2 | -0.9562 | 0.3016 | -3.1709 | 0.0015 | -1.5473 | -0.3652 |
| prest_3 | -1.5375 | 0.3323 | -4.6270 | 0.0000 | -2.1888 | -0.8862 |
| prest_4 | -1.8699 | 0.4014 | -4.6580 | 0.0000 | -2.6567 | -1.0831 |
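
The coefficients in these tables are on the log-odds scale. Written out for the five predictors listed, with $p$ the probability that `admit` = 1 (the $\beta$ symbols are notation introduced here, not taken from the notebook), the fitted model is:

$$
\log\left(\frac{p}{1-p}\right) = \beta_{gre}\,\text{gre} + \beta_{gpa}\,\text{gpa} + \beta_{2}\,\text{prest\_2} + \beta_{3}\,\text{prest\_3} + \beta_{4}\,\text{prest\_4}
$$

Increasing one predictor by one unit while holding the others fixed adds its coefficient to the log odds, so it multiplies the odds $p/(1-p)$ by $e^{\beta}$. For a dummy such as `prest_2`, $e^{\beta_{2}}$ compares the odds for a prestige 2 applicant with those for the baseline, prestige 1. A model fit with a constant column would carry an additional $\beta_{0}$ term on the right-hand side.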


In [10]:
np.exp(1.2948), np.exp(0.5120), np.exp(-0.0504)

(3.6502658479614949, 1.668625110139667, 0.95084900881912227)

In [11]:
1/np.exp(-0.0504)

1.0516916889274768
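
The same exponentiation can be applied to the coefficients reported in the summary tables above. The sketch below copies those values by hand into a hypothetical `coefs` dictionary, then computes each multiplicative change in odds and, for the negative coefficients, its reciprocal, which is often easier to phrase in words.

```python
import numpy as np

# Coefficients copied from the summary output above (log-odds scale).
coefs = {
    "gre": 0.0014,
    "gpa": -0.1323,
    "prest_2": -0.9562,
    "prest_3": -1.5375,
    "prest_4": -1.8699,
}

# exp(coef) is the multiplicative change in the odds of admission for a
# one-unit increase in the predictor, holding the others fixed.
odds_ratios = {name: float(np.exp(b)) for name, b in coefs.items()}

# For a ratio below 1, the reciprocal expresses the comparison the other way.
reciprocals = {name: 1 / ratio for name, ratio in odds_ratios.items() if ratio < 1}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.4f}")
```

For example, `reciprocals["prest_2"]` is the factor by which the odds of admission for a prestige 1 applicant exceed those of a prestige 2 applicant with the same `gre` and `gpa`, under the model as fit above.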