[source 1](https://towardsdatascience.com/fitting-mlr-and-binary-logistic-regression-using-python-research-oriented-modeling-dcc22f1f0edf)

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm       
from statsmodels.formula.api import logit

from matplotlib import pyplot as plt
plt.rc("figure", figsize=(16,8))
plt.rc("font", size=14)

In [2]:
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")

In [3]:
# Cast rank to string so the formula treats it as a categorical variable
df['rank'] = df['rank'].astype(str)

In [4]:
df.head()

| | admit | gre | gpa | rank |
|---|---|---|---|---|
| 0 | 0 | 380 | 3.61 | 3 |
| 1 | 1 | 660 | 3.67 | 3 |
| 2 | 1 | 800 | 4.0 | 1 |
| 3 | 1 | 640 | 3.19 | 4 |
| 4 | 0 | 520 | 2.93 | 4 |


# 1. Predict admission by the results of GPA and GRE tests and the rank of the university.

In [5]:
model = logit(formula='admit ~ gpa + gre + rank', data=df).fit()
print(model.summary())

Optimization terminated successfully.
         Current function value: 0.573147
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:                  admit   No. Observations:                  400
Model:                          Logit   Df Residuals:                      394
Method:                           MLE   Df Model:                            5
Date:                Mon, 01 Feb 2021   Pseudo R-squ.:                 0.08292
Time:                        20:55:05   Log-Likelihood:                -229.26
converged:                       True   LL-Null:                       -249.99
Covariance Type:            nonrobust   LLR p-value:                 7.578e-08
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -3.9900      1.140     -3.500      0.000      -6.224      -1.756
rank[T.2]     -0.6754      0.

# 2. Interpret the coefficients. Which are significant? Which are positive/negative?

## Significance

In [6]:
model.pvalues < 0.05

Intercept    True
rank[T.2]    True
rank[T.3]    True
rank[T.4]    True
gpa          True
gre          True
dtype: bool

All coefficients are **significant** at the 5% level: every p-value is lower than 0.05.
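To connect the summary table to these p-values, here is a minimal sketch that recomputes the Wald z-statistic and the two-sided p-value for the intercept from the coefficient and standard error printed above. The helper name `wald_p_value` is hypothetical, and the standard normal reference distribution is an assumption based on the `z` and `P>|z|` column headers.

```python
import math

def wald_p_value(coef, std_err):
    """Return (z, two-sided p-value) under a standard normal approximation."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Intercept row from the summary above: coef = -3.9900, std err = 1.140
z, p = wald_p_value(-3.9900, 1.140)
print(round(z, 3), p < 0.05)
```

The rounded z-value matches the `-3.500` in the table, and the p-value is small enough to be displayed as `0.000`.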

In [7]:
# Odds Ratio
round(np.exp(model.params), 3)

Intercept    0.019
rank[T.2]    0.509
rank[T.3]    0.262
rank[T.4]    0.212
gpa          2.235
gre          1.002
dtype: float64

- **rank=2**: moving from rank 1 (the reference category) to the *second* rank multiplies the odds of admission by **0.509**, holding GPA and GRE fixed
- **rank=3**: moving from rank 1 to the *third* rank multiplies the odds of admission by **0.262**
- **rank=4**: moving from rank 1 to the *fourth* rank multiplies the odds of admission by **0.212**


- **gpa**: a one-unit increase in *gpa* multiplies the odds of admission by **2.235**, holding the other variables fixed
- **gre**: a one-unit increase in *gre* multiplies the odds of admission by **1.002**, holding the other variables fixed
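Odds ratios are often easier to read as percent changes in the odds. The sketch below converts the rounded odds ratios printed above using `(OR - 1) * 100`; the dictionary and the helper `percent_change_in_odds` are illustrative names, not part of the original notebook.

```python
# Odds ratios copied from the output above (rounded to 3 decimals).
odds_ratios = {
    "rank[T.2]": 0.509,
    "rank[T.3]": 0.262,
    "rank[T.4]": 0.212,
    "gpa": 2.235,
    "gre": 1.002,
}

def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0

pct_change = {name: round(percent_change_in_odds(value), 1)
              for name, value in odds_ratios.items()}
for name, pct in pct_change.items():
    print(f"{name}: {pct:+.1f}% change in odds")
```

Values below zero mean the odds shrink relative to the reference (rank 1 for the rank dummies, one unit lower for GPA and GRE). These are changes in odds, not in probability.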

In [8]:
# Average Marginal Effects
model.get_margeff().summary()

Dep. Variable: admit
Method: dydx
At: overall

| | dy/dx | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| rank[T.2] | -0.1314 | 0.06 | -2.184 | 0.029 | -0.249 | -0.013 |
| rank[T.3] | -0.2608 | 0.062 | -4.176 | 0.0 | -0.383 | -0.138 |
| rank[T.4] | -0.3019 | 0.076 | -3.956 | 0.0 | -0.451 | -0.152 |
| gpa | 0.1564 | 0.063 | 2.485 | 0.013 | 0.033 | 0.28 |
| gre | 0.0004 | 0.0 | 2.107 | 0.035 | 3.07e-05 | 0.001 |


- **rank=2**: moving from rank 1 to the *second* rank **decreases** the probability of admission by about **13.14** percentage points on average
- **rank=3**: moving from rank 1 to the *third* rank **decreases** the probability of admission by about **26.08** percentage points on average
- **rank=4**: moving from rank 1 to the *fourth* rank **decreases** the probability of admission by about **30.19** percentage points on average


- **gpa**: a one-unit increase in *gpa* **increases** the probability of admission by about **15.64** percentage points on average
- **gre**: a one-unit increase in *gre* **increases** the probability of admission by about **0.04** percentage points on average

As can be seen from the odds ratios and AMEs, a higher school **rank** number (2, 3 or 4 instead of 1) **negatively affects** the admission probability, while higher **GPA and GRE** scores **positively affect** it.
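It may help to see how the AMEs relate to the coefficients. For a logit model with the `dydx` method averaged over all observations, each marginal effect is the coefficient times the sample mean of `p * (1 - p)`, so dividing an AME by its coefficient should give roughly the same constant for every variable. The sketch below checks this with the rounded numbers printed above; it assumes the default `get_margeff()` settings, under which the rank dummies are differentiated like continuous regressors rather than evaluated as discrete 0-to-1 changes.

```python
import math

# Values copied from the outputs above (rounded as printed).
odds_ratios = {"rank[T.2]": 0.509, "rank[T.3]": 0.262,
               "rank[T.4]": 0.212, "gpa": 2.235}
ames = {"rank[T.2]": -0.1314, "rank[T.3]": -0.2608,
        "rank[T.4]": -0.3019, "gpa": 0.1564}

# coefficient = log(odds ratio); AME / coefficient recovers mean(p * (1 - p)).
# gre is left out because its printed values are rounded too coarsely.
scale = {name: ames[name] / math.log(odds_ratios[name]) for name in ames}
for name, value in scale.items():
    print(f"{name}: {value:.4f}")
```

All four ratios come out close to 0.195, which is why the AMEs keep the same sign and ordering as the coefficients.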

# 3. Test how the model fits the data. Find pseudo R2 and PCP (ePCP).

In [9]:
print(model.summary().tables[0][3][2], model.summary().tables[0][3][3])

  Pseudo R-squ.:      0.08292


The pseudo R-squared (0.083) is **lower** than 0.1, so the model improves only modestly on the intercept-only model and cannot be considered a good fit.
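The reported value can be reproduced by hand from the two log-likelihoods in the summary table, using McFadden's formula `1 - LL_model / LL_null`. A minimal sketch with the printed numbers follows (the variable names are illustrative). On the fitted result itself, the `prsquared`, `llf` and `llnull` attributes should hold the unrounded values.

```python
# Log-likelihoods copied from the summary table above.
ll_model = -229.26   # Log-Likelihood of the fitted model
ll_null = -249.99    # LL-Null (intercept-only model)

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1.0 - ll_model / ll_null
print(round(pseudo_r2, 4))
```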

In [10]:
# PCP: round predicted probabilities to 0/1 and compare with the observed outcome
(model.predict().round() == df['admit']).mean() * 100

71.0

Our model correctly predicts **71** percent of the observations!
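The heading also asks for ePCP, which is not computed above. Below is a minimal sketch of both measures, plus the majority-class baseline that PCP is usually compared against. The function names `pcp`, `epcp` and `majority_baseline` are hypothetical, the toy arrays are made up for illustration, and the 0.5 cut-off is an assumption matching the rounding used above.

```python
import numpy as np

def pcp(y_true, p_hat, threshold=0.5):
    """Percent correctly predicted: share of cases where the thresholded
    probability matches the observed 0/1 outcome."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(p_hat) >= threshold).astype(int)
    return float(np.mean(y_pred == y_true) * 100)

def epcp(y_true, p_hat):
    """Expected PCP: average probability the model assigns to the
    outcome that was actually observed."""
    y_true = np.asarray(y_true)
    p_hat = np.asarray(p_hat)
    return float(np.mean(np.where(y_true == 1, p_hat, 1 - p_hat)) * 100)

def majority_baseline(y_true):
    """PCP obtained by always predicting the more frequent class."""
    share_ones = float(np.mean(y_true))
    return max(share_ones, 1 - share_ones) * 100

# Toy data (made up for illustration, not the admissions data).
y_toy = [1, 0, 1, 0]
p_toy = [0.8, 0.3, 0.4, 0.6]
print(pcp(y_toy, p_toy), epcp(y_toy, p_toy), majority_baseline(y_toy))
```

On the notebook's data these could be called as `epcp(df['admit'], model.predict())` and `majority_baseline(df['admit'])`. A PCP only slightly above the baseline would say that the model adds little over always predicting the more common outcome.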