# Logistic Regression + Standardization

- The independent variables are assumed to follow a normal distribution.

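|Variable|Type|Description|
|-:|-|-|
|`합격여부` (admission result)|Categorical|1 = admitted, 0 = not admitted|
|`필기점수` (written exam score)|Continuous|Out of 800 points|
|`학부성적` (undergraduate GPA)|Continuous|Out of 4.0|
|`병원경력` (hospital experience)|Categorical|1: 10 years or more, 2: 2~5 years, 3: 1~5 years, 4: less than 1 year|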

## 1. Setup
#### Import packages and initialize plot settings

In [2]:
from pandas import read_excel
from matplotlib import pyplot as plt
import seaborn as sb
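import sys, os

# Add the directory two levels above the working directory to the import path
# so that the local `helper` module can be found
sys.path.append(os.path.dirname(os.path.dirname(os.getcwd())))
from helper import my_logit, scalling

# Use a font with Korean glyphs (AppleGothic on macOS, Malgun Gothic elsewhere)
plt.rcParams['font.family'] = 'AppleGothic' if sys.platform == 'darwin' else 'Malgun Gothic'
plt.rcParams['font.size'] = 12
plt.rcParams['figure.figsize'] = (10, 5)
# Keep minus signs rendering correctly with the fonts above
plt.rcParams['axes.unicode_minus'] = False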

#### Load the data

In [20]:
df = read_excel('https://data.hossam.kr/E05/gradeuate.xlsx')
df

| |합격여부|필기점수|학부성적|병원경력|
|-:|-:|-:|-:|-:|
|0|0|380|3.61|3|
|1|1|660|3.67|3|
|2|1|800|4.00|1|
|3|1|640|3.19|4|
|4|0|520|2.93|4|
|...|...|...|...|...|
|395|0|620|4.00|2|
|396|0|560|3.04|3|
|397|0|460|2.63|2|
|398|0|700|3.65|2|


## 2. Standardize the data

In [21]:
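# Scale the predictors only; the target column is set aside first
df_tmp = df.drop('합격여부', axis=1)
std_df = scalling(df_tmp)
# Re-attach the unscaled 0/1 target
std_df['합격여부'] = df['합격여부']
std_df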

| |필기점수|학부성적|병원경력|합격여부|
|-:|-:|-:|-:|-:|
|0|-1.800263|0.579072|0.545968|0|
|1|0.626668|0.736929|0.545968|1|
|2|1.840134|1.605143|-1.574296|1|
|3|0.453316|-0.525927|1.606100|1|
|4|-0.586797|-1.209974|1.606100|0|
|...|...|...|...|...|
|395|0.279964|1.605143|-0.514164|0|
|396|-0.240093|-0.920570|0.545968|0|
|397|-1.106854|-1.999259|-0.514164|0|
|398|0.973373|0.684310|-0.514164|0|
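
The `scalling` helper is not shown in this notebook. Its output looks like a z-score transform (equal gaps in raw score map to equal gaps in the scaled values), so the following is a minimal sketch under that assumption. The function name `standardize` and the use of the population standard deviation are illustrative choices, not the helper's confirmed implementation.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # ddof=0; an assumption about what `scalling` does
    return [(v - mu) / sigma for v in values]


# The five 필기점수 values visible at the top of the loaded data
scores = [380, 660, 800, 640, 520]
z = standardize(scores)

print([round(v, 3) for v in z])
# After standardization the column has mean 0 and standard deviation 1
print(round(fmean(z), 6), round(pstdev(z), 6))
```

The z-scores printed here will not match the notebook output, because the mean and standard deviation are computed from five rows rather than from all 400 observations.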


## 3. Logistic regression analysis (using the helper module)

In [22]:
logit_result = my_logit(std_df, y='합격여부', x=['필기점수', '학부성적', '병원경력'])
print(logit_result.summary)

Optimization terminated successfully.
         Current function value: 0.574302
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:                   합격여부   No. Observations:                  400
Model:                          Logit   Df Residuals:                      396
Method:                           MLE   Df Model:                            3
Date:                Tue, 01 Aug 2023   Pseudo R-squ.:                 0.08107
Time:                        11:55:46   Log-Likelihood:                -229.72
converged:                       True   LL-Null:                       -249.99
Covariance Type:            nonrobust   LLR p-value:                 8.207e-09
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.8591      0.117     -7.345      0.000      -1.088      -0.630
필기점수           0.2647      0.

In [23]:
logit_result.cmdf

| |Negative|Positive|
|-|-:|-:|
|True|253|29|
|False|98|20|


In [24]:
logit_result.odds_rate_df

| |odds_rate|
|-|-:|
|Intercept|0.423557|
|필기점수|1.302986|
|학부성적|1.343577|
|병원경력|0.589627|
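
The `odds_rate` values appear to be the exponentiated regression coefficients. The sketch below checks this against the two coefficients that are visible in the summary output, and then reads the intercept as a baseline probability. The variable names are illustrative, and the coefficients are the rounded values as printed, so agreement is only expected to about three decimal places.

```python
import math

# Coefficients as printed (rounded to 4 decimals) in the summary output
coefs = {"Intercept": -0.8591, "필기점수": 0.2647}

# Odds ratio = exp(coefficient)
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# Values reported by logit_result.odds_rate_df
reported = {
    "Intercept": 0.423557,
    "필기점수": 1.302986,
    "학부성적": 1.343577,
    "병원경력": 0.589627,
}

# Going the other way, log(odds ratio) recovers each coefficient
recovered = {name: math.log(r) for name, r in reported.items()}

# With standardized predictors, the intercept describes a case whose
# predictors all sit at their means (z = 0)
baseline_odds = reported["Intercept"]
baseline_prob = baseline_odds / (1 + baseline_odds)

for name, value in odds_ratios.items():
    print(name, round(value, 4), "reported:", reported[name])
print("baseline probability:", round(baseline_prob, 4))
```

Because the predictors were standardized, each odds ratio describes the effect of a one standard deviation change: for example, the odds of admission are multiplied by roughly 1.30 for each standard deviation increase in `필기점수`. A value below 1, as for `병원경력`, corresponds to a negative coefficient.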


In [25]:
logit_result.prs

0.08107331586891475
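
This value matches the `Pseudo R-squ.` entry in the summary, which statsmodels reports as McFadden's pseudo R-squared. As a rough check, it can be reproduced from the two log-likelihoods printed in the summary. The inputs below are the rounded values as displayed, so the result agrees only to about four decimal places.

```python
# Values as printed (rounded) in the summary output
ll_model = -229.72   # Log-Likelihood of the fitted model
ll_null = -249.99    # LL-Null: log-likelihood of the intercept-only model
n_obs = 400          # No. Observations
mean_nll = 0.574302  # "Current function value" from the optimizer

# McFadden's pseudo R-squared: 1 - LL(model) / LL(null)
pseudo_r2 = 1 - ll_model / ll_null

# The optimizer reports the average negative log-likelihood per observation,
# so multiplying by -n recovers the total log-likelihood
ll_from_mean = -mean_nll * n_obs

print(round(pseudo_r2, 4))
print(round(ll_from_mean, 2))
```

A value near 0.08 means the fitted model improves the log-likelihood by about 8% over the intercept-only model.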

In [27]:
logit_result.result_df.T

| |0|
|-|-:|
|Explanatory power (pseudo R-squared)|0.081073|
|Accuracy|0.705|
|Precision|0.591837|
|Recall (TPR)|0.228346|
|Fallout (FPR)|0.07326|
|Specificity (TNR)|0.92674|
|RAS|0.577543|
|f1_score|0.329545|
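
Apart from the pseudo R-squared, every metric in this table can be derived from the four counts in the confusion matrix returned by `logit_result.cmdf`. The sketch below reads that matrix as TN = 253, TP = 29, FN = 98, FP = 20, which is the reading that reproduces the reported numbers. How the helper computes these internally is not shown, so treat this as a cross-check rather than its implementation.

```python
# Counts read from the confusion matrix (logit_result.cmdf)
tn, tp = 253, 29  # row "True":  Negative, Positive
fn, fp = 98, 20   # row "False": Negative, Positive

total = tn + tp + fn + fp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)       # of predicted positives, share that are correct
recall = tp / (tp + fn)          # TPR: of actual positives, share that are found
fallout = fp / (fp + tn)         # FPR: of actual negatives, share flagged positive
specificity = tn / (tn + fp)     # TNR = 1 - FPR
f1 = 2 * precision * recall / (precision + recall)

# The mean of TPR and TNR; this reproduces the RAS row of the table
balanced = (recall + specificity) / 2

for name, value in [
    ("accuracy", accuracy), ("precision", precision), ("recall", recall),
    ("fallout", fallout), ("specificity", specificity), ("f1", f1),
    ("(TPR + TNR) / 2", balanced),
]:
    print(f"{name}: {value:.6f}")
```

The `RAS` row equals (TPR + TNR) / 2, which is what a ROC AUC score reduces to when it is computed from hard 0/1 predictions. This suggests, though the notebook does not confirm it, that the helper passed predicted labels rather than predicted probabilities. The low recall alongside high specificity shows that the model misses most actual admissions at the default threshold.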
