## Regression Analysis and Variable Selection in R

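- R is a language specialized for statistics, so regression analysis and variable selection typically take less code than in Python: `lm()`, `glm()`, and `step()` all ship with base R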

In [0]:
moneyball = read.csv("data/baseball.csv")

In [0]:
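# Linear regression model using the two variables RS and RA (the intercept is added automatically, so it does not need to appear in the formula)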

lm = lm(Playoffs ~ RS + RA, data = moneyball)
summary(lm)


Call:
lm(formula = Playoffs ~ RS + RA, data = moneyball)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.65661 -0.24151 -0.07115  0.16883  0.90119 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1090808  0.0863715  -1.263    0.207    
RS           0.0024746  0.0001088  22.753   <2e-16 ***
RA          -0.0020450  0.0001070 -19.121   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3231 on 1229 degrees of freedom
Multiple R-squared:  0.3445,	Adjusted R-squared:  0.3434 
F-statistic: 322.9 on 2 and 1229 DF,  p-value: < 2.2e-16
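
Since the formula adds the intercept automatically, a fitted value is the intercept plus the two weighted terms. The following is a minimal sketch that redoes that arithmetic with the coefficients printed above. The run totals in `rs` and `ra` are hypothetical inputs chosen for illustration, and the sketch assumes `Playoffs` is coded 0/1, in which case the second value shows that a linear model can predict outside the [0, 1] range.

```r
# Coefficients copied from the summary(lm) output above
b0   <- -0.1090808
b_rs <-  0.0024746
b_ra <- -0.0020450

# Hypothetical season totals (illustrative values, not taken from the data set)
rs <- c(800, 1000)
ra <- c(700, 600)

# Fitted value = intercept + b_rs * RS + b_ra * RA
pred <- b0 + b_rs * rs + b_ra * ra
print(pred)
```

With a fitted model object, `predict()` on a `newdata` data frame performs the same calculation.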


In [0]:
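# Logistic regression model using the two variables RS and RA (the intercept is added automatically, so it does not need to appear in the formula)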

glm = glm(Playoffs ~ RS + RA, data = moneyball, family = binomial)
summary(glm)


Call:
glm(formula = Playoffs ~ RS + RA, family = binomial, data = moneyball)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.48032  -0.39052  -0.11953  -0.02199   3.01063  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.098231   0.980387   -5.20 1.99e-07 ***
RS           0.032685   0.002266   14.42  < 2e-16 ***
RA          -0.029999   0.002228  -13.46  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1226.31  on 1231  degrees of freedom
Residual deviance:  632.57  on 1229  degrees of freedom
AIC: 638.57

Number of Fisher Scoring iterations: 7
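
The logistic coefficients are on the log-odds scale, so they are easier to read after exponentiating or after converting a linear predictor to a probability. The sketch below does both with the estimates printed above. The team with 800 runs scored and 700 runs allowed is a hypothetical example, not a row from the data.

```r
# Coefficients copied from the summary(glm) output above
coefs <- c(intercept = -5.098231, RS = 0.032685, RA = -0.029999)

# Multiplicative change in the odds of reaching the playoffs per additional run
odds_ratios <- exp(coefs[c("RS", "RA")])
print(odds_ratios)

# Hypothetical team: 800 runs scored, 700 runs allowed
eta  <- coefs[["intercept"]] + coefs[["RS"]] * 800 + coefs[["RA"]] * 700
prob <- plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
print(prob)
```

With the fitted object, `predict(..., type = "response")` returns probabilities on the same scale.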


In [0]:
# Logistic regression model using all of RS, RA, OBP, SLG, BA, and G (the intercept is added automatically, so it does not need to appear in the formula)

glm = glm(Playoffs ~ RS + RA + OBP + SLG + BA + G, data = moneyball, family = binomial)
summary(glm)


Call:
glm(formula = Playoffs ~ RS + RA + OBP + SLG + BA + G, family = binomial, 
    data = moneyball)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.45163  -0.38411  -0.11670  -0.02104   2.85167  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -7.051707  29.004020  -0.243   0.8079    
RS            0.022982   0.004327   5.311 1.09e-07 ***
RA           -0.031057   0.002330 -13.331  < 2e-16 ***
OBP          44.845883  18.452902   2.430   0.0151 *  
SLG          18.942805   8.275962   2.289   0.0221 *  
BA          -22.765310  15.575113  -1.462   0.1438    
G            -0.041106   0.173061  -0.238   0.8122    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1226.3  on 1231  degrees of freedom
Residual deviance:  623.4  on 1225  degrees of freedom
AIC: 637.4

Number of Fisher Scoring iterations: 7
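
The two-variable model is nested in this six-variable model, so one way to judge whether the four extra predictors help is to compare the residual deviances. The sketch below is an illustrative likelihood-ratio check built only from the deviances and degrees of freedom printed in the two summaries. It is not a step the notebook itself performs.

```r
# Residual deviance and residual df copied from the two summaries above
dev_small <- 632.57; df_small <- 1229   # Playoffs ~ RS + RA
dev_full  <- 623.40; df_full  <- 1225   # Playoffs ~ RS + RA + OBP + SLG + BA + G

# Drop in deviance and the number of extra parameters
lr_stat <- dev_small - dev_full
lr_df   <- df_small - df_full

# Upper-tail chi-squared probability for the deviance drop
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))
```

Given both fitted objects, `anova()` with `test = "Chisq"` carries out the same comparison.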


In [0]:
# Variable selection (backward elimination)

step(glm, direction = "backward")

Start:  AIC=637.4
Playoffs ~ RS + RA + OBP + SLG + BA + G

       Df Deviance     AIC
- G     1   623.45  635.45
<none>      623.40  637.40
- BA    1   625.54  637.54
- SLG   1   628.71  640.71
- OBP   1   629.36  641.36
- RS    1   654.05  666.05
- RA    1  1020.98 1032.98

Step:  AIC=635.45
Playoffs ~ RS + RA + OBP + SLG + BA

       Df Deviance     AIC
<none>      623.45  635.45
- BA    1   625.59  635.59
- SLG   1   628.99  638.99
- OBP   1   629.63  639.63
- RS    1   654.49  664.49
- RA    1  1021.18 1031.18



Call:  glm(formula = Playoffs ~ RS + RA + OBP + SLG + BA, family = binomial, 
    data = moneyball)

Coefficients:
(Intercept)           RS           RA          OBP          SLG           BA  
  -13.86694      0.02282     -0.03107     45.33790     19.19142    -22.67300  

Degrees of Freedom: 1231 Total (i.e. Null);  1226 Residual
Null Deviance:	    1226 
Residual Deviance: 623.5 	AIC: 635.5
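
The AIC column in the `step()` tables can be reproduced by hand, which makes the stopping rule easier to follow. The sketch below assumes `Playoffs` is an ungrouped 0/1 outcome, in which case the AIC equals the deviance plus twice the number of estimated coefficients. The deviances are copied from the first table above.

```r
# Deviances from the first step() table; each name is the term being dropped
deviance_first <- c(G = 623.45, none = 623.40, BA = 625.54, SLG = 628.71,
                    OBP = 629.36, RS = 654.05, RA = 1020.98)

# Coefficients including the intercept: 7 in the full model, 6 after one drop
n_params <- ifelse(names(deviance_first) == "none", 7, 6)

# AIC = deviance + 2 * (number of coefficients) under the stated assumption
aic_first <- deviance_first + 2 * n_params
print(aic_first)

# step() removes the term whose removal gives the lowest AIC
best <- names(which.min(aic_first))
print(best)
```

In the second table no removal gets below the current AIC of 635.45, so the procedure stops and keeps RS, RA, OBP, SLG, and BA.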

In [0]:
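# Variable selection (forward selection)
# Note: the search starts from the full model and no `scope` is given, so there is nothing left to add and step() returns the full model unchanged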

step(glm, direction = "forward")

Start:  AIC=637.4
Playoffs ~ RS + RA + OBP + SLG + BA + G




Call:  glm(formula = Playoffs ~ RS + RA + OBP + SLG + BA + G, family = binomial, 
    data = moneyball)

Coefficients:
(Intercept)           RS           RA          OBP          SLG           BA  
   -7.05171      0.02298     -0.03106     44.84588     18.94281    -22.76531  
          G  
   -0.04111  

Degrees of Freedom: 1231 Total (i.e. Null);  1225 Residual
Null Deviance:	    1226 
Residual Deviance: 623.4 	AIC: 637.4
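
Forward selection normally starts from an intercept-only model and is told through `scope` which terms it may add. The sketch below shows that pattern on simulated data so that it runs on its own. The data frame `sim` and its columns `x1`, `x2`, `x3`, and `y` are hypothetical stand-ins, not the baseball data.

```r
# Simulated data (hypothetical): only x1 truly affects the 0/1 outcome y
set.seed(1)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * sim$x1))

# Start from the intercept-only model
null_fit <- glm(y ~ 1, data = sim, family = binomial)

# scope lists the terms step() may add; trace = 0 suppresses the step tables
fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "forward", trace = 0)
print(formula(fwd))
```

For the notebook's data the analogous call would presumably start from `glm(Playoffs ~ 1, data = moneyball, family = binomial)` with `~ RS + RA + OBP + SLG + BA + G` as the upper scope. That variant was not run here, so no output is shown for it.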