---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 08:31:00 -0700

title: "Relationship between coefficient of determination and correlation coefficient in simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Coefficient of determination in terms of correlation coefficient"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
  - authors: "Wikipedia"
    year: 2021
    title: "Coefficient of determination"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
  - authors: "Wikipedia"
    year: 2021
    title: "Correlation"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"

proof_id: "P280"
shortcut: "slr-rsq"
username: "JoramSoch"
---

Theorem: Assume a simple linear regression model with independent observations

$$ \label{eq:slr} y = \beta_0 + \beta_1 x + \varepsilon, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \; i = 1,\ldots,n $$

and consider estimation using ordinary least squares. Then, the coefficient of determination is equal to the squared correlation coefficient between $x$ and $y$:

$$ \label{eq:slr-R2} R^2 = r_{xy}^2 \; . $$
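The statement can be checked numerically before working through the proof. The following is a minimal sketch, assuming NumPy is available; the simulated data, the seed, and the parameter values (`1.5`, `2.0`, noise scale `0.5`) are arbitrary choices for illustration and are not part of the theorem.

```python
import numpy as np

# Simulated data from a simple linear regression model (illustrative values)
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Ordinary least squares fit; np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Coefficient of determination as explained over total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r_squared = ess / tss

# Sample correlation coefficient between x and y
r_xy = np.corrcoef(x, y)[0, 1]

print(r_squared, r_xy ** 2)
```

The two printed numbers should agree up to floating-point error.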

Proof: The ordinary least squares estimates for simple linear regression are

$$ \label{eq:slr-ols}
\begin{split}
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\
\hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \; .
\end{split}
$$
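As a sketch of how these estimates are computed in practice, the snippet below evaluates them from the sample covariance and sample variance and cross-checks the result against a generic least squares solver. The data values are made up for illustration, and the use of `ddof=1` (division by $n-1$) is an assumption matching the sample variances used later in the proof.

```python
import numpy as np

# Small illustrative data set (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Sample covariance s_xy and sample variance s_x^2, both with n-1 in the denominator
s_xy = np.cov(x, y, ddof=1)[0, 1]
s_x2 = np.var(x, ddof=1)

# OLS estimates for simple linear regression
beta1_hat = s_xy / s_x2
beta0_hat = y.mean() - beta1_hat * x.mean()

# Cross-check with a generic least squares solve on the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta0_hat, beta1_hat)
print(coef)
```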

The coefficient of determination $R^2$ is defined as the proportion of the variance explained by the independent variables, relative to the total variance in the data. This can be quantified as the ratio of explained sum of squares to total sum of squares:

$$ \label{eq:slr-R2-s1} R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} \; . $$

Using the explained and total sum of squares for simple linear regression, we have:

$$ \label{eq:slr-R2-s2}
\begin{split}
R^2 &= \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
&= \frac{\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \; .
\end{split}
$$

By applying \eqref{eq:slr-ols}, we can further develop the coefficient of determination:

$$ \label{eq:slr-R2-s3}
\begin{split}
R^2 &= \frac{\sum_{i=1}^{n} (\bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
&= \frac{\sum_{i=1}^{n} \left( \hat{\beta}_1 (x_i - \bar{x}) \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
&= \hat{\beta}_1^2 \, \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2} \\
&= \hat{\beta}_1^2 \, \frac{s_x^2}{s_y^2} \\
&= \left( \frac{s_x}{s_y} \, \hat{\beta}_1 \right)^2 \; .
\end{split}
$$
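Each line of this derivation can be evaluated numerically to confirm that the steps are equal. The following is a sketch with made-up data; the variable names `step1` to `step4` are hypothetical labels for successive lines of the derivation, not notation from the proof.

```python
import numpy as np

# Illustrative data (hypothetical values)
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.5])
y = np.array([1.2, 1.9, 3.1, 3.8, 6.3, 8.7])

# OLS estimates from sample covariance and variance (n-1 in the denominator)
beta1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Total sum of squares
tss = np.sum((y - y.mean()) ** 2)

# ESS / TSS with fitted values written as beta0_hat + beta1_hat * x_i
step1 = np.sum((beta0_hat + beta1_hat * x - y.mean()) ** 2) / tss

# After substituting beta0_hat = mean(y) - beta1_hat * mean(x)
step2 = np.sum((beta1_hat * (x - x.mean())) ** 2) / tss

# In terms of sample variances s_x^2 and s_y^2
step3 = beta1_hat ** 2 * np.var(x, ddof=1) / np.var(y, ddof=1)

# As a single squared term (s_x / s_y * beta1_hat)^2
step4 = (np.std(x, ddof=1) / np.std(y, ddof=1) * beta1_hat) ** 2

print(step1, step2, step3, step4)
```

The factor $\frac{1}{n-1}$ cancels in the ratio, so the same result would be obtained with any common normalization of the two variances.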

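Using the relationship between the correlation coefficient and the slope estimate, $r_{xy} = \frac{s_x}{s_y} \, \hat{\beta}_1$, which follows from $r_{xy} = \frac{s_{xy}}{s_x s_y}$ and \eqref{eq:slr-ols}, we conclude: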

$$ \label{eq:slr-R2-qed} R^2 = \left( \frac{s_x}{s_y} \, \hat{\beta}_1 \right)^2 = r_{xy}^2 \; . $$
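The link between slope estimate and correlation coefficient used in this last step can also be checked directly. The sketch below uses made-up data with a decreasing trend to illustrate that $R^2$ fixes $r_{xy}$ only up to sign, while the sign of $r_{xy}$ matches that of $\hat{\beta}_1$; the data values are assumptions for illustration.

```python
import numpy as np

# Illustrative data with a decreasing trend (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([9.8, 8.1, 6.3, 3.9, 2.2, 0.1])

# Sample standard deviations and covariance (n-1 in the denominator)
s_x = np.std(x, ddof=1)
s_y = np.std(y, ddof=1)
s_xy = np.cov(x, y, ddof=1)[0, 1]

# Slope estimate and correlation coefficient from their definitions
beta1_hat = s_xy / s_x ** 2
r_xy = s_xy / (s_x * s_y)

# Correlation coefficient recovered from the slope estimate
r_from_slope = s_x / s_y * beta1_hat

# Reference value from NumPy
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_xy, r_from_slope, r_numpy)
```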