## Linear Regression Modelling

Consider two data series, $X = \left(x_{1}, x_{2}, \dots, x_{n}\right)$ and $Y = \left(y_{1}, y_{2}, \dots, y_{n}\right)$, both with mean zero. We use linear regression (ordinary least squares) to regress $Y$ against $X$ (without fitting any intercept), as in $Y = aX + \epsilon$ where $\epsilon$ denotes a series of error terms.

Problems:

1. Calculate the value of the regression coefficient $a$. If possible, express it in terms of the standard deviations $\sigma_{X}$ and $\sigma_{Y}$ and the correlation coefficient $\rho_{XY}$ between the two data series. You will need to show a complete derivation to score full marks.  

2. We scale up both data series by constant factors $s$ and $t$, i.e. $X' = sX$ and $Y' = tY$, and regress $Y'$ against $X'$ as in $Y' = a'X' + \epsilon$. How does the new regression coefficient $a'$ relate to the original coefficient $a$? And what about the new correlation $\rho_{X'Y'}$ vs. the original correlation $\rho_{XY}$? Note that the new $\epsilon$ is not necessarily the same as the original one; it merely denotes another series of error terms.  

3. We now do the ‘inverse’ regression of $X$ against $Y$, resulting in $X = bY + \epsilon$. How is the slope $b$ of the ‘inverse’ regression related to the slope $a$ of the original regression?   

4. Suppose that $\rho_{XY} = 0.01$. Is the resulting value of $a$ statistically significantly different from $0$ at the $95\,\%$ level if:   
    i. $n = 10^{2}$  
    ii. $n = 10^{3}$  
    iii. $n = 10^{4}$  

To make life a bit easier, I'm going to assume that the error terms are normally distributed, $\epsilon_{i} \sim \mathcal{N}\left(0,\sigma_{\epsilon}^{2}\right)$, and uncorrelated with equal variance: $\textit{Cov}\left[\epsilon_{i}, \epsilon_{j}\right] = \sigma_{\epsilon}^{2} \cdot \delta_{ij}$, i.e. $\textit{Var}\left[\epsilon_{i}\right] = \sigma^{2}_{\epsilon}$ for all $i\in\{1,\dots,n\}$.

## Problem 1:

The idea of linear regression is to use a function $f^{*}\left(X\right)$ that is linear in a set of parameters $a_{i}\in A$ to predict $Y$ as closely as possible. The function $f^{*}\left(X\right)$ is obtained by choosing the parameters $a_{i}\in A$ of a function $f\left(A, X\right)$ such that this is achieved.

In our case we work with $f\left(a, X\right) = a\cdot X$ so that $Y = a\cdot X + \epsilon$. 

When we use ordinary least squares to regress $Y$ on $X$, we want to minimize the sum of squared differences between our observed values $Y$ and the predictions of our function $f\left(a, X\right)$. We denote this as our loss function $L$, given by:

\begin{equation}
L = \sum_{i=1}^{n} \left(y_{i}-a\cdot x_{i}\right)^{2}
\end{equation}

By minimizing $L$, we can find the set of parameters for which $f\left(A, X\right)$ becomes $f^{*}\left(X\right)$, so let's do that now.

\begin{equation}
\begin{aligned}
\frac{\partial L}{\partial a} &=& \frac{\partial}{\partial a}\sum_{i=1}^{n} \left(y_{i}-a\cdot x_{i}\right)^{2} \\
&=& \sum_{i=1}^{n}\frac{\partial}{\partial a} \left(y_{i}-a\cdot x_{i}\right)^{2} \\
&=& \sum_{i=1}^{n}2\cdot\left(y_{i}-a\cdot x_{i}\right)\cdot\left(-x_{i}\right) \\
&=& -2\cdot\sum_{i=1}^{n}\left(y_{i}\cdot x_{i}-a\cdot x_{i}^{2}\right) \\
&=& -2\cdot\left(\sum_{i=1}^{n}y_{i}\cdot x_{i}\right) + 2\cdot\left(\sum_{i=1}^{n} a\cdot x_{i}^{2}\right) \\
&=& -2\cdot\left(\sum_{i=1}^{n}y_{i}\cdot x_{i}\right) + 2a\cdot\left(\sum_{i=1}^{n} x_{i}^{2}\right) \\
\end{aligned}
\end{equation}

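Setting $\frac{\partial L}{\partial a}=0$ (this stationary point is a minimum, since $\frac{\partial^{2} L}{\partial a^{2}} = 2\sum_{i=1}^{n} x_{i}^{2} > 0$ whenever not all $x_{i}$ are zero), it follows that: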

\begin{equation}
\begin{aligned}
a\cdot\sum_{i=1}^{n} x_{i}^{2} &=& \sum_{i=1}^{n}y_{i}\cdot x_{i}  \\
a &=& \frac{\sum_{i=1}^{n}y_{i}\cdot x_{i}}{\sum_{i=1}^{n} x_{i}^{2}}
\end{aligned}
\end{equation}
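As a quick sanity check of this closed form, here is a minimal sketch that compares it with NumPy's general least-squares solver `np.linalg.lstsq` on synthetic data. The true slope `a_true`, the noise level and the seed are arbitrary assumptions for illustration. Note that the closed form itself does not require zero means, since it follows directly from minimizing $L$.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for reproducibility
n = 200
a_true = 1.5                            # hypothetical true slope
x = rng.normal(0.0, 2.0, n)
y = a_true * x + rng.normal(0.0, 1.0, n)

# Closed form derived above: a = sum(x_i * y_i) / sum(x_i^2)
a_closed = np.sum(x * y) / np.sum(x**2)

# Generic least squares on a single-column design matrix (no intercept column)
a_lstsq = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]

print(f"closed form: {a_closed:.6f}, lstsq: {a_lstsq:.6f}")
```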

Now we have found an expression for $a$ in terms of sums over $x_{i}$ and $y_{i}$. However, we want to express it in terms of the standard deviations $\sigma_{X}$ and $\sigma_{Y}$ and the correlation $\rho_{XY}$. So let's work those out and see if we can substitute them into our result.

The sample standard deviation $\sigma_{k}$ of $k = X, Y$, with mean value $\bar{k}=\frac{1}{n}\sum_{i=1}^{n}k_i$, is given by:


\begin{equation}
\sigma_{k}^{2} = \frac{\sum_{i=1}^{n}\left(k_{i}-\bar{k}\right)^{2}}{n-1}
\end{equation}

Now since the mean values of $X$ and $Y$ are both zero, we can simplify the standard deviations by substituting $\bar{k}=0$:

\begin{equation}
\sigma_{k}^{2} = \frac{\sum_{i=1}^{n}k_{i}^{2}}{n-1}
\end{equation}

The correlation coefficient $\rho_{XY}$ for a sample is given by:

\begin{equation}
\rho_{XY} = \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot\left(y_{i}-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}}
\end{equation}

Using our mean values of zero, we simply get:

\begin{equation}
\rho_{XY} = \frac{\sum_{i=1}^{n}x_{i}\cdot y_{i}}{\sqrt{\sum_{i=1}^{n}x_{i}^{2}}\sqrt{\sum_{i=1}^{n}y_{i}^{2}}}
\end{equation}

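We can rewrite our equation for $\rho_{XY}$ by dividing each square root in the denominator by $\sqrt{n-1}$ and compensating with a factor $\frac{1}{n-1}$ (i.e. multiplying by $\frac{n-1}{n-1} = 1$), which gives: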

\begin{equation}
\begin{aligned}
\rho_{XY} &=& \frac{\sum_{i=1}^{n}x_{i}\cdot y_{i}}{\sqrt{\frac{\sum_{i=1}^{n}x_{i}^{2}}{n-1}}\sqrt{\frac{\sum_{i=1}^{n}y_{i}^{2}}{n-1}}}\cdot \frac{1}{n-1} \\
&=& \frac{\sum_{i=1}^{n}x_{i}\cdot y_{i}}{\sigma_{X}\sigma_{Y}}\cdot \frac{1}{n-1}
\end{aligned}
\end{equation}

We recognize that the sums appearing in $\sigma_{X}$ and in $\rho_{XY}$ also appear in $a$, so let's expand $a$ so that we can substitute both:

\begin{equation}
\begin{aligned}
a &=& \frac{\sum_{i=1}^{n}y_{i}\cdot x_{i}}{\sum_{i=1}^{n} x_{i}^{2}} \\
&=& \sum_{i=1}^{n}y_{i}\cdot x_{i} \cdot \frac{\sigma_{X}\sigma_{Y}}{\sigma_{X}\sigma_{Y}}\frac{n-1}{n-1} \cdot \frac{1}{\sum_{i=1}^{n} x_{i}^{2}} \\
&=& \left(\frac{\sum_{i=1}^{n}x_{i}\cdot y_{i}}{\sigma_{X}\sigma_{Y}}\cdot \frac{1}{n-1}\right) \cdot \frac{\sigma_{X}\sigma_{Y}\cdot\left(n-1\right)}{\sum_{i=1}^{n} x_{i}^{2}} \\
&=& \rho_{XY} \cdot \frac{\sigma_{X}\sigma_{Y}}{\frac{\sum_{i=1}^{n} x_{i}^{2}}{n-1}} \\
&=& \rho_{XY} \cdot \frac{\sigma_{X}\sigma_{Y}}{\sigma_{X}^{2}} \\
&=& \rho_{XY} \cdot \frac{\sigma_{Y}}{\sigma_{X}} \\
\end{aligned}
\end{equation}


So we find that $a$ is given by:

\begin{equation}
a = \rho_{XY} \cdot \frac{\sigma_{Y}}{\sigma_{X}}
\end{equation}
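The identity can be checked numerically. The sketch below (synthetic data with arbitrary parameters, not part of the original analysis) centres both series exactly, so that the zero-mean assumption holds, and compares the closed form from the sums with $\rho_{XY}\sigma_{Y}/\sigma_{X}$. It also illustrates that the $n-1$ normalisation cancels: using `ddof=0` or `ddof=1` in `np.std` gives the same slope.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 3.0, 500)
y = 0.4 * x + rng.normal(0.0, 1.0, 500)

# Enforce the zero-mean assumption exactly
x = x - x.mean()
y = y - y.mean()

a_sums = np.sum(x * y) / np.sum(x**2)        # a from the sums
rho = np.corrcoef(x, y)[0, 1]                # sample correlation coefficient
a_rho_ddof1 = rho * np.std(y, ddof=1) / np.std(x, ddof=1)
a_rho_ddof0 = rho * np.std(y, ddof=0) / np.std(x, ddof=0)

print(a_sums, a_rho_ddof1, a_rho_ddof0)
```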

Therefore our function $f^{*}\left(X\right)$ is given by:


\begin{equation}
f^{*}\left(X\right) = \frac{\rho_{XY}\cdot\sigma_{Y}}{\sigma_{X}} \cdot X
\end{equation}

To test this numerically, it helps to express $\rho_{XY}$ through the sample covariance $\textit{cov}\left(X,Y\right)$ of $X$ and $Y$, given by:

\begin{equation}
\begin{aligned}
\textit{cov}\left(X,Y\right) = \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot\left(y_{i}-\bar{y}\right)}{n-1}
\end{aligned}
\end{equation}

We can rewrite the correlation coefficient as:

\begin{equation}
\begin{aligned}
\rho_{XY} = \frac{\textit{cov}\left(X,Y\right)}{\sigma_{X}\sigma_{Y}}
\end{aligned}
\end{equation}

Thus $a$ can be expressed as:

\begin{equation}
a = \frac{\textit{cov}\left(X,Y\right)}{\sigma_{X}^{2}}
= \frac{\textit{cov}\left(X,Y\right)}{\textit{var}\left(X\right)}
\end{equation}

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Y takes the values -10, ..., 10 (each repeated `measurements` times);
# X scatters around each Y value with standard deviation sigma_X.
mu_X = range(-10, 11, 1)
sigma_X = 0.5
measurements = 10

X = []
Y = []
for mu in mu_X:
    Y.extend([mu] * measurements)
    X.extend(np.random.normal(mu, sigma_X, measurements))

print("Mean of Y: {0:.2f}".format(np.mean(Y)))
print("Mean of X: {0:.2f}".format(np.mean(X)))
if np.mean(X) != 0:
    print("Due to statistical fluctuations the mean value of X is not exactly 0.\n")

# Sample covariance matrix (normalised by n - 1)
cov_XY_matrix = np.cov(X, Y)
cov_XY = cov_XY_matrix[0, 1]
var_X = cov_XY_matrix[0, 0]
var_Y = cov_XY_matrix[1, 1]

# Our analytic result: a = cov(X, Y) / var(X)
a = cov_XY / var_X

print("Our guess for the slope a is: {0:.3f}\n".format(a))

# Linear regression without intercept (scikit-learn expects a 2D feature array)
X_reg = np.array(X).reshape((-1, 1))
reg = LinearRegression(fit_intercept=False).fit(X_reg, Y)

print("Running the linear regression with a forced intercept of 0, we get:\n")
print("Slope a: {0:.3f}".format(reg.coef_[0]))
print("Intercept: {}".format(reg.intercept_))

print("\n The difference between our guess and the found slope is: {0:.5f}".format(abs(reg.coef_[0] - a)))
```

```
Mean of Y: 0.00
Mean of X: 0.05
Due to statistical fluctuations the mean value of X is not exactly 0.

Our guess for the slope a is: 0.990

Running the linear regression with a forced intercept of 0, we get:

Slope a: 0.990
Intercept: 0.0

 The difference between our guess and the found slope is: 0.00006
```


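So our result agrees well with the fitted slope. The small difference arises because $\bar{x}$ is not exactly $0$: `np.cov` subtracts the sample means, whereas the regression without intercept uses the raw sums $\sum_{i} x_{i}y_{i} / \sum_{i} x_{i}^{2}$.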

## Problem 2:

So let's see what scaling both $X$ and $Y$ does to everything that we have calculated so far. To do that, we simply apply the substitution:

\begin{equation}
\begin{aligned}
X &\longrightarrow& X' = s\cdot X \\
Y &\longrightarrow& Y' = t\cdot Y
\end{aligned}
\end{equation}

We can start with the means: each is scaled by its factor, but since the original means were zero, they remain zero:

\begin{equation}
\begin{aligned}
\bar{x}' &=& \frac{1}{n}\cdot\sum_{i=1}^{n} x_{i}' &=& \frac{1}{n}\cdot\sum_{i=1}^{n}s\cdot x_{i} &=& s\bar{x} &=& 0\\
\bar{y}' &=&  \frac{1}{n}\cdot\sum_{i=1}^{n} y_{i}' &=& \frac{1}{n}\cdot\sum_{i=1}^{n}t\cdot y_{i} &=& t\bar{y} &=& 0\\
\end{aligned}
\end{equation}

Knowing $\bar{x}'$ and $\bar{y}'$ we can work out $\sigma_{X'}$ and $\sigma_{Y'}$:

\begin{equation}
\begin{aligned}
\sigma_{X'}^{2} &=& \frac{\sum_{i=1}^{n}\left(x_{i}'-\bar{x}'\right)^{2}}{n-1} 
= \frac{\sum_{i=1}^{n}s^{2}x_{i}^{2}}{n-1} 
= s^{2}\cdot\frac{\sum_{i=1}^{n}x_{i}^{2}}{n-1} 
= s^{2}\sigma_{X}^{2} \\
\\
\sigma_{Y'}^{2} &=& \frac{\sum_{i=1}^{n}\left(y_{i}'-\bar{y}'\right)^{2}}{n-1} 
= \frac{\sum_{i=1}^{n}t^{2}y_{i}^{2}}{n-1} 
= t^{2}\cdot\frac{\sum_{i=1}^{n}y_{i}^{2}}{n-1} 
= t^{2}\sigma_{Y}^{2} \\
\end{aligned}
\end{equation}

And from there we can look at $\rho_{X'Y'}$:

\begin{equation}
\rho_{X'Y'} = \frac{\sum_{i=1}^{n}x_{i}'\cdot y_{i}'}{\sigma_{X'}\sigma_{Y'}}\cdot \frac{1}{n-1} 
= \frac{\sum_{i=1}^{n}sx_{i} \cdot t y_{i}}{s\sigma_{X}\cdot t\sigma_{Y}}\cdot \frac{1}{n-1} 
= \frac{st\sum_{i=1}^{n}x_{i}\cdot y_{i}}{st\cdot\sigma_{X}\cdot\sigma_{Y}}\cdot \frac{1}{n-1} 
= \frac{\sum_{i=1}^{n}x_{i}\cdot y_{i}}{\sigma_{X}\cdot\sigma_{Y}}\cdot \frac{1}{n-1} 
= \rho_{XY}
\end{equation}

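As we can see, $\rho_{X'Y'}$ is unchanged, just like the (zero) means of $X$ and $Y$. Note that this step uses $\sigma_{X'} = s\sigma_{X}$ and $\sigma_{Y'} = t\sigma_{Y}$, which assumes $s, t > 0$; in general $\sigma_{X'} = |s|\,\sigma_{X}$, so a negative product $st$ would flip the sign of the correlation. Now we have everything we need to calculate the slope $a'$: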

\begin{equation}
a' = \frac{\rho_{X'Y'}\cdot\sigma_{Y'}}{\sigma_{X'}}
= \frac{\rho_{XY}\cdot t\sigma_{Y}}{s\sigma_{X}}
= \frac{t}{s}\cdot \frac{\rho_{XY}\cdot\sigma_{Y}}{\sigma_{X}}
= \frac{t}{s}\cdot a
\end{equation}
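Both relations can be checked directly. The sketch below uses arbitrary synthetic zero-mean data and hypothetical factors `s` and `t`; it includes a negative factor to illustrate that $a' = \frac{t}{s}\,a$ holds for any non-zero $s, t$, whereas the correlation picks up the sign of $st$.

```python
import numpy as np

def slope_no_intercept(x, y):
    """OLS slope of y on x without intercept: sum(x*y) / sum(x^2)."""
    return np.sum(x * y) / np.sum(x**2)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 300)
y = -0.7 * x + rng.normal(0.0, 0.5, 300)
x, y = x - x.mean(), y - y.mean()          # zero means, as assumed

a = slope_no_intercept(x, y)
rho = np.corrcoef(x, y)[0, 1]

results = []
for s, t in [(7.4, 0.2), (-3.0, 2.0)]:     # hypothetical scale factors
    a_new = slope_no_intercept(s * x, t * y)
    rho_new = np.corrcoef(s * x, t * y)[0, 1]
    results.append((s, t, a_new, rho_new))
    print(f"s={s}, t={t}: a'={a_new:.4f}, (t/s)*a={t / s * a:.4f}, "
          f"rho'={rho_new:.4f}, rho={rho:.4f}")
```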

So finally, we see that $a'$ is just $a$ scaled by the ratio $\frac{t}{s}$. That makes perfect sense, since we could also have started with $Y' = a'X' + \epsilon$ and worked out $a'$ from there in the following fashion:

\begin{equation}
\begin{aligned}
Y' = a'X' + \epsilon \\
\\
tY = a'sX+ \epsilon \\
\\
Y = \frac{s}{t}\cdot a' \cdot X + \frac{\epsilon}{t}
\end{aligned}
\end{equation}

Comparing with $Y = aX + \epsilon$, we recognize that:

\begin{equation}
a = \frac{s}{t}\cdot a'
\end{equation}

Furthermore, we can also think through a few cases. If we leave $Y$ unscaled, so $t=1$, and scale up $X$ by a factor $s>1$, then $a'$ must be smaller than $a$ to compensate for the upscaling of $X$, so that the prediction for $Y$ remains unchanged. On the other hand, if we scale up $Y$ by a factor $t>1$ and leave $X$ unchanged, then $a'$ must be bigger than $a$ to compensate for the upscaling of $Y$. So $a'$ must depend on both scaling factors.

Now let's check that again with the same piece of code.

```python
s = 7.4
t = 1/5

# Scale both data series
X_scale = [s * x for x in X]
Y_scale = [t * y for y in Y]

print("Mean of Y_scale: {0:.2f}".format(np.mean(Y_scale)))
print("Mean of X_scale: {0:.2f}".format(np.mean(X_scale)))
if np.mean(X_scale) != 0:
    print("Due to statistical fluctuations the mean value of X_scale is not exactly 0.\n")

# Predicted slope from Problem 2: a' = (t / s) * a
a_scale = a * t / s

print("Our guess for the slope a_scale is: {0:.3f}\n".format(a_scale))

# Linear regression without intercept on the scaled data
X_scale_reg = np.array(X_scale).reshape((-1, 1))
reg_scale = LinearRegression(fit_intercept=False).fit(X_scale_reg, Y_scale)

print("Running the linear regression with a forced intercept of 0, we get:\n")
print("Slope a_scale: {0:.3f}".format(reg_scale.coef_[0]))
print("Intercept_scale: {}".format(reg_scale.intercept_))

print("\n The difference between our guess and the found slope is: {0:.5f}".format(abs(reg_scale.coef_[0] - a_scale)))
```

```
Mean of Y_scale: -0.00
Mean of X_scale: 0.35
Due to statistical fluctuations the mean value of X_scale is not exactly 0.

Our guess for the slope a_scale is: 0.027

Running the linear regression with a forced intercept of 0, we get:

Slope a_scale: 0.027
Intercept_scale: 0.0

 The difference between our guess and the found slope is: 0.00000
```


## Problem 3:

Since the structure of the problem is unchanged (only the roles of $X$ and $Y$ are swapped), we can obtain the result by applying the following transformation to the result of Problem 1:

\begin{equation}
\begin{aligned}
Y &\longrightarrow& X \\
X &\longrightarrow& Y \\
a &\longrightarrow& b \\
\end{aligned}
\end{equation}

Hence:

\begin{equation}
b = \frac{\rho_{YX}\cdot\sigma_{X}}{\sigma_{Y}} 
= \frac{\rho_{XY}\sigma_{X}}{\sigma_{Y}}
= \frac{\rho_{XY}\sigma_{X}}{\sigma_{Y}}\cdot\frac{\sigma_{X}}{\sigma_{X}}\cdot\frac{\sigma_{Y}}{\sigma_{Y}}
= \frac{\rho_{XY}\sigma_{Y}}{\sigma_{X}}\cdot\frac{\sigma_{X}^{2}}{\sigma_{Y}^{2}}
= a\cdot \frac{\sigma_{X}^{2}}{\sigma_{Y}^{2}}
= a\cdot \frac{\textit{var}\left(X\right)}{\textit{var}\left(Y\right)}
\end{equation}

So let us test that as well.

```python
# Relation derived above: b = a * var(X) / var(Y)
b = a * var_X / var_Y

print("Our guess for the slope b is: {0:.3f}\n".format(b))

# 'Inverse' linear regression of X on Y without intercept
Y_reg = np.array(Y).reshape((-1, 1))
reg_inverse = LinearRegression(fit_intercept=False).fit(Y_reg, X)

print("Running the linear regression with a forced intercept of 0, we get:\n")
print("Slope b: {0:.3f}".format(reg_inverse.coef_[0]))
print("Intercept inverse: {}".format(reg_inverse.intercept_))

print("\n The difference between our guess and the found slope is: {0:.5f}".format(abs(reg_inverse.coef_[0] - b)))
```

```
Our guess for the slope b is: 1.003

Running the linear regression with a forced intercept of 0, we get:

Slope b: 1.003
Intercept inverse: 0.0

 The difference between our guess and the found slope is: 0.00000
```

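Multiplying the two slopes gives a compact way to state their relation: $a\cdot b = \rho_{XY}\frac{\sigma_{Y}}{\sigma_{X}}\cdot\rho_{XY}\frac{\sigma_{X}}{\sigma_{Y}} = \rho_{XY}^{2}$. So the ‘inverse’ slope is the reciprocal of the original one only when $|\rho_{XY}| = 1$; otherwise $|b| < 1/|a|$. A minimal sketch with arbitrary synthetic, exactly centred data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 1000)
y = 2.0 * x + rng.normal(0.0, 3.0, 1000)
x, y = x - x.mean(), y - y.mean()      # zero means, as assumed

a = np.sum(x * y) / np.sum(x**2)       # regression of Y on X
b = np.sum(x * y) / np.sum(y**2)       # 'inverse' regression of X on Y
rho = np.corrcoef(x, y)[0, 1]

print(f"a*b = {a * b:.4f}, rho^2 = {rho**2:.4f}, 1/a = {1 / a:.4f}, b = {b:.4f}")
```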

So far so good; let's move on to the final problem.

## Problem 4:

Now we want to test the null hypothesis $H_{0}$, stating that the slope $a$ is zero, against the alternative hypothesis $H_{a}$, stating that the slope $a$ is not zero:

\begin{equation}
\begin{aligned}
H_{0}: a = 0 \\
H_{a}: a \neq 0
\end{aligned}
\end{equation}

Normally, I would calculate the test statistic $t$ (not to be confused with the scaling factor from Problem 2) as:

\begin{equation}
t = \frac{\hat{a}-a_{0}}{\textit{SE}\left[\hat{a}\right]}
\end{equation}

Here $\hat{a}$ is our estimated value for the slope $a$, $a_0 = 0$ is the value of the slope under the null hypothesis, and $\textit{SE}\left[\hat{a}\right]$ is the standard error of $\hat{a}$. Once we have the $t$-statistic, we can evaluate it for each given $n$ and compare it with the critical value from a $t$-table at the $95\,\%$ significance level to see whether our result is significant.

Now, I don't know a simple formula for the standard error by heart, so let's derive it (it will be long).

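Let's start by finding the expected value of $\hat{a}$. For that we will use our result for $\hat{a}$ from Problem 1, written in its general, mean-centred form (which reduces to the earlier expression when the means are zero):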

\begin{equation}
\hat{a} = \frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}
\end{equation}

Since $\sum_{i=1}^{n}\left(k_{i}-\bar{k}\right) = 0$ for $k = x, y$, we can expand both sums and drop the terms multiplied by $\bar{y}$ and $\bar{x}$, respectively: the constant can be pulled in front of the sum, which then vanishes. Hence:

\begin{equation}
\hat{a} = \frac{\sum_{i=1}^{n}y_{i}\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}
\end{equation}

Furthermore, we will need a neat little property of the expected value called linearity of expectation. For constants $a$ and $b$ (generic constants here, not the regression slopes) and discrete random variables $X$ and $Y$ with joint probability mass function $p\left(x, y\right)$ and marginals $p_{X}\left(x\right) = \sum_{y}p\left(x,y\right)$, $p_{Y}\left(y\right) = \sum_{x}p\left(x,y\right)$ (the continuous case works the same way with integrals):

\begin{equation}
\mathit{E}\left[aX+bY\right] = \sum_{x,y}\left(ax+by\right)p\left(x,y\right)
= a\sum_{x}x\,p_{X}\left(x\right) + b\sum_{y}y\,p_{Y}\left(y\right)
= a\cdot \mathit{E}\left[X\right] + b \cdot \mathit{E}\left[Y\right]
\end{equation}

On top of that, we will assume the values of $X$ to be known and fixed (non-random), so that $\mathit{E}\left[x_{i}\right] = x_{i}$ for all $i \in \{1, \dots, n\}$, and that the error terms are normally distributed around a mean value of $0$, so that $\mathit{E}\left[\epsilon_{i}\right] = 0$ for all $i \in \{1, \dots, n\}$.

Now let's begin to derive the expected value of $\hat{a}$:

\begin{equation}
\begin{aligned}
\mathit{E}\left[\hat{a}\right] &=& \mathit{E}\left[\frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\right]\\
&=& \frac{1}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\mathit{E}\left[\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)\right]\\
&=& \frac{\mathit{E}\left[\sum_{i=1}^{n}y_{i}\cdot\left(x_{i}-\bar{x}\right)\right]}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& \frac{\sum_{i=1}^{n}\mathit{E}\left[y_{i}\cdot\left(x_{i}-\bar{x}\right)\right]}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& \frac{\sum_{i=1}^{n}\mathit{E}\left[\left(a\cdot x_{i} + \epsilon_{i}\right)\cdot\left(x_{i}-\bar{x}\right)\right]}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot\mathit{E}\left[a\cdot x_{i} + \epsilon_{i}\right]}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot\left(a\cdot x_{i}+\mathit{E}\left[\epsilon_{i}\right]\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot a\cdot x_{i}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\cdot x_{i}}\\
&=& a 
\end{aligned}
\end{equation}

Great. As we can see, $\hat{a}$ is an unbiased estimator of the slope $a$.

Now let's have a look at the variance of $\hat{a}$. Again, let's quickly introduce some properties that we will need; the assumption that the values of $X$ are known and fixed still stands. We will start off with the definition of the variance and show that the variance is not linear: an added constant drops out, while a constant factor enters squared. Given the definition of the variance

\begin{equation}
\begin{aligned}
\textit{Var}\left[X\right] &=& \mathit{E}\left[\left(X - \mathit{E}\left[X\right]\right)^{2}\right] \\
&=& \mathit{E}\left[X^{2} + \left(\mathit{E}\left[X\right]\right)^{2} - 2X\mathit{E}\left[X\right]\right] \\
&=& \mathit{E}\left[X^{2}\right] + \left(\mathit{E}\left[X\right]\right)^{2} - 2\left(\mathit{E}\left[X\right]\mathit{E}\left[X\right]\right) \\
&=& \mathit{E}\left[X^{2}\right] - \left(\mathit{E}\left[X\right]\right)^{2}\, ,
\end{aligned}
\end{equation}

we can work out that for constants $a$ and $c$ and a random variable $X$:

\begin{equation}
\begin{aligned}
\textit{Var}\left[aX + c\right] &=& \mathit{E}\left[\left(aX + c - \mathit{E}\left[aX + c\right]\right)^{2}\right] \\
&=& \mathit{E}\left[\left(aX\right)^{2} + c^{2} + \left(\mathit{E}\left[aX + c\right]\right)^{2} + 2aX\cdot c - 2aX\cdot\mathit{E}\left[aX + c\right] - 2c\cdot\mathit{E}\left[aX + c\right]\right] \\
&=& \mathit{E}\left[\left(aX\right)^{2}\right] + \mathit{E}\left[c^{2}\right] + \left(\mathit{E}\left[aX + c\right]\right)^{2} + \mathit{E}\left[2aX\cdot c\right] - \mathit{E}\left[2aX\cdot\mathit{E}\left[aX + c\right]\right] - \mathit{E}\left[2c\cdot\mathit{E}\left[aX + c\right]\right] \\
&=& a^{2}\mathit{E}\left[X^{2}\right] + c^{2} + \left(a\mathit{E}\left[X\right] + c\right)^{2} + 2ac\cdot\mathit{E}\left[X\right] - 2a\mathit{E}\left[X\right]\cdot\mathit{E}\left[aX + c\right] - 2c\cdot\mathit{E}\left[aX + c\right] \\
&=& a^{2}\mathit{E}\left[X^{2}\right] + c^{2} + a^{2}\left(\mathit{E}\left[X\right]\right)^{2} + c^{2} + 2ac\mathit{E}\left[X\right] + 2ac\cdot\mathit{E}\left[X\right] - 2a\mathit{E}\left[X\right]\cdot\left(a\mathit{E}\left[X\right] + c\right) - 2c\cdot\left(a\mathit{E}\left[X\right] + c\right) \\
&=& a^{2}\mathit{E}\left[X^{2}\right] + 2c^{2} + a^{2}\left(\mathit{E}\left[X\right]\right)^{2}  + 4ac\cdot\mathit{E}\left[X\right] - 2a^{2}\left(\mathit{E}\left[X\right]\right)^{2}- 2ac\cdot\mathit{E}\left[X\right] - 2ac\cdot\mathit{E}\left[X\right] - 2c^{2} \\
&=& a^{2}\mathit{E}\left[X^{2}\right] - a^{2}\left(\mathit{E}\left[X\right]\right)^{2}\\
&=& a^{2}\left(\mathit{E}\left[X^{2}\right] - \left(\mathit{E}\left[X\right]\right)^{2}\right)\\
&=& a^{2}\textit{Var}\left[X\right]
\end{aligned}
\end{equation}
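The same identity holds exactly for sample variances, which gives a quick numerical check. The constants `a_const` and `c` below are arbitrary (and not the regression slope), and the exponential distribution is an arbitrary choice to show that normality is not needed here.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(2.0, 10_000)   # any distribution works
a_const, c = -3.0, 42.0            # arbitrary constants

lhs = np.var(a_const * x + c, ddof=1)
rhs = a_const**2 * np.var(x, ddof=1)
print(lhs, rhs)                    # equal up to floating-point rounding
```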

So adding a constant does not change the variance, which is what you would expect, since both the values and the expected value of the random variable are shifted by the same amount. Furthermore, we can see that the variance gets scaled by the square of the factor in front of the random variable.

Now we have almost everything we need to calculate $\textit{Var}\left[\hat{a}\right]$. The last thing we need is the variance of a weighted sum of random variables $\epsilon_{i}$ with constant weights $c_{i}$. Stated without proof this time:

\begin{equation}
\textit{Var}\left[\sum_{i=1}^{n}c_{i}\cdot\epsilon_{i}\right] = \sum_{i=1}^{n}\textit{Var}\left[c_{i}\cdot\epsilon_{i}\right] + 2\cdot \sum_{i=1}^{n}\sum_{j=i+1}^{n}c_{i}c_{j}\,\textit{Cov}\left[\epsilon_{i}, \epsilon_{j}\right]
\end{equation}
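In matrix form this reads $\textit{Var}\left[\mathbf{c}^{\top}\boldsymbol{\epsilon}\right] = \mathbf{c}^{\top}\Sigma\,\mathbf{c}$, where $\Sigma$ is the covariance matrix of the errors. The sketch below, with arbitrary (hypothetical) weights and covariance matrix, checks that the explicit double sum agrees with the quadratic form, and that for uncorrelated errors with $\Sigma = \sigma_{\epsilon}^{2}I$ only $\sigma_{\epsilon}^{2}\sum_{i} c_{i}^{2}$ survives.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
c = rng.normal(size=n)                      # arbitrary weights c_i
A = rng.normal(size=(n, n))
cov = A @ A.T                               # arbitrary valid (positive semi-definite) covariance

# Explicit formula: sum of c_i^2 Var[eps_i] plus twice the sum over pairs i < j
diag_part = np.sum(c**2 * np.diag(cov))
cross_part = 2 * sum(c[i] * c[j] * cov[i, j] for i in range(n) for j in range(i + 1, n))
var_explicit = diag_part + cross_part

var_quadratic = c @ cov @ c                 # quadratic form c^T Sigma c

# Uncorrelated errors with equal variance: all cross terms vanish
sigma_eps = 0.8
var_iid = c @ (sigma_eps**2 * np.eye(n)) @ c
print(var_explicit, var_quadratic, var_iid, sigma_eps**2 * np.sum(c**2))
```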

Equipped with this knowledge about variances, we can finally calculate $\textit{Var}\left[\hat{a}\right]$. We will again assume that the values of $X$ are known and fixed, so that $\textit{Var}\left[\left(x_{i}-\bar{x}\right)\cdot\epsilon_{i}\right] = \left(x_{i}-\bar{x}\right)^{2}\textit{Var}\left[\epsilon_{i}\right]$ for all $i\in \{1,\dots,n\}$. Since the errors are uncorrelated ($\textit{Cov}\left[\epsilon_{i}, \epsilon_{j}\right] = 0$ for $i \neq j$), all covariance terms in the sum vanish.

\begin{equation}
\begin{aligned}
\textit{Var}\left[\hat{a}\right] 
&=& \textit{Var}\left[\frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\textit{Var}\left[\sum_{i=1}^{n}y_{i}\cdot\left(x_{i}-\bar{x}\right)\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\textit{Var}\left[\sum_{i=1}^{n}\left(ax_{i}+\epsilon_{i}\right)\cdot\left(x_{i}-\bar{x}\right)\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\textit{Var}\left[\left(\sum_{i=1}^{n}ax_{i}\cdot\left(x_{i}-\bar{x}\right)\right)+\left(\sum_{i=1}^{n}\epsilon_{i}\cdot\left(x_{i}-\bar{x}\right)\right)\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\textit{Var}\left[\sum_{i=1}^{n}\epsilon_{i}\cdot\left(x_{i}-\bar{x}\right)\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\sum_{i=1}^{n}\textit{Var}\left[\epsilon_{i}\cdot\left(x_{i}-\bar{x}\right)\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\cdot\textit{Var}\left[\epsilon_{i}\right] \\
&=& \frac{1}{\left(\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\right)^{2}}\cdot\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\cdot\sigma_{\epsilon}^{2} \\
&=& \frac{\sigma_{\epsilon}^{2}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}} \\
\end{aligned}
\end{equation}

There we go. From here we can easily compute the standard error of the slope $\textit{SE}\left[\hat{a}\right]$ with:

\begin{equation}
\textit{SE}\left[\hat{a}\right] 
= \sqrt{\textit{Var}\left[\hat{a}\right]}
= \frac{\sigma_{\epsilon}}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}}
\end{equation}
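Both results, $\mathit{E}\left[\hat{a}\right] = a$ and the standard error above, can be checked by simulation under the same assumptions: fixed $x_{i}$ and uncorrelated $\epsilon_{i} \sim \mathcal{N}\left(0, \sigma_{\epsilon}^{2}\right)$. The true slope, noise level and number of repetitions below are arbitrary choices for illustration; the $x_{i}$ are centred so that $\sum_{i}\left(x_{i}-\bar{x}\right)^{2} = \sum_{i} x_{i}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, a_true, sigma_eps, n_rep = 50, 0.5, 2.0, 20_000   # arbitrary illustration values

x = rng.normal(0.0, 1.0, n)
x = x - x.mean()                       # fixed, centred regressor values

# Draw n_rep independent error vectors and fit the slope for each one
eps = rng.normal(0.0, sigma_eps, size=(n_rep, n))
y = a_true * x + eps                   # broadcasting: one row per repetition
a_hat = (y @ x) / np.sum(x**2)         # closed-form slope for every row

se_theory = sigma_eps / np.sqrt(np.sum(x**2))
print(f"mean of a_hat: {a_hat.mean():.4f} (true a = {a_true})")
print(f"std of a_hat:  {a_hat.std(ddof=1):.4f} (theory: {se_theory:.4f})")
```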

Here $\sigma_{\epsilon}$ is the standard deviation of the error term $\epsilon$. Since we don't know the variance of the error term, we need to estimate it by the residual variance $s^{2}$ (with $n-1$ degrees of freedom, since one parameter, $\hat{a}$, was fitted):

\begin{equation}
s^{2} 
= \frac{\sum_{i=1}^{n}\left(y_{i}-\hat{a}x_{i}\right)^{2}}{n-1} 
= \frac{\textit{RSS}}{n-1}
\end{equation}

Here $\textit{RSS}$ is the residual sum of squares, for which we can show (using that both means are zero, so that $\bar{y} = \hat{a}\bar{x} = 0$):

\begin{equation}
\begin{aligned}
\textit{RSS} 
&=& \sum_{i=1}^{n}\left(y_{i}-\hat{a}x_{i}\right)^{2} \\
&=& \sum_{i=1}^{n}\left(\left(y_{i}-\bar{y}\right)-\hat{a}\left(x_{i}-\bar{x}\right)\right)^{2} \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} + \hat{a}^{2}\cdot\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2} - 2\hat{a}\cdot\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} + \hat{a}\cdot\frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2} - 2\hat{a}\cdot\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} + \hat{a}\cdot\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)\right) - 2\hat{a}\cdot\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} - \hat{a}\cdot\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} - \frac{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\left(x_{i}-\bar{x}\right)\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2} - \frac{\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)\right)^{2}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}} \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}\cdot \left(1-\frac{\left(\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)\cdot\left(x_{i}-\bar{x}\right)\right)^{2}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\cdot\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}\right) \\
&=& \sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}\cdot \left(1-\rho_{XY}^{2}\right) \\
\end{aligned}
\end{equation}
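The identity $\textit{RSS} = \left(1-\rho_{XY}^{2}\right)\sum_{i}\left(y_{i}-\bar{y}\right)^{2}$ relies on the zero means. A minimal numerical check with arbitrary synthetic data, centred exactly:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 400)
y = 0.3 * x + rng.normal(0.0, 1.0, 400)
x, y = x - x.mean(), y - y.mean()      # zero means, as assumed

a_hat = np.sum(x * y) / np.sum(x**2)
rss = np.sum((y - a_hat * x) ** 2)
rho = np.corrcoef(x, y)[0, 1]

print(rss, (1 - rho**2) * np.sum(y**2))
```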

Now we can work out the $t$-statistic:

\begin{equation}
\begin{aligned}
t 
&=& \frac{\hat{a}-a_{0}}{\textit{SE}\left[\hat{a}\right]} \\
&=& \frac{\hat{a}\cdot\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}}{\sigma_{\epsilon}} \\
&\approx& \frac{\hat{a}\cdot\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}}{s} \\
&\approx& \frac{\hat{a}\cdot\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\sqrt{n-1}}{\sqrt{\textit{RSS}}} \\
&\approx& \frac{\hat{a}\cdot\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\sqrt{n-1}}{\sqrt{1-\rho_{XY}^{2}}\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}} \\
&\approx& \frac{\rho_{XY} \cdot \frac{\sigma_{Y}}{\sigma_{X}}\cdot\sqrt{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}\cdot\sqrt{n-1}}{\sqrt{1-\rho_{XY}^{2}}\sqrt{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}}} \\
&\approx& \rho_{XY}\cdot\frac{\sqrt{n-1}}{\sqrt{1-\rho_{XY}^{2}}} \\
\end{aligned}
\end{equation}
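To make sure no step was lost, the sketch below computes $t$ directly from its definition ($\hat{a}$, $\textit{RSS}$, $s^{2} = \textit{RSS}/(n-1)$ and the standard error) and compares it with the closed form $\rho_{XY}\sqrt{n-1}/\sqrt{1-\rho_{XY}^{2}}$. Once $\sigma_{\epsilon}$ is replaced by its estimate, the two agree exactly for centred data. The data are again arbitrary, and the variable is named `s2` to avoid a clash with the scaling factor.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 120
x = rng.normal(0.0, 1.0, n)
y = 0.2 * x + rng.normal(0.0, 1.0, n)
x, y = x - x.mean(), y - y.mean()

# t-statistic from the definitions (one fitted parameter -> n - 1 degrees of freedom)
a_hat = np.sum(x * y) / np.sum(x**2)
rss = np.sum((y - a_hat * x) ** 2)
s2 = rss / (n - 1)                     # estimate of sigma_eps^2
se = np.sqrt(s2 / np.sum(x**2))        # standard error of a_hat
t_direct = a_hat / se

# Closed form in terms of the sample correlation
rho = np.corrcoef(x, y)[0, 1]
t_formula = rho * np.sqrt(n - 1) / np.sqrt(1 - rho**2)

print(t_direct, t_formula)
```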

And now we can calculate the $t$-value for the given $\rho_{XY}$ and $n$ values.

```python
n_list = [10**2, 10**3, 10**4]
rho_xy = 0.01

for n in n_list:
    # t = rho * sqrt(n - 1) / sqrt(1 - rho^2)
    t_value = rho_xy * (n - 1)**0.5 / (1 - rho_xy**2)**0.5
    print("For n = {}, we find a t-value of: {:.3f}".format(n, t_value))
```

```
For n = 100, we find a t-value of: 0.100
For n = 1000, we find a t-value of: 0.316
For n = 10000, we find a t-value of: 1.000
```


For sample sizes this large, the $t$-distribution with $n-1$ degrees of freedom is well approximated by a standard normal distribution. The two-sided critical value at the $95\,\%$ level is then $\approx 1.96$, which the __$68–95–99.7$ rule__ rounds to $2$: if __$|t|$ is less than about two__, the result is __not significant__ at $95\,\%\,\textit{CL}$. Hence none of the above results is statistically significant at $95\,\%\,\textit{CL}$. Using the rounded critical value $t = 2$, we would need a value of $n$ greater than or equal to:

\begin{equation}
\begin{aligned}
t &= \rho_{XY}\cdot\frac{\sqrt{n-1}}{\sqrt{1-\rho_{XY}^{2}}} \quad\Longleftrightarrow\quad n = \frac{t^{2}\left(1-\rho_{XY}^{2}\right)}{\rho_{XY}^{2}} + 1 \\
n &= \frac{2^{2}\left(1-0.01^{2}\right)}{0.01^{2}} + 1 = 4\cdot 9999 + 1 = 39{,}997
\end{aligned}
\end{equation}

```python
n = 40000
rho_xy = 0.01

# t = rho * sqrt(n - 1) / sqrt(1 - rho^2)
t_value = rho_xy * (n - 1)**0.5 / (1 - rho_xy**2)**0.5
print("For n = {}, we find a t-value of: {:.3f}".format(n, t_value))
```

```
For n = 40000, we find a t-value of: 2.000
```


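Keep in mind, though, that this is an approximation: strictly, the critical value should be taken from a $t$-table for $n-1$ degrees of freedom (or computed directly), and using $1.96$ instead of the rounded $2$ lowers the required $n$ somewhat.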
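A minimal sketch of computing the critical value directly, assuming SciPy is available (the notebook above does not use it): `scipy.stats.t.ppf` gives the two-sided critical value of the $t$-distribution with $n-1$ degrees of freedom, so we can redo the comparison without the normal approximation and search for the smallest significant $n$. Because the $t$-value grows with $n$ while the critical value shrinks, a simple bisection suffices; the exact threshold comes out somewhat below the $n \approx 40{,}000$ obtained with the rounded value $t = 2$.

```python
import numpy as np
from scipy import stats

rho = 0.01
alpha = 0.05

def t_value(n, rho):
    """t-statistic for the no-intercept slope: rho * sqrt(n - 1) / sqrt(1 - rho^2)."""
    return rho * np.sqrt(n - 1) / np.sqrt(1 - rho**2)

def t_critical(n, alpha=0.05):
    """Two-sided critical value of the t-distribution with n - 1 degrees of freedom."""
    return stats.t.ppf(1 - alpha / 2, df=n - 1)

for n in [10**2, 10**3, 10**4]:
    t, crit = t_value(n, rho), t_critical(n, alpha)
    print(f"n = {n:>6}: t = {t:.3f}, critical t = {crit:.3f}, significant: {abs(t) > crit}")

# Smallest n with t above the critical value, found by bisection
# (the condition is monotone in n, so bisection is valid).
lo, hi = 2, 10**6
while lo < hi:
    mid = (lo + hi) // 2
    if t_value(mid, rho) > t_critical(mid, alpha):
        hi = mid
    else:
        lo = mid + 1
n_min = lo
print(f"Smallest significant n: {n_min}")
```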