<a href="https://colab.research.google.com/github/anhquan-truong/PM520/blob/main/HW/PM520_HW1_AnhQuanTRUONG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework 1. Linear regression and normal equations

In [1]:
import jax
import jax.numpy as jnp
import jax.random as rdm
import jax.numpy.linalg as jnpla
import matplotlib.pyplot as plt


# 1. Linear model simulation
In class we defined a Python function that simulates $N$ observations of a $P\times 1$ feature vector (i.e. an $N \times P$ matrix $X$) and an outcome $y$ as a linear function of $X$. Please include its definition here and use it for problem 2.

Given an $N \times P$ matrix $X$, a $P \times 1$ vector $a$, and an $N \times 1$ outcome vector $y$, we can describe $y$ as a linear function of $X$: $$Xa = y.$$

In [16]:
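def sim_linear_reg(key, N, P, r2=0.5):
  # NOTE: r2 is accepted but not used here; y is an exact (noiseless) linear function of X
  key, x_key = rdm.split(key)  # separate keys for the effects and the design matrix
  a = rdm.normal(key, shape=(P,))  # true effects, length P
  X = rdm.normal(x_key, shape=(N, P))  # design matrix, N x P
  y = X @ a  # outcome, length N
  return a, X, y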

seed = 912
key = rdm.PRNGKey(seed) # creating key from seed
print(key) # Print only the single key

N = 100 # number of samples (rows of X)
P = 10 # number of features (columns of X)

a, X, y = sim_linear_reg(key, N, P, r2=0.5)


[  0 912]
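The `r2` argument is not used by the simulation above. As a minimal sketch of one way it could be honored, the variant below adds Gaussian noise scaled so that the linear predictor explains roughly a fraction `r2` of the variance of `y`. The name `sim_linear_reg_noisy` and the noise model are assumptions for illustration, not necessarily the function defined in class.

```python
import jax.numpy as jnp
import jax.random as rdm


def sim_linear_reg_noisy(key, N, P, r2=0.5):
  # hypothetical variant of sim_linear_reg; assumes 0 < r2 <= 1
  key, x_key, e_key = rdm.split(key, 3)
  a = rdm.normal(key, shape=(P,))        # true effects
  X = rdm.normal(x_key, shape=(N, P))    # design matrix, N x P
  g = X @ a                              # noiseless linear predictor
  s2g = jnp.var(g)                       # variance explained by X
  s2e = (1.0 / r2 - 1.0) * s2g           # noise variance so that s2g / (s2g + s2e) = r2
  y = g + jnp.sqrt(s2e) * rdm.normal(e_key, shape=(N,))
  return a, X, y


a_sim, X_sim, y_sim = sim_linear_reg_noisy(rdm.PRNGKey(912), 100, 10, r2=0.5)
print(X_sim.shape, y_sim.shape)
```

With `r2=1` the noise variance is zero and the variant reduces to the noiseless model $Xa = y$.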


# 2. Just-in time decorator and ordinary least squares
Complete the definition of `ordinary_least_squares` below, which estimates the effect and its standard error. `@jit` wraps a function to perform just-in-time compilation, which can speed up repeated calls.

Compare the run times with and without JIT.
Hint: use [`block_until_ready()`](https://jax.readthedocs.io/en/latest/_autosummary/jax.block_until_ready.html) to get correct timing estimates.

In [15]:
from jax import jit


def ordinary_least_squares(X, y):
  # normal equations: beta_hat = (X^T X)^{-1} X^T y, where X.T @ X is P x P
  beta_hat = jnpla.inv(X.T @ X) @ (X.T @ y)
  return beta_hat

jit_ordinary_least_squares = jit(ordinary_least_squares)

jit_ordinary_least_squares(X, y)
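The problem also asks for the standard error and a timing comparison. The following is a minimal sketch of how both could be done, using the $P \times P$ product `X.T @ X`. The helper names `ols_with_se` and `time_call`, the toy data, and the classical homoskedastic standard-error formula $\sqrt{\hat\sigma^2 \, \mathrm{diag}((X^TX)^{-1})}$ with $\hat\sigma^2 = RSS/(N-P)$ are assumptions made for illustration.

```python
import time

import jax
import jax.numpy as jnp
import jax.numpy.linalg as jnpla
import jax.random as rdm
from jax import jit


def ols_with_se(X, y):
  # hypothetical helper: OLS estimate plus classical standard errors
  N, P = X.shape
  XtX_inv = jnpla.inv(X.T @ X)                 # P x P
  beta_hat = XtX_inv @ (X.T @ y)
  resid = y - X @ beta_hat
  sigma2_hat = jnp.sum(resid ** 2) / (N - P)   # unbiased residual variance
  se = jnp.sqrt(sigma2_hat * jnp.diag(XtX_inv))
  return beta_hat, se


jit_ols_with_se = jit(ols_with_se)

# toy data, simulated here only to keep the sketch self-contained
x_key, b_key, e_key = rdm.split(rdm.PRNGKey(0), 3)
X_demo = rdm.normal(x_key, shape=(100, 10))
beta_demo = rdm.normal(b_key, shape=(10,))
y_demo = X_demo @ beta_demo + rdm.normal(e_key, shape=(100,))


def time_call(f, *args, reps=100):
  # block_until_ready waits for JAX's asynchronous dispatch to finish,
  # so the clock measures the computation rather than just the dispatch
  jax.block_until_ready(f(*args))  # warm-up; triggers compilation for the jitted version
  start = time.perf_counter()
  for _ in range(reps):
    jax.block_until_ready(f(*args))
  return (time.perf_counter() - start) / reps


print("no jit :", time_call(ols_with_se, X_demo, y_demo))
print("jit    :", time_call(jit_ols_with_se, X_demo, y_demo))
```

The warm-up call keeps the one-time compilation cost out of the average. Actual timings depend on the hardware and on the problem size, so none are quoted here.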

# 3. OLS derivation
Assume that $y = X \beta + \epsilon$ where $y$ is $N \times 1$ vector, $X$ is an $N \times P$ matrix where $P < N$ and $\epsilon$ is a random variable such that $\mathbb{E}[\epsilon_i] = 0$ and $\mathbb{V}[\epsilon_i] = \sigma^2$ for all $i = 1 \dots n$. Derive the OLS "normal equations".

The goal is to find $\beta$ such that the residual sum of squares (RSS) is minimal. With $x_i^T$ denoting the $i$-th row of $X$, $RSS(\beta)$ is

$$RSS(\beta)=\sum_{i=1}^n (y_i - x_i^T\beta)^2$$

We want to find

$$\beta^*=\arg\min_{\beta} RSS(\beta)$$

**Approach**: We find a stationary point, i.e. a point where the gradient is zero. We take the derivative of $RSS(\beta)$ with respect to $\beta$

$$
\begin{align*}
\frac{\partial RSS(\beta)}{\partial \beta} &= 2\sum_{i=1}^n (x_i^T\beta-y_i)x_i \\
&=2\sum_{i=1}^n \left(x_i (x_i^T \beta) - x_i y_i\right) \\
&=2\sum_{i=1}^n (x_i x_i^T) \beta - 2\sum_{i=1}^n x_i y_i \\
&=2\left[(X^T X) \beta - X^T y\right] \\
\end{align*}
$$

where the second line uses the fact that $x_i^T\beta$ is a scalar, and the last line uses $\sum_{i=1}^n x_i x_i^T = X^T X$ and $\sum_{i=1}^n x_i y_i = X^T y$.

From that we have
$$
\begin{align*}
\nabla RSS(\beta) &= 0 \iff 2\left[(X^T X) \beta - X^T y\right] = 0 \iff (X^T X)\beta = X^T y \iff \beta = (X^T X)^{-1} X^T y\\
\end{align*}
$$

assuming $X^T X$ is invertible, which holds exactly when $X$ has full column rank.
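As a numerical sanity check on the derivation, the sketch below compares the gradient from `jax.grad` with the closed form $2[(X^TX)\beta - X^Ty]$ at an arbitrary point, then evaluates the gradient at the solution of the normal equations. The toy data and the names `rss`, `X_chk`, `y_chk` and `beta_try` are assumptions for illustration only.

```python
import jax
import jax.numpy as jnp
import jax.numpy.linalg as jnpla
import jax.random as rdm


def rss(beta, X, y):
  # residual sum of squares, sum_i (y_i - x_i^T beta)^2
  return jnp.sum((y - X @ beta) ** 2)


x_key, b_key, e_key, t_key = rdm.split(rdm.PRNGKey(1), 4)
X_chk = rdm.normal(x_key, shape=(50, 3))
y_chk = X_chk @ rdm.normal(b_key, shape=(3,)) + rdm.normal(e_key, shape=(50,))
beta_try = rdm.normal(t_key, shape=(3,))  # arbitrary point at which to compare gradients

# automatic differentiation with respect to the first argument (beta)
grad_auto = jax.grad(rss)(beta_try, X_chk, y_chk)
# closed form from the derivation
grad_formula = 2 * (X_chk.T @ X_chk @ beta_try - X_chk.T @ y_chk)
print(jnp.allclose(grad_auto, grad_formula, rtol=1e-3, atol=1e-3))

# solve the normal equations (X^T X) beta = X^T y without forming an explicit inverse
beta_star = jnpla.solve(X_chk.T @ X_chk, X_chk.T @ y_chk)
grad_at_star = jax.grad(rss)(beta_star, X_chk, y_chk)
print(jnp.max(jnp.abs(grad_at_star)))  # expected to be near zero, up to float32 round-off
```

Using `solve` rather than `inv` is a design choice made for this sketch. Both give the same $\beta$ when $X^TX$ is invertible, and `solve` is generally the more numerically stable of the two.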
