# QR-decomposition

Here we implement a QR-decomposition, which is used, among other things, in linear regression for **numerical stability**.
We apply our QR-decomposition to fit a linear model and compare the result to the estimate from `statsmodels` (which I recommend over `sklearn` as it computes t-values and confidence intervals).

The QR-decomposition factorizes any $(n \times p)$-dimensional **full rank** matrix $\mathbf{X}$ with $n \geq p$ as:

\begin{equation}
\mathbf{X} = \mathbf{QR},
\end{equation}

where $\mathbf{Q} \in \mathbb{R}^{n \times n}$ is orthogonal, i.e. $\mathbf{Q}^T \mathbf{Q} = \mathbf{I}$, and $\mathbf{R} \in \mathbb{R}^{n \times p}$ is upper triangular.
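
As a quick sanity check of these shapes and of the orthogonality property, one can use NumPy's built-in `np.linalg.qr` with `mode="complete"`. This is only a sketch: it is not the implementation developed in this notebook, and the matrix size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, chosen for illustration
n, p = 6, 3
X = rng.normal(size=(n, p))

# mode="complete" returns the full factorization: Q is (n, n), R is (n, p)
Q, R = np.linalg.qr(X, mode="complete")

print(Q.shape, R.shape)
print(np.allclose(Q.T @ Q, np.eye(n)))  # Q is orthogonal
print(np.allclose(Q @ R, X))            # the product recovers X
print(np.allclose(R, np.triu(R)))       # R is upper triangular
```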

Let's quickly recap linear regression. We model a linear dependency $f: \mathcal{X} \rightarrow \mathcal{Y}$ as:

\begin{align*}
\mathbf{y} \sim \mathcal{N}(\mathbf{X}\boldsymbol \beta, \sigma^2 \mathbf{I}),
\end{align*}

which we solve using maximum likelihood. For Gaussian responses this is identical to solving:

\begin{align*}
\min_{\boldsymbol \beta} ||\mathbf{y}  - \mathbf{X}\boldsymbol \beta||^2_2.
\end{align*}

Taking the derivative and setting it to $0$ yields:
\begin{align*}
\mathbf{X}^T\mathbf{X} \boldsymbol \beta &= \mathbf{X}^T \mathbf{y}\\
(\mathbf{QR})^T\mathbf{QR} \boldsymbol \beta &=  (\mathbf{QR})^T \mathbf{y} \\
\mathbf{R}^T\mathbf{Q}^T\mathbf{QR} \boldsymbol \beta &=  (\mathbf{QR})^T \mathbf{y} \\
\mathbf{R}^T \mathbf{R} \boldsymbol \beta &=  \mathbf{R}^T \mathbf{Q}^T \mathbf{y} \\
\mathbf{R} \boldsymbol \beta &=  \mathbf{Q}^T \mathbf{y}.
\end{align*}
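
One way to see why the last line of this derivation is numerically preferable to the first: the matrix $\mathbf{X}^T\mathbf{X}$ has the squared condition number of $\mathbf{X}$, whereas $\mathbf{R}$ has the same condition number as $\mathbf{X}$ because $\mathbf{Q}$ is orthogonal. The sketch below illustrates the squaring with `np.linalg.cond`. The nearly collinear design matrix is an assumption made purely for illustration and is not the data used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
X = rng.normal(size=(100, 5))
# Make two columns nearly collinear so that the effect is clearly visible
X[:, 4] = X[:, 3] + 1e-4 * rng.normal(size=100)

cond_X = np.linalg.cond(X)          # 2-norm condition number of X
cond_XtX = np.linalg.cond(X.T @ X)  # condition number of the normal-equations matrix

print(f"cond(X)     = {cond_X:.3e}")
print(f"cond(X^T X) = {cond_XtX:.3e}")
print(f"cond(X)^2   = {cond_X**2:.3e}")
```

The last two printed numbers should agree closely, so any ill-conditioning in the design matrix is amplified when the normal equations are solved directly.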

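The last step cancels $\mathbf{R}^T$, which requires a square, invertible $\mathbf{R}$: only the first $p$ rows of $\mathbf{R}$ are non-zero, so we can keep that $(p \times p)$ block together with the first $p$ columns of $\mathbf{Q}$, and the full rank of $\mathbf{X}$ makes the block invertible. The resulting system is easy to solve: since $\mathbf{R}$ is upper triangular we can use `scipy`'s [solve_triangular](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_triangular.html) function.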
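
To make concrete why a triangular system is easy to solve, here is a minimal sketch of back substitution, checked against `scipy.linalg.solve_triangular`. The helper `back_substitution` and the small $3 \times 3$ example are hypothetical and only serve as illustration; SciPy's own routine calls LAPACK and should be preferred in practice.

```python
import numpy as np
from scipy import linalg

def back_substitution(R, b):
    """Solve R x = b for an upper triangular, non-singular R (illustrative helper)."""
    p = R.shape[0]
    x = np.zeros(p)
    # Start with the last row, which contains a single unknown, and work upwards
    for i in range(p - 1, -1, -1):
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
b = np.array([2.0, 10.0, 8.0])

x = back_substitution(R, b)
print(x)  # [1. 2. 2.]
print(np.allclose(x, linalg.solve_triangular(R, b)))
```

Each unknown costs one dot product, so the whole solve takes on the order of $p^2$ operations and needs no matrix inversion.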

Having set the theoretical groundwork, let's implement this.

In [1]:
import numpy as np
from scipy import linalg
import statsmodels.api as sm



First we create some artificial data:

In [2]:
P = 5
np.random.seed(23)
beta = np.random.normal(size=P)
x = np.random.normal(size=(100, P))
y =  np.dot(x, beta) + np.random.normal(size=(100))

Next we implement the QR-decomposition using Householder triangularization. See *Numerical Recipes* by William Press *et al.* and http://www.seas.ucla.edu/~vandenbe/133A/lectures/qr.pdf

In [3]:
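def House(a):
    """Return the Householder matrix that maps `a` onto a multiple of e_1."""
    I = np.eye(a.shape[0])
    # Pick the sign that avoids cancellation, then scale so that v[0] == 1
    v = a / (a[0] + np.copysign(np.linalg.norm(a), a[0]))
    v[0] = 1
    # H = I - 2 v v^T / (v^T v) is symmetric and orthogonal
    H = I - (2 / np.dot(v, v)) * np.dot(v[:, None], v[None, :])
    return H

def qr(A):
    """Full QR-decomposition of an (m, n) matrix via Householder reflections."""
    m, n = A.shape
    Q = np.eye(m)
    # A square matrix needs one reflection fewer: nothing lies below its last diagonal entry
    for i in range(n - (m == n)):
        H = np.eye(m)
        # Embed the reflector for the sub-column A[i:, i] into an (m, m) identity
        H[i:, i:] = House(A[i:, i])
        Q = np.dot(Q, H)  # accumulate the product of all reflections
        A = np.dot(H, A)  # zero out column i below the diagonal
    return Q, A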
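
The effect of a single Householder reflection can be checked in isolation. The sketch below uses a hypothetical helper `reflector`, a compact restatement of the same idea with the unnormalized vector $\mathbf{u} = \mathbf{a} + \operatorname{sign}(a_1)\,||\mathbf{a}||_2\,\mathbf{e}_1$, and a hand-picked vector with norm 13.

```python
import numpy as np

def reflector(a):
    """Householder matrix built from u = a + sign(a[0]) * ||a|| * e_1 (illustrative helper)."""
    u = a.astype(float).copy()
    u[0] += np.copysign(np.linalg.norm(a), a[0])
    return np.eye(a.shape[0]) - 2.0 * np.outer(u, u) / np.dot(u, u)

a = np.array([3.0, 4.0, 0.0, 12.0])   # ||a|| = 13
H = reflector(a)

print(np.round(H @ a, 10))            # only the first entry is non-zero, with magnitude 13
print(np.allclose(H, H.T))            # H is symmetric
print(np.allclose(H @ H, np.eye(4)))  # H is orthogonal and its own inverse
```

Applying such a reflection to each column in turn, restricted to the entries on and below the diagonal, is what produces the upper triangular factor.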

The QR-decomposition creates two matrices ($Q$ and $R$). We use it with the design matrix $\mathbf{X}$ as described above:

In [4]:
Q, R = qr(x)
print("Shape Q:", Q.shape)
print("Shape R:", R.shape)

Shape Q: (100, 100)
Shape R: (100, 5)


Having computed $Q$ and $R$, we can estimate the coefficients of the linear model. First, however, we need to reduce both matrices to the square system derived above, since only the first $P$ rows of $R$ carry information.

$R$ is upper triangular, i.e. everything below the diagonal should be zero, which holds here up to floating-point error (entries of order $10^{-16}$):

In [5]:
R[:5,:3]

array([[ 8.88330684e+00, -2.92345419e-01,  4.44391490e-01],
       [-1.54368023e-17, -1.05501105e+01, -6.54961705e-01],
       [ 8.28227958e-17, -1.49933382e-16, -9.68056219e+00],
       [ 1.24903834e-16,  3.35921668e-16,  4.76961851e-16],
       [ 2.40806889e-16, -1.21141555e-16, -8.41804989e-17]])

Accordingly, we keep the upper $(P \times P)$ block of $R$ and the first $P$ columns of $Q$ to fit the dimensions of
\begin{equation}
\mathbf{R} \boldsymbol \beta =  \mathbf{Q}^T \mathbf{y}.
\end{equation}

In [6]:
R = R[:P, :P]
Q = Q[:, :P]
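
This slicing corresponds to what is usually called the reduced (or economy) QR-decomposition. As a sketch, NumPy's `np.linalg.qr` exposes both variants through its `mode` argument. The random matrix below is an assumption for illustration and is not the design matrix from this notebook.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
n, p = 100, 5
X = rng.normal(size=(n, p))

Q_full, R_full = np.linalg.qr(X, mode="complete")  # shapes (n, n) and (n, p)
Q_red, R_red = np.linalg.qr(X, mode="reduced")     # shapes (n, p) and (p, p)

# Rows p..n of the full R are zero, so dropping them loses nothing
print(np.allclose(R_full[p:], 0.0))
# The sliced factors still reproduce X
print(np.allclose(Q_full[:, :p] @ R_full[:p], X))
print(Q_red.shape, R_red.shape)
```

Asking for the reduced form directly avoids building the $(n \times n)$ matrix $\mathbf{Q}$, which matters once $n$ is large.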

The next line solves the system of linear equations and thereby estimates the coefficients $\boldsymbol \beta$ of the linear regression model:

In [7]:
beta_qr = linalg.solve_triangular(R, np.dot(Q.T, y))
beta_qr

array([ 0.73876796,  0.00352937, -0.68071947,  1.06087307,  0.85881879])

Let's see what the `statsmodels` implementation gives us:

In [8]:
fit = sm.OLS(y, x).fit()

In [9]:
fit.params

array([ 0.73876796,  0.00352937, -0.68071947,  1.06087307,  0.85881879])

Sweet, the two results agree to all printed digits. So our prototype implementation of the QR-decomposition apparently works.
Using a well-maintained package is of course still preferable to our implementation.
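
As a sketch of what relying on maintained routines could look like, the snippet below regenerates the same simulated data, uses `scipy.linalg.qr` with `mode="economic"` in place of our hand-written `qr`, and compares the result against NumPy's SVD-based `np.linalg.lstsq` with `np.allclose` rather than by eye. Choosing `lstsq` as the reference is an assumption made here for illustration.

```python
import numpy as np
from scipy import linalg

# Same simulation settings as in the data-generating cell
P = 5
np.random.seed(23)
beta = np.random.normal(size=P)
x = np.random.normal(size=(100, P))
y = np.dot(x, beta) + np.random.normal(size=100)

# mode="economic" returns Q as (100, P) and R as (P, P) directly
Q, R = linalg.qr(x, mode="economic")
beta_qr = linalg.solve_triangular(R, Q.T @ y)

# Reference solution from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(x, y, rcond=None)

print(np.allclose(beta_qr, beta_lstsq))
```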

The output of `statsmodels`'s `summary` method closely resembles the model summary you get when modelling in the `R` language.

In [10]:
fit.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.737 |
| Model: | OLS | Adj. R-squared: | 0.723 |
| Method: | Least Squares | F-statistic: | 53.13 |
| Date: | Sat, 21 Apr 2018 | Prob (F-statistic): | 4.96e-26 |
| Time: | 20:20:55 | Log-Likelihood: | -139.27 |
| No. Observations: | 100 | AIC: | 288.5 |
| Df Residuals: | 95 | BIC: | 301.6 |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 0.7388 | 0.113 | 6.548 | 0.000 | 0.515 | 0.963 |
| x2 | 0.0035 | 0.095 | 0.037 | 0.970 | -0.185 | 0.192 |
| x3 | -0.6807 | 0.104 | -6.547 | 0.000 | -0.887 | -0.474 |
| x4 | 1.0609 | 0.104 | 10.169 | 0.000 | 0.854 | 1.268 |
| x5 | 0.8588 | 0.109 | 7.911 | 0.000 | 0.643 | 1.074 |

| | | | |
|---|---|---|---|
| Omnibus: | 0.065 | Durbin-Watson: | 1.959 |
| Prob(Omnibus): | 0.968 | Jarque-Bera (JB): | 0.109 |
| Skew: | 0.057 | Prob(JB): | 0.947 |
| Kurtosis: | 2.886 | Cond. No. | 1.27 |
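
The `std err` and `t` columns can also be related back to the decomposition, since $(\mathbf{X}^T\mathbf{X})^{-1} = \mathbf{R}^{-1}\mathbf{R}^{-T}$ for the square $\mathbf{R}$. The following sketch assumes the usual non-robust OLS covariance $\hat\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}$ with $n - P$ residual degrees of freedom, and uses `scipy.linalg.qr` instead of our own `qr`. It is meant as an illustration of the relationship, not as a description of how `statsmodels` computes these quantities internally.

```python
import numpy as np
from scipy import linalg

# Same simulation settings as in the data-generating cell
P = 5
np.random.seed(23)
beta = np.random.normal(size=P)
x = np.random.normal(size=(100, P))
y = np.dot(x, beta) + np.random.normal(size=100)
n = x.shape[0]

Q, R = linalg.qr(x, mode="economic")
beta_hat = linalg.solve_triangular(R, Q.T @ y)

# Unbiased estimate of the residual variance with n - P degrees of freedom
resid = y - x @ beta_hat
sigma2 = resid @ resid / (n - P)

# (X^T X)^{-1} = R^{-1} R^{-T}, so its diagonal holds the squared row norms of R^{-1}
R_inv = linalg.solve_triangular(R, np.eye(P))
std_err = np.sqrt(sigma2 * np.sum(R_inv**2, axis=1))
t_values = beta_hat / std_err

print(np.round(std_err, 3))
print(np.round(t_values, 3))
```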
