# Matrix Approach to Simple Linear Regression Analysis

### TBD

This chapter isn't essential for doing statistics in Python. I will come back to it for the fun of getting better at `numpy`. I assume a numerical programmer coding up their own raw statistical methods will already be better versed in this than I am!

In [1]:
import numpy as np
import pandas as pd

In [6]:
df = pd.DataFrame({
    "x1": [68.5, 45.2, 91.3, 47.8, 46.9, 66.1, 49.5, 52, 48.9, 38.4, 87.9, 72.8, 88.4, 42.9, 52.5, 85.7, 41.3, 51.7, 89.6, 82.7, 52.3], 
    "x2": [16.7, 16.8, 18.2, 16.3, 17.3, 18.2, 15.9, 17.2, 16.6, 16, 18.3, 17.1, 17.4, 15.8, 17.8, 18.4, 16.5, 16.3, 18.1, 19.1, 16], 
    "y": [174.4, 164.4, 244.2, 154.6, 181.6, 207.5, 152.8, 163.2, 145.4, 137.2, 241.9, 191.1, 232, 145.3, 161.1, 209.7, 146.4, 144, 232.6, 224.1, 166.5]
})

df

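|   | x1 | x2 | y |
|---|------|------|-------|
| 0 | 68.5 | 16.7 | 174.4 |
| 1 | 45.2 | 16.8 | 164.4 |
| 2 | 91.3 | 18.2 | 244.2 |
| 3 | 47.8 | 16.3 | 154.6 |
| 4 | 46.9 | 17.3 | 181.6 |
| 5 | 66.1 | 18.2 | 207.5 |
| 6 | 49.5 | 15.9 | 152.8 |
| 7 | 52.0 | 17.2 | 163.2 |
| 8 | 48.9 | 16.6 | 145.4 |
| 9 | 38.4 | 16.0 | 137.2 |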


# 5.1 Matrices (p 176)

A NumPy matrix is a 2-dimensional array built from a list of row lists. Here we have a 3x2 array: a list of 3 rows, where each row is a list of 2 column values (fields).

In [13]:
x = np.array([
    [16000, 23],
    [33000, 47],
    [21000, 35]
])
print(f"A {x.ndim} dimensional array of sizes {x.shape} for a total of {x.size} elements")
x

A 2 dimensional array of sizes (3, 2) for a total of 6 elements


array([[16000,    23],
       [33000,    47],
       [21000,    35]])

### Square Matrix

In [16]:
np.array([
    [4, 7],
    [3, 9]
])

array([[4, 7],
       [3, 9]])

### Column vectors

We can reshape a 1-dimensional array into a single column vector. The `-1` tells NumPy to infer the number of rows from the array's length, so an array of $n$ elements becomes an array of shape `(n, 1)`.

In [25]:
np.array([4, 7, 10]).reshape(-1,1)

array([[ 4],
       [ 7],
       [10]])

### Transposing a matrix

In [27]:
A = np.array([[2, 5], [7, 10], [3, 4]])
display(A)
display(A.transpose())

array([[ 2,  5],
       [ 7, 10],
       [ 3,  4]])

array([[ 2,  7,  3],
       [ 5, 10,  4]])

Reshaping a 1-dimensional array into a 2-dimensional column vector and then transposing it gives a 2-dimensional row vector (notice the double list brackets), not the original 1-dimensional array. The shapes confirm this.

In [44]:
C = np.array([4, 7, 10])
display(C)
display(C.reshape(-1, 1))
display(C.reshape(-1, 1).transpose())

C.shape, C.reshape(-1, 1).shape, C.reshape(-1, 1).transpose().shape


array([ 4,  7, 10])

array([[ 4],
       [ 7],
       [10]])

array([[ 4,  7, 10]])

((3,), (3, 1), (1, 3))
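
A related point that is easy to trip over: transposing a 1-dimensional array directly does nothing, because there is only one axis to reverse. The sketch below checks this and shows one way to get an explicit row or column vector instead. The names `row` and `col` are just illustrative.

```python
import numpy as np

C = np.array([4, 7, 10])

# Transposing a 1-D array is a no-op: the shape stays (3,)
print(C.T.shape)

# Make the row/column orientation explicit with a 2-D shape
row = C.reshape(1, -1)   # shape (1, 3)
col = row.T              # shape (3, 1)
print(row.shape, col.shape)
```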

### Matrix equality

Two matrices $A$ and $B$ are said to be equal if they have the same dimension and if all corresponding elements are equal. Conversely, if two matrices are equal, their corresponding elements are equal. For example, if:

$\underset{3x1}A= \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \quad \quad \underset{3x1}B= \begin{bmatrix} 4 \\ 7 \\ 3 \end{bmatrix}$

then $A=B$ implies:

$a_1=4 \quad a_2=7 \quad a_3=3$
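
In NumPy, one way to check this definition is `np.array_equal`, which returns `True` only when both the shapes and all corresponding elements match. The sketch below is a minimal illustration; the array `C` is an extra example (not from the text) showing that equal values with a different shape do not count as equal.

```python
import numpy as np

A = np.array([[4], [7], [3]])   # 3x1 column vector
B = np.array([[4], [7], [3]])   # 3x1 column vector with the same elements
C = np.array([4, 7, 3])         # same values, but shape (3,) instead of (3, 1)

same_AB = np.array_equal(A, B)  # same shape and same elements
same_AC = np.array_equal(A, C)  # shapes differ, so not equal
print(same_AB, same_AC)

# Note: the elementwise `==` broadcasts (3, 1) against (3,) into a (3, 3) result,
# so it is not a reliable test of matrix equality on its own.
print((A == C).shape)
```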

### Regression Example (p 180)

In regression analysis, one basic matrix is the vector $Y$, consisting of the $n$ observations on the response variable:

$\quad \underset{nx1}Y = \begin{bmatrix} Y_1 \\Y_2 \\ \vdots \\Y_n \end{bmatrix}$

Note that the transpose $Y^{'}$ is the row vector

$\quad Y^{'} = \begin{bmatrix} Y_1 & Y_2 & \dots & Y_n \end{bmatrix}$

Another basic matrix in regression analysis is the $X$ matrix, which is defined as follows for simple linear regression analysis:

$\quad \underset{nx2}X = \begin{bmatrix} 1 & X_1 \\1 & X_2 \\\vdots & \vdots \\1 & X_n \end{bmatrix}$

The matrix $X$ consists of a column of 1s and a column containing the $n$ observations on the predictor variable $X$. Note that the transpose of $X$ is

$\quad \underset{2xn}{X^{'}} = \begin{bmatrix} 1 & 1 & \dots & 1 \\ X_1 & X_2 & \dots & X_n \end{bmatrix}$

The $X$ matrix is often referred to as the *design matrix*.
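
As a rough sketch, the design matrix can be built in NumPy by stacking a column of 1s next to the predictor values. Here the first five `x1` values from the DataFrame above stand in for the single predictor; treating `x1` as "the" predictor is an assumption made only for illustration.

```python
import numpy as np

# First five x1 values from the DataFrame above (assumed to be the single predictor)
x = np.array([68.5, 45.2, 91.3, 47.8, 46.9])

# Column of 1s (intercept) next to the predictor column gives the n x 2 design matrix
X = np.column_stack([np.ones(len(x)), x])

print(X.shape)    # n x 2
print(X.T.shape)  # transpose is 2 x n
print(X)
```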

# 5.2 Matrix Addition and Subtraction (p 180)

In [47]:
A = np.array([
    [1, 4],
    [2, 5],
    [3, 6]
])

B = np.array([
    [1, 2],
    [2, 3],
    [3, 4]
])

display(A)
display(B)
display(A + B)
display(A - B)

array([[1, 4],
       [2, 5],
       [3, 6]])

array([[1, 2],
       [2, 3],
       [3, 4]])

array([[ 2,  6],
       [ 4,  8],
       [ 6, 10]])

array([[0, 2],
       [0, 2],
       [0, 2]])

### Regression Example (p 181)

The regression model:

$\quad Y_i = E\{Y_i\} + \epsilon_i \quad i=1, \dots, n$

can be written compactly in matrix notation. First, let us define the vector of the mean responses:

$\quad \underset{nx1}{E\{Y\}} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix}$

and the vector of the error terms:

$\quad \underset{nx1}\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$

Recalling the definition of the observations vector $Y$ above, we can write the regression model as follows:

$\quad \underset{nx1}Y = \underset{nx1}{E\{Y\}} + \underset{nx1}\epsilon$

because:

$\quad \begin{bmatrix} Y_1 \\Y_2 \\ \vdots \\Y_n \end{bmatrix} = \begin{bmatrix} E\{Y_1\} \\ E\{Y_2\} \\ \vdots \\ E\{Y_n\} \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} = \begin{bmatrix} E\{Y_1\} + \epsilon_1 \\ E\{Y_2\} + \epsilon_2 \\ \vdots \\ E\{Y_n\} + \epsilon_n \end{bmatrix}$

Thus, the observations vector $Y$ equals the sum of two vectors, a vector containing the expected values and another containing the error terms.
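
The decomposition above can be sketched numerically. Everything specific in this example is an assumption for illustration: the values `beta0` and `beta1`, the linear form of the mean response, the predictor values, and the normally distributed errors are all hypothetical and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the sketch is repeatable

n = 5
beta0, beta1 = 10.0, 2.0         # hypothetical parameters, chosen arbitrarily
x = np.arange(1.0, n + 1).reshape(-1, 1)   # hypothetical predictor values, shape (n, 1)

EY = beta0 + beta1 * x                     # n x 1 vector of mean responses (assumed linear)
eps = rng.normal(0.0, 1.0, size=(n, 1))    # n x 1 vector of error terms (assumed normal)

# Matrix addition is elementwise: Y_i = E{Y_i} + eps_i
Y = EY + eps
print(Y.shape)
```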

# 5.3 Matrix Multiplication (p 182)

# 5.4 Special Types of Matrices (p 185)

# 5.5 Linear Dependence and Rank of Matrix (p 188)

# 5.6 Inverse of a Matrix (p 189)

# 5.7 Some Basic Results for Matrices (p 193)

# 5.8 Random Vectors and Matrices (p 193)

# 5.9 Simple Linear Regression Model in Matrix Terms (p 197)

# 5.10 Least Squares Estimation of Regression Parameters (p 199)

# 5.11 Fitted Values and Residuals (p 202)

# 5.12 Analysis of Variance Results (p 204)

# 5.13 Inferences in Regression Analysis (p 206)