# Activity-Wk10 : Objectives
- Find the projection of a vector onto a subspace
- Solve an inconsistent system of equations using least squares approximation
- Determine a linear regression model for a given dataset

## 1. Projection of a vector onto a subspace

In [None]:
# Subspace with 2 basis vectors
S = span([vector([1,0,0,1]),vector([0,1,0,1])])
print("Subspace:")
print(S)
print("")

# Vector b
b = vector([1,4,7,2])

# Compute projection matrix
A = matrix([[1,0],[0,1],[0,0],[1,1]])
print("Basis matrix:")
show(A)
print("")

P = A * (A.transpose() * A).inverse() * A.transpose()
print("Projection matrix:")
show(P)
print("")

# Projection of b on subspace S
b_hat = P*b
print("Projection of b:")
show(b_hat)

In [None]:
# Subspace with 4 basis vectors - Vector space R^4
V = span([vector([1,0,0,1]),vector([0,1,0,1]),vector([1,1,0,0]),vector([0,0,1,1])])
print("Subspace:")
print(V)
print("")

# Vector b
b = vector([1,4,7,2])

# Compute projection matrix
A = matrix([[1,0,1,0],[0,1,1,0],[0,0,0,1],[1,1,0,1]])
print("Basis matrix:")
show(A)
print("")

P = A * (A.transpose() * A).inverse() * A.transpose()
print("Projection matrix:")
show(P)
print("")

# Projection of b on subspace V (V is all of R^4, so b should be unchanged)
b_hat = P*b
print("Projection of b:")
show(b_hat)

print("Is projection matrix P = I? ", P == identity_matrix(4))
print("Is b_hat = b?", b_hat == b)
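
As a cross-check of the first example (the subspace with 2 basis vectors), the sketch below recomputes the projection in plain Python with exact fractions and tests three properties every orthogonal projection should have: $P^2 = P$, $P^T = P$, and a residual $b - \hat b$ that is orthogonal to the basis vectors. The helpers `transpose`, `matmul` and `inverse_2x2` are illustrative names written for this sketch, not Sage functions.

```python
from fractions import Fraction

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

def inverse_2x2(M):
    """Invert a 2x2 matrix with the adjugate formula (assumes det != 0)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns of A are the two basis vectors of S used in the first cell.
A = [[Fraction(v) for v in row] for row in [[1, 0], [0, 1], [0, 0], [1, 1]]]
b = [[Fraction(v)] for v in [1, 4, 7, 2]]  # b as a column vector

At = transpose(A)
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)  # P = A (A^T A)^-1 A^T
b_hat = matmul(P, b)
residual = [[bi[0] - hi[0]] for bi, hi in zip(b, b_hat)]

print("b_hat    =", [str(r[0]) for r in b_hat])
print("P*P == P :", matmul(P, P) == P)
print("P^T == P :", transpose(P) == P)
print("A^T * e  =", [str(r[0]) for r in matmul(At, residual)])
```

The last line printing zeros is the defining property of the projection: the error is perpendicular to every vector in the subspace.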

## 2. Least squares approximation

In [None]:
A2 = matrix([[2,0],[-1,1],[0,2]])
b2 = vector([1,0,-1])

A2b2 = A2.augment(b2, subdivide=True)
print("Augmented matrix:")
show(A2b2)
print("")

print("RREF:")
show(A2b2.rref())
print("")

# The last row of the RREF reads [0 0 | 1], i.e. 0 = 1, so the system is inconsistent

# Least squares approximation: solve the normal equations A2^T A2 x_hat = A2^T b2
x_hat = (A2.transpose() * A2).inverse() * A2.transpose() * b2
print("Least squares solution x_hat:")
show(x_hat)

# Error
e = b2 - A2*x_hat
print("Error:")
show(e)
print("Norm of error:", e.norm())
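
The same computation can be followed by hand, which is a useful check on the cell's output. Writing $A$ for `A2` and $b$ for `b2`, a worked version under the same normal-equations approach is:

$$A^TA = \begin{bmatrix} 5 & -1 \\ -1 & 5 \end{bmatrix}, \qquad A^Tb = \begin{bmatrix} 2 \\ -2 \end{bmatrix}, \qquad \hat x = \frac{1}{24}\begin{bmatrix} 5 & 1 \\ 1 & 5 \end{bmatrix}\begin{bmatrix} 2 \\ -2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ -1/3 \end{bmatrix}$$

$$e = b - A\hat x = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} 2/3 \\ -2/3 \\ -2/3 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \\ -1/3 \end{bmatrix}, \qquad \|e\| = \sqrt{\tfrac{2}{3}} \approx 0.816$$

The error satisfies $A^Te = 0$, so it is orthogonal to both columns of $A$, as expected for a least squares solution.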

## 3. Linear regression
- The goal is to find the line $ y = \beta_0 + \beta_1 x$ that best describes the relationship between age (x) and height (y)
- __Recap__: Linear Regression using _projection_
    - We wrote this as $\boldsymbol{X\beta = y}$ (similar to $Ax=b$). 
        - $\boldsymbol{X} = $ The [Design Matrix](https://en.wikipedia.org/wiki/Design_matrix)
        - $\boldsymbol{y} = $ The dependent variable
        - $\boldsymbol{\beta} = $ The parameters to estimate
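        - In this case, no choice of $\beta_0$ and $\beta_1$ makes the line pass through all the points, so the system has no exact solution
        - Hence, we look for the best possible solution in the least squares sense: the $\boldsymbol{\hat\beta}$ that minimizes $\|\boldsymbol{y - X\beta}\|$
        - The best estimate of the parameters satisfies the normal equations $\boldsymbol{X^TX\hat\beta = X^Ty}$
        - When $\boldsymbol{X^TX}$ is invertible, $\boldsymbol{\hat\beta = (X^TX)^{-1}X^Ty}$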
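
For a single predictor, the matrices in the recap can be written out explicitly. Assuming the design matrix has rows $(1, x_i)$ for $i = 1, \dots, n$, a short worked form is:

$$\boldsymbol{X^TX} = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, \qquad \boldsymbol{X^Ty} = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}$$

so $\boldsymbol{X^TX\hat\beta = X^Ty}$ is the pair of equations

$$n\,\hat\beta_0 + \Big(\sum x_i\Big)\hat\beta_1 = \sum y_i, \qquad \Big(\sum x_i\Big)\hat\beta_0 + \Big(\sum x_i^2\Big)\hat\beta_1 = \sum x_i y_i$$

This is a $2 \times 2$ system no matter how many data points there are, which is why it can be solved even though $\boldsymbol{X\beta = y}$ itself cannot.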

In [None]:
# Consider the data which relates age to height of children ages 18 to 29.
# The goal is to find the line y = beta_0 + beta_1*x that best describes the relationship between age (x) and height (y)
# The dataset gives the average height of the children at each age (18 to 29)
# Data in tuples (age, height):
# (18,76.1), (19,77), (20,78.1), (21,78.2), (22,78.8), (23,79.7), (24,79.9), (25,81.1), (26,81.2), (27,81.8), (28,82.8), (29,83.5)

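# Read the data from the CSV file
import csv
with open('age_height.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader, None)  # skip the header row
    # Build explicit lists: in Python 3, map() returns an iterator, not a list
    rows = [[float(value) for value in row] for row in reader]
data = matrix(RDF, rows)
print("Data:\n", data)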

In [None]:
# Split the data into independent (x = data.column(0)) and dependent (y = data.column(1)) parts
# Build the design matrix [1, x]: a column of ones (intercept) followed by the ages
dim = data.dimensions()
X = ones_matrix(RDF, dim[0], 2)  # always 2 columns, regardless of the shape of data
X[:,1] = data.column(0)
Y = matrix(data.column(1)).transpose()
print("Design matrix X:\n", X)
print("Dependent Variable Y:\n",Y)

We can solve for the best estimate of the parameters using $$\boldsymbol{X^TX\hat\beta = X^Ty}$$

$$\boldsymbol{\hat\beta = (X^TX)^{-1}X^Ty}$$

In [None]:
# We are ready to compute the parameter estimate using the matrix equation above
Beta = ( X.transpose() * X ).inverse() * X.transpose() * Y
print("Dimensions of Beta: ", Beta.dimensions())
# Make Beta a vector
Beta = vector(Beta)
print("Parameter Estimates:\n", Beta)
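
One way to sanity-check the estimates is to compute them a second way. The plain-Python sketch below uses the closed-form slope and intercept for a single predictor, $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, which is equivalent to solving the normal equations for a two-column design matrix. The data are typed in from the comment in the CSV-loading cell and are assumed to match the contents of `age_height.csv`.

```python
# Data copied from the comment in the CSV-loading cell
# (assumption: identical to the rows of age_height.csv).
data = [(18, 76.1), (19, 77), (20, 78.1), (21, 78.2), (22, 78.8), (23, 79.7),
        (24, 79.9), (25, 81.1), (26, 81.2), (27, 81.8), (28, 82.8), (29, 83.5)]

n = len(data)
x_mean = sum(x for x, _ in data) / n
y_mean = sum(y for _, y in data) / n

# Centered sums of products and squares
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in data)
s_xx = sum((x - x_mean) ** 2 for x, _ in data)

beta1 = s_xy / s_xx              # slope
beta0 = y_mean - beta1 * x_mean  # intercept

print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")
```

For the listed data this gives an intercept of roughly 64.93 and a slope of roughly 0.635, so `Beta` from the matrix computation should agree with these values up to rounding.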

## Visualize our linear regression

In [None]:
# Plot the line
var('Age, Height')
Height = Beta[0] + Beta[1] * Age
a = plot(Height, (Age, 17, 30))  # plot over the age range covered by the data

# add a plot of the data, in red
a += list_plot(data,color='red')
show(a)