
# Section 1.1: Introduction

Linear algebra shows up in a variety of applications, including data science, physics, machine learning, and economics. Understanding the basics of linear algebra is therefore an important skill to have.

# Section 1.2: Elements of Linear Algebra
## Linear Spaces

A linear space is a set of vectors that is closed under addition and scalar multiplication. That is, the sum of any two elements of the space is also in the space, and any element multiplied by a scalar is also in the space. Formally, for $\mathbf{u}_1,\mathbf{u}_2\in U$ and $\alpha\in\mathbb{R}$,
$$\mathbf{u}_1+\mathbf{u}_2\in U$$
and
$$\alpha\mathbf{u}_1\in U$$

This also leads to the concept of linear subspaces: a subset of a linear space that is itself a linear space. Higher-dimensional spaces contain lower-dimensional subspaces. For example, in the three-dimensional space defined by the x, y, and z axes, the xy-plane forms a two-dimensional subspace. Likewise, the set of all scalar multiples of any single nonzero vector (a line through the origin) forms a one-dimensional subspace.

### Span
The span of a set of vectors is the space made up of all linear combinations of these vectors. For example, $\hat{x}$, $\hat{y}$, and $\hat{z}$ span $\mathbb{R}^3$, as every vector in this space can be written as a linear combination of them. For a collection of vectors $a_1,a_2,...,a_m$, we denote the space spanned by them as $\text{span}(a_1,a_2,...,a_m)$. This can be written
$$\text{span}(a_1,a_2,...,a_m)=\left\{\sum_{i=1}^m\beta_ia_i \,:\, \beta_i\in\mathbb{R} \text{ for } i=1,2,...,m\right\}.$$

The following function takes a list of coefficients and a list of vectors and returns the corresponding linear combination. The span of the vectors is the set of results obtained over all possible choices of the coefficients.

In [16]:
import numpy as np

def span(beta,a):
  # beta is a list of coefficients, a is a list (or array) of vectors
  # Returns the linear combination sum_i beta[i]*a[i]
  result = np.zeros_like(a[0])
  for betai, ai in zip(beta,a):
    # Not in-place, so float coefficients also work with integer vectors
    result = result + betai*ai
  return result

beta = [1,4,5]
a = np.array([[1,0,3],[1,2,1],[1,5,1]])
print(span(beta,a))

[10 33 12]


## Column Space
When dealing with matrices, we often care about the column space of the matrix, which is the space spanned by its columns. For a matrix $A$, it is denoted $\text{col}(A)$. For an $n\times m$ matrix, the column space is a subspace of $\mathbb{R}^n$, and its dimension is at most $\min(n,m)$. If $a_1,a_2,...,a_m$ are the columns of $A$, then
  $$\text{col}(A)=\text{span}(a_1,a_2,...,a_m)$$
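
As a rough illustration of the definition, the sketch below tests whether a vector lies in the column space of a matrix. The helper name `in_column_space` and the example matrix are assumptions made for this example. It relies on the fact that $\mathbf{b}\in\text{col}(A)$ exactly when appending $\mathbf{b}$ as an extra column does not increase the rank.

```python
import numpy as np

def in_column_space(A, b, tol=1e-10):
  # Hypothetical helper: b is in col(A) exactly when adding b as a
  # column leaves the rank of the matrix unchanged
  A = np.asarray(A, dtype=float)
  b = np.asarray(b, dtype=float).reshape(-1, 1)
  rank_A = np.linalg.matrix_rank(A, tol=tol)
  rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]), tol=tol)
  return bool(rank_A == rank_Ab)

# The columns of this 3x2 matrix span the xy-plane inside R^3
A_example = np.array([[1, 0], [0, 1], [0, 0]])
print(in_column_space(A_example, [2, -3, 0]))  # vector in the xy-plane
print(in_column_space(A_example, [0, 0, 1]))   # vector along the z axis
```

Here the column space is two-dimensional even though it sits inside $\mathbb{R}^3$, so the first vector passes the test and the second does not.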

## Linear Independence and Dimension
A collection of vectors $a_1,a_2,...,a_m$ is linearly independent if none of them can be written as a linear combination of the other vectors in the collection. That is, for every $k=1,2,...,m$,
$$a_k\notin\text{span}(a_1,a_2,...,a_{k-1},a_{k+1},...,a_m).$$
This can equivalently be written as
$$\sum_{i=1}^m \beta_ia_i=0\implies \beta_i=0 \text{ for all } i=1,2,...,m$$
To see why, suppose this second condition fails, so that there is a nonempty collection of indices $J=\{j_1,j_2,...,j_k\}$ such that
$$\beta_{j_1}\mathbf{a}_{j_1}+\beta_{j_2}\mathbf{a}_{j_2}+...+\beta_{j_k}\mathbf{a}_{j_k} = \mathbf{0}$$
where $\beta_{j}\neq0$ for $j\in J$. Dividing by $\beta_{j_1}$ and rearranging, we can write
$$\mathbf{a}_{j_1}=-\frac{\beta_{j_2}}{\beta_{j_1}}\mathbf{a}_{j_2}-...-\frac{\beta_{j_k}}{\beta_{j_1}}\mathbf{a}_{j_k}.$$
As $\mathbf{a}_{j_1}$ can be written as a combination of other vectors in the set, the collection is not linearly independent.

We have thus far mentioned the concept of dimension without rigorously defining it. The dimension of a subspace is the number of vectors in a basis for it, that is, in a linearly independent set that spans it. One thing to note is that for an $n$-dimensional space, we need exactly $n$ linearly independent vectors to span it. With too few, the vectors cannot span the space; with too many, they cannot all be linearly independent.
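
One possible way to check linear independence numerically is to compare the rank of the matrix formed by the vectors with the number of vectors. The helper name `is_linearly_independent` and the example vectors below are assumptions for illustration, and rank computations in floating point are only reliable up to a tolerance.

```python
import numpy as np

def is_linearly_independent(vectors):
  # Hypothetical helper: stack the vectors as rows; they are linearly
  # independent when the rank equals the number of vectors
  M = np.asarray(vectors, dtype=float)
  return bool(np.linalg.matrix_rank(M) == M.shape[0])

independent = [[1, 0, 3], [1, 2, 1], [1, 5, 1]]
dependent = [[1, 0, 3], [1, 2, 1], [2, 2, 4]]  # third = first + second
too_many = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]  # four vectors in R^3

print(is_linearly_independent(independent))
print(is_linearly_independent(dependent))
print(is_linearly_independent(too_many))

# The rank is also the dimension of the space spanned by the vectors
print(np.linalg.matrix_rank(np.array(dependent)))
```

The last example reflects the remark about dimension: four vectors in three-dimensional space can never all be linearly independent.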

## Orthogonality

Two vectors $\mathbf{a}$ and $\mathbf{b}$ are orthogonal if
$$\mathbf{a}\cdot\mathbf{b}=0.$$
A vector $\mathbf{a}$ is normal (has unit length) if
$$\mathbf{a}\cdot\mathbf{a}=1.$$

The following code shows two simple functions that can test for orthogonality and normality.

In [28]:
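def is_orthogonal(a,b,tol):
  # a and b are vectors and tol is a tolerance
  # Use the absolute value so that negative dot products are not accepted
  return bool(abs(np.sum(a*b)) < tol)

def is_normal(a,tol):
  # a is a vector and tol is a tolerance
  # Use the absolute deviation so that vectors shorter than unit length are not accepted
  return bool(abs(np.sum(a*a) - 1) < tol)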

a = np.array([1,0,2])
b = np.array([0,1,0])
print(a,b, "are orthogonal? :",is_orthogonal(a,b,1e-10))
a = np.array([1,0,1])
b = np.array([0,1,1])
print(a,b, "are orthogonal? :",is_orthogonal(a,b,1e-10))
a = np.array([0,1,1])
b = np.array([0,1,1])/np.sqrt(2)
print(a, "is Normal? :",is_normal(a,1e-10))
print(b, "is Normal? :",is_normal(b,1e-10))

[1 0 2] [0 1 0] are orthogonal? : True
[1 0 1] [0 1 1] are orthogonal? : False
[0 1 1] is Normal? : False
[0.         0.70710678 0.70710678] is Normal? : True


### Orthonormal Bases

In many applications, it is useful to have an orthonormal basis to work with. Orthonormality means that the vectors are mutually orthogonal and that each vector is normal. This is the case with $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$ when dealing with three-dimensional space. An important fact is that an orthonormal set of vectors is always linearly independent.

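Now, suppose that we have a vector $\mathbf{u}$ that we want to write as a linear combination of orthonormal basis vectors $\mathbf{q}_1,...,\mathbf{q}_m$. Taking the dot product of $\mathbf{u}$ with each $\mathbf{q}_j$ picks out the corresponding coefficient, so we get that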
  $$\mathbf{u}=\sum_{j=1}^m(\mathbf{u}\cdot\mathbf{q}_j)\mathbf{q}_j$$
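
The expansion above can be checked numerically with a small sketch. The particular basis `Q_basis` and vector `u` below are assumptions chosen for illustration; any orthonormal basis would do.

```python
import numpy as np

# Rows are an (assumed) orthonormal basis q_1, q_2, q_3 of R^3
Q_basis = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 0.0, -1.0]])
Q_basis[0] /= np.sqrt(2)
Q_basis[2] /= np.sqrt(2)

u = np.array([3.0, -1.0, 2.0])

# Coefficient j is the dot product of u with q_j
coeffs = Q_basis @ u

# Rebuild u as the sum over j of coeffs[j] * q_j
u_rebuilt = sum(c * q for c, q in zip(coeffs, Q_basis))

print(coeffs)
print(np.allclose(u, u_rebuilt))
```

No linear system has to be solved to find the coefficients, which is the main convenience of working in an orthonormal basis.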



In [30]:
def is_orthonormal(a,b,tol):
  if is_normal(a,tol) and is_normal(b,tol) and is_orthogonal(a,b,tol):
    return True
  return False

a = np.array([1,0,2])
b = np.array([0,1,0])
print(a,b, "are orthonormal? :",is_orthonormal(a,b,1e-10))
a = np.array([1,0,1])/np.sqrt(2)
b = np.array([0,1,0])
print(a,b, "are orthonormal? :",is_orthonormal(a,b,1e-10))


[1 0 2] [0 1 0] are orthonormal? : False
[0.70710678 0.         0.70710678] [0 1 0] are orthonormal? : True


### Gram-Schmidt Process

In the Gram-Schmidt process, we find a set of orthonormal basis vectors $\mathbf{q}_1,...,\mathbf{q}_m$ for the space $\text{span}(a_1,...,a_m)$, where $a_1,...,a_m$ are linearly independent. For $i=1,2,...,m$, we first remove from $\mathbf{a}_i$ its components along the previously computed vectors,
  $$\mathbf{b}_i=\mathbf{a}_i-\sum_{j=1}^{i-1}(\mathbf{a}_i\cdot\mathbf{q}_j)\mathbf{q}_j$$
and then normalize the result,
  $$\mathbf{q}_i=\frac{\mathbf{b}_i}{||\mathbf{b}_i||}$$
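
A minimal sketch of these two steps is given below. The function name `gram_schmidt` and the tolerance `tol` are assumptions for this example. It follows the formulas literally (the classical form of the process), which is fine for small examples but can lose accuracy for nearly dependent vectors.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
  # Hypothetical helper: rows of `vectors` are a_1, ..., a_m
  # Returns an array whose rows are the orthonormal q_1, ..., q_m
  qs = []
  for a in np.asarray(vectors, dtype=float):
    # b_i = a_i minus its components along the previous q_j
    b = a - sum((a @ q) * q for q in qs)
    norm = np.linalg.norm(b)
    if norm < tol:
      raise ValueError("input vectors are not linearly independent")
    # q_i = b_i / ||b_i||
    qs.append(b / norm)
  return np.array(qs)

a_vectors = np.array([[1, 0, 3], [1, 2, 1], [1, 5, 1]])
Q_gs = gram_schmidt(a_vectors)
print(Q_gs)
# Rows are orthonormal, so Q_gs @ Q_gs.T should be the identity
print(np.allclose(Q_gs @ Q_gs.T, np.eye(3)))
```

The check on $\|\mathbf{b}_i\|$ is where the linear independence assumption shows up: if some $\mathbf{a}_i$ lies in the span of the earlier vectors, $\mathbf{b}_i$ is zero and cannot be normalized.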



## Eigenvalues and Eigenvectors
An eigenvalue $\lambda$ and an eigenvector $\mathbf{x}\neq\mathbf{0}$ are such that
  $$A\mathbf{x}=\lambda\mathbf{x},$$
where $A$ is a square matrix. An $n\times n$ matrix has at most $n$ distinct eigenvalues. As well, eigenvectors corresponding to distinct eigenvalues are linearly independent.


In [32]:
import numpy.linalg as la

A = np.array([[1,2],[4,5]])
# la.eig returns the eigenvalues and a matrix whose columns are the eigenvectors
L,v = la.eig(A)
print('Eigenvalues',L)
print('Eigenvectors',v)

Eigenvalues [-0.46410162  6.46410162]
Eigenvectors [[-0.80689822 -0.34372377]
 [ 0.59069049 -0.9390708 ]]


### Diagonalization
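$\text{Diag}(\lambda_1,...,\lambda_n)$ refers to a diagonal matrix with diagonal elements $\lambda_1,...,\lambda_n$. Say we have an $n\times n$ matrix $A$ with $n$ linearly independent eigenvectors. Then there is an invertible matrix $P$ such that
$$A=PDP^{-1},$$
where $D=\text{Diag}(\lambda_1,...,\lambda_n)$ contains the eigenvalues of $A$. Letting $\mathbf{p}_1,...,\mathbf{p}_n$ be the columns of $P$, we get that
  $$A\mathbf{p}_i=\lambda_i\mathbf{p}_i.$$
If $A$ is symmetric, $P$ can be chosen to be orthogonal, so that $P^{-1}=P^T$ and
$$A=PDP^T=PDP^{-1}.$$
In this case, writing $\mathbf{v}_1,...,\mathbf{v}_n$ for the orthonormal columns of $P$, we also get that
$$A=\lambda_1\mathbf{v}_1\mathbf{v}_1^T+\lambda_2\mathbf{v}_2\mathbf{v}_2^T+...+\lambda_n\mathbf{v}_n\mathbf{v}_n^T$$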
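
These identities can be verified numerically, as in the sketch below. The matrix `S_sym` and the variable names are assumptions for illustration. `np.linalg.eig` is used for the general case and `np.linalg.eigh`, which is intended for symmetric matrices and returns orthonormal eigenvectors, for the symmetric case.

```python
import numpy as np

# General case: A = P D P^{-1}
A_gen = np.array([[1.0, 2.0], [4.0, 5.0]])
eigvals, P = np.linalg.eig(A_gen)   # columns of P are eigenvectors
D = np.diag(eigvals)
A_rebuilt = P @ D @ np.linalg.inv(P)
print(np.allclose(A_gen, A_rebuilt))

# Symmetric case (assumed example matrix): P is orthogonal
S_sym = np.array([[2.0, 1.0], [1.0, 2.0]])
w, V = np.linalg.eigh(S_sym)        # eigenvalues in ascending order
print(np.allclose(V.T @ V, np.eye(2)))             # P^T P = I
print(np.allclose(S_sym, V @ np.diag(w) @ V.T))    # A = P D P^T

# Sum of rank-one terms lambda_i v_i v_i^T
S_rebuilt = sum(w[i] * np.outer(V[:, i], V[:, i]) for i in range(len(w)))
print(np.allclose(S_sym, S_rebuilt))
```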

# Linear Regression
## QR Decomposition
One way in which we can decompose a matrix is the QR decomposition, in which
$$A=QR.$$
Here, $Q$ is a matrix with orthonormal columns, such as those obtained by applying the Gram-Schmidt process to the columns of $A$, and $R$ is an upper triangular matrix containing the coefficients of the linear combinations of the columns of $Q$ that create the columns of $A$. This is useful in the least squares problem. The following code shows how to do this in Python using NumPy's linear algebra package.



In [33]:
A = np.array([[3,1,5,3],[5,1,4,6],[9,6,1,3],[1,0,4,3]])
q,r = la.qr(A)
print('A:',A)
print('Q:',q)
print('R:',r)

A: [[3 1 5 3]
 [5 1 4 6]
 [9 6 1 3]
 [1 0 4 3]]
Q: [[-0.27854301 -0.27367145  0.63352558 -0.66795439]
 [-0.46423835 -0.75846088 -0.45093742  0.0766505 ]
 [-0.83562902  0.53952372 -0.03043136  0.09855065]
 [-0.09284767 -0.24239471  0.62799261  0.73365482]]
R: [[-10.77032961  -5.75655548  -4.45668812  -6.40648917]
 [  0.           2.20500997  -4.83225589  -4.4803926 ]
 [  0.           0.           3.84541728   0.98763595]
 [  0.           0.           0.           0.95265626]]


## Least Squares Problem

Let $A$ be an $n\times m$ matrix and $\mathbf{b}$ be a vector with $n$ elements. We wish to solve
$$A\mathbf{x}=\mathbf{b}.$$
If $A$ is square and invertible, we can just use the inverse of $A$. If $A$ is not square, it has no inverse, and when $n>m$ there is generally no exact solution. Instead, we look for an $\mathbf{x}$ that minimizes
$$||A\mathbf{x}-\mathbf{b}||^2.$$
The solution satisfies the normal equations
$$A^TA\mathbf{x}=A^T\mathbf{b}$$
If the columns of $A$ are linearly independent, substituting the QR decomposition $A=QR$ and using $Q^TQ=I$ reduces this to
$$R\mathbf{x}^*=Q^T\mathbf{b}.$$
As $R$ is upper triangular, this is easily and quickly solvable by back substitution.
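
The following sketch solves a small overdetermined system this way. The helper `back_substitute` and the example data are assumptions for illustration, and the result is compared against NumPy's own least squares routine `np.linalg.lstsq` as a sanity check.

```python
import numpy as np

def back_substitute(R, c):
  # Hypothetical helper: solve R x = c for upper triangular R,
  # working from the last row upwards
  n = len(c)
  x = np.zeros(n)
  for i in range(n - 1, -1, -1):
    x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
  return x

# Overdetermined system: 4 equations, 2 unknowns (assumed example data)
A_ls = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b_ls = np.array([6.0, 5.0, 7.0, 10.0])

Q_ls, R_ls = np.linalg.qr(A_ls)             # reduced QR: Q is 4x2, R is 2x2
x_qr = back_substitute(R_ls, Q_ls.T @ b_ls)  # solve R x = Q^T b

# Reference solution from NumPy's least squares solver
x_ref, *_ = np.linalg.lstsq(A_ls, b_ls, rcond=None)

print(x_qr)
print(np.allclose(x_qr, x_ref))
# The solution also satisfies the normal equations
print(np.allclose(A_ls.T @ A_ls @ x_qr, A_ls.T @ b_ls))
```

Going through $R\mathbf{x}^*=Q^T\mathbf{b}$ avoids forming $A^TA$ explicitly, which is generally better behaved numerically than solving the normal equations directly.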

## Linear Regression

If we start with points $\{(\mathbf{x}_i,y_i)\}_{i=1}^n$, where $\mathbf{x}_i=(x_{i1},x_{i2},...,x_{id})$, and we want to fit a linear function to them, we can use linear regression. We desire to find coefficients $\beta_j$, $j=0,1,...,d$, such that
$$\sum_{i=1}^n(y_i-\hat{y}_i)^2$$
is minimized, where
  $$\hat{y}_i=\beta_0+\sum_{j=1}^d\beta_jx_{ij}$$
This can be formulated in terms of matrices by defining
  $$\mathbf{y}=(y_1,y_2,...,y_n)$$
  $$\mathbf{\beta}=(\beta_0,\beta_1,...,\beta_d)$$
  $$A=\begin{pmatrix}1&\mathbf{x}_1^T\\1&\mathbf{x}_2^T\\\vdots&\vdots\\1&\mathbf{x}_n^T\end{pmatrix}$$

At which point we are trying to minimize
$$||\mathbf{y}-A\mathbf{\beta}||^2,$$
which is the same as the least squares problem.
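
As a sketch of this formulation, the code below builds the matrix $A$ from synthetic data and recovers the coefficients. The data, the chosen coefficients `beta_true`, and the variable names are all assumptions for illustration. The data is generated without noise so that the fitted coefficients can be compared directly with the ones used to create it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n = 50 points with d = 2 features each
X = rng.normal(size=(50, 2))
beta_true = np.array([1.0, 2.0, -3.0])    # (beta_0, beta_1, beta_2)
y = beta_true[0] + X @ beta_true[1:]      # noise-free targets

# Each row of A is (1, x_i1, ..., x_id)
A_reg = np.column_stack([np.ones(len(X)), X])

# Minimize ||y - A beta||^2
beta_hat, *_ = np.linalg.lstsq(A_reg, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_true))
```

The column of ones is what lets the intercept $\beta_0$ be estimated together with the other coefficients.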
