<a target="_blank" href="https://colab.research.google.com/github/BenjaminHerrera/MAT422/blob/main/HW_1.2.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# HW 1.2
# Benjamin Herrera
# 1 SEP 2024

# ⚠️ Run these commands prior to running anything

In [3]:
!pip install scipy
!pip install matplotlib
!pip install numpy



<br/>

## ⏹️ Linear Spaces

A linear combination multiplies each vector in a set by a constant and adds the results. The collection of every vector that can be produced this way forms a linear subspace.

To define a linear subspace $J \subseteq V$, there are two things we have to keep in mind: (1) adding two vectors of $J$ gives a vector that is still in $J$, and (2) multiplying a vector of $J$ by a scalar gives a vector that is still in $J$.

In other words, if we represent $j_i \isin J$, then $j_1 + j_2 \isin J$ and $\alpha j_1 \isin J$ where $\alpha \isin \Reals$.

Now we get to the notion of a span: everything a set of vectors can reach through linear combinations. For example, given some vectors $s_1$ to $s_i$ that are in $V$, the span of these vectors can be defined as:

$$span(s_1, \dots, s_i) = \{ \sum_{k=1}^{i} \alpha_k s_k \}$$

where $\alpha_k \isin \Reals$ for every $k$.

This span is itself a linear subspace $J$ (it satisfies the two conditions above). Another interesting property is that the span of $J$ is $J$ itself. Let's look at a Python implementation of this idea.

In [10]:
# Let's build an example list of four vectors
import numpy as np

v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])
v3 = np.array([9, 10, 11, 12])
v4 = np.array([13, 14, 15, 16])

# Now let's define some scalars
a1 = -69
a2 = 69
a3 = 420
a4 = -420

# Now let's calculate the linear combination of all of them.
# This will return a single vector, matching the sum in the equation explained above
lc = a1 * v1 + a2 * v2 + a3 * v3 + a4 * v4

# Print the LC
print(lc)

[-1404 -1404 -1404 -1404]


In the above coding example, the result is a linear combination of the four vectors with real scalars, so it is in the span of the four vectors. Had there been no real scalars that produce a given vector, then that vector would not be in the span of the list of vectors.
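To make the membership question concrete, here is a minimal sketch that tests whether a vector lies in a span. The helper name `in_span` and the tolerance are illustrative assumptions, not part of the homework; the idea is to solve a least-squares problem for the scalars and see whether the leftover error is essentially zero.

```python
import numpy as np


def in_span(vectors, target, tol=1e-10):
    """Return True if `target` is (numerically) a linear combination of `vectors`.

    Hypothetical helper: `tol` is an assumed relative tolerance.
    """
    A = np.column_stack(vectors).astype(float)  # columns are the spanning vectors
    target = np.asarray(target, dtype=float)
    # Least squares finds the scalars that bring A @ coeffs as close to target as possible
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = np.linalg.norm(A @ coeffs - target)
    return bool(residual <= tol * max(1.0, np.linalg.norm(target)))


v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([5.0, 6.0, 7.0, 8.0])

# v2 - v1 = [4, 4, 4, 4], so any constant vector is reachable from v1 and v2
inside = in_span([v1, v2], [-1404.0, -1404.0, -1404.0, -1404.0])
# No choice of scalars produces this vector from v1 and v2
outside = in_span([v1, v2], [1.0, 0.0, 0.0, 0.0])
print(inside, outside)
```

A least-squares check is used here because it works even when the vectors are redundant, where a direct solve would fail.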

We can also build an understanding of a column space with this knowledge of a span. Let's say, for example, we have an $n \times m$ matrix. If we grab the columns of this matrix, we get $m$ vectors, each in $\Reals^n$. These columns are a set of vectors and can therefore create a span, called the column space. This span is a subspace of $\Reals^n$.
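As a small illustration of the column-space idea, the sketch below uses a hypothetical $2 \times 3$ matrix `A` (chosen only for this example) and shows that multiplying the matrix by a vector of scalars is the same as taking a linear combination of its columns.

```python
import numpy as np

# Hypothetical matrix with n = 2 rows and m = 3 columns
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])
n, m = A.shape

# Grab the m columns; each one is a vector with n entries
columns = [A[:, k] for k in range(m)]

# Arbitrary scalars, one per column
alpha = np.array([2.0, -1.0, 0.5])

# The linear combination of the columns ...
combo = sum(alpha[k] * columns[k] for k in range(m))
# ... equals the matrix-vector product, so A @ alpha is always in the column space
matches = bool(np.allclose(A @ alpha, combo))
print(len(columns), columns[0].shape, matches)
```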

Now, here's an issue. There is the possibility that we have redundant vectors in our list. Sure, we call it a "set" of vectors, but matrices are in essence a list of vectors. So how can we figure out whether every vector contributes something new, you may ask? Well, we can just figure out if the list is linearly independent! How do we do that? Simple!

$$\forall i, j_i \notin span(\{j_k : k \neq i\})$$

where $j_1, \dots, j_n \isin J$ are the vectors in the list.

In other words, no vector in the list can be written as a linear combination of the others, so every vector adds a direction the others cannot reach!

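You can also check this via an equivalent definition. The vectors $u_1, \dots, u_n$ are linearly dependent if

$$\sum_{k=1}^{n} \alpha_k u_k = 0$$

for some scalars $\alpha_k$ that are not all zero. If the only solution is $\alpha_1 = \dots = \alpha_n = 0$, the vectors are linearly independent.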
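One possible way to look for such scalars numerically is the singular value decomposition. The sketch below is an assumption-laden illustration (the helper name `dependence_coefficients` and the tolerance are made up for this example): it returns a set of scalars, not all zero, whose linear combination is numerically zero, or `None` when no such scalars exist.

```python
import numpy as np


def dependence_coefficients(vectors, tol=1e-10):
    """Return scalars (not all zero) with sum(alpha_k * u_k) ~ 0, or None.

    Hypothetical helper: `tol` is an assumed relative tolerance.
    """
    U = np.column_stack(vectors).astype(float)  # columns are u_1 ... u_n
    _, s, vt = np.linalg.svd(U)
    # With no more columns than rows, a clearly nonzero smallest singular
    # value means only the all-zero scalars solve the equation
    if U.shape[1] <= U.shape[0] and s[-1] > tol * s[0]:
        return None
    # The last right singular vector has unit length and U @ vt[-1] ~ 0
    return vt[-1]


u1 = np.array([1.0, 2.0, 3.0, 4.0])
u2 = np.array([5.0, 6.0, 7.0, 8.0])
u3 = np.array([9.0, 10.0, 11.0, 12.0])

alphas = dependence_coefficients([u1, u2, u3])
print(alphas)
print(alphas[0] * u1 + alphas[1] * u2 + alphas[2] * u3)  # numerically the zero vector
```

Here `u1 - 2 * u2 + u3` is the zero vector, so the returned scalars are proportional to `[1, -2, 1]` up to sign.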

Perfect! We can now figure out if all of the vectors in a matrix are linearly independent (none of them lies in the span of the others). But is there a minimal set of vectors that can still reach all of a subspace $J$? Yes! It's called a basis, and it's really simple to define:

1. Basis of J spans J
2. Basis of J is linearly independent.

That's it! Also note, people often refer to the standard basis vectors as $e_i$. Here's a cool thing about bases: (1) any subspace can have multiple bases and (2) all bases of the same subspace must have the same number of elements.

Now we get to dimension! Remember the two cool things about bases? Since every basis of $J$ has the same number of elements, we call that number the dimension of the subspace, written $dim(J)$. When we refer to the dimension of the column space of a matrix, we call that the rank of the matrix.

Let's see all of this in action.

In [24]:
# We'll define a function that uses the rank of a matrix and the number of
#   columns to see if the columns of the matrix are linearly independent.
def is_linearly_independent(matrix):
    rank = np.linalg.matrix_rank(matrix)
    # shape[1] is the number of columns (shape[0] would be the number of rows)
    return rank == matrix.shape[1]


# Here is an example of a linearly DEPENDENT matrix
m1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
print(is_linearly_independent(m1))

# Here is an example of a linearly INDEPENDENT matrix
m2 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
print(is_linearly_independent(m2))

False
True


In the above example, we check whether the rank of the matrix matches the number of columns in the matrix. If so, the columns are linearly INDEPENDENT. If not, they are linearly DEPENDENT. We can see this in the non-identity matrix and the identity matrix. The identity matrix has linearly independent columns because no column can be written as a linear combination of the other columns, however they are scaled.
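To connect rank, basis, and dimension, here is a minimal sketch that extracts an orthonormal basis for the column space of a matrix from its singular value decomposition. The helper name `column_space_basis` and the tolerance are assumptions made for this illustration; the number of basis vectors it returns is the dimension of the column space.

```python
import numpy as np


def column_space_basis(matrix, tol=1e-10):
    """Return a matrix whose columns form an orthonormal basis of the column space.

    Hypothetical helper: `tol` is an assumed relative tolerance.
    """
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float))
    # Count the singular values that are clearly nonzero
    rank = int(np.sum(s > tol * s[0]))
    # The first `rank` left singular vectors span the column space
    return u[:, :rank]


m1 = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
basis = column_space_basis(m1)

# The number of basis vectors is the dimension of the column space (the rank)
print(basis.shape[1], np.linalg.matrix_rank(m1))
```

The four columns of `m1` only need two basis vectors, which is another way of seeing that they are linearly dependent.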

<br/>

## ➕ Orthogonality 

One can understand orthogonality as vectors being perpendicular to each other, like a line in the normal direction to a plane in Euclidean space. But how do we define orthogonality? Simple!

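But before we work at defining it, let's define the inner product of two vectors $a, b \isin \Reals^n$ as:

$$< a , b > = a \cdot b = \sum_{j=1}^{n} a_j b_j$$

This is essentially the dot product, which gives a scalar value that represents the similarity of two vectors. The norm of a vector then follows from the inner product:

$$||a|| = \sqrt{< a , a >}$$

Two vectors are orthogonal when their inner product is 0.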

To figure out if a list of vectors is orthonormal, we check that: (1) each vector $a_i$ has a dot product of 0 with every other vector in the list, and (2) the norm of each vector is 1. If the list of vectors is orthonormal, then the list is also linearly independent. Below is a simple example of this.

In [31]:
# Let's use two vectors
v1 = np.array([(-23*np.sqrt(20129))/20129, (140*np.sqrt(20129))/20129])
v2 = np.array([(140*np.sqrt(20129))/20129, (23*np.sqrt(20129))/20129])

# Let's see the dot product between the two
print(np.dot(v1, v2))

# And here's the norm of each vector:
print("================")
print(np.linalg.norm(v1))
print(np.linalg.norm(v2))

0.0
================
0.9999999999999999
0.9999999999999999


In this example, we can see that the dot product between the two vectors is 0, and that the norm of each of them is almost exactly 1 (the tiny difference is floating-point rounding error, so we treat the value as 1).
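The two checks can be folded into a single test. As a minimal sketch (the helper name `is_orthonormal` and the tolerance are assumptions for this illustration), stacking the vectors as columns of a matrix $Q$ and computing $Q^T Q$ gives every pairwise dot product at once: the list is orthonormal when the result is the identity matrix, up to rounding.

```python
import numpy as np


def is_orthonormal(vectors, tol=1e-10):
    """Return True if the vectors are pairwise orthogonal and each has norm 1.

    Hypothetical helper: `tol` is an assumed absolute tolerance for rounding error.
    """
    Q = np.column_stack(vectors).astype(float)
    # Entry (i, k) of Q.T @ Q is the dot product of vector i with vector k
    gram = Q.T @ Q
    return bool(np.allclose(gram, np.eye(Q.shape[1]), atol=tol))


scale = np.sqrt(20129)  # 23**2 + 140**2 == 20129
v1 = np.array([-23, 140]) / scale
v2 = np.array([140, 23]) / scale

print(is_orthonormal([v1, v2]))
print(is_orthonormal([np.array([1, 0]), np.array([1, 1])]))
```

The tolerance absorbs the rounding seen in the norms above, so no manual rounding is needed.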

Now, we can use this understanding to build the best approximation theorem. Given a vector $q$ and a linear subspace $Q$, this is where we solve for:

$$\min_{q^* \isin Q} ||q^* - q||$$

where $q^* \isin Q$ is the vector we are looking for, that is, the vector of $Q$ closest to $q$. When $Q$ is spanned by a single unit vector $u_1$, this can be found via $q^* = <u_1, q> u_1$. By applying the Pythagorean theorem to this, we get $||q - \alpha u_1||^2 \geq ||q- q^*||^2$ for every $\alpha \isin \Reals$. We can also build the Cauchy-Schwarz inequality, which states that there is an upper bound on the dot product of two vectors. It is simply stated as:

$$a, b \isin J, |<a, b>| \leq ||a|| ||b||$$
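Both statements can be checked numerically. The sketch below uses made-up example vectors `q` and `u1` (assumptions for illustration only): it projects `q` onto the line spanned by the unit vector `u1`, confirms that no other multiple of `u1` on a sampled grid gets closer to `q`, and then checks the Cauchy-Schwarz inequality for the same pair of vectors.

```python
import numpy as np

# Example vectors chosen only for this illustration
q = np.array([3.0, 4.0, 1.0])
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # unit vector spanning the line

# Best approximation of q on the line: q* = <u1, q> u1
q_star = np.dot(u1, q) * u1
best_distance = np.linalg.norm(q - q_star)

# Try many other multiples of u1; none should be closer to q than q*
alphas = np.linspace(-10.0, 10.0, 201)
never_closer = all(
    np.linalg.norm(q - alpha * u1) >= best_distance - 1e-12 for alpha in alphas
)

# Cauchy-Schwarz: |<q, u1>| <= ||q|| ||u1||
cauchy_schwarz_holds = bool(
    abs(np.dot(q, u1)) <= np.linalg.norm(q) * np.linalg.norm(u1)
)

print(q_star, never_closer, cauchy_schwarz_holds)
```

A grid of sampled values is of course not a proof; it only illustrates the inequality that the Pythagorean argument guarantees.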

<br/>

## ⚙️ Gram-Schmidt Process

The Gram-Schmidt Process is an algorithm to find an orthonormal basis for the span of a list of linearly independent vectors, that is, an orthonormal basis of $span(u_1, \dots, u_n)$. The way to do this is pretty simple:

$$q_i = u_i - \sum_{j=1}^{i-1} \textrm{proj}_{q_j}(u_i), \qquad \textrm{proj}_{q_j}(u_i) = \frac{< u_i , q_j >}{< q_j , q_j >} q_j$$

where $u_i$ are the input vectors and $q_i$ are the resulting orthogonal vectors. We can then normalize the orthogonal vectors by getting their unit vectors as such:

$$e_i = \frac{q_i}{||q_i||}$$

where $e_i$ is the unit version of the orthogonal vector $q_i$. Here's an example of this.

In [34]:
# Define two vectors for example
v1 = np.array([69, 69])
v2 = np.array([420, 420])

# Normalize v1
v1_proc = v1 / np.linalg.norm(v1)

# Subtract proj of v2 on v1_proc to get v2_proc
proj = (np.dot(v2, v1_proc) / np.dot(v1_proc, v1_proc)) * v1_proc
v2_proc = v2 - proj

# Normalize v2
v2_proc = v2_proc / np.linalg.norm(v2_proc)

# Print out the transformed vectors
print("v1_proc", v1_proc)
print("v2_proc", v2_proc)

v1_proc [0.70710678 0.70710678]
v2_proc [nan nan]


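The `nan` values (and the NumPy runtime warning that comes with them) appear because `v1` and `v2` point in the same direction: they are linearly dependent. After subtracting the projection, nothing of `v2` is left, so its norm is 0 and the normalization divides by zero. The Gram-Schmidt process needs linearly independent input vectors.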
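For comparison, here is a minimal sketch of the process on linearly independent inputs. The function name `gram_schmidt`, the tolerance, and the second input vector `[420, 69]` are assumptions made for this illustration. The function normalizes as it goes, so each projection simplifies to $<u, e> e$, and it raises an error instead of dividing by zero when an input is linearly dependent on the earlier ones.

```python
import numpy as np


def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalize a list of linearly independent vectors.

    Hypothetical helper (classical Gram-Schmidt); `tol` is an assumed
    relative tolerance for detecting linear dependence.
    """
    basis = []
    for u in vectors:
        u = np.asarray(u, dtype=float)
        q = u.copy()
        for e in basis:
            # e has unit length, so the projection of u onto e is <u, e> e
            q -= np.dot(u, e) * e
        norm = np.linalg.norm(q)
        if norm <= tol * np.linalg.norm(u):
            raise ValueError("input vectors are linearly dependent")
        basis.append(q / norm)
    return np.array(basis)  # each row is one orthonormal vector


E = gram_schmidt([[69, 69], [420, 69]])
print(E)
print(E @ E.T)  # numerically the identity matrix
```

Calling `gram_schmidt([[69, 69], [420, 420]])` raises `ValueError`, which makes the dependence explicit rather than silently producing `nan`.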


<br/>

## 📦 Eigenvalues and Eigenvectors

One can see an eigenvalue as the factor (sign included) by which a linear map stretches or shrinks certain vectors. The eigenvector complements this by giving the direction along which that scaling happens. Assuming that $A\isin \Reals^{n \times n}$, we can make this definition:

$$Ax = \lambda x$$

where $\lambda$ is an eigenvalue of $A$. This assumes, though, that $x$ is non-zero. Here, we call $x$ an eigenvector. One other property to note is that $A$ has at most $n$ distinct eigenvalues, matching the dimension $n$ of the space $A$ acts on.

To show that a matrix $A$ is diagonalizable, assume a diagonal matrix $D$ that is $diag(\lambda_1, \dots, \lambda_n)$ and suppose we can reconstruct $A$ as $PDP^{-1}$ for some invertible matrix $P$. We can then rearrange this to $AP=PD$, which column by column reads $A p_i = \lambda_i p_i$, where $p_i$ is the $i$-th column of $P$. This is the same as the first definition of eigenvalues and eigenvectors. If, in addition, $A$ is symmetric, we get the following notions:

* Every symmetric matrix is orthogonally diagonalizable.
* $A$ has $n$ real eigenvalues (this time duplicates are allowed).
* If an eigenvalue of $A$ has multiplicity $g$, then the dimension of its eigenspace is $g$.
* Eigenspaces of distinct eigenvalues are orthogonal to each other.
* The columns of $P$ can be chosen as orthonormal eigenvectors of $A$.

Here's an example of eigenvalues and eigenvectors:

In [37]:
# An example of a matrix
matrix = np.array([[69, 420], [420, 69]])

# Using numpy, we can extract the eigenvalues and the eigenvectors
values, vectors = np.linalg.eig(matrix)

# Display the eigenvectors and eigenvalues
print("matrix", matrix)
print("eigenvalues", values)
print("eigenvectors", vectors)

matrix [[ 69 420]
 [420  69]]
eigenvalues [ 489. -351.]
eigenvectors [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]]
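As a follow-up sketch, the properties listed for symmetric matrices can be verified on the same matrix. This version uses `np.linalg.eigh`, NumPy's routine for symmetric matrices, which returns the eigenvalues in ascending order; the variable names are chosen only for this illustration.

```python
import numpy as np

A = np.array([[69.0, 420.0], [420.0, 69.0]])

# eigh assumes a symmetric matrix; column k of P pairs with values[k]
values, P = np.linalg.eigh(A)
D = np.diag(values)

# Each column of P satisfies A p = lambda p
pairs_ok = all(
    np.allclose(A @ P[:, k], values[k] * P[:, k]) for k in range(len(values))
)

# The eigenvectors are orthonormal, so the inverse of P is its transpose
orthonormal_ok = bool(np.allclose(P.T @ P, np.eye(2)))

# Orthogonal diagonalization: A = P D P^T
reconstruction_ok = bool(np.allclose(P @ D @ P.T, A))

print(values)
print(pairs_ok, orthonormal_ok, reconstruction_ok)
```

Because `P` is orthogonal here, `P.T` stands in for $P^{-1}$ in $A = PDP^{-1}$.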
