## **Lecture 4. Linear Algebra and Systems of Linear Equations:** 
### **4.1 Basics of Linear Algebra** 


#### <font color="cyan">**Sets**</font>

$S = \{orange, apple, banana\}$ means S is the set containing three objects: "orange", "apple", and "banana". 


There are several standard sets related to numbers, for example **natural numbers**, **whole numbers**, **integers**, **rational numbers**, **irrational numbers**, **real numbers**, and **complex numbers**. A description of each set and the symbol used to denote them is shown in the following table.

| Set Name | Symbol  | Description  |
|------|------|------------|
|   Naturals  | $\mathbb{N}$| ${\mathbb{N}} = \{1, 2, 3, 4, \cdots\}$|
|   Wholes    | $\mathbb{W}$| ${\mathbb{W}} = \mathbb{N} \cup \{0\}$|
|   Integers  | $\mathbb{Z}$| ${\mathbb{Z}} = \mathbb{W} \cup \{-1, -2, -3, \cdots\}$|
|   Rationals | $\mathbb{Q}$| ${\mathbb{Q}} = \{\frac{p}{q} : p\in {\mathbb{Z}}, q\in {\mathbb{Z}} \backslash \{0\}\}$|
| Irrationals | $\mathbb{I}$| ${\mathbb{I}}$ is the set of real numbers not expressible as a fraction of integers.|
|   Reals     | $\mathbb{R}$| ${\mathbb{R}} = \mathbb{Q} \cup \mathbb{I}$|
|   Complex Numbers     | $\mathbb{C}$| ${\mathbb{C}} = \{a + bi : a,b\in {\mathbb{R}}, i = \sqrt{-1}\}$|

**Example:** Let $S$ be the set of all real $(x,y)$ pairs such that $x^2 + y^2 = 1$. Write $S$ using set notation.


$S = \{(x,y) : x,y \in {\mathbb{R}}, x^2 + y^2 = 1\}$
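Set-builder notation maps loosely onto a membership test in Python. The sketch below is illustrative only: the helper name `in_unit_circle` and the tolerance are assumptions, and a tolerance is used because exact equality rarely holds for floating-point coordinates.

```python
import math

def in_unit_circle(x, y, tol=1e-12):
    """Return True if (x, y) satisfies x^2 + y^2 = 1 up to a small tolerance."""
    return math.isclose(x * x + y * y, 1.0, abs_tol=tol)

print(in_unit_circle(1.0, 0.0))                      # True
print(in_unit_circle(math.cos(0.3), math.sin(0.3)))  # True
print(in_unit_circle(0.5, 0.5))                      # False

# Finite Python sets support the union used in the table, e.g. W = N U {0}
naturals_up_to_5 = {1, 2, 3, 4, 5}
wholes_up_to_5 = naturals_up_to_5 | {0}
print(wholes_up_to_5 == {0, 1, 2, 3, 4, 5})          # True
```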

#### <font color="cyan">**Vectors**</font>

The set (vector space), ${\mathbb{R}}^n$, is the set of all $n$-tuples of real numbers. <br> 

In set notation this is ${\mathbb{R}}^n = \{(x_1, x_2, x_3, \cdots, x_n): x_1, x_2, x_3, \cdots, x_n \in {\mathbb{R}}\}$. <br> 

For example, the set ${\mathbb{R}}^3$ represents the set of real triples, $(x,y,z)$ coordinates, in three-dimensional space.

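#### **Object equality in linear algebra and Python**

In linear algebra, two vectors are equal when they have the same magnitude and the same direction, that is, when all of their corresponding components are equal. <br>

In Python, what equality means depends on the <font color = "cyan">object type</font>, and value equality (`==`) is distinct from object identity (`is`). 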


In [10]:
"""
Example 4.1:
Mutable and immutable objects.
Integers, floats, Booleans, strings, and tuples are immutable types; `id()` returns the identity (memory address) of an object.
"""

x = (1, 2, 3)
y = x
# x[2] = 4  # TypeError: 'tuple' object does not support item assignment

print(f"x is {x} in {id(x)}")
print(f"y is {y} in {id(y)} \n")
# different variables x, y point to the same memory address

x = (1, 2, 4)
print(f"x is {x} in {id(x)}")
print(f"y is {y} in {id(y)}")
# rebinding x creates a new tuple object at a new memory address; y still refers to the original
# tuples are immutable


x is (1, 2, 3) in 2525963990912
y is (1, 2, 3) in 2525963990912 

x is (1, 2, 4) in 2525960742720
y is (1, 2, 3) in 2525963990912


In [11]:
"""
Example 4.2:
Lists and dictionaries are mutable objects (types).
"""

x = [1, 2, 3]
y = x
x[2] = 4

print(f"x = {x} in {id(x)}")
print(f"y = {y} in {id(y)}\n")

z = [1, 2, 4]
print(x == z) # "==" compares by value
print(x is z) # "is" compares by object identity (memory address)


x = [1, 2, 4] in 2525963837760
y = [1, 2, 4] in 2525963837760

True
False


In [17]:
"""
Example 4.3:
numpy.ndarray is a mutable object (type).
To avoid the aliasing seen in the last example, use .copy() to make an independent (deep) copy.
"""
import numpy as np

x = np.array([1, 2, 3])
y = x
x[2] = 4
print(f"x = {x} in {id(x)}")
print(f"y = {y} in {id(y)}\n")

# z = np.array([1, 2, 4])
# print(x == z) # "==" element-wise comparision by value
# print(x is z) # "is" comparision by object

############## Deep copy ######################
x = np.array([1, 2, 3])

y = x.copy()  # y owns its own data

x[2] = 4      # modifying x no longer affects y
print(x)
print(y)


x = [1 2 4] in 2525963808432
y = [1 2 4] in 2525963808432

[1 2 4]
[1 2 3]
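Because `==` on arrays is element-wise, a single yes/no answer needs a different call. The following is a small sketch of the distinction between value equality, identity, and shared memory; the variable names `alias` and `clone` are illustrative only.

```python
import numpy as np

x = np.array([1, 2, 4])
alias = x           # same object, no data copied
clone = x.copy()    # new object with its own data

print(x == clone)                 # element-wise comparison: [ True  True  True]
print(np.array_equal(x, clone))   # one bool for the whole array: True
print(alias is x, clone is x)     # identity: True False
print(np.shares_memory(x, alias), np.shares_memory(x, clone))  # True False
```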


<font color="magenta">**TRY IT!**</font> Create a row vector and a column vector with `np.array()`, and show the shape of each vector.

In [18]:
"""
Example 4.4a:
Vectors/matrices can be built with np.array(A), where A is a list of lists in Python,
so that we can distinguish column vectors from row vectors.
"""

import numpy as np
vector_row = np.array([[1, -5, 3, 2, 4]])
vector_column = np.array([[1], 
                          [2], 
                          [3], 
                          [4]])
print(vector_row.shape)
print(vector_column.shape)

(1, 5)
(4, 1)


<font color="cyan">You can define a 1-D array with `np.array(L)`, where L = [1, 2, 3, 4] is a list, 

but you will soon notice that it carries no information about whether it is a row or a column.</font> 

In [22]:
"""
Example 4.4b:
"""
x = np.array([1, 4, 7])
x_row = x[np.newaxis, :] # insert a new leading axis, shape (1, 3)
print(x_row)

[[1 4 7]]
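The same idea gives a column vector. As a brief sketch (the names `x_col_a` and `x_col_b` are illustrative), either indexing with `np.newaxis` on the second axis or `reshape(-1, 1)` turns a 1-D array into an explicit column, whereas `.T` leaves a 1-D array unchanged.

```python
import numpy as np

x = np.array([1, 4, 7])        # shape (3,)
x_col_a = x[:, np.newaxis]     # shape (3, 1)
x_col_b = x.reshape(-1, 1)     # shape (3, 1); -1 lets NumPy infer the length

print(x.shape, x.T.shape)      # (3,) (3,): transposing a 1-D array changes nothing
print(x_col_a.shape, x_col_b.shape)  # (3, 1) (3, 1)
```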


#### **Norm of a vector**

The **norm** of a vector is a measure of its length. 

**$L_2$ norm** of a vector $v$ is denoted by $\Vert v \Vert_{2}$ and $\Vert v \Vert_{2} = \sqrt{\sum_i v_i^2}$. 

**$L_1$ norm** or "Manhattan Distance," is computed as $\Vert v \Vert_{1} = \sum_i |v_i|$, and is named after the grid-like road structure in New York City. 

**p-norm**, $L_p$, of a vector is $\Vert v \Vert_{p} = \sqrt[p]{(\sum_i |v_i|^p)}$. 

**$L_\infty$ norm** is the $p$-norm, where $p = \infty$. The $L_\infty$ norm is written as $||v||_\infty$ and it is equal to the maximum absolute value in $v$.

<font color="magenta">**TRY IT!**</font> Transpose the row vector we defined above into a column vector and calculate its $L_1$, $L_2$, and $L_\infty$ norms. Verify that the $L_\infty$ norm of a vector is equal to the maximum absolute value of the elements in the vector.

In [24]:
"""
Example 4.5:
Vector norms in Python: use np.linalg.norm(v, ord), e.g. norm(v, 2)
"""
from numpy.linalg import norm

# column vector as a 2-D array
# vector_row = np.array([[1, -5, 3, 2, 4]])
new_vector = vector_row.T 
print(f"new vector in column form is \n {new_vector} \n")

norm_1 = norm(new_vector, 1)
norm_2 = norm(new_vector, 2)
norm_inf = norm(new_vector, np.inf)
print("L_1 is: %.1f"   %norm_1)
print("L_2 is: %.1f"   %norm_2)
print("L_inf is: %.1f" %norm_inf)

new vector in column form is 
 [[ 1]
 [-5]
 [ 3]
 [ 2]
 [ 4]] 

L_1 is: 15.0
L_2 is: 7.4
L_inf is: 5.0
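The definitions can be checked directly against `np.linalg.norm`. The sketch below uses a hypothetical helper `p_norm` written from the formula, and it also illustrates one caveat: for a 2-D input, `ord=1` and `ord=np.inf` are treated as matrix norms (maximum column sum and maximum row sum), so a `(1, n)` row array gives different values than the same data stored as a 1-D vector or an `(n, 1)` column.

```python
import numpy as np
from numpy.linalg import norm

v = np.array([1, -5, 3, 2, 4])   # 1-D vector

def p_norm(v, p):
    """p-norm from the definition: (sum_i |v_i|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(p_norm(v, 1), norm(v, 1))            # both 15.0
print(p_norm(v, 2), norm(v, 2))            # both about 7.416
print(np.max(np.abs(v)), norm(v, np.inf))  # 5 and 5.0

# For a 2-D row array, ord=1 and ord=inf are matrix norms
row = v[np.newaxis, :]                     # shape (1, 5)
print(norm(row, 1), norm(row, np.inf))     # 5.0 (max column sum), 15.0 (max row sum)
```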


#### **Vector addition:**
If $v$ and $w$ are vectors in ${\mathbb{R}}^n$, then $u = v + w$ is defined as $u_i = v_i + w_i$. 

#### **Scalar multiplication:**
If $\alpha$ is a scalar and $v$ is a vector, then $u = \alpha v$ is defined as $u_i = \alpha v_i$.

#### **Dot product:** 
For $v$ and $w$ $\in {\mathbb{R}}^n, d = v\cdot w$ is defined as $d = \sum_{i = 1}^{n} v_iw_i$. The **angle between two vectors**, $\theta$, is defined by the formula:

$$
v \cdot w = \Vert v \Vert_{2} \Vert w \Vert_{2} \cos{\theta}
$$

The dot product is a measure of how similarly directed the two vectors are. For example, the vectors (1,1) and (2,2) are parallel. If you compute the angle between them using the dot product, you will find that $\theta = 0$. If the angle between the vectors, $\theta = \pi/2$, then the vectors are said to be perpendicular or **orthogonal**, and the dot product is 0.

In [25]:
"""
Example 4.6:
Inner product and the angle between two vectors; 
note the difference between np.dot(v, w) and np.dot(v_row, w_row.T) 
"""
from numpy import arccos, dot, pi

# 2-D row vectors, shape (1, 3)
v_row = np.array([[1, 0, 0]]) 
w_row = np.array([[0, 1, 0]])

# v . w computed as a (1, 3) by (3, 1) matrix product
theta = arccos(dot(v_row, w_row.T)/(norm(v_row)*norm(w_row)))
print(theta*180.0/pi)

# the scalar result is represented as a special 1x1 matrix 
# type(theta) is <class 'numpy.ndarray'>

[[90.]]
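With 1-D arrays the dot product returns a plain scalar, which is often more convenient. The following is a minimal sketch, assuming nonzero input vectors; the function name `angle_between` is hypothetical. Clipping guards against round-off pushing the cosine slightly outside $[-1, 1]$, which would make `arccos` return `nan`.

```python
import numpy as np

def angle_between(v, w):
    """Angle in radians between two nonzero 1-D vectors."""
    cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    # Round-off can push cos_theta slightly outside [-1, 1]; clip before arccos
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Parallel vectors: the angle is 0 up to round-off
print(np.degrees(angle_between(np.array([1, 1]), np.array([2, 2]))))
# Orthogonal vectors: the angle is 90 degrees
print(np.degrees(angle_between(np.array([1, 0, 0]), np.array([0, 1, 0]))))
```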


#### **Cross product:** 
$v \times w$. It is defined by $v \times w = \Vert v \Vert_{2}\Vert w \Vert_{2}\sin{(\theta)} \textit{n}$, where $\theta$ is the angle between $v$ and $w$ (which can be computed from the dot product) and **$n$** is a unit vector (i.e., its length is one) perpendicular to both $v$ and $w$, with its direction given by the right-hand rule. 

The geometric interpretation of the cross product is a vector perpendicular to both $v$ and $w$ with length equal to the area enclosed by the parallelogram created by the two vectors.

In [28]:
"""
Example 4.7:
Cross product and the new vector, note the shape of the new vector
"""
v_row = np.array([[0, 2, 0]])
w_row = np.array([[3, 0, 0]])
# v = v_row.flatten()
# w = w_row.flatten()
print(np.cross(v_row, w_row))

[[ 0  0 -6]]
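The geometric properties stated above can be checked numerically. This is a short sketch using 1-D arrays for the same two vectors: the result is orthogonal to both inputs, its length equals the parallelogram area ($2 \times 3 = 6$), and swapping the operands flips the sign.

```python
import numpy as np

v = np.array([0, 2, 0])
w = np.array([3, 0, 0])
c = np.cross(v, w)

print(c)                            # [ 0  0 -6]
print(np.dot(c, v), np.dot(c, w))   # 0 0: perpendicular to both v and w
print(np.linalg.norm(c))            # 6.0: area of the parallelogram
print(np.cross(w, v))               # [0 0 6]: the cross product is anti-commutative
```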


#### **Linear Combination:**
Assuming that $S$ is a set in which addition and scalar multiplication are defined, a **linear combination** of elements of $S$ is defined as
$$
\sum_i \alpha_i s_i,
$$

where $\alpha_i$ is any real number and $s_i$ is the $i^{\text{th}}$ object (vector in Linear Algebra) in $S$. Sometimes the $\alpha_i$ values are called the **coefficients** of $s_i$. 

Linear combinations can be used to describe numerous things. For example, a grocery bill can be written $\displaystyle{\sum c_i n_i}$, where $c_i$ is the cost of item $i$ and $n_i$ is the number of item $i$ purchased. Thus, the total cost is a linear combination of the items purchased.

In [30]:
"""
Example 4.8:
Linear combination. We usually build each vector as a 1-D array, 
then rely on broadcasting when an extra axis is needed. 
"""
def lincomb(coef, vectors):
    n = len(vectors[0])
    comb = np.zeros(n)
    
    for i in range(len(vectors)):
        comb = comb + coef[i]*vectors[i]
    return comb

x = np.array([1, 2])
y = np.array([3, 4])
a, b = -0.5, 1.5
lincomb([a, b], [x, y]) 


array([4., 5.])
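The explicit loop can also be written as a single matrix product. The sketch below stacks the vectors as rows and multiplies by the coefficient vector; the grocery prices and quantities are made-up numbers used only to illustrate the bill example.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([3, 4])
coef = np.array([-0.5, 1.5])

V = np.stack([x, y])   # shape (2, 2): one vector per row
comb = coef @ V        # sum_i coef[i] * V[i]
print(comb)            # [4. 5.]

# Grocery bill as a linear combination (hypothetical prices and quantities)
cost = np.array([1.50, 0.25, 3.00])
count = np.array([2, 12, 1])
print(np.dot(cost, count))   # 9.0
```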

---



#### <font color="cyan">**Matrices**</font>

An ${m} \times {n}$ **matrix A** is a rectangular table of numbers consisting of $m$ rows and $n$ columns. 

#### **Norm of a Matrix:**

The Frobenius norm of a matrix $A$ can be written as:

$$\Vert A \Vert_{F} = \sqrt{(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2)}$$

You can calculate the matrix norm using the same `np.linalg.norm(A)` function as for vectors. 
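As a quick check of the formula, the sketch below computes the Frobenius norm of a small example matrix by hand and compares it with `np.linalg.norm`, whose default for a 2-D array is the Frobenius norm (`ord='fro'` requests it explicitly).

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

fro_by_hand = np.sqrt(np.sum(np.abs(A) ** 2))   # sqrt(1 + 4 + 9 + 16)
fro_default = np.linalg.norm(A)                 # default for 2-D input
fro_explicit = np.linalg.norm(A, 'fro')

print(fro_by_hand, fro_default, fro_explicit)   # all equal sqrt(30)
```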

#### **Matrix addition and scalar multiplication:**
Matrix addition and scalar multiplication work the same way for matrices as for vectors. However, **matrix multiplication** between two matrices, $P$ and $Q$, is defined when $P$ is an ${m} \times {p}$ matrix and $Q$ is a ${p} \times {n}$ matrix. The result of $M = PQ$ is a matrix $M$ that is $m \times n$. The dimension with size $p$ is called the **inner matrix dimension**, and the inner matrix dimensions must match (i.e., the number of columns in $P$ and the number of rows in $Q$ must be the same) for matrix multiplication to be defined. The dimensions $m$ and $n$ are called the **outer matrix dimensions**. Formally, if $P$ is ${m} \times {p}$ and $Q$ is ${p} \times {n}$, then $M = PQ$ is defined as

$$
M_{ij} = \sum_{k=1}^p P_{ik}Q_{kj}
$$

The product of two matrices $P$ and $Q$ is computed in Python with the NumPy function `np.dot(P, Q)`, or equivalently with the `@` operator, `P @ Q`. 
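To make the summation formula concrete, here is a deliberately naive triple-loop sketch (the function name `matmul_loops` is hypothetical). It is far slower than NumPy's built-in product and is meant only to mirror the definition index by index.

```python
import numpy as np

def matmul_loops(P, Q):
    """Matrix product from the definition M[i, j] = sum_k P[i, k] * Q[k, j]."""
    m, p = P.shape
    p2, n = Q.shape
    if p != p2:
        raise ValueError("inner matrix dimensions must match")
    M = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for k in range(p):
                M[i, j] += P[i, k] * Q[k, j]
    return M

P = np.array([[1, 7], [2, 3], [5, 0]])
Q = np.array([[2, 6, 3, 1], [1, 2, 3, 4]])
print(np.allclose(matmul_loops(P, Q), P @ Q))   # True
```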

#### **Transpose:** 
The transpose of a matrix $A$ is denoted by a superscript $T$, as in $A^T$; it swaps rows and columns, so $(A^T)_{ij} = A_{ji}$. In Python, if `A` is a matrix, then `A.T` is its transpose.

<font color="magenta">**TRY IT!**</font> Let the Python matrices $P = [[1, 7], [2, 3], [5, 0]]$ and $Q = [[2, 6, 3, 1], [1, 2, 3, 4]]$. 

Compute the matrix product of $P$ and $Q$. Show that the product of $Q$ and $P$ will produce an error.

In [34]:
"""
Example 4.9:
Matrix multiplication; note that the inner matrix dimensions must match. 
"""
import numpy as np

P = np.array([[1, 7], [2, 3], [5, 0]])
Q = np.array([[2, 6, 3, 1], [1, 2, 3, 4]])

print(f"P = \n {P} \n")
print(f"Q = \n {Q} \n")
print(f"PQ = \n {np.dot(P, Q)} \n")

# np.dot(Q, P) ValueError: shapes (2,4) and (3,2) not aligned

P = 
 [[1 7]
 [2 3]
 [5 0]] 

Q = 
 [[2 6 3 1]
 [1 2 3 4]] 

PQ = 
 [[ 9 20 24 29]
 [ 7 18 15 14]
 [10 30 15  5]] 



#### **Determinant:** 
An important property of square matrices. In the case of a $2 \times 2$ matrix, the determinant is:

$$
|A| = \begin{vmatrix}
a & b \\
c & d\\
\end{vmatrix} = ad - bc$$

Similarly, in the case of a $3 \times 3$ matrix, the determinant is:

$$
\begin{aligned}
|A| = \begin{vmatrix}
a & b & c \\
d & e & f \\
g & h & i \\
\end{vmatrix} & = & a\begin{vmatrix}
\Box &\Box  &\Box  \\
\Box & e & f \\
\Box & h & i \\
\end{vmatrix} - b\begin{vmatrix}
\Box &\Box  &\Box  \\
d & \Box & f \\
g & \Box & i \\
\end{vmatrix}+c\begin{vmatrix}
\Box &\Box  &\Box  \\
d & e & \Box \\
g & h & \Box \\
\end{vmatrix} \\
&&\\
& = & a\begin{vmatrix}
e & f \\
h & i \\
\end{vmatrix} - b\begin{vmatrix}
d & f \\
g & i \\
\end{vmatrix}+c\begin{vmatrix}
d & e \\
g & h \\
\end{vmatrix} \\ 
&&\\
& = & aei + bfg + cdh - ceg - bdi - afh
\end{aligned}$$

A similar approach extends to matrices of higher dimension, but it is much easier to use the Python function `det(A)` from the `numpy.linalg` package. 
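The expansion along the first row generalizes recursively to any square matrix. The sketch below (the function name `det_cofactor` is hypothetical) is for illustration only: its cost grows factorially with the matrix size, which is one reason to prefer `numpy.linalg.det` in practice.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor: remove row 0 and column j
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[0, 2, 1, 3],
              [3, 2, 8, 1],
              [1, 0, 0, 3],
              [0, 3, 2, 1]])
print(det_cofactor(A), np.linalg.det(A))   # both approximately -38
```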

#### **Identity matrix:** 
A square matrix with ones on the diagonal and zeros elsewhere. The identity matrix is usually denoted by $I$, and is analogous to the real number identity, 1. In Python, an $n \times n$ identity matrix is created with `np.eye(n)`.

<font color="magenta">**TRY IT!**</font> Use Python to find the determinant of the matrix $A = [[0, 2, 1, 3], [3, 2, 8, 1], [1, 0, 0, 3], [0, 3, 2, 1]]$. Use the `np.eye(4)` function to produce a ${4} \times {4}$ identity matrix, $I$. Multiply $A$ by $I$ to show that the result is $A$.

In [46]:
"""
Example 4.10:
Determinant of matrix A. 
"""
from numpy.linalg import det

A = np.array([[0, 2, 1, 3], 
              [3, 2, 8, 1], 
              [1, 0, 0, 3],
              [0, 3, 2, 1]])
print(A.dtype)  # integer dtype; platform-dependent (int32 here, int64 on many systems)

print(f"A: \n {A} \n")

print(f"Det(A): {det(A)} \n")

I = np.eye(4)
print(f"I: \n {I} \n")
print(f"A*I: \n {np.dot(A, I)} \n")

int32
A: 
 [[0 2 1 3]
 [3 2 8 1]
 [1 0 0 3]
 [0 3 2 1]] 

Det(A): -38.000000000000014 

I: 
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]] 

A*I: 
 [[0. 2. 1. 3.]
 [3. 2. 8. 1.]
 [1. 0. 0. 3.]
 [0. 3. 2. 1.]] 



#### **Inverse:** 

The inverse of a square matrix $A$ is a matrix of the same size, $A^{-1}$, such that $A \cdot A^{-1} = A^{-1} \cdot A = I$. The inverse of a matrix is analogous to the inverse of real numbers. For example, the inverse of 3 is $\frac{1}{3}$ because $(3)(\frac{1}{3}) = 1$. A matrix is said to be **invertible** if it has an inverse. The inverse of a matrix is unique; that is, for an invertible matrix, there is only one inverse for that matrix. The inverse can be computed in Python using the function `np.linalg.inv(A)`.

For a $2 \times 2$ matrix, the analytic solution of the matrix inverse is:

$$
A^{-1} = \begin{bmatrix}
a & b \\
c & d\\
\end{bmatrix}^{-1} = \frac{1}{|A|}\begin{bmatrix}
d & -b \\
-c & a\\
\end{bmatrix}$$
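The $2 \times 2$ formula translates almost literally into code. This is a minimal sketch (the function name `inv_2x2` and the example matrix are illustrative), compared against `np.linalg.inv`:

```python
import numpy as np

def inv_2x2(A):
    """Inverse of a 2x2 matrix from the closed-form formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
print(inv_2x2(A))
print(np.allclose(inv_2x2(A), np.linalg.inv(A)))   # True
print(np.allclose(A @ inv_2x2(A), np.eye(2)))      # True
```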

Computing the matrix inverse from the **analytic solution** becomes complicated as the matrix dimension increases. **Many other methods make this easier, such as Gaussian elimination, Newton's method, and eigendecomposition.** We will introduce some of these methods after we learn how to solve a system of linear equations, because the process is essentially the same. 

Recall that 0 has no inverse for multiplication in the real-numbers setting. Similarly, there are matrices that do not have inverses. These matrices are called **singular**. Matrices that do have an inverse are called **nonsingular or invertible**. One way to determine if a matrix is singular is by computing its determinant. If the determinant is 0, then the matrix is singular; if not, the matrix is nonsingular.

<font color="magenta">**TRY IT!**</font> The matrix $A$ (in the previous example) has a nonzero determinant. Compute $A^{-1}$. Show that the matrix $U = [[0, 1, 0], [0, 1, 0], [1, 0, 1]]$ has a determinant of 0 and therefore has no inverse.

In [54]:
"""
Example 4.11:
Inverse matrix A^{-1}, where A is invertible. 
"""
from numpy.linalg import inv

print('Inv_A:\n', inv(A))
print("")

U = np.array([[0, 1, 0],
              [0, 1, 0],
              [1, 0, 1]])

print('det(U):\n', det(U))
# U is singular, so inv(U) raises LinAlgError
print('Inv_U:\n', inv(U)) 

Inv_A:
 [[-1.57894737 -0.07894737  1.23684211  1.10526316]
 [-0.63157895 -0.13157895  0.39473684  0.84210526]
 [ 0.68421053  0.18421053 -0.55263158 -0.57894737]
 [ 0.52631579  0.02631579 -0.07894737 -0.36842105]]

det(U):
 0.0


LinAlgError: Singular matrix
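In a longer script the exception can be caught rather than allowed to stop execution. The sketch below wraps `inv` in a hypothetical helper `try_inverse` that returns `None` for a singular matrix; whether returning `None` is appropriate depends on the application.

```python
import numpy as np
from numpy.linalg import LinAlgError, inv

def try_inverse(M):
    """Return the inverse of M, or None if NumPy reports M as singular."""
    try:
        return inv(M)
    except LinAlgError:
        return None

U = np.array([[0, 1, 0],
              [0, 1, 0],
              [1, 0, 1]])
print(try_inverse(U))           # None
print(try_inverse(np.eye(2)))   # the 2x2 identity is its own inverse
```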

#### **Numerical Stability of the Inverse:** 
**Ill-conditioned matrices and the condition number** `np.linalg.cond(A)`


A matrix that is close to being singular is called **ill-conditioned**. A determinant close to 0 is a common warning sign, although it depends on how the matrix is scaled. As a scalar analogy, think of $0.1*x = 5$: dividing by a small coefficient magnifies any error in the right-hand side. 

Although ill-conditioned matrices have inverses, they are problematic numerically in the same way that dividing a number by a very, very small number is problematic. That is, computations with them can produce overflow, underflow, or significant round-off errors. 

The **condition number** is a measure of how ill-conditioned a matrix is, and it can be computed using NumPy's function `cond(A)` from the `np.linalg` module. **The higher the condition number, the closer the matrix is to being singular.**

In [55]:
"""
Example 4.12:
The condition number of A is used as a measure of how close A is to singular.
The higher the condition number (sigma_max/sigma_min, the ratio of the largest to the smallest singular value), the closer the matrix is to being singular.
Later, when we apply iterative methods to solve Ax = b, the condition number will affect convergence.
"""
from numpy.linalg import cond, matrix_rank

A1 = np.array([[1, 1, 0],
               [0, 1, 0],
               [1, 0, 1]])

print('Condition number:\n', cond(A1))
print('Rank:\n', matrix_rank(A1))
print('')

A2 = np.array([[1,   1, 0],
               [1, 1.1, 0],  
               [1,   0, 1]])

print('Condition number:\n', cond(A2))
print('Rank:\n', matrix_rank(A2))
print('')

Condition number:
 4.048917339522305
Rank:
 3

Condition number:
 56.07458006794298
Rank:
 3



#### **Rank:** 

The **rank** of an ${m} \times {n}$ matrix $A$ is the number of linearly independent columns or rows of $A$, and is denoted by rank($A$). 

A matrix $A$ is said to have **full row rank** if $rank(A) = m$, that is, if all of its rows are linearly independent (which requires $m \le n$), and **full column rank** if $rank(A) = n$, that is, if all of its columns are linearly independent. An **augmented matrix** is a matrix, $A$, concatenated with a vector, $y$, and is written $[A,y]$. This is commonly read "$A$ augmented with $y$." You can use `np.concatenate((A,y), axis=1)` to concatenate them. 

If $dim\{C([A,y])\} = {rank}(A) + 1$, where $C(\cdot)$ denotes the column space, then the vector, $y$, is "new" information. That is, it cannot be created as a linear combination of the columns in $A$. **The rank is an important property of matrices because of its relationship to solutions of linear equations, which is discussed in the last section of this chapter.**

<font color="magenta">**TRY IT!**</font> For the matrix $A = [[1, 1, 0], [0, 1, 0], [1, 2, 0]]$, compute the condition number and the rank. If $y = [[1], [2], [1]]$, form the augmented matrix [A, y] and compute its rank. 

In [56]:
"""
Example 4.13:
The rank of A measures how many of its columns (or rows) are linearly independent.
Here A is singular (its third column is zero), so its smallest singular value is zero
and the condition number is reported as infinite (or extremely large in floating point).
"""
from numpy.linalg import cond, matrix_rank

A = np.array([[1, 1, 0],
              [0, 1, 0],
              [1, 2, 0]])

print('Condition number:\n', cond(A))
print('Rank:\n', matrix_rank(A))
print('')

y = np.array([[1], [2], [1]])
A_y = np.concatenate((A, y), axis = 1)
print('Augmented matrix:\n', A_y)
print('Rank of Augmented matrix:\n', matrix_rank(A_y))



Condition number:
 inf
Rank:
 2

Augmented matrix:
 [[1 1 0 1]
 [0 1 0 2]
 [1 2 0 1]]
Rank of Augmented matrix:
 3
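The rank comparison above can be packaged as a test for whether $y$ lies in the column space of $A$. This is a small sketch with a hypothetical helper name `in_column_space`; it relies on the default tolerance of `matrix_rank`, which may need adjusting for badly scaled data.

```python
import numpy as np
from numpy.linalg import matrix_rank

def in_column_space(A, y):
    """True if appending y to A does not increase the rank."""
    return matrix_rank(np.column_stack((A, y))) == matrix_rank(A)

A = np.array([[1, 1, 0],
              [0, 1, 0],
              [1, 2, 0]])

print(in_column_space(A, np.array([1, 2, 1])))   # False: y adds new information
print(in_column_space(A, np.array([1, 1, 2])))   # True: y equals the second column of A
```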

