<a href="https://colab.research.google.com/github/ckraju/fmml-jan/blob/main/SS_01_python_maths.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Linear Algebra

In [None]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from IPython.display import HTML

## System of Linear Equations...

- Systems of linear equations form a **central part** of linear algebra. Many problems can be formulated as systems of linear equations, and linear algebra gives us the tools for solving them.
- For an intuitive explanation and a clear visualization, we'll consider the 2-D plane and a system of two linear equations in two variables, which can be interpreted geometrically as the intersection of two lines. Every linear equation represents a line. <center><img src="https://drive.google.com/uc?export=view&id=11XSuesqkAEWgiBD7ZEj2VdfBIzoOQkYI" alt="lin_eq" height="350"></center>

- Let's solve this system:
\begin{equation}4x_1 + 4x_2 = 5\\ 2x_1 - 4x_2 = 1\end{equation}
- Solving it gives $x_1 = 1,\ x_2 = 1/4$.
- Can we do better?
- For a systematic approach to solving systems of linear equations, we will introduce a useful compact notation.
\begin{equation}x_1\left[\begin{array}{c}
4 \\
2
\end{array}\right]+x_2\left[\begin{array}{c}
4 \\
-4
\end{array}\right]=\left[\begin{array}{c}
5 \\
1
\end{array}\right]\end{equation}

- In general it can be expressed as:
\begin{equation}x_1\left[\begin{array}{c}
a_{11} \\
\vdots \\
a_{m 1}
\end{array}\right]+x_2\left[\begin{array}{c}
a_{12} \\
\vdots \\
a_{m 2}
\end{array}\right]+\cdots+x_n\left[\begin{array}{c}
a_{1 n} \\
\vdots \\
a_{m n}
\end{array}\right]=\left[\begin{array}{c}
b_1 \\
\vdots \\
b_m
\end{array}\right]\end{equation}

- Going one step further and compacting the above representation gives:
\begin{equation}\left[\begin{array}{ccc}
a_{11} & \cdots & a_{1 n} \\
\vdots & & \vdots \\
a_{m 1} & \cdots & a_{m n}
\end{array}\right]\left[\begin{array}{c}
x_1 \\
\vdots \\
x_n
\end{array}\right]=\left[\begin{array}{c}
b_1 \\
\vdots \\
b_m
\end{array}\right]\end{equation}

- Matrices play a central role in linear algebra. They can be used to compactly represent systems of linear equations, but they also represent linear functions (linear mappings) as we will see later.

- **Definition (Matrix)**. With $m, n \in \mathbb{N}$ a real-valued $(m, n)$ matrix $\boldsymbol{A}$ is an $m \cdot n$-tuple of elements $a_{i j}, i=1, \ldots, m, j=1, \ldots, n$, which is ordered according to a rectangular scheme consisting of $m$ rows and $n$ columns:
\begin{equation}\boldsymbol{A}=\left[\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1 n} \\
a_{21} & a_{22} & \cdots & a_{2 n} \\
\vdots & \vdots & & \vdots \\
a_{m 1} & a_{m 2} & \cdots & a_{m n}
\end{array}\right], \quad a_{i j} \in \mathbb{R}\end{equation}

- By convention $(1, n)$-matrices are called rows and $(m, 1)$-matrices are called columns. These special matrices are also called row/column vectors.



- **Vectors**: The usual notation for vectors from high school is $\vec{x}$ and $\vec{y}$.
- They can be added ($\vec{x} + \vec{y} = \vec{z}$) or multiplied by a scalar ($\lambda\vec{x}$).

- Matrix **Addition**: The sum of two matrices $\boldsymbol{A} \in \mathbb{R}^{m \times n}, \boldsymbol{B} \in \mathbb{R}^{m \times n}$ is defined as the elementwise sum, i.e.,
\begin{equation}\boldsymbol{A}+\boldsymbol{B}:=\left[\begin{array}{ccc}
a_{11}+b_{11} & \cdots & a_{1 n}+b_{1 n} \\
\vdots & & \vdots \\
a_{m 1}+b_{m 1} & \cdots & a_{m n}+b_{m n}
\end{array}\right] \in \mathbb{R}^{m \times n}\end{equation}

- Matrix **Multiplication**: For matrices $\boldsymbol{A} \in \mathbb{R}^{m \times n}, \boldsymbol{B} \in \mathbb{R}^{n \times k}$, the elements $c_{i j}$ of the product $\boldsymbol{C}=\boldsymbol{A} \boldsymbol{B} \in \mathbb{R}^{m \times k}$ are computed as:
\begin{equation}c_{i j}=\sum_{l=1}^n a_{i l} b_{l j}, \quad i=1, \ldots, m, \quad j=1, \ldots, k\end{equation}

- For the first example above, this can be written as:
\begin{equation}\left[\begin{array}{cc}
4 & 4 \\
2 & -4
\end{array}\right]\left[\begin{array}{c}
x_1 \\
x_2
\end{array}\right]=\left[\begin{array}{c}
5 \\
1
\end{array}\right]\end{equation}

- Let's consider one more example:
\begin{equation}\begin{aligned}
& 2 x_1+3 x_2+5 x_3=1 \\
& 4 x_1-2 x_2-7 x_3=8 \\
& 9 x_1+5 x_2-3 x_3=2
\end{aligned}\end{equation}
And its compact form representation?

\begin{equation}\left[\begin{array}{ccc}
2 & 3 & 5 \\
4 & -2 & -7 \\
9 & 5 & -3
\end{array}\right]\left[\begin{array}{l}
x_1 \\
x_2 \\
x_3
\end{array}\right]=\left[\begin{array}{l}
1 \\
8 \\
2
\end{array}\right]\end{equation} Note that $x_1$ scales the first column, $x_2$ the second one, and $x_3$ the third one.

- Generally, a system of linear equations can be compactly represented in its matrix form as $\boldsymbol{A x}=\boldsymbol{b}$, and the product $\boldsymbol{A} \boldsymbol{x}$ is a (linear) combination of the columns of $\boldsymbol{A}$.
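
The statement that $\boldsymbol{A}\boldsymbol{x}$ is a combination of the columns of $\boldsymbol{A}$ can be checked numerically. The sketch below reuses the coefficient matrix of the second example; the vector `x` is an arbitrary choice made only for illustration (it is not the solution of the system).

```python
import numpy as np

# Coefficient matrix of the second example above.
A = np.array([[2, 3, 5], [4, -2, -7], [9, 5, -3]])
# Arbitrary illustrative vector (an assumption, not the solution of A x = b).
x = np.array([1, 2, 3])

# Matrix-vector product computed by NumPy.
Ax = A @ x

# The same product written as x_1*(column 1) + x_2*(column 2) + x_3*(column 3).
column_combination = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]

print(f"A @ x              = {Ax}")
print(f"column combination = {column_combination}")
```

Both lines should print the same vector, which is the column view of the matrix-vector product.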



In [None]:
# Add two matrices and multiply a matrix by a scalar.
A = np.array([[2,3,5],[4,-2,-7],[9,5,-3]])
B = np.array([[1,7,2],[-1,2,5],[6,5,4]])
print(f"A = \n{A}")
print(f"B = \n{B}")

A_plus_B = A + B
print(f"A + B = \n{A_plus_B}")
print(f"4*A = \n{4*A}")

A = 
[[ 2  3  5]
 [ 4 -2 -7]
 [ 9  5 -3]]
B = 
[[ 1  7  2]
 [-1  2  5]
 [ 6  5  4]]
A + B = 
[[ 3 10  7]
 [ 3  0 -2]
 [15 10  1]]
4*A = 
[[  8  12  20]
 [ 16  -8 -28]
 [ 36  20 -12]]
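
The cell above covers addition and scalar multiplication. As a complement, here is a minimal sketch of the matrix multiplication formula $c_{ij}=\sum_{l} a_{il} b_{lj}$ written with explicit loops and compared with NumPy's `@` operator. The helper name `matmul_by_definition` is made up for this illustration.

```python
import numpy as np

def matmul_by_definition(A, B):
    """Multiply A (m x n) by B (n x k) using c_ij = sum_l a_il * b_lj."""
    m, n = A.shape
    n_b, k = B.shape
    if n != n_b:
        raise ValueError("inner dimensions must match")
    C = np.zeros((m, k), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(k):
            for l in range(n):
                C[i, j] += A[i, l] * B[l, j]
    return C

A = np.array([[2, 3, 5], [4, -2, -7], [9, 5, -3]])
B = np.array([[1, 7, 2], [-1, 2, 5], [6, 5, 4]])

C_loops = matmul_by_definition(A, B)
C_numpy = A @ B
print(f"A @ B = \n{C_numpy}")
print(f"Same as the loop version: {np.array_equal(C_loops, C_numpy)}")
# Note: A * B is the elementwise product, which is a different operation.
```

The loop version is only for understanding; in practice `A @ B` is the idiomatic (and much faster) choice.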


## Matrix Inverse and Transpose

### Inverse
- The above form resembles the scalar equation $ax = b \implies x = b/a$ (for $a \neq 0$).
- Can we do the same thing here and write $\boldsymbol{x}=\boldsymbol{b}/\boldsymbol{A}$? No: this is not a valid operation, because a vector cannot be divided by a matrix (sounds weird, right?).
- Hence, we need something that plays the role of $1/\boldsymbol{A}$. Since $\boldsymbol{b}$ and $\boldsymbol{x}$ are vectors, that something must be a matrix.
- This is where the notation $\boldsymbol{A}^{-1}$, the **inverse** of $\boldsymbol{A}$, comes from. It is defined only for square matrices, and not every square matrix has one.
- Just as multiplying $a$ by $1/a$ gives $1$ (i.e., $a\dfrac{1}{a} = 1$), we have $\boldsymbol{A}\boldsymbol{A}^{-1} = \boldsymbol{I} = \boldsymbol{A}^{-1}\boldsymbol{A}$, where $\boldsymbol{I}$ is the identity matrix. $$\boldsymbol{I} = \left[\begin{array}{cccc}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{array}\right]_{n \times n}$$

- Now how to compute inverses? <img src="https://em-content.zobj.net/source/animated-noto-color-emoji/356/thinking-face_1f914.gif" alt="curious-emoji" width="50">

- There are many techniques, for example:
    1. Popular one: **Row-echelon form** ([Gaussian elimination](https://en.wikipedia.org/wiki/Gaussian_elimination))
    2. **Matrix of Minors and Cofactors**: E.g., for a $2\times 2$ matrix,
$$
\boldsymbol{A}^{-1}=\frac{1}{a_{11} a_{22}-a_{12} a_{21}}\left[\begin{array}{cc}
a_{22} & -a_{12} \\
-a_{21} & a_{11}
\end{array}\right]
$$
    3. **LU Decomposition**: [Link](https://en.wikipedia.org/wiki/LU_decomposition)
    4. [**Eigenvalue Decomposition**](https://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix) (if matrix is diagonalizable)
    5. [**Singular Value Decomposition**](https://en.wikipedia.org/wiki/Singular_value_decomposition) (commonly used in numerical software to compute the pseudo-inverse, e.g. `np.linalg.pinv`).
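
As a small sketch of technique 2, the closed-form $2\times 2$ formula above can be coded directly and compared with `np.linalg.inv`. The helper name `inverse_2x2` is hypothetical and exists only for this illustration.

```python
import numpy as np

def inverse_2x2(M):
    """Invert a 2x2 matrix with the closed-form formula shown above."""
    a11, a12 = M[0]
    a21, a22 = M[1]
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular, so it has no inverse")
    return np.array([[a22, -a12], [-a21, a11]]) / det

# Coefficient matrix of the first example system.
A = np.array([[4, 4], [2, -4]])
A_inv = inverse_2x2(A)

print(f"A^-1 from the formula = \n{A_inv}")
print(f"Matches np.linalg.inv: {np.allclose(A_inv, np.linalg.inv(A))}")
print(f"A @ A^-1 = \n{A @ A_inv}")
```

The last line should print the identity matrix up to floating-point rounding.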

In [None]:
# Solving system of linear equations as mentioned above:
# Example 1:
A = np.array([[4, 4], [2, -4]])
b = np.array([[5], [1]])
# x = A^-1 * b
x = np.linalg.inv(A) @ b
print(f"Vector x = \n{x}")

Vector x = 
[[1.  ]
 [0.25]]


In [None]:
# Second problem
A = np.array([[2,3,5], [4,-2,-7], [9,5,-3]])
b = np.array([[1], [8], [2]])
x = np.linalg.inv(A) @ b
print(f"Vector x = \n{x}")

Vector x = 
[[ 2.44537815]
 [-3.28571429]
 [ 1.19327731]]
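
Forming $\boldsymbol{A}^{-1}$ explicitly mirrors the math, but it is not the only way to get $\boldsymbol{x}$. A hedged alternative sketch: `np.linalg.solve` solves $\boldsymbol{A x}=\boldsymbol{b}$ directly, which is usually the preferred route numerically. Substituting the result back into the system is a quick sanity check.

```python
import numpy as np

A = np.array([[2, 3, 5], [4, -2, -7], [9, 5, -3]])
b = np.array([[1], [8], [2]])

# Solve A x = b directly, without explicitly building A^-1.
x_solve = np.linalg.solve(A, b)
# Same system solved through the explicit inverse, for comparison.
x_inv = np.linalg.inv(A) @ b

print(f"x from np.linalg.solve = \n{x_solve}")
print(f"Both approaches agree: {np.allclose(x_solve, x_inv)}")
# Substituting x back should reproduce b (up to floating-point error).
print(f"A @ x reproduces b: {np.allclose(A @ x_solve, b)}")
```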


### Transpose

- **Definition (Transpose)**. For $\boldsymbol{A} \in \mathbb{R}^{m \times n}$ the matrix $\boldsymbol{B} \in \mathbb{R}^{n \times m}$ with $b_{i j}=a_{j i}$ is called the transpose of $\boldsymbol{A}$. We write $\boldsymbol{B}=\boldsymbol{A}^{\top}$.
\begin{equation}\boldsymbol{A} = \left[\begin{array}{rrr}
1 & 5 & 9 \\
2 & 6 & 10 \\
3 & 7 & 11 \\
4 & 8 & 12
\end{array}\right],\ \boldsymbol{A}^T = \left[\begin{array}{rrrr}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{array}\right]\end{equation}

- **Definition (Symmetric Matrix)**. A matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is symmetric if $\boldsymbol{A}=\boldsymbol{A}^{\top}$.
- **Definition (Skew-Symmetric Matrix)**. A matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is skew-symmetric if $\boldsymbol{A}=-\boldsymbol{A}^{\top}$.
$$\boldsymbol{A}_{\text{Symm}} = \left[\begin{array}{ccc}
1 & 1 & -1 \\
1 & 2 & 0 \\
-1 & 0 & 5
\end{array}\right],\ \boldsymbol{A}_{\text{SkewSymm}} = \left[\begin{array}{ccc}
0 & 1 & -2 \\
-1 & 0 & 3 \\
2 & -3 & 0
\end{array}\right]$$
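
Both definitions translate into one-line checks. The sketch below tests the two example matrices above; the variable names are chosen here for illustration only.

```python
import numpy as np

A_symm = np.array([[1, 1, -1], [1, 2, 0], [-1, 0, 5]])
A_skew = np.array([[0, 1, -2], [-1, 0, 3], [2, -3, 0]])

# Symmetric: the matrix equals its transpose.
is_symmetric = np.array_equal(A_symm, A_symm.T)
# Skew-symmetric: the matrix equals the negative of its transpose.
is_skew_symmetric = np.array_equal(A_skew, -A_skew.T)

print(f"A_symm is symmetric: {is_symmetric}")
print(f"A_skew is skew-symmetric: {is_skew_symmetric}")
# a_ii = -a_ii forces every diagonal entry of a skew-symmetric matrix to be 0.
print(f"Diagonal of A_skew: {np.diag(A_skew)}")
```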

In [None]:
# Transposing a matrix
A = np.array([[1,5,9], [2,6,10], [3,7,11], [4,8,12]])
print(f"A = \n{A}\n and A^T = \n{A.T}")

A = 
[[ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]
 [ 4  8 12]]
 and A^T = 
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


### Matrix as transformation (revisiting what we started):

- Consider a $2\times 2$ matrix: $\boldsymbol{A}=\left[\begin{array}{cc}
3 & 1 \\
1 & 2
\end{array}\right]$.

- $\boldsymbol{A}\left[\begin{array}{c}
1 \\
0
\end{array}\right]=\left[\begin{array}{cc}
3 & 1 \\
1 & 2
\end{array}\right]\left[\begin{array}{c}
1 \\
0
\end{array}\right] = \left[\begin{array}{c}
3 \\
1
\end{array}\right]$, and $\left[\begin{array}{cc}
3 & 1 \\
1 & 2
\end{array}\right]\left[\begin{array}{c}
0 \\
1
\end{array}\right] = \left[\begin{array}{c}
1 \\
2
\end{array}\right]$

- $\boldsymbol{A}$ sends $\left[\begin{array}{c}
1 \\0
\end{array}\right]$ (x-axis) $ ⟶ \left[\begin{array}{c}
3 \\
1
\end{array}\right]$ and $\left[\begin{array}{c}
0 \\1
\end{array}\right]$ (y-axis) $ ⟶ \left[\begin{array}{c}
1 \\
2
\end{array}\right]$

- $\boldsymbol{A}\left[\begin{array}{c}
x_1 \\
x_2
\end{array}\right] = \boldsymbol{A}\left(x_1\left[\begin{array}{c}
1 \\
0
\end{array}\right] + x_2\left[\begin{array}{c}
0 \\
1
\end{array}\right]\right) = x_1\left(\boldsymbol{A}\left[\begin{array}{c}
1 \\
0
\end{array}\right]\right) + x_2\left(\boldsymbol{A}\left[\begin{array}{c}
0 \\
1
\end{array}\right]\right) = x_1\left[\begin{array}{c}
3 \\
1
\end{array}\right] + x_2\left[\begin{array}{c}
1 \\
2
\end{array}\right] = \left[\begin{array}{c}
3 x_1 + x_2 \\
x_1 + 2 x_2
\end{array}\right]$

<center><img src="https://drive.google.com/uc?export=view&id=1ZqkP5ZasljmbWL9rsaKBORkWaBD1-xzy" height="300"></center>
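
The derivation above can be replayed in a few lines. This is a minimal sketch; the coordinates `x1` and `x2` are arbitrary values picked for illustration.

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])
e1 = np.array([1, 0])  # unit vector along the x-axis
e2 = np.array([0, 1])  # unit vector along the y-axis

# The images of the unit vectors are exactly the columns of A.
print(f"A @ e1 = {A @ e1}, first column of A  = {A[:, 0]}")
print(f"A @ e2 = {A @ e2}, second column of A = {A[:, 1]}")

# Linearity: A(x1*e1 + x2*e2) = x1*(A e1) + x2*(A e2).
x1, x2 = 2, -1  # arbitrary illustrative coordinates
lhs = A @ (x1 * e1 + x2 * e2)
rhs = x1 * (A @ e1) + x2 * (A @ e2)
print(f"A @ x = {lhs}, x1*(A e1) + x2*(A e2) = {rhs}")
```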

## Linear Independence (Intuition):

- High school: any vector in 2D can be written as $x\hat{i} + y\hat{j}$, which corresponds to the point with coordinates $(x, y)$ in the 2D plane. <center><img src="https://drive.google.com/uc?export=view&id=1jpTgEjosaEaQ2DWzCSmWkueJOC2IABJm"></center>

- What are these $\hat{i}$ and $\hat{j}$?

- Informally: x-axis and y-axis. Formally: Unit vectors along x-axis and y-axis (or $\hat{i} = \left[\begin{array}{c}
1 \\
0
\end{array}\right],\ \hat{j} = \left[\begin{array}{c}
0 \\
1
\end{array}\right]$).

- ***Can we write $\hat{j}$ in terms of $\hat{i}$***? **NEVER**

- This phenomenon is called ***Linear Independence***. Informally, if none of the vectors in a set can be expressed as a linear combination of the rest, then the set is linearly independent.

- Questions (are they linearly dependent or independent):
    - $4\hat{i} + 3\hat{j}$ and $\hat{i}$
    - $4\hat{i}$ and $\hat{i}$
    - $4\hat{i} + 3\hat{j}$ and $\hat{j}$
    - $3\hat{j}$ and $\hat{i}+\hat{j}$
    - \begin{equation}\left[\begin{array}{c} 1 \\ 2\end{array}\right]\text{and}\left[\begin{array}{c} 2 \\ 4\end{array}\right]\end{equation}
- This independence is essential, as we saw, to represent any vector in this 2D plane (or 2D space!). Vectors with this property that also reach every point of the space are what we call ***basis vectors*** (basis $←$ base/foundation of 2D space).

- Definition (Basis): A set of linearly independent vectors that spans the whole space.

- Ask yourself a question: *Can two linearly dependent vectors span the 2D space? Take, for example, $4\hat{i}$ and $\hat{i}$.*
<center><img src="https://drive.google.com/uc?export=view&id=1YbiQP6hBKELM0jyCwXeXB8BIbzzskfnU" alt="" height="300"></center>
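
One possible way to check answers to the questions above is to stack the vectors as columns of a matrix and compare its rank with the number of vectors. This is a sketch, not the only test; the helper name `are_linearly_independent` is made up here.

```python
import numpy as np

def are_linearly_independent(*vectors):
    """Return True if the given vectors are linearly independent.

    The vectors are stacked as columns; they are independent exactly when
    the rank of that matrix equals the number of vectors.
    """
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

i_hat = np.array([1, 0])
j_hat = np.array([0, 1])

print("i and j:          ", are_linearly_independent(i_hat, j_hat))
print("4i and i:         ", are_linearly_independent(4 * i_hat, i_hat))
print("3j and i + j:     ", are_linearly_independent(3 * j_hat, i_hat + j_hat))
print("[1, 2] and [2, 4]:", are_linearly_independent(np.array([1, 2]), np.array([2, 4])))
```

A rank of 2 for two vectors in the plane also means that they span the whole 2D space, which ties back to the definition of a basis.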

  

## Inner Product

### Dot-product

- Revisiting high school: given two vectors $\vec{a}$ and $\vec{b}$, the dot-product between them is $\vec{a}.\vec{b} = \|\vec{a}\|\|\vec{b}\|\cos\theta \stackrel{?}{=} a_1.b_1 + a_2.b_2 + \ldots + a_n.b_n = \boldsymbol{a}^T\boldsymbol{b}$.
- E.g., $\left[\begin{array}{l}
2 \\
7 \\
1
\end{array}\right] \cdot\left[\begin{array}{l}
8 \\
2 \\
8
\end{array}\right] = 2\times 8 + 7\times 2 + 1\times 8 = 38$

- What does that mean? <center><img src="https://drive.google.com/uc?export=view&id=1dgweXEDRVmvHjkZBS92qDiVOxKEcD0ij" width="450"></center>  Flash a light from the top to get the projection of $\vec{a}$ onto $\vec{b}$ ($\|\vec{a}\|\cos\theta$), then multiply it by $\|\vec{b}\|$. Note: $\cos\theta = \dfrac{\text{base}}{\text{hypotenuse}} = \dfrac{\text{base}}{\|\vec{a}\|}$; $\implies \text{base} = \|\vec{a}\|\cos\theta$ <center><img src="https://drive.google.com/uc?export=view&id=1MnHDDunb34XL0n2i6OBaC8YlKsSJw3vb" width="300"></center>

- What will happen if one of the vectors is perpendicular to the other (e.g., $\vec{a}\perp\vec{b}$)?

- What will be the dot-product of a vector with itself?
$\vec{a}.\vec{a} = a_1^2 + a_2^2 + \ldots + a_n^2 = \|\vec{a}\|\|\vec{a}\|\cos 0^\circ = \|\vec{a}\|^2$

- [Fun fact]: Did you notice something?\
$\|\vec{a}\|^2 = a_1^2 + a_2^2 + \ldots + a_n^2$. This is the definition of the (squared) magnitude of a vector, which follows from the *Pythagorean* theorem and gives the most familiar notion of distance in the real world, the ***Euclidean distance***.

- *For more intuition on why $\|\vec{a}\|\|\vec{b}\|\cos\theta = a_1.b_1 + a_2.b_2 + \ldots + a_n.b_n$, watch this NICE [video](https://www.youtube.com/watch?v=LyGKycYT2v0)*

In [None]:
# Dot Product
a = np.array([[2], [7], [1]])
b = np.array([[8], [2], [8]])
a_dot_b = a.T @ b
print(f"a.b = {a_dot_b}")

# Magnitude of a
a_dot_a = a.T @ a
print(f"Magnitude of vector a = {np.sqrt(a_dot_a)}")
print(f"Magnitude computed using np.linalg.norm for a = {np.linalg.norm(a)}")

a.b = [[38]]
Magnitude of vector a = [[7.34846923]]
Magnitude computed using np.linalg.norm for a = 7.3484692283495345
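
The relation $\vec{a}.\vec{b} = \|\vec{a}\|\|\vec{b}\|\cos\theta$ can also be turned around to recover the angle between two vectors. The sketch below does that for the vectors used above and then looks at the perpendicular case; `u` and `v` are example vectors chosen here for illustration.

```python
import numpy as np

a = np.array([2, 7, 1])
b = np.array([8, 2, 8])

# Rearranging a.b = |a||b|cos(theta) gives the angle between a and b.
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta_deg = np.degrees(np.arccos(cos_theta))
print(f"cos(theta) = {cos_theta:.4f}, theta = {theta_deg:.2f} degrees")

# Perpendicular vectors: cos(90 degrees) = 0, so the dot product vanishes.
u = np.array([1, 0, 0])
v = np.array([0, 3, 0])
print(f"u.v = {u @ v}")
```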


### Inner Product - $⟨.,.⟩$

- What we have just seen is one special form of inner product.
- In general, an inner product allows the introduction of intuitive geometrical concepts, such as the length of a vector and the angle or distance between two vectors.
- To generalize such concepts, we require a definite set of mathematical properties and a well-defined vector space (e.g., 2D space). For more info, follow this [link](https://en.wikipedia.org/wiki/Inner_product_space).
- For our use case, we will consider the simple properties that we have already witnessed:
    1. $⟨.,.⟩: V \times V → \mathbb{R}$: it takes two vectors and maps them to a real number (e.g., $\vec{a}.\vec{b}$ is a scalar), and it is linear in each argument.
    2. $⟨\vec{a},\vec{a}⟩\geq 0$, with equality only for $\vec{a} = \vec{0}$: the squared magnitude of a vector is never negative.
    3. $⟨\vec{a},\vec{b}⟩ = ⟨\vec{b},\vec{a}⟩$: order doesn't matter (symmetric/commutative).

- Example: Consider $V = \mathbb{R}^2$. If we define $$⟨\boldsymbol{x}, \boldsymbol{y}⟩ = x_1y_1 - (x_1y_2 + x_2y_1) + 2 x_2y_2$$ then $⟨.,.⟩$ is an inner product that's different from the dot-product.

- What is $\|\boldsymbol{x}\| = \sqrt{⟨\boldsymbol{x}, \boldsymbol{x}⟩}$ with above inner-product rule?

- In fact, every inner product induces its own norm (*the fancy mathematical term for magnitude*). A vector space together with a norm is called a *normed linear/vector space*; inner product spaces are a special case of these, and *Hilbert* spaces are inner product spaces that are also complete.

- Next, we'll discuss norms, most of which are not induced by an inner product (the exception is the Euclidean norm). *Feel free to skip it.*
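
To experiment with the example inner product above, note that it can be written as $\boldsymbol{x}^{\top}\boldsymbol{M}\boldsymbol{y}$ with $\boldsymbol{M} = \left[\begin{array}{cc} 1 & -1 \\ -1 & 2 \end{array}\right]$. The sketch below uses that form; the names `M`, `inner` and `induced_norm`, and the test vectors, are choices made here for illustration.

```python
import numpy as np

# <x, y> = x1*y1 - (x1*y2 + x2*y1) + 2*x2*y2 written as x^T M y.
M = np.array([[1, -1], [-1, 2]])

def inner(x, y):
    """Example inner product on R^2 from the text."""
    return x @ M @ y

def induced_norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return np.sqrt(inner(x, x))

x = np.array([1, 1])
y = np.array([3, -1])

print(f"<x, y> = {inner(x, y)}, <y, x> = {inner(y, x)}, dot product = {x @ y}")
print(f"Induced norm of x = {induced_norm(x)}, Euclidean norm of x = {np.linalg.norm(x):.4f}")
# Positive eigenvalues of the symmetric matrix M mean <x, x> > 0 for every x != 0.
print(f"Eigenvalues of M: {np.linalg.eigvalsh(M)}")
```

The same vector gets a different length under the two inner products, which is the sense in which each inner product induces its own norm.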


### [OPTIONAL] Norm - The Magnitude Calculator


- Who said that adding up squared components is the only way to measure distance? Squaring makes sense because the space we usually picture is *Euclidean*, but other choices are possible.

- $\|\vec{a}\|_p^p = |a_1|^p + |a_2|^p + \ldots + |a_n|^p$ is another way of looking at the magnitude of $\vec{a}$, i.e., $\|\vec{a}\|_p = \left(|a_1|^p + |a_2|^p + \ldots + |a_n|^p\right)^{1/p}$ for $p \geq 1$. Mathematically, this is called the $L^p$ norm; the corresponding distance between two vectors is the *Minkowski distance*.

- Similarly, the *Euclidean* norm is called the $L^2$ norm, while the $L^1$ norm is called the *Manhattan* norm.

- Such a way of computing distance or magnitude induces a mathematical space of its own, which is simply called an [$L^p$ space](https://en.wikipedia.org/wiki/Lp_space).
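
A minimal sketch of the $L^p$ norm formula, compared with `np.linalg.norm` and its `ord` argument. The helper `lp_norm` and the example vector are assumptions made for this illustration.

```python
import numpy as np

def lp_norm(a, p):
    """L^p norm: (sum_i |a_i|^p)^(1/p), for p >= 1."""
    return np.sum(np.abs(a) ** p) ** (1.0 / p)

# Example vector with a negative entry, to show why absolute values matter.
a = np.array([2, -7, 1])

for p in (1, 2, 3):
    by_hand = lp_norm(a, p)
    by_numpy = np.linalg.norm(a, ord=p)
    print(f"p = {p}: by hand = {by_hand:.4f}, np.linalg.norm = {by_numpy:.4f}")
```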

## Similarity Measure

- How similar or dissimilar two vectors are can be gauged through their relative distance in the space.

### Distance as Similarity

- **Euclidean Distance ($L^2$)**: Length of a segment connecting the two points. The most obvious way of representing distance between two points. Formula $ = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ <center><img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*I_CIWl4fQfsKHSYk90kgpw.png" width="300" alt="euclidean-dist"></center>
- **Manhattan distance ($L^1$)**: Also known as city block distance or $L^1$ distance, it measures the distance between two points by summing the absolute differences of their Cartesian coordinates. It represents the distance traveled along orthogonal axes in a grid-like pattern and is commonly used in image processing and clustering algorithms. Formula $ = |x_2 - x_1| + |y_2 - y_1|$ <center><img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*zEc7etVDTIjYtIlK3N2-4Q.png" width="300" alt="euclidean-dist"></center>

### Angle as Similarity

- **Cosine similarity $\left(\cos\theta = \dfrac{\vec{a}.\vec{b}}{\|\vec{a}\|\|\vec{b}\|}\right)$**: Cosine similarity measures the cosine of the angle between two vectors. It ranges from -1 to 1, where 1 indicates perfect similarity, 0 indicates no similarity, and -1 indicates perfect dissimilarity. Cosine similarity is widely used in text mining and information retrieval tasks. It is advantageous because even if the two similar documents are far apart by the Euclidean distance (due to the size of the document), chances are they may still be oriented closer together. <center><img src="https://www.oreilly.com/api/v2/epubs/9781788295758/files/assets/2b4a7a82-ad4c-4b2a-b808-e423a334de6f.png" width="300" alt="euclidean-dist"></center>

In [None]:
# Similarity measure between two vectors.
a = np.array([[2], [7], [1]])
b = np.array([[8], [2], [8]])

# Euclidean distance.
e_ab = np.sqrt(np.sum((b-a)**2))
print(f"Euclidean distance between a and b = {e_ab}")
print(f"Verify: Same as L2 norm of b-a vector = {np.linalg.norm(b-a)}")

# Manhattan distance.
m_ab = np.sum(np.abs(b-a))
print(f"Manhattan distance between a and b = {m_ab}")
print(f"Verify: Same as L1 norm of b-a vector = {np.linalg.norm(b-a, ord=1)}")

# Cosine similarity.
c_ab = (a.T @ b)/(np.linalg.norm(a)*np.linalg.norm(b))
print(f"Cosine Similarity between a and b = {c_ab}")
print(f"Verify: From sklearn package = {cosine_similarity(a.T, b.T)}")

Euclidean distance between a and b = 10.488088481701515
Verify: Same as L2 norm of b-a vector = 10.488088481701515
Manhattan distance between a and b = 18
Verify: Same as L1 norm of b-a vector = 18.0
Cosine Similarity between a and b = [[0.4500904]]
Verify: From sklearn package = [[0.4500904]]
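
The remark that similar documents can be far apart in Euclidean distance yet point in the same direction can be illustrated with a scaled copy of a vector. This is a toy sketch: `a_scaled` stands in for a much longer document with the same word proportions, which is an assumption made purely for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between u and v."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([2.0, 7.0, 1.0])
a_scaled = 10 * a  # same direction, ten times longer

print(f"Euclidean distance between a and 10*a = {np.linalg.norm(a_scaled - a):.4f}")
print(f"Cosine similarity between a and 10*a  = {cosine(a, a_scaled):.4f}")
```

The Euclidean distance is large, while the cosine similarity stays at 1 because the direction is unchanged.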


## Real-Valued function of $\boldsymbol{A}$
### Determinant of square-matrix  $\boldsymbol{A}$

$$
\operatorname{det}(\boldsymbol{A}):=\left|\begin{array}{cccc}
a_{11} & a_{12} & \ldots & a_{1 n} \\
a_{21} & a_{22} & \ldots & a_{2 n} \\
\vdots & & \ddots & \vdots \\
a_{n 1} & a_{n 2} & \ldots & a_{n n}
\end{array}\right| .
$$

- How to compute for a $2\times 2$ matrix?
$$
\operatorname{det}(\boldsymbol{A})=\left|\begin{array}{ll}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{array}\right|=a_{11} a_{22}-a_{12} a_{21} .
$$

- E.g., $\left|\begin{array}{ll}
2 & 1 \\
0 & 2
\end{array}\right|=2\times 2 - 1\times 0 = 4$

- But what's the REAL intuition behind the determinant?\
Let's decode it step by step:
    1. We all know what a matrix transformation is, right? <center><img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*ZvFXLtqy2_iGGs4rHNxhNg.png"></center>
    2. Look more closely at what it's doing: <center><img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*zDbJZGxtTi5lqrGebgmseQ.png"></center>
    3. Exactly! It looks like our chosen matrix stretches space apart. Whatever area in the input space we choose, it seems that after the transformation the area gets bigger. This is precisely what the determinant is! ***The determinant of a matrix is the factor by which areas are scaled by this matrix.***
    4. Because matrices are linear transformations, it is enough to know the scaling factor for one single area to know the scaling factor for all areas. <center><img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*QE93C-F8Pa0thOf9Up-BXQ.png"></center>
    5. The square spanned by the pink and blue unit vectors has an area of 1. After applying our matrix transformation, this square has turned into a parallelogram with base 2 and height 2, so it has an area of 4. This means that our matrix scales areas by a factor of 4. Therefore, the determinant of our matrix is 4. NEAT, right?
    6. BUT determinants can be **negative**, right? How do we deal with that? If a matrix has a negative determinant, it just means that the transformation reverses the orientation of space. <center><img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*r9UQamEXvdOfAGHB71dLSA.png"></center>
    7. More formally, the determinant is the signed volume of the parallelepiped formed by the columns of the matrix.
    8. Now a BIG one: `If a matrix has a determinant of 0 it is non-invertible.` WHY? Any idea?
    <!-- This means that the matrix scales all areas by a factor of 0, which in turn means that all areas become 0 after the transformation. This can only happen if the matrix squishes the whole space into a lower dimension. For example, the two-dimensional space would be squished into a single line or point and such a transformation cannot be undone. -->

- Properties of Determinant:
    1. The determinant of a matrix product is the product of the corresponding  determinants, $\operatorname{det}(\boldsymbol{A B})=\operatorname{det}(\boldsymbol{A}) \operatorname{det}(\boldsymbol{B})$.
    2. Determinants are invariant to transposition, i.e., $\operatorname{det}(\boldsymbol{A})=\operatorname{det}\left(\boldsymbol{A}^{\top}\right)$.
    3. If $\boldsymbol{A}$ is regular (invertible), then $\operatorname{det}\left(\boldsymbol{A}^{-1}\right)=\dfrac{1}{\operatorname{det}(\boldsymbol{A})}$.
    4. Multiplication of a column/row with $\lambda \in \mathbb{R}$ scales $\operatorname{det}(\boldsymbol{A})$ by $\lambda$. In particular, $\operatorname{det}(\lambda \boldsymbol{A})=\lambda^{n} \operatorname{det}(\boldsymbol{A})$.
    5. Swapping two rows/columns changes the sign of $\operatorname{det}(\boldsymbol{A})$.
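
The area-scaling intuition can be tested numerically. The sketch below transforms the corners of the unit square with the $2\times 2$ example matrix and measures the resulting area with the shoelace formula; `polygon_area` is a helper written for this illustration, and `S` is an example matrix with linearly dependent columns chosen here.

```python
import numpy as np

def polygon_area(points):
    """Signed area of a polygon from its vertices (shoelace formula)."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

A = np.array([[2, 1], [0, 2]])
# Corners of the unit square, counter-clockwise, one corner per row.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])
# Transform every corner: each row p becomes A @ p.
parallelogram = square @ A.T

print(f"Area before = {polygon_area(square)}, area after = {polygon_area(parallelogram)}")
print(f"det(A) = {np.linalg.det(A):.1f}")

# A matrix with linearly dependent columns flattens the square onto a line.
S = np.array([[1, 2], [2, 4]])
flattened = square @ S.T
print(f"Area after S = {polygon_area(flattened)}, det(S) = {np.linalg.det(S):.1f}")
```

An area of zero after the transformation hints at why a zero determinant rules out an inverse: the flattened square cannot be unfolded back.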

To learn more and gain a beautiful intuition, watch this [video](https://www.youtube.com/watch?v=Ip3X9LOh2dk).

In [None]:
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/Ip3X9LOh2dk?start=47" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>')

In [None]:
# Compute determinant of matrix...
A = np.array([[2,3,5], [4,-2,-7], [9,5,-3]])
print(f"A = \n{A}")

# Det
detA = np.linalg.det(A)
print(f"Det(A) = {detA:.3f}")

# Verifying properties...
B = np.array([[1,3,5], [4,-1,-7], [8,5,-3]])
print(f"1. Det(A) = {detA:.3f}, Det(B) = {np.linalg.det(B):.3f}, Det(AB) = {np.linalg.det(A @ B):.3f}")
print(f"2. Det(A) = {detA:.3f}, Det(A^T) = {np.linalg.det(A.T):.3f}")
print(f"3. Det(A^-1) = {np.linalg.det(np.linalg.inv(A)):.3f}, 1/Det(A) =  {1/detA:.3f}")
print(f"4. Det(2*A) = {np.linalg.det(2*A):.3f}, (2^3)*Det(A) = {(2**3)*detA:.3f}")
# Swapping first and second row
A[[0,1]] = A[[1,0]]
print(f"5. Det(A) = {detA:.3f}, swapping 1st and 2nd row = {np.linalg.det(A):.3f}")

A = 
[[ 2  3  5]
 [ 4 -2 -7]
 [ 9  5 -3]]
Det(A) = 119.000
1. Det(A) = 119.000, Det(B) = 46.000, Det(AB) = 5474.000
2. Det(A) = 119.000, Det(A^T) = 119.000
3. Det(A^-1) = 0.008, 1/Det(A) =  0.008
4. Det(2*A) = 952.000, (2^3)*Det(A) = 952.000
5. Det(A) = 119.000, swapping 1st and 2nd row = -119.000


### [OPTIONAL]: Trace of $\boldsymbol{A}$

**Definition**: The trace of a square matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ is defined as $$
\operatorname{tr}(\boldsymbol{A}):=\sum_{i=1}^{n} a_{i i}
$$ i.e., the trace is the sum of the principal diagonal elements of $\boldsymbol{A}$.

The trace satisfies the following properties:
- $\operatorname{tr}(\boldsymbol{A}+\boldsymbol{B})=\operatorname{tr}(\boldsymbol{A})+\operatorname{tr}(\boldsymbol{B})$ for $\boldsymbol{A}, \boldsymbol{B} \in \mathbb{R}^{n \times n}$
- $\operatorname{tr}(\alpha \boldsymbol{A})=\alpha \operatorname{tr}(\boldsymbol{A}), \alpha \in \mathbb{R}$ for $\boldsymbol{A} \in \mathbb{R}^{n \times n}$
- $\operatorname{tr}\left(\boldsymbol{I}_{n}\right)=n$
- $\operatorname{tr}(\boldsymbol{A B})=\operatorname{tr}(\boldsymbol{B} \boldsymbol{A})$ for $\boldsymbol{A} \in \mathbb{R}^{n \times k}, \boldsymbol{B} \in \mathbb{R}^{k \times n}$

In [None]:
# Compute trace of Matrix...
A = np.array([[2,3,5], [4,-2,-7], [9,5,-3]])
print(f"A = \n{A}")
B = np.array([[1,3,5], [4,1,-7], [8,5,-3]])
print(f"B = \n{B}")

# Trace...
trA = np.sum(np.diag(A))
trB = np.sum(np.diag(B))
print(f"Trace(A) = {trA}\nTrace(B) = {trB}")

# Verify properties...
print(f"1. Trace(A+B) = {np.trace(A+B)}, and Trace(A) + Trace(B) = {trA + trB}")
print(f"2. Trace(2*A) = {np.trace(2*A)}, and 2*Trace(A) = {2*trA}")
print(f"3. Trace(I_3) = {np.trace(np.eye(3))}")
print(f"4. Trace(AB) = {np.trace(A @ B)}, and Trace(BA) = {np.trace(B @ A)}")

A = 
[[ 2  3  5]
 [ 4 -2 -7]
 [ 9  5 -3]]
B = 
[[ 1  3  5]
 [ 4  1 -7]
 [ 8  5 -3]]
Trace(A) = -3
Trace(B) = -1
1. Trace(A+B) = -4, and Trace(A) + Trace(B) = -4
2. Trace(2*A) = -6, and 2*Trace(A) = -6
3. Trace(I_3) = 3.0
4. Trace(AB) = 48, and Trace(BA) = 48


## Eigen-Values and Eigen-Vectors

$$
\boldsymbol{A x}=\lambda \boldsymbol{x} .
$$
where,
1. $\lambda$ is an eigenvalue of $\boldsymbol{A} \in \mathbb{R}^{n \times n}$.
2. $\boldsymbol{x} \in \mathbb{R}^{n} \backslash\{\mathbf{0}\}$ is a corresponding eigenvector, i.e., a vector that satisfies $\boldsymbol{A x}=\lambda \boldsymbol{x}$; equivalently, $\left(\boldsymbol{A} - \lambda \boldsymbol{I}_{n}\right) \boldsymbol{x}=\mathbf{0}$ can be solved non-trivially, i.e., with $\boldsymbol{x} \neq \mathbf{0}$.

- First of all, what is *eigen*? Eigen is a German word meaning "characteristic", "self", or "own". WHY such words?
<!-- It is because after transformation vector got knocked off from its span. But there are some, who remains in their span or same line (direction can also change) -->

In [None]:
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/PFDu9oVAE-g?start=81" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>')

**How to solve for eigen-values and eigen-vectors?**

$$
\begin{aligned}
\boldsymbol{A} \boldsymbol{x}=\lambda \boldsymbol{x} & \Longleftrightarrow \boldsymbol{A} \boldsymbol{x}-\lambda \boldsymbol{x}=\mathbf{0} \\
& \Longleftrightarrow(\boldsymbol{A}-\lambda \boldsymbol{I}) \boldsymbol{x}=\mathbf{0}
\end{aligned}
$$

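- Now consider this as a system of linear equations (SLE) in which $\lambda$ as well as the coordinates of $\boldsymbol{x}$ are unknown, and we need to solve for them. A non-trivial solution $\boldsymbol{x} \neq \mathbf{0}$ exists only if $\boldsymbol{A}-\lambda \boldsymbol{I}$ is non-invertible, i.e., $\operatorname{det}(\boldsymbol{A}-\lambda \boldsymbol{I}) = 0$. This is what is popularly called the `Characteristic equation`; its roots are the eigenvalues.

- Let's do one in code.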

In [None]:
# Compute eigen-values and eigen-vectors (space = R^3)
# Note: Throughout this session we deal with real spaces ONLY.
A = np.array([[-1, -2, 2], [4,3,-4], [0,-2,1]])
print(f"A = \n{A}")

# Compute eigen values and vectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigen-values of A:\n{eigenvalues}")
print(f"Eigen-vectors of A:\n{eigenvectors}")

# Verify:
print(f"A*v1 = \n{A @ eigenvectors[:,0:1]},\nand 3*v1 = \n{3*eigenvectors[:,0:1]}\n\n")
print(f"A*v2 = \n{A @ eigenvectors[:,1:2]},\nand 1*v2 = \n{1*eigenvectors[:,1:2]}\n\n")
print(f"A*v3 = \n{A @ eigenvectors[:,2:3]},\nand -1*v3 = \n{-1*eigenvectors[:,2:3]}\n\n")

A = 
[[-1 -2  2]
 [ 4  3 -4]
 [ 0 -2  1]]
Eigen-values of A:
[ 3.  1. -1.]
Eigen-vectors of A:
[[-5.77350269e-01 -7.07106781e-01  0.00000000e+00]
 [ 5.77350269e-01 -2.56395025e-16  7.07106781e-01]
 [-5.77350269e-01 -7.07106781e-01  7.07106781e-01]]
A*v1 = 
[[-1.73205081]
 [ 1.73205081]
 [-1.73205081]],
and 3*v1 = 
[[-1.73205081]
 [ 1.73205081]
 [-1.73205081]]


A*v2 = 
[[-7.07106781e-01]
 [-8.88178420e-16]
 [-7.07106781e-01]],
and 1*v2 = 
[[-7.07106781e-01]
 [-2.56395025e-16]
 [-7.07106781e-01]]


A*v3 = 
[[-4.44089210e-16]
 [-7.07106781e-01]
 [-7.07106781e-01]],
and -1*v3 = 
[[-0.        ]
 [-0.70710678]
 [-0.70710678]]
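
For a $2\times 2$ matrix the eigenvalues can also be found by hand, because $\operatorname{det}(\boldsymbol{A}-\lambda \boldsymbol{I}) = \lambda^2 - \operatorname{tr}(\boldsymbol{A})\lambda + \operatorname{det}(\boldsymbol{A})$ is a quadratic in $\lambda$. The sketch below does this for the matrix from the transformation section and compares the roots with `np.linalg.eigvals`; it assumes real roots, which holds for this symmetric example.

```python
import numpy as np

A = np.array([[3, 1], [1, 2]])

# For a 2x2 matrix: det(A - lambda*I) = lambda^2 - tr(A)*lambda + det(A).
trace = np.trace(A)
det = np.linalg.det(A)

# Roots of the quadratic (assumes a non-negative discriminant, i.e. real roots).
disc = np.sqrt(trace**2 - 4 * det)
lam1 = (trace + disc) / 2
lam2 = (trace - disc) / 2

eigs = np.linalg.eigvals(A)
print(f"Roots of the quadratic  = {np.sort([lam1, lam2])}")
print(f"np.linalg.eigvals(A)    = {np.sort(eigs)}")

# Each eigenvalue makes A - lambda*I singular (determinant close to 0).
for lam in (lam1, lam2):
    print(f"det(A - {lam:.4f} I) = {np.linalg.det(A - lam * np.eye(2)):.2e}")
```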




# Probability and Statistics

Contents:
1. Joint probability
2. Conditional probability
3. Marginal probability

## Joint Distribution: $P(X, Y)$

- Consider the discrete joint probability table below (all entries sum to $1$).

| P(X,Y) 	| Y = 1 	| Y = 2 	| Y = 3 	|      	|
|:------:	|:-----:	|:-----:	|:-----:	|:----:	|
|  X = 1 	|  1/6  	|  1/6  	|   0   	|  1/3 	|
|  X = 2 	|   0   	|  1/6  	|  1/4  	| 5/12 	|
|  X = 3 	|  1/12 	|   0   	|  1/6  	|  1/4 	|
|       	|  1/4  	|  1/3  	|  5/12 	|   1  	|

- When $X$ and $Y$ jointly follow a *Gaussian* distribution: <center><img src="https://drive.google.com/uc?export=view&id=1jo1Pq52orFeMBmMXmre6RaVHHI2SuDo9" width="400"></center>
<center><img src="https://drive.google.com/uc?export=view&id=1SWNWacOP9uvHfaDGCSEejhBE2pQmrFxB" width="400"></center>

## Marginal Distribution: $P(X)$ or $P(Y)$

|  P(X,Y)  	|  Y = 1  	|  Y = 2  	|   Y = 3  	| **P(X)** 	|
|:--------:	|:-------:	|:-------:	|:--------:	|:--------:	|
|   X = 1  	|   1/6   	|   1/6   	|     0    	|  **1/3** 	|
|   X = 2  	|    0    	|   1/6   	|    1/4   	| **5/12** 	|
|   X = 3  	|   1/12  	|    0    	|    1/6   	|  **1/4** 	|
| **P(Y)** 	| **1/4** 	| **1/3** 	| **5/12** 	|   **1**  	|

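- Why "marginal"? Because they appear in the margins of the table!😸
- Each marginal is obtained by summing the joint distribution over the other variable: $P(X = x) = \sum_{y} P(X = x, Y = y)$ and $P(Y = y) = \sum_{x} P(X = x, Y = y)$.
- In terms of a joint Gaussian distribution: the projection of the joint distribution onto the x-z plane and the y-z plane.
- In terms of Venn diagrams: $Area(A)$ and $Area(B)$.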
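
The marginals in the table can be reproduced by summing rows and columns of the joint table. A minimal sketch using exact fractions; the nested-list layout of `joint` is a representation chosen here for illustration.

```python
from fractions import Fraction as F

# Joint table P(X, Y): rows are X = 1, 2, 3 and columns are Y = 1, 2, 3.
joint = [
    [F(1, 6),  F(1, 6), F(0)],
    [F(0),     F(1, 6), F(1, 4)],
    [F(1, 12), F(0),    F(1, 6)],
]

# P(X): sum each row over all values of Y.
p_x = [sum(row) for row in joint]
# P(Y): sum each column over all values of X.
p_y = [sum(col) for col in zip(*joint)]
total = sum(p_x)

print("P(X) =", [str(p) for p in p_x])
print("P(Y) =", [str(p) for p in p_y])
print("Total =", total)
```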

## Conditional Distribution: $P(X|Y)$ or $P(Y|X)$
- In terms of discrete distribution: What is $P(X|Y = 3)$?

|  P(X,Y)  	     |   Y = 3  	|
|:-------------: |:--------:	|
|   P(X = 1, Y=3)|     0    	|
|   P(X = 2, Y=3)|    1/4   	|
|   P(X = 3, Y=3)|    1/6   	|
| P(Y = 3)       | **5/12** 	|

> We haven't answered the above question yet; we will after the next illustration.

- In terms of Venn, <center><img src="https://drive.google.com/uc?export=view&id=1lSGMslKAphvdFU6E5jptJBPHjEGv_y-h" width="400"></center>

- Now, $P(X|Y = 3) = \dfrac{P(X, Y = 3)}{P(Y = 3)}$ is as follows, and it sums to $1$.

| Conditional probability | Value |
|:-----------------------:|:-----:|
| $P(X = 1 \mid Y = 3)$   |   0   |
| $P(X = 2 \mid Y = 3)$   |  3/5  |
| $P(X = 3 \mid Y = 3)$   |  2/5  |
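
The conditional values come from dividing the $Y = 3$ column of the joint table by $P(Y = 3)$. A short sketch with exact fractions; the list layout of `joint` and the name `y_index` are choices made here for illustration.

```python
from fractions import Fraction as F

# Joint table P(X, Y): rows are X = 1, 2, 3 and columns are Y = 1, 2, 3.
joint = [
    [F(1, 6),  F(1, 6), F(0)],
    [F(0),     F(1, 6), F(1, 4)],
    [F(1, 12), F(0),    F(1, 6)],
]

y_index = 2  # zero-based column index for Y = 3

# Marginal P(Y = 3): sum the Y = 3 column over all X.
p_y3 = sum(row[y_index] for row in joint)
# Conditional P(X | Y = 3) = P(X, Y = 3) / P(Y = 3).
p_x_given_y3 = [row[y_index] / p_y3 for row in joint]

print("P(Y = 3)     =", p_y3)
print("P(X | Y = 3) =", [str(p) for p in p_x_given_y3])
print("Sum          =", sum(p_x_given_y3))
```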




# Extra Resources:

1. https://bvanderlei.github.io/jupyter-guide-to-linear-algebra/
2. https://notebook.community/dcavar/python-tutorial-for-ipython/notebooks/Linear%20Algebra
3. http://rlhick.people.wm.edu/stories/linear-algebra-python-basics.html
4. https://colab.research.google.com/github/jonkrohn/ML-foundations/blob/master/notebooks/1-intro-to-linear-algebra.ipynb
5. https://pythonnumericalmethods.berkeley.edu/notebooks/chapter14.01-Basics-of-Linear-Algebra.html

<!-- # ## Linear Algebra
# - System of Linear equations
# - Formulting matrix-vector product and solve for them...
# - connecting it to intuition of line intersection and how matrix transforms one vector to another.
# - Linear independence intuition - via - 2D plane and two vectors.
# - Inner products - interms of geometry
# - Norms - originate from inner products.
# - Angles -> Cosine similarity
# - Orthogonality -> Least pojection
# - Determinant - Physical intuition
# - Traces - "
# - Lastly - What are eigenvalues and eigenvectors - Via intuition

# ## Prob and Stats
# - What are conditional, marginal, joint. -->

