# <b> Line passing through two vectors

Let there be two vectors:  
$v = [a, b, c]$  
$u = [x, y, z]$  

<font color=red><b>Line passing through "v" in the direction of "u"</b></font>  
<h3>$\boxed{v + \alpha u}$
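
A minimal sketch of the formula $v + \alpha u$, assuming NumPy is available. The helper name `point_on_line` and the sample values of `v` and `u` are illustrative and not part of the notes.

```python
import numpy as np

def point_on_line(v, u, alpha):
    """Return the point v + alpha * u on the line through v with direction u."""
    return np.asarray(v, dtype=float) + alpha * np.asarray(u, dtype=float)

v = [1.0, 2.0, 3.0]    # assumed sample point
u = [0.0, 1.0, -1.0]   # assumed sample direction

# Sweeping alpha moves along the line; alpha = 0 gives v itself.
for alpha in (0.0, 1.0, 2.0):
    print(alpha, point_on_line(v, u, alpha))
```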

# <b>Orthogonal Matrices

![image.png](attachment:ea9f7feb-0722-4720-a46f-1ad3c3a7ef12.png)

# <b>Gradient -> <font color=red>Rate of Change is most rapid

<img src="https://th.bing.com/th/id/OIP.gKmqAwvmMy1bAm63Apz0SAHaE8?rs=1&pid=ImgDetMain" width="400" height="400">

<h3> Gradient is the <font color=red>Steepest Path</font> 

![image.png](attachment:67b17add-5190-4f94-b5e0-fe38ac998eee.png)

![image.png](attachment:f99f9589-dbf4-4d2c-9131-dd95ebe84d54.png)

The function $f$ increases most rapidly in the direction of the gradient $\nabla f$.  

The gradient is $\nabla f = [2x, 4y, -6z]$  

At the point $P_0 = [1, 1, 1]$, the gradient of $f$ is $\nabla f(1, 1, 1) = \langle 2, 4, -6 \rangle$.  

The rate of change of $f$ in the direction where $f$ increases most rapidly at $P_0$ is given by the magnitude of the gradient at $P_0$, which is:  

$||\nabla f(P_0)|| = \sqrt{(2)^2 + (4)^2 + (-6)^2} = \sqrt{4 + 16 + 36} = \sqrt{56}$  

So, the rate of change of $f$ in the direction where $f$ increases most rapidly at $P_0$ is $\sqrt{56}$.
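
The calculation above can be checked numerically. This is a small sketch that only uses the gradient $[2x, 4y, -6z]$ stated in the notes; the function name `grad_f` is an illustrative choice.

```python
import numpy as np

def grad_f(p):
    """Gradient [2x, 4y, -6z] as given in the notes."""
    x, y, z = p
    return np.array([2.0 * x, 4.0 * y, -6.0 * z])

P0 = np.array([1.0, 1.0, 1.0])
g = grad_f(P0)                 # gradient at P0
max_rate = np.linalg.norm(g)   # largest rate of change at P0
unit_dir = g / max_rate        # unit vector of steepest ascent

print(g, max_rate, unit_dir)
```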

# <b> Direction of Steepest Ascent

![image.png](attachment:f66e89d6-2686-4abb-ada0-42ef0b677517.png)

# <b>Directional Derivative

$$D_v f(P) = v^T \cdot \nabla f(P) = \begin{bmatrix} a & b & c\end{bmatrix} \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \frac{\partial f}{\partial z} \end{bmatrix}_{P} = \text{Scalar Dot Product}$$

1. There will be a function <b>f</b>
2. There will be a point/vector P at which the gradient is evaluated
3. And there will be another vector $v = [a, b, c]$ (normalized to unit length) in whose direction you need to compute the directional derivative.
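
The three ingredients above can be sketched in a few lines. The gradient value reuses $\langle 2, 4, -6 \rangle$ from the earlier section; the direction `[1, 2, 2]` and the helper name `directional_derivative` are assumptions made for illustration. The direction is normalized first, which is the usual convention.

```python
import numpy as np

def directional_derivative(grad_at_p, v, normalize=True):
    """Dot product of the (optionally normalized) direction v with the gradient at P."""
    grad_at_p = np.asarray(grad_at_p, dtype=float)
    v = np.asarray(v, dtype=float)
    if normalize:
        v = v / np.linalg.norm(v)   # make v a unit vector
    return float(v @ grad_at_p)

grad_at_p = [2.0, 4.0, -6.0]   # gradient at P, taken from the earlier example
v = [1.0, 2.0, 2.0]            # assumed direction, length 3

d_unit = directional_derivative(grad_at_p, v)                  # uses v / |v|
d_raw = directional_derivative(grad_at_p, v, normalize=False)  # uses v as given
print(d_unit, d_raw)
```

Skipping the normalization scales the result by the length of `v`, so the two values differ by a factor of 3 here.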

![image.png](attachment:7281400e-7389-40e0-8cca-95ac85facfed.png)  
![image.png](attachment:24c8090e-96ce-416e-9b92-ed559fcaa13a.png)

![image.png](attachment:a2f075c1-ae2f-40b3-88b7-6aae13290a88.png)  
![image.png](attachment:5231e130-6e2f-4cce-bfd1-e4e8ed6df9cf.png)  
![image.png](attachment:01c1bafa-c7a9-4e3f-8c4c-625bf06841a5.png)

# **Gradient Descent**

### <h2> Why choose Gradient Descent over Direct Differentiation?

Gradient descent is often preferred over directly differentiating the function and finding the points of local/global maxima or minima for several reasons:

1. **Computational Efficiency**: For high-dimensional data, finding the derivative of the function and solving for the critical points can be computationally expensive or even infeasible. Gradient descent, on the other hand, is an iterative method that makes small adjustments based on the local gradient, which can be more efficient.

2. **Non-Differentiable Functions**: Not all functions are differentiable everywhere. Gradient descent can still be applied to these functions using subgradients.

3. **Online Learning**: In some machine learning tasks, the data arrives sequentially (online learning). In these cases, the model needs to update as new data comes in. Gradient descent naturally supports this, while solving the derivative equation does not.

### $\boxed{x_{n+1} = x_n - \eta ∇f}$

### <u>**Gradient of a Function**</u>  $$∇f = \begin{bmatrix}{\frac{\partial f}{\partial x}} \\\ {\frac{\partial f}{\partial y}} \\\ {\frac{\partial f}{\partial z}} \end{bmatrix}$$

### <h3> <b>Properties

![image.png](attachment:f4ebfd39-2814-42f0-9411-4402691e86e4.png)

![image.png](attachment:2ae0fbf9-9910-4b73-bc14-f1508136be5f.png)

##### <h3>Code

    """
    Gradient Descent Algorithm Implementation

    Parameters:
    func: The function to minimize
    grad_func: The gradient of the function
    initial_guess: The initial point to start the gradient descent
    step_size: The step size for each iteration
    k: The number of iterations to perform

    Returns:
    The point after k iterations
    """

    # Notebook shell command: install SymPy
    !mamba install sympy -y

    from sympy import symbols, diff, lambdify
    import numpy as np

    # Define your variables
    x, y = symbols('x y')

    # Define your function
    func = 8*x**2 + 2*x*y - y**2

    # Compute the gradient
    grad_expr = [diff(func, var) for var in (x, y)]

    # Convert symbolic expressions to functions
    func = lambdify((x, y), func, 'numpy')
    grad_func = [lambdify((x, y), expr, 'numpy') for expr in grad_expr]

    def gradient_descent_kth(func, grad_func, initial_guess, step_size, k):
        # Start from the initial guess and take k fixed-size steps
        current_point = np.array(initial_guess)
        for i in range(k):
            # Evaluate every partial derivative at the current point
            grad = np.array([g(*current_point) for g in grad_func])
            next_point = current_point - step_size * grad
            current_point = next_point
        return current_point

    initial_guess = [0, 1]
    step_size = 0.5
    k = 2
    kth_point = gradient_descent_kth(func, grad_func, initial_guess, step_size, k)
    print(f"The point after {k} iterations is {kth_point}")

Output:

    The point after 2 iterations is [5. 5.]



#### **Example 1**

![image.png](attachment:image.png)

$\boxed{x_{n+1} = x_n - \eta ∇f}$  

$x_n = \begin{bmatrix}1\\\ 1\\\ 1 \end{bmatrix}$  

$\eta = 1$  

$∇f = \begin{bmatrix}f_{x}\\\ f_{y}\\\ f_{z}\end{bmatrix} = \begin{bmatrix}2x_1 -2x_2 - 2x_3\\\ 2x_2 -2x_1 - 2x_3\\\ 2x_3 -2x_2 - 2x_1\end{bmatrix}$  

$∇f = \begin{bmatrix}2(x_1 -x_2 - x_3)\\\ 2(x_2 -x_1 - x_3)\\\ 2(x_3 -x_2 - x_1)\end{bmatrix}$  

$∇f = \begin{bmatrix}2(1 -1 - 1)\\\ 2(1 - 1 - 1)\\\ 2(1 - 1 - 1)\end{bmatrix} = \begin{bmatrix}-2\\\ -2\\\ -2\end{bmatrix}$

$x_{n+1} = x_n - \eta ∇f$  

$x_{n+1} = \begin{bmatrix}1\\\ 1\\\ 1 \end{bmatrix} - 1 \cdot \begin{bmatrix}-2\\\ -2\\\ -2\end{bmatrix}$  

$x_{n+1} = \begin{bmatrix}3\\\ 3\\\ 3 \end{bmatrix}$

##### **Value at** $\boxed{x_{n+1} = (3, 3, 3)}$

$f(3, 3, 3) = 9 + 9 + 9 - 18 - 18 - 18  = -27$
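
A quick numerical check of this step. The function $f$ itself is only shown in the image, so the definition below is an assumption reconstructed from the gradient and from the evaluation $9 + 9 + 9 - 18 - 18 - 18$.

```python
import numpy as np

def f(x):
    """Assumed f, reconstructed from the gradient used in the notes."""
    x1, x2, x3 = x
    return x1**2 + x2**2 + x3**2 - 2*x1*x2 - 2*x2*x3 - 2*x1*x3

def grad_f(x):
    """Gradient as written in the notes."""
    x1, x2, x3 = x
    return np.array([2*(x1 - x2 - x3), 2*(x2 - x1 - x3), 2*(x3 - x2 - x1)], dtype=float)

x_n = np.array([1.0, 1.0, 1.0])
eta = 1.0
x_next = x_n - eta * grad_f(x_n)   # one gradient descent step

print(x_next, f(x_next))
```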

#### <b> Example 2

The value of $f(x_1, x_2) = 4x_1^2 - 4x_1x_2 + 2x_2^2$, starting from an initial guess of $(2, 3)$, after two iterations of the gradient descent algorithm will be .............  

Take the step $\eta = \frac{1}{t}$, where t = no. of iterations.

$$x_{n+1} = x_n - \eta \nabla f$$
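
One way to work this example is sketched below. It assumes that $t$ is the index of the current iteration, counted from 1, so the step sizes are $1$ and $\frac{1}{2}$. If the question instead intends a fixed step based on the total number of iterations, the result will differ.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return 4*x1**2 - 4*x1*x2 + 2*x2**2

def grad_f(x):
    """Analytic gradient of f: [8*x1 - 4*x2, -4*x1 + 4*x2]."""
    x1, x2 = x
    return np.array([8*x1 - 4*x2, -4*x1 + 4*x2], dtype=float)

x = np.array([2.0, 3.0])   # initial guess from the problem
iterates = []
for t in (1, 2):           # assumption: t counts iterations starting at 1
    eta = 1.0 / t
    x = x - eta * grad_f(x)
    iterates.append(x.copy())

value = f(x)
print(iterates, value)
```

Under this reading the iterates are $(-2, -1)$ and then $(4, -3)$, where $f = 130$.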

# **Remember Con**$\textcolor{brown}{'CAVE'}$

![image.png](attachment:image.png)

# **Lagrangian Optimization**

### $$\boxed{∇f = \lambda ∇g}$$

## <u>**Example 1**

### A box (cuboid shaped) is to be made out of cardboard with a total area of 24 $cm^2$. Find the maximum volume of the box.

<b>ASK: "WHAT IS SAME FOR BOTH THE CARDBOARD AND THE BOX --> AREA"</b>  

Given Area of Cardboard = 24 $cm^2$  

The area of box should be = Area of the cardboard  
$2xy + 2xz + 2yz = 24$  
$$xy + xz + yz = 12$$  


$$\text{Volume of box (to be maximized)} = xyz$$  

#### Either of the two functions may be called **f** or **g**; here **f** is the volume and **g** is the constraint

Let  
$f(x, y, z) = xyz$  
$g(x, y, z) = xy + xz + yz$

#### **$The$ Lagrangian**  $\boxed{∇f = \lambda ∇g}$

$\begin{bmatrix}yz\\\ xz\\\ xy\end{bmatrix} = \lambda \cdot \begin{bmatrix}y + z\\\ x + z\\\ x + y\end{bmatrix}$

#### **Find $\lambda$ equations**

$yz = \lambda (y + z)$  

$\lambda = \frac{yz}{y + z}$...........................(1) 

$xz = \lambda (x + z)$  

$\lambda = \frac{xz}{x + z}$...........................(2)

$xy = \lambda (x + y)$  

$\lambda = \frac{xy}{x + y}$...........................(3)

#### **Equating the equations**

##### <u> 1 & 3

$\frac{yz}{y + z} = \frac{xy}{x + y}$  

$\frac{z}{y + z} = \frac{x}{x + y}$  

$xz + yz = xy + xz$  

$yz = xy$  

$\textbf{\textcolor{red}{z = x}}$  

#### <u> 2 & 3 

$\frac{xz}{x + z} = \frac{xy}{x + y}$  

$\frac{z}{x + z} = \frac{y}{x + y}$  

$xz + yz = xy + yz$  

$xz = xy$  

$\textbf{\textcolor{red}{z = y}}$

#### **Equating all**

$\textbf{\textcolor{red}{x = y = z}}$

### **Substituting in g(x)**

$xy + yz + zx = 12$  

$x^2 + x^2 + x^2 = 12$  

$3x^2 = 12$  

$x^2 = 4$  

$x = +2$

### **Maximum Volume**

$$Volume = xyz = 2 \cdot 2 \cdot 2 = \textcolor{blue}8$$
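
The result can be checked symbolically. This sketch assumes SymPy is installed, reuses the symmetry $x = y = z$ derived above to solve the constraint, and then confirms that the stationarity conditions hold at the candidate point. All variable names are illustrative.

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lambda', positive=True)
f = x*y*z                  # volume to maximize
g = x*y + x*z + y*z        # constraint: g = 12

# Stationarity conditions: grad f - lambda * grad g = 0, one per variable.
stationarity = [sp.diff(f, s) - lam*sp.diff(g, s) for s in (x, y, z)]

# With x = y = z the constraint becomes 3*x**2 = 12; only the positive root is kept.
side = sp.solve(sp.Eq(g.subs({y: x, z: x}), 12), x)
point = {x: side[0], y: side[0], z: side[0]}

# Recover lambda from the first condition and check all three residuals.
lam_value = sp.solve(stationarity[0].subs(point), lam)[0]
residuals = [eq.subs(point).subs(lam, lam_value) for eq in stationarity]
max_volume = f.subs(point)

print(side, lam_value, residuals, max_volume)
```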

## <u>**Example 2**

![image.png](attachment:image.png)

### **Given**

##### **Equation of Ellipse**- $\boxed{\frac{x^2}{2} + y^2 = 1}$

##### **All we need to know Area of rectangle inscribed in a ellipse = $\boxed{4xy}$**

![image.png](attachment:image.png)

### **Apply Lagrangian**

$∇f(x) = \lambda ∇g(x)$  

$\begin{bmatrix}x\\ 2y \end{bmatrix} = \lambda \cdot \begin{bmatrix}4y\\ 4x \end{bmatrix}$  

$x = \lambda \cdot 4y$  


$\frac{x}{4y} = \lambda$  

$2y = \lambda \cdot 4x$  

$\frac{y}{2x} = \lambda$  


$\therefore \frac{y}{2x} = \frac{x}{4y}$  

$\therefore \boxed{x = \sqrt2 y}$

### **Substituting $\boxed{x = \sqrt2 y }\text{ in }\boxed{\frac{x^2}{2} + y^2 = 1}$**

$\frac{2y^2}{2} + y^2 = 1$  

$y^2 + y^2 = 1$  

$2y^2 = 1$  

$\therefore y = \frac{1}{\sqrt{2}}$  

$\therefore x = \sqrt2 y = 1$  

### **Substitute to find the Area**

$$Area = 4xy = 4 \cdot 1 \cdot \frac{1}{\sqrt{2}}$$
$$Area = \frac{4}{\sqrt2} = 2\sqrt2 \approx 2.83$$

#### $\textbf{\textcolor{red}{Just remember }} \boxed{\text{Area of Rectangle in Ellipse} = 4xy}$
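
As a numerical cross-check, the first-quadrant part of the ellipse can be parametrized as $x = \sqrt2 \cos t$, $y = \sin t$ and the area $4xy$ scanned over a grid. The parametrization and the grid size are choices made for this sketch, not part of the notes.

```python
import numpy as np

# Points on the ellipse x^2/2 + y^2 = 1 in the first quadrant.
t = np.linspace(0.0, np.pi / 2, 100001)
x = np.sqrt(2.0) * np.cos(t)
y = np.sin(t)

area = 4.0 * x * y            # area of the inscribed rectangle with corner (x, y)
best = int(np.argmax(area))   # grid index of the largest area

print(x[best], y[best], area[best])
```

The largest area on the grid should sit close to $x = 1$, $y = \frac{1}{\sqrt2}$, in agreement with the Lagrangian solution.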

### **Example 3**

![image.png](attachment:image.png)

### Solution

We know the range is $-4 < b < 4$  

$\frac{\partial}{\partial x}(x^2 + bxy + 4y^2 - x) = 2x + by - 1$

$2x + by - 1 = 0$  

$b = \frac{1-2x}{y}$  

$-4 < \frac{1 - 2x}{y}$  

$4y > 2x - 1$  

$\frac{\partial}{\partial y}(x^2 + bxy + 4y^2 - x) = bx + 8y$

$bx + 8y = 0$  

$b = \frac{-8y}{x}$  

#### **Equating both the b's**

$\frac{-8y}{x} = \frac{1-2x}{y}$  

$\frac{8y}{x} = \frac{2x-1}{y}$  

$2x^2 - x= 8y^2$  

Substituting in $b = \frac{-8y}{x}$

$b^2 = \frac{64y^2}{x^2}$

$b^2x^2 = 64y^2$  

$b^2x^2 = 8(8y^2)$  

$b^2x^2 = 8(2x^2 - x)$  

$b^2x^2 = 16x^2 - 8x$  

$(16-b^2)x^2 - 8x = 0$  

$x = \frac{-(-8) \pm \sqrt{64}}{2(16-b^2)}$

$x = \frac{8 \pm 8}{2(16 - b^2)}$  

$x = \frac{2}{2} \cdot \frac{4 \pm 4}{16-b^2}$

$x = \frac{8}{16-b^2}$  

or

$x = 0$

But $x \ne 0$: with $x = 0$, the equation $bx + 8y = 0$ gives $y = 0$, which contradicts $2x + by - 1 = 0$.

#### $$\therefore x = \frac{8}{16-b^2}$$

$\because b = \frac{-8y}{x}$, we have $y = \frac{-xb}{8}$

$\therefore y = \frac{\frac{-8}{16-b^2} \cdot b}{8}$  

#### $$\therefore y = \frac{-b}{16-b^2}$$

### **Substituting in Equation**

#### $$x^2 + bxy + 4y^2 - x$$

#### $\frac{1}{(16-b^2)^2} \cdot (64 + b(-8b) + 4b^2) - \frac{8}{16-b^2}$

#### $\frac{64 - 4b^2}{(16-b^2)^2} - \frac{8}{16-b^2}$  

#### $\frac{4(16 - b^2)}{(16-b^2)^2} - \frac{8}{16-b^2}$  

#### $\frac{4}{16-b^2} - \frac{8}{16-b^2}$  

#### $\frac{4 - 8}{16-b^2}$

$-\frac{4}{16 - b^2}$  

Check with $b = 0$: the function $x^2 + 4y^2 - x$ has its minimum at $(\frac{1}{2}, 0)$, where it equals $-\frac{1}{4} = -\frac{4}{16}$.
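
The stationary point and the value of the function there can be confirmed symbolically. This is a sketch assuming SymPy; it solves the two first-order conditions, which are linear in $x$ and $y$, and substitutes back.

```python
import sympy as sp

x, y, b = sp.symbols('x y b', real=True)
f = x**2 + b*x*y + 4*y**2 - x

# First-order conditions: both partial derivatives vanish.
critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)[0]

# Value of f at the stationary point, as a function of b.
f_stationary = sp.simplify(f.subs(critical))

# Determinant of the Hessian [[2, b], [b, 8]]; positive when -4 < b < 4.
hessian_det = sp.hessian(f, (x, y)).det()

print(critical, f_stationary, hessian_det)
```

Because the Hessian determinant is $16 - b^2$ and the leading entry is $2 > 0$, the stationary point is a minimum on the stated range of $b$.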

____________

_________

# **Any Approximation - Using Taylor Series**

$\frac{y-y_1}{x-x_1} = m$

$y = y_1 + m(x-x_1)$

![image.png](attachment:image.png)

### **Example**

![image.png](attachment:image.png)

#### **Answer: 151**

$f(x) = f(a) + f'(a)(x - a) + f''(a)\frac{(x-a)^2}{2!}$

$f(10) = f(20) + f'(20)(10 - 20) + f''(20)\frac{(10-20)^2}{2}$

$f(10) = 1 + 10 \cdot (-10) + 5 \cdot \frac{100}{2}$

$f(10) = 1 - 100 + 250$

$f(10) = 151$
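
The same second-order approximation as a small function. The values $f(20) = 1$, $f'(20) = 10$ and $f''(20) = 5$ are read off the worked lines above; the name `taylor2` is illustrative.

```python
def taylor2(f_a, df_a, d2f_a, a, x):
    """Second-order Taylor approximation of f(x) around the point a."""
    h = x - a
    return f_a + df_a * h + d2f_a * h**2 / 2

approx = taylor2(f_a=1, df_a=10, d2f_a=5, a=20, x=10)
print(approx)
```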

# **Covariance Matrix - Variance:Eigenvalue**

<b>Properties</b>
1. Symmetric - $\boxed{C = C^T}$
2. Positive semi-definite (or positive definite)
3. Eigenvalues $\ge 0$
4. Energy = $x^TCx \ge 0$

<font color=blue>The formula below is for non-centered datasets (the mean is subtracted explicitly)</font>

### $$C = \frac{1}{n} \sum (x_i-\textcolor{red}{x_{mean}}) \cdot (x_i-\textcolor{red}{x_{mean}})^T$$  
### $$\textcolor{red}{x_{mean} = \frac{1}{n}\sum_{i = 1}^n x_i}$$

### **Working Rule**

1. Calculate Mean of Datapoints
2. $(x_i - x_{mean}) \cdot (x_i - x_{mean})^T$  
3. $\frac{1}{n} \sum (x_i - x_{mean}) \cdot (x_i - x_{mean})^T$

$N^{th} \text{ Principal Component Projected Variance = } N^{th} \text{ largest Eigenvalue}$

## **Total Variance** = Sum of **Eigenvalues of Covariance Matrix**

### **Example 1**  

>#### **Datapoints**
![image-2.png](attachment:image-2.png)  

>#### **1st Eigenvalue = Projected Variance along 1st Principal Component**
![image-3.png](attachment:image-3.png)  

### **Example 2**

![image.png](attachment:image.png)

#### **Answer : We need to find the largest Eigenvalue of the Covariance Matrix**

### **Compute Mean**

$x_{mean} = \frac{\begin{bmatrix}1\\ 2\end{bmatrix} + \begin{bmatrix}0\\ 0\end{bmatrix} + \begin{bmatrix}2\\ 1\end{bmatrix}}{3} = \begin{bmatrix}1\\ 1 \end{bmatrix}$

### **Compute $x \cdot x^T$**

$(x_1 - x_{mean}) \cdot (x_1 - x_{mean})^T = \begin{bmatrix}0\\ 1\end{bmatrix} \cdot \begin{bmatrix}0 & 1\end{bmatrix} = \begin{bmatrix}0 & 0\\0 & 1\end{bmatrix}$

$(x_2 - x_{mean}) \cdot (x_2 - x_{mean})^T = \begin{bmatrix}-1\\ -1\end{bmatrix} \cdot \begin{bmatrix}-1 & -1\end{bmatrix} = \begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}$

$(x_3 - x_{mean}) \cdot (x_3 - x_{mean})^T = \begin{bmatrix}1\\ 0\end{bmatrix} \cdot \begin{bmatrix}1 & 0\end{bmatrix} = \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}$

### **Compute the Covariance Matrix**

$$C = \frac{1}{n} \sum (x_i-x_{mean}) \cdot (x_i-x_{mean})^T$$

$C = \frac{1}{3} \cdot \left(\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix} + \begin{bmatrix}0 & 0\\0 & 1\end{bmatrix} + \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}\right)$

$C = \begin{bmatrix}2/3 & 1/3\\1/3 & 2/3\end{bmatrix} = \begin{bmatrix}0.67 & 0.33\\0.33 & 0.67\end{bmatrix}$

$|C- \lambda I| = \begin{vmatrix}0.67 - \lambda & 0.33\\0.33 & 0.67 - \lambda\end{vmatrix} = 0$

$(0.67 - \lambda)^2 - 0.33^2 = 0$  

$(0.67 - \lambda)^2 = 0.33^2$  

#### **Taking square-root on both sides**

$(0.67 - \lambda) = \textcolor{red}\pm 0.33$  

$\lambda _1 = 0.67 \textcolor{red}+ 0.33 = 1.00$  

$\lambda _2 = 0.67 \textcolor{red}- 0.33 = 0.34 \approx \frac{1}{3}$  

$$\lambda _1 > \lambda _2$$

$$Answer = \textcolor{green}{\lambda _1} = \textcolor{green}{1.00}$$
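
The working rule can be reproduced with NumPy. The data points are the three vectors used in the mean computation above; the variable names are illustrative. Note that `np.cov` divides by $n - 1$ unless `bias=True` is passed, so that flag is needed to match the $\frac{1}{n}$ formula in these notes.

```python
import numpy as np

# One data point per row.
X = np.array([[1.0, 2.0],
              [0.0, 0.0],
              [2.0, 1.0]])

mean = X.mean(axis=0)                # step 1: mean of the data points
centered = X - mean                  # x_i - x_mean for every point
C = centered.T @ centered / len(X)   # steps 2 and 3: averaged outer products

# eigvalsh is meant for symmetric matrices and returns ascending eigenvalues.
eigenvalues = np.linalg.eigvalsh(C)
largest = eigenvalues[-1]            # variance along the 1st principal component

print(mean, C, eigenvalues, sep="\n")
```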

## **Total Variance** = Sum of **Eigenvalues of Covariance Matrix**

![image.png](attachment:image.png)

$$\text{Total Variance }= \sum \text{Eigenvalues of Covariance Matrix}$$

$\text{Given }C = \begin{bmatrix}
1 & 1 & 1\\\
1 & 1 & 1\\\
1 & 1 & 1
\end{bmatrix}$  


$
C - λI = \begin{bmatrix}
1-λ & 1 & 1\\\
1 & 1-λ & 1\\\
1 & 1 & 1-λ
\end{bmatrix}
$

<u>**Determinant**</u>  
$
(1-λ)[(1-λ)^2 - 1] - 1[(1-λ) - 1] + 1[1 - (1-λ)] = 0
$  

$
(1-λ)(1 + λ^2 - 2λ - 1) + λ + λ = 0
$  

$
(1-λ)(λ^2 - 2λ) +2λ = 0
$  

$
λ^2 - λ^3 - 2λ + 2λ^2 + 2λ = 0
$  

$
-λ^3 + 3λ^2 = 0
$

$
λ^3 - 3λ^2 = 0  
$

$
λ^2(λ - 3) = 0  
$

$$
λ = 3, 0, 0
$$

$$\text{Total Variance }= \sum \text{Eigenvalues of Covariance Matrix}$$
$$\text{Total Variance }= 3 + 0 + 0 = 3$$
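
A short numerical check, assuming NumPy. It also shows a shortcut: the sum of the eigenvalues of any square matrix equals its trace, so the total variance can be read off the diagonal without solving the characteristic equation.

```python
import numpy as np

C = np.ones((3, 3))                   # the given covariance matrix

eigenvalues = np.linalg.eigvalsh(C)   # ascending order, C is symmetric
total_variance = eigenvalues.sum()
trace = np.trace(C)                   # sum of the diagonal entries

print(eigenvalues, total_variance, trace)
```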

## <u>**Computation of Covariance Matrix**

$$C = \frac{1}{n} \sum (x_i-x_{mean}) \cdot (x_i-x_{mean})^T$$

### <u> **Working Rule**

1. Calculate Mean of Datapoints
2. $(x_i - x_{mean}) \cdot (x_i - x_{mean})^T$  
3. $\frac{1}{n} \sum (x_i - x_{mean}) \cdot (x_i - x_{mean})^T$

Given Datapoints:

![image.png](attachment:image.png)

### **Compute Mean**

$x_{mean} = \frac{\begin{bmatrix}1\\ 1\end{bmatrix} + \begin{bmatrix}2\\ 3\end{bmatrix} + \begin{bmatrix}3\\ 2\end{bmatrix}}{3} = \begin{bmatrix}2\\ 2 \end{bmatrix}$

### **Compute $x \cdot x^T$**

$(x_1 - x_{mean}) \cdot (x_1 - x_{mean})^T = \begin{bmatrix}-1\\ -1\end{bmatrix} \cdot \begin{bmatrix}-1 & -1\end{bmatrix} = \begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}$

$(x_2 - x_{mean}) \cdot (x_2 - x_{mean})^T = \begin{bmatrix}0\\ 1\end{bmatrix} \cdot \begin{bmatrix}0 & 1\end{bmatrix} = \begin{bmatrix}0 & 0\\0 & 1\end{bmatrix}$

$(x_3 - x_{mean}) \cdot (x_3 - x_{mean})^T = \begin{bmatrix}1\\ 0\end{bmatrix} \cdot \begin{bmatrix}1 & 0\end{bmatrix} = \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}$

### **Compute the Covariance Matrix**

$$C = \frac{1}{n} \sum (x_i-x_{mean}) \cdot (x_i-x_{mean})^T$$

$C = \frac{1}{3} \cdot \left(\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix} + \begin{bmatrix}0 & 0\\0 & 1\end{bmatrix} + \begin{bmatrix}1 & 0\\0 & 0\end{bmatrix}\right)$

$C = \begin{bmatrix}2/3 & 1/3\\1/3 & 2/3\end{bmatrix}$

# **PCA**

### <b>Important tips to solve problems:

#### <b>Note: Usually, we can analytically determine the 1st principal component by <font color="red">DRAWING</font> the plot of the given points 

### <b>Properties of PCA

![image.png](attachment:5b9a5764-74a6-47dd-9792-40e80c58ca24.png)

![image.png](attachment:image.png)

#### <b>PCA always treats the low-variance components in the data as noise 
#### <b>and recommends that we throw those components away. 
#### <b>But sometimes those components play a major role in a supervised learning task.  

<font color="blue"> We choose only a few components and discard the rest. For example, we keep $\sigma _1$ and $\sigma _2$ and drop the others, assuming they matter less because the data varies little along them. But they can still contain "some" important information:  
![image.png](attachment:ba8d4376-4417-4721-9c39-427d7c2d5a1c.png)


# <b>SVD

## Geometry of SVD

![image.png](attachment:b605edbe-8018-4bc8-82b4-084cb3313625.png)

# **Symmetric Matrices**

![image.png](attachment:image.png)

![image.png](attachment:e4ddb1c1-8821-405f-a594-f0ae90693fc0.png)  

$Q$ is an Orthogonal Matrix  
$\Lambda$ is a Diagonal Matrix  
$S$ is a Symmetric Matrix

![image.png](attachment:52e14e31-cbd9-41be-9a30-13eb61a15250.png)

# **Complex Matrices** 

## **Important Properties**

1. **Absolute Value and Modulus**:   
   - $\textbf{v = a + ib}$
   - $|v| = \sqrt{a^2 + b^2}$
   - This is used when finding the norm of an eigenvector of a unitary matrix. The columns of a unitary matrix are orthonormal by definition.
  <br>
2. **Complex Conjugate**:
   - $v = a + ib$ 
   - $\text{Complex Conjugate }\overline{v} = a - ib$  
  <br>
  
3. **NORM or Length of a Vector**:
>
   $\|v\| = \sqrt{\textcolor{red}|v_1\textcolor{red}|^2 + ... + \textcolor{red}|v_n\textcolor{red}|^2}$
>
   $\text{\textcolor{red}{Modulus is present!!!}}$  
>
**Example 1:**  

$v = \begin{bmatrix}1\\\ i\\\ 1\end{bmatrix}$  
>
   $\|v\| = \sqrt{1^2 + \textcolor{red}{|i|^2} + 1^2} = \sqrt{3}$
>  
<br>

**Example 2:**  

$v = \begin{bmatrix}1\\ 1+i \\ 1\end{bmatrix}$:  

$||v|| = \sqrt{1^2 + |1+i|^2 + 1^2} = \sqrt{1 + (1^2 + 1^2) + 1} = \sqrt{1 + 2 + 1} = \sqrt{4} = 2$  

$|a+ib| = \sqrt{a^2 + b^2}$

<br>

4. **Conjugate Transpose**:
   - Conjugate transpose: $A^* = A^H = \overline{A^T}$  
<br>

5. **Dot Product and Inner Product**:
   - For Complex Matrices: $u^Hv = \overline{u_1}v_1 + ... + \overline{u_n}v_n$  
<br>

6. **Orthogonality/Perpendicular** :
   - $u^Hv = \overline{u^T}v= 0$  
<br>

7.  **Symmetric = Hermitian Matrices**:
       - Hermitian matrices: $S = \overline{S^T} = S^H = S^*$  
<br>

8.  **Diagonalization** or Eigenvalue Decomposition:
    - $A = Q\Lambda Q^{-1} = Q\Lambda Q^T$ (real $\Lambda$)
    - $S = U\Lambda U^{-1} = U\Lambda U^{\textcolor{red}{H}}$ (real $\Lambda$)  
<br>

9.  **Unitary Matrices** or **Orthogonal Matrices**:
    - If Unitary, **Inverse = Hermitian (conjugate) transpose**
    - $U^H = U^{-1}$  
    - $U^H U = I$
<br>
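
Several of the properties above can be tried out with NumPy's complex arrays. The vectors are the ones from Examples 1 and 2; the matrix `U` is an assumed example of a unitary matrix chosen for this sketch.

```python
import numpy as np

u = np.array([1, 1j, 1])        # vector from Example 1
v = np.array([1, 1 + 1j, 1])    # vector from Example 2

# Norm: square root of the sum of squared moduli.
norm_u = np.sqrt(np.sum(np.abs(u) ** 2))
norm_v = np.sqrt(np.sum(np.abs(v) ** 2))

# Inner product u^H v: np.vdot conjugates its first argument.
inner = np.vdot(u, v)

# Conjugate transpose and the unitary test U^H U = I.
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)   # assumed example matrix
U_H = U.conj().T
is_unitary = np.allclose(U_H @ U, np.eye(2))

print(norm_u, norm_v, inner, is_unitary)
```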


#### Other properties:

## **Hermitian Matrix**

 $ A^H = A$

### **Computing $A^H$**

1. **Transpose the matrix**: Swap the row and column indices of each element, effectively reflecting the elements across the main diagonal.

2. **Take the complex conjugate of each element**: Replace each element `a + bi` with its complex conjugate `a - bi`.

**Note:**  
1. $A^H$ can also be written as $A^*$  
2. $A^H = \bar{A}^T = A^* = \text{ Hermitian (conjugate) transpose}$  
3. $\text{Hermitian transpose (an operation) } \not= \text{ Hermitian 'matrix' (a matrix with } A^H = A)$


### **Hermitian Matrix Properties:**

1. Real Eigenvalues
2. Real Diagonal Elements
3. Orthogonal/Perpendicular Eigenvectors
4. $\overline{a_{ji}} = a_{ij}$
5. Determinant always <font color = "blue"> $real$ </font>
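
These properties can be observed on a small example. The matrix `A` below is an assumed Hermitian matrix picked for illustration; `np.linalg.eigh` is NumPy's eigen-solver for Hermitian (or real symmetric) input.

```python
import numpy as np

A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])            # assumed example, equal to its conjugate transpose

A_H = A.conj().T
is_hermitian = np.allclose(A, A_H)

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)
gram = eigenvectors.conj().T @ eigenvectors   # identity if eigenvectors are orthonormal

det = np.linalg.det(A)                 # imaginary part should vanish up to rounding

print(is_hermitian, eigenvalues, det)
```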
   


## **Unitary Matrices**

>**Properties**
>
>>$A^HA = I$
>>
>>$A^{-1} = A^H$
>>

![image.png](attachment:image.png)

# **Definiteness Table**

![image.png](attachment:e147381f-3946-48f6-96c3-0bac92a96c14.png)

If nxn (as given),  
Then RANK = n

![image.png](attachment:1da6b94d-6a9d-4b90-a1d0-1cb147bea59f.png)![image.png](attachment:d7078730-9325-4b6d-b087-b484b0749bbc.png)

$f(x,y) = ax^2 + 2bxy + cy^2$  

$f (x, y) = v^TAv$ ` = 1x1 matrix`  

$A = \begin{bmatrix} a & b\\ b & c\end{bmatrix}$  

$v = \begin{bmatrix} x\\ y\end{bmatrix}$


![image.png](attachment:image.png)

![image.png](attachment:image.png)

### **Hacks**

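$$ac-b^2 \ge 0 \text{ , for every definite or semi-definite matrix}$$
1. If $ac-b^2$ = `0`, it has to be `Semi-Definite +ve/-ve`
2. If $ac-b^2$ > `0`, it has to be `Definite +ve/-ve`
3. If $ac-b^2$ < `0`, it is `Indefinite`

4. `a>0` for Positive Semi/Full (if `a = 0`, check `c` instead)
5. `a<0` for Negative Semi/Full (if `a = 0`, check `c` instead)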
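
The hacks above can be written as a small classifier for the $2 \times 2$ symmetric case. The function name and the returned labels are illustrative. The exact comparisons with zero are reliable for integer entries; floating-point input would need a tolerance.

```python
def classify_quadratic_form(a, b, c):
    """Classify f(x, y) = a*x^2 + 2*b*x*y + c*y^2 through A = [[a, b], [b, c]]."""
    det = a * c - b * b
    if det < 0:
        return "indefinite"
    if det > 0:
        # det > 0 forces a != 0, so the sign of a decides.
        return "positive definite" if a > 0 else "negative definite"
    # det == 0: semi-definite; the sign comes from a, or from c when a == 0.
    if a > 0 or c > 0:
        return "positive semi-definite"
    if a < 0 or c < 0:
        return "negative semi-definite"
    return "zero matrix"

for entries in [(1, 0, 1), (1, 1, 1), (-1, 0, -1), (1, 2, 1)]:
    print(entries, classify_quadratic_form(*entries))
```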

### Properties of Positive Definite Matrices

![image.png](attachment:image.png)

## **Stationary Points**

$f(x,y) = ax^2 + 2bxy + cy^2$  

$f (x, y) = v^TAv$ ` = 1x1 matrix`  

$A = \begin{bmatrix} a & b\\ b & c\end{bmatrix}$  

$v = \begin{bmatrix} x\\ y\end{bmatrix}$


![image.png](attachment:image.png)

![image.png](attachment:image.png)

## **PCA**

Total Variance = $\sum \text{Eigenvalues of Covariance Matrix}$

## **Covariance Matrix**

>Remember that Covariance matrix is $\text{\textcolor{green}{square matrix}}\text{\textcolor{blue}{-symmetric matrix}}$  
>$\text{\textcolor{red}{number of features}}$ = number of rows = number of columns

Covariance Matrix (for centered data, $x_{mean} = 0$):  

$$C = \frac{1}{\text{no. of datapoints}}(x_1 \cdot x_1^T + x_2 \cdot x_2^T + x_3 \cdot x_3^T + \text{ ... }+x_n \cdot x_n^T)$$ 

Let there be three datapoints:  

$x_1 = \begin{bmatrix}-1\\ -1\end{bmatrix} , x_2 = \begin{bmatrix}0\\ 0\end{bmatrix}, x_3 = \begin{bmatrix}1\\ 1\end{bmatrix}$

Covariance Matrix:  

$C = \frac{1}{\text{no. of datapoints}}(x_1 \cdot x_1^T + x_2 \cdot x_2^T + x_3 \cdot x_3^T)$ 

$C = \frac{1}{3}\left(\begin{bmatrix}-1\\ -1\end{bmatrix} \cdot \begin{bmatrix}-1& -1\end{bmatrix}+\begin{bmatrix}0\\ 0\end{bmatrix} \cdot \begin{bmatrix}0& 0\end{bmatrix}+\begin{bmatrix}1\\ 1\end{bmatrix} \cdot \begin{bmatrix}1& 1\end{bmatrix}\right)$

$C = \frac{1}{3}\left(\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}+\begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}+\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}\right)$

$C = \frac{1}{3}\begin{bmatrix}2 & 2\\ 2 & 2\end{bmatrix}$

$C = \begin{bmatrix}2/3 & 2/3\\ 2/3 & 2/3\end{bmatrix}$
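
Because these three points already have zero mean, summing the outer products $x_i x_i^T$ directly gives the covariance matrix. A brief NumPy sketch with illustrative variable names:

```python
import numpy as np

# Centered data points, one per row.
X = np.array([[-1.0, -1.0],
              [0.0, 0.0],
              [1.0, 1.0]])

# Average of the outer products x_i x_i^T; valid as-is because the mean is zero.
C = sum(np.outer(x, x) for x in X) / len(X)

total_variance = np.trace(C)   # equals the sum of the eigenvalues

print(C, total_variance)
```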

# **Meaning of Minimization**


![image.png](attachment:image.png)

>`"Minimize this function globally"`   

>> `= Finding "Global Minimum"`

>> `= Finding the 'x for which, f(x) is smallest`

# **Synthetic Division**

![image.png](attachment:image.png)

### **Example**

#### **Equation**

![image.png](attachment:image.png)

#### **One of the roots**

![image.png](attachment:image.png)

#### **Applying Synthetic Division**

![image.png](attachment:image.png)

![image.png](attachment:image.png)

# **Finding** $\text{Closest points}$ - $\text{Minimum Distances}$

#### $\text{Determine the points on } y=x^2+1 \text{ that's closest to } (0,2)$

### **Answer**

Let $(x, x^2 + 1)$ be the point on the parabola closest to (0, 2).  

$\therefore Point = (x, x^2+1)$

### **We want to minimize distance**

$Distance = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}$  

$D = \sqrt{(x-0)^2 + ((x^2 + 1) - 2)^2}$  

$D = \sqrt{x^2 + x^4 + 1 - 2x^2}$  

$D = \sqrt{x^4 - x^2 + 1}$

### Differentiating w.r.t x

$\frac{d}{dx}D  = \frac{1}{2}(x^4 - x^2 + 1)^{-1/2} \cdot (4x^3 -2x) = 0$  

$4x^3 - 2x = 2x(2x^2 - 1) = 0$  

$x = 0$  
$x = \frac{1}{\sqrt{2}}$  
$x = -\frac{1}{\sqrt{2}}$  


### Solving $x^4 - x^2 + 1 = 0$ (checking that the denominator never vanishes)

Let $x^2 = m$:  

$m^2 - m + 1 = 0$  

$m = \frac{-(-1) \pm \sqrt{(-1)^2 - 4.(1).(1)}}{2(1)}$  

$m = \frac{1 \pm \sqrt{1 - 4}}{2}$  
$m = \text{\textcolor{red}{imaginary}}$  

$x = \sqrt{\text{\textcolor{red}{imaginary}}}$, so $x^4 - x^2 + 1$ has no real root and the derivative is defined for every real $x$.

### Compare the distance at each real critical point

$\therefore \text{For } x = 0$  

$Distance = \sqrt{0^4 - 0^2 +1} = 1$   

$\therefore \text{For } x = +\frac{1}{\sqrt{2}} = 0.71$  

$Distance = \sqrt{0.71^4 - 0.71^2 +1} = \sqrt{0.75} = 0.87$  

$\therefore \text{For } x = -\frac{1}{\sqrt{2}} = -0.71$  

$Distance = \sqrt{(-0.71)^4 - (-0.71)^2 +1} = \sqrt{0.75} = 0.87$   

### Thus the points are:

$P = (x, x^2 + 1)$  

$P = (0.71, 0.504 + 1) = (0.71, 1.50)$  

$P = (-0.71, 0.504 + 1) = (-0.71, 1.50)$
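
The critical points and distances can be verified symbolically. This sketch assumes SymPy and minimizes the squared distance, which has the same minimizers as the distance itself and avoids the square root in the derivative.

```python
import sympy as sp

x = sp.symbols('x', real=True)

# Squared distance from (x, x^2 + 1) to (0, 2).
D2 = (x - 0)**2 + ((x**2 + 1) - 2)**2

critical = sp.solve(sp.diff(D2, x), x)   # real critical points
distances = {c: sp.sqrt(D2.subs(x, c)) for c in critical}

# Keep the critical points with the smallest distance and map them onto the parabola.
d_min = min(distances.values())
closest = [(c, c**2 + 1) for c in critical if distances[c] == d_min]

print(distances, closest)
```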