# Matrix Properties in Linear Algebra


## Frobenius Norm

The **Frobenius norm** quantifies the size of a matrix in a way that is analogous to the $L_2$ norm for vectors. For a matrix $X$ with elements $x_{ij}$, the Frobenius norm is defined as:

$$
\|X\|_F = \sqrt{\sum_{i,j} x_{ij}^2}
$$

**Interpretation:** This norm is the Euclidean ($L_2$) norm of the long vector obtained by stacking all entries of the matrix.

**Example:** For the matrix 

$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix},$$

the Frobenius norm is:

$$
\|X\|_F = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30} \approx 5.477.
$$

### Python Example (NumPy)
```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
frobenius_norm = np.linalg.norm(X, 'fro')
print("Frobenius norm:", frobenius_norm)
```
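The interpretation above can be checked numerically. The following is a minimal sketch, assuming the same matrix $X$ as in the example; the variable names `flat_norm` and `manual_norm` are illustrative only.

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])

# Flatten the matrix into a vector and take its ordinary L2 norm.
flat_norm = np.linalg.norm(X.ravel())

# Apply the definition directly: square root of the sum of squared entries.
manual_norm = np.sqrt(np.sum(X ** 2))

# All three values should agree up to floating-point rounding.
print(flat_norm, manual_norm, np.linalg.norm(X, 'fro'))
```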

---

## Matrix Multiplication

Matrix multiplication is a fundamental operation in linear algebra and is extensively used in machine learning.

Given:
- A matrix $A$ of size $M \times N$, and
- A matrix $B$ of size $N \times P$,

their product $C = AB$ is defined by:

$$
c_{ik} = \sum_{j=1}^{N} a_{ij} \, b_{jk},
$$

where the resulting matrix $C$ has dimensions $M \times P$.
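To make the summation concrete, here is a deliberately naive sketch that applies the formula with explicit loops. The helper name `matmul_loops` is hypothetical, and the code is meant for illustration rather than performance; in practice one would use `A @ B`.

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply A (M x N) by B (N x P) using the summation formula directly."""
    M, N = A.shape
    N_b, P = B.shape
    if N != N_b:
        raise ValueError("inner dimensions must match")
    C = np.zeros((M, P))
    for i in range(M):          # rows of A
        for k in range(P):      # columns of B
            for j in range(N):  # shared inner dimension
                C[i, k] += A[i, j] * B[j, k]
    return C

A = np.array([[3, 4], [5, 6], [7, 8]])
B = np.array([[1, 9], [2, 0]])
C = matmul_loops(A, B)
print(C)
```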

### Key Points

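- **Matrix by Vector:** A column vector can be treated as a matrix with a single column, so multiplying an $M \times N$ matrix by an $N$-dimensional vector is the special case $P = 1$ and yields an $M$-dimensional vector.
- **Non-Commutativity:** In general, $AB \neq BA$. Both products are defined only when $A$ is $M \times N$ and $B$ is $N \times M$, and even for square matrices of the same size the two results usually differ.
- **Row-by-Column Process:** Each element $c_{ik}$ is the dot product of row $i$ of $A$ with column $k$ of $B$.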
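Non-commutativity is easy to see with a small example. The sketch below uses two hypothetical $2 \times 2$ matrices chosen for illustration: multiplying by the permutation matrix `B` on the right swaps the columns of `A`, while multiplying on the left swaps its rows.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B  # swaps the columns of A
BA = B @ A  # swaps the rows of A

print("AB:\n", AB)
print("BA:\n", BA)
print("AB equals BA:", np.array_equal(AB, BA))
```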

### Python Example (Matrix-by-Vector Multiplication with NumPy)
```python
import numpy as np

A = np.array([[3, 4],
              [5, 6],
              [7, 8]])
B = np.array([[1],
              [2]])
C = np.dot(A, B)  # Alternatively, use A @ B
print("Result of matrix by vector multiplication:\n", C)
```

### Python Example (Matrix-by-Matrix Multiplication)
```python
import numpy as np

A = np.array([[3, 4],
              [5, 6],
              [7, 8]])
B = np.array([[1, 9],
              [2, 0]])
C = np.dot(A, B)
print("Result of matrix by matrix multiplication:\n", C)
```

*Explanation:*  
For the first column of $C$, for example, the calculation is:
- Row 1: $3 \times 1 + 4 \times 2 = 3 + 8 = 11$
- Row 2: $5 \times 1 + 6 \times 2 = 5 + 12 = 17$
- Row 3: $7 \times 1 + 8 \times 2 = 7 + 16 = 23$
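
The second column follows the same pattern, this time pairing each row of $A$ with the second column of $B$, which is $(9, 0)^T$. Worked out by hand:

$$
c_{12} = 3 \times 9 + 4 \times 0 = 27, \qquad
c_{22} = 5 \times 9 + 6 \times 0 = 45, \qquad
c_{32} = 7 \times 9 + 8 \times 0 = 63,
$$

so the full product is

$$
C = \begin{bmatrix} 11 & 27 \\ 17 & 45 \\ 23 & 63 \end{bmatrix}.
$$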

---

## Special Matrices

### Symmetric Matrices

A matrix $S$ is **symmetric** if it equals its transpose:

$$
S = S^T.
$$

- **Requirement:** $S$ must be a square matrix.
- **Property:** The elements are mirrored across the main diagonal.

#### Python Example (NumPy)
```python
import numpy as np

S = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]])
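print("Matrix S:\n", S)
print("Transpose of S:\n", S.T)
# S is symmetric exactly when it matches its transpose entry by entry.
print("S is symmetric:", np.array_equal(S, S.T))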
```

### Identity Matrices

An **identity matrix** $I_n$ is a diagonal matrix with ones on the main diagonal and zeros elsewhere:

$$
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
$$

- **Property:** Multiplying any $n$-dimensional vector $v$ by $I_n$ returns $v$ unchanged, i.e., $I_n v = v$.

#### Python Example (PyTorch)
```python
import torch

I = torch.eye(3)  # 3x3 identity matrix
v = torch.tensor([25, 2, 5], dtype=torch.float32)
result = torch.matmul(I, v)
print("Multiplication with identity matrix:\n", result)
```

### Diagonal Matrices

A **diagonal matrix** is one whose entries off the main diagonal are all zero; only the diagonal entries may be nonzero:

$$
D = \operatorname{diag}(d_1, d_2, \dots, d_n).
$$

- **Efficient Operations:** Both multiplication and inversion (if no diagonal element is zero) are computationally efficient.

**Inversion:** The inverse is given by:

$$D^{-1} = \operatorname{diag}\left(\frac{1}{d_1}, \frac{1}{d_2}, \dots, \frac{1}{d_n}\right),$$

provided that $d_i \neq 0$ for all $i$.
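
Both properties can be sketched in a few lines of NumPy. The diagonal entries in `d` and the vector `v` below are hypothetical values chosen for illustration.

```python
import numpy as np

d = np.array([2.0, 4.0, 5.0])  # diagonal entries, all nonzero
D = np.diag(d)                 # build the full diagonal matrix
v = np.array([1.0, 2.0, 3.0])

# Multiplying by D scales each component of v by the matching d_i,
# so the full matrix product is never really needed.
print("D @ v:", D @ v)
print("d * v:", d * v)

# The inverse is obtained from the reciprocals of the diagonal entries.
D_inv = np.diag(1.0 / d)
print("D_inv @ D:\n", D_inv @ D)
```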

---

## Matrix Inversion

Matrix inversion is a powerful tool for solving systems of linear equations. For a square matrix $X$, its inverse $X^{-1}$ satisfies:

$$
X^{-1} X = X X^{-1} = I.
$$

### Solving Linear Systems

For a system given by:

$$
y = Xw,
$$

if $X^{-1}$ exists, the weight vector $w$ can be determined by:

$$
w = X^{-1} y.
$$

#### Python Example (NumPy)
```python
import numpy as np

# Define a 2x2 system for simplicity.
X = np.array([[2, -5],
              [-3, 4]])
y = np.array([400000, -700000])  # illustrative target values only

X_inv = np.linalg.inv(X)
w = np.dot(X_inv, y)  # in practice, np.linalg.solve(X, y) avoids forming the inverse
print("Solved weights w:", w)

# Verify the solution: y should equal X @ w
y_pred = np.dot(X, w)
print("Predicted y:", y_pred)
```

### Limitations

- **Singularity:** A matrix cannot be inverted if it is singular (i.e., its columns are linearly dependent).
- **Square Requirement:** Only square matrices (equal number of rows and columns) can have an inverse.
- **Overdetermined/Underdetermined Systems:** In these cases, standard inversion is not applicable; methods such as the Moore-Penrose pseudoinverse are used.
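
The singularity case can be illustrated with a small sketch. The matrix `X_singular` is a hypothetical example whose second column is twice the first. Note that `np.linalg.inv` raises `LinAlgError` only when the factorization hits an exactly zero pivot; for nearly singular matrices it may instead return a numerically unreliable result, so a rank check is shown as well.

```python
import numpy as np

# The second column is twice the first, so the columns are linearly dependent.
X_singular = np.array([[1.0, 2.0],
                       [2.0, 4.0]])

# A square matrix is invertible only if its rank equals its size.
rank = np.linalg.matrix_rank(X_singular)
is_singular = rank < X_singular.shape[0]
print("rank:", rank, "| singular:", is_singular)

try:
    np.linalg.inv(X_singular)
except np.linalg.LinAlgError as err:
    print("Inversion failed:", err)
```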

---

## Orthogonal Matrices

An **orthogonal matrix** $A$ is a square matrix whose columns are orthonormal (which implies that its rows are orthonormal as well):

$$
A^T A = A A^T = I.
$$

**Key Property:** The transpose of an orthogonal matrix is its inverse:

$$
A^T = A^{-1}.
$$

- **Efficiency:** Inverting an orthogonal matrix is computationally cheap: the inverse is simply the transpose.
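
As a quick sanity check of this property, the sketch below compares the transpose with a general-purpose inverse. It assumes a 2D rotation matrix as the orthogonal matrix; the angle `theta` is an arbitrary illustrative choice.

```python
import numpy as np

theta = np.pi / 6  # hypothetical rotation angle (30 degrees)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

Q_inv_general = np.linalg.inv(Q)  # general-purpose inversion
Q_inv_cheap = Q.T                 # valid only because Q is orthogonal

print("Transpose matches inverse:", np.allclose(Q_inv_general, Q_inv_cheap))
```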

### Verifying Orthogonality

For an orthogonal matrix, the dot product between any two distinct columns is zero and each column has a unit norm:
- For columns $\mathbf{a}_i$ and $\mathbf{a}_j$ (with $i \neq j$):

$$
\langle \mathbf{a}_i, \mathbf{a}_j \rangle = 0.
$$

- For each column:

$$
\|\mathbf{a}_i\|_2 = 1.
$$

#### Python Example (PyTorch)
```python
import torch

# Define a matrix K (using float values)
K = torch.tensor([[ 2/3,  -2/3,  1/3],
                  [ 1/3,   2/3,  2/3],
                  [ 2/3,   1/3, -2/3]], dtype=torch.float32)

# Compute K^T * K; for an orthogonal matrix, this should be the identity.
identity_approx = torch.matmul(K.t(), K)
print("K^T * K (should be close to identity):\n", identity_approx)
```

---

## Applications in Machine Learning

### Regression and Weight Estimation

Matrix operations are foundational in linear regression. Given:

$$
y = Xw,
$$

where:
- $y$ is a vector of outcomes (e.g., house prices),
- $X$ is the feature matrix (including a column of ones for the intercept), and
- $w$ is the weight vector (parameters to be estimated),

if $X$ is square and invertible (so that $X^{-1}$ exists), we solve for $w$ using:

$$
w = X^{-1} y.
$$

For overdetermined systems (more equations than unknowns), the Moore-Penrose pseudoinverse is often used.
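
A minimal sketch of the overdetermined case follows. The data are hypothetical and generated from $y = 1 + 2x$ so that the expected weights are easy to recognize; `np.linalg.lstsq` is shown alongside the pseudoinverse as an alternative least-squares routine.

```python
import numpy as np

# Hypothetical data: 4 observations, an intercept column of ones plus 1 feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated from y = 1 + 2 * feature

# X is 4 x 2, so it has no ordinary inverse; use the pseudoinverse instead.
w_pinv = np.linalg.pinv(X) @ y

# Equivalent least-squares solve without forming the pseudoinverse explicitly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Weights via pinv: ", w_pinv)
print("Weights via lstsq:", w_lstsq)
```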

### Deep Learning

Matrix multiplication is ubiquitous in deep learning:
- **Forward Pass:** Input vectors are multiplied by weight matrices.
- **Backpropagation:** Derivatives (gradients) are computed using matrix operations.

Even in high-level libraries (e.g., PyTorch, TensorFlow), matrix multiplication happens behind the scenes.
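
To show what this looks like without a framework, here is a rough NumPy sketch of a single linear layer. The layer sizes, the random values, and the toy loss $L = \tfrac{1}{2}\|z\|^2$ are all assumptions made for illustration; real networks add nonlinearities and use automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, 3 output units.
W = rng.normal(size=(3, 4))  # weight matrix
b = np.zeros(3)              # bias vector
x = rng.normal(size=4)       # one input vector

# Forward pass: a matrix-by-vector product plus a bias.
z = W @ x + b

# Backward pass for the toy loss L = 0.5 * ||z||^2, for which dL/dz = z.
grad_z = z
grad_W = np.outer(grad_z, x)  # dL/dW, same shape as W
grad_x = W.T @ grad_z         # gradient passed back to the previous layer

print("Output shape:", z.shape)
print("Weight gradient shape:", grad_W.shape)
```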

#### Python Example for Verifying Orthogonality (Identity Matrix)
```python
import numpy as np

I3 = np.eye(3)
col1 = I3[:, 0]
col2 = I3[:, 1]
col3 = I3[:, 2]

print("Dot product of col1 and col2:", np.dot(col1, col2))
print("Norm of col1:", np.linalg.norm(col1))
```