# **Chapter 2 - Linear Algebra in Deep Learning**

**Linear algebra is fundamental in deep learning**, as neural networks rely on vectors, matrices, and tensors for computations. Below, we rewrite key concepts using a **Python-style notation**, making it easier to translate them into code.

### **Linear Algebra in Deep Learning (Python Approach)**  

---

### **1. Scalars (Single Values)**  
A scalar is a **single number**. In Python, it is typically represented as an integer (`int`) or floating-point (`float`).

```python
a = 5  # Scalar example (integer)
b = 3.14  # Scalar example (floating-point)
```
A scalar can be considered a **1×1 matrix**, and it is its own transpose:  
\[
a = a^T
\]

---

### **2. Vectors (1D Arrays)**
A **vector** is an **ordered list of numbers**. It can be thought of as a **point in space**, where each element represents a coordinate.  
A vector with \( n \) elements belongs to \( \mathbb{R}^n \), written \( x \in \mathbb{R}^n \).

Mathematically, a column vector is represented as:
\[
x = 
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

**Python representation (using NumPy):**
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])  # 1-D array of shape (3,); NumPy does not distinguish row vs. column here
x_col = x.reshape(-1, 1)       # Column vector of shape (3, 1)
```
Indexing elements:
```python
x[0]        # First element (x_1 in math notation; NumPy indices start at 0)
S = [0, 2]  # Index set S, given as a list of positions
x[S]        # Selects the subset x_S = [x_1, x_3]
```

**Operations:**
- **Vector Addition**: `c = a + b` (element-wise sum)
- **Scalar Multiplication**: `c = k * a` (each element is multiplied by `k`)
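A minimal sketch of the two vector operations above. The arrays `a` and `b` and the scalar `k` are illustrative values chosen here, not taken from the text:

```python
import numpy as np

# Illustrative vectors and scalar (hypothetical values)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
k = 2.0

c_sum = a + b      # element-wise sum: [5. 7. 9.]
c_scaled = k * a   # each element multiplied by k: [2. 4. 6.]

# Column-vector form: shape changes from (3,) to (3, 1)
a_col = a.reshape(-1, 1)
print(c_sum, c_scaled, a.shape, a_col.shape)
```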

---

### **3. Matrices (2D Arrays)**

A **matrix** is a **2D array of numbers** with \( m \) rows and \( n \) columns, denoted as $$\displaystyle A \in \mathbb{R}^{m \times n}$$.

Example matrix:
\[
A = 
\begin{bmatrix} 
A_{1,1} & A_{1,2} \\ 
A_{2,1} & A_{2,2} \\ 
A_{3,1} & A_{3,2} 
\end{bmatrix}
\]

**Python representation:**
```python
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # 3x2 matrix: A[0, 0] holds A_{1,1}, A[2, 1] holds A_{3,2}
```
**Indexing Elements:**
```python
A[0, 0]      # First element, A_{1,1}
i, j = 1, 0  # Example (0-based) indices
A[i, :]      # Row i (0-based), i.e. row i+1 in math notation
A[:, j]      # Column j (0-based)
```

#### **Matrix Operations**
- **Addition/Subtraction**: `C = A + B` (element-wise)
- **Scalar Multiplication and Addition**: `D = a * B + c` (every element of `B` is multiplied by the scalar `a`, then the scalar `c` is added)
- **Matrix Transpose**: `A.T`  
\[
(A^T)_{i,j} = A_{j,i}
\]
```python
A_transpose = A.T
```
Example:
```python
A = np.array([[1, 2], [3, 4], [5, 6]])
A_T = A.T  # Transpose
```

\[
A^T =
\begin{bmatrix} 
A_{1,1} & A_{2,1} & A_{3,1} \\ 
A_{1,2} & A_{2,2} & A_{3,2} 
\end{bmatrix}
\]
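As a quick check of the definition \( (A^T)_{i,j} = A_{j,i} \), the sketch below transposes the 3×2 example matrix and compares every entry; the loop is only for illustration, since `A.T` already does the work:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2 matrix from the example above
A_T = A.T                               # 2x3 matrix

print(A.shape, A_T.shape)  # (3, 2) (2, 3)

# Check (A^T)_{i,j} == A_{j,i} for every index pair
ok = all(A_T[i, j] == A[j, i]
         for i in range(A_T.shape[0])
         for j in range(A_T.shape[1]))
print(ok)  # True
```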

---

### **4. Tensors (Multidimensional Arrays)**
A **tensor** is a **generalization** of a matrix to higher dimensions.

For example, a **3D tensor** (3×3×3) is represented as:
```python
T = np.random.rand(3, 3, 3)  # 3D tensor with uniform random values in [0, 1)
i, j, k = 0, 1, 2            # Example indices
T[i, j, k]                   # Accessing the (i, j, k) element
```
Each dimension corresponds to an **axis** in the tensor.
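A small sketch of how axes behave: indexing an axis removes it, and reducing along an axis (here with `sum`) also removes it. A seeded generator is used only so the example is reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
T = rng.random((3, 3, 3))       # 3D tensor with values in [0, 1)

print(T.ndim)   # 3 axes
print(T.shape)  # (3, 3, 3)

# Indexing one axis leaves a lower-dimensional tensor
slice_0 = T[0]        # shape (3, 3): a matrix
fiber = T[0, 1]       # shape (3,): a vector
element = T[0, 1, 2]  # a scalar

# Reducing along an axis removes that axis
sum_axis0 = T.sum(axis=0)  # shape (3, 3)
print(slice_0.shape, fiber.shape, sum_axis0.shape)
```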

---

### **5. Broadcasting (Matrix + Vector Operations)**
Array libraries such as NumPy, and deep learning frameworks such as TensorFlow and PyTorch, **automatically broadcast** vectors to match matrix dimensions.

\[
C = A + b
\]
where `b` is a **vector** that behaves as if it were **copied** across all rows of `A` before addition (no copies are actually made in memory).

**Python example:**
```python
A = np.array([[1, 2], [3, 4], [5, 6]])  # 3x2 matrix
b = np.array([10, 20])  # 1-D vector of shape (2,), matched against each row

C = A + b  # Broadcasting b across each row
```
This eliminates the need for explicitly reshaping `b` before adding it.
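To see what broadcasting does, the sketch below compares `A + b` with an explicitly tiled copy of `b` (`np.tile`), and shows the error raised when shapes are incompatible:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
b = np.array([10, 20])                  # shape (2,)

C = A + b  # b is broadcast across the 3 rows
print(C)
# [[11 22]
#  [13 24]
#  [15 26]]

# Same result with b copied explicitly into a (3, 2) array
B_explicit = np.tile(b, (A.shape[0], 1))
print(np.array_equal(C, A + B_explicit))  # True

# Incompatible shapes: a length-3 vector cannot broadcast against (3, 2)
try:
    A + np.array([1, 2, 3])
except ValueError as err:
    print("Broadcast error:", err)
```

NumPy compares shapes starting from the trailing dimension; two dimensions are compatible when they are equal or one of them is 1. Here the trailing dimensions are 2 and 3, so the last addition fails.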

---

### **Key Takeaways**

| **Concept** | **Mathematical Representation** | **Python Equivalent** |
|------------|--------------------------------|----------------------|
| **Scalar** | $a \in \mathbb{R}$ | `a = 5` |
| **Vector** | $x \in \mathbb{R}^n$ | `x = np.array([x1, x2, x3])` |
| **Matrix** | $A \in \mathbb{R}^{m \times n}$ | `A = np.array([[A11, A12], [A21, A22]])` |
| **Transpose** | $A^T$ | `A.T` |
| **Tensor** | $T_{i,j,k}$ | `T = np.random.rand(3,3,3)` |
| **Broadcasting** | $C = A + b$ | `C = A + b` |

- **Scalar** → A single number, an isolated value. Example: `a = 5`  
- **Vector** → An ordered list of numbers, representing a point in space. Example: `x = np.array([x1, x2, x3])`  
- **Matrix** → A table of numbers with rows and columns, used for linear transformations. Example: `A = np.array([[A11, A12], [A21, A22]])`  
- **Transpose** → Swaps rows and columns of a matrix. Example: `A.T`  
- **Tensor** → A multi-dimensional array, a generalization of vectors and matrices. Example: `T = np.random.rand(3,3,3)`  
- **Broadcasting** → A rule that allows operations between arrays of different shapes by virtually repeating an array along missing or size-1 dimensions. Example: `C = A + b`

### **Matrix and Vector Multiplication in Python**  

Matrix and vector multiplication is fundamental in deep learning, as it allows for efficient transformations and computations. Below, we summarize the key concepts and rewrite the formulas using a **Python-based approach**.

---

### **1. Matrix Multiplication**  

The **matrix product** of two matrices **A** and **B** results in a third matrix **C**, defined as:  
\[
C = AB
\]
For this multiplication to be valid:
- **A** must have the same number of columns as **B** has rows.
- If **A** is of shape **(m × n)** and **B** is of shape **(n × p)**, then **C** will have shape **(m × p)**.

Mathematically:
\[
C_{i,j} = \sum_k A_{i,k} B_{k,j}
\]



**Python Implementation:**
```python
import numpy as np

A = np.array([[1, 2], [3, 4]])  # 2x2 matrix
B = np.array([[5, 6], [7, 8]])  # 2x2 matrix

C = np.dot(A, B)  # Matrix multiplication
```
Alternatively, using the `@` operator (Python 3.5+):
```python
C = A @ B
```
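To connect the summation formula \( C_{i,j} = \sum_k A_{i,k} B_{k,j} \) with `@`, here is a naive triple-loop sketch. The helper `matmul_loops` is a hypothetical name for illustration only; in practice `A @ B` is far faster:

```python
import numpy as np

def matmul_loops(A, B):
    """Naive matrix product: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:
        raise ValueError(f"Inner dimensions differ: {n} != {n2}")
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C_loops = matmul_loops(A, B)
print(C_loops)                      # [[19. 22.] [43. 50.]]
print(np.allclose(C_loops, A @ B))  # True

# Non-square example: (2 x 3) @ (3 x 2) -> (2 x 2)
M = np.arange(6).reshape(2, 3)
N = np.arange(6).reshape(3, 2)
print(matmul_loops(M, N).shape)     # (2, 2)
```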

---

### **2. Element-wise (Hadamard) Product**  
Unlike standard matrix multiplication, the **Hadamard product** is an **element-wise** multiplication:

\[
C = A \odot B
\]

**Python Implementation:**
```python
C = A * B  # Element-wise multiplication
```
---

### **3. Dot Product of Two Vectors**  
The **dot product** is a special case of matrix multiplication, where two vectors **x** and **y** (of the same dimension) result in a **scalar**:

\[
x^T y = \sum_i x_i y_i
\]

**Python Implementation:**
```python
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

dot_product = np.dot(x, y)  # Same as x @ y; for 1-D arrays, x.T is just x
```
Because the result is a **scalar**, and a scalar equals its own transpose, the dot product is **commutative**:
\[
x^T y = \left(x^T y\right)^T = y^T x
\]

---

### **4. Properties of Matrix Multiplication**
Matrix multiplication follows certain algebraic properties:

- **Distributive Property**:  
\[
A(B + C) = AB + AC
\]
```python
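E = np.array([[1, 0], [2, 1]])  # A third 2x2 matrix (C already holds A @ B above)
np.allclose(A @ (B + E), A @ B + A @ E)  # True: distributive property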
```

- **Associative Property**:  
\[
A(BC) = (AB)C
\]
```python
D = A @ (B @ C)  # Same as (A @ B) @ C
```

- **Transpose Property**:  
\[
(AB)^T = B^T A^T
\]
```python
C_T = (A @ B).T
C_T_check = B.T @ A.T  # Should be equal to C_T
```
---

### **5. Solving Linear Systems (Ax = b)**
A system of linear equations is represented as:

\[
Ax = b
\]

Where:
- **A** is an $m \times n$ matrix (coefficients),
- **b** is a **known** vector,
- **x** is an **unknown** vector we want to solve for.

Expanded:

\[
\begin{aligned}
A_{1,1}x_1 + A_{1,2}x_2 + \dots + A_{1,n}x_n &= b_1 \\
A_{2,1}x_1 + A_{2,2}x_2 + \dots + A_{2,n}x_n &= b_2 \\
&\;\;\vdots \\
A_{m,1}x_1 + A_{m,2}x_2 + \dots + A_{m,n}x_n &= b_m
\end{aligned}
\]

**Python Solution using `numpy.linalg.solve`:**
```python
A = np.array([[2, 1], [1, 3]])  # 2x2 coefficient matrix
b = np.array([8, 13])  # Known values

x = np.linalg.solve(A, b)  # Solve for x
```
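A simple way to trust the result is to substitute it back: \( 2(2.2) + 3.6 = 8 \) and \( 2.2 + 3(3.6) = 13 \). The sketch below does the same check numerically with `np.allclose`:

```python
import numpy as np

A = np.array([[2, 1], [1, 3]], dtype=float)
b = np.array([8, 13], dtype=float)

x = np.linalg.solve(A, b)
print(x)  # [2.2 3.6]

# Substitute back: A @ x should reproduce b up to floating-point error
residual = A @ x - b
print(np.allclose(A @ x, b))  # True
```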

---

### **Key Takeaways**
| **Concept** | **Mathematical Representation** | **Python Equivalent** |
|------------|--------------------------------|----------------------|
| **Matrix Multiplication** | \( C = AB \) | `C = A @ B` |
| **Element-wise Multiplication (Hadamard Product)** | \( C = A \odot B \) | `C = A * B` |
| **Dot Product of Vectors** | \( x^T y \) | `np.dot(x, y)` |
| **Distributive Property** | \( A(B + C) = AB + AC \) | `np.allclose(A @ (B + C), A @ B + A @ C)` |
| **Associative Property** | \( A(BC) = (AB)C \) | `np.allclose(A @ (B @ C), (A @ B) @ C)` |
| **Transpose Property** | \( (AB)^T = B^T A^T \) | `np.allclose((A @ B).T, B.T @ A.T)` |
| **Solving Linear Equations** | \( Ax = b \) | `np.linalg.solve(A, b)` |

---

### **Conclusion**
Matrix and vector multiplication is at the core of deep learning, where:
- **Matrix operations** define transformations,
- **Dot products** appear in gradient computations during optimization,
- **Solving systems** helps in parameter estimation.

In [6]:
import numpy as np

# Define matrices and vectors for the exercise
A = np.array([[1, 2], [3, 4]])  # 2x2 matrix
B = np.array([[5, 6], [7, 8]])  # 2x2 matrix
x = np.array([1, 2])  # 1D vector
y = np.array([3, 4])  # 1D vector

# Matrix multiplication
C = A @ B

# Element-wise (Hadamard) product
hadamard_product = A * B

# Dot product of two vectors
dot_product = np.dot(x, y)

# Extra matrix used by the distributive and associative tests
D = np.array([[2, 3], [4, 5]])

# Verify distributive property: A(B + D) == AB + AD
distributive_test = np.allclose(A @ (B + D), A @ B + A @ D)

# Verify associative property: A(BD) == (AB)D
associative_test = np.allclose(A @ (B @ D), (A @ B) @ D)

# Transpose property: (AB)^T == B^T A^T
transpose_test = np.allclose((A @ B).T, B.T @ A.T)

# Solving a linear system Ax = b
A_sys = np.array([[2, 1], [1, 3]])  # Coefficient matrix
b_sys = np.array([8, 13])  # Known values
x_solution = np.linalg.solve(A_sys, b_sys)

In [8]:
# Display results
import pandas as pd
results = {
    "Matrix Multiplication (C=AB)": C,
    "Hadamard Product (Element-wise)": hadamard_product,
    "Dot Product (x^Ty)": dot_product,
    "Distributive Property Verified": distributive_test,
    "Associative Property Verified": associative_test,
    "Transpose Property Verified": transpose_test,
    "Solution of Ax=b": x_solution
}

results

{'Matrix Multiplication (C=AB)': array([[19, 22],
        [43, 50]]),
 'Hadamard Product (Element-wise)': array([[ 5, 12],
        [21, 32]]),
 'Dot Product (x^Ty)': 11,
 'Distributive Property Verified': True,
 'Associative Property Verified': True,
 'Transpose Property Verified': True,
 'Solution of Ax=b': array([2.2, 3.6])}

In [13]:
%pip install ace_tools_open



In [None]:
# Display results in a structured table format
import pandas as pd
import ace_tools_open as tools

# Formatting results for display
formatted_results = {
    "Matrix Multiplication (C=AB)": str(C.tolist()),
    "Hadamard Product (Element-wise)": str(hadamard_product.tolist()),
    "Dot Product (x^Ty)": dot_product,
    "Distributive Property Verified": distributive_test,
    "Associative Property Verified": associative_test,
    "Transpose Property Verified": transpose_test,
    "Solution of Ax=b": str(x_solution.tolist())
}

# Convert results into a structured DataFrame with strings for display compatibility
df_results = pd.DataFrame(list(formatted_results.items()), columns=['Operation', 'Result'])

# Display the results
tools.display_dataframe_to_user(name="Matrix and Vector Operations Results", dataframe=df_results)



### **How Matrix and Vector Multiplication is Used in Deep Learning**

1. **Neural Network Computation**  
   - Matrix multiplication is essential for forward propagation, where **inputs are transformed by weight matrices**.  
   - A fully connected layer computes **\( C = AB \)** (`C = A @ B` in code), where **A** is the input matrix and **B** holds the learned weights; a bias vector is then added by broadcasting.

2. **Optimization & Learning**  
   - The **dot product** helps compute gradients in **backpropagation**, adjusting weights to minimize loss.  
   - Matrix transposition \( (AB)^T = B^T A^T \) is frequently used in computing derivatives.

3. **Feature Extraction & Transformations**  
   - **Hadamard products** (\( C = A \odot B \)) appear in element-wise operations such as applying dropout masks, gating in recurrent networks (e.g. LSTMs), and multiplying by activation-function derivatives during backpropagation.

4. **Solving Systems in AI Models**  
   - Many machine learning algorithms solve systems of equations **Ax = b**, using methods like `np.linalg.solve(A, b)`, particularly in **linear regression** and **model parameter estimation**.

5. **Efficiency & Parallelization**  
   - **Batch processing** in deep learning relies on efficient matrix operations, leveraging **GPUs** for parallelized multiplication.

Matrix operations **enable deep learning models to process, transform, and optimize data**, forming the mathematical foundation of modern AI.
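Item 4 mentions linear regression. A minimal sketch, assuming ordinary least squares with an intercept column and noise-free toy data generated from \( y = 1 + 2x \) (both assumptions made only for illustration), solves the normal equations \( X^T X w = X^T y \) with `np.linalg.solve` rather than an explicit inverse:

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column, shape (4, 2)
y = 1.0 + 2.0 * x

# Normal equations (X^T X) w = X^T y, solved without forming an explicit inverse
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # approximately [1. 2.]  (intercept, slope)
```

With real, noisy data the same call returns the least-squares fit; for large or ill-conditioned problems, `np.linalg.lstsq` is usually the safer choice.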

In [20]:
# Neural Network Computation
import pandas as pd
import torch
from torch import nn
import ace_tools_open as tools

# Simple neural network with a single fully connected (linear) layer
class SimpleNeuralNetwork(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        # Affine map: x @ W.T + b (matrix multiplication plus a broadcast bias)
        out = self.linear(x)
        return out

# Example data
input_size = 3
output_size = 2

input_data = torch.tensor([[1.0, 2.0, 3.0]])  # Input matrix (A), shape (1, 3)

model = SimpleNeuralNetwork(input_size, output_size)

# Perform forward propagation (calling the module runs forward())
output = model(input_data)

# print("Input:", input_data)
# print("Output:", output)

# # Access learned weights (B)
# weights = model.linear.weight.data
# print("\nAccess learned weights (B):")
# print(weights)

# # Access learned bias
# bias = model.linear.bias.data
# print("\nAccess learned bias:")
# print(bias)

# # Manual calculation of matrix multiplication
# manual_output = torch.mm(input_data, weights.T) + bias
# print("\nManual calculation:")
# print(manual_output)



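# Formatting results for display (the model is untrained, so weights and bias are randomly initialized)
weights = model.linear.weight.data  # Weight matrix (B), shape (output_size, input_size)
bias = model.linear.bias.data
formatted_results = {
    "Input:": str(input_data),
    "Output:": str(output.detach()),
    "Weights (B), randomly initialized:": str(weights),
    "Bias, randomly initialized:": str(bias),
    "Manual calculation (A @ B.T + bias):": str(torch.mm(input_data, weights.T) + bias),
}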

# Convert results into a structured DataFrame with strings for display compatibility
df_results = pd.DataFrame(list(formatted_results.items()), columns=['Operation', 'Result'])

# Display the results
tools.display_dataframe_to_user(name="Matrix and Vector Operations Results", dataframe=df_results)



### **Matrix Inverses, Linear Dependence, and Span in Python**

In deep learning and linear algebra, the concepts of **matrix inversion** and **linear dependence** are essential for solving equations, optimizing models, and understanding transformations. Below, we summarize these key concepts and implement them using Python.

---

### **1. Matrix Inverse & Solving Equations**
To solve the equation:  
\[
Ax = b
\]
we can express the solution using the **inverse of A**:
\[
x = A^{-1} b
\]
This method works **only if** the inverse of **A** exists.

**Python Implementation:**
```python
import numpy as np

A = np.array([[2, 3], [1, 4]])  # 2x2 invertible matrix
b = np.array([8, 11])  # Known vector

# Compute the inverse of A
A_inv = np.linalg.inv(A)

# Solve for x using matrix inversion
x = A_inv @ b
```
⚠ **Note:**  
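In practice, **computing the inverse explicitly is not recommended**: it costs more and is typically less accurate than solving the system directly. Instead, use `np.linalg.solve(A, b)`, which factorizes **A** (LU decomposition) and is more numerically stable.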

```python
x = np.linalg.solve(A, b)  # Preferred way to solve Ax = b
```
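For the well-conditioned 2×2 example above, both approaches agree; the sketch below checks this and prints the condition number (`np.linalg.cond`), which measures how strongly errors in the inputs can be amplified. The gap between the two methods only becomes noticeable when that number is large:

```python
import numpy as np

A = np.array([[2, 3], [1, 4]], dtype=float)
b = np.array([8, 11], dtype=float)

x_inv = np.linalg.inv(A) @ b     # explicit inverse, then multiply
x_solve = np.linalg.solve(A, b)  # LU-based solver, no explicit inverse

print(x_solve)                      # approximately [-0.2  2.8]
print(np.allclose(x_inv, x_solve))  # True for this well-conditioned matrix

# Condition number: large values signal that avoiding the explicit inverse matters most
print(np.linalg.cond(A))
```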
---

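### **2. Linear Dependence & Span**
A **set of vectors is linearly dependent** if one of them can be written as a **linear combination** of the others.

Multiplying a matrix by a vector produces exactly such a linear combination of the matrix's columns:

\[
Ax = \sum_i x_i A_{:,i}
\]

The **span** of a set of vectors \( v^{(1)}, \dots, v^{(n)} \) is the set of all points that can be obtained as a **linear combination** of those vectors:

\[
\operatorname{span}\{v^{(1)}, \dots, v^{(n)}\} = \left\{ \sum_i c_i v^{(i)} \;:\; c_i \in \mathbb{R} \right\}
\]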

**Python Implementation (Checking Linear Dependence):**
```python
A = np.array([[1, 2], [2, 4]])  # Second column is 2x the first -> linearly dependent
rank_A = np.linalg.matrix_rank(A)  # Rank of the matrix

# If rank < number of columns, the columns are linearly dependent
is_dependent = rank_A < A.shape[1]
```
💡 **Key Idea**:  
For a square \( n \times n \) matrix, if fewer than \( n \) columns are **linearly independent** (rank \( < n \)), the columns **cannot span all of** \( \mathbb{R}^n \).
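The sketch below illustrates both ideas with the dependent matrix from above: `A @ x` equals a weighted sum of the columns, and because the columns are dependent, every such product lies on a single line. The vector `x` and the full-rank comparison matrix are illustrative choices:

```python
import numpy as np

A = np.array([[1, 2], [2, 4]], dtype=float)  # second column = 2 * first column
x = np.array([3.0, -1.0])                    # illustrative coefficients

# Ax is a linear combination of A's columns: x_1 * A[:, 0] + x_2 * A[:, 1]
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(np.allclose(A @ x, combo))  # True

# Dependent columns: their span is only a line in R^2 (multiples of [1, 2])
print(np.linalg.matrix_rank(A))   # 1

# A full-rank 2x2 matrix, by contrast, spans all of R^2
A_full = np.array([[2, 3], [1, 4]], dtype=float)
print(np.linalg.matrix_rank(A_full))  # 2
```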

---

### **3. Conditions for a Matrix to Have an Inverse**
For **A** to be invertible:
- **It must be square**: \( m = n \)
- **Its columns must be linearly independent** (full rank)

If **A** is **singular** (non-invertible), solving \( Ax = b \) requires an alternative approach like **pseudo-inverse**.

```python
# Compute the pseudo-inverse (Moore-Penrose)
A_pinv = np.linalg.pinv(A)
x_pseudo = A_pinv @ b
```
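A sketch of what happens with the singular matrix from Section 2 and a right-hand side that is not in its column space (both chosen for illustration): `np.linalg.solve` fails, while the pseudo-inverse returns the minimum-norm least-squares solution, which agrees with `np.linalg.lstsq`:

```python
import numpy as np

A = np.array([[1, 2], [2, 4]], dtype=float)  # singular (rank 1)
b = np.array([8, 11], dtype=float)           # not in the column space of A

solve_failed = False
try:
    np.linalg.solve(A, b)
except np.linalg.LinAlgError as err:
    solve_failed = True
    print("solve failed:", err)

# Pseudo-inverse: minimum-norm least-squares solution
x_pinv = np.linalg.pinv(A) @ b
# lstsq (SVD-based) returns the same minimum-norm least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_pinv)                        # approximately [1.2 2.4]
print(np.allclose(x_pinv, x_lstsq))  # True
print(A @ x_pinv)                    # approximately [ 6. 12.]: projection of b onto the column space
```

Since no exact solution exists, `A @ x_pinv` is the point in the span of the columns closest to `b`.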

---

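### **4. Right and Left Inverses**
A **right inverse** satisfies \( A A^{-1} = I \) and a **left inverse** satisfies \( A^{-1} A = I \). For an invertible square matrix both hold:

\[
A A^{-1} = A^{-1} A = I
\]

For **square matrices**, the **left** and **right** inverses are the same matrix.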

**Python Implementation:**
```python
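A = np.array([[2, 3], [1, 4]])  # Invertible matrix from Section 1 (A was reassigned in Section 2)
A_inv = np.linalg.inv(A)
is_left_inverse = np.allclose(A_inv @ A, np.eye(A.shape[0]))   # A^{-1} A = I
is_right_inverse = np.allclose(A @ A_inv, np.eye(A.shape[0]))  # A A^{-1} = I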
```

---

### **Key Takeaways**
| **Concept** | **Mathematical Representation** | **Python Equivalent** |
|------------|--------------------------------|----------------------|
| **Matrix Inverse** | \( A^{-1} \) | `np.linalg.inv(A)` |
| **Solving Ax = b** | \( x = A^{-1}b \) | `np.linalg.solve(A, b)` |
| **Linear Dependence** | \( \text{rank}(A) < n \) | `np.linalg.matrix_rank(A) < A.shape[1]` |
| **Pseudo-inverse** | \( A^+ \) | `np.linalg.pinv(A)` |
| **Right & Left Inverse** | \( A A^{-1} = I \) | `np.allclose(A_inv @ A, np.eye(n))` |

---

### **Conclusion**
Understanding **matrix inversion, dependence, and span** is crucial for **solving equations, optimizing neural networks, and understanding data transformations** in deep learning. While matrix inverses are important in theory, numerical methods like **least squares solutions** and **pseudo-inverses** are often preferred in real-world applications.

In [21]:
# Exercise: Imagine you have a simple dataset with two inputs (x1, x2) and one output (y).
# We want to build a linear neural network that predicts y from x1 and x2.

# Step 1: Data Generation
# Let's create a synthetic dataset with a linear relationship between inputs and output (here y = x1 + x2).
# Linear dependence among the inputs, and the span of the input columns, affect the stability and learning ability of models.
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]]) # Input: 4 Samples 2 Features
y = np.array([3, 5, 7, 9]) # Output: 4 Samples 

In [26]:
# Step 2: Building the Linear Neural Network

# We use NumPy to build a "manual" linear neural network.

# Initialize the weights
weights = np.array([5, 5]) # Initial weights

# Prediction function
def predict(X, weights):
    return np.dot(X, weights) # Matrix multiplication (input * weights)

# Perform prediction
predictions = predict(X, weights)
print("Initial predictions:", predictions)

Initial predictions: [15 25 35 45]


In [27]:
# Step 3: Calculate Error and Update Weights

# We calculate the error and update the weights using the gradient.

# Cost function (Mean Squared Error)
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Calculate the initial error
error = mse(y, predictions)
print("Initial error:", error)

# Calculate the gradient (derivative of the error with respect to the weights)
gradient = 2 * np.dot(X.T, (predictions - y)) / len(y)

# Update the weights (gradient descent)
learning_rate = 0.01
weights = weights - learning_rate * gradient

# Make a new prediction and calculate the error
new_predictions = predict(X, weights)
new_error = mse(y, new_predictions)
print("New predictions:", new_predictions)
print("New error:", new_error)

Initial error: 656.0
New predictions: [ 9.84 16.56 23.28 30.  ]
New error: 221.6144
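A single update already cuts the error from 656 to about 221.6. A minimal sketch of repeating the same update in a loop follows; the 500 iterations are an arbitrary choice, and the closed-form least-squares answer from `np.linalg.lstsq` is included for comparison (since \( y = x_1 + x_2 \), it should be close to `[1, 1]`):

```python
import numpy as np

# Same toy data as in Step 1: y = x1 + x2
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([3, 5, 7, 9], dtype=float)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

weights = np.array([5.0, 5.0])  # same starting point as Step 2
learning_rate = 0.01            # same step size as Step 3
losses = []
for step in range(500):         # iteration count chosen arbitrarily
    predictions = X @ weights
    losses.append(mse(y, predictions))
    gradient = 2 * X.T @ (predictions - y) / len(y)  # dMSE/dweights
    weights = weights - learning_rate * gradient

# Closed-form least-squares solution for comparison
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print("First loss:", losses[0])  # 656.0
print("Last loss:", losses[-1])
print("Weights after gradient descent:", weights)
print("Least-squares weights:", w_star)
```

Because the two input columns are nearly parallel, \( X^T X \) is poorly conditioned, so gradient descent converges slowly along one direction even though the loss keeps decreasing.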


In [None]:
# Step 4: Linear Dependence and Rank Analysis

# Let's analyze the linear dependence of the input data.

# Calculate the rank of the input matrix
rank_X = np.linalg.matrix_rank(X)
print("Rank of input matrix:", rank_X)

# Check for linear dependence
is_linearly_dependent = rank_X < X.shape[1]
# If rank < number of columns, the matrix is linearly dependent
print("Linear dependence:", is_linearly_dependent)

Rank of input matrix: 2
Linear dependence: False


In [None]:
# Step 5: Inverse and Pseudo-Inverse (Optional)

# Let's try to solve the system Xw = y using the inverse and the pseudo-inverse.

try: # Try to compute the inverse (only defined for square, non-singular matrices)
    X_inv = np.linalg.inv(X)
    x_solution = np.dot(X_inv, y)
    print("Solution with inverse:", x_solution)
except np.linalg.LinAlgError:
    print("The input matrix is not invertible (it is 4x2, not square).")

# Compute the pseudo-inverse
X_pinv = np.linalg.pinv(X)
x_pseudo_solution = np.dot(X_pinv, y)
print("Solution with pseudo-inverse:", x_pseudo_solution)