# **Chapter 2 - Linear Algebra in Deep Learning**

Neural networks carry out their computations on vectors, matrices, and tensors, which makes **linear algebra fundamental to deep learning**. Below, we restate the key concepts in a **Python-style notation** so that they translate directly into code.

### **Fundamentals of Linear Algebra for Deep Learning (Python Approach)**

This chapter covers the objects needed to understand neural networks, namely **scalars, vectors, matrices, and tensors**, together with the basic operations on them.

---

## **1. Fundamental Elements of Linear Algebra**
Linear algebra involves different types of mathematical objects:

### **Scalars (Single Numbers)**
A **scalar** is a **single numerical value**, written in italics (e.g., $s \in \mathbb{R}$).
- Example: $s = 5$, $t = 3.14$
- **Python Implementation:**
```python
s = 5  # Integer
t = 3.14  # Floating-point number
```

---

### **Vectors (1D Arrays)**
A **vector** is an **ordered sequence** of numbers, represented as a column:

$$
x =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix}
$$

**A vector with $n$ entries identifies a point in $n$-dimensional space; in deep learning, vectors typically hold inputs and the weights of a neuron.**
- **Python Implementation:**
```python
import numpy as np
x = np.array([1, 2, 3])  # 1D vector
```

🔹 **Vector Operations:**
- **Addition**: `x + y`
- **Scalar Multiplication**: `a * x`
- **Dot Product**: `np.dot(x, y)`
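
The three operations above can be tried on concrete values. The following is a minimal sketch; the vectors `x` and `y` and the scalar `a` are illustrative values chosen for this example, not taken from the text.

```python
import numpy as np

# Illustrative values (assumptions, chosen only for this example)
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
a = 2

vec_sum = x + y     # element-wise addition
scaled = a * x      # every entry multiplied by the scalar
dot = np.dot(x, y)  # sum of element-wise products: 1*4 + 2*5 + 3*6

print(vec_sum)  # [5 7 9]
print(scaled)   # [2 4 6]
print(dot)      # 32
```

Mathematically, addition and the dot product are defined only for vectors of the same length. Note also that a 1D NumPy array has shape `(3,)`: it is neither a row nor a column until it is reshaped.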

---

### **Matrices (2D Arrays)**
A **matrix** is a **table of numbers**, defined by **rows** and **columns**:

$$
A =
\begin{bmatrix}
A_{1,1} & A_{1,2} \\
A_{2,1} & A_{2,2}
\end{bmatrix}
$$

**Matrices are used to represent weights and linear transformations in neural networks.**

- **Python Implementation:**
```python
A = np.array([[1, 2], [3, 4]])  # 2x2 matrix
```

🔹 **Matrix Operations:**
- **Transpose**: `A.T`
- **Matrix Multiplication**: `np.dot(A, B)` or `A @ B`
- **Element Access:**
  ```python
  A[0, :]  # First row
  A[:, 1]  # Second column
  ```
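
A short sketch of these matrix operations on concrete values may help. It reuses the 2x2 matrix `A` from above; the second matrix `B` is an illustrative assumption. It also shows how the 1-based indices of the mathematical notation map to NumPy's 0-based indices.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])  # the 2x2 matrix from above
B = np.array([[5, 6], [7, 8]])  # illustrative second matrix (assumption)

A_t = A.T      # transpose: rows and columns swapped -> [[1, 3], [2, 4]]
product = A @ B  # matrix product -> [[19, 22], [43, 50]]

# For 2D arrays, np.dot and @ give the same result
same = np.array_equal(product, np.dot(A, B))

# NumPy indices start at 0, so the math entry A_{1,2} is A[0, 1]
entry = A[0, 1]       # 2
first_row = A[0, :]   # [1, 2]
second_col = A[:, 1]  # [2, 4]

print(A_t)
print(product)
print(same, entry, first_row, second_col)
```

The matrix product `A @ B` requires the number of columns of `A` to equal the number of rows of `B`; it is different from the element-wise product `A * B`.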

---

### **Tensors (Multidimensional Arrays)**
A **tensor** is a generalization of matrices to higher dimensions.

Example of a **3D tensor**:
$$
T = \begin{bmatrix}
\begin{bmatrix} T_{1,1,1} & T_{1,1,2} \\ T_{1,2,1} & T_{1,2,2} \end{bmatrix} \\
\begin{bmatrix} T_{2,1,1} & T_{2,1,2} \\ T_{2,2,1} & T_{2,2,2} \end{bmatrix}
\end{bmatrix}
$$

- **Python Implementation:**
```python
T = np.random.rand(2, 2, 2)  # Random 3D tensor with the same 2x2x2 shape as the example above
```

**Tensors are widely used to represent images and multidimensional data in deep learning.**
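
To make the indexing concrete, here is a minimal sketch of a $2 \times 2 \times 2$ tensor like $T$ above, followed by a hypothetical image batch. The fill values and the `(batch, height, width, channels)` layout are assumptions made for illustration; frameworks differ in the axis order they use.

```python
import numpy as np

# A 2x2x2 tensor filled with 0..7 so that individual entries are easy to track
T = np.arange(8).reshape(2, 2, 2)

print(T.ndim)   # 3 (number of axes)
print(T.shape)  # (2, 2, 2)

first_slice = T[0]  # first 2x2 matrix: [[0, 1], [2, 3]]
entry = T[1, 0, 1]  # math entry T_{2,1,2} (0-based indices in NumPy)
print(entry)        # 5

# Hypothetical batch of 4 RGB images of 32x32 pixels,
# stored as (batch, height, width, channels); this layout is an assumption
images = np.zeros((4, 32, 32, 3))
print(images.shape)  # (4, 32, 32, 3)
```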

---

## **2. Fundamental Operations in Neural Networks**

| **Operation** | **Mathematical Representation** | **Python Equivalent** |
|------------------------|---------------------------------|-----------------------|
| **Vector Addition** | $(x + y)$                       | `x + y`               |
| **Dot Product** | $(x^T y)$                       | `np.dot(x, y)`        |
| **Matrix-Vector Product** | $(A x)$                         | `A @ x`               |
| **Matrix-Matrix Product** | $(A B)$                         | `A @ B`               |
| **Transpose** | $(A^T)$                         | `A.T`                 |
<!-- | **Vector Norm (L2)** | $(||x||_2)$                     | `np.linalg.norm(x)`  | -->
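
These operations combine into the computation performed by a single dense (fully connected) layer, $z = W x + b$. The sketch below is illustrative only: the names `W`, `b`, `x`, and `X` and all of their values are assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical layer with 3 inputs and 2 outputs (values chosen for illustration)
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])  # weight matrix, shape (2, 3)
b = np.array([0.1, -0.2])        # bias vector, shape (2,)
x = np.array([1.0, 2.0, 3.0])    # input vector, shape (3,)

# Matrix-vector product followed by vector addition
z = W @ x + b
print(z)  # approximately [-1.9, 2.8]

# A batch stores one sample per row, so the weight matrix is transposed
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])  # shape (2 samples, 3 features)
Z = X @ W.T + b                  # shape (2, 2); b is broadcast across the rows
print(Z.shape)  # (2, 2)
```

The first row of `Z` equals `z`, since the first sample in `X` is the vector `x`. This row-per-sample convention is why a transpose of the weight matrix appears when a layer's output is recomputed by hand.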

---

### **Conclusion**
Linear algebra is **the mathematical foundation of deep learning**, as neural networks use:
- **Scalars** for parameters and biases.
- **Vectors** for inputs and neuron weights.
- **Matrices** for linear transformations.
- **Tensors** for multidimensional data representation (e.g., images, sequences).
```

In [6]:
import numpy as np

# Define matrices and vectors for the exercise
A = np.array([[1, 2], [3, 4]])  # 2x2 matrix
B = np.array([[5, 6], [7, 8]])  # 2x2 matrix
x = np.array([1, 2])  # 1D vector
y = np.array([3, 4])  # 1D vector

# Matrix multiplication
C = A @ B

# Element-wise (Hadamard) product
hadamard_product = A * B

# Dot product of two vectors
dot_product = np.dot(x, y)

# Extra matrix used in the distributive and associative tests
D = np.array([[2, 3], [4, 5]])

# Verify distributive property: A(B + D) == AB + AD
distributive_test = np.allclose(A @ (B + D), A @ B + A @ D)

# Verify associative property: A(BD) == (AB)D
associative_test = np.allclose(A @ (B @ D), (A @ B) @ D)

# Transpose property: (AB)^T == B^T A^T
transpose_test = np.allclose((A @ B).T, B.T @ A.T)

# Solving a linear system Ax = b
A_sys = np.array([[2, 1], [1, 3]])  # Coefficient matrix
b_sys = np.array([8, 13])  # Known values
x_solution = np.linalg.solve(A_sys, b_sys)

In [8]:
# Display results
import pandas as pd
results = {
    "Matrix Multiplication (C=AB)": C,
    "Hadamard Product (Element-wise)": hadamard_product,
    "Dot Product (x^Ty)": dot_product,
    "Distributive Property Verified": distributive_test,
    "Associative Property Verified": associative_test,
    "Transpose Property Verified": transpose_test,
    "Solution of Ax=b": x_solution
}

results

{'Matrix Multiplication (C=AB)': array([[19, 22],
        [43, 50]]),
 'Hadamard Product (Element-wise)': array([[ 5, 12],
        [21, 32]]),
 'Dot Product (x^Ty)': 11,
 'Distributive Property Verified': True,
 'Associative Property Verified': True,
 'Transpose Property Verified': True,
 'Solution of Ax=b': array([2.2, 3.6])}

In [13]:
%pip install ace_tools_open


In [None]:
# Display results in a structured table format
import ace_tools_open as tools

# Formatting results for display
formatted_results = {
    "Matrix Multiplication (C=AB)": str(C.tolist()),
    "Hadamard Product (Element-wise)": str(hadamard_product.tolist()),
    "Dot Product (x^Ty)": dot_product,
    "Distributive Property Verified": distributive_test,
    "Associative Property Verified": associative_test,
    "Transpose Property Verified": transpose_test,
    "Solution of Ax=b": str(x_solution.tolist())
}

# Convert results into a structured DataFrame with strings for display compatibility
df_results = pd.DataFrame(list(formatted_results.items()), columns=['Operation', 'Result'])

# Display the results
tools.display_dataframe_to_user(name="Matrix and Vector Operations Results", dataframe=df_results)



In [20]:
# Neural Network Computation
import torch 
from torch import nn
import ace_tools_open as tools

# Simple Neural Network class
class SimpleNeuralNetwork(nn.Module):
    def __init__(self, input_size, output_size):
        super(SimpleNeuralNetwork, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        # Matrix Multiplication
        out = self.linear(x)
        return out
    
# Example data
input_size = 3
output_size = 2
input_data = torch.tensor([[1.0, 2.0, 3.0]])  # Input matrix (A): 1 sample, 3 features

model = SimpleNeuralNetwork(input_size, output_size)

# Perform forward propagation (calling the module invokes forward())
output = model(input_data)

# print("Input:", input_data)
# print("Output:", output)

# # Access learned weights (B)
# weights = model.linear.weight.data
# print("\nAccess learned weights (B):")
# print(weights)

# # Access learned bias
# bias = model.linear.bias.data
# print("\nAccess learned bias:")
# print(bias)

# # Manual calculation of matrix multiplication
# manual_output = torch.mm(input_data, weights.T) + bias
# print("\nManual calculation:")
# print(manual_output)



# Formatting results for display (every value is converted to a string)
layer_weights = model.linear.weight.data  # randomly initialized; no training has been run
layer_bias = model.linear.bias.data
formatted_results = {
    "Input:": str(input_data),
    "Output:": str(output),
    "Initial weights (B):": str(layer_weights),
    "Initial bias:": str(layer_bias),
    "Manual calculation:": str(torch.mm(input_data, layer_weights.T) + layer_bias),
}

# Convert results into a structured DataFrame with strings for display compatibility
df_results = pd.DataFrame(list(formatted_results.items()), columns=['Operation', 'Result'])

# Display the results
tools.display_dataframe_to_user(name="Matrix and Vector Operations Results", dataframe=df_results)



In [21]:
# Exercise: Imagine you have a simple dataset with two inputs (x1, x2) and one output (y). We want to build a linear neural network to predict y from x1 and x2.

# Step 1: Data Generation
# Let's create a synthetic dataset with linear dependence between input and output.
# Linear dependence and span affect the stability and learning ability of models.
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])  # Input: 4 samples, 2 features
y = np.array([3, 5, 7, 9])  # Output: 4 samples

In [26]:
# Step 2: Building the Linear Neural Network

# We use NumPy to build a "manual" linear neural network.

# Initialize the weights
weights = np.array([5, 5]) # Initial weights

# Prediction function
def predict(X, weights):
    return np.dot(X, weights) # Matrix-vector product: one prediction per sample

# Perform prediction
predictions = predict(X, weights)
print("Initial predictions:", predictions)

Initial predictions: [15 25 35 45]


In [27]:
# Step 3: Calculate Error and Update Weights

# We calculate the error and update the weights using the gradient.

# Cost function (Mean Squared Error)
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Calculate the initial error
error = mse(y, predictions)
print("Initial error:", error)

# Calculate the gradient (derivative of the error with respect to the weights)
gradient = 2 * np.dot(X.T, (predictions - y)) / len(y)

# Update the weights (gradient descent)
learning_rate = 0.01
weights = weights - learning_rate * gradient

# Make a new prediction and calculate the error
new_predictions = predict(X, weights)
new_error = mse(y, new_predictions)
print("New predictions:", new_predictions)
print("New error:", new_error)

Initial error: 656.0
New predictions: [ 9.84 16.56 23.28 30.  ]
New error: 221.6144


In [None]:
# Step 4: Linear Dependence and Rank Analysis

# Let's analyze the linear dependence of the input data.

# Calculate the rank of the input matrix
rank_X = np.linalg.matrix_rank(X)
print("Rank of input matrix:", rank_X)

# Check for linear dependence among the columns (features)
# If rank < number of columns, the columns are linearly dependent
is_linearly_dependent = rank_X < X.shape[1]
print("Linear dependence:", is_linearly_dependent)

Rank of input matrix: 2
Linear dependence: False


In [29]:
# Step 5: Inverse and Pseudo-Inverse (Optional)

# Let's try to solve the system Xw = y using the inverse and pseudo-inverse.

try: # Try to compute the inverse (fails here because X is 4x2, not square)
    X_inv = np.linalg.inv(X)
    x_solution = np.dot(X_inv, y)
    print("Solution with inverse:", x_solution)
except np.linalg.LinAlgError:
    print("The input matrix is not invertible.")

# Compute the pseudo-inverse, which gives the least-squares solution
X_pinv = np.linalg.pinv(X)
x_pseudo_solution = np.dot(X_pinv, y)
print("Solution with pseudo-inverse:", x_pseudo_solution)

The input matrix is not invertible.
Solution with pseudo-inverse: [1. 1.]


In [None]:
# Exercise

# Imagine you have a dataset with data representing product features (e.g. reviews, ratings, etc.) and you want to predict an overall rating. 
# Due to the nature of the data, the input matrix may not be square or invertible. 
# We will use Keras to build a neural network and the pseudo-inverse to handle this situation.

In [39]:
# Step 1: Data Generation

# We generate a non-square synthetic dataset.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input

In [40]:
X = np.random.rand(100, 5) # 100 samples, 5 features
y = np.random.rand(100, 1) # 100 samples, 1 target value each

In [None]:
# Step 2: Building the Neural Network with Keras

# Let's build a small feed-forward neural network with two hidden layers

model = Sequential()
model.add(Input(shape=(5,)))
model.add(Dense(64, activation="relu"))
model.add(Dense(32, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

# Compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# - **Building the model:** We use a sequential model (`Sequential`) to build a neural network with three dense layers.
# - **Input Layer:** Accepts data with 5 features.
# - **First Dense Layer:** 64 neurons with ReLU activation.
# - **Second Dense Layer:** 32 neurons with ReLU activation.
# - **Output Layer:** 1 neuron with sigmoid activation, which keeps the prediction between 0 and 1.
# - **Compiling the model:** We use the Adam optimizer and the `binary_crossentropy` loss function, monitoring the accuracy during training. Note that `y` here holds continuous values rather than 0/1 labels, so accuracy is not a meaningful metric for this data; a regression loss such as `mse` would be the usual choice for predicting a rating.

In [43]:
# Step 3: Train the Model
model.fit(X, y, epochs=50, verbose=0)

<keras.src.callbacks.history.History at 0x1c2bd178a40>

In [44]:
# Extract the weights of the first dense layer (5 inputs -> 64 units)
weights = model.layers[0].get_weights()[0] # Weights
bias = model.layers[0].get_weights()[1] # Bias

# Compute pseudo-inverse of weights
weights_pinv = np.linalg.pinv(weights)

# Check dimensions
print("Dimensions of weights:", weights.shape)
print("Dimensions of pseudo-inverse of weights:", weights_pinv.shape)

# Example of using pseudo-inverse (optional)

Dimensions of weights: (5, 64)
Dimensions of pseudo-inverse of weights: (64, 5)


In [45]:
# Step 4: SVD Analysis (Optional)

# We perform SVD on the weight matrix to analyze singular values.

# Run SVD
U, D, Vt = np.linalg.svd(weights)

# Print singular values
print("Singular values:", D)

# Analyze singular values to understand feature importance

Singular values: [1.5693597 1.4813324 1.3776999 1.3325217 1.2056307]
