# Chapter 2 Linear Algebra and Matrix Theory for Machine Learning

## 2.1 Vectors and Their Operations


### 2.1.1 Basic Concepts

<span style="color:lightblue"><strong>Matrix addition:</strong></span> Two matrices $\mathbf{A}$ and $\mathbf{B}$ of the same dimensions (both $m \times n$) are added by adding their corresponding elements.

$$
\mathbf{C} = \mathbf{A} + \mathbf{B}, \quad \text{where} \quad c_{ij} = a_{ij} + b_{ij}.
$$

Here, each entry $c_{ij}$ in the resulting matrix $\mathbf{C}$ is the sum of entries $a_{ij}$ and $b_{ij}$ from matrices $\mathbf{A}$ and $\mathbf{B}$.

```python
import numpy as np

# Define two matrices
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

# Add the matrices
C = A + B

# Print results
print("Matrix A:")
print(A)

print("\nMatrix B:")
print(B)

print("\nMatrix A + Matrix B:")
print(C)
```

```text
Matrix A:
[[1 2]
 [3 4]]

Matrix B:
[[5 6]
 [7 8]]

Matrix A + Matrix B:
[[ 6  8]
 [10 12]]
```
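The definition only applies to matrices of the same shape. Below is a minimal sketch of a strict addition check. `matrix_add` is a hypothetical helper written for illustration, not a NumPy function. It also shows a caveat: NumPy's `+` applies broadcasting, so some pairs of differently shaped arrays are added without error even though their matrix sum is undefined.

```python
import numpy as np

def matrix_add(X, Y):
    """Hypothetical helper: matrix addition that enforces identical shapes."""
    if X.shape != Y.shape:
        raise ValueError(f"shape mismatch: {X.shape} vs {Y.shape}")
    return X + Y  # element-wise: c_ij = x_ij + y_ij

A = np.array([[1, 2],
              [3, 4]])                 # shape (2, 2)
D = np.array([[1, 2],
              [3, 4],
              [5, 6]])                 # shape (3, 2)

print(matrix_add(A, A))               # same shape: allowed

try:
    matrix_add(A, D)                  # different shapes: rejected
except ValueError as err:
    print("Error:", err)

# Caveat: plain `+` broadcasts, so this (2, 2) + (1, 2) sum does NOT raise;
# the single row is added to every row of A.
row = np.array([[10, 20]])            # shape (1, 2)
print(A + row)
```

When a strict mathematical check matters, comparing `.shape` explicitly is safer than relying on `+` to fail.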


<span style="color:lightblue"><strong>Vector subtraction:</strong></span>  The difference between two vectors $\mathbf{u}$ and $\mathbf{v}$ is obtained by subtracting their corresponding components.

$$
\mathbf{u} - \mathbf{v} = [u_1 - v_1, \; u_2 - v_2, \; \ldots, \; u_n - v_n].
$$

Geometrically, when $\mathbf{u}$ and $\mathbf{v}$ are both drawn from the origin, $\mathbf{u} - \mathbf{v}$ is the vector pointing from the tip of $\mathbf{v}$ to the tip of $\mathbf{u}$.

```python
import numpy as np

# Define two vectors
u = np.array([3, 5])
v = np.array([1, 2])

# Subtract vectors
diff = u - v

# Print results
print("Vector u:")
print(u)

print("\nVector v:")
print(v)

print("\nVector u - v:")
print(diff)
```

```text
Vector u:
[3 5]

Vector v:
[1 2]

Vector u - v:
[2 3]
```
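To connect the code to the component-wise formula and to the geometric picture, the sketch below (with the same `u` and `v`) builds the difference one component at a time, then checks that adding $\mathbf{u} - \mathbf{v}$ to $\mathbf{v}$ lands back on $\mathbf{u}$, i.e. that it is the displacement from the tip of $\mathbf{v}$ to the tip of $\mathbf{u}$.

```python
import numpy as np

u = np.array([3, 5])
v = np.array([1, 2])

# Component-wise definition: (u - v)_i = u_i - v_i
diff_loop = np.array([u_i - v_i for u_i, v_i in zip(u, v)])
diff = u - v

print(diff_loop)                        # [2 3]
print(np.array_equal(diff_loop, diff))  # True

# Geometric reading: start at the tip of v, move by (u - v), arrive at the tip of u
print(np.array_equal(v + diff, u))      # True
```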


<span style="color:lightblue"><strong>Matrix Transpose:</strong></span> The transpose of a matrix $\mathbf{A}$, denoted as $\mathbf{A}^T$, is formed by turning rows into columns and columns into rows.

$$
\mathbf{A} = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix}
\quad
\Longrightarrow
\quad
\mathbf{A}^T = \begin{bmatrix}
a_{11} & a_{21} \\
a_{12} & a_{22}
\end{bmatrix}.
$$

In general, for an $m \times n$ matrix $\mathbf{A} = [a_{ij}]$, the transpose $\mathbf{A}^T$ is the $n \times m$ matrix with entries $(\mathbf{A}^T)_{ij} = a_{ji}$.

```python
import numpy as np

# Define a matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Calculate its transpose
A_T = A.T

# Print results
print("Matrix A:")
print(A)

print("\nTranspose of Matrix A (A^T):")
print(A_T)
```

```text
Matrix A:
[[1 2 3]
 [4 5 6]]

Transpose of Matrix A (A^T):
[[1 4]
 [2 5]
 [3 6]]
```
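The rule that entry $(i, j)$ of $\mathbf{A}^T$ equals $a_{ji}$ can be checked directly. The loop-based `transpose` below is a hypothetical teaching helper (in practice `A.T` is the idiomatic choice). It builds the transpose entry by entry and confirms two consequences: a $2 \times 3$ matrix becomes $3 \times 2$, and transposing twice returns the original matrix.

```python
import numpy as np

def transpose(A):
    """Build A^T entry by entry from the rule (A^T)_ij = a_ji."""
    m, n = A.shape
    At = np.empty((n, m), dtype=A.dtype)  # an m x n matrix becomes n x m
    for i in range(n):
        for j in range(m):
            At[i, j] = A[j, i]
    return At

A = np.array([[1, 2, 3],
              [4, 5, 6]])

At = transpose(A)
print(At.shape)                           # (3, 2)
print(np.array_equal(At, A.T))            # True
print(np.array_equal(transpose(At), A))   # True: transposing twice gives A back
```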


<span style="color:lightblue"><strong>Matrix multiplication:</strong></span> The product $\mathbf{A}\mathbf{B}$ of two matrices is defined only if the number of columns in $\mathbf{A}$ equals the number of rows in $\mathbf{B}$.

If $\mathbf{A}$ is an $m \times n$ matrix and $\mathbf{B}$ is an $n \times p$ matrix, then their product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is an $m \times p$ matrix where

$$
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.
$$

Here, each element $c_{ij}$ is computed by taking the dot product of the $i$-th row of $\mathbf{A}$ with the $j$-th column of $\mathbf{B}$.

```python
import numpy as np

# Define two matrices
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

# Perform matrix multiplication (rows of A times columns of B)
C = np.dot(A, B)
# Equivalently, C = A @ B. Note that A * B is the element-wise product in NumPy.

# Print results
print("Matrix A:")
print(A)

print("\nMatrix B:")
print(B)

print("\nMatrix product A @ B:")
print(C)
```

```text
Matrix A:
[[1 2]
 [3 4]]

Matrix B:
[[5 6]
 [7 8]]

Matrix product A @ B:
[[19 22]
 [43 50]]
```
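The summation formula $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ translates directly into three nested loops. The sketch below uses a hypothetical helper `matmul` (far slower than `@`, shown only for illustration), checks it against NumPy, and highlights two points that often trip up beginners: matrix multiplication is generally not commutative, and NumPy's `*` operator is the element-wise product, not the matrix product.

```python
import numpy as np

def matmul(A, B):
    """Naive matrix product from c_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"columns of A ({n}) must equal rows of B ({n_b})")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):          # i-th row of A
        for j in range(p):      # j-th column of B
            for k in range(n):  # accumulate the row-column dot product
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(matmul(A, B))                         # same entries as A @ B: 19, 22, 43, 50
print(np.array_equal(matmul(A, B), A @ B))  # True
print(B @ A)                                # entries 23, 34, 31, 46, so BA differs from AB
print(A * B)                                # element-wise product: 5, 12, 21, 32
```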


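<span style="color:lightblue"><strong>Inner product properties:</strong></span> For vectors $x, y, z \in \mathbb{R}^n$ and a scalar $k$, the inner product $x^T y = \sum_{i=1}^{n} x_i y_i$ satisfies: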

1. **Symmetry:**  
   $$
   x^T y = y^T x
   $$

2. **Distributivity over addition:**  
   $$
   (x + y)^T z = x^T z + y^T z
   $$

3. **Scalar multiplication:**  
   $$
   (k x)^T y = k x^T y
   $$

4. **Distributivity over addition (alternative form):**  
   $$
   z^T (x + y) = z^T x + z^T y
   $$

Together, these properties show that the inner product is **symmetric** and **linear in each argument**: it distributes over vector addition in either slot, and scalar factors can be pulled out.

```python
import numpy as np

# Define vectors and scalar
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array([7, 8, 9])
k = 2

# 1. Symmetry: x^T y = y^T x
dot_xy = np.dot(x, y)
dot_yx = np.dot(y, x)

print("1. Symmetry:")
print("x^T y =", dot_xy)
print("y^T x =", dot_yx)
print("Are they equal?", np.allclose(dot_xy, dot_yx))

print("\n" + "-"*40)

# 2. Distributivity over addition: (x + y)^T z = x^T z + y^T z
left = np.dot(x + y, z)
right = np.dot(x, z) + np.dot(y, z)

print("2. Distributivity over addition:")
print("(x + y)^T z =", left)
print("x^T z + y^T z =", right)
print("Are they equal?", np.allclose(left, right))

print("\n" + "-"*40)

# 3. Scalar multiplication: (k x)^T y = k x^T y
left = np.dot(k * x, y)
right = k * np.dot(x, y)

print("3. Scalar multiplication:")
print("(k x)^T y =", left)
print("k x^T y =", right)
print("Are they equal?", np.allclose(left, right))

print("\n" + "-"*40)

# 4. Distributivity over addition (alternative form): z^T (x + y) = z^T x + z^T y
left = np.dot(z, x + y)
right = np.dot(z, x) + np.dot(z, y)

print("4. Distributivity over addition (alternative form):")
print("z^T (x + y) =", left)
print("z^T x + z^T y =", right)
print("Are they equal?", np.allclose(left, right))
```

```text
1. Symmetry:
x^T y = 32
y^T x = 32
Are they equal? True

----------------------------------------
2. Distributivity over addition:
(x + y)^T z = 172
x^T z + y^T z = 172
Are they equal? True

----------------------------------------
3. Scalar multiplication:
(k x)^T y = 64
k x^T y = 64
Are they equal? True

----------------------------------------
4. Distributivity over addition (alternative form):
z^T (x + y) = 172
z^T x + z^T y = 172
Are they equal? True
```
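The notation $x^T y$ is itself a matrix product. If $x$ and $y$ are treated as $n \times 1$ column vectors, then $x^T$ is $1 \times n$ and $x^T y$ is a $1 \times 1$ matrix whose single entry is $\sum_{i=1}^{n} x_i y_i$. The short sketch below, using the same `x` and `y` values as above, computes the value both ways and compares it with `np.dot`.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Definition: x^T y = sum_i x_i * y_i
by_definition = sum(x_i * y_i for x_i, y_i in zip(x, y))

# Matrix view: reshape to 3 x 1 columns, so x^T y is a (1 x 3) @ (3 x 1) = 1 x 1 product
x_col = x.reshape(-1, 1)
y_col = y.reshape(-1, 1)
as_matrix = x_col.T @ y_col

print(by_definition)              # 32
print(as_matrix.shape)            # (1, 1)
print(as_matrix.item() == np.dot(x, y) == by_definition)  # True
```

For 1-D arrays, `np.dot` skips the reshaping and returns the scalar directly, which is why the earlier cells use it.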


### 2.1.3 Vector Norms

A **norm** assigns a non-negative size, or length, to a vector; for the Euclidean norm this is the familiar magnitude. The L-p norms form a family, indexed by $p$, that generalizes this notion of length.

---

#### **General L-p norm:**

For a vector $x$, the L-p norm is defined as:

$$
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.
$$

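Here $p \ge 1$ is a real number, not necessarily an integer; for $0 < p < 1$ the formula violates the triangle inequality and is not a norm. Common choices are **p=1 (L1 norm)** and **p=2 (L2 norm)**.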
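A direct translation of the definition into code is sketched below. `lp_norm` is a hypothetical helper written for illustration. It is compared against `np.linalg.norm(x, ord=p)`, which for 1-D arrays and finite `ord` computes $\left(\sum_i |x_i|^{p}\right)^{1/p}$, including for non-integer values such as 1.5.

```python
import numpy as np

def lp_norm(x, p):
    """L-p norm from the definition (sum_i |x_i|^p)^(1/p), for real p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for this formula to be a norm")
    a = np.abs(np.asarray(x, dtype=float))
    return np.sum(a ** p) ** (1.0 / p)

x = np.array([1, -1, 2])
for p in (1, 1.5, 2, 3):
    mine = lp_norm(x, p)
    ref = np.linalg.norm(x, ord=p)
    print(f"p = {p}: {mine:.6f}  matches NumPy: {np.isclose(mine, ref)}")
```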

---

#### **L1 norm (Manhattan norm):**

The L1 norm is the sum of the absolute values of all components.

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|.
$$

For example, if

$$
x = [1, -1, 2],
$$

then

$$
\|x\|_1 = |1| + |-1| + |2| = 4.
$$

---

#### **L2 norm (Euclidean norm):**

Also known as the length or magnitude of a vector:

$$
\|x\|_2 = \sqrt{ \sum_{i=1}^{n} x_i^2 }.
$$

It is often written simply as $\|x\|$.

For the vector above:

$$
\|x\|_2 = \sqrt{1^2 + (-1)^2 + 2^2}
= \sqrt{1 + 1 + 4}
= \sqrt{6}.
$$

---

A vector with **L2 norm = 1** is called a **unit vector**.

```python
import numpy as np

# Define vector x
x = np.array([1, -1, 2])

# Compute L1 norm
l1_norm = np.linalg.norm(x, ord=1)

# Compute L2 norm
l2_norm = np.linalg.norm(x, ord=2)

print("Vector x:", x)
print("\nL1 norm of x:", l1_norm)
print("L2 norm of x (Euclidean norm):", l2_norm)
```

```text
Vector x: [ 1 -1  2]

L1 norm of x: 4.0
L2 norm of x (Euclidean norm): 2.449489742783178
```


<span style="color:lightblue"><strong>Relationship between inner product and L2 norm:</strong></span>

The equation

$$
x^T x = \|x\|_2^2
$$

means that the **inner product of a vector with itself** equals the **square of its L2 norm (Euclidean norm)**.

- **Left side ($x^T x$):** The dot product of $x$ with itself, computed as $\sum_{i=1}^{n} x_i^2$.
- **Right side ($\|x\|_2^2$):** The square of the Euclidean length of $x$.

In words: the **sum of squares of the vector components** equals the **square of its length**.

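```python
import numpy as np

# Define vector x
x = np.array([1, -1, 2])

# Compute inner product x^T x (exact integer arithmetic here)
inner_product = np.dot(x, x)

# Compute L2 norm squared; norm() takes a square root in floating point,
# so squaring it back can be off by a rounding error (e.g. 5.999999999999999)
l2_norm_squared = np.linalg.norm(x, ord=2)**2

print("x^T x =", inner_product)
print("||x||_2^2 =", l2_norm_squared)
# np.allclose compares within a small tolerance, so the rounding error is ignored
print("Are they equal?", np.allclose(inner_product, l2_norm_squared))
```

```text
x^T x = 6
||x||_2^2 = 5.999999999999999
Are they equal? True
```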
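Read in the other direction, the identity gives a way to compute the Euclidean norm as $\|x\|_2 = \sqrt{x^T x}$. The sketch below checks this for the same vector and for a seeded random vector. Because the square root is a floating-point operation, the comparison uses `np.isclose` rather than `==`, for the same reason the output above shows `5.999999999999999`.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the run is reproducible
for x in (np.array([1, -1, 2]), rng.normal(size=5)):
    via_inner = np.sqrt(x @ x)        # ||x||_2 = sqrt(x^T x)
    via_norm = np.linalg.norm(x)      # NumPy's default vector norm is the L2 norm
    print(via_inner, via_norm, np.isclose(via_inner, via_norm))
```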


#### **Unit vector normalization:**

For any non-zero vector $x$, we can create a **unit vector** (normalize it) by dividing by its L2 norm:

$$
\frac{x}{\|x\|}.
$$

For example, for

$$
x = [1, -1, 2],
$$

its normalized (unit) vector is:

$$
\left[
\frac{1}{\sqrt{6}},
\frac{-1}{\sqrt{6}},
\frac{2}{\sqrt{6}}
\right].
$$
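Normalization is only defined for non-zero vectors, since the zero vector has norm 0 and dividing by it is undefined. A minimal sketch with an explicit guard follows; `normalize` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def normalize(x):
    """Return x / ||x||_2; the zero vector has no direction, so raise instead."""
    norm = np.linalg.norm(x)
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return x / norm

x = np.array([1, -1, 2])
x_unit = normalize(x)
print(x_unit)                                   # each entry divided by sqrt(6)
print(np.isclose(np.linalg.norm(x_unit), 1.0))  # True: the result is a unit vector

try:
    normalize(np.zeros(3))
except ValueError as err:
    print("Error:", err)
```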

---

#### **L-infinity norm ($p = \infty$):**

When $p = \infty$, the L-p norm becomes the **L-infinity norm**, defined as:

$$
\|x\|_\infty = \max_{1 \le i \le n} |x_i|.
$$

That is, it equals the **maximum absolute value among the vector components**.

For the above vector,

$$
\|x\|_\infty = \max(|1|, |-1|, |2|) = 2.
$$

---

#### **L-infinity as the limit of L-p:**

Mathematically,

$$
\|x\|_\infty = \lim_{p \to +\infty}
\left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.
$$
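Why the limit equals the maximum: let $m = \max_i |x_i| > 0$. Factoring $m$ out gives $\|x\|_p = m \left( \sum_{i=1}^{n} (|x_i|/m)^p \right)^{1/p}$. Every ratio $|x_i|/m$ is at most 1 and at least one equals 1, so the inner sum lies between 1 and $n$. Its $p$-th root therefore lies between 1 and $n^{1/p}$, which tends to 1 as $p \to \infty$. The sketch below evaluates this rescaled form for increasing $p$. The rescaling is also a practical choice, since computing $|x_i|^p$ directly overflows double precision once $p$ is large (for example, $2^{2000}$). `lp_norm_rescaled` is a hypothetical helper.

```python
import numpy as np

def lp_norm_rescaled(x, p):
    """||x||_p computed as m * (sum (|x_i|/m)^p)^(1/p), with m = max|x_i|."""
    a = np.abs(np.asarray(x, dtype=float))
    m = a.max()
    if m == 0:
        return 0.0  # the zero vector has norm 0 for every p
    return m * np.sum((a / m) ** p) ** (1.0 / p)

x = np.array([1, -1, 2])
for p in (1, 2, 4, 10, 100, 2000):
    print(f"p = {p:>4}: {lp_norm_rescaled(x, p):.6f}")  # decreases toward 2
print("max |x_i| =", np.max(np.abs(x)))
```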

```python
import numpy as np

# Define vector x
x = np.array([1, -1, 2])

# Compute L-infinity norm
l_inf_norm = np.linalg.norm(x, ord=np.inf)

# Compute unit vector (normalize x)
l2_norm = np.linalg.norm(x, ord=2)
x_unit = x / l2_norm

print("Vector x:", x)
print("\nL-infinity norm (max absolute value):", l_inf_norm)
print("Unit vector (normalized x):", x_unit)
```