# 4. Vectors

## 4.1 Define a vector

We can represent a vector in Python as a NumPy array. A NumPy array can be created from a list of numbers. For example, below we define a vector with a length of 3 and the integer values 1, 2, and 3.

In [73]:
# create a vector
from numpy import array
# define vector
v = array([1, 2, 3])
print(v)

[1 2 3]


## 4.2 Operations with vectors

#### 4.2.1 Vector Addition

- **Basic Operation**: Two vectors of equal length can be added together. This means that if you have two vectors, (a) and (b), you can add (b) to (a) to obtain a new vector (c).

- **Result of Addition**: The resulting vector (c) has the same length as the original vectors (a) and (b). Each element of this new vector (c) is calculated as the sum of the corresponding elements of (a) and (b) at the same indices.

- **Mathematical Expression**: The operation can be simply expressed as $c = a + b$. This breaks down element by element as follows:

$$
c = (a_1 + b_1, a_2 + b_2, a_3 + b_3)
$$

Or, expressed another way, for each element of the resulting vector (c), it is calculated by summing the corresponding elements of (a) and (b):

$$
\begin{aligned}
c[0] &= a[0] + b[0] \\
c[1] &= a[1] + b[1] \\
c[2] &= a[2] + b[2]
\end{aligned}
$$

In [74]:
# vector addition
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# add vectors
c = a + b
print(c)

[1 2 3]
[1 2 3]
[2 4 6]


#### 4.2.2 Vector Subtraction

- **Basic Operation**: One vector can be subtracted from another vector of equal length. This means that if you have two vectors, (a) and (b), you can subtract (b) from (a) to obtain a new vector (c).

- **Result of Subtraction**: The resulting vector (c) has the same length as the original vectors (a) and (b). Each element of this new vector (c) is calculated as the difference of the corresponding elements of (a) and (b) at the same indices.

- **Mathematical Expression**: The operation can be simply expressed as $c = a - b$. This breaks down element by element as follows:

$$
c = (a_1 - b_1, a_2 - b_2, a_3 - b_3)
$$

Or, expressed another way, for each element of the resulting vector (c), it is calculated by subtracting the corresponding elements of (a) and (b):

$$
\begin{aligned}
c[0] &= a[0] - b[0] \\
c[1] &= a[1] - b[1] \\
c[2] &= a[2] - b[2]
\end{aligned}
$$

In [75]:
# vector subtraction
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([0.5, 0.5, 0.5])
print(b)
# subtract vectors
c = a - b
print(c)

[1 2 3]
[0.5 0.5 0.5]
[0.5 1.5 2.5]


#### 4.2.3 Vector Multiplication

When you have two vectors, say (a) and (b), of the same length, you can multiply them element-wise. This means that you multiply each element of vector (a) by the corresponding element in the same position of vector (b). The result is a new vector (c) of the same length as the original vectors, where each element $c[i]$ is the product $a[i] \times b[i]$.

The operation is mathematically described as follows:

- For two vectors (a) and (b), the resulting vector (c) is calculated as:

$$
c = a \times b
$$

- This is done element-wise, resulting in a new vector (c) of the same length:

$$
c = (a_1 \times b_1, a_2 \times b_2, a_3 \times b_3)
$$

- Or more simply:

$$
c = (a_1 b_1, a_2 b_2, a_3 b_3)
$$

- Another way to see it is by assigning each product to the corresponding position in the resulting vector:

$$
\begin{aligned}
c[0] &= a[0] \times b[0] \\
c[1] &= a[1] \times b[1] \\
c[2] &= a[2] \times b[2]
\end{aligned}
$$

In [76]:
# vector multiplication
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# multiply vectors
c = a * b
print(c)

[1 2 3]
[1 2 3]
[1 4 9]


#### 4.2.4 Vector Division

Given two vectors (a) and (b) of equal length, element-wise division is performed by taking each element of vector (a) and dividing it by the corresponding element of vector (b), resulting in a new vector (c) of the same length. Mathematically, this is represented as:

$$
c = \left( \frac{a_1}{b_1}, \frac{a_2}{b_2}, \frac{a_3}{b_3}, \ldots, \frac{a_n}{b_n} \right)
$$

Where $(a_i)$ and $(b_i)$ are the elements of vectors (a) and (b) respectively, and $(n)$ is the length of the vectors.

In [77]:
# vector division
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# divide vectors
c = a / b
print(c)

[1 2 3]
[1 2 3]
[1. 1. 1.]


## 4.3 Dot Product

The dot product, also known as the scalar product, is a mathematical operation that takes two vectors of equal length and returns a single scalar number. This operation is fundamental in various areas of mathematics, physics, and engineering, especially in the field of linear algebra and vector analysis.

The operation is performed by multiplying the corresponding elements of the two vectors and then summing those products. Mathematically, if we have two vectors (a) and (b) with three elements each, the dot product is calculated as:

$$
c = a \cdot b = (a_1 \times b_1 + a_2 \times b_2 + a_3 \times b_3)
$$

Or more compactly for vectors of any length (n):

$$
c = \sum_{i=1}^{n} a_i b_i
$$
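
The summation formula above can be sketched in plain Python for vectors of any length. The helper name `dot_product` is an illustrative choice for this sketch, not something defined elsewhere in this document.

```python
# Dot product for vectors of any length n (pure Python sketch)
def dot_product(a, b):
    # The sum is only defined when both vectors have the same length
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Multiply matching elements and add up the products
    return sum(x * y for x, y in zip(a, b))

result = dot_product([1, 2, 3, 4], [4, 3, 2, 1])
print(result)  # 1*4 + 2*3 + 3*2 + 4*1 = 20
```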

### Applications and Uses

1. **Vector Projections**: The dot product is used to calculate the projection of one vector onto another. This is useful in physics for decomposing forces and in computer graphics for calculating shadows and reflections.

2. **Determining Orthogonality**: Two vectors are orthogonal (perpendicular) if their dot product is zero. This property is fundamental in defining orthogonal bases in vector spaces.

3. **Angle Calculation**: The dot product is used along with the norm of the vectors to calculate the angle between them using the formula:

$$
\cos(\theta) = \frac{a \cdot b}{|a| |b|}
$$

4. **Applications in Machine Learning**: In machine learning, the dot product is used to calculate weighted sums of features, especially in linear models like linear regression and neural networks.

5. **Vector Decomposition**: It allows decomposing a vector into parallel and perpendicular components with respect to another vector, which is useful in force analysis, for example.
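
As a rough illustration of the angle, orthogonality, and projection uses listed above, the following sketch works with small two-dimensional vectors. The vectors and variable names are chosen only for this example, and the length of a vector is computed here as the square root of its dot product with itself.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])

# Angle: cos(theta) = (a . b) / (|a| |b|), with |a| = sqrt(a . a)
cos_theta = a.dot(b) / (np.sqrt(a.dot(a)) * np.sqrt(b.dot(b)))
# Clip guards against tiny floating-point overshoot outside [-1, 1]
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
print(theta_deg)  # approximately 45 degrees

# Orthogonality: perpendicular vectors have a dot product of zero
u = np.array([1.0, 0.0])
w = np.array([0.0, 2.0])
print(u.dot(w))

# Decomposition of b into parts parallel and perpendicular to a
proj = (a.dot(b) / a.dot(a)) * a   # projection of b onto a
perp = b - proj                    # remaining perpendicular component
print(proj, perp)
```

The two components add back up to `b`, and their dot product is zero, which is what makes the decomposition useful for separating a force into directions.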

In [78]:
# Define the first vector
a = [1, 2, 3]
print(a)
# Define the second vector
b = [1, 2, 3]
print(b)
# Multiply vectors and sum the results to obtain the dot product
c = a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
print(c)

[1, 2, 3]
[1, 2, 3]
14


In Python, the NumPy library offers an efficient way to calculate the dot product between two vectors using the `dot()` function.

In [79]:
# vector dot product
from numpy import array
# define first vector
a = array([1, 2, 3])
print(a)
# define second vector
b = array([1, 2, 3])
print(b)
# calculate the dot product
c = a.dot(b)
print(c)

[1 2 3]
[1 2 3]
14


## 4.4 Multiplication of a Vector by a Scalar

The multiplication of a vector by a scalar is performed by multiplying each element of the vector by the scalar. This results in a new vector whose length is the same as the original vector, but each of its elements has been scaled by (s). Mathematically, if we have a vector (v) and a scalar (s), the resulting vector (c) is calculated as:

$$
c = s \times v
$$

Or more specifically, if the vector (v) has elements $(v_1, v_2, v_3)$, then the scaled vector (c) will have elements:

$$
c = (s \times v_1, s \times v_2, s \times v_3)
$$

This can be expressed element by element as:

$$
\begin{aligned}
c[0] &= v[0] \times s \\
c[1] &= v[1] \times s \\
c[2] &= v[2] \times s
\end{aligned}
$$

### Applications and Uses in Mathematics

The multiplication of a vector by a scalar has various applications and uses in mathematics, physics, and other sciences:

- **Change of Magnitude**: It allows changing the magnitude of a vector while keeping it on the same line (a positive scalar preserves the direction, a negative scalar reverses it). This is useful in physics to model phenomena such as acceleration or force, where the magnitude of a vector needs to be adjusted while keeping its direction constant.

- **Linear Transformations**: It is fundamental in the study of linear transformations and matrices. Scalar multiplication is a basic operation in linear algebra, used in the definition of vector spaces and their properties.

- **Graphics and Visualization**: In computer graphics, the multiplication of vectors by scalars is used to scale objects, adjusting their size without changing their shape.

- **Data Adjustment**: In data analysis and statistics, it is often necessary to adjust the values of a dataset by multiplying them by a scalar factor to normalize them or to make comparisons on a common scale.

In [80]:
# vector-scalar multiplication
from numpy import array
# define vector
a = array([1, 2, 3])
print(a)
# define scalar
s = 0.5
print(s)
# multiplication
c = s * a
print(c)

[1 2 3]
0.5
[0.5 1.  1.5]


# 5. Vector Norm

Calculating the length or magnitude of vectors is often necessary, either directly as a regularization method in machine learning, or as part of broader vector or matrix operations. In this tutorial, you will discover the different ways to calculate the lengths or magnitudes of vectors, known as the vector norm.

## 5.1 Vector norm

The length of a vector is known as the vector norm or the magnitude of the vector.

> The length of a vector is a non-negative number that describes the extent of the vector in space, and is sometimes referred to as the magnitude of the vector or the norm.
>
> — page 112, _no bullshit guide to linear algebra_, 2017.

The length of the vector is always a positive number, except for a vector of all zero values. It is calculated using some measure that summarizes the distance of the vector from the origin of the vector space. For example, the origin of a vector space for a vector with 3 elements is $(0, 0, 0)$. Notations are used to represent the vector norm in broader calculations, and the type of vector norm calculation almost always has its own unique notation. Let's take a look at some common vector norm calculations used in machine learning.

## 5.2 Vector $L^1$ Norm

The length of a vector can be calculated using the $L^1$ norm, also known as the taxicab norm or Manhattan norm. The notation for the $L^1$ norm of a vector is $\|v\|_1$, where the 1 is a subscript. This method calculates the length of a vector as the sum of the absolute values of its components.

### Operation

The $L^1$ norm is calculated by summing the absolute values of the elements of the vector. Mathematically, if you have a vector \(v\) with components $a_1, a_2, a_3$, the $L^1$ norm is calculated as:

$$
\|v\|_1 = |a_1| + |a_2| + |a_3|
$$

The absolute value, denoted by \(|a|\), of a scalar number \(a\) is its value without regard to the sign. This means that both positive and negative values are treated as positive for the purposes of this calculation.

### Applications and Uses in Mathematics

The $L^1$ norm has several important applications in mathematics, data science, and machine learning:

- **Distance Measurement:** The Manhattan norm is used to measure distance in grids that allow only vertical and horizontal movements, such as a city grid divided into blocks.

- **Regularization in Machine Learning:** In machine learning, the $L^1$ norm is used as a regularization technique known as Lasso (Least Absolute Shrinkage and Selection Operator). Lasso regularization can help simplify models by forcing the sum of the absolute values of the model's coefficients to be less than a fixed value, which can make some coefficients exactly zero and thus eliminate some features from the model.

- **Discrimination of Null and Non-Null Elements:** In certain machine learning applications, it is important to distinguish between elements that are exactly zero and those that are small but non-zero. The $L^1$ norm is useful in these cases because it grows at the same rate in all locations while maintaining mathematical simplicity.
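
To make the distance-measurement use concrete, here is a small sketch of the Manhattan distance between two grid points, computed as the sum of the absolute differences of their coordinates. The points `p` and `q` are made up for the example.

```python
import numpy as np

# Two points on a city-like grid (illustrative values)
p = np.array([1, 2])
q = np.array([4, 6])

# Manhattan distance: L^1 norm of the difference vector
manhattan = np.sum(np.abs(p - q))
print(manhattan)  # |1 - 4| + |2 - 6| = 7
```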

In [81]:
# Define a vector with 3 elements
v = [1, -2, 3]

# Manually calculate the absolute value of each element
abs_v0 = v[0] if v[0] >= 0 else -v[0]
abs_v1 = v[1] if v[1] >= 0 else -v[1]
abs_v2 = v[2] if v[2] >= 0 else -v[2]

# Manually calculate the L^1 norm by summing the absolute values
l1_norm = abs_v0 + abs_v1 + abs_v2

print("Absolute values:", abs_v0, abs_v1, abs_v2)
print("Vector:", v)
print("L^1 norm of the vector calculated:", l1_norm)

Absolute values: 1 2 3
Vector: [1, -2, 3]
L^1 norm of the vector calculated: 6


### Implementation in NumPy

The $L^1$ norm of a vector can be easily calculated in Python using the NumPy library, specifically the `norm()` function from the `numpy.linalg` submodule, passing 1 as the order parameter of the norm. Here is an example of how to perform this calculation:

This approach provides an efficient and straightforward way to calculate the $L^1$ norm, facilitating its application in various fields of science and technology.

In [82]:
import numpy as np

# Define a vector with 3 elements
v = np.array([1, -2, 3])

# Calculate the L^1 norm
norma_l1 = np.linalg.norm(v, 1)

print("Vector:", v)
print("L^1 norm of the vector:", norma_l1)

Vector: [ 1 -2  3]
L^1 norm of the vector: 6.0


## 5.3 Euclidean Norm

The $L^2$ norm of a vector, also known as the Euclidean norm, is a measure of the length or magnitude of the vector. It is calculated as the square root of the sum of the squares of its components. The notation for the $L^2$ norm of a vector is $\|v\|_2$, where the 2 is a subscript.

The formula to calculate the $L^2$ norm of a vector \(v\) with components \(a_1, a_2, a_3\) is:

$$
\|v\|_2 = \sqrt{a_1^2 + a_2^2 + a_3^2}
$$

This formula calculates the distance from the origin of the vector space to the point defined by the vector, using the Euclidean distance (the most common way to measure distances in space).

### Operation

The calculation of the $L^2$ norm involves three steps:

1. Squaring each component of the vector.
2. Summing all the squared values obtained.
3. Taking the square root of the sum.

This process results in a non-negative value that represents the distance of the vector to the origin of the vector space.

### Applications and Uses in Mathematics

The $L^2$ norm has several important applications in both mathematics and computer science, including:

- **Distance Measurement:** It is the basis of Euclidean geometry and is used to measure the distance between points in space.
- **Regularization in Machine Learning:** In the context of machine learning, the $L^2$ norm is used as a regularization method (known as Ridge or Tikhonov regularization) to prevent overfitting by keeping the model's coefficients small, which in turn makes the model less complex.
- **Optimization:** The $L^2$ norm is used in optimization algorithms, especially those that require a measure of distance or length, such as gradient descent.
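
Two of the uses above can be sketched briefly: the Euclidean distance between two points, and a squared $L^2$ penalty of the kind added to a loss in Ridge regularization. The points, the `weights` vector, and the strength `lam` are hypothetical values chosen for illustration.

```python
import numpy as np

# Euclidean distance: L^2 norm of the difference between two points
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])
distance = np.sqrt(np.sum((p - q) ** 2))
print(distance)  # sqrt(3^2 + 4^2) = 5.0

# Ridge-style penalty: lam times the squared L^2 norm of the weights
weights = np.array([0.5, -1.0, 2.0])  # hypothetical model coefficients
lam = 0.1                             # hypothetical regularization strength
penalty = lam * np.sum(weights ** 2)
print(penalty)  # approximately 0.525
```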

In [83]:
# Define the vector
v = [1, 2, 3]

# Step 1: Square each component of the vector
squares = [x**2 for x in v]

# Step 2: Sum all the squared values
sum_squares = sum(squares)

# Step 3: Take the square root of the sum result
l2_norm = sum_squares**0.5

print("The L^2 norm of the vector is:", l2_norm)

The L^2 norm of the vector is: 3.7416573867739413


### Implementation in NumPy

The $L^2$ norm of a vector can be easily calculated in Python using the NumPy library, specifically the `norm()` function from the `numpy.linalg` submodule, which by default calculates the $L^2$ norm if no other parameter is specified.

In [84]:
from numpy import array
from numpy.linalg import norm

# Define vector
a = array([1, 2, 3])
print(a)

# Calculate L^2 norm
l2 = norm(a)
print(l2)

[1 2 3]
3.7416573867739413


## 5.4 Maximum Norm

The maximum norm, also known as the max norm or $L^{\infty}$, is a way to calculate the length of a vector that, instead of summing or combining the components of the vector in some way, simply takes the largest absolute value of all the components of the vector. The notation for the maximum norm of a vector is $\|v\|_{\infty}$, where the subscript $\infty$ represents the concept of infinity.

The formula to calculate the maximum norm of a vector \(v\) with components \(a_1, a_2, a_3\) is:

$$
\|v\|_{\infty} = \max(|a_1|, |a_2|, |a_3|)
$$

This means that the absolute value of each component of the vector is taken, and from these values, the largest is selected as the maximum norm of the vector.

### Operation

The calculation of the maximum norm involves two main steps:

1. Taking the absolute value of each component of the vector.
2. Selecting the largest of these absolute values.

This process results in the largest absolute value of the components of the vector, representing the "length" of the vector under the maximum norm.

### Applications and Uses in Mathematics

The maximum norm has several important applications, especially in applied mathematics and computer science:

- **Numerical Analysis:** The maximum norm is used in numerical analysis as a way to estimate the maximum error in approximate calculations.
- **Optimization:** In optimization problems, the maximum norm can be useful to restrict solutions to a specific range, especially in infinite linear programming problems.
- **Regularization in Machine Learning:** In machine learning, the maximum norm is used as a regularization technique, known as max norm regularization, to prevent overfitting in the weights of neural networks. This helps keep the weights small and prevents some weights from becoming too dominant.
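
As an example of the numerical-analysis use, the maximum norm of the difference between an exact result and an approximation gives the worst-case error across all components. The two arrays below are invented for this sketch.

```python
import numpy as np

exact = np.array([1.0, 2.0, 3.0])
approx = np.array([1.1, 1.95, 3.0])  # illustrative approximate values

# Maximum norm of the error vector: the largest absolute deviation
max_error = np.max(np.abs(exact - approx))
print(max_error)  # approximately 0.1, from the first component
```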

### Implementation in NumPy

The maximum norm of a vector can be easily calculated in Python using the NumPy library, specifically the `norm()` function from the `numpy.linalg` submodule, passing the parameter `inf` for the order of the norm.

This code defines a vector of 3 elements and then calculates its maximum norm. When running the example, the defined vector is first printed and then the maximum norm of the vector, which in this case is 3.0, since 3 is the largest absolute value among the components of the vector.

The maximum norm is a useful tool in various areas of mathematics and data science, offering a different perspective on the "length" or magnitude of vectors, especially in contexts where the largest component is of particular interest.

In [85]:
# Define the vector
v = [1, -2, 3]

# Step 1: Take the absolute value of each component of the vector
absolute_values = [abs(x) for x in v]

# Step 2: Find the maximum of these absolute values
max_norm = max(absolute_values)

print("The maximum norm of the vector is:", max_norm)

The maximum norm of the vector is: 3


In [86]:
from math import inf
from numpy import array
from numpy.linalg import norm

# Define vector
a = array([1, 2, 3])
print(a)

# Calculate maximum norm
maxnorm = norm(a, inf)
print(maxnorm)

[1 2 3]
3.0


# 6. Matrix

A matrix is a two-dimensional array (a table) of scalars with one or more columns and one or more rows.

The notation for a matrix is often an uppercase letter, such as A, and the entries are referred to by their two-dimensional subscript of row (i) and column (j), such as $a_{i,j}$. For example, we can define a matrix with 3 rows and 2 columns:

$$
A = ((a_{1,1}, a_{1,2}), (a_{2,1}, a_{2,2}), (a_{3,1}, a_{3,2}))
$$

It is more common to see matrices defined using horizontal notation.

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2} \\
a_{3,1} & a_{3,2}
\end{pmatrix}
$$

A likely first place where you might encounter a matrix in machine learning is in the model training data composed of many rows and columns and often represented using the uppercase letter $X$. The geometric analogy used to help understand vectors and some of their operations does not hold with matrices. Additionally, a vector itself can be considered a matrix with one column and multiple rows. Often, the dimensions of the matrix are denoted as $m$ and $n$ or $m \times n$ for the number of rows and the number of columns respectively. Now that we know what a matrix is, let's see how to define one in Python.
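
A minimal sketch of defining a matrix as a NumPy array built from a list of rows, following the same pattern used for vectors. The values are arbitrary.

```python
from numpy import array

# A matrix with 3 rows and 2 columns, built from a list of row lists
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
print(A.shape)  # (3, 2): m = 3 rows, n = 2 columns
print(A[0, 1])  # 2: row index 0, column index 1
```

Note that NumPy indices start at 0, so `A[0, 1]` corresponds to the entry $a_{1,2}$ in the mathematical notation.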

## 6.1 Matrices and Matrix Arithmetic

In this section we demonstrate simple matrix-to-matrix arithmetic, where all operations are performed element-wise between two matrices of equal size, resulting in a new matrix of the same size.

#### 6.1.1 Matrix Addition

1. **Definition of Matrices**: Two matrices, `A` and `B`, with the same dimensions are defined. In this case, both matrices are of size 2x3 (two rows and three columns).

2. **Matrix Addition**: The matrices `A` and `B` are added using the `+` operator. Matrix addition is performed element-wise. That is, each element of the resulting matrix `C` is the sum of the corresponding elements in `A` and `B`.

- The mathematical operation behind matrix addition is straightforward: if you have two matrices `A` and `B` of the same dimensions, then the resulting matrix `C` is calculated by summing the corresponding elements of `A` and `B`.

- The mathematical notation used describes how this element-wise addition is performed. For example, the element in the first row and first column of `C` (`C[0, 0]`) is the sum of `A[0, 0]` and `B[0, 0]`.

- **Linear Systems**: Matrix addition is used in solving systems of linear equations, where the combination of different systems can be represented by the sum of their corresponding matrices.

- **Linear Transformations**: In computer graphics and linear algebra, transformations of objects can be combined by adding their transformation matrices.

- **Statistics and Data Science**: In statistics, matrix addition is used to combine data or perform operations on multidimensional datasets.

In [87]:
from numpy import array

# define first matrix
A = array([
    [1, 2, 3],
    [4, 5, 6]
])

# define second matrix
B = array([
    [1, 2, 3],
    [4, 5, 6]
])

# add matrices
C = A + B

print(C)

[[ 2  4  6]
 [ 8 10 12]]


#### 6.1.2 Matrix Subtraction

Similarly, a matrix can be subtracted from another matrix with the same dimensions.

$$
C = A - B
$$

The scalar elements in the resulting matrix are calculated as the subtraction of the elements in each of the matrices.

$$
C = \begin{pmatrix}
a_{1,1} - b_{1,1} & a_{1,2} - b_{1,2} \\
a_{2,1} - b_{2,1} & a_{2,2} - b_{2,2} \\
a_{3,1} - b_{3,1} & a_{3,2} - b_{3,2}
\end{pmatrix}
$$

Or, in other words:

$$
\begin{aligned}
C[0, 0] &= A[0, 0] - B[0, 0] \\
C[1, 0] &= A[1, 0] - B[1, 0] \\
C[2, 0] &= A[2, 0] - B[2, 0] \\
C[0, 1] &= A[0, 1] - B[0, 1] \\
C[1, 1] &= A[1, 1] - B[1, 1] \\
C[2, 1] &= A[2, 1] - B[2, 1]
\end{aligned}
$$

In [88]:
from numpy import array
# define first matrix
A = array([
    [1, 2, 3],
    [4, 5, 6]
])
print(A)
# define second matrix
B = array([
    [0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5]
])
print(B)
# subtract matrices
C = A - B
print(C)

[[1 2 3]
 [4 5 6]]
[[0.5 0.5 0.5]
 [0.5 0.5 0.5]]
[[0.5 1.5 2.5]
 [3.5 4.5 5.5]]


#### 6.1.3 Matrix Multiplication

The operation described here is the Hadamard product, also known as element-wise multiplication, between two matrices of the same size. It is not conventional matrix multiplication, so a different operator, such as a small circle $\circ$, is often used to represent it.

The operation is defined as:

$$
C = A \circ B
$$

Where \(C\) is the resulting matrix from the element-wise multiplication of matrices \(A\) and \(B\). Each element in \(C\) is calculated by multiplying the corresponding elements in \(A\) and \(B\).

For example, for matrices of size 3x2, the operation would look like this:

$$
C = \begin{pmatrix}
a_{1,1} \times b_{1,1} & a_{1,2} \times b_{1,2} \\
a_{2,1} \times b_{2,1} & a_{2,2} \times b_{2,2} \\
a_{3,1} \times b_{3,1} & a_{3,2} \times b_{3,2}
\end{pmatrix}
$$

Or, expressed in another way:

$$
\begin{aligned}
C[0, 0] &= A[0, 0] \times B[0, 0] \\
C[1, 0] &= A[1, 0] \times B[1, 0] \\
C[2, 0] &= A[2, 0] \times B[2, 0] \\
C[0, 1] &= A[0, 1] \times B[0, 1] \\
C[1, 1] &= A[1, 1] \times B[1, 1] \\
C[2, 1] &= A[2, 1] \times B[2, 1]
\end{aligned}
$$

The Hadamard product is fundamental in various areas of mathematics and science, especially in those that require element-wise operations, such as:

- **Signal and image processing**: To apply filters or perform point-to-point operations on images.
- **Neural networks**: In the forward and backward propagation of neural networks, where neuron activations are multiplied element-wise by gradients.
- **Statistics and probability**: For element-wise operations in matrices representing data or probabilities.

The Hadamard product allows great flexibility in handling matrices, facilitating operations that would be complex or impossible to perform with conventional matrix multiplication.
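
To show how the Hadamard product differs from conventional matrix multiplication, the sketch below applies both to the same pair of square matrices. In NumPy, `*` is element-wise while `@` is the matrix product; the values are chosen only for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[2, 0],
              [1, 2]])

# Hadamard (element-wise) product: C[i, j] = A[i, j] * B[i, j]
hadamard = A * B
print(hadamard)

# Conventional matrix product: rows of A combined with columns of B
matmul = A @ B
print(matmul)
```

The two results differ in general, which is why the two operations use different symbols.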

In [89]:
from numpy import array
# define first matrix
A = array([
    [1, 2, 3],
    [4, 5, 6]
])
print(A)
# define second matrix
B = array([
    [1, 2, 3],
    [4, 5, 6]
])
print(B)
# multiply matrices
C = A * B
print(C)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
[[ 1  4  9]
 [16 25 36]]


#### 6.1.4 Matrix Division

A matrix can be divided by another matrix of the same dimensions, element by element. This operation is known as element-wise division; it is sometimes called Hadamard division, by analogy with the Hadamard product for element-wise multiplication.

The operation is mathematically defined as:

$$
C = \frac{A}{B}
$$

Where \(C\) is the resulting matrix from dividing each element of matrix \(A\) by the corresponding element in the same position of matrix \(B\).

For example, for matrices of size 3x2, the operation would look like this:

$$
C = \begin{pmatrix}
\frac{a_{1,1}}{b_{1,1}} & \frac{a_{1,2}}{b_{1,2}} \\
\frac{a_{2,1}}{b_{2,1}} & \frac{a_{2,2}}{b_{2,2}} \\
\frac{a_{3,1}}{b_{3,1}} & \frac{a_{3,2}}{b_{3,2}}
\end{pmatrix}
$$

Or, expressed in another way:

$$
\begin{aligned}
C[0, 0] &= \frac{A[0, 0]}{B[0, 0]} \\
C[1, 0] &= \frac{A[1, 0]}{B[1, 0]} \\
C[2, 0] &= \frac{A[2, 0]}{B[2, 0]} \\
C[0, 1] &= \frac{A[0, 1]}{B[0, 1]} \\
C[1, 1] &= \frac{A[1, 1]}{B[1, 1]} \\
C[2, 1] &= \frac{A[2, 1]}{B[2, 1]}
\end{aligned}
$$

The implementation in Python of this operation is done using the division operator (/) directly on two NumPy arrays.

Element-wise division is useful in various mathematical and scientific applications, especially those that require direct and specific operations on the elements of matrices, such as:

- **Signal and image processing**: To adjust image intensity or apply specific corrections to each pixel.
- **Statistical analysis**: To normalize data, where each element of a data matrix can be divided by a corresponding value (e.g., an average) to obtain a relative measure.
- **Machine learning and neural networks**: In some normalization operations or weight adjustments during model training.

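It is important to note that this operation requires that no element in matrix \(B\) is zero, because division by zero is undefined. With NumPy arrays, dividing by a zero element does not stop execution by default: it emits a `RuntimeWarning` and produces `inf` or `nan` in the affected positions.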
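
One possible way to guard against zero entries in \(B\) is to divide only where the divisor is non-zero, using the `out` and `where` arguments of `np.divide`. Filling the skipped positions with 0.0 is an assumption made for this sketch; another fill value may suit a given application better.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[1.0, 0.0],
              [2.0, 4.0]])

# Divide only where B is non-zero; other positions keep the fill value 0.0
C = np.divide(A, B, out=np.zeros_like(A), where=(B != 0))
print(C)
```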

In [90]:
from numpy import array
# define first matrix
A = array([
    [1, 2, 3],
    [4, 5, 6]
])
print(A)
# define second matrix
B = array([
    [1, 2, 3],
    [4, 5, 6]
])
print(B)
# divide matrices
C = A / B
print(C)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
[[1. 1. 1.]
 [1. 1. 1.]]


## 6.2 Matrix-Matrix Multiplication

Matrix multiplication, also known as the dot product of matrices, is a more complex operation than the previous ones and follows a specific rule, as not all matrices can be multiplied by each other.

The rule for matrix multiplication is as follows:

- The number of columns in the first matrix (A) must be equal to the number of rows in the second matrix (B).

For example, if matrix A has dimensions of m rows and n columns and matrix B has dimensions of n x k, the n columns in A and the n rows in B are equal. The result is a new matrix with m rows and k columns.

$$
C(m, k) = A(m, n) \cdot B(n, k)
$$

This rule applies to a chain of matrix multiplications where the number of columns in one matrix of the chain must match the number of rows in the next matrix of the chain.
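
The shape rule can be checked quickly with placeholder matrices of ones; only the shapes matter here. The sketch uses the `dot()` method and shows that a mismatched pair raises a `ValueError`.

```python
import numpy as np

A = np.ones((2, 3))  # m = 2, n = 3
B = np.ones((3, 4))  # n = 3, k = 4
D = np.ones((4, 5))

# Columns of A match rows of B, so the product has shape (m, k)
C = A.dot(B)
print(C.shape)  # (2, 4)

# A chain works when each adjacent pair of shapes is compatible
print(A.dot(B).dot(D).shape)  # (2, 5)

# Reversing the order breaks the rule: B has 4 columns, A has 2 rows
try:
    B.dot(A)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("shape mismatch:", exc)
```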

For example, given matrices A and B:

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2}
\end{pmatrix}
$$

$$
B = \begin{pmatrix}
b_{1,1} & b_{1,2} \\
b_{2,1} & b_{2,2}
\end{pmatrix}
$$

The product matrix C is given by:

$$
C = \begin{pmatrix}
a_{1,1} \times b_{1,1} + a_{1,2} \times b_{2,1} & a_{1,1} \times b_{1,2} + a_{1,2} \times b_{2,2} \\
a_{2,1} \times b_{1,1} + a_{2,2} \times b_{2,1} & a_{2,1} \times b_{1,2} + a_{2,2} \times b_{2,2}
\end{pmatrix}
$$

We can describe the matrix multiplication operation using array notation.

$$
\begin{aligned}
C[0, 0] &= A[0, 0] \times B[0, 0] + A[0, 1] \times B[1, 0] \\
C[1, 0] &= A[1, 0] \times B[0, 0] + A[1, 1] \times B[1, 0] \\
C[0, 1] &= A[0, 0] \times B[0, 1] + A[0, 1] \times B[1, 1] \\
C[1, 1] &= A[1, 0] \times B[0, 1] + A[1, 1] \times B[1, 1]
\end{aligned}
$$

The matrix multiplication operation can be implemented in NumPy using the `dot()` function. It can also be calculated using the newer `@` operator, available since Python version 3.5. The example below demonstrates both methods.

### Functionality and Mathematical Applications

Matrix multiplication is fundamental in many areas of mathematics and science, including:

- **Linear algebra**: Where it is used to represent systems of linear equations.
- **Computer science**: In computer graphics, for object transformations in space.
- **Physics**: To describe rotations and transformations in quantum mechanics and relativity.
- **Economics**: In economic models to calculate the product of input-output matrices.

This operation allows the combination of linear transformations, making it powerful for modeling and solving complex mathematical and scientific problems.

In [91]:
# matrix dot product
from numpy import array
# define the first matrix
A = array([
    [1, 2],
    [3, 4]
])
print(A)
# define the second matrix
B = array([
    [2, 0],
    [1, 2]
])
print(B)
# multiply matrices using the dot function
C = A.dot(B)
print(C)
# multiply matrices using the @ operator
C = A @ B
print(C)

[[1 2]
 [3 4]]
[[2 0]
 [1 2]]
[[ 4  4]
 [10  8]]
[[ 4  4]
 [10  8]]


## 6.3 Matrix-Vector Multiplication

A matrix and a vector can be multiplied together as long as the matrix multiplication rule is observed. Specifically, the number of columns in the matrix must be equal to the number of elements in the vector. As with matrix multiplication, the operation can be written using dot notation. Since the vector has only one column, the result is always a vector.

$$
c = A \cdot v
$$

Or without the dot, in a compact form.

$$
c = Av 
$$

The result is a vector with the same number of rows as the original matrix.

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2} \\
a_{3,1} & a_{3,2}
\end{pmatrix}
$$

$$
v = \begin{pmatrix}
v_1 \\
v_2
\end{pmatrix}
$$

$$
c = \begin{pmatrix}
a_{1,1} \times v_1 + a_{1,2} \times v_2 \\
a_{2,1} \times v_1 + a_{2,2} \times v_2 \\
a_{3,1} \times v_1 + a_{3,2} \times v_2
\end{pmatrix}
$$

Or, more compactly.

$$
c = \begin{pmatrix}
a_{1,1}v_1 + a_{1,2}v_2 \\
a_{2,1}v_1 + a_{2,2}v_2 \\
a_{3,1}v_1 + a_{3,2}v_2
\end{pmatrix}
$$

We can also represent this with array notation.

$$
\begin{aligned}
c[0] &= A[0, 0] \times v[0] + A[0, 1] \times v[1] \\
c[1] &= A[1, 0] \times v[0] + A[1, 1] \times v[1] \\
c[2] &= A[2, 0] \times v[0] + A[2, 1] \times v[1]
\end{aligned}
$$

Matrix-vector multiplication can be implemented in NumPy using the `dot()` function.

Multiplying a matrix by a vector is a fundamental operation in linear algebra. This operation has many practical applications, including:

- **Linear transformations**: In computer graphics, object transformations in space (such as rotations and scaling, and translations when homogeneous coordinates are used) can be represented as matrix-vector multiplications.
- **Systems of linear equations**: A system of linear equations can be written as a matrix (the coefficients of the equations) multiplied by a vector (the unknowns), set equal to another vector (the constant terms).
- **Network analysis**: In graph theory and network analysis, matrix-vector multiplication is used to calculate network properties, such as node centrality.

In [92]:
# matrix-vector multiplication
from numpy import array
# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
# define vector
v = array([0.5, 0.5])
print(v)
# multiply
c = A.dot(v)
print(c)

[[1 2]
 [3 4]
 [5 6]]
[0.5 0.5]
[1.5 3.5 5.5]
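
As a cross-check of the array notation above, the product can also be built one row at a time. This is only an illustrative sketch: the name `c_manual` is an assumption introduced here, and in practice `dot()` is the better choice.

```python
# matrix-vector multiplication computed row by row
from numpy import array, allclose

# same matrix and vector as in the example above
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
v = array([0.5, 0.5])

# each element of the result is the sum of one row of A times v, element by element
c_manual = array([
    sum(A[i, j] * v[j] for j in range(A.shape[1]))
    for i in range(A.shape[0])
])
print(c_manual)

# compare against NumPy's built-in product
print(allclose(c_manual, A.dot(v)))
```

The result has one element per row of the matrix, which is why a 3 × 2 matrix multiplied by a vector of length 2 gives a vector of length 3.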


## 6.4 Scalar-Matrix Multiplication

A matrix can be multiplied by a scalar. This can be represented using dot notation between the matrix and the scalar.

$$
C = A \cdot b
$$

Or without dot notation.

$$
C = Ab
$$

The result is a matrix of the same size as the original matrix where each element of the matrix is multiplied by the scalar value.

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} \\
a_{2,1} & a_{2,2} \\
a_{3,1} & a_{3,2}
\end{pmatrix}
$$

$$
C = \begin{pmatrix}
a_{1,1} \times b & a_{1,2} \times b \\
a_{2,1} \times b & a_{2,2} \times b \\
a_{3,1} \times b & a_{3,2} \times b
\end{pmatrix}
$$

or

$$
C = \begin{pmatrix}
a_{1,1}b & a_{1,2}b \\
a_{2,1}b & a_{2,2}b \\
a_{3,1}b & a_{3,2}b
\end{pmatrix}
$$

We can also represent this with array notation.

$$
\begin{aligned}
C[0, 0] &= A[0, 0] \times b \\
C[1, 0] &= A[1, 0] \times b \\
C[2, 0] &= A[2, 0] \times b \\
C[0, 1] &= A[0, 1] \times b \\
C[1, 1] &= A[1, 1] \times b \\
C[2, 1] &= A[2, 1] \times b
\end{aligned}
$$

Multiplying a matrix by a scalar is a basic operation in linear algebra. This operation has several practical applications, including:

- **Scaling**: In computer graphics, it is used to change the size of objects represented by matrices.
- **Intensity adjustment**: In image processing, it can be used to adjust the brightness of an image by multiplying each pixel value (represented in a matrix) by a scalar.
- **Normalization**: In statistics and machine learning, it is used to normalize data by multiplying by a scaling factor.

In [93]:
# matrix-scalar multiplication
from numpy import array
# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
# define scalar
b = 0.5
print(b)
# multiply
C = A * b
print(C)

[[1 2]
 [3 4]
 [5 6]]
0.5
[[0.5 1. ]
 [1.5 2. ]
 [2.5 3. ]]
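
The element-by-element definition can be checked directly against NumPy's `*` operator. The sketch below is illustrative only; `C_manual` is a name introduced here for the comparison.

```python
# scalar-matrix multiplication built element by element
from numpy import array, array_equal

A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
b = 0.5

# multiply every element of A by the scalar, following the array notation
C_manual = array([
    [A[i, j] * b for j in range(A.shape[1])]
    for i in range(A.shape[0])
])

# same result as the built-in operator
print(array_equal(C_manual, A * b))
# the order of the operands does not matter for a scalar
print(array_equal(A * b, b * A))
```

Both comparisons should print `True`, and the result keeps the 3 × 2 shape of the original matrix.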


# 7. Types of Matrix

## 7.1 Square Matrix

A square matrix is a matrix where the number of rows (n) is equal to the number of columns (m).

$$
n = m
$$

The square matrix contrasts with the rectangular matrix where the number of rows and columns are not equal. Since the number of rows and columns match, the dimensions are often denoted as n, for example, n × n. The size of the matrix is called the order, so a square matrix of order 4 is 4 × 4. The vector of values along the diagonal of the matrix from the top left to the bottom right is called the main diagonal. Below is an example of a square matrix of order 3.

$$
M = \begin{pmatrix}
1 & 2 & 3 \\
1 & 2 & 3 \\
1 & 2 & 3
\end{pmatrix}
$$

Square matrices of the same order can always be added to and multiplied with each other, and they are the basis of many simple linear transformations, such as rotations (for example, image rotations).
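
A minimal sketch of how one might test whether a NumPy array is square and read off its main diagonal. The helper `is_square` is hypothetical, written here for illustration, and is not a NumPy function.

```python
# check whether a matrix is square and read its main diagonal
from numpy import array

def is_square(M):
    # hypothetical helper: a 2D array is square when rows == columns
    return M.ndim == 2 and M.shape[0] == M.shape[1]

# square matrix of order 3
M = array([
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3]
])
# rectangular matrix for contrast
R = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(is_square(M))
print(is_square(R))
# values along the main diagonal, top left to bottom right
print(M.diagonal())
```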

## 7.2 Symmetric Matrix

A symmetric matrix is a type of square matrix where the upper right triangle mirrors the lower left triangle, so that the element at row $i$, column $j$ equals the element at row $j$, column $i$.

> It is not an exaggeration to say that symmetric matrices $S$ are the most important matrices the world will ever see, both in the theory of linear algebra and its applications.
>
> — Page 338, _Introduction to Linear Algebra_, Fifth Edition, 2016.

To be symmetric, the axis of symmetry is always the main diagonal of the matrix, from the top left to the bottom right. Below is an example of a 5 × 5 symmetric matrix.

$$
M = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 \\
2 & 1 & 2 & 3 & 4 \\
3 & 2 & 1 & 2 & 3 \\
4 & 3 & 2 & 1 & 2 \\
5 & 4 & 3 & 2 & 1
\end{pmatrix}
$$

A symmetric matrix is always square and equal to its own transpose. Transposition is an operation that swaps the rows and columns of a matrix. This is explained in more detail in Section 8.1.

$$
M = M^T
$$
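
The condition $M = M^T$ gives a simple way to test for symmetry in NumPy. The sketch below uses the 5 × 5 matrix from above, plus a small non-symmetric matrix `N` that is introduced here only for contrast.

```python
# test a matrix for symmetry by comparing it with its transpose
from numpy import array, array_equal

# the 5 x 5 symmetric matrix from the text
M = array([
    [1, 2, 3, 4, 5],
    [2, 1, 2, 3, 4],
    [3, 2, 1, 2, 3],
    [4, 3, 2, 1, 2],
    [5, 4, 3, 2, 1]
])
print(array_equal(M, M.T))

# a matrix that is square but not symmetric
N = array([
    [1, 2],
    [3, 4]
])
print(array_equal(N, N.T))
```

The first comparison should print `True` and the second `False`. For floating point matrices, a tolerance-based comparison such as `allclose()` may be more appropriate than exact equality.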

## 7.3 Triangular Matrix

A triangular matrix is a type of square matrix that has all of its non-zero values in the upper right or lower left part of the matrix, including the main diagonal, with the remaining elements filled with zeros. A triangular matrix with values only on and above the main diagonal is called an upper triangular matrix. Conversely, a triangular matrix with values only on and below the main diagonal is called a lower triangular matrix. Below is an example of a 3 × 3 upper triangular matrix.

$$
M = \begin{pmatrix}
1 & 2 & 3 \\
0 & 2 & 3 \\
0 & 0 & 3
\end{pmatrix}
$$

Below is an example of a 3 × 3 lower triangular matrix.

$$
M = \begin{pmatrix}
1 & 0 & 0 \\
1 & 2 & 0 \\
1 & 2 & 3
\end{pmatrix}
$$

NumPy provides functions to compute a triangular matrix from an existing square matrix. The `tril()` function computes the lower triangular matrix from a given matrix, and the `triu()` function computes the upper triangular matrix from a given matrix. The example below defines a 3 × 3 square matrix and computes the lower and upper triangular matrices from it.

In [94]:
# triangular matrices
from numpy import array
from numpy import tril
from numpy import triu
# define square matrix
M = array([
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3]
])
print(M)
# lower triangular matrix
lower = tril(M)
print(lower)
# upper triangular matrix
upper = triu(M)
print(upper)

[[1 2 3]
 [1 2 3]
 [1 2 3]]
[[1 0 0]
 [1 2 0]
 [1 2 3]]
[[1 2 3]
 [0 2 3]
 [0 0 3]]


## 7.4 Diagonal Matrix

A diagonal matrix is one where values outside of the main diagonal have a zero value, where the main diagonal is taken from the top left of the matrix to the bottom right. A diagonal matrix is often denoted with the variable $D$ and may be represented as a full matrix or as a vector of values on the main diagonal.

> Diagonal matrices consist mostly of zeros and have non-zero entries only along the main diagonal.
>
> — Page 40, _Deep Learning_, 2016.

Below is an example of a 3 × 3 square diagonal matrix.

$$
D = \begin{pmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3
\end{pmatrix}
$$

As a vector, it would be represented as:

$$
d = \begin{pmatrix}
d_{1,1} \\
d_{2,2} \\
d_{3,3}
\end{pmatrix}
$$

Or, with the specified scalar values:

$$
d = \begin{pmatrix}
1 \\
2 \\
3
\end{pmatrix}
$$

A diagonal matrix does not have to be square. In the case of a rectangular matrix, the diagonal would cover the dimension with the smallest length; for example, in a 5 × 4 matrix:

$$
D = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 4 \\
0 & 0 & 0 & 0
\end{pmatrix}
$$

NumPy provides the function `diag()`, which can extract the main diagonal of an existing matrix as a vector, or transform a vector into a diagonal matrix. The example below defines a 3 × 3 square matrix, extracts the main diagonal as a vector, and then creates a diagonal matrix from the extracted vector.

In [95]:
# diagonal matrix
from numpy import array
from numpy import diag
# define matrix
A = array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(A)
# extract diagonal
d = diag(A)
print(d)
# create diagonal matrix
D = diag(d)
print(D)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[1 5 9]
[[1 0 0]
 [0 5 0]
 [0 0 9]]
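
The earlier remark that a diagonal matrix need not be square can be sketched as well. The code below builds a 5 × 4 rectangular diagonal matrix by hand, since `diag()` applied to a vector always returns a square matrix. The shape and the values are assumptions chosen for illustration.

```python
# rectangular diagonal matrix
from numpy import arange, array, diag, zeros

# values to place on the main diagonal
d = array([1, 2, 3, 4])

# start from a 5 x 4 matrix of zeros
D = zeros((5, 4), dtype=int)
# write the values at positions (0, 0), (1, 1), (2, 2), (3, 3)
idx = arange(len(d))
D[idx, idx] = d
print(D)

# diag() also extracts the main diagonal of a rectangular matrix
print(diag(D))
```

The diagonal has four entries, matching the smaller of the two dimensions, and the last row contains only zeros.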


## 7.5 Identity Matrix

An identity matrix is a square matrix that does not change a vector when multiplied. The values of an identity matrix are known. All of the scalar values along the main diagonal (top-left to bottom-right) have the value one, while all other values are zero.

> An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix.
>
> — Page 36, _Deep Learning_, 2016.

An identity matrix is often represented using the notation $I$ or with the dimensionality $I_n$, where $n$ is a subscript that indicates the dimensionality of the square identity matrix. In some notations, the identity may be referred to as the unit matrix, or $U$, to honor the one value it contains (this is different from a unitary matrix). For example, an identity matrix with the size 3, or $I_3$, would be as follows:

$$
I = \begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
$$

In NumPy, an identity matrix of a given size can be created with the `identity()` function.

In [96]:
# identity matrix
from numpy import identity
I = identity(3)
print(I)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
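
The defining property, that multiplying by the identity leaves a vector unchanged, can be checked directly. This is a small sketch; the vector `v` and the matrix `A` are arbitrary values chosen for illustration.

```python
# multiplying by the identity matrix changes nothing
from numpy import array, array_equal, identity

I = identity(3)

# an arbitrary vector
v = array([1, 2, 3])
print(I.dot(v))
print(array_equal(I.dot(v), v))

# the same holds for a matrix of compatible size
A = array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(array_equal(I.dot(A), A))
```

Note that `identity()` returns floating point values, so the product is printed as floats even though the input vector holds integers.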


## 7.6 Orthogonal Matrix

Two vectors are orthogonal when their dot product equals zero. If the length of each vector is also 1, then the vectors are called orthonormal because they are both orthogonal and normalized.

$$
v \cdot w = 0
$$

or, treating $v$ and $w$ as column vectors and writing the dot product as a matrix product,

$$
v^T w = 0
$$

This is intuitive when we consider that one line is orthogonal to another if it is perpendicular to it. An orthogonal matrix is a type of square matrix whose columns and rows are orthonormal unit vectors, i.e., they are mutually perpendicular and have a length or magnitude of 1.

> An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal
>
> — Page 41, _Deep Learning_, 2016.

An orthogonal matrix is often denoted as uppercase $Q$.

> Multiplication by an orthogonal matrix preserves lengths.
>
> — Page 277, _No Bullshit Guide To Linear Algebra_, 2017.

The orthogonal matrix is defined formally as follows:

$$
Q^T \cdot Q = Q \cdot Q^T = I
$$

Where $Q$ is the orthogonal matrix, $Q^T$ indicates the transpose of $Q$, and $I$ is the identity matrix. A matrix is orthogonal if its transpose is equal to its inverse.

$$
Q^T = Q^{-1}
$$

Another equivalence for an orthogonal matrix is that the dot product of the matrix and its transpose equals the identity matrix.

$$
Q \cdot Q^T = I
$$

Orthogonal matrices are widely used for linear transformations, such as reflections and permutations. A simple 2 × 2 orthogonal matrix is listed below, which is an example of a reflection matrix or coordinate reflection.

$$
Q = \begin{pmatrix}
1 & 0 \\
0 & -1
\end{pmatrix}
$$

The example below creates this orthogonal matrix and checks the above equivalences.

Running the example first prints the orthogonal matrix. The transpose and the inverse of the orthogonal matrix are then printed and shown to be equivalent. Finally, the identity matrix is printed, which is calculated from the dot product of the orthogonal matrix with its transpose.

In [97]:
# orthogonal matrix
from numpy import array
from numpy.linalg import inv
# define orthogonal matrix
Q = array([
    [1, 0],
    [0, -1]
])
print(Q)
# inverse equivalence
V = inv(Q)
print(Q.T)
print(V)
# identity equivalence
I = Q.dot(Q.T)
print(I)


[[ 1  0]
 [ 0 -1]]
[[ 1  0]
 [ 0 -1]]
[[ 1.  0.]
 [-0. -1.]]
[[1 0]
 [0 1]]


Note that a zero is sometimes printed as -0. because floating point numbers have a signed zero. Just take it as 0.0. Orthogonal matrices are useful tools because their inverse is simply their transpose, which is computationally cheap and stable to calculate.
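
The quoted property that multiplication by an orthogonal matrix preserves lengths can be illustrated with the same reflection matrix. This is a sketch under simple assumptions: the vector `v` is an arbitrary choice, and `norm()` from `numpy.linalg` is used to measure length.

```python
# an orthogonal matrix preserves vector lengths
from numpy import array
from numpy.linalg import norm

# reflection matrix from the example above
Q = array([
    [1, 0],
    [0, -1]
])

# an arbitrary vector with length 5
v = array([3, 4])
w = Q.dot(v)
print(w)
# the length is the same before and after the transformation
print(norm(v), norm(w))

# the columns of Q are orthonormal: zero dot product, unit length
print(Q[:, 0].dot(Q[:, 1]))
print(norm(Q[:, 0]), norm(Q[:, 1]))
```

The reflection flips the sign of the second component of `v` but leaves its length at 5.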

# 8. Matrix Operations

## 8.1 Transpose 

A defined matrix can be transposed, which creates a new matrix with the rows and columns flipped. This is denoted by the superscript $T$ next to the matrix, for example $A^T$.

$$
C = A^T
$$

An invisible diagonal line can be drawn through the matrix from top left to bottom right on which the matrix can be flipped to give the transpose.

$$
A = \begin{pmatrix}
1 & 2 \\
3 & 4 \\
5 & 6
\end{pmatrix}
$$

$$
A^T = \begin{pmatrix}
1 & 3 & 5 \\
2 & 4 & 6
\end{pmatrix}
$$

The operation has no effect if the matrix is symmetric, i.e. it has the same number of columns and rows and the same values at the same locations on both sides of the invisible diagonal line.

> The columns of $A^T$ are the rows of $A$.
>
> — Page 109, _Introduction to Linear Algebra_, Fifth Edition, 2016.

We can transpose a matrix in NumPy by accessing the `T` attribute.

In [98]:
# transpose matrix
from numpy import array
# define matrix
A = array([
    [1, 2],
    [3, 4],
    [5, 6]
])
print(A)
# calculate transpose
C = A.T
print(C)

[[1 2]
 [3 4]
 [5 6]]
[[1 3 5]
 [2 4 6]]
