# **Chapter 4 Basic Mathematics**
As previously discussed, mathematics is one of the three crucial components of data science; it is essential to understand the fundamental mathematical principles and concepts that underpin the field. In this chapter, we will cover key mathematical topics including the following:<br>
• Basic symbols (vectors and matrices)/terminology<br>
• Logarithms/exponents<br>
• Set theory<br>

# 1 Basic symbols/terminology
## 1.1 Vectors and matrices
### Vector
**Definition**: An array **x** of *n* real numbers $( x_1, \dots, x_n )$ is called a **vector**, and it is written as
$$
\mathbf{x} =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix}
\quad \text{or} \quad
\mathbf{x}' = [x_1, x_2, \cdots, x_n]
$$

-  Here, the prime denotes the operation of **transposing a column to a row**.

- The number *n* is referred to as the **dimension** of the vector x.

- $x_i$ is called a **component** of the vector $\mathbf{x}$.

- In general, a vector is an object with both magnitude and direction, represented using an arrow or bold font: $\vec{x}$ or $\mathbf{x}$.

**Vector Addition**

$$
\mathbf{x} + \mathbf{y} =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix}
+
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots \\
y_n
\end{bmatrix}
=
\begin{bmatrix}
x_1 + y_1 \\
x_2 + y_2 \\
\vdots \\
x_n + y_n
\end{bmatrix}.
$$

**Scale a vector x by multiplying it by a constant c**
$$
\mathbf{x} =
\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{bmatrix}
\Rightarrow
c\mathbf{x} =
\begin{bmatrix}
c x_1 \\
c x_2 \\
\vdots \\
c x_n
\end{bmatrix}.
$$


**Inner Product**: Given $\mathbf{x}' = [x_1, x_2, \cdots, x_n]$ and $\mathbf{y}' = [y_1, y_2, \cdots, y_n]$, the inner product is defined as:


$$
\langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i = \langle \mathbf{y}, \mathbf{x} \rangle
$$



For any constant $c$:

$$
\langle c\mathbf{x}, \mathbf{y} \rangle = c \langle \mathbf{x}, \mathbf{y} \rangle
$$
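
As a quick numerical check of the vector operations above, here is a minimal NumPy sketch. The vectors `x`, `y` and the constant `c` are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

# Two example vectors of dimension n = 3 (illustrative values)
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
c = 2.0

vec_sum = x + y           # component-wise addition
scaled = c * x            # every component multiplied by c
inner_xy = np.dot(x, y)   # sum of x_i * y_i
inner_yx = np.dot(y, x)   # same value, by symmetry

print(vec_sum)    # [5. 7. 9.]
print(scaled)     # [2. 4. 6.]
print(inner_xy)   # 32.0
# <c x, y> equals c <x, y>
print(np.isclose(np.dot(c * x, y), c * inner_xy))  # True
```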


**Example**

Vectors give us a simple way of storing multiple dimensions of a single data point/observation. If we measure the average satisfaction rating (0-100) of employees in three departments of a company as being 57 for HR, 89 for engineering, and 94 for management, we can represent this as a vector with the following formula:
$$
\mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix}57 \\ 89 \\ 94 \end{pmatrix}
$$



### Matrix
**Definition**: a matrix is a two-dimensional representation of arrays of numbers, denoted by a capital, bold-faced letter. Matrix $A \in \mathbb{R}^{m \times n}$, of **dimension** $m \times n$, is just a table of (real) numbers. It is written as:

$$
A =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix}
$$

- $a_{i,j}$ is called the $(i,j)$-entry of the matrix $A$.

- The dimension $m \times n$ tells us that the matrix has $m$ rows and $n$ columns.




**Transpose**: The transpose of $A \in \mathbb{R}^{m \times n}$ is denoted as $A'$, which is an $n \times m$ matrix, defined by:

$$
A' =
\begin{bmatrix}
a_{1,1} & a_{2,1} & \cdots & a_{m,1} \\
a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1,n} & a_{2,n} & \cdots & a_{m,n}
\end{bmatrix}
$$




**Square Matrix**: A matrix $A \in \mathbb{R}^{m \times n}$ is called a **square matrix** if $m = n$.

$$
A =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{bmatrix}
$$

- A square matrix $A_{m \times m}$ is **invertible** if there exists a matrix $B$ such that $AB = I$. For square matrices, $AB = I$ holds if and only if $BA = I$. In this case, we write $A^{-1} = B$.
- Not all square matrices have an inverse — for example, the matrix with all its entries equal to zero is **not** invertible.

**A square matrix** $A \in \mathbb{R}^{n \times n}$ is **invertible** if and only if any of the following hold:

-  $\det(A) \neq 0 $
- $\text{rank}(A) = n$
- Columns of $A$ are linearly independent
- The homogeneous equation $ A\mathbf{x} = 0 $ has only the trivial solution
- $ A$ is row-equivalent to the identity matrix $I_n$
- All eigenvalues of $A$ are nonzero
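
A few of these equivalent conditions can be checked numerically. The sketch below is only illustrative: the helper name `invertibility_checks`, the tolerance `tol`, and the two test matrices are assumptions made for this example, and floating-point tests of "nonzero" are always approximate.

```python
import numpy as np

def invertibility_checks(A, tol=1e-10):
    """Evaluate three of the equivalent conditions, up to a numerical tolerance."""
    n = A.shape[0]
    det_nonzero = bool(abs(np.linalg.det(A)) > tol)
    full_rank = bool(np.linalg.matrix_rank(A) == n)
    eig_nonzero = bool(np.all(np.abs(np.linalg.eigvals(A)) > tol))
    return det_nonzero, full_rank, eig_nonzero

M = np.array([[2.0, 1.0], [5.0, 3.0]])  # det = 2*3 - 1*5 = 1
Z = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is twice the first

print(invertibility_checks(M))  # (True, True, True)
print(invertibility_checks(Z))  # (False, False, False)
```

The three flags agree for each matrix, as the equivalence predicts.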

**Identity Matrix**: The matrix $I_{m \times m}$ is called the **identity matrix** if $I_{i,j} = 1$ for $i = j$ and $I_{i,j} = 0$ for $i \ne j$.

$$
I =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
$$

**Determinant**:  the determinant is a scalar-valued function of the entries of a square matrix. The determinant of a matrix $A$ is commonly denoted  $\det(A)$, or $|A|$.

- We have: $\det(AB) = \det(A)\det(B)$
- The matrix $A$ is invertible if and only if $\det(A) \ne 0$.
- The determinant of a $2 \times 2$ matrix is
$$
\begin{vmatrix}
a & b \\
c & d
\end{vmatrix}
= ad - bc
$$

- The determinant of a $3 \times 3$ matrix is

$$
\begin{vmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh
$$
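
The two closed-form formulas above can be written directly in Python and compared with `np.linalg.det`. The helper names `det2` and `det3` are made up for this sketch; the 3 × 3 matrix is the one used in the matrix-vector example later in this chapter.

```python
import numpy as np

def det2(m):
    # | a b |
    # | c d |  = ad - bc
    (a, b), (c, d) = m
    return a * d - b * c

def det3(m):
    # aei + bfg + cdh - ceg - bdi - afh
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

M2 = [[2, 1], [5, 3]]
M3 = [[1, 0, 2], [-1, 3, 1], [2, 2, 2]]

print(det2(M2))  # 1
print(det3(M3))  # -12
print(np.isclose(det2(M2), np.linalg.det(np.array(M2))))  # True
print(np.isclose(det3(M3), np.linalg.det(np.array(M3))))  # True
```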

**Eigenvalues**: $\lambda$ is an eigenvalue of $A$ if and only if

$$
\det(A - \lambda I) = 0
$$
**Eigenelements**

In **linear algebra**, an **eigenvector**  or **characteristic vector** is a **vector** that has its **direction unchanged** (or reversed) by a given **linear transformation**. More precisely, an eigenvector **v** of a linear transformation **T** is **scaled by a constant factor** λ when the transformation is applied:

$$
T\mathbf{v} = \lambda \mathbf{v}
$$

The corresponding **eigenvalue**, **characteristic value**, or **characteristic root** is the multiplying factor **λ**, which may be a **negative** or **complex** number.

**Eigensystem**

An **eigensystem** of a square matrix $A \in \mathbb{R}^{n \times n}$ consists of its **eigenvalues** and the corresponding **eigenvectors**.


**Eigenvector**: Each eigenvalue $\lambda$ corresponds to at least one eigenvector $x$, such that

$$
Ax = \lambda x
$$

- When we talk about an eigenvector, we mean that it is not zero.
- When we talk about eigenvalues of $A$, $A$ must be a square matrix.
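
To make the two characterizations concrete, the following sketch checks both $Ax = \lambda x$ and $\det(A - \lambda I) = 0$ for each eigenpair returned by `np.linalg.eig`. The symmetric matrix `A` is an illustrative choice whose eigenvalues are 1 and 3.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

residuals = []
char_values = []
for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]          # eigenvectors are stored as columns
    residuals.append(np.linalg.norm(A @ v - lam * v))            # should be ~0
    char_values.append(np.linalg.det(A - lam * np.eye(2)))       # should be ~0

print(np.sort(eigenvalues))
print(np.allclose(residuals, 0.0), np.allclose(char_values, 0.0))
```

NumPy does not guarantee the order of the returned eigenvalues, so the sketch sorts them before printing.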





**Matrix Addition**:
Let $A = A_{m \times n},\ B = B_{m \times n}$, then $A + B$ is an $m \times n$ matrix with $(A + B)_{i,j} = a_{i,j} + b_{i,j}$. The matrix $A - B$ is similarly defined.

$$
(\mathbf{A} + \mathbf{B})_{i,j} = a_{i,j} + b_{i,j}, \quad 1 \le i \le m, \; 1 \le j \le n
$$

$$
(\mathbf{A} - \mathbf{B})_{i,j} = a_{i,j} - b_{i,j}, \quad 1 \le i \le m, \; 1 \le j \le n
$$

For example,

$$
\begin{bmatrix}
1 & 3 & 1 \\
1 & 0 & 0
\end{bmatrix}
+
\begin{bmatrix}
0 & 0 & 5 \\
7 & 5 & 0
\end{bmatrix}
=
\begin{bmatrix}
1+0 & 3+0 & 1+5 \\
1+7 & 0+5 & 0+0
\end{bmatrix}
=
\begin{bmatrix}
1 & 3 & 6 \\
8 & 5 & 0
\end{bmatrix}
$$


**Example**

Revisiting our previous example, suppose we have three offices in different locations, each with the same three departments: HR, engineering, and management.

First, we could make three different vectors, each holding a different office’s satisfaction scores:

$$
x = \begin{pmatrix} 57 \\ 89 \\ 94 \end{pmatrix},
y = \begin{pmatrix} 67 \\ 87 \\ 94 \end{pmatrix},
z = \begin{pmatrix} 65 \\ 98 \\ 60 \end{pmatrix}
$$

Second, let’s make a matrix where each row represents a different department and each column represents a different office, as shown in *Figure 1.1*:

<p align="center">
  <img src="https://drive.google.com/uc?id=1qrZuR5xQJwmQguoIH5NYJLaW2f06Gm6g" width="600">
  <br>
  <em>Figure 1.1 – Some sample data we want to model as a matrix</em>
</p>




Third, let’s strip away the labels; we’ll be left with a matrix:

$$
X = \begin{pmatrix}
57 & 67 & 65 \\
89 & 87 & 98 \\
94 & 94 & 60
\end{pmatrix}
$$

**Quick exercises**

The following is a list of quick exercises you can do to understand matrices better:

1. If we added a fourth office, would we need a new row or column?
2. What would the dimension of the matrix be after we added the fourth office?
3. If we eliminate the management department from the original X matrix, what would the dimension of the new matrix be?
4. What is the general formula to find out the number of elements in the matrix?

**Answers**

Here are the answers:

1. Column
2. 3 x 4
3. 2 x 3
4. *m × n* (*m* being the number of rows and *n* being the number of columns)
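
The answers can be confirmed by inspecting array shapes in NumPy. In this sketch the fourth office's scores (`office4`) are hypothetical values invented for the illustration.

```python
import numpy as np

# Rows: HR, engineering, management; columns: offices 1-3
X = np.array([
    [57, 67, 65],
    [89, 87, 98],
    [94, 94, 60]
])

office4 = np.array([70, 80, 90])         # hypothetical scores for a fourth office
X_four = np.column_stack([X, office4])   # a new office adds a column
X_no_mgmt = np.delete(X, 2, axis=0)      # dropping management removes row index 2

print(X_four.shape)     # (3, 4)
print(X_no_mgmt.shape)  # (2, 3)
print(X.size)           # 9 elements = 3 rows * 3 columns
```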

In [None]:
import numpy as np

# use a Python list to represent it
a = [3, 6, 8]

# define a vector by array
x = np.array([3, 6, 8])

# define a matrix
X = np.array([
    [57, 67, 65],
    [89, 87, 98],
    [94, 94, 60]
])

# Transpose the matrix (swap rows and columns)
transposed = X.T
print("Transposed Matrix:")
print(transposed)

print("Vector x:", x)

print("Matrix X:")
print(X)

# Matrix Addition
## Define another matrix Y with the same shape as X
Y = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

sum_matrix = X + Y
print("\nMatrix Addition (X + Y):")
print(sum_matrix)

# Scalar Multiplication
scalar = 2
scaled_matrix = scalar * X
print(f"\nScalar Multiplication ({scalar} * X):")
print(scaled_matrix)

**Matrix multiplication**:  
- Let $A = A_{m \times n},\ B = B_{n \times p}$, then $AB$ is an $m \times p$ matrix with  
  $$(AB)_{i,j} = \sum_{k=1}^{n} a_{i,k} b_{k,j}$$

- Let $c$ be a constant, then $cA$ is an $m \times n$ matrix with  
  $$(cA)_{i,j} = c a_{i,j}$$

- Let $A = A_{m \times n}$ be a matrix and $x = x_{n \times 1}$ be a vector, then $Ax$ is an $m \times 1$ vector with  
  $$(Ax)_i := a_{i,1}x_1 + a_{i,2}x_2 + \cdots + a_{i,n}x_n$$

We should consider:

- The multiplication of matrices is not commutative, meaning that the order in which you multiply matrices matters a great deal.

- To multiply matrices, their dimensions must match up. This means that the first matrix must have the same number of columns as the second matrix has rows.

**Example 1: Matrix–Matrix Multiplication**

Let  
$$
A = \begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9
\end{bmatrix}, \quad
B = \begin{bmatrix}
9 & 8 & 7 \\
6 & 5 & 4 \\
3 & 2 & 1
\end{bmatrix}
$$

Then  
$$
AB = \begin{bmatrix}
1\cdot9 + 2\cdot6 + 3\cdot3 & 1\cdot8 + 2\cdot5 + 3\cdot2 & 1\cdot7 + 2\cdot4 + 3\cdot1 \\
4\cdot9 + 5\cdot6 + 6\cdot3 & 4\cdot8 + 5\cdot5 + 6\cdot2 & 4\cdot7 + 5\cdot4 + 6\cdot1 \\
7\cdot9 + 8\cdot6 + 9\cdot3 & 7\cdot8 + 8\cdot5 + 9\cdot2 & 7\cdot7 + 8\cdot4 + 9\cdot1
\end{bmatrix}
=
\begin{bmatrix}
30 & 24 & 18 \\
84 & 69 & 54 \\
138 & 114 & 90
\end{bmatrix}
$$
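
The product above, and the two cautions about order and dimensions, can be reproduced with NumPy's `@` operator. The arrays `left` and `right` are made-up shapes used only to trigger the dimension mismatch.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

AB = A @ B
BA = B @ A
print(AB)
print(np.array_equal(AB, BA))  # False: matrix multiplication is not commutative

# Inner dimensions must match: (3, 1) @ (3, 2) is not defined
left = np.ones((3, 1))
right = np.ones((3, 2))
try:
    left @ right
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# Transposing the first factor makes the shapes compatible: (1, 3) @ (3, 2)
print((left.T @ right).shape)  # (1, 2)
```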

**Example 2: Scalar–Matrix Multiplication**

Let  
$$
c = -2, \quad
A = \begin{bmatrix}
1 & -2 & 0 \\
3 & 5 & -1 \\
4 & 0 & 6
\end{bmatrix}
$$

Then  
$$
cA = -2 \cdot A = \begin{bmatrix}
-2 & 4 & 0 \\
-6 & -10 & 2 \\
-8 & 0 & -12
\end{bmatrix}
$$

**Example 3: Matrix–Vector Multiplication**

Let  
$$
A = \begin{bmatrix}
1 & 0 & 2 \\
-1 & 3 & 1 \\
2 & 2 & 2
\end{bmatrix}, \quad
x = \begin{bmatrix}
2 \\
1 \\
0
\end{bmatrix}
$$

Then  
$$
Ax = \begin{bmatrix}
1\cdot2 + 0\cdot1 + 2\cdot0 \\
-1\cdot2 + 3\cdot1 + 1\cdot0 \\
2\cdot2 + 2\cdot1 + 2\cdot0
\end{bmatrix}
=
\begin{bmatrix}
2 \\
1 \\
6
\end{bmatrix}
$$

**Example 4**

Consider a movie recommendation setting (the dot-product view of it is worked through in Section 1.2.2). A user's genre preferences for comedy, romance, and action are stored as a vector:

$$
U = userprefs = \begin{pmatrix}5 \\ 1 \\ 3\end{pmatrix}
$$

Suppose we have 10,000 movies all with a rating for these three categories, and we have a movie matrix as shown below. To make a recommendation, we need to take the dot product of the preference vector with each of the 10,000 movies.

$$
M = \text{movies} \in \mathbb{R}^{3 \times 10{,}000}
$$

Now we have two matrices: one is $3 \times 1$ and the other is $3 \times 10,000$. We can’t multiply these matrices as they are because the dimensions do not work out.

We can take the **transpose** of the matrix, which will switch the dimensions around:

$$
U^T = \text{transpose of } U = (5\ 1\ 3)
$$

So, now, we have two matrices that can be multiplied together. Let’s visualize what this looks like:

$$
\begin{pmatrix}5 & 1 & 3\end{pmatrix} \cdot
\begin{pmatrix}
4 & 5 & \cdots \\
1 & 4 & \cdots \\
5 & 1 & \cdots
\end{pmatrix} \quad 3 \times 10,000
$$

The resulting matrix will be a $1 \times 10,000$ matrix (a row vector) holding one prediction for each of the 10,000 movies.



In [None]:
import numpy as np
import time
# create user preferences
user_pref = np.array([[5, 1, 3]])

# create a random movie matrix of 10,000 movies
# np.random.randint(5, ...) draws integers from 0 to 4; adding 1 shifts the scale to 1-5
movies = np.random.randint(5, size=(3, 10000)) + 1

print(user_pref.shape) # (1, 3)
print(movies.shape) # (3, 10000)

#calculate preference by matrix multiplication
reference = np.dot(user_pref, movies)
print(reference.shape)

for num_movies in (10000, 100000, 1000000, 10000000, 100000000):
    movies = np.random.randint(5, size=(3, num_movies)) + 1
    now = time.time()
    np.dot(user_pref, movies)
    elapsed = time.time() - now
    print(f"{elapsed:.6f} seconds to run {num_movies} movies")

In [None]:
import numpy as np

# define a vector
x = np.array([3, 6, 8])

# define a matrix
X = np.array([
    [57, 67, 65],
    [89, 87, 98],
    [94, 94, 60]
])

# transpose the matrix
transposed = X.T
print("Transposed matrix:\n", transposed)
print("Vector x:\n", x)
print("Matrix X:\n", X)

#np.dot does both dot products and matrix multiplication
##dot product for vectors
y = np.array([1, 0, -1])
dot = np.dot(x, y)
print("Dot product of x and y:", dot)

##vector and matrix
result = np.dot(X, x)
print("Matrix X multiplied by vector x:\n", result)

##two matrices
A = np.array([[1, 2], [3, 4], [5, 6]])   # 3×2
B = np.array([[1, 2, 3], [4, 5, 6]])     # 2×3
C = np.dot(A, B)                         # 3×3
print("Matrix multiplication A * B:\n", C)

#matrix M
##inverse matrix
M = np.array([[2, 1], [5, 3]])
inv = np.linalg.inv(M)
print("Inverse of M:\n", inv)

##identity matrix
I = np.eye(2)
print("Identity matrix:\n", I)

##determinant
det = np.linalg.det(M)
print("Determinant of M:", det)

##eigenvalue
eigenvalues, eigenvectors = np.linalg.eig(M)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

Transposed Matrix:
[[57 89 94]
 [67 87 94]
 [65 98 60]]
Vector x: [3 6 8]
Matrix X:
[[57 67 65]
 [89 87 98]
 [94 94 60]]

Matrix Addition (X + Y):
[[ 58  69  68]
 [ 93  92 104]
 [101 102  69]]

Scalar Multiplication (2 * X):
[[114 134 130]
 [178 174 196]
 [188 188 120]]


## 1.2 Arithmetic symbols

In this section, we will go over some symbols associated with basic arithmetic that appear in most, if not all, data science tutorials and books.

### 1.2.1 Summation

**Definition**: The uppercase **sigma, Σ**, symbol is a universal symbol for addition. Whatever is to the right of the sigma symbol is usually something iterable, meaning that we can go over it one by one.

**Example 1**

Let’s create the representation of a vector, X = [1, 2, 3, 4, 5].
To find the sum of the content, we can use the following formula:

$$
\sum_i X_i = 15
$$

**Example 2**

The formula for calculating the mean of a series of numbers is quite common. If we have a vector $X$ of length $n$, the mean of the vector can be calculated as follows:

$$
\text{mean} = \frac{1}{n} \sum_i X_i
$$

This means that we add up each element of $X$, denoted by $X_i$, and then divide by $n$ (the length of the vector). For $X = [1, 2, 3, 4, 5]$, this gives $15 / 5 = 3$.


### 1.2.2 Dot Product/Inner Product
**Definition**: The **inner product** or **dot product** on $\mathbb{R}^n$ is a function $\langle \cdot, \cdot \rangle$ defined by:

$$
\langle u, v \rangle = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n
$$

$$
\quad \text{for} \quad u = [a_1, a_2, \ldots, a_n]^T, \, v = [b_1, b_2, \ldots, b_n]^T \in \mathbb{R}^n.
$$

The inner product $\langle \cdot, \cdot \rangle$ satisfies the following properties:

1. **Linearity**: $\langle au + bv, w \rangle = a \langle u, w \rangle + b \langle v, w \rangle $
2. **Symmetric Property**: $ \langle u, v \rangle = \langle v, u \rangle $
3. **Positive Definite Property**: For any $u \in \mathbb{R}^n$, $\langle u, u \rangle \geq 0$; and $\langle u, u \rangle = 0$ if and only if $u = 0$
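
The three properties can be spot-checked numerically. This is a sanity check on random vectors, not a proof; the seed, the dimension 4, and the scalars `a` and `b` are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
u, v, w = rng.normal(size=(3, 4))    # three random vectors in R^4
a, b = 2.0, -3.0

# 1. Linearity in the first argument
linearity = np.isclose(np.dot(a * u + b * v, w),
                       a * np.dot(u, w) + b * np.dot(v, w))

# 2. Symmetry
symmetry = np.isclose(np.dot(u, v), np.dot(v, u))

# 3. Positive definiteness: <u, u> > 0 for nonzero u, and <0, 0> = 0
zero = np.zeros(4)
positive_definite = np.dot(u, u) > 0 and np.dot(zero, zero) == 0

print(linearity, symmetry, positive_definite)  # True True True
```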



> The author wrote dot product here. The dot product is specific to real vectors in Euclidean space, while the inner product generalizes to broader contexts like complex vectors. We usually use inner product in advanced mathematics.




**Example 1**
$$
\begin{pmatrix}3 \\ 7\end{pmatrix} \cdot \begin{pmatrix}9 \\ 5\end{pmatrix} = 3*9 + 7*5 = 62
$$


**Example 2**

 Let’s say we have a vector that represents a customer’s sentiments toward three genres of movies: comedy, romance, and action. On a scale of 1 to 5, a customer loves comedies, hates romantic movies, and is fine with action movies. We might represent this as follows:

$$
\begin{pmatrix}5 \\ 1 \\ 3\end{pmatrix}
$$

Here, 5 denotes their love for comedies, 1 denotes their hatred of romantic movies, and 3 denotes the customer’s indifference toward action movies.

Now, let’s assume that we have two new movies, one of which is a romantic comedy and the other is a funny action movie. The movies would have their own vector of qualities, as shown here:

$$
m1 = \begin{pmatrix}4 \\ 5 \\ 1\end{pmatrix},
m2 = \begin{pmatrix}5 \\ 1 \\ 5\end{pmatrix}
$$

Here, *m1* is a romantic comedy and *m2* is a funny action movie.

To make a recommendation, we take the dot product of the customer's preference vector with each movie's vector. The movie with the higher value wins and is recommended to the user.

Let’s compute the recommendation score for each movie. For movie 1, we want to compute the following:

$$
\begin{pmatrix}5 \\ 1 \\ 3\end{pmatrix} \cdot \begin{pmatrix}4 \\ 5 \\ 1\end{pmatrix} = (5*4) + (1*5) + (3*1) = 28
$$

We can think of this problem as follows:

(There is a typo in this figure, **"user doesn't love romance"**)


<p align="center">
  <img src="https://drive.google.com/uc?id=1HkLyltbJUXdlhh-iJwX1XzaLmhSNOc8z" alt="Figure 1.2 – How to interpret a dot product" width="600">
  <br>
  <em>Figure 1.2 – How to interpret a dot product</em>
</p>
Now let's explain the score in detail. The best score anyone can ever get is when all values are 5, making the outcome as follows:

$$
\begin{pmatrix}5 \\ 5 \\ 5\end{pmatrix} \cdot \begin{pmatrix}5 \\ 5 \\ 5\end{pmatrix} = 5^2 + 5^2 + 5^2 = 75
$$

The lowest possible score is when all values are 1, as shown here:

$$
\begin{pmatrix}1 \\ 1 \\ 1\end{pmatrix} \cdot \begin{pmatrix}1 \\ 1 \\ 1\end{pmatrix} = 1^2 + 1^2 + 1^2 = 3
$$

So, we must think about 28 on a scale from 3 to 75. The number 28 is closer to 3 than it is to 75. Let’s try this for movie 2:

$$
\begin{pmatrix}5 \\ 1 \\ 3\end{pmatrix} \cdot \begin{pmatrix}5 \\ 1 \\ 5\end{pmatrix} = (5*5) + (1*1) + (3*5) = 41
$$


This is higher than 28. So we would recommend movie 2 to our user.

This is how most movie prediction engines work. They build a customer profile, which is represented as a vector. They then take a vector representation of each movie they have to offer, combine them with the customer profile (perhaps with a dot product), and make recommendations from there.
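
The scoring procedure in this example can be sketched in a few lines. The dictionary keys and the variable names `user`, `movies`, `scores`, and `best` are illustrative; the vectors are the ones given above.

```python
import numpy as np

# Preferences for (comedy, romance, action) on a 1-5 scale
user = np.array([5, 1, 3])

movies = {
    "m1 (romantic comedy)": np.array([4, 5, 1]),
    "m2 (funny action movie)": np.array([5, 1, 5]),
}

# One dot product per movie
scores = {name: int(np.dot(user, vec)) for name, vec in movies.items()}
best = max(scores, key=scores.get)

print(scores)  # {'m1 (romantic comedy)': 28, 'm2 (funny action movie)': 41}
print(best)    # m2 (funny action movie)
```

With many movies, stacking the movie vectors as columns of a matrix and using a single matrix product, as in Example 4 of Section 1.1, avoids the explicit loop.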


# 2 Exponents/Logarithms
## 2.1 Exponents
**Definition**: an exponent tells you how many times you have to multiply a number by itself. In mathematics, exponentiation, denoted $b^n$, is an operation involving two numbers: the **base** $b$, and the **exponent** or **power** $n$. When $n$ is a positive integer, exponentiation corresponds to repeated multiplication of the base:

$$
b^n = \underbrace{b \times b \times \cdots \times b}_{n \text{ times}}
$$

In particular,
$$
b^1 = b.
$$


<p align="center">
  <img src="https://drive.google.com/uc?id=1_v69DsYLPJG0m9v6k8FToJcwH_TmIzLP" alt="Figure 2.1 – The exponent tells you how many times to multiply a number by itself" width="350">
  <br>
  <em>Figure 2.1 – The exponent tells you how many times to multiply a number by itself</em>
</p>

**Exponent Laws**
- **Product**: $$a^m \times a^n = a^{m+n}$$
- **Quotient**: $$\frac{a^m}{a^n} = a^{m-n}$$
- **Zero Exponent**: $$a^0 = 1$$
- **Negative Exponent**: $$ a^{-m} = \frac{1}{a^m}$$
- **Power of a Power**: $$(a^m)^n = a^{mn}$$
- **Power of a Product**: $$(ab)^m = a^m b^m$$
- **Power of a Quotient**:$$\left(\frac{a}{b}\right)^m = \frac{a^m}{b^m}$$
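
Each law can be spot-checked with concrete numbers. The values of `a`, `b`, `m`, and `n` below are arbitrary, and `math.isclose` is used because floating-point results may differ in the last digits.

```python
import math

a, b = 2.0, 3.0
m, n = 5, 3

checks = {
    "product": math.isclose(a**m * a**n, a**(m + n)),
    "quotient": math.isclose(a**m / a**n, a**(m - n)),
    "zero exponent": a**0 == 1,
    "negative exponent": math.isclose(a**(-m), 1 / a**m),
    "power of a power": math.isclose((a**m)**n, a**(m * n)),
    "power of a product": math.isclose((a * b)**m, a**m * b**m),
    "power of a quotient": math.isclose((a / b)**m, a**m / b**m),
}

for law, holds in checks.items():
    print(f"{law}: {holds}")
```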


## 2.2 Logarithm
**Definition**: A logarithm is a number that answers the question “What exponent gets me from the base to this other number?” In mathematics, the logarithm of a number is the exponent by which another fixed value, the base, must be raised to produce that number.


<p align="center">
  <img src="https://drive.google.com/uc?id=1N9z-ASK6TlQmvUg1LjxhLWXiat56-KKo" alt="Figure 2.2 – The exponent from Figure 2.1 written in logarithm form" width="350">
  <br>
  <em>Figure 2.2 – The exponent from Figure 2.1 written in logarithm form</em>
</p>


Exponents and logarithms are heavily related. The basic calculation rule is that:

$$
\log_a a^k = k
$$

<p align="center">
  <img src="https://drive.google.com/uc?id=18Qw_K0UL2MSGUHB-2dJ4mY-xYLWFEfow" alt="Figure 2.3 – Logarithms and exponents are the same!" width="350">
  <br>
  <em>Figure 2.3 – Logarithms and exponents are the same!</em>
</p>

**Natural Logarithm**: When we take the logarithm of a number with a base of *e*, it is called a natural logarithm. We write $\log_e(a)$ as $\ln(a)$.

**Logarithmic Laws**

- **Products:**  
  $$\log_b(mn) = \log_b m + \log_b n$$

- **Ratios:**  
  $$\log_b\left(\frac{m}{n}\right) = \log_b m - \log_b n$$

- **Powers:**  
  $$\log_b(n^p) = p \log_b n$$

- **Roots:**  
  $$\log_b\left(\sqrt[q]{n}\right) = \frac{1}{q} \log_b n$$

- **Change of bases:**  
  $$\log_b n = \frac{\log_a n}{\log_a b}$$
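
The logarithmic laws can be verified numerically in the same spirit. The sketch uses Python's two-argument `math.log(x, base)`; the numbers `b`, `m`, `n`, `p`, and `q` are arbitrary positive values chosen for the illustration.

```python
import math

b = 2.0            # base
m, n = 8.0, 32.0   # positive arguments
p, q = 3.0, 5.0    # power and root index

checks = {
    "product": math.isclose(math.log(m * n, b), math.log(m, b) + math.log(n, b)),
    "ratio": math.isclose(math.log(m / n, b), math.log(m, b) - math.log(n, b)),
    "power": math.isclose(math.log(n**p, b), p * math.log(n, b)),
    "root": math.isclose(math.log(n**(1 / q), b), math.log(n, b) / q),
    # change of base with a = 10
    "change of base": math.isclose(math.log(n, b), math.log10(n) / math.log10(b)),
}

for law, holds in checks.items():
    print(f"{law}: {holds}")
```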




**Example 1**

$$ \log_3 81 = 4 \text{ because } 3^4 = 81 $$
$$ \log_5 125 = 3 \text{ because } 5^3 = 125 $$

Let’s rewrite the first equation to note something interesting:

$$
\log_3 81 = 4
$$

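Now, let’s replace 81 with the equivalent statement, $3^4$, as follows:

$$
\log_3 3^4 = 4
$$

The logarithm base 3 and the exponentiation with base 3 undo each other, leaving only the exponent 4.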


Exponents and logarithms are most important when dealing with growth. If a quantity is growing (or declining in growth), an exponent/logarithm can help model this behavior, as Example 2 shows.

**Example 2**

The number *e* is around 2.718 and has many practical applications. Suppose you have \$5,000 deposited in a bank with continuously compounded interest at the rate of 3%. In this case, you can use the following formula to model the growth of your deposit:

$$
A = Pe^{rt}
$$

In this formula, we have the following:

- *A* denotes the final amount  
- *P* denotes the principal investment (5,000)  
- *e* denotes a constant (2.718)  
- *r* denotes the rate of growth (0.03)  
- *t* denotes the time (in years)

> When will our investment double?
> How long would I have to have my money in this investment to achieve 100% growth?


$$
10,000 = 5,000e^{0.03t}
$$

$$
2 = e^{0.03t} \quad \text{(divided by 5,000 on both sides)}
$$

At this point, we have a variable in the exponent that we want to solve. When this happens, we can use the logarithm notation to figure it out:

<p align="center">
  <img src="https://drive.google.com/uc?id=1BO3OE0FACgJVNqUxHM6VM_99dbxv7o4C" alt="Figure 2.4 – The conversion from exponent form to logarithm form" width="350">
  <br>
  <em>Figure 2.4 – The conversion from exponent form to logarithm form</em>
</p>

This leaves us:

$$
\ln(2) = 0.03t
$$

Using a calculator (or Python), we find that $\ln(2) \approx 0.693$:

$$
0.693 = 0.03t
$$

$$
t \approx 23.1
$$

This means that it would take *23.1 years* to double our money.

> **There is a typo in the book**, which gives $\ln(2) = 0.069$; the correct value is $\ln(2) \approx 0.693$.

**Simple Interest**

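Simple interest means interest is calculated only on the principal, not on previously earned interest, so the balance grows linearly over time. The formula is:
$$
A = P + Prt = P(1 + rt)
$$

To find the doubling time at $r = 0.03$, set $A = 2P$:

$$
2P = P(1 + 0.03t)
$$

$$
t = \frac{1}{r} = \frac{1}{0.03} \approx 33.3
$$

So doubling takes about 33.3 years under simple interest, compared with about 23.1 years under continuous compounding.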


In [None]:
import numpy as np
x = np.array([1,2,3,4,5])

#calculate sum
print(sum(x))

#calculate mean
print(sum(x)/len(x))
mean_x = np.mean(x)
print("Mean of x:", mean_x)

# exponential e^x
exp_x = np.exp(x)
print("e^x:", exp_x)

# natural logarithm ln(x); the input must be > 0
log_x = np.log(x)
print("ln(x):", log_x)

# base-10 logarithm
log10_x = np.log10(x)
print("log10(x):", log10_x)

# logarithm with an arbitrary base, using the change-of-base formula log_b(x) = ln(x)/ln(b)
base = 2
log_base_x = np.log(x) / np.log(base)
print(f"log base {base}:", log_base_x)

## example 2
# parameters
P = 5000          # principal
A_target = 2 * P  # target amount is twice the principal
r = 0.03          # annual interest rate (3%)

# A_target = P * np.exp(r * t_double)
# From A = P * e^(rt), taking ln of both sides gives t = ln(A/P) / r
t_double = np.log(A_target / P) / r

print(f"Time to double the investment: {t_double:.2f} years")

# Time needed to double investment with simple interest
## From: 2P = P*(1 + r*t) => t = 1/r
t_double_simple = 1 / r

print(f"Time to double the investment at simple interest: {t_double_simple:.2f} years")


15
3.0
Mean of x: 3.0
e^x: [  2.71828183   7.3890561   20.08553692  54.59815003 148.4131591 ]
ln(x): [0.         0.69314718 1.09861229 1.38629436 1.60943791]
log10(x): [0.         0.30103    0.47712125 0.60205999 0.69897   ]
log base 2: [0.         1.         1.5849625  2.         2.32192809]
Time to double the investment: 23.10 years
Time to double the investment at simple interest: 33.33 years


In [None]:
import numpy as np
a = np.log(2)  # natural logarithm ln(2)
print(f"ln(2) = {a:.4f}")

import pandas as pd

data = {
    "Office 1": [57, 89, 94],
    "Office 2": [67, 87, 94],
    "Office 3": [65, 98, 60]
}

df = pd.DataFrame(data, index=["HR", "Engineering", "Management"])
df


ln(2) = 0.6931


# 3 Set theory

Set theory involves mathematical operations at the set level. It is sometimes thought of as a fundamental group of theorems that governs the rest of mathematics.
## 3.1 Theory
**Definition**: A set is a collection of distinct objects. A set can be thought of as a list in Python but with no repeated objects and no inherent order.

**Magnitude/Cardinality**: The magnitude of a set is the number of elements in the set and is represented as $| A |$. For a finite set, the cardinality of a set is the number of members it contains.

For the set $S = \{1, 2, 3\}$ we show cardinality by writing:   
$$
|S| = 3
$$

> The author uses *magnitude* here for the size of a set; in set theory the usual term is *cardinality* (势 in Chinese).

**Empty Set**: the empty set is a set containing no objects. It is written as a pair of curly braces with nothing inside `{}` or by using the symbol `∅`. It has a magnitude of 0.

**Set Membership Symbol**: `∈` is used to say that an object is a member of a set.  It has a partner symbol `∉` which is used to say an object is *not* in a set. For example:
  
$$
S = \{1, 2, 3\}
$$

Then 3 `∈` S and 4 `∉` S.


**Subset/Superset**
- Definition: For two sets $S$ and $T$ we say that **$S$ is a subset of $T$** if each element of $S$ is also an element of $T$. In formal notation $ S \subseteq T$ if for all $x \in S$ we have $x \in T$.

- If $S \subseteq T$ then we also say $T$ contains $S$.
- If $S \subseteq T$ and $S \ne T$, then we write $S \subset T$ and we say $S$ is a **proper subset** of $T$.


- Example 1:
If $A = \{a, b, c\}$ then $A$ has eight different subsets:

$$
\emptyset \quad \{a\} \quad \{b\} \quad \{c\} \\
\{a,b\} \quad \{a,c\} \quad \{b,c\} \quad \{a,b,c\}
$$

- Notice that $ A \subseteq A$ and in fact each set is a subset of itself.  
- The empty set $\emptyset$ is a subset of every set.
   

- Examples 2:

1. A set of even numbers is a subset of all integers  
2. Every set is a subset, but not a proper subset, of itself  
3. A set of all tweets is a superset of English tweets
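
The eight subsets of $A = \{a, b, c\}$ from Example 1 can be enumerated with `itertools.combinations`. This is a small sketch; the variable names are illustrative, and Python's `<=` and `<` operators on sets test for subset and proper subset respectively.

```python
from itertools import combinations

A = {"a", "b", "c"}
elements = sorted(A)

# All subsets: choose 0, 1, 2, and 3 elements in turn
subsets = [set(combo)
           for size in range(len(elements) + 1)
           for combo in combinations(elements, size)]

print(len(subsets))                    # 8, i.e. 2 ** len(A)
print(all(s <= A for s in subsets))    # True: each one is a subset of A
print(set() in subsets, A in subsets)  # True True: the empty set and A itself

proper_subsets = [s for s in subsets if s < A]
print(len(proper_subsets))             # 7: every subset except A itself
```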

We now move on to a number of operations on sets, in the same way that we are already familiar with operations on numbers such as addition, multiplication, and negation.

**Intersection**


- Definition: the intersection of two sets $S$ and $T$ is the collection of all objects that are in both sets.  
It is written $S \cap T$. Using curly brace notation:

$$
S \cap T = \{ x : (x \in S) \text{ and } (x \in T) \}
$$

- This can also be written more compactly using Boolean notation:

$$
S \cap T = \{ x : (x \in S) \land (x \in T) \}
$$

<p align="center">
  <img src="https://drive.google.com/uc?id=1IDzVGiLQfeH61b9fVT0ko-zR_XkxHcms" alt="Intersection" width="300">
  <br>
  <em> Figure 3.1 Intersection </em>
</p>

- Example:
Suppose $S = \{1,2,3,5\}$,  $T = \{1,3,4,5\}$, and $U = \{2,3,4,5\}$, then:  
  - $S \cap T = \{1,3,5\}$
  - $S \cap U = \{2,3,5\}$
  - $T \cap U = \{3,4,5\}$
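
In Python, the `&` operator computes the intersection of two sets, so the example above can be reproduced directly (a minimal sketch using the same three sets):

```python
S = {1, 2, 3, 5}
T = {1, 3, 4, 5}
U = {2, 3, 4, 5}

print(S & T == {1, 3, 5})  # True
print(S & U == {2, 3, 5})  # True
print(T & U == {3, 4, 5})  # True

# The set-builder definition, written as a comprehension
S_cap_T = {x for x in S if x in T}
print(S_cap_T == S.intersection(T))  # True
```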

**Union**
- Definition: the union of two sets $S$ and $T$ is the collection of all objects that are in either set.  
It is written $S \cup T$. Using curly brace notation:

$$
S \cup T = \{ x : (x \in S) \text{ or } (x \in T) \}
$$

- Using Boolean notation:

$$
S \cup T = \{ x : (x \in S) \lor (x \in T) \}
$$


<p align="center">
<img src="https://drive.google.com/uc?id=1sN4OfyIhJcIv6fmIcnng5qaQgrj1HxeI"
alt="Figure 3.2 Union" width="300">
<br>
<em>Figure 3.2 Union </em>
</p>

- Example: Suppose $S = \{1,2,3\}$,  $T = \{1,3,5\}$, and  $U = \{2,3,4,5\}$ then:
 - $S \cup T = \{1,2,3,5\}$
 - $S \cup U = \{1,2,3,4,5\}$
 - $T \cup U = \{1,2,3,4,5\}$
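
The union example maps onto Python's `|` operator in the same way (a minimal sketch using the same three sets):

```python
S = {1, 2, 3}
T = {1, 3, 5}
U = {2, 3, 4, 5}

print(S | T == {1, 2, 3, 5})     # True
print(S | U == {1, 2, 3, 4, 5})  # True
print(T | U == {1, 2, 3, 4, 5})  # True

# Shared elements are counted once, so |S ∪ T| = |S| + |T| - |S ∩ T|
print(len(S | T) == len(S) + len(T) - len(S & T))  # True
```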


**Difference**

- Definition: The difference of two sets $S$ and $T$ is the collection of objects in $S$ that are not in $T$.  
The difference is written $S − T$. Using curly brace notation:

$$
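S - T = \{ x : (x \in S) \text{ and } (x \notin T) \} = S \cap T^c
$$

Here $T^c$ denotes the complement of $T$: the collection of all objects under consideration that are not in $T$.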


<p align="center">
<img src="https://drive.google.com/uc?id=184icNjfgvCIsc_pyoVG-usjV1FwDu73X" alt="Figure 3.3 Set Difference" width="300">
<br>
<em>Figure 3.3 Set Difference</em>
</p>

- Example: $C$ is the difference set of $S - T$.

$$
S = \{1,2,3,6,7,8\}, \quad T = \{6,7,8\}, \quad C= \{1,2,3\}
$$
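
Set difference is the `-` operator in Python. The sketch below reproduces the example and also checks the complement form of the definition; the `universe` set (the integers 1 to 10) is an assumption introduced only so that a complement can be computed.

```python
S = {1, 2, 3, 6, 7, 8}
T = {6, 7, 8}

C = S - T
print(C == {1, 2, 3})  # True

# Complement of T relative to an assumed universe of the integers 1..10
universe = set(range(1, 11))
T_complement = universe - T
print(C == S & T_complement)  # True: S - T equals S ∩ T^c

# Difference is not symmetric: T - S is empty here because T is a subset of S
print(T - S == set())  # True
```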


**Jaccard measure**
- Definition: it can be defined as the magnitude of the intersection of the two sets divided by the magnitude of the union of the two sets. The Jaccard measure (Jaccard similarity) between A and B sets is defined as follows.

$$
JS(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

- This gives us a way to quantify similarities between elements represented with sets. The Jaccard measure is a number between 0 and 1: when it is closer to 0, the two sets (for example, two people's preferences) are more dissimilar, and when it is closer to 1, they are considered similar to each other.

## 3.2 Application
In data science, we use sets (and lists) to represent a list of objects and, often, to generalize the behavior of consumers. It is common to reduce a customer to a set of characteristics.

**Example**

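Imagine that we are a marketing firm trying to predict where a person wants to shop for clothes. We are given the set of clothing brands each user has previously visited, and our goal is to predict a new store that they would also enjoy. Suppose user1 and user2 have previously shopped at the following stores:

$$
\text{user1} = \{\text{Target, Banana Republic, Old Navy}\}, \quad \text{user2} = \{\text{Banana Republic, Gap, Kohl's}\}
$$

Their intersection and union are: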


$$
\text{user1} \cap \text{user2} = \{\text{Banana Republic}\}
$$

$$
|\text{user1} \cap \text{user2}| = 1
$$

$$
\text{user1} \cup \text{user2} = \{\text{Banana Republic, Target, Old Navy, Gap, Kohl's}\}
$$

$$
|\text{user1} \cup \text{user2}| = 5
$$

When looking at the similarities between user1 and user2, we can use the **Jaccard measure**:

$$
JS(A, B) = \frac{\text{Number of stores they share in common}}{\text{Unique number of stores they liked combined}}
$$

We can define the similarity between the two users as follows:

$$
\frac{|\text{user1} \cap \text{user2}|}{|\text{user1} \cup \text{user2}|} = \frac{1}{5} = 0.2
$$

Here, the numerator represents the number of stores that the users have in common (in the sense that they like shopping there), while the denominator represents the unique number of stores that they like put together.


In [None]:
import numpy as np
import pandas as pd

s = set()
s = set([1, 2, 2, 3, 2, 1, 2, 2, 3, 2])
# will remove duplicates from a list
print(s == {1, 2, 3})  # True

# In Python, curly braces, { }, can denote a set or a dictionary (key-value pairs).
animals = {"dog": "human's best friend", "cat": "destroyer of world"}
print(animals["dog"])       # "human's best friend"
print(len(animals["cat"]))  # 18
# iterate over the key-value pairs
for key, value in animals.items():
    print(f"{key}: {value}")
# Assigning to an existing key overwrites its value:
# a dictionary cannot hold two values for one key.
animals["dog"] = "Arf"
print(animals)  # {'dog': 'Arf', 'cat': 'destroyer of world'}

#magnitude of a set
s1 = {1, 2, 3}
print(s1 == {1, 2, 3})  # True
print(len(s1))          # 3

#Suppose user1 and user2 have previously shopped at the following stores:
user1 = {"Target","Banana Republic","Old Navy"}
user2 = {"Banana Republic","Gap","Kohl's"}

#Suppose we are wondering how similar these users are. With the limited information we have,
def jaccard(user1, user2):
    stores_in_common = len(user1 & user2)
    stores_all_together = len(user1 | user2)
    return stores_in_common / stores_all_together
# using our new jaccard function
print(jaccard(user1, user2))  # Output: 0.2

True
human's best friend
18
dog: human's best friend
cat: destroyer of world
{'dog': 'Arf', 'cat': 'destroyer of world'}
True
3
0.2


## Summary

In this chapter, we took a look at some basic mathematical principles that will become very important as we progress through this book. Between logarithms/exponents, matrix algebra, and set theory, mathematics has a big role not just in analyzing data but in many aspects of our lives.

The coming chapters will take a much deeper dive into two big areas of mathematics: probability and statistics. Our goal will be to define and interpret the smallest and biggest theorems in these two giant fields of mathematics.

It is in these next few chapters that everything will start to come together. So far in this book, we have looked at math examples, data exploration guidelines, and basic insights into types of data. It is time to begin to tie all of these concepts together.