# <ins>**Linear Algebra**</ins>

## **Importance to Data Science**
Data is often represented as vectors and matrices, and linear algebra is the tool for handling and manipulating them. It plays an important role in machine learning. For example, one of the simplest and most common machine learning algorithms is *linear regression*, which uses linear algebra to find the best-fit line for predicting outcomes. Linear algebra also appears in *optimization*, *neural networks*, *image recognition*, *recommendation systems* and many other areas. Knowing it is essential to computer and data science.

****

## **Vectors**

### <ins>What are Vectors?</ins>
A vector is essentially an ordered list of numbers. Vectors are used to represent data points, measurements or any other kind of numeric information in a structured way.

### <ins>Characteristics of Vectors</ins>

#### 1. Dimension
- The number of elements in a vector is called its dimension. For example a vector with 3 elements is called a 3-dimensional vector.

#### 2. Notation
- Vectors are often written as a column of numbers like this:

$$
\mathbf{v} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}
$$

&nbsp;&nbsp;&nbsp;&nbsp;or as a row of numbers:

$$
\mathbf{v} = (1, 2, 3)
$$

#### 3. Components
- Each number in the vector is called a component. For example in the vector $\mathbf{v} = (1, 2, 3)$, the components are 1, 2 and 3.

### <ins>Python Examples of Creating Vectors</ins>
In Python we can use the NumPy library to create vectors. Creating a row vector is easy, but NumPy treats a plain list of numbers as a regular 1D array, so a column vector has to be shaped explicitly. This makes the column vector a 2D array with a single column.

#### Row Vector Example:

In [1]:
import numpy as np

row_vector = np.array([1, 2, 3])
print(f"Row vector: \n{row_vector}")

Row vector: 
[1 2 3]


#### Column Vector Example:

In [2]:
column_vector = np.array([[1], [2], [3]])
print(f"Column vector: \n{column_vector}")

Column vector: 
[[1]
 [2]
 [3]]
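
To make the difference between the two concrete, here is a minimal sketch that inspects the `shape` of both arrays and converts between them. The names `as_column` and `as_row` are only illustrative.

```python
import numpy as np

row_vector = np.array([1, 2, 3])
column_vector = np.array([[1], [2], [3]])

# A 1D array has a single axis; the column vector has two axes (3 rows, 1 column)
print(row_vector.shape)     # (3,)
print(column_vector.shape)  # (3, 1)

# reshape(-1, 1) turns a 1D array into a column; -1 lets NumPy infer the row count
as_column = row_vector.reshape(-1, 1)
print(np.array_equal(as_column, column_vector))  # True

# ravel() flattens the column back into a 1D array
as_row = column_vector.ravel()
print(np.array_equal(as_row, row_vector))  # True
```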


### <ins>How Vectors are Used in Data Science</ins>
Here are some common use cases of vectors in data science:

#### 1. Data Representation:
- Features of a Data point: Each data point in a dataset can be represented as a vector. For example, if you are working with a dataset of bulking pandas where each panda is described by its height (in meters), weight (in kilograms) and age (in years), each panda can be represented as a 3-dimensional vector:

$$
\begin{aligned}
\text{Panda 1} &= (1.0, 125.0, 8) \\
\text{Panda 2} &= (1.1, 140.0, 15)
\end{aligned}
$$
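
As a rough sketch of how this looks in code, the two panda vectors can be stacked into one array, so that each row is a data point and each column is a feature. The variable names here are assumptions made for illustration.

```python
import numpy as np

# Each panda is a vector of (height in m, weight in kg, age in years)
panda_1 = np.array([1.0, 125.0, 8])
panda_2 = np.array([1.1, 140.0, 15])

# Stacking the vectors row by row gives a 2 x 3 data matrix
pandas_matrix = np.vstack([panda_1, panda_2])
print(pandas_matrix.shape)  # (2, 3)

# Column 1 holds the weight feature of every panda
weights = pandas_matrix[:, 1]
print(weights)  # [125. 140.]
```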

#### 2. Operations on Vectors:
- **Addition:** Vectors can be added together by adding their corresponding components. If $a = (1, 2)$ and $b = (3, 4)$, then: 
$$
a + b = (1 + 3, 2 + 4) = (4, 6)
$$

- **Scalar Multiplication:** A vector can be multiplied by a scalar (a single number). You multiply each component with the scalar number. If $\mathbf{v} = (2, 3)$ and the scalar is 4, then:
$$
4\mathbf{v} = 4(2, 3) = (4 \times{} 2, 4 \times{} 3) = (8, 12)
$$ 

#### 3. Distance and Similarity:
- **Euclidean Distance:** The Euclidean distance between two vectors measures how far apart they are, which makes it a simple indicator of how alike two data points are. Strictly speaking, distance and similarity are often treated as separate things: distance tells you how far apart the vectors are, while similarity tells you how similar or aligned they are. Distance ranges from 0 to infinity, while some similarity measures can take negative values. Which one to use depends on the application: for example, distance for clustering algorithms and similarity for information retrieval and text analysis. We are not going that deep yet and will focus on the simpler Euclidean distance. For vectors $a = (x_1, y_1)$ and $b = (x_2, y_2)$, the distance is given by:
$$
\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
$$
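
To illustrate the difference between distance and similarity, the sketch below compares the Euclidean distance with cosine similarity. Cosine similarity is not named in the text above; it is picked here as an assumption because it is one common similarity measure that can be negative. The helper function names are hypothetical.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; never negative."""
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors; ranges from -1 to 1."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
opposite = -a  # points in the exact opposite direction of a

print(euclidean_distance(a, b))        # sqrt(8), about 2.83
print(cosine_similarity(a, opposite))  # -1.0, up to floating-point rounding
```

Two vectors pointing in opposite directions get a similarity of about -1, while a distance can never drop below 0.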

#### 4. Direction and Magnitude:
- **Direction:** The direction of the vector is the way it points in space. This is important in multiple disciplines like physics and engineering but also in understanding the orientation of data points in data science.

- **Magnitude:** In this context, magnitude means the length of the vector in space. For a vector $\mathbf{v} = (x, y)$ it is given by:
$$
||\mathbf{v}|| = \sqrt{x^2 + y^2}
$$

&nbsp;&nbsp;&nbsp;&nbsp;and more generally, for an $n$-dimensional vector $\mathbf{v} = (v_1, v_2, \dots, v_n)$:

$$
||\mathbf{v}|| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}
$$

&nbsp;&nbsp;&nbsp;&nbsp;Example: consider $\mathbf{v} = (3, 4)$:

$$
||\mathbf{v}|| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5
$$

&nbsp;&nbsp;&nbsp;&nbsp;This means the magnitude for the vector is 5.

- **Further Understanding the Magnitude:** In the context of vectors, the notation $||\mathbf{v}||$ (read as *"norm of v"* or *"magnitude of v"*) is just a fancy way of saying magnitude. The double vertical bars $|| \; ||$ denote the magnitude of a vector. There is no need to read anything deeper into it!
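
The worked example for $\mathbf{v} = (3, 4)$ can be checked numerically. This is a small sketch that computes the magnitude once by hand, following the formula, and once with `np.linalg.norm`. The variable names are illustrative.

```python
import numpy as np

v = np.array([3, 4])

# Manual formula: square each component, sum the squares, take the square root
manual_magnitude = np.sqrt(np.sum(v ** 2))

# The same computation with NumPy's built-in norm
numpy_magnitude = np.linalg.norm(v)

print(manual_magnitude, numpy_magnitude)  # 5.0 5.0
```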

### <ins>Python Examples of Vector Operations and Calculations</ins>
For the sake of simplicity we are going to use NumPy functions rather than write everything from scratch. This is how it works in real life as well, unless you need to build your own custom function. **Do not re-invent the wheel!** Here are a few examples of manipulating vectors in Python:

#### Addition, Subtraction and Scalar Multiplication:
I know there was nothing about subtraction above, but it works the same way as addition: you subtract the corresponding components. Let's use the vectors from above, $a = (1, 2)$ and $b = (3, 4)$:

In [3]:
# Define the vectors
a = np.array([1, 2])
b = np.array([3, 4])

# Addition
vectors_added = a + b
print(f"Addition of vectors a and b is: {vectors_added}")

# Subtraction 
vectors_subtracted = a - b
print(f"Subtraction of vectors a and b is: {vectors_subtracted}")

Addition of vectors a and b is: [4 6]
Subtraction of vectors a and b is: [-2 -2]
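
Since subtraction was only described in words, here it is written out component by component for the same two vectors, following the pattern of the addition formula:

$$
a - b = (1 - 3, 2 - 4) = (-2, -2)
$$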


Next for scalar multiplication we use $\mathbf{v} = (2, 3)$ with the scalar 4:

In [4]:
# Define the vector and scalar
v = np.array([2, 3])
scalar = 4

# Scalar multiplication
vector_multiplied = v * scalar
print(f"Scalar multiplication of vector is: {vector_multiplied}")

Scalar multiplication of vector is: [ 8 12]


#### Euclidean Distance
For this example let's use vectors $c = (1, 2, 3)$ and $d = (4, 5, 6)$:

In [5]:
# Define vectors
c = np.array([1, 2, 3])
d = np.array([4, 5, 6])

# Calculate Euclidean distance
euclidean_distance = np.linalg.norm(c - d)
print(f"Euclidean distance between c and d is: {euclidean_distance}")

Euclidean distance between c and d is: 5.196152422706632
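
The 2-dimensional distance formula extends to three components by adding one more squared difference under the root. Worked out by hand for $c$ and $d$, it agrees with the NumPy result:

$$
\sqrt{(4 - 1)^2 + (5 - 2)^2 + (6 - 3)^2} = \sqrt{9 + 9 + 9} = \sqrt{27} \approx 5.196
$$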


#### Magnitude
For magnitude let's use one of the vectors from the previous example:

In [6]:
magnitude = np.linalg.norm(c)
print(f"Magnitude for vector c is: {magnitude}")

Magnitude for vector c is: 3.7416573867739413


#### Direction a.k.a. Normalization
Now here is something new before the Python code. When we talk about the direction of a vector, we are usually interested in the *unit vector*: a vector of magnitude 1 that points in the same direction as the original. The unit vector is obtained by normalizing the vector, that is, dividing it by its magnitude, which is why direction and normalization amount to the same thing in most cases. The equation for normalization looks like this:
$$
\mathbf{\hat{v}} = \frac{\mathbf{v}} {||\mathbf{v}||}
$$

In [7]:
# Using the c vector from above again
direction = c / np.linalg.norm(c)
print(f"Direction of the vector c is: {direction}")

Direction of the vector c is: [0.26726124 0.53452248 0.80178373]
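
One way to sanity-check a normalization is to confirm that the result has magnitude 1. The sketch below wraps the formula in a hypothetical helper called `normalize` and also guards against the zero vector, which has no direction and would otherwise cause a division by zero.

```python
import numpy as np

def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    magnitude = np.linalg.norm(v)
    if magnitude == 0:
        # The zero vector has no direction, so it cannot be normalized
        raise ValueError("cannot normalize the zero vector")
    return v / magnitude

c = np.array([1, 2, 3])
unit_c = normalize(c)

# A unit vector has magnitude 1 (up to floating-point rounding)
print(np.isclose(np.linalg.norm(unit_c), 1.0))  # True
```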


### <ins>Final Notes About Vectors</ins>
By final notes I mean one final example of a vector use case in data science. I am not going into details here, just giving this example as an extra.
- Suppose that we have a dataset of athletic pandas, each scored from 0 to 100 in two events: jumping and running.

$$
\begin{aligned}
\text{Panda 1} &= (85, 78) \\
\text{Panda 2} &= (92, 88) \\
\text{Panda 3} &= (45, 60) \\
\text{Panda 4} &= (50, 65)
\end{aligned}
$$

- Each panda is represented as a simple 2-dimensional vector based on their scores.

- We could use a clustering algorithm such as K-means to group pandas with similar scores together. The algorithm calculates the distance between each vector and the center of each cluster, and assigns every panda to the nearest one.

- The pandas might be clustered in two groups: high-performing and low-performing based on their vector scores. 
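
To give a feel for how this works, here is a minimal NumPy-only sketch of the K-means idea applied to the four pandas. It is a simplified illustration, not a production implementation: the choice of Panda 1 and Panda 3 as starting centers and the cap of 10 iterations are assumptions made for this example, and in practice a library implementation would normally be used instead.

```python
import numpy as np

# Scores (jumping, running) for the four pandas
scores = np.array([
    [85, 78],
    [92, 88],
    [45, 60],
    [50, 65],
], dtype=float)

# Assumption: start with Panda 1 and Panda 3 as the initial cluster centers
centroids = scores[[0, 2]].copy()

for _ in range(10):  # a handful of iterations is plenty for four points
    # Euclidean distance from every panda to every center, shape (4, 2)
    distances = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
    # Assign each panda to its nearest center
    labels = np.argmin(distances, axis=1)
    # Move each center to the mean of the pandas assigned to it
    new_centroids = np.array([scores[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break  # the centers stopped moving, so the clusters are stable
    centroids = new_centroids

print(labels)  # [0 0 1 1]
# The centers end up at (88.5, 83.0) and (47.5, 62.5)
print(centroids)
```

Pandas 1 and 2 land in one cluster (the high performers) and Pandas 3 and 4 in the other.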