# Basics of linear algebra for machine learning

## 01 - Introduction to linear algebra
### Linear algebra
Linear algebra is about linear combinations: using arithmetic on columns of numbers (vectors) and arrays of numbers (matrices) to create new columns and arrays of numbers. It was formalized in the 1800s to find unknowns in systems of linear equations.

A linear equation is a series of terms and mathematical operations where some terms are unknown, for example:  
$y = 4 x + 1$

They are called linear equations because they describe a line on a two-dimensional graph. We can line up a system of equations with two or more unknowns:  
- $y = 0.1 x_{1} + 0.4 x_{2}$  
- $y = 0.3 x_{1} + 0.9 x_{2}$  
- $y = 0.2 x_{1} + 0.3 x_{2}$

where
- the column of $y$ values is a column vector of outputs from the equation
- the two columns of float values are the data columns $a_{1}$ and $a_{2}$ forming the matrix $A$
- the two unknown values $x_{1}$ and $x_{2}$ are the coefficients of the equation and form a vector of unknowns $b$ to be solved

summarized in linear algebra as  
$y = A \cdot b$
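
As a minimal sketch of this matrix form, the three equations can be written with NumPy. The values placed in `b` are hypothetical, chosen only to illustrate the product; the text leaves them unknown.

```python
import numpy as np

# Data columns a1 and a2 taken from the three equations above
A = np.array([[0.1, 0.4],
              [0.3, 0.9],
              [0.2, 0.3]])

# Hypothetical values for the unknowns x1 and x2 (illustration only)
b = np.array([1.0, 2.0])

# y = A . b produces one output per equation
y = A @ b
print(y.shape)  # (3,)
```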

Such problems are challenging to solve because:
- the number of equations rarely matches the number of unknowns (the example above has three equations for two unknowns)
- no single line can satisfy all of the equations without error

Interesting problems are often described by systems with an infinite number of solutions. This is the core of linear algebra as it relates to machine learning. The rest of the operations are about making such problems easier to understand and solve.

### Numerical linear algebra
Vector and matrix operations were first implemented in FORTRAN, in libraries such as:
- LAPACK
- BLAS
- ATLAS

Popular packages in use today, such as NumPy in Python, build on top of these libraries.

### Linear algebra and statistics
- using vector and matrix notation (multivariate statistics)
- solving least squares and weighted least squares (linear regression)
- estimating means and variances of data matrices
- using the covariance matrix (multivariate Gaussian distributions)
- leveraging the concepts above for data reduction with principal component analysis
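
A short sketch of the mean, variance and covariance estimates listed above, computed with NumPy. The data matrix is hypothetical, with one observation per row and one variable per column.

```python
import numpy as np

# Hypothetical data matrix: 4 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 2.0],
                 [2.0, 4.1],
                 [3.0, 6.2],
                 [4.0, 7.9]])

col_means = data.mean(axis=0)         # mean of each column
col_vars = data.var(axis=0, ddof=1)   # sample variance of each column
cov = np.cov(data, rowvar=False)      # 2 x 2 covariance matrix

print(col_means)
print(cov)
```

The diagonal of the covariance matrix holds the per-column sample variances, which is why `ddof=1` is used in `var()` to match the default of `np.cov()`.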

### Applications of linear algebra
- matrices in engineering (line of springs)
- graphs and networks (graph analysis)
- Markov matrices, population, economics (population growth)
- linear programming (simplex optimization method)
- Fourier series - linear algebra for functions (signal processing)
- linear algebra for statistics and probabilities (least squares for regression)
- computer graphics (translation, rescaling, rotation of images)

## 02 - Linear algebra and machine learning
Linear algebra is the mathematics of data. Although it is often recommended as a prerequisite to machine learning, it can make more sense to first build some context around the applied machine learning process.

### Reasons not to learn linear algebra
- it's not required in order to use machine learning as a tool to solve problems
- it's slow and might delay you achieving your goals
- it's a huge field and not all of it is relevant to machine learning

A breadth-first (results-first) approach can help build a skeleton and some context on which to build to deepen knowledge about how algorithms work or the math that underlies them.

### Linear algebra notation
You need to know how to read and write vector and matrix notation. It enables you to:
- describe operations on data precisely
- read descriptions of algorithms in textbooks
- implement machine learning algorithms faster and more efficiently
- interpret and implement new methods in research papers
- describe your own methods to other practitioners

### Linear algebra arithmetic
You need to know how to perform arithmetic operations: add, subtract and multiply scalars, vectors and matrices. Matrix multiplication and tensor multiplication are often non-intuitive at first. Understanding vector and matrix operations is required to effectively read and write matrix notation.
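
The difference between element-wise and matrix operations is easier to see on small arrays. The following is a sketch with made-up values, using NumPy's `@` operator for dot and matrix products.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
M = np.array([[1, 2, 3],
              [4, 5, 6]])

print(v + w)     # element-wise addition
print(v * w)     # element-wise product
print(2 * v)     # scalar multiplication
print(v @ w)     # dot product: a single scalar
print(M @ v)     # matrix-vector product: one value per row of M
print(M @ M.T)   # matrix-matrix product: (2 x 3) . (3 x 2) gives 2 x 2
```

Note that `*` multiplies matching elements, while `@` sums products along the shared dimension, so the shapes of the two operands must be compatible.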

### Learn linear algebra for statistics
Linear algebra is heavily used in multivariate statistics. To read and interpret statistics, you need to know the notation and operations of linear algebra, such as vectors used for means and variance, or covariance matrices describing the relationships between multiple Gaussian variables. Principal component analysis also leverages such methods.

### Learn matrix factorization
Matrix factorization is also called matrix decomposition. You need to know how to factorize a matrix and what it means. Matrix factorization is necessary for more complex operations in linear algebra (matrix inverse) and machine learning (least squares). Different matrix factorizations exist, such as singular-value decomposition. To read and interpret higher-order matrix operations, matrix factorization is required.
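
To make "factorize a matrix" concrete, here is a small sketch using `np.linalg.svd()` on a made-up matrix: the matrix is split into three factors, and multiplying them back together recovers the original.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Factorize A into U, the singular values s, and V transposed
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the factors back together recovers A (up to rounding)
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```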

### Learn linear least squares
Matrix factorization can be used to solve linear least squares. Problems where no line can fit the data without error can be solved using the least squares method, called linear least squares in linear algebra. Linear least squares is used in regression models and in a range of machine learning algorithms.
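
One possible way to solve least squares through a factorization is sketched below with the QR decomposition. The data points are hypothetical, and QR is only one choice among several; the text does not prescribe a specific factorization here.

```python
import numpy as np

# Hypothetical data: the points do not lie exactly on one line
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: a column of ones for the intercept, then x
A = np.column_stack([np.ones_like(x), x])

# QR factorization: A = Q . R, with orthonormal Q and upper-triangular R
Q, R = np.linalg.qr(A)

# Solve R . b = Q^T . y for the least squares coefficients
b = np.linalg.solve(R, Q.T @ y)
print(b)  # intercept and slope of the best-fit line
```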

### One more reason
Seeing how the operations work on real data will help you develop a strong intuition for the methods. You will experience knowledge buzz and mind-expanding moments.

## 03 - Examples of linear algebra in machine learning
Linear algebra is concerned with vectors, matrices and linear transforms. It is foundational to machine learning, from the notation used to describe how algorithms operate to their implementation in code. The relationship between linear algebra and machine learning is often left unexplained or abstract. Here are some examples of how linear algebra is leveraged in machine learning.

### Dataset and data files
Data is a matrix, which can be split into inputs (a matrix $X$) and outputs (a vector $y$). Each row has the same length (same number of columns): the data is vectorized and can be passed to a model one row at a time or in batches. The model can be pre-configured to expect rows of a fixed width.
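
A minimal sketch of this split using NumPy slicing. The dataset is hypothetical, and it is assumed that the last column holds the output.

```python
import numpy as np

# Hypothetical dataset: 3 rows, with the output in the last column
data = np.array([[5.1, 3.5, 1.4, 0],
                 [4.9, 3.0, 1.4, 0],
                 [6.3, 3.3, 6.0, 1]])

X = data[:, :-1]   # every column except the last: input matrix
y = data[:, -1]    # last column: output vector
print(X.shape, y.shape)  # (3, 3) (3,)
```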

### Images and photographs
An image is a table structure with a width and a height, holding one pixel value in each cell for black and white images, or three pixel values (red, green and blue) for color images. Operations such as cropping, scaling and shearing are described using linear algebra notation and operations.
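
As an illustration, images can be represented as NumPy arrays, and cropping then becomes array slicing. The image sizes and pixel values below are made up.

```python
import numpy as np

# Hypothetical 4 x 6 grayscale image: one value per pixel
gray = np.arange(24).reshape(4, 6)

# Hypothetical 4 x 6 color image: three values (red, green, blue) per pixel
color = np.zeros((4, 6, 3), dtype=np.uint8)

# Cropping is slicing: keep rows 1-2 and columns 2-4
crop = gray[1:3, 2:5]
print(gray.shape, color.shape, crop.shape)  # (4, 6) (4, 6, 3) (2, 3)
```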

### One hot encoding
Categorical data can be one hot encoded so they are easier to work with and learn from by some machine learning techniques. One column is created for each category and a row for each example (e.g. if the categories are red, green and blue, we create a red column, a green column and a blue column). For each row in the dataset, we enter 1 in the column corresponding to the category and 0 in the others. Each row is encoded as a binary vector (0 or 1), which is an example of sparse representation.
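
The red, green and blue example can be sketched with NumPy as follows. The list of samples is hypothetical, and indexing into an identity matrix is just one convenient way to build the encoding.

```python
import numpy as np

categories = ["red", "green", "blue"]
samples = ["red", "blue", "green", "red"]   # hypothetical column of labels

# Map each label to the index of its category
indices = np.array([categories.index(s) for s in samples])

# Row i of the identity matrix is the one-hot vector for category i
one_hot = np.eye(len(categories), dtype=int)[indices]
print(one_hot)
```

Each row contains a single 1, in the column of its category.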

### Linear regression
Linear regression is used to describe the relationship between variables. Solving the linear regression problem means finding a set of coefficients that gives the best prediction of the output variable when each input variable is multiplied by its coefficient and the results are added together. It is usually solved using least squares optimization, leveraging matrix factorization such as LU decomposition or singular-value decomposition.

It can be summarized using linear algebra notation:  
$y = A \cdot b$

where
- $y$ is the output variable
- $A$ is the dataset
- $b$ are the model coefficients
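
A hedged sketch of this notation in NumPy, using `np.linalg.lstsq()` to estimate $b$. The dataset and output values are hypothetical; the outputs were built from the coefficients 1 and 2 so that the result is easy to check.

```python
import numpy as np

# Hypothetical dataset A (3 rows, 2 input columns) and outputs y
A = np.array([[0.1, 0.4],
              [0.3, 0.9],
              [0.2, 0.3]])
y = np.array([0.9, 2.1, 0.8])

# Least squares estimate of the coefficients b
b, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

# Predictions are the dataset multiplied by the coefficients
y_hat = A @ b
print(b)
```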

### Regularization
Simpler models often have smaller coefficient values. Regularization is leveraged to encourage a model to minimize the size of coefficients. Common implementations are the $L^{1}$ and $L^{2}$ forms. Both are a measure of the length of the coefficients as a vector, and leverage the vector norm.
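
Both norms can be computed with `np.linalg.norm()`. The coefficient vector below is hypothetical.

```python
import numpy as np

b = np.array([3.0, -4.0])   # hypothetical model coefficients

l1 = np.linalg.norm(b, ord=1)   # sum of absolute values: 7.0
l2 = np.linalg.norm(b, ord=2)   # square root of the sum of squares: 5.0
print(l1, l2)
```

A regularized model adds a penalty based on one of these values to its training objective, so larger coefficients cost more.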

### Principal component analysis
Modeling data with many features is challenging. Principal component analysis is a dimensionality reduction method used to create projections of high-dimensional data for visualization and training models. At its core is a matrix factorization: eigendecomposition can be used, and more robust implementations leverage singular-value decomposition.
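
The following is a minimal sketch of the eigendecomposition route, assuming a small hypothetical dataset. It is an illustration of the idea, not a production implementation.

```python
import numpy as np

# Hypothetical data: 5 observations of 2 features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center each column, then compute the covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh suits symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Keep the direction with the largest eigenvalue and project onto it
top_component = eigenvectors[:, -1]
projected = X_centered @ top_component
print(projected.shape)  # (5,)
```

The variance of the projected data equals the largest eigenvalue, which is why that direction is kept first.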

### Singular-value decomposition
Singular-value decomposition is a matrix factorization method that also serves as a dimensionality reduction method, with applications in feature selection, visualization and noise reduction.
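
Dimensionality reduction with singular-value decomposition can be sketched by keeping only the largest singular values. The matrix and the choice of `k = 2` are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 4 data matrix
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.1],
              [1.0, 0.0, 1.0, 0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation of A
T = U[:, :k] @ np.diag(s[:k])                 # each row of A described by k values
print(A_k.shape, T.shape)  # (3, 4) (3, 2)
```

`A_k` has the same shape as `A` but lower rank, while `T` is the reduced representation with `k` columns.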

### Latent semantic analysis
Latent semantic analysis, also called latent semantic indexing, is a natural language processing method applied to document-term matrices (sparse representations of a text). It distills the representation down to its most relevant essence using matrix factorization methods such as singular-value decomposition.

### Recommender systems
The similarity between sparse customer behavior vectors is computed using distance measures (e.g. Euclidean distance) or dot products. Matrix factorization methods such as singular-value decomposition are used to distill user data to their essence for querying, searching and comparison.
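
A small sketch of both measures on two hypothetical customer vectors, where 1 marks an item the customer purchased.

```python
import numpy as np

# Hypothetical customer behavior vectors (1 = item purchased)
customer_a = np.array([1, 0, 0, 1, 0, 1])
customer_b = np.array([1, 0, 0, 0, 0, 1])

# Euclidean distance: smaller means more similar
euclidean = np.linalg.norm(customer_a - customer_b)

# Dot product: here it counts the items both customers purchased
dot = customer_a @ customer_b
print(euclidean, dot)
```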

### Deep learning
Artificial neural networks are nonlinear machine learning algorithms inspired by the way our brain processes information and have proved effective at a range of problems such as machine translation, photo captioning or speech recognition. Their execution leverages linear algebra structures (vectors, matrices and tensors of inputs and coefficients) multiplied and added together.
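
The "multiplied and added together" step can be sketched as a single network layer. The weights, biases and input are hypothetical, and the ReLU nonlinearity is an assumption made for this illustration; the text does not name a specific activation function.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.0, 2.0])   # input vector with 3 features
W = rng.normal(size=(4, 3))      # hypothetical weights: 4 units, 3 inputs
bias = np.zeros(4)               # hypothetical biases, one per unit

z = W @ x + bias                 # multiply and add: the linear part
a = np.maximum(z, 0.0)           # ReLU (assumed here): the nonlinear part
print(a.shape)  # (4,)
```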

## 04 - Introduction to NumPy arrays
### NumPy n-dimensional array
NumPy is the preferred Python tool for linear algebra operations:
- the main structure is the `ndarray`, short for n-dimensional array
- data in an `ndarray` is referred to as an array
- data in an `ndarray` must be of the same type
- the type of an `ndarray` can be retrieved using the attribute `.dtype` on the array
- the shape (length of each dimension) of an `ndarray` can be retrieved using the attribute `.shape` on the array
- the function `array()` is used to create an `ndarray`

```python
import numpy as np

# Create arrays of integer, float and mixed types
array_int = np.array([1, 2, 3])
array_float = np.array([1.09, 2.87, 3.654])
array_mixed = np.array([1, 2.5, 3])

# Print arrays
print(f"array_int = {array_int}")
print(f"array_float = {array_float}")
print(f"array_mixed = {array_mixed}")

# Get the type of all arrays
print(f"\nType of array_int: {array_int.dtype}")
print(f"Type of array_float: {array_float.dtype}")
print(f"""Type of array_mixed: {array_mixed.dtype}
==> <array_mixed> was passed an array of mixed data types (integers and floats)
and NumPy forced all the `ndarray` to a float dtype)""")

# Get the shape of all arrays
print(f"\nShape of array_int: {array_int.shape}")
print(f"Shape of array_float: {array_float.shape}")
print(f"Shape of array_mixed: {array_mixed.shape}")
```

```
array_int = [1 2 3]
array_float = [1.09  2.87  3.654]
array_mixed = [1.  2.5 3. ]

Type of array_int: int64
Type of array_float: float64
Type of array_mixed: float64
==> <array_mixed> was passed an array of mixed data types (integers and floats)
and NumPy forced all the `ndarray` to a float dtype)

Shape of array_int: (3,)
Shape of array_float: (3,)
Shape of array_mixed: (3,)
```


### Functions to create arrays
- `empty()` creates an uninitialized array of the specified shape; its contents are whatever values happen to be in memory, not random numbers
- `zeros()` creates an array of zeros of the specified shape
- `ones()` creates an array of ones of the specified shape

```python
# Create arrays
array_empty = np.empty([3, 3])
array_zeros = np.zeros([3, 5])
array_ones = np.ones([3, 5])

# Print arrays
print(f"array_empty =\n{array_empty}")
print(f"\narray_zeros =\n{array_zeros}")
print(f"\narray_ones =\n{array_ones}")
```

```
array_empty =
[[9.27119196e-310 0.00000000e+000 1.39067116e-309]
 [5.93760961e-038 2.14330648e+184 5.44909756e-090]
 [5.26237620e-037 1.38524351e-309 4.74303020e-322]]

array_zeros =
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

array_ones =
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```


### Combining arrays
Arrays can be stacked:
- vertically using `vstack()`: given two one-dimensional arrays of the same length, you get a new two-dimensional array with two rows
- horizontally using `hstack()`: given two one-dimensional arrays, which may have different lengths, you get a new one-dimensional array

```python
# Same length
print("Same length:")
# Creating arrays
array_s01 = np.array([1, 2, 3])
array_s02 = np.array([4, 5, 6])

# Stacking
stack_sv = np.vstack([array_s01, array_s02])
stack_sh = np.hstack([array_s01, array_s02])

# Printing results
print(f"Vertical stack\n{stack_sv}")
print(f"\nHorizontal stack\n{stack_sh}")

# Different length
print("\nDifferent length:")
# Creating arrays
array_d01 = np.array([1, 2, 3])
array_d02 = np.array([4, 5, 6, 7])

# Stacking: vstack() raises a ValueError when the lengths differ
print("Vertical stack")
try:
    stack_dv = np.vstack([array_d01, array_d02])
    print(stack_dv)
except ValueError as e:
    print(f"ValueError: {e}")
stack_dh = np.hstack([array_d01, array_d02])

# Printing results
print(f"\nHorizontal stack\n{stack_dh}")
```

```
Same length:
Vertical stack
[[1 2 3]
 [4 5 6]]

Horizontal stack
[1 2 3 4 5 6]

Different length:
Vertical stack
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4

Horizontal stack
[1 2 3 4 5 6 7]
```
