# Basics of linear algebra for machine learning - Jason Brownlee

## 01 - Introduction to linear algebra
### Linear algebra
Linear algebra is about linear combinations: using arithmetic on columns of numbers (vectors) and arrays of numbers (matrices) to create new columns and arrays of numbers. It was formalized in the 1800s to find unknowns in systems of linear equations.

A linear equation is a series of terms and mathematical operations where some terms are unknown, for example:  
$y = 4 x + 1$

They are called linear equations because they describe a line on a two-dimensional graph. We can line up a system of equations with two or more unknowns:  
- $y = 0.1 x_{1} + 0.4 x_{2}$  
- $y = 0.3 x_{1} + 0.9 x_{2}$  
- $y = 0.2 x_{1} + 0.3 x_{2}$

where
- the column of $y$ values is a column vector of outputs from the equation
- the two columns of float values are the data columns $a_{1}$ and $a_{2}$ forming the matrix $A$
- the two unknown values $x_{1}$ and $x_{2}$ are the coefficients of the equation and form a vector of unknowns $b$ to be solved

summarized in linear algebra as  
$y = A \cdot b$

Such problems are challenging to solve because:
- there are usually more equations than unknowns (three equations for two unknowns in the example above)
- no single line can satisfy all of the equations without error

Interesting problems are often described by systems with an infinite number of solutions. This is the core of linear algebra as it relates to machine learning. The rest of the operations are about making such problems easier to understand and solve.
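
The system above can be sketched in NumPy. This is a minimal illustration rather than part of the original notes: the values in `y` are hypothetical (the text gives none), and `np.linalg.lstsq` is used to find the coefficients `b` that minimize the squared error across the three equations.

```python
import numpy as np

# Data columns a1 and a2 taken from the three equations above
A = np.array([[0.1, 0.4],
              [0.3, 0.9],
              [0.2, 0.3]])

# Hypothetical outputs: the original text does not give values for y
y = np.array([0.5, 1.2, 0.6])

# Least squares estimate of the unknowns b in y = A . b
b, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

print(f"b = {b}")
print(f"A . b = {A @ b}")
print(f"Error per equation = {y - A @ b}")
```

The error per equation is generally not zero: the solver returns the coefficients that make the overall squared error as small as possible.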

### Numerical linear algebra
Vector and matrix operations were initially implemented in FORTRAN with libraries such as:
- LAPACK
- BLAS
- ATLAS

Popular packages used nowadays, in Python for example, build on top of these libraries.

### Linear algebra and statistics
- using vector and matrix notation (multivariate statistics)
- solving least squares and weighted least squares (linear regression)
- estimating means and variance of data matrices
- using the covariance matrix (multivariate Gaussian distributions)
- leveraging the concepts above for data reduction with principal component analysis

### Applications of linear algebra
- matrices in engineering (line of springs)
- graphs and networks (graph analysis)
- Markov matrices, population, economics (population growth)
- linear programming (simplex optimization method)
- Fourier series - linear algebra for functions (signal processing)
- linear algebra for statistics and probabilities (least squares for regression)
- computer graphics (translation, rescaling, rotation of images)

## 02 - Linear algebra and machine learning
Linear algebra is the mathematics of data. Often recommended as a prerequisite to machine learning, it can make more sense to first build context of the applied machine learning process.

### Reasons not to learn linear algebra
- it's not required in order to use machine learning as a tool to solve problems
- it's slow and might delay you achieving your goals
- it's a huge field and not all of it is relevant to machine learning

A breadth-first (results-first) approach can help build a skeleton and some context on which to build to deepen knowledge about how algorithms work or the math that underlies them.

### Linear algebra notation
You need to know how to read and write vector and matrix notation. It enables you to:
- describe operations on data precisely
- read descriptions of algorithms in textbooks
- implement machine learning algorithms faster and more efficiently
- interpret and implement new methods in research papers
- describe your own methods to other practitioners

### Linear algebra arithmetic
You need to know how to perform arithmetic operations: add, subtract and multiply scalars, vectors and matrices. Matrix multiplication and tensor multiplication are often non-intuitive at first. Understanding vector and matrix operations is required to effectively read and write matrix notation.

### Learn linear algebra for statistics
Linear algebra is heavily used in multivariate statistics. To read and interpret statistics, you need to know the notation and operations of linear algebra, such as vectors used for means and variance, or covariance matrices describing the relationships between multiple Gaussian variables. Principal component analysis also leverages such methods.
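
As a small sketch of these statistics in NumPy, the example below computes a mean vector, a variance vector and a covariance matrix. The data matrix is hypothetical and only serves as an illustration.

```python
import numpy as np

# Hypothetical data matrix: 4 observations (rows), 2 variables (columns)
data = np.array([[1.0, 2.0],
                 [2.0, 4.1],
                 [3.0, 5.9],
                 [4.0, 8.0]])

# One mean and one sample variance per column
mean_vector = data.mean(axis=0)
variance_vector = data.var(axis=0, ddof=1)

# Covariance matrix between the columns (rowvar=False: columns are variables)
covariance = np.cov(data, rowvar=False)

print(f"Means: {mean_vector}")
print(f"Variances: {variance_vector}")
print(f"Covariance matrix:\n{covariance}")
```

The diagonal of the covariance matrix holds the variance of each column, and the off-diagonal values describe how the two columns vary together.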

### Learn matrix factorization
Matrix factorization is also called matrix decomposition. You need to know how to factorize a matrix and what it means. Matrix factorization is necessary for more complex operations in linear algebra (matrix inverse) and machine learning (least squares). Different matrix factorizations exist, such as singular-value decomposition. To read and interpret higher-order matrix operations, matrix factorization is required.

### Learn linear least squares
Matrix factorization can be used to solve linear least squares. Problems where there is no line able to fit the data without error can be solved using the least squares method, called linear least squares in linear algebra. Linear least squares is used in regression models and in a range of machine learning algorithms.
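
One possible way to see the link between factorization and least squares is the QR decomposition, sketched below. The choice of QR (rather than another factorization) and the data are assumptions made for illustration; the result is compared with NumPy's `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data: 4 observations, an intercept column and one input column
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Factorize A into Q (orthonormal columns) and R (upper triangular)
Q, R = np.linalg.qr(A)

# The least squares coefficients solve R . b = Q^T . y
b = np.linalg.solve(R, Q.T @ y)

# Reference solution from NumPy's least squares solver
b_reference, *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"Coefficients via QR: {b}")
print(f"Coefficients via lstsq: {b_reference}")
```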

### One more reason
Seeing how the operations work on real data will help you develop a strong intuition for the methods. You will experience knowledge buzz and mind-expanding moments.

## 03 - Examples of linear algebra in machine learning
Linear algebra is concerned with vectors, matrices and linear transforms. It is foundational to machine learning, from the notation used to describe how algorithms operate to their implementation in code. The relationship between linear algebra and machine learning is often left unexplained or abstract. Here are some examples of how linear algebra is leveraged in machine learning.

### Dataset and data files
Data is a matrix, which can be split into inputs (a matrix $X$) and outputs (a vector $y$). Each row has the same length (same number of columns): the data is vectorized and can be passed to a model one row at a time or in batches. The model can be pre-configured to expect rows of a fixed width.

### Images and photographs
An image is a table structure with a width and height and one pixel value in each cell for black and white images or three pixel values (red, green and blue) for color images. Operations such as cropping, scaling and shearing are described using linear algebra notation and operations.
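
As a rough sketch, an image can be held in a NumPy array and manipulated with indexing. The image below is hypothetical (filled with consecutive integers instead of real pixel values), and the downscaling shown is a naive one that simply keeps every second pixel.

```python
import numpy as np

# Hypothetical color image: 4 pixels high, 6 pixels wide, 3 channels (RGB)
image = np.arange(4 * 6 * 3).reshape((4, 6, 3))

# Cropping: keep rows 1-2 and columns 2-4, with all three channels
cropped = image[1:3, 2:5, :]

# Naive downscaling: keep every second row and every second column
downscaled = image[::2, ::2, :]

print(f"Original shape: {image.shape}")
print(f"Cropped shape: {cropped.shape}")
print(f"Downscaled shape: {downscaled.shape}")
```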

### One hot encoding
Categorical data can be one hot encoded so they are easier to work with and learn from by some machine learning techniques. One column is created for each category and a row for each example (e.g. if the categories are red, green and blue, we create a red column, a green column and a blue column). For each row in the dataset, we enter 1 in the column corresponding to the category and 0 in the others. Each row is encoded as a binary vector (0 or 1), which is an example of sparse representation.
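
The encoding described above can be sketched with NumPy as follows. The list of observations is hypothetical, and indexing the identity matrix is only one of several ways to build the binary vectors.

```python
import numpy as np

categories = ["red", "green", "blue"]
observations = ["red", "green", "blue", "green"]  # hypothetical data column

# Map each category to its column position
category_index = {category: i for i, category in enumerate(categories)}
indexes = [category_index[value] for value in observations]

# Row i of the identity matrix is the binary vector for category i
one_hot = np.eye(len(categories), dtype=int)[indexes]

print(one_hot)
```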

### Linear regression
Linear regression is used to describe the relationship between variables. Solving the linear regression problem means finding a set of coefficients that gives the best prediction of the output variable when multiplied by each of the input variables and added together. It is usually solved using least squares optimization leveraging matrix factorization such as LU decomposition or singular-value decomposition.

It can be summarized using linear algebra notation:  
$y = A \cdot b$

where
- $y$ is the output variable
- $A$ is the dataset
- $b$ are the model coefficients

### Regularization
Simpler models often have smaller coefficient values. Regularization is leveraged to encourage a model to minimize the size of coefficients. Common implementations are the $L^{1}$ and $L^{2}$ forms. Both are a measure of the length of the coefficients as a vector, and leverage the vector norm.
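
The two vector norms mentioned above can be computed with `np.linalg.norm`. The coefficient vector in this sketch is hypothetical.

```python
import numpy as np

# Hypothetical vector of model coefficients
b = np.array([3.0, -4.0])

# L1 norm: sum of absolute values, |3| + |-4|
l1_norm = np.linalg.norm(b, ord=1)

# L2 norm: square root of the sum of squares, sqrt(9 + 16)
l2_norm = np.linalg.norm(b, ord=2)

print(f"L1 norm: {l1_norm}")
print(f"L2 norm: {l2_norm}")
```

A regularized model adds a penalty based on one of these norms to its error, so that larger coefficients cost more.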

### Principal component analysis
Modeling data with many features is challenging. Principal component analysis is a dimensionality reduction method used to create projections of high-dimensional data for visualization and training models. It uses a matrix factorization method; more robust implementations leverage eigendecomposition and singular-value decomposition.
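
A minimal sketch of principal component analysis is shown below, assuming the eigendecomposition route: center the data, compute the covariance matrix, and project onto the eigenvectors with the largest eigenvalues. The dataset is randomly generated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 100 observations, 3 features
X = rng.normal(size=(100, 3))

# Center each column, then compute the covariance matrix of the features
X_centered = X - X.mean(axis=0)
covariance = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# Keep the two directions with the largest variance
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order[:2]]

# Project the data onto the two principal components
X_projected = X_centered @ components

print(f"Original shape: {X.shape}")
print(f"Projected shape: {X_projected.shape}")
```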

### Singular-value decomposition
Singular-value decomposition is a dimensionality reduction method with applications in feature selection, visualization and noise reduction.
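
As an illustration, `np.linalg.svd` factorizes a matrix into three parts, and keeping only the largest singular values gives a reduced representation. The matrix below and the choice of `k = 2` are assumptions made for the example.

```python
import numpy as np

# Hypothetical data matrix: 4 observations, 3 features
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])

# Factorize A = U . diag(s) . Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three parts back together recovers A
A_rebuilt = U @ np.diag(s) @ Vt

# Keep only the k largest singular values
k = 2
A_reduced = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reduced representation of the observations: k columns instead of 3
features_reduced = U[:, :k] * s[:k]

print(f"Singular values: {s}")
print(f"Reduced features shape: {features_reduced.shape}")
```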

### Latent semantic analysis
Latent semantic analysis, also called latent semantic indexing, is a natural language processing method applied to document-term matrices (sparse representations of a text). It distills the representation down to its most relevant essence using matrix factorization methods such as singular-value decomposition.

### Recommender systems
The similarity between sparse customer behavior vectors is calculated using distance measures (e.g. Euclidean distance) or dot products. Matrix factorization methods such as singular-value decomposition are used to distill user data to their essence for querying, searching and comparison.
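
The sketch below compares hypothetical customer vectors (1 means the customer bought the item). The helper functions `euclidean_distance` and `cosine_similarity` are illustrative names, and cosine similarity is included as a normalized variant of the dot product.

```python
import numpy as np

# Hypothetical purchase vectors over six items
customer_a = np.array([1, 0, 0, 1, 1, 0])
customer_b = np.array([1, 0, 0, 1, 0, 0])
customer_c = np.array([0, 1, 1, 0, 0, 1])

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Length of the difference between two vectors."""
    return float(np.linalg.norm(u - v))

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Dot product divided by the product of the vector lengths."""
    return float((u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(f"Distance a-b: {euclidean_distance(customer_a, customer_b)}")
print(f"Distance a-c: {euclidean_distance(customer_a, customer_c)}")
print(f"Dot product a.b: {customer_a @ customer_b}")
print(f"Dot product a.c: {customer_a @ customer_c}")
print(f"Cosine similarity a-b: {cosine_similarity(customer_a, customer_b)}")
```

Customers `a` and `b` share two purchases, so they are closer to each other than `a` and `c`, who share none.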

### Deep learning
Artificial neural networks are nonlinear machine learning algorithms inspired by the way our brain processes information and have proved effective at a range of problems such as machine translation, photo captioning or speech recognition. Their execution leverages linear algebra structures (vectors, matrices and tensors of inputs and coefficients) multiplied and added together.
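
To make the "multiplied and added together" part concrete, here is a sketch of a single fully connected layer. The sizes, the random values and the ReLU activation are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of inputs: 4 samples with 3 features each
inputs = rng.normal(size=(4, 3))

# Hypothetical coefficients: 3 inputs connected to 2 units, plus one bias per unit
weights = rng.normal(size=(3, 2))
bias = np.zeros(2)

# Multiply and add: one matrix product and one (broadcast) vector addition
linear_output = inputs @ weights + bias

# The nonlinearity (here ReLU) is applied element by element
activations = np.maximum(linear_output, 0.0)

print(f"Output shape: {activations.shape}")
```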

## 04 - Introduction to NumPy arrays
### NumPy n-dimensional array
NumPy is the preferred Python tool for linear algebra operations:
- the main structure is the `ndarray`, short for n-dimensional array
- data in an `ndarray` is referred to as an array
- data in an `ndarray` must be of the same type
- the type of an `ndarray` can be retrieved using the `dtype` attribute of the array
- the shape (length of each dimension) of an `ndarray` can be retrieved using the `shape` attribute of the array
- the function `array()` is used to create an `ndarray`

In [2]:
import numpy as np
#from collections.abc import Callable
from typing import Callable, Union

# Create arrays of integer, float and mixed types
array_int = np.array([1, 2, 3])
array_float = np.array([1.09, 2.87, 3.654])
array_mixed = np.array([1, 2.5, 3])

# Print arrays
print(f"array_int = {array_int}")
print(f"array_float = {array_float}")
print(f"array_mixed = {array_mixed}")
    
# Get the type of all arrays
print(f"\nType of array_int: {array_int.dtype}")
print(f"Type of array_float: {array_float.dtype}")
print(f"""Type of array_mixed: {array_mixed.dtype}
==> <array_mixed> was passed an array of mixed data types (integers and floats)
and NumPy forced all the `ndarray` to a float dtype""")

# Get the shape of all arrays
print(f"\nShape of array_int: {array_int.shape}")
print(f"Shape of array_float: {array_float.shape}")
print(f"Shape of array_mixed: {array_mixed.shape}")

array_int = [1 2 3]
array_float = [1.09  2.87  3.654]
array_mixed = [1.  2.5 3. ]

Type of array_int: int64
Type of array_float: float64
Type of array_mixed: float64
==> <array_mixed> was passed an array of mixed data types (integers and floats)
and NumPy forced all the `ndarray` to a float dtype

Shape of array_int: (3,)
Shape of array_float: (3,)
Shape of array_mixed: (3,)


### Functions to create arrays
- `empty()` creates an array of the specified shape without initializing its values (NDH: the values may look random, but they are whatever was already in memory)
- `zeros()` creates an array of zeros of the specified shape
- `ones()` creates an array of ones of the specified shape

In [2]:
# Create arrays
array_empty = np.empty([3,3])
array_zeros = np.zeros([3,5])
array_ones = np.ones([3,5])

# Print arrays
print(f"array_empty =\n{array_empty}")
print(f"\narray_zeros =\n{array_zeros}")
print(f"\narray_ones =\n{array_ones}")

array_empty =
[[9.27120335e-310 0.00000000e+000 1.72963877e-309]
 [2.14263370e+160 5.92936410e-038 4.71865494e-090]
 [7.11289854e-038 3.09570359e-259 3.95252517e-322]]

array_zeros =
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

array_ones =
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


In [3]:
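# From scratch
# Create arrays with values generated by a function
import random
import time
# Note: this seeds Python's `random` module only; NumPy's `np.random` functions
# keep their own, separate generator state
random.seed(time.process_time())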

def create_array_with_func(m: int, n: int, function: Callable[..., Union[int, float]], *function_arguments) -> list:
    """Creates a matrix of size m x n with values generated by the function passed

    Args:
        m (int): the number of rows of the created matrix
        n (int): the number of columns of the created matrix
        function (function): the function to apply to generate the number
        *function_arguments: positional arguments forwarded to `function` on each call

    Returns:
        list: a matrix of size m x n with values generated by the function passed
    """
    array = []
    for i in range(0, m):
        array.append([])
        for j in range(0, n):
            array[i].append(function(*function_arguments))
    return array

In [4]:
# Mimic empty() from scratch: random values stand in for uninitialized memory
np.array(create_array_with_func(3, 3, np.random.uniform, 0, 7))

array([[4.8523103 , 6.8933665 , 3.42736435],
       [6.84691407, 6.57227902, 1.99361783],
       [4.9086816 , 5.99988212, 2.03652842]])

In [5]:
# Implement zeros() from scratch
np.array(create_array_with_func(3, 5, lambda x: 0., None))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [6]:
# Implement ones() from scratch
np.array(create_array_with_func(3, 5, lambda x: 1., None))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

### Combining arrays
Arrays can be stacked:
- vertically using `vstack()`: given two one-dimensional arrays of the same length, you get a new two-dimensional array with two rows
- horizontally using `hstack()`: given two one-dimensional arrays, possibly of different lengths, you get a new one-dimensional array

In [7]:
# Same length
print("Same length:")
# Creating arrays
array_s01 = np.array([1, 2, 3])
array_s02 = np.array([4, 5, 6])

# Stacking
stack_sv = np.vstack([array_s01, array_s02])
stack_sh = np.hstack([array_s01, array_s02])

# Printing results
print(f"Vertical stack\n{stack_sv}")
print(f"\nHorizontal stack\n{stack_sh}")

# Different length
print("\nDifferent length:")
# Creating arrays
array_d01 = np.array([1, 2, 3])
array_d02 = np.array([4, 5, 6, 7])

# Stacking
print(f"Vertical stack")
try:
    stack_dv = np.vstack([array_d01, array_d02])
    print(stack_dv)
except ValueError as e:
    print(f"ValueError: {e}")
stack_dh = np.hstack([array_d01, array_d02])

# Printing results
print(f"\nHorizontal stack\n{stack_dh}")

Same length:
Vertical stack
[[1 2 3]
 [4 5 6]]

Horizontal stack
[1 2 3 4 5 6]

Different length:
Vertical stack
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 4

Horizontal stack
[1 2 3 4 5 6 7]


In [8]:
# From scratch
def stack_arrays(stack_type: str, arrays) -> list:
    output = []
    it = iter(arrays)
    length = len(next(it))
    
    if stack_type == "h":
        for array in arrays:
            for item in array:
                output.append(item)
    elif stack_type == "v":
        # Check if no array has a different dimension than the first one
        if any(len(l) != length for l in it):
            return("Can't vertically stack arrays of different dimensions.")
        else:
            for array in arrays:
                output.append(array)
            
    return np.array(output)

In [9]:
# Same length
print("Same length:")
# Creating arrays
array_s01 = [1, 2, 3]
array_s02 = [4, 5, 6]

# Stacking
stack_sv = stack_arrays("v", [array_s01, array_s02])
stack_sh = stack_arrays("h", [array_s01, array_s02])

# Printing results
print(f"Vertical stack\n{stack_sv}")
print(f"\nHorizontal stack\n{stack_sh}")

# Different length
print("\nDifferent length:")
# Creating arrays
array_d01 = np.array([1, 2, 3])
array_d02 = np.array([4, 5, 6, 7])

stack_dv = stack_arrays("v", [array_d01, array_d02])
stack_dh = stack_arrays("h", [array_d01, array_d02])

print(f"Vertical stack\n{stack_dv}")
print(f"\nHorizontal stack\n{stack_dh}")

Same length:
Vertical stack
[[1 2 3]
 [4 5 6]]

Horizontal stack
[1 2 3 4 5 6]

Different length:
Vertical stack
Can't vertically stack arrays of different dimensions.

Horizontal stack
[1 2 3 4 5 6 7]


## 05 - Index, slice and reshape NumPy arrays
Machine learning data is represented as arrays - and in Python, almost always as NumPy arrays.

### From list to arrays
#### One-dimensional list to array
- the `array()` function can convert a one-dimensional Python list to a NumPy array

In [10]:
# Create Python list
python_list = [11, 22, 33, 44, 55]

# Create NumPy array
numpy_array = np.array(python_list)

# Print results
print(f"NumPy array:\n{numpy_array}")
print(f"NumPy array type: {type(numpy_array)}")
print(f"NumPy array data type: {numpy_array.dtype}")
print(f"Shape: {numpy_array.shape}")

NumPy array:
[11 22 33 44 55]
NumPy array type: <class 'numpy.ndarray'>
NumPy array data type: int64
Shape: (5,)


#### Two-dimensional list to array
Two-dimensional data is more likely in machine learning: it corresponds to a table where rows represent observations and columns represent features.

You can convert a list of lists where each list is a new observation to a NumPy array using the `array()` function as well.

In [11]:
# Create Python list
python_list = [[11, 22],
               [33, 44],
               [55, 66]]

# Create NumPy array
numpy_array = np.array(python_list)

# Print results
print(f"NumPy array:\n{numpy_array}")
print(f"NumPy array type: {type(numpy_array)}")
print(f"NumPy array data type: {numpy_array.dtype}")
print(f"Shape: {numpy_array.shape}")

NumPy array:
[[11 22]
 [33 44]
 [55 66]]
NumPy array type: <class 'numpy.ndarray'>
NumPy array data type: int64
Shape: (3, 2)


### Array indexing
You can access data in a NumPy array using indexing.

#### One-dimensional indexing
Indexing works like in Python, using:
- the square bracket operators (`[]`)
- a zero-offset index for the value to retrieve
- negative indexes to retrieve values offset from the end of the array

In [12]:
# Create NumPy array
python_list = [11, 22, 33, 44, 55]
numpy_array = np.array(python_list)

# Get data at 1st and 5th position
print(f"Value at 1st position (index 0): {numpy_array[0]}")
print(f"Value at 5th position (index 4): {numpy_array[4]}")
print(f"Value at 4th position (index -2): {numpy_array[-2]}")

Value at 1st position (index 0): 11
Value at 5th position (index 4): 55
Value at 4th position (index -2): 44


In [13]:
# From scratch
python_list = [11, 22, 33, 44, 55]

# Get data at 1st and 5th position
print(f"Value at 1st position (index 0): {python_list[0]}")
print(f"Value at 5th position (index 4): {python_list[4]}")
print(f"Value at 4th position (index -2): {python_list[-2]}")

Value at 1st position (index 0): 11
Value at 5th position (index 4): 55
Value at 4th position (index -2): 44


#### Two-dimensional indexing
- indexing two-dimensional data is similar to indexing one-dimensional data, except that a comma is used to separate the index for each dimension
- the column index value can be left blank to select all columns for the given row

In [14]:
# Create Python list
python_list = [[11, 22],
               [33, 44],
               [55, 66]]

# Create NumPy array
numpy_array = np.array(python_list)

# Get data in first row, second column; third row, first column; and the whole second row
print(f"Value at 1st row, 2nd column ([0, 1]): {numpy_array[0, 1]}")
print(f"Value at 3rd row, 1st column ([2, 0]): {numpy_array[2, 0]}")
print(f"All values of 2nd row ([1,]): {numpy_array[1,]}")

Value at 1st row, 2nd column ([0, 1]): 22
Value at 3rd row, 1st column ([2, 0]): 55
All values of 2nd row ([1,]): [33 44]


In [15]:
# From scratch
def get_value_from_index(indexes: list, arrays: list):
    if indexes[0] == ",":
        return arrays
    elif len(indexes) == 1:
        return arrays[indexes[0]]
    else:
        return get_value_from_index(indexes[1:], arrays[indexes[0]])
        

In [16]:
# Two dimensions
python_list = [[11, 22],
               [33, 44],
               [55, 66]]
print(f"Value at 1st row, 2nd column ([0, 1]): {get_value_from_index([0, 1], python_list)}")
print(f"Value at 3rd row, 1st column ([2, 0]): {get_value_from_index([2, 0], python_list)}")
print(f"All values of 2nd row ([1,]): {get_value_from_index([1, ','], python_list)}")

# n dimensions
python_list = [[[11, 12], [22, 34]],
               [[33, 13], [44, 35]],
               [[55, 14], [66, 36]]]

print(f"Value at 1st 1d, 2nd 2d, 2nd 3d ([0, 1, 1]): {get_value_from_index([0, 1, 1], python_list)}")
print(f"Value at 3rd 1d ([2,]): {get_value_from_index([2, ','], python_list)}")

Value at 1st row, 2nd column ([0, 1]): 22
Value at 3rd row, 1st column ([2, 0]): 55
All values of 2nd row ([1,]): [33, 44]
Value at 1st 1d, 2nd 2d, 2nd 3d ([0, 1, 1]): 34
Value at 3rd 1d ([2,]): [[55, 14], [66, 36]]


### Array slicing
The slice extends from the *from* index and ends one item before the *to* index: `data[from:to]`.
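
A short sketch of the `data[from:to]` rule on a one-dimensional array (the variable names are illustrative):

```python
import numpy as np

data = np.array([11, 22, 33, 44, 55])

# From index 1 up to, but not including, index 4
middle = data[1:4]

# Omitting `from` starts at the beginning of the array
first_two = data[:2]

# A negative `from` counts from the end; omitting `to` runs to the end
last_two = data[-2:]

print(middle)     # [22 33 44]
print(first_two)  # [11 22]
print(last_two)   # [44 55]
```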

#### One-dimensional slicing
All data in an array dimension can be selected by specifying the slice with no indexes: `:`.

In [17]:
# Create NumPy array
python_list = [11, 22, 33, 44, 55]
numpy_array = np.array(python_list)

# Get all data using a slice with no indexes
print(f"Selecting all data with `:`:\n{numpy_array[:]}")

Selecting all data with `:`:
[11 22 33 44 55]


In [18]:
# From scratch
python_list = [11, 22, 33, 44, 55]
print(f"Selecting all data with `:`:\n{np.array(python_list[:])}")

Selecting all data with `:`:
[11 22 33 44 55]


#### Two-dimensional slicing
It is common to split your data into input variables $X$ and output variable $y$.


This can be done using slicing:
- we can select all rows and all columns except the last one by specifying `:` in the rows index and `:-1` in the columns index
- we can select all rows and the last column by specifying `:` in the rows index and `-1` in the columns index.

In [19]:
# Create Python list
python_list = [[11, 22, 33],
               [44, 55, 66],
               [77, 88, 99]]

# Create NumPy array
numpy_array = np.array(python_list)

# Get inputs
inputs_X = numpy_array[:, :-1]
output_y = numpy_array[:, -1]

# Print results
print(f"Inputs X:\n{inputs_X}")
print(f"Outputs y:\n{output_y}")

Inputs X:
[[11 22]
 [44 55]
 [77 88]]
Outputs y:
[33 66 99]


In [20]:
# From scratch
# Getting slices to slice each array
def get_slice(array, index):
    if index.startswith(":"):
        if len(index)==1:
            index_start = 0
            index_end = len(array)
        else:
            index_start = 0
            index_end = int(index[1:])
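    elif index.endswith(":"):
        index_start = int(index[:-1])
        # An open-ended slice runs to the end of the array
        index_end = len(array)
    elif ":" in index:
        index_start, index_end = [int(i) for i in index.split(":")]
    elif "-" in index:
        # Negative index: convert it to the equivalent positive position
        index_start = len(array) + int(index)
        index_end = index_start + 1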
    else:
        index_start = int(index)
        index_end = int(index) + 1
    return slice(index_start, index_end, 1)

# Recursively applying one slice per dimension
# Tested on 2D and 3D arrays
def slice_array(output, arrays, *indexes):
    if len(indexes) >= 2:
        for array in arrays[get_slice(arrays, indexes[0])]:        
            slice_array(output, array[get_slice(array, indexes[1])], *indexes[2:])
    elif type(arrays[0]) == list:
        if len(arrays) > 1:
            output.append(arrays[0][get_slice(arrays[0], indexes[0])])
            print(f"Added {arrays[0][get_slice(arrays[0], indexes[0])]}")
        else:
            output.append(arrays[0][get_slice(arrays[0], indexes[0])][0])
            print(f"Added {arrays[0][get_slice(arrays[0], indexes[0])][0]}")

    elif type(arrays[0]) == int:
        if len(arrays) > 1:
            output.append(arrays)
            print(f"Added {arrays}")
        else:
            output.append(arrays[0])
            print(f"Added {arrays[0]}")
            
    return output
    

# Testing for 2D arrays
python_list = [[11, 22, 33],
               [44, 55, 66],
               [77, 88, 99]]

inputs_X = np.array(slice_array([], python_list, ":", ":-1", "1"))
output_y = np.array(slice_array([], python_list, ":", "-1", "1"))

# Print results
print(f"Inputs X:\n{inputs_X}")
print(f"Outputs y:\n{output_y}")


# Testing for 3D arrays
python_list = [[[11, 12], [22, 34]],
               [[33, 13], [44, 35]],
               [[55, 14], [66, 36]]]

inputs_X = np.array(slice_array([], python_list, ":", ":-1", "1"))
output_y = np.array(slice_array([], python_list, ":", "-1", "1"))

# Print results
print(f"Inputs X:\n{np.array(inputs_X)}")
print(f"Outputs y:\n{np.array(output_y)}")

Added [11, 22]
Added [44, 55]
Added [77, 88]
Added 33
Added 66
Added 99
Inputs X:
[[11 22]
 [44 55]
 [77 88]]
Outputs y:
[33 66 99]
Added 12
Added 13
Added 14
Added 34
Added 35
Added 36
Inputs X:
[12 13 14]
Outputs y:
[34 35 36]


#### Split train and test rows
It is common to split a loaded dataset into separate training and testing sets.

In NumPy, this can be done with slicing:
- you can select the training dataset by slicing the rows from the beginning up to the split point and specifying `:` in the second dimension index to keep all columns: `train = data[:split, :]`
- you can select the testing dataset by slicing the rows from the split point to the end and specifying `:` in the second dimension index to keep all columns: `test = data[split:, :]`

In [21]:
# Create Python list
python_list = [[11, 22, 33],
               [44, 55, 66],
               [77, 88, 99]]

# Create NumPy array
numpy_array = np.array(python_list)

# Split into train and test sets
split = 2
train = numpy_array[:split, :]
test = numpy_array[split:, :]

# Print results
print(f"Train:\n{train}")
print(f"Test:\n{test}")

Train:
[[11 22 33]
 [44 55 66]]
Test:
[[77 88 99]]


In [22]:
# From scratch
# Define split_train_test() function
def split_train_test(table: list, threshold: int):
    train = table[:threshold]
    test = table[threshold:]
    return train, test

# Create Python list
python_list = [[11, 22, 33],
               [44, 55, 66],
               [77, 88, 99]]

# Split into train and test sets
threshold = 2
train, test = split_train_test(python_list, threshold)

# Print results
print(f"Train:\n{np.array(train)}")
print(f"Test:\n{np.array(test)}")

Train:
[[11 22 33]
 [44 55 66]]
Test:
[[77 88 99]]


### Array reshaping
You may need to reshape your data. Some libraries like scikit-learn require that a one-dimensional array of output variables ($y$) be shaped as a two-dimensional array with one column and one outcome per row. Some algorithms like the long short-term memory recurrent neural network in Keras require inputs to be specified as a three-dimensional array representing samples, timesteps and features.

#### Data shape
The `shape` attribute returns a tuple of the length of each dimension of the array.

NDH: Shapes are returned from the outer list to the inner lists.

In [23]:
# Create NumPy arrays
array_01 = np.array([1, 2, 3, 4, 5])
array_02 = np.array([[1, 2, 3, 4, 5],
                     [6, 7, 8, 9, 10]])
array_03 = np.array([[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
                     [[11, 12, 13, 14, 15], [16, 17, 18, 19, 20]],
                     [[21, 22, 23, 24, 25], [26, 27, 28, 29, 30]]])

# Print shapes
print(f"Shape of array_01:\n{array_01.shape}")
print(f"Shape of array_02:\n{array_02.shape}")
print(f"corresponding to:\nRows:{array_02.shape[0]}\nColumns:{array_02.shape[1]}")
print(f"Shape of array_03:\n{array_03.shape}")

Shape of array_01:
(5,)
Shape of array_02:
(2, 5)
corresponding to:
Rows:2
Columns:5
Shape of array_03:
(3, 2, 5)


#### Reshape 1D to 2D array
It is common to need to reshape a one-dimensional array into a two-dimensional array with one column and multiple rows:
- the `reshape()` function takes a single argument that specifies the new shape of the array
- this single argument is a tuple with the length of the array as the first dimension and 1 for the second dimension

In [24]:
# Create NumPy arrays
array_01 = np.array([1, 2, 3, 4, 5])

# Print original shape
print(f"Original shape: {array_01.shape}")

# Reshape NumPy array
array_01 = array_01.reshape((array_01.shape[0], 1))

# Print new shape
print(f"New shape: {array_01.shape}")

Original shape: (5,)
New shape: (5, 1)
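
As a side note, NumPy offers two shorter ways to get the same column shape, sketched below: passing `-1` to `reshape()` lets NumPy infer that dimension, and indexing with `np.newaxis` adds a dimension of size 1. The names `column_a` and `column_b` are illustrative.

```python
import numpy as np

array_01 = np.array([1, 2, 3, 4, 5])

# -1 asks NumPy to infer the number of rows from the array length
column_a = array_01.reshape((-1, 1))

# np.newaxis inserts a new dimension of size 1 at that position
column_b = array_01[:, np.newaxis]

print(f"Shape with reshape((-1, 1)): {column_a.shape}")
print(f"Shape with np.newaxis: {column_b.shape}")
```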


#### Reshape 2D to 3D array
It is common to need to reshape two-dimensional data where each row represents a sequence into a three-dimensional array for algorithms that expect multiple samples of one or more timesteps and one or more features.

In [25]:
# Create NumPy array
array_01 = np.array([[1, 2, 3, 4, 5],
                     [6, 7, 8, 9, 10]])

# Print original shape
print(f"Original shape: {array_01.shape}")

# Reshape NumPy array
array_01 = array_01.reshape((array_01.shape[0], array_01.shape[1], 1))

# Print new shape
print(f"New shape: {array_01.shape}")

Original shape: (2, 5)
New shape: (2, 5, 1)


## 06 - NumPy array broadcasting
Arrays with different sizes cannot be added, subtracted or generally used in arithmetic. NumPy overcomes this with array **broadcasting**: duplicating the smaller array so that it has the dimensionality and size of the larger array.

### Limitation with array arithmetic
You can perform array arithmetic such as addition and subtraction on NumPy arrays:
- two arrays can be added together
- values at each index are added together
- without broadcasting, arithmetic can only be performed on arrays that have the same number of dimensions, each with the same size

In [26]:
# Create arrays
array_01 = np.array([1, 2, 3])
array_02 = np.array([1, 2, 3])

# Print sum
print(array_01 + array_02)

[2 4 6]


In [27]:
# From scratch
# Create arrays
list_01 = [1, 2, 3]
list_02 = [1, 2, 3]

# Print sum
output = [list_01[i] + list_02[i] for i in range(len(list_01))]
print(np.array(output))

[2 4 6]


### Array broadcasting

Broadcasting allows array arithmetic between arrays with a different shape or size. The technique was developed for NumPy but has since been adopted by other libraries such as Theano, TensorFlow and Octave.

### Broadcasting in NumPy
#### Scalar and one-dimensional array
If we have a one-dimensional array and a scalar $b$, then $b$ is broadcast across the one-dimensional array by duplicating it as many times as needed to match the length of the array:

In [28]:
# Create array and scalar
array_01 = np.array([1, 2, 3])
scalar_01 = 3

# Print broadcast result
print(array_01 + scalar_01)

[4 5 6]


In [29]:
# From scratch
# Create array and scalar
list_01 = [1, 2, 3]
scalar_01 = 3

# Print broadcast result
output = []
for item in list_01:
    output.append(item + scalar_01)

print(np.array(output))

[4 5 6]


#### Scalar and two-dimensional array

If we have a two-dimensional array and a scalar $b$, then $b$ is broadcast across all dimensions of the two-dimensional array by duplicating it as many times as needed to match the shape of the array:

In [30]:
# Create array and scalar
array_01 = np.array([[1, 2, 3],
                     [1, 2, 3]])
scalar_01 = 3

# Print broadcast result
print(array_01 + scalar_01)

[[4 5 6]
 [4 5 6]]


In [31]:
# From scratch
# Create array and scalar
list_01 = [[1, 2, 3],
           [1, 2, 3]]

scalar_01 = 3

# Print broadcast result
output = []
for array in list_01:
    row = []
    for item in array:
        row.append(item + scalar_01)
    output.append(row)

print(np.array(output))

[[4 5 6]
 [4 5 6]]


#### One-dimensional and two-dimensional arrays
If we have a one-dimensional array and a two-dimensional array, then the one-dimensional array is broadcast across each row of the two-dimensional array by creating a copy for each row, resulting in a new two-dimensional array:

In [32]:
# Create arrays
array_01 = np.array([[1, 2, 3],
                     [1, 2, 3]])
array_02 = np.array([2, 4, 6])

# Print broadcast result
print(array_01 + array_02)

[[3 6 9]
 [3 6 9]]


In [33]:
# From scratch
# Create two-dimensional and one-dimensional lists
list_01 = [[1, 2, 3],
           [1, 2, 3]]
list_02 = [2, 4, 6]

# Print broadcast result
output = []
for array in list_01:
    row = []
    for i in range(len(array)):
        row.append(array[i] + list_02[i])
    output.append(row)

print(np.array(output))

[[3 6 9]
 [3 6 9]]
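
The duplication described above can be made explicit with `np.broadcast_to`, which returns the stretched array that broadcasting uses implicitly. This is only a small sketch for illustration; the name `stretched` is not part of the original notes.

```python
import numpy as np

# Two-dimensional array of shape (2, 3) and one-dimensional array of shape (3,)
array_01 = np.array([[1, 2, 3],
                     [1, 2, 3]])
array_02 = np.array([2, 4, 6])

# Explicitly stretch array_02 to the shape of array_01 (one copy per row)
stretched = np.broadcast_to(array_02, array_01.shape)
print(stretched)

# Adding the stretched array gives the same result as implicit broadcasting
print(np.array_equal(array_01 + stretched, array_01 + array_02))
```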


### Limitations of broadcasting
Broadcasting doesn't work in all cases. It can only be performed when the following rule is satisfied:
- the sizes of each pair of dimensions in the two arrays must be equal, or one of them must be 1
- the dimensions are compared in reverse order, starting with the trailing dimension (e.g. looking at columns before rows in a two-dimensional case)
- NumPy will in effect pad missing dimensions with a size of 1 when comparing arrays (in the example below, the shape of `array_02` is effectively interpreted by NumPy as `(1, 3)`)

In [34]:
# Create arrays
array_01 = np.array([[1, 2, 3],
                     [1, 2, 3]])
array_02 = np.array([2, 4, 6])

# Print arrays shapes
print(array_01.shape)
print(array_02.shape)

(2, 3)
(3,)
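
As a sketch of what happens when the rule is not satisfied, the example below tries to add a shape `(2,)` array to a shape `(2, 3)` array. The array `array_03` and the flag `broadcast_failed` are hypothetical names introduced for this illustration.

```python
import numpy as np

array_01 = np.array([[1, 2, 3],
                     [1, 2, 3]])   # shape (2, 3)
array_03 = np.array([10, 20])      # shape (2,)

# Trailing dimensions are 3 and 2: neither equal nor 1, so broadcasting fails
try:
    array_01 + array_03
    broadcast_failed = False
except ValueError as error:
    broadcast_failed = True
    print(f"Broadcasting failed: {error}")

# Reshaping to (2, 1) makes the trailing dimension 1, so broadcasting works
result = array_01 + array_03.reshape(2, 1)
print(result)
```

Reshaping to a column lets NumPy stretch `array_03` across the columns instead of the rows.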


# 07 - Vectors and vector arithmetic
Vectors are used in machine learning in the description of algorithms and processes (e.g. the target variable $y$).

## What is a vector
- a vector is a tuple of one or more values, i.e. a list of numbers
- vector algebra is the set of operations performed on the numbers in the list

**Math notation**:  
$v = \left(v_{1}, v_{2}, v_{3}\right)$  
or  
$v = 
\begin{pmatrix} 
v_{1} \\
v_{2} \\
v_{3}\end{pmatrix}$  
where $v_{1}, v_{2}, v_{3}$ are scalar, often real, values.

In machine learning, when training an algorithm, it is common to represent the target variable as a vector $y$. Vectors are usually introduced with a geometric analogy where a vector represents a point or a coordinate in an $n$-dimensional space, where $n$ is the number of dimensions. A vector can also be thought of as a line from the origin of the vector space with a direction and a magnitude.  

These are a good starting point, but machine learning applications often consider very high dimensional vectors, where some of these analogies fail to hold. The vector-as-coordinates analogy is the most compelling in machine learning.

## Defining a vector
A vector can be represented with a NumPy array.

In [36]:
# Creating a vector
v = np.array([1, 2, 3])
print(v)

[1 2 3]


In [37]:
# From scratch
# Creating a vector
v = [1, 2, 3]
print(np.array(v))

[1 2 3]


## Vector arithmetic
### Vector addition
- two vectors of equal length can be added together to create a new third vector
- the new vector has the same length as the other two vectors
- each element of the new vector is calculated as the addition of the elements of the other vectors at the same index

**Math notation:**  
$c = a + b$  
$c = \left(a_{1} + b_{1}, a_{2} + b_{2}, a_{3} + b_{3}\right)$  

In [38]:
# Creating vectors
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Print vector addition result
print(a + b)

[2 4 6]


In [41]:
# From scratch
# Creating vectors
a = [1, 2, 3]
b = [1, 2, 3]

# Print vector addition result
output = [a[i] + b[i] for i in range(len(a))]
print(np.array(output))

[2 4 6]


### Vector subtraction
- one vector can be subtracted from another vector with equal length to create a new third vector
- the new vector has the same length as the other two vectors
- each element of the new vector is calculated as the subtraction of the elements of the other vectors at the same index

**Math notation:**  
$c = a - b$  
$c = \left(a_{1} - b_{1}, a_{2} - b_{2}, a_{3} - b_{3}\right)$  

In [44]:
# Creating vectors
a = np.array([1, 2, 3])
b = np.array([0.5, 0.5, 0.5])

# Print vector subtraction result
print(a - b)

[0.5 1.5 2.5]


In [46]:
# From scratch
# Creating vectors
a = [1, 2, 3]
b = [0.5, 0.5, 0.5]

# Print vector subtraction result
output = [a[i] - b[i] for i in range(len(a))]
print(np.array(output))

[0.5 1.5 2.5]


### Vector multiplication
- two vectors of equal length can be multiplied together element-wise (this is different from the dot product described below)
- the new vector has the same length as the other two vectors
- each element of the new vector is calculated as the product of the elements of the other vectors at the same index

**Math notation:**  
$c = a \times b$  
$c = \left(a_{1} \times b_{1}, a_{2} \times b_{2}, a_{3} \times b_{3}\right)$  
$c = \left(a_{1}b_{1}, a_{2}b_{2}, a_{3}b_{3}\right)$  

In [48]:
# Creating vectors
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Print vector multiplication result
print(a * b)

[1 4 9]


In [50]:
# From scratch
# Creating vectors
a = [1, 2, 3]
b = [1, 2, 3]

# Print vector multiplication result
output = [a[i] * b[i] for i in range(len(a))]
print(np.array(output))

[1 4 9]


### Vector division
- two vectors of equal length can be divided
- the new vector has the same length as the other two vectors
- each element of the new vector is calculated as the division of the elements of the other vectors at the same index

**Math notation:**  
$c = \frac{a}{b}$  
$c = \left(\frac{a_{1}}{b_{1}}, \frac{a_{2}}{b_{2}}, \frac{a_{3}}{b_{3}}\right)$  

In [51]:
# Creating vectors
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Print vector division result
print(a / b)

[1. 1. 1.]


In [52]:
# From scratch
# Creating vectors
a = [1, 2, 3]
b = [1, 2, 3]

# Print vector division result
output = [a[i] / b[i] for i in range(len(a))]
print(np.array(output))

[1. 1. 1.]


## Vector dot product
- the dot product consists of calculating the sum of the multiplied elements of two vectors of the same length to obtain a scalar
- it's called the dot product because we use the $\cdot$ (dot) operator
- it's a key tool for calculating vector projections, decomposing vectors, and determining orthogonality
- it can be used in machine learning to calculate the weighted sum of a vector

**Math notation:**  
$c = a \cdot b$  
$c = \left(a_{1} \times b_{1} + a_{2} \times b_{2} + a_{3} \times b_{3}\right)$  
$c = \left(a_{1}b_{1} + a_{2}b_{2} + a_{3}b_{3}\right)$  

In [53]:
# Creating vectors
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# Print vector dot product result
print(a.dot(b))

14


In [60]:
# From scratch
# Creating vectors
a = [1, 2, 3]
b = [1, 2, 3]

# Print vector dot product result
output = 0
for i in range(len(a)):
    product = a[i] * b[i]
    output += product
print(np.array(output))

14
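
The two uses mentioned above, the weighted sum and the orthogonality check, can be sketched as follows. The feature and weight values are made up for the illustration.

```python
import numpy as np

# Weighted sum: hypothetical feature values and weights
features = np.array([2.0, 0.0, 1.0])
weights = np.array([0.5, 3.0, 2.0])
weighted_sum = features.dot(weights)   # 2*0.5 + 0*3 + 1*2
print(weighted_sum)

# Orthogonality check: perpendicular vectors have a dot product of zero
v = np.array([1, 0])
w = np.array([0, 1])
print(v.dot(w) == 0)
```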


### Vector-scalar multiplication
- a vector can be multiplied by a scalar
- this results in scaling the magnitude of the vector
- the new vector has the same length as the original vector
- the multiplication is performed on each element
- vector-scalar addition, subtraction and division are performed the same way

**Math notation:**  
$c = s \times v$  
$c = sv$  
$c = \left(s \times v_{1}, s \times v_{2}, s \times v_{3}\right)$  

In [64]:
# From scratch
# Creating vectors
a = [1, 2, 3]
s = 0.5

# Print vector-scalar multiplication result
output = []
for i in range(len(a)):
    product = a[i] * s
    output.append(product)
print(np.array(output))

[0.5 1.  1.5]
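
The same operation with NumPy relies on broadcasting the scalar across the vector. The sketch below also shows vector-scalar addition, subtraction and division, which work the same way; the names `scaled`, `shifted_up`, `shifted_down` and `divided` are illustrative.

```python
import numpy as np

# Creating vector and scalar
a = np.array([1, 2, 3])
s = 0.5

# The scalar is applied to each element of the vector
scaled = a * s
shifted_up = a + s
shifted_down = a - s
divided = a / s

print(scaled)
print(shifted_up)
print(shifted_down)
print(divided)
```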


## 08 - Vector norms

Calculating the length or magnitude of vectors can be required:
- as a regularization method in machine learning
- as part of broader vector or matrix operations

There are different ways to calculate vector lengths or magnitudes, called vector norms:
- the $L^{1}$ norm is calculated as the sum of the absolute values of the vector
- the $L^{2}$ norm is calculated as the square root of the sum of the squared values
- the max norm is calculated as the maximum absolute value among the vector's elements

### Vector norm
The length of a vector:
- is also called vector magnitude or vector norm
- is a non-negative number that describes the extent of the vector in space
- is always positive except for a vector with all zero values

### Vector $L^{1}$ norm
The $L^{1}$ norm:
- is also called taxicab norm or Manhattan norm
- is calculated as the sum of the absolute vector values
- is used in machine learning applications where it is important to discriminate between elements that are exactly 0 and elements that are small but non-zero
- is used in machine learning applications as a regularization method to keep coefficients small and the model less complex

#### Math notation
$L^{1}(v)=\|v\|_{1}$  
$\|v\|_{1} = |v_{1}| + |v_{2}| + |v_{3}|$

In [11]:
# With NumPy
from numpy.linalg import norm
## Creating vector
v = np.array([1, 2, 3])

## Calculating the L1 norm
l1_norm = norm(v, 1)
print(l1_norm)

6.0


In [12]:
# From scratch
## Creating vector
v = [1, 2, 3]

## Creating L1 norm
def calculate_l1_norm(v: list):
    """Calculates the L1 norm of a vector

    Args:
        v (list): the vector whose L1 norm needs to be extracted

    Returns:
        float: a zero or positive number, the value of the L1 norm of the vector
    """
    l1_norm = 0
    for i in v:
        l1_norm += abs(i)
    
    return float(l1_norm)

## Calculating the L1 norm
l1_norm = calculate_l1_norm(v)
print(l1_norm)

6.0


### Vector $L^{2}$ norm
The $L^{2}$ norm:
- is also called Euclidean norm
- is calculated as the square root of the sum of the squared vector values
- is used in machine learning applications as a regularization method to keep the coefficients of the model small and the model less complex
- is the most commonly used vector norm in machine learning

#### Math notation
$L^{2}(v)=\|v\|_{2}$  
$\|v\|_{2}=\sqrt{v^{2}_{1} + v^{2}_{2} + v^{2}_{3}}$

In [14]:
# With NumPy
from numpy.linalg import norm
## Creating vector
v = np.array([1, 2, 3])

## Calculating the L2 norm
l2_norm = norm(v, 2)
print(l2_norm)

3.7416573867739413


In [13]:
# From scratch
## Creating vector
v = [1, 2, 3]

## Creating L2 norm
def calculate_l2_norm(v: list):
    """Calculates the L2 norm of a vector

    Args:
        v (list): the vector whose L2 norm needs to be extracted

    Returns:
        float: a zero or positive number, the value of the L2 norm of the vector
    """
    l2_norm = 0
    for i in v:
        l2_norm += (i)**2
    
    return float(l2_norm**(1/2))

## Calculating the L2 norm
l2_norm = calculate_l2_norm(v)
print(l2_norm)

3.7416573867739413
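
Since the sum of the squared values of a vector is the dot product of the vector with itself, the $L^{2}$ norm can also be written as $\|v\|_{2} = \sqrt{v \cdot v}$. A minimal check of this relationship, with `l2_from_dot` as an illustrative name:

```python
import numpy as np
from numpy.linalg import norm

# Creating vector
v = np.array([1, 2, 3])

# L2 norm computed from the dot product of the vector with itself
l2_from_dot = np.sqrt(v.dot(v))

# Compare with NumPy's norm function
print(np.isclose(l2_from_dot, norm(v, 2)))
```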


### Vector max norm

The max norm:
- is calculated as the maximum absolute vector value
- is used in machine learning applications as a regularization method called max norm regularization, e.g. on weights in neural networks

#### Math notation
$L^{\infty}(v) = \|v\|_{\infty}$  
$\|v\|_{\infty} = \max(|v_{1}|, |v_{2}|, |v_{3}|)$

In [18]:
# With NumPy
from math import inf
from numpy.linalg import norm

## Creating vector
v = np.array([1, 2, 3])

## Calculating the Linf norm
max_norm = norm(v, inf)
print(max_norm)

3.0

In [20]:
# From scratch
## Creating vector
v = [1, 2, 3]

## Creating max norm
def calculate_max_norm(v: list):
    """Calculates the max norm of a vector

    Args:
        v (list): the vector whose max norm needs to be extracted

    Returns:
        float: a zero or positive number, the value of the max norm of the vector
    """
    max_norm = 0
    for i in v:
        # Compare absolute values so that negative elements are handled correctly
        if abs(i) > max_norm:
            max_norm = abs(i)
    
    return float(max_norm)

## Calculating the max norm
max_norm = calculate_max_norm(v)
print(max_norm)

3.0
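
Because the max norm uses absolute values, a negative element can determine the result. The sketch below uses a made-up vector containing a negative value and compares NumPy's result with a manual calculation.

```python
import numpy as np
from numpy.linalg import norm

# Creating a vector whose largest absolute value is negative
v = np.array([1, -5, 3])

# Max norm with NumPy
max_norm = norm(v, np.inf)
print(max_norm)

# Manual calculation: maximum of the absolute values
manual = max(abs(x) for x in v)
print(manual)
```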


## 09 - Matrices and matrix arithmetic
Matrices:
- are a foundational element of linear algebra
- are used in the descriptions of algorithms and processes in machine learning (e.g. the input variable $X$ when training a model)

### What is a matrix
- a matrix is a two-dimensional array, or table, of scalars with one or more columns and one or more rows
- a matrix is often represented with an uppercase letter (e.g. $A$)
- entries of a matrix are referred to by their two-dimensional subscript of row ($i$) and column ($j$), such as $a_{i, j}$
- the geometric analogy used to help understand vectors and some of their operations does not hold with matrices
- a vector can be considered as a matrix with one column and multiple rows


#### Math notation
A 3-row, 2-column matrix:  
$A = ((a_{1, 1}, a_{1, 2}), (a_{2, 1}, a_{2, 2}), (a_{3, 1}, a_{3, 2}))$  

Array notation:  
$A = \begin{pmatrix}
     a_{1, 1} & a_{1, 2} \\
     a_{2, 1} & a_{2, 2} \\
     a_{3, 1} & a_{3, 2} \\
\end{pmatrix}$  

Dimensions of a matrix:  
$m \times n$  
where:  
- $m$ is the number of rows
- $n$ is the number of columns

In [10]:
# With NumPy
## Create matrix
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A)

[[1 2 3]
 [4 5 6]]


In [14]:
# From scratch
## Create matrix
A = [[1, 2, 3],
     [4, 5, 6]]

for row in A:
    print(row)

[1, 2, 3]
[4, 5, 6]


### Matrix arithmetic
All operations in this section are performed element-wise:
- between two matrices of equal size
- resulting in a new matrix with the same size

#### Matrix addition
Two matrices with the same dimensions can be added together to create a new third matrix.

##### Math notation
$C = A + B$  

$C = \begin{pmatrix}
     a_{1, 1} + b_{1, 1} & a_{1, 2} + b_{1, 2} \\
     a_{2, 1} + b_{2, 1} & a_{2, 2} + b_{2, 2} \\
     a_{3, 1} + b_{3, 1} & a_{3, 2} + b_{3, 2} \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] + B[0, 0]$  
$C[1, 0] = A[1, 0] + B[1, 0]$  
$C[2, 0] = A[2, 0] + B[2, 0]$  
$C[0, 1] = A[0, 1] + B[0, 1]$  
$C[1, 1] = A[1, 1] + B[1, 1]$  
$C[2, 1] = A[2, 1] + B[2, 1]$  

In [21]:
# With NumPy
## Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A =\n{A}\n")
print(f"B =\n{B}\n")

## Add matrices
C = A + B
print(f"C =\n{C}\n")

A =
[[1 2 3]
 [4 5 6]]

B =
[[1 2 3]
 [4 5 6]]

C =
[[ 2  4  6]
 [ 8 10 12]]



In [98]:
# From scratch
## Create matrices
A = [[1, 2, 3],
     [4, 5, 6]]

B = [[1, 2, 3],
     [4, 5, 6]]

print("A = ")
for row in A:
    print(row)
    
print("\nB = ")
for row in B:
    print(row)
    
## Add matrices
def add_matrices(A, B):
    C = []
    for i in range(0, len(A)):
        C.append([])
        for j in range(0, len(A[0])):
            C_ij = A[i][j] + B[i][j]
            C[i].append(C_ij)
    return C

print("\nC = ")
C = add_matrices(A, B)
for row in C:
    print(row)

A = 
[1, 2, 3]
[4, 5, 6]

B = 
[1, 2, 3]
[4, 5, 6]

C = 
[2, 4, 6]
[8, 10, 12]


#### Matrix subtraction
One matrix can be subtracted from another matrix with the same dimensions. 

##### Math notation
$C = A - B$  

$C = \begin{pmatrix}
     a_{1, 1} - b_{1, 1} & a_{1, 2} - b_{1, 2} \\
     a_{2, 1} - b_{2, 1} & a_{2, 2} - b_{2, 2} \\
     a_{3, 1} - b_{3, 1} & a_{3, 2} - b_{3, 2} \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] - B[0, 0]$  
$C[1, 0] = A[1, 0] - B[1, 0]$  
$C[2, 0] = A[2, 0] - B[2, 0]$  
$C[0, 1] = A[0, 1] - B[0, 1]$  
$C[1, 1] = A[1, 1] - B[1, 1]$  
$C[2, 1] = A[2, 1] - B[2, 1]$  

In [27]:
# With NumPy
## Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A =\n{A}\n")
print(f"B =\n{B}\n")

## Subtract matrices
C = A - B
print(f"C =\n{C}\n")

A =
[[1 2 3]
 [4 5 6]]

B =
[[1 2 3]
 [4 5 6]]

C =
[[0 0 0]
 [0 0 0]]



In [96]:
# From scratch
## Create matrices
A = [[1, 2, 3],
     [4, 5, 6]]

B = [[1, 2, 3],
     [4, 5, 6]]

print("A = ")
for row in A:
    print(row)
    
print("\nB = ")
for row in B:
    print(row)
    
## Subtract matrices
def subtract_matrices(A, B):
    C = []
    for i in range(0, len(A)):
        C.append([])
        for j in range(0, len(A[0])):
            C_ij = A[i][j] - B[i][j]
            C[i].append(C_ij)
    return C

print("\nC = ")
C = subtract_matrices(A, B)
for row in C:
    print(row)

A = 
[1, 2, 3]
[4, 5, 6]

B = 
[1, 2, 3]
[4, 5, 6]

C = 
[0, 0, 0]
[0, 0, 0]


#### Matrix multiplication (Hadamard product)
- two matrices of the same size can be multiplied together
- this is called element-wise matrix multiplication or Hadamard product
- it isn't the typical operation referred to by matrix multiplication
- a different operator, $\circ$, is used

##### Math notation
$C = A \circ B$  

$C = \begin{pmatrix}
     a_{1, 1} \times b_{1, 1} & a_{1, 2} \times b_{1, 2} \\
     a_{2, 1} \times b_{2, 1} & a_{2, 2} \times b_{2, 2} \\
     a_{3, 1} \times b_{3, 1} & a_{3, 2} \times b_{3, 2} \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] \times B[0, 0]$  
$C[1, 0] = A[1, 0] \times B[1, 0]$  
$C[2, 0] = A[2, 0] \times B[2, 0]$  
$C[0, 1] = A[0, 1] \times B[0, 1]$  
$C[1, 1] = A[1, 1] \times B[1, 1]$  
$C[2, 1] = A[2, 1] \times B[2, 1]$  

In [31]:
# With NumPy
## Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A =\n{A}\n")
print(f"B =\n{B}\n")

## Multiply matrices element-wise
C = A * B
print(f"C =\n{C}\n")

A =
[[1 2 3]
 [4 5 6]]

B =
[[1 2 3]
 [4 5 6]]

C =
[[ 1  4  9]
 [16 25 36]]



In [99]:
# From scratch
## Create matrices
A = [[1, 2, 3],
     [4, 5, 6]]

B = [[1, 2, 3],
     [4, 5, 6]]

print("A = ")
for row in A:
    print(row)
    
print("\nB = ")
for row in B:
    print(row)
    
## Multiply matrices
def multiply_matrices(A, B):
    C = []
    for i in range(0, len(A)):
        C.append([])
        for j in range(0, len(A[0])):
            C_ij = A[i][j] * B[i][j]
            C[i].append(C_ij)
    return C

print("\nC = ")
C = multiply_matrices(A, B)
for row in C:
    print(row)

A = 
[1, 2, 3]
[4, 5, 6]

B = 
[1, 2, 3]
[4, 5, 6]

C = 
[1, 4, 9]
[16, 25, 36]


#### Matrix division
One matrix can be divided by another matrix with the same dimensions.

##### Math notation
$C = \frac{A}{B}$  

$C = \begin{pmatrix}
     \frac{a_{1, 1}}{b_{1, 1}} & \frac{a_{1, 2}}{b_{1, 2}} \\
     \frac{a_{2, 1}}{b_{2, 1}} & \frac{a_{2, 2}}{b_{2, 2}} \\
     \frac{a_{3, 1}}{b_{3, 1}} & \frac{a_{3, 2}}{b_{3, 2}} \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] \space / \space B[0, 0]$  
$C[1, 0] = A[1, 0] \space / \space B[1, 0]$  
$C[2, 0] = A[2, 0] \space / \space B[2, 0]$  
$C[0, 1] = A[0, 1] \space / \space B[0, 1]$  
$C[1, 1] = A[1, 1] \space / \space B[1, 1]$  
$C[2, 1] = A[2, 1] \space / \space B[2, 1]$  

In [33]:
# With NumPy
## Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A =\n{A}\n")
print(f"B =\n{B}\n")

## Divide matrices
C = A / B
print(f"C =\n{C}\n")

A =
[[1 2 3]
 [4 5 6]]

B =
[[1 2 3]
 [4 5 6]]

C =
[[1. 1. 1.]
 [1. 1. 1.]]



In [101]:
# From scratch
## Create matrices
A = [[1, 2, 3],
     [4, 5, 6]]

B = [[1, 2, 3],
     [4, 5, 6]]

print("A = ")
for row in A:
    print(row)
    
print("\nB = ")
for row in B:
    print(row)
    
## Divide matrices
def divide_matrices(A, B):
    C = []
    for i in range(0, len(A)):
        C.append([])
        for j in range(0, len(A[0])):
            C_ij = A[i][j] / B[i][j]
            C[i].append(C_ij)
    return C

print("\nC = ")
C = divide_matrices(A, B)
for row in C:
    print(row)

A = 
[1, 2, 3]
[4, 5, 6]

B = 
[1, 2, 3]
[4, 5, 6]

C = 
[1.0, 1.0, 1.0]
[1.0, 1.0, 1.0]


#### Matrix-matrix multiplication
- matrix-matrix multiplication is also called the matrix **dot product**
- not all matrices can be multiplied together using the dot product
- the number of columns of the first matrix $A$ must be equal to the number of rows of the second matrix $B$
- the rule above applies for a chain of matrix multiplications where the number of columns in one matrix in the chain must match the number of rows in the following matrix in the chain
- if
  - matrix $A$ has dimensions $m \times n$
  - matrix $B$ has dimensions $n \times k$
  - then the resulting matrix $C$ equal to $A \cdot B$ has dimensions $m \times k$

##### Math notation
$A = \begin{pmatrix}
     a_{1, 1} & a_{1, 2} \\
     a_{2, 1} & a_{2, 2} \\
     a_{3, 1} & a_{3, 2} \\
\end{pmatrix}$  

$B = \begin{pmatrix}
     b_{1, 1} & b_{1, 2} \\
     b_{2, 1} & b_{2, 2} \\
\end{pmatrix}$  

$C = A \cdot B$  

$C = \begin{pmatrix}
     a_{1, 1} \times b_{1, 1} + a_{1, 2} \times b_{2, 1} & a_{1, 1} \times b_{1, 2} + a_{1, 2} \times b_{2, 2} \\
     a_{2, 1} \times b_{1, 1} + a_{2, 2} \times b_{2, 1} & a_{2, 1} \times b_{1, 2} + a_{2, 2} \times b_{2, 2} \\
     a_{3, 1} \times b_{1, 1} + a_{3, 2} \times b_{2, 1} & a_{3, 1} \times b_{1, 2} + a_{3, 2} \times b_{2, 2} \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] \times B[0, 0] + A[0, 1] \times B[1, 0]$  
$C[1, 0] = A[1, 0] \times B[0, 0] + A[1, 1] \times B[1, 0]$  
$C[2, 0] = A[2, 0] \times B[0, 0] + A[2, 1] \times B[1, 0]$  
$C[0, 1] = A[0, 0] \times B[0, 1] + A[0, 1] \times B[1, 1]$  
$C[1, 1] = A[1, 0] \times B[0, 1] + A[1, 1] \times B[1, 1]$  
$C[2, 1] = A[2, 0] \times B[0, 1] + A[2, 1] \times B[1, 1]$  

In [35]:
# With NumPy
## Create matrices
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[1, 2],
              [3, 4]])

print(f"A =\n{A}\n")
print(f"B =\n{B}\n")

## Multiply matrices
C = A.dot(B)
print(f"C =\n{C}\n")

A =
[[1 2]
 [3 4]
 [5 6]]

B =
[[1 2]
 [3 4]]

C =
[[ 7 10]
 [15 22]
 [23 34]]



In [125]:
# From scratch
## Create matrices
A = [[1, 2],
     [3, 4],
     [5, 6]]

B = [[1, 2],
     [3, 4]]

print("A = ")
for row in A:
    print(row)
    
print("\nB = ")
for row in B:
    print(row)
    
## Calculate dot product of matrices
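def calculate_dot_product(A, B):
    C = []
    for i in range(0, len(A)):
        C.append([])
        # One output column per column of B
        for j in range(0, len(B[0])):
            # Sum the products along the shared dimension (columns of A, rows of B)
            C_ij = sum(A[i][k] * B[k][j] for k in range(0, len(B)))
            C[i].append(C_ij)
    return C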

print("\nC = ")
C = calculate_dot_product(A, B)
for row in C:
    print(row)

A = 
[1, 2]
[3, 4]
[5, 6]

B = 
[1, 2]
[3, 4]

C = 
[7, 10]
[15, 22]
[23, 34]
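
The shape rule for matrix-matrix multiplication can be checked with NumPy. In this sketch the matrices are filled with ones purely to illustrate the shapes, and the flag `mismatch_raised` is a name introduced for the example.

```python
import numpy as np

# A is 3 x 2 and B is 2 x 4: columns of A match rows of B
A = np.ones((3, 2))
B = np.ones((2, 4))

# The result has the rows of A and the columns of B
C = A.dot(B)
print(C.shape)

# Reversing the order breaks the rule: B has 4 columns but A has 3 rows
try:
    B.dot(A)
    mismatch_raised = False
except ValueError as error:
    mismatch_raised = True
    print(f"Multiplication failed: {error}")
```

This also shows that matrix-matrix multiplication is order-dependent: $A \cdot B$ can be defined while $B \cdot A$ is not.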


#### Matrix-vector multiplication
- a matrix and a vector can be multiplied together...
- as long as the rule of matrix multiplication is observed: the number of columns in the matrix must be equal to the number of items in the vector
- the result is always a vector
- the operation can be written using the $\cdot$ notation

##### Math notation
$c = A \cdot v$  
or more compactly  
$c = Av$

$A = \begin{pmatrix}
     a_{1, 1} & a_{1, 2} \\
     a_{2, 1} & a_{2, 2} \\
     a_{3, 1} & a_{3, 2} \\
\end{pmatrix}$

$v = \begin{pmatrix}
     v_{1} \\
     v_{2} \\
\end{pmatrix}$  

$c = \begin{pmatrix}
     a_{1, 1} \times v_{1} + a_{1, 2} \times v_{2} \\
     a_{2, 1} \times v_{1} + a_{2, 2} \times v_{2} \\
     a_{3, 1} \times v_{1} + a_{3, 2} \times v_{2} \\
\end{pmatrix}$  
or more compactly  
$c = \begin{pmatrix}
     a_{1, 1}v_{1} + a_{1, 2}v_{2} \\
     a_{2, 1}v_{1} + a_{2, 2}v_{2} \\
     a_{3, 1}v_{1} + a_{3, 2}v_{2} \\
\end{pmatrix}$  

$c[0] = A[0, 0] \times v[0] + A[0, 1] \times v[1]$  
$c[1] = A[1, 0] \times v[0] + A[1, 1] \times v[1]$  
$c[2] = A[2, 0] \times v[0] + A[2, 1] \times v[1]$  

In [42]:
# With NumPy
## Create matrices
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
v = np.array([0.5, 0.5])

print(f"A =\n{A}\n")
print(f"v =\n{v}\n")

## Multiply matrix by vector
C = A.dot(v)
print(f"C =\n{C}\n")

A =
[[1 2]
 [3 4]
 [5 6]]

v =
[0.5 0.5]

C =
[1.5 3.5 5.5]



In [107]:
# From scratch
## Create matrices
A = [[1, 2],
     [3, 4],
     [5, 6]]

v = [0.5, 0.5]

print("A = ")
for row in A:
    print(row)
    
print("\nv = ")
for row in v:
    print(row)
    
## Multiply matrix by vector
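def multiply_matrix_by_vector(M, v):
    # Build the result inside the function so repeated calls start from an empty list
    C = []
    for i in range(0, len(M)):
        # Dot product of row i of the matrix with the vector
        C_i = sum(M[i][k] * v[k] for k in range(0, len(v)))
        C.append(C_i)
    return C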

C = multiply_matrix_by_vector(A, v)
print(f"\nC = {C}")

A = 
[1, 2]
[3, 4]
[5, 6]

v = 
0.5
0.5

C = [1.5, 3.5, 5.5]


#### Matrix-scalar multiplication
- a matrix can be multiplied by a scalar
- the result is a matrix of the same dimension as the parent matrix where each element is multiplied by the scalar value

##### Math notation
$C = A \cdot b$  
or more compactly  
$C = Ab$  

$A = \begin{pmatrix}
     a_{1, 1} & a_{1, 2} \\
     a_{2, 1} & a_{2, 2} \\
     a_{3, 1} & a_{3, 2} \\
\end{pmatrix}$  

$C = \begin{pmatrix}
     a_{1, 1} \times b & a_{1, 2} \times b \\
     a_{2, 1} \times b & a_{2, 2} \times b \\
     a_{3, 1} \times b & a_{3, 2} \times b \\
\end{pmatrix}$  
or more compactly  
$C = \begin{pmatrix}
     a_{1, 1}b & a_{1, 2}b \\
     a_{2, 1}b & a_{2, 2}b \\
     a_{3, 1}b & a_{3, 2}b \\
\end{pmatrix}$  

$C[0, 0] = A[0, 0] \times b$  
$C[1, 0] = A[1, 0] \times b$  
$C[2, 0] = A[2, 0] \times b$  
$C[0, 1] = A[0, 1] \times b$  
$C[1, 1] = A[1, 1] \times b$  
$C[2, 1] = A[2, 1] \times b$  

In [46]:
# With NumPy
## Create matrices
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b = 0.5

print(f"A =\n{A}\n")
print(f"b =\n{b}\n")

## Multiply matrix by scalar
C = A * b
print(f"C =\n{C}\n")

A =
[[1 2]
 [3 4]
 [5 6]]

b =
0.5

C =
[[0.5 1. ]
 [1.5 2. ]
 [2.5 3. ]]



In [109]:
# From scratch
## Create matrices
A = [[1, 2],
     [3, 4],
     [5, 6]]

b = 0.5

print("A = ")
for row in A:
    print(row)
    
print(f"\nb = {b}")
    
## Multiply matrix by scalar
def multiply_matrix_by_scalar(M, s):
    # Use the function arguments rather than the global variables
    C = []
    for i in range(0, len(M)):
        C.append([])
        for j in range(0, len(M[0])):
            C_ij = M[i][j] * s
            C[i].append(C_ij)
    return C

print(f"\nC =")
C = multiply_matrix_by_scalar(A, b)
for row in C:
    print(row)

A = 
[1, 2]
[3, 4]
[5, 6]

b = 0.5

C =
[0.5, 1.0]
[1.5, 2.0]
[2.5, 3.0]


## 10 - Types of matrices
### Square matrix
- a square matrix has the same number of rows $m$ and columns $n$ ($m = n$)
- it's different from a rectangular matrix where the number of rows and the number of columns are not equal
- the size of the matrix is called the **order** (e.g. an order 4 matrix is a $4 \times 4$ matrix)
- the vector of values from the top left of the matrix down to the bottom right is called the **main diagonal**
- square matrices of the same order are easily added and multiplied together
- square matrices are the basis of many simple linear transformations, such as rotations (e.g. image rotation)

#### Math notation
$A = \begin{pmatrix}
     a_{1, 1} & a_{1, 2} & a_{1, 3} \\
     a_{2, 1} & a_{2, 2} & a_{2, 3} \\
     a_{3, 1} & a_{3, 2} & a_{3, 3} \\
\end{pmatrix}$  
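
As a sketch of a square matrix used as a linear transformation, the example below builds the standard $2 \times 2$ rotation matrix and rotates a vector by 90 degrees counter-clockwise. The angle and the vector are arbitrary choices for the illustration.

```python
import numpy as np

# Rotation angle (90 degrees, in radians)
theta = np.pi / 2

# Standard 2 x 2 rotation matrix (counter-clockwise rotation by theta)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotate the unit vector pointing along the x axis
v = np.array([1.0, 0.0])
rotated = R.dot(v)

# Rounding hides the tiny floating-point error in cos(pi / 2)
print(np.round(rotated, 6))
```

The rotated vector points along the y axis, up to floating-point error.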

### Symmetric matrix
- a symmetric matrix is a type of square matrix where the top-right triangle is the same as the lower-left triangle
- the axis of symmetry is always the main diagonal
- a symmetric matrix is always square and equal to its own transpose (the transpose is an operation that swaps the rows and columns of a matrix)
- the symmetric matrix is one of the most important types of matrices in linear algebra and linear algebra applications

#### Math notation
$M = M^{T}$  

$M = \begin{pmatrix}
     1 & 2 & 3 & 4 & 5 \\
     2 & 1 & 2 & 3 & 4 \\
     3 & 2 & 1 & 2 & 3 \\
     4 & 3 & 2 & 1 & 2 \\
     5 & 4 & 3 & 2 & 1 \\
\end{pmatrix}$
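
The property $M = M^{T}$ gives a simple way to test for symmetry with NumPy. In this sketch, `N` is a made-up non-symmetric matrix added for contrast.

```python
import numpy as np

# Symmetric matrix from the math notation above
M = np.array([[1, 2, 3, 4, 5],
              [2, 1, 2, 3, 4],
              [3, 2, 1, 2, 3],
              [4, 3, 2, 1, 2],
              [5, 4, 3, 2, 1]])

# A matrix is symmetric when it equals its own transpose
is_symmetric = np.array_equal(M, M.T)
print(is_symmetric)

# A non-symmetric matrix for comparison
N = np.array([[1, 2],
              [3, 4]])
print(np.array_equal(N, N.T))
```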

### Triangular matrix
- a triangular matrix is a type of square matrix whose non-zero values all lie in the upper-right or lower-left triangle (main diagonal included), with the remaining elements equal to 0
- a triangular matrix with values only on and above the main diagonal is called an **upper triangular matrix**
- a triangular matrix with values only on and below the main diagonal is called a **lower triangular matrix**
- with NumPy, a lower triangular matrix can be obtained using `tril()`, and an upper triangular matrix can be obtained using `triu()`

#### Math notation
Upper triangular matrix:  
$A = \begin{pmatrix}
     1 & 2 & 3 \\
     0 & 2 & 3 \\
     0 & 0 & 3 \\
\end{pmatrix}$  

Lower triangular matrix:  
$A = \begin{pmatrix}
     1 & 0 & 0 \\
     1 & 2 & 0 \\
     1 & 2 & 3 \\
\end{pmatrix}$  

In [56]:
# With NumPy
## Define a square matrix
M = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
print(M)

## Create a lower triangular matrix
lower = np.tril(M)
print(f"\n{lower}")

## Create an upper triangular matrix
upper = np.triu(M)
print(f"\n{upper}")

[[1 2 3]
 [1 2 3]
 [1 2 3]]

[[1 0 0]
 [1 2 0]
 [1 2 3]]

[[1 2 3]
 [0 2 3]
 [0 0 3]]


In [75]:
# From scratch
def create_triangular_matrix(M, kind="upper"):
    new_matrix = []
    if kind == "upper":
        for i in range(len(M)):
            # Start from a row of zeros, then copy the values on and right of the diagonal
            new_matrix.append([0 for _ in range(len(M[0]))])
            new_matrix[i][i:] = M[i][i:]
                    
    if kind == "lower":
        for i in range(len(M)):
            # Start from a row of zeros, then copy the values on and left of the diagonal
            new_matrix.append([0 for _ in range(len(M[0]))])
            new_matrix[i][:i+1] = M[i][:i+1]
                    
    return new_matrix
                
M = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]

for row in M:
    print(row)

lower = create_triangular_matrix(M, kind="lower")
print("")
for row in lower:
    print(row)

upper = create_triangular_matrix(M, kind="upper")
print("")
for row in upper:
    print(row)

[1, 2, 3]
[1, 2, 3]
[1, 2, 3]

[1, 0, 0]
[1, 2, 0]
[1, 2, 3]

[1, 2, 3]
[0, 2, 3]
[0, 0, 3]


### Diagonal matrix
- a matrix is diagonal when values outside of the main diagonal have a 0 value
- a diagonal matrix is denoted $D$
- a diagonal matrix can be represented as a matrix or as the vector of its main diagonal values
- a diagonal matrix doesn't have to be square: in a rectangular matrix, the diagonal covers the dimension with the smallest length
- with NumPy, `diag()` extracts the main diagonal of a matrix as a vector, or creates a diagonal matrix from a vector

#### Math notation
As a matrix:  
$D = \begin{pmatrix}
     1 & 0 & 0 \\
     0 & 2 & 0 \\
     0 & 0 & 3 \\
\end{pmatrix}$  

As a vector:  
$d = \begin{pmatrix}
     d_{1, 1} \\
     d_{2, 2} \\
     d_{3, 3} \\
\end{pmatrix}$  

with the specific values:  
$d = \begin{pmatrix}
     1 \\
     2 \\
     3 \\
\end{pmatrix}$  

In [80]:
# With NumPy
## Create array
M = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
print(f"\n{M}")

## Extract diagonal vector
d = np.diag(M)
print(f"\n{d}")

## Create diagonal matrix from vector
D = np.diag(d)
print(f"\n{D}")


[[1 2 3]
 [1 2 3]
 [1 2 3]]

[1 2 3]

[[1 0 0]
 [0 2 0]
 [0 0 3]]


In [81]:
# From scratch
def create_diagonal_matrix(M):
    new_matrix = []
    for i in range(len(M)):
        new_matrix.append([0 for i in range(len(M[0]))])
        for j in range(len(M[0])):
            if i == j:
                new_matrix[i][j] = M[i][j]
                    
    return new_matrix
                
M = [[1, 2, 3],
     [1, 2, 3],
     [1, 2, 3]]

for row in M:
    print(row)

diagonal = create_diagonal_matrix(M)
print("")
for row in diagonal:
    print(row)

[1, 2, 3]
[1, 2, 3]
[1, 2, 3]

[1, 0, 0]
[0, 2, 0]
[0, 0, 3]
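
One reason the vector representation of a diagonal matrix is convenient: multiplying a diagonal matrix by a vector scales each element of the vector by the matching diagonal value, which is the same as an element-wise product. A small sketch with made-up values:

```python
import numpy as np

# Diagonal values and the corresponding diagonal matrix
d = np.array([1, 2, 3])
D = np.diag(d)

# Vector to be scaled
v = np.array([10, 20, 30])

# Matrix-vector product with the diagonal matrix
matrix_product = D.dot(v)
print(matrix_product)

# Element-wise product with the vector of diagonal values
elementwise_product = d * v
print(elementwise_product)
```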


### Identity matrix
- an identity matrix is a square matrix that does not change a vector when multiplied with it
- all of the scalar values along the main diagonal are equal to one
- all of the scalar values that are not along the main diagonal are equal to zero
- an identity matrix is often denoted $I^{n}$ where $n$ is the dimensionality of the square matrix
- an identity matrix can also be denoted $U$ for **unit** matrix (not to be confused with the *unitary* matrix)
- an identity matrix can be created using the `identity()` function in NumPy
- the identity matrix is a component in important matrix operations (e.g. matrix inversion)

#### Math notation
$I^{3} = \begin{pmatrix}
         1 & 0 & 0 \\
         0 & 1 & 0 \\
         0 & 0 & 1 \\
\end{pmatrix}$  

In [82]:
# With NumPy
## Create identity matrix
I = np.identity(3)
print(I)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [89]:
# From scratch
def create_identity_matrix(n):
    I = []
    for i in range(n):
        I.append([])
        for j in range(n):
            if i != j:
                I[i].append(0)
            else:
                I[i].append(1)

    return I
            
# Create identity matrix
I = create_identity_matrix(3)
for row in I:
    print(row)

[1, 0, 0]
[0, 1, 0]
[0, 0, 1]
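
The defining property of the identity matrix can be checked directly. In this sketch, the vector `v` and the matrix `A` are arbitrary examples.

```python
import numpy as np

# Create identity matrix
I = np.identity(3)

# Arbitrary vector and square matrix
v = np.array([4.0, 5.0, 6.0])
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Multiplying by the identity matrix leaves the vector unchanged
print(I.dot(v))

# The same holds for a matrix, on either side
print(np.array_equal(I.dot(A), A))
print(np.array_equal(A.dot(I), A))
```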


### Orthogonal matrix
- two vectors are orthogonal when their dot product equals zero
- if the length of each vector is one, then the vectors are called **orthonormal** as they are **orthogonal** and **normalized**
- the rows of an orthogonal matrix are mutually orthonormal, and so are its columns
- one line is orthogonal with another if it's perpendicular to it
- an orthogonal matrix is a type of square matrix whose columns and rows are orthonormal unit vectors (i.e. perpendicular and with a magnitude of one)
- multiplication by an orthogonal matrix preserves lengths
- a matrix is orthogonal if its transpose is equal to its inverse
- a matrix is orthogonal if the dot product of the matrix and its transpose equals the identity matrix
- orthogonal matrices are used in linear transformations, such as reflections and permutations

#### Math notation
$v \cdot w = 0$  
or  
$v^{T} \cdot w = 0$  

$Q^{T} \cdot Q = Q \cdot Q^{T} = I$  

A matrix is orthogonal if its transpose is equal to its inverse:  
$Q^{T} = Q^{-1}$  

A matrix is orthogonal if the dot product of the matrix and its transpose equals the identity matrix:  
$Q \cdot Q^{T} = I$  

Example of an orthogonal matrix:  
$Q = \begin{pmatrix}
     1 & 0 \\
     0 & -1 \\
\end{pmatrix}$  

In [92]:
# With NumPy
## Define orthogonal matrix
Q = np.array([[1, 0],
              [0, -1]])
print(f"{Q}\n")

## Get transpose (equal to the inverse for an orthogonal matrix)
print(f"{Q.T}\n")

## Get dot product equivalence
print(f"{Q.dot(Q.T)}\n")

[[ 1  0]
 [ 0 -1]]

[[ 1  0]
 [ 0 -1]]

[[1 0]
 [0 1]]



In [128]:
# From scratch
## Define transpose function
def transpose_matrix(M):
    T = []
    for i in range(len(M[0])):
        T.append([])
        for j in range(len(M)):
            T[i].append(M[j][i])
    return T

# Create matrix
Q = [[1, 0],
     [0, -1]]
for row in Q:
    print(row)

# Get transpose (equal to the inverse for an orthogonal matrix)
matrix_transpose = transpose_matrix(Q)
print("")
for row in matrix_transpose:
    print(row)

# Get dot product equivalence
dot_product = calculate_dot_product(Q, matrix_transpose)
print("")
for row in dot_product:
    print(row)

[1, 0]
[0, -1]

[1, 0]
[0, -1]

[1, 0]
[0, 1]
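
Two of the properties listed for orthogonal matrices, the transpose being equal to the inverse and the preservation of lengths, can be verified numerically. This sketch uses `numpy.linalg.inv` and `numpy.linalg.norm`; the vector `v` is an arbitrary example.

```python
import numpy as np
from numpy.linalg import inv, norm

# Define orthogonal matrix (a reflection across the x axis)
Q = np.array([[1.0, 0.0],
              [0.0, -1.0]])

# The transpose equals the inverse
print(np.allclose(Q.T, inv(Q)))

# Multiplication by Q preserves the length of a vector
v = np.array([3.0, 4.0])
length_before = norm(v)
length_after = norm(Q.dot(v))
print(length_before, length_after)
```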


## 11 - Matrix operations