## Lesson 1
This lesson gave an introduction to matrices & vectors. Andrew covered how their indices work, and how their dimensionality is measured.

### Notation and Terms
* $A_{ij}$ refers to the element within a matrix on the $i^{th}$ row and in the $j^{th}$ column
* $\mathbb{R}$ refers to the set of real numbers (Andrew called this the set of scalar real numbers).
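
As a small worked example of this notation (the matrix is only an illustration, and the $\mathbb{R}^{4 \times 3}$ shorthand for "a 4 x 3 matrix of real numbers" is assumed here as the usual convention):

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix} \in \mathbb{R}^{4 \times 3}, \qquad A_{23} = 6, \qquad A_{41} = 10$$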

Below is some code that Andrew shared to practice matrix manipulation:

In [4]:
% The ; denotes we are going back to a new row.
A = [1, 2, 3; 4, 5, 6; 7, 8, 9; 10, 11, 12]

% Initialize a vector 
v = [1;2;3] 

% Get the dimension of the matrix A where m = rows and n = columns
[m,n] = size(A)

% You could also store it this way
dim_A = size(A)

% Get the dimension of the vector v 
dim_v = size(v)

% Now let's index into the 2nd row 3rd column of matrix A
A_23 = A(2,3)


A =

    1    2    3
    4    5    6
    7    8    9
   10   11   12

v =

   1
   2
   3

m =  4
n =  3
dim_A =

   4   3

dim_v =

   3   1

A_23 =  6


## Lesson 2
This lesson covered matrix addition and subtraction, as well as multiplying and dividing a matrix by a scalar.

To add two matrices, you add the elements at the same index in each matrix and place the sum at that index in a new matrix. Andrew clarifies that addition is only defined between matrices of the same dimensions.
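
A minimal sketch of the same-dimensions rule, assuming a hypothetical 3 x 2 matrix `C` that is not from the lecture. Octave should refuse to add it to a 2 x 3 matrix; the exact wording of the error message is not reproduced here, so the sketch just catches it and prints it.

```octave
% A is 2 x 3; C is a made-up 3 x 2 matrix used only to trigger the mismatch
A = [1, 2, 4; 5, 3, 2];
C = [1, 2; 3, 4; 5, 6];

% Compare the dimensions before adding
same_size = isequal(size(A), size(C))

% Attempt the addition and record whether Octave rejects it
addition_failed = false;
try
  bad_sum = A + C;
catch err
  addition_failed = true;
  disp(err.message);
end
```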

To perform scalar multiplication, you simply multiply the constant against every element in the matrix, and place the result in a new matrix at the same index as the original matrix value's index.

To summarize:
### Addition
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} a+w & b+x \\ c+y & d+z \end{bmatrix}$
### Subtraction
$\begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} - \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} = \begin{bmatrix} a-w & b-x \\ c-y & d-z \end{bmatrix}$
### Scalar multiplication
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} * x =\begin{bmatrix} a*x & b*x \\ c*x & d*x \end{bmatrix}$
### Scalar division
$\begin{bmatrix} a & b \\ c & d \end{bmatrix} / x =\begin{bmatrix} a /x & b/x \\ c /x & d /x \end{bmatrix}$

Andrew also gave us some Octave code to practice with:

In [2]:
% Initialize matrix A and B 
A = [1, 2, 4; 5, 3, 2]
B = [1, 3, 4; 1, 1, 1]

% Initialize constant s 
s = 2

% See how element-wise addition works
add_AB = A + B 

% See how element-wise subtraction works
sub_AB = A - B

% See how scalar multiplication works
mult_As = A * s

% Divide A by s
div_As = A / s

% What happens if we have a Matrix + scalar?
add_As = A + s


A =

   1   2   4
   5   3   2

B =

   1   3   4
   1   1   1

s =  2
add_AB =

   2   5   8
   6   4   3

sub_AB =

   0  -1   0
   4   2   1

mult_As =

    2    4    8
   10    6    4

div_As =

   0.50000   1.00000   2.00000
   2.50000   1.50000   1.00000

add_As =

   3   4   6
   7   5   4



## Lesson 3
This lesson covered matrix-vector multiplication.

When multiplying an M x N matrix by an N x P matrix, the result is an M x P matrix. A vector is an N x 1 matrix (P = 1), so multiplying an M x N matrix by a vector always gives an M x 1 vector.

The algorithm for the matrix-vector product is simple: we lay the vector's column along each row of the matrix, multiply the paired elements, and sum the products. Each row of the matrix yields one element of the result.

$\begin{bmatrix} a & b \newline c & d \newline e & f \end{bmatrix} \times \begin{bmatrix} x \newline y \newline \end{bmatrix} =\begin{bmatrix} a*x + b*y \newline c*x + d*y \newline e*x + f*y\end{bmatrix}$
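
To make the row-by-row procedure concrete, here is a rough sketch that spells it out with explicit loops and compares the result with Octave's built-in `*`. The variable name `Av_loop` is made up for this illustration; in practice you would just write `A * v`.

```octave
A = [1, 2, 3; 4, 5, 6; 7, 8, 9];
v = [1; 1; 1];

[m, n] = size(A);
Av_loop = zeros(m, 1);     % one result element per row of A

for i = 1:m
  for j = 1:n
    % multiply the paired elements of row i and the vector, then accumulate
    Av_loop(i) = Av_loop(i) + A(i, j) * v(j);
  end
end

Av_loop

% The loop result should agree with the built-in product
matches_builtin = isequal(Av_loop, A * v)
```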


Andrew also shows us what we can do with this in a linear regression problem: given a set of house sizes and a hypothesis function predicting price from house size, we can compute every prediction with a single matrix-vector product, which yields the values we will later need for gradient descent. To state the problem, you transform the set of house sizes into an N x 2 matrix, where N is the number of houses in the training set; each row has the number 1 in the first column and the house size in the second. You then define a P x 1 vector for your hypothesis function, where P is the number of parameters (here P = 2, matching the two columns of the matrix); the first parameter goes in the top row and each subsequent parameter follows in order.
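
A sketch of that setup, under stated assumptions: the four house sizes and the hypothesis $h(x) = -40 + 0.25x$ are illustrative values chosen for this example, and the names `house_sizes`, `X`, `theta` and `predictions` are likewise made up here.

```octave
% Assumed training set: four house sizes (illustrative values)
house_sizes = [2104; 1416; 1534; 852];

% N x 2 matrix: a column of ones, then the house sizes
X = [ones(length(house_sizes), 1), house_sizes];

% 2 x 1 parameter vector for the assumed hypothesis h(x) = -40 + 0.25 * x
theta = [-40; 0.25];

% One matrix-vector product gives the prediction for every house
predictions = X * theta
```

Each row of `X` multiplies out to `1 * (-40) + size * 0.25`, which is exactly the hypothesis evaluated at that house size.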

Octave allows us to express this operation very simply:

In [3]:
% Initialize matrix A 
A = [1, 2, 3; 4, 5, 6;7, 8, 9] 

% Initialize vector v 
v = [1; 1; 1] 

% Multiply A * v
Av = A * v

A =

   1   2   3
   4   5   6
   7   8   9

v =

   1
   1
   1

Av =

    6
   15
   24



## Lesson 4
In this lesson, we dive further into matrix multiplication, this time with full matrix x matrix multiplication.
Andrew opens the video explaining that, to solve a linear regression problem numerically (which he mentioned before was possible), we use matrix multiplication as part of the solution.

Matrix-matrix multiplication is the same as what we did before, except that we repeat the matrix-vector calculation for each column of the second matrix and store each result as the column with the same index in the resultant matrix (very simple stuff).

One thing to always remember is that, to multiply two matrices, their "adjacent" (inner) dimensions must match: if matrix A is M x N, matrix B needs to be N x P for the product A x B to be defined.
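
A quick sketch of the dimension rule, using placeholder matrices of ones (their values are irrelevant; only the shapes matter). The names `dim_AB` and `can_multiply_BA` are made up for this illustration.

```octave
A = ones(3, 2);    % M x N = 3 x 2
B = ones(2, 4);    % N x P = 2 x 4

% Inner dimensions match (2 and 2), so A * B is defined and is M x P
dim_AB = size(A * B)

% For B * A the inner dimensions are 4 and 3, so the product is not defined
can_multiply_BA = (size(B, 2) == size(A, 1))
```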

So, to solve the linear regression problem numerically, you model your hypothesis functions as a matrix: each column (column-wise slice) stores the parameters for one hypothesis. Your training set is stored just as it was before: as a matrix, with 1 in the first column of every row (it's for the constant parameter) and the house size in the second column.

This results in a matrix in which each column (again, a column-wise slice) holds the predictions of the hypothesis at that column index.
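
Here is a rough sketch of that idea with three competing hypotheses. The house sizes and all parameter values are assumptions picked for illustration, as are the names `X`, `Theta` and `predictions`.

```octave
% Assumed training set, stored with a leading column of ones
house_sizes = [2104; 1416; 1534; 852];
X = [ones(length(house_sizes), 1), house_sizes];   % 4 x 2

% Each column holds the parameters of one assumed hypothesis:
%   h1(x) = -40 + 0.25x,  h2(x) = 200 + 0.1x,  h3(x) = -150 + 0.4x
Theta = [-40, 200, -150; 0.25, 0.1, 0.4];          % 2 x 3

% (4 x 2) * (2 x 3) = (4 x 3): one column of predictions per hypothesis
predictions = X * Theta

% Column 2 is the same as applying the second hypothesis on its own
h2_only = X * Theta(:, 2)
```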

This strategy has been very heavily optimized in linear algebra libraries, as matrix-matrix multiplication is such a common operation. Ultimately, though, I can see that if I have a tremendously large training set and I'm not working with a limited/bounded set of hypothesis functions, it will be computationally expensive, to the point of being infeasible, to identify my ideal hypothesis later with a cost function.

To restate matrix-matrix multiplication:
$\begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix} \times \begin{bmatrix} w & x \\ y & z \\ \end{bmatrix} =\begin{bmatrix} a*w + b*y & a*x + b*z \\ c*w + d*y & c*x + d*z \\ e*w + f*y & e*x + f*z\end{bmatrix}$

And, again, Andrew gave us some code to play with:

In [4]:
% Initialize a 3 by 2 matrix 
A = [1, 2; 3, 4;5, 6]

% Initialize a 2 by 1 matrix 
B = [1; 2] 

% We expect a resulting matrix of (3 by 2)*(2 by 1) = (3 by 1) 
mult_AB = A*B

% Make sure you understand why we got that result

A =

   1   2
   3   4
   5   6

B =

   1
   2

mult_AB =

    5
   11
   17



## Lesson 5

This lecture covers properties of matrix multiplication.

The first property covered is that matrix multiplication is **not** commutative. That is, the order of the operands matters: in general, $A \times B \neq B \times A$.

The second property is that matrix multiplication **is** associative: $(A \times B) \times C = A \times (B \times C)$.
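
A small sketch to check both properties numerically. `A` and `B` are small example matrices, and `C` is a hypothetical third matrix introduced only so there is something to associate with.

```octave
A = [1, 2; 4, 5];
B = [1, 1; 0, 2];
C = [2, 0; 1, 3];    % made-up third matrix for the associativity check

% Associativity: the grouping of the products does not matter
left_grouping  = (A * B) * C
right_grouping = A * (B * C)
is_associative = isequal(left_grouping, right_grouping)

% Commutativity fails for this pair: swapping the operands changes the result
is_commutative = isequal(A * B, B * A)
```

The entries here are small integers, so an exact comparison with `isequal` is safe; with non-integer entries a tolerance would be the more careful choice.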

There is also a special matrix, the identity matrix, or $\mathbb{I}$, which has ones on the diagonal and zeros everywhere else. Multiplying by it **is** commutative, and the result is simply the original matrix: $A\mathbb{I} = \mathbb{I}A = A$.

Below is some code to play with.

In [5]:
% Initialize random matrices A and B 
A = [1,2;4,5]
B = [1,1;0,2]

% Initialize a 2 by 2 identity matrix
I = eye(2)

% The above notation is the same as I = [1,0;0,1]

% What happens when we multiply I*A ? 
IA = I*A 

% How about A*I ? 
AI = A*I 

% Compute A*B 
AB = A*B 

% Is it equal to B*A? 
BA = B*A 

% Note that IA = AI but AB != BA

A =

   1   2
   4   5

B =

   1   1
   0   2

I =

Diagonal Matrix

   1   0
   0   1

IA =

   1   2
   4   5

AI =

   1   2
   4   5

AB =

    1    5
    4   14

BA =

    5    7
    8   10



## Lesson 6
This lecture covers the matrix inverse and matrix transpose operations for manipulating matrices.

Andrew covers the concept of an inverse for scalar numbers (he only covers the inverse as it relates to multiplication), which is a number that, when multiplied by the original, produces the identity, or 1.
Matrices have inverses as well, such that $\mathbb{A}(\mathbb{A}^{-1}) = \mathbb{A}^{-1}\mathbb{A} = \mathbb{I}$.
You can solve for a matrix's inverse by hand, but there is a lot of linear algebra software that does this for us, including Octave.
Ultimately, not every matrix has an inverse, for the same reason not every scalar number has an inverse under scalar multiplication, e.g. 0. Only square matrices can have one, and a matrix without an inverse is called singular or degenerate.
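
As an illustration of a matrix with no inverse, here is a minimal sketch using a made-up 2 x 2 matrix `S` whose second row is a multiple of the first. It uses `rank` to detect the problem instead of calling `inv`, since the warning and result that `inv` produces for such a matrix are not shown here.

```octave
% Hypothetical matrix: row 2 is exactly 2 x row 1, so the rows are dependent
S = [1, 2; 2, 4];

% A square matrix is invertible only if its rank equals its size
rank_S = rank(S)
is_singular = rank(S) < size(S, 1)

% For comparison, the identity matrix has full rank and is its own inverse
rank_I = rank(eye(2))
```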

Transpose is hella easy. I will skip this section of the video, and just remind myself: transpose is where you take a matrix, and make every row a column, and every column a row. Not the clearest or most correct explanation of what it is, but I know what matrix transposition is, and I don't need a reminder. Ugh, writing this up took enough time for Andrew to give a decent explanation: it's a mirroring along the $45^\circ$ line.

Andrew's notes:
$$A = \begin{bmatrix} a & b \newline c & d \newline e & f \end{bmatrix}$$
$$A^T = \begin{bmatrix} a & c & e \newline b & d & f \newline \end{bmatrix}$$
$$A_{ij} = A^T_{ji}$$
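
A short sketch that checks the last identity element by element on a made-up 3 x 2 matrix. The names `A_T` and `all_match` are introduced only for this example.

```octave
A = [1, 2; 3, 4; 5, 6];   % 3 x 2
A_T = A';                 % transpose, 2 x 3

dim_A_T = size(A_T)

% Check A(i,j) == A_T(j,i) for every index pair
all_match = true;
for i = 1:size(A, 1)
  for j = 1:size(A, 2)
    if A(i, j) ~= A_T(j, i)
      all_match = false;
    end
  end
end
all_match
```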

Below is some sample code:

In [None]:
% Initialize matrix A 
A = [1,2,0;0,5,6;7,0,9]

% Transpose A 
A_trans = A' 

% Take the inverse of A 
A_inv = inv(A)

% What is A^(-1)*A? 
A_invA = inv(A)*A
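
One thing worth keeping in mind when running the cell above: the inverse is computed in floating point, so `inv(A)*A` may differ from the identity by tiny rounding errors. A hedged sketch of how to check the result with a tolerance instead of expecting an exact match (the `1e-10` threshold is an arbitrary choice for this example):

```octave
A = [1, 2, 0; 0, 5, 6; 7, 0, 9];
A_invA = inv(A) * A;

% Largest absolute deviation from the 3 x 3 identity matrix
max_error = max(max(abs(A_invA - eye(3))))

% Treat the product as the identity if the deviation is negligible
is_identity = max_error < 1e-10
```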