*Credit*: some material here has been adapted from [Sam Roweis](https://www.cs.nyu.edu/home/people/in_memoriam/samroweis.html)' [Linear Algebra Review](http://www.cs.ubc.ca/~murphyk/Teaching/Papers/roweis_linAlgebra.ps).


In [0]:
import numpy as np
import matplotlib.pyplot as plt

# Vectors and Matrices

Linear algebra is the study of vectors and matrices and how they can be manipulated to perform various calculations. 

We will start by exploring some simple operations on vectors and matrices.

## Addition and scaling

Adding up two vectors or two matrices is easy: just add their corresponding elements (of course the two must be the same shape!):

In [0]:
x = np.array([1, 2, 3])
y = np.ones(3)
print(x)
print(y)
print(x + y)

Note that NumPy prints 1-D arrays on a single line, so vectors look like row vectors. To stay consistent with [Mathematics for Machine Learning](https://mml-book.github.io/), however, whenever we write a vector without saying whether it is a row or column vector, that is $\mathbf{x} \in \mathbb{R}^n$ as opposed to $\mathbf{x} \in \mathbb{R}^{1 \times n}$ or $\mathbf{x} \in \mathbb{R}^{n \times 1}$, we treat it as a column vector.

Do you know why there is a decimal point following the elements of the second and third vectors above?
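If you want to check your answer, inspect each array's `dtype`. `np.array([1, 2, 3])` holds integers, `np.ones` returns floating-point values by default, and adding an integer array to a float array produces a float array:

```python
import numpy as np

x = np.array([1, 2, 3])  # integer elements
y = np.ones(3)           # np.ones defaults to float64

print(x.dtype)        # an integer dtype (platform dependent, e.g. int64)
print(y.dtype)        # float64
print((x + y).dtype)  # float64: the integers are upcast to floats
```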

In [0]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.arange(6).reshape(A.shape)
print(A)
print(B)
print(A + B)
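A caveat specific to NumPy: it does not always reject operands of different shapes. Under its broadcasting rules, a length-3 vector is added to every row of a $2 \times 3$ matrix, even though that sum is not defined mathematically. Shapes that cannot be broadcast together do raise an error. A small sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([1, 2, 3])

# Not a valid matrix sum, but NumPy broadcasts x across each row of A
print(A + x)  # rows [2 4 6] and [5 7 9]

# Shapes (2, 3) and (2,) cannot be broadcast, so NumPy raises a ValueError
try:
    A + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```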

Multiplying a vector or matrix by a scalar just multiplies each element by the scalar:

In [0]:
print(2 * x)
print(0.5 * A)

## Matrix-vector and matrix-matrix multiplication

A good way to think of an $n \times m$ matrix $A$ is as a machine that eats $m$-sized vectors and spits out $n$-sized vectors. This conversion process is known as (left) multiplying by $A$ and has many similarities to scalar multiplication, but also a few differences. Most importantly, the machine only accepts inputs of the right size.

In [0]:
# In NumPy, both matrix-vector and matrix-matrix multiplication are performed by np.dot
print(A.shape)  # (2, 3): A maps length-3 vectors to length-2 vectors
print(x.shape)  # (3,)
print(np.dot(A, x))
print(A.dot(x))  # equivalent method syntax

In [0]:
print(np.dot(A, x[:2]))  # not compatible sizes: raises a ValueError (3 columns vs. length 2)

Like scalar multiplication, matrix multiplication is *distributive* and *associative*:

$$
\begin{aligned}
A(\mathbf{x} + \mathbf{y}) & = A\mathbf{x} + A\mathbf{y}\\
 B(A\mathbf{x}) & = (BA)\mathbf{x}
\end{aligned}
$$

One way to think of this is that the matrix product $BA$ is the equivalent linear operator you get if you compose the action of $A$ followed by the action of $B$.
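These identities are easy to spot-check numerically. The sketch below uses randomly generated matrices (the shapes and the seed are arbitrary choices for illustration) and `np.allclose` to allow for floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # maps R^3 -> R^2
B = rng.standard_normal((4, 2))  # maps R^2 -> R^4
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Distributivity: A(x + y) = Ax + Ay
print(np.allclose(A.dot(x + y), A.dot(x) + A.dot(y)))  # True

# Associativity: applying A and then B equals applying the single operator BA
print(np.allclose(B.dot(A.dot(x)), B.dot(A).dot(x)))  # True
```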

Matrix-matrix multiplication can be thought of as a sequence of matrix-vector multiplications, one for each column, whose results get stacked beside each other in columns to form a new matrix. In general, we can think of column vectors of length $n$ as just $n \times 1$ and row vectors as $1 \times n$ matrices. This eliminates any distinction between matrix-matrix and matrix-vector multiplication.
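To make the column-by-column view concrete, the sketch below multiplies $A$ by each column of a second matrix separately and stacks the results side by side with `np.column_stack`; the matrix `M` is an arbitrary example. It also treats a vector as a $3 \times 1$ matrix:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
M = np.arange(12).reshape(3, 4)       # 3 x 4, an arbitrary example

# One matrix-vector product per column of M, stacked as the columns of the result
columns = [A.dot(M[:, k]) for k in range(M.shape[1])]
AM_by_columns = np.column_stack(columns)

print(AM_by_columns)
print(np.array_equal(AM_by_columns, A.dot(M)))  # True

# Treating a vector as a 3 x 1 matrix gives a 2 x 1 result
x_col = np.array([1, 2, 3]).reshape(3, 1)
print(A.dot(x_col).shape)  # (2, 1)
```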

In [0]:
print(np.dot(A, A.T))

Note that above we flipped, or "transposed", the matrix. This interchanges its rows and columns, and here it made the shapes compatible for matrix-matrix multiplication.

In [0]:
print(A.shape)
print(A.T.shape)

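Unlike scalar multiplication, matrix multiplication is not *commutative*: in general,

$$ AB \neq BA $$

Often only one of the two products is even defined: for a column vector $\mathbf{x}$, $A\mathbf{x}$ makes sense but $\mathbf{x}A$ generally does not. Even when both products exist, they can differ in shape or in value.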
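For example, with two small square matrices (chosen arbitrarily here) both products are defined but give different results, and for the non-square $A$ used earlier, $AA^\top$ and $A^\top A$ do not even have the same shape:

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[0, 1], [1, 0]])  # swaps the two coordinates

print(P.dot(Q))  # columns of P swapped: [[2, 1], [4, 3]]
print(Q.dot(P))  # rows of P swapped: [[3, 4], [1, 2]]

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.dot(A.T).shape)  # (2, 2)
print(A.T.dot(A).shape)  # (3, 3)
```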

## Inverses

First, let's consider the concept of reversing or undoing or *inverting* the function represented by a matrix $A$. For a function to be invertible, there needs to be a one-to-one relationship between inputs and outputs so that given the output you can always say exactly what the input was. In other words, we need a function which, when composed with $A$, gives back the original vector. Such a function, if it exists, is called the *inverse* of $A$, and the corresponding matrix is denoted $A^{-1}$.

In matrix terms, we seek a matrix that left multiplies $A$ to give the identity matrix:

$$A^{-1}A = I$$

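The identity matrix $I$, with entries $I_{ij} = \delta_{ij}$ (1 if $i = j$ and 0 otherwise), corresponds to the identity (do-nothing) function: $I\mathbf{x} = \mathbf{x}$ for every $\mathbf{x}$.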
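In NumPy, `np.eye(n)` builds the $n \times n$ identity matrix; multiplying a vector by it leaves the vector unchanged:

```python
import numpy as np

I = np.eye(3)            # 3 x 3 identity matrix (float64)
x = np.array([1, 2, 3])

print(I)
print(I.dot(x))                     # [1. 2. 3.]
print(np.array_equal(I.dot(x), x))  # True
```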

Only special linear functions are invertible:

* They must have at least as many outputs as inputs
* They must not map any two inputs to the same output

Technically this means that they must have *full rank*, a concept which we will get to later.

Non-square matrices ($m$-by-$n$ matrices with $m \neq n$) do not have an inverse. However, such a matrix may have a *left inverse* or a *right inverse* (but not both). If $A$ is $m$-by-$n$ and has rank $n$, then $A$ has a left inverse: an $n$-by-$m$ matrix $B$ such that $BA = I_n$. If $A$ has rank $m$, then it has a right inverse: an $n$-by-$m$ matrix $B$ such that $AB = I_m$.
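The $2 \times 3$ matrix $A$ defined earlier has rank 2, so it should have a right inverse. One standard construction, valid when $A$ has full row rank, is $B = A^\top (A A^\top)^{-1}$; in this case it coincides with NumPy's pseudo-inverse `np.linalg.pinv`. A sketch:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # 2 x 3
print(np.linalg.matrix_rank(A))       # 2, i.e. full row rank

# Right inverse for a full-row-rank matrix: B = A^T (A A^T)^{-1}, a 3 x 2 matrix
B = A.T.dot(np.linalg.inv(A.dot(A.T)))
print(B.shape)                            # (3, 2)
print(np.allclose(A.dot(B), np.eye(2)))   # True: AB = I_2
print(np.allclose(B.dot(A), np.eye(3)))   # False: BA has rank 2, so it cannot be I_3
print(np.allclose(B, np.linalg.pinv(A)))  # True: matches the pseudo-inverse
```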

In [0]:
C = A.dot(A.T)  # A trick to make an invertible matrix: A A^T is square, and invertible because A has full row rank
C_inv = np.linalg.inv(C)
print(C)
print(C_inv)
print(C_inv.dot(C))  # ideally the 2 x 2 identity matrix

Why are the entries of the result not exactly zero?
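A hint: `np.linalg.inv` works in floating-point arithmetic, so its results carry small rounding errors, and entries that should be 0 may come out on the order of 1e-15. When comparing such results, a tolerance-based check such as `np.allclose` is more appropriate than exact equality:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
C = A.dot(A.T)
C_inv = np.linalg.inv(C)
product = C_inv.dot(C)

print(np.allclose(product, np.eye(2)))    # True: equal up to floating-point tolerance
print(np.abs(product - np.eye(2)).max())  # largest rounding error (tiny, possibly 0)
```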

## Einsum

[Mathematics for Machine Learning](https://mml-book.github.io/) gives an example of using the ``np.einsum`` function. Einsum (short for Einstein summation) is implemented in NumPy, as well as deep learning libraries such as TensorFlow and PyTorch.

Einsum is a little-known but extremely general function: it offers a compact and elegant way to specify almost any product of scalars, vectors, matrices and their higher-order generalizations, tensors.
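To give a flavour of that generality, here are a few common operations written as einsum format strings (the notation is explained in more detail below), each checked against a more familiar NumPy function. The arrays are arbitrary examples:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
M = np.arange(9.0).reshape(3, 3)

inner = np.einsum('i,i->', v, w)    # inner product: sum_i v_i w_i
outer = np.einsum('i,j->ij', v, w)  # outer product: v_i w_j
trace = np.einsum('ii->', M)        # trace: sum_i M_ii
transpose = np.einsum('ij->ji', M)  # transpose
row_sums = np.einsum('ij->i', M)    # sum over j for each row i

print(np.isclose(inner, np.dot(v, w)))       # True
print(np.allclose(outer, np.outer(v, w)))    # True
print(np.isclose(trace, np.trace(M)))        # True
print(np.array_equal(transpose, M.T))        # True
print(np.allclose(row_sums, M.sum(axis=1)))  # True
```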

Let's walk through the example given in the book (we first define matrices $A$ and $B$; note that $B$ now has the shape of $A^\top$, so that $AB$ is defined):

In [0]:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.arange(6).reshape(A.T.shape)
C = np.einsum('il,lj', A, B)
print(A)
print(B)
print(C)

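This is shorthand for the explicit form below. When the arrow and output specification are omitted, NumPy sums over every index that appears more than once and orders the remaining indices alphabetically, which here gives ``ij``: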

In [0]:
C = np.einsum('il,lj->ij', A, B)
print(C)
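One consequence of the implicit (arrow-free) form is worth knowing: the output indices are ordered alphabetically, not in the order they appear. So `np.einsum('ji', M)` returns the transpose of `M`, whereas the explicit `np.einsum('ji->ji', M)` returns `M` unchanged. A quick check, with `M` an arbitrary example:

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

print(np.einsum('ji', M).shape)                   # (3, 2): implicit output is 'ij', the transpose
print(np.array_equal(np.einsum('ji', M), M.T))    # True
print(np.array_equal(np.einsum('ji->ji', M), M))  # True: explicit output keeps the order
```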

Einsum is invoked with a format string (the first argument) followed by any number of NumPy arrays (tensors). It returns the result tensor.

The format string contains commas which separate the specifications of the arguments, as well as an arrow ``->`` which separates the specifications of the input arguments from the specifications of the result tensor.

The specifications of the input arguments and of the result tensor are sequences of letters, one letter per axis: the number of letters in an argument's specification equals its number of dimensions.

So here, ``il`` refers to the dimensions of matrix $A$, ``lj`` refers to the dimensions of matrix $B$ (note that they are compatible for multiplication) and ``ij`` refers to the dimensions of the output matrix $C$.

This particular format string defines a matrix multiplication between $A$ and $B$ to produce $C$.
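Spelled out with explicit loops, ``'il,lj->ij'`` sets each `C[i, j]` to the sum over the repeated index `l` of `A[i, l] * B[l, j]`. The loop version below only illustrates the semantics; it is far slower than `np.einsum` or `np.dot`:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])  # indices i, l
B = np.arange(6).reshape(A.T.shape)   # indices l, j

C = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        # l appears in both inputs but not in the output, so it is summed over
        for l in range(A.shape[1]):
            C[i, j] += A[i, l] * B[l, j]

print(C)
print(np.array_equal(C, np.einsum('il,lj->ij', A, B)))  # True
print(np.array_equal(C, A.dot(B)))                      # True
```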

We can also express much more complicated operations, such as batched outer products:


In [0]:
P = np.random.rand(4, 2)
Q = np.random.rand(4, 3)
R = np.einsum('bi,bj->bij', P, Q)
print(P)
print(Q)
print(R)

This einsum computes the outer product of the $b$th row of $P$ and the $b$th row of $Q$ for every $b$. This is difficult (and bug-prone) to do with other functions.
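For comparison, the same batched outer product can be written with broadcasting by inserting new axes so that each `P[b]` acts as a column and each `Q[b]` as a row. It works, but the axis bookkeeping is exactly the kind of detail that the einsum format string makes explicit. A sketch using the same shapes as above, with a fixed seed added for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 2))
Q = rng.random((4, 3))

R_einsum = np.einsum('bi,bj->bij', P, Q)

# P[:, :, None] has shape (4, 2, 1) and Q[:, None, :] has shape (4, 1, 3);
# their broadcast product has shape (4, 2, 3) with entries P[b, i] * Q[b, j]
R_broadcast = P[:, :, None] * Q[:, None, :]

print(R_einsum.shape)                      # (4, 2, 3)
print(np.allclose(R_einsum, R_broadcast))  # True

# Each slice R[b] is the ordinary outer product of row b of P and row b of Q
print(all(np.allclose(R_einsum[b], np.outer(P[b], Q[b])) for b in range(4)))  # True
```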

### Further reading

For a basic introduction to NumPy's Einsum and a list of simple operations achieved by it, see [Alex Riley's blog](http://ajcr.net/Basic-guide-to-einsum/). For a much more detailed introduction to Einsum and a discussion of its inner workings, see [Olexa Bilaniuk's blog](https://obilaniu6266h16.wordpress.com/2016/02/04/einstein-summation-in-numpy/).  For a more advanced primer of its use in TensorFlow and PyTorch, see [Tim Rocktäschel's blog](https://rockt.github.io/2018/04/30/einsum).