# Linear algebra

In [1]:
import numpy as np

In [2]:
x = np.array([3.0, 4.0])
y = np.array([2.0, 1.0])

print('x + y = ', x + y)
print('x * y = ', x * y)
print('x / y = ', x / y)
print('x ** y = ', np.power(x,y))

x + y =  [5. 5.]
x * y =  [6. 4.]
x / y =  [1.5 4. ]
x ** y =  [9. 4.]


## Vectors

A vector is an ordered list of scalar values, e.g. ``[1.0, 3.0, 4.0, 2.0]``. In code we represent vectors as 1D NumPy arrays (``ndarray``).

In [3]:
x = np.arange(12)
print('x = ', x) 

x =  [ 0  1  2  3  4  5  6  7  8  9 10 11]


In [4]:
x[3]

3

## Length, dimensionality and shape

The length of a vector is commonly called its *dimension*. As with an ordinary Python list, we can get the length of an ``ndarray``
by calling Python's built-in ``len()`` function.

We can also access a vector's length via its `.shape` attribute.
The shape is a tuple that lists the length of the array along each of its axes.

In [5]:
x.shape

(12,)

We can also get the total number of elements in the ``ndarray`` through its `.size` attribute.

In [6]:
x.size

12

The `reshape` function changes the shape of the row vector $x$ to (3, 4), giving a matrix with 3 rows and 4 columns.

In [7]:
x = x.reshape((3, 4))
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
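Once an array has more than one axis, `len()`, `.shape` and `.size` no longer report the same number. The short sketch below illustrates the difference; the names `v` and `m` are only illustrative and are not used elsewhere in this notebook.

```python
import numpy as np

v = np.arange(12)        # 1D array with 12 elements (illustrative name)
m = v.reshape((3, 4))    # same data viewed as 3 rows and 4 columns

# For the 1D array, all three agree on the number 12.
print(len(v), v.shape, v.size)   # 12 (12,) 12

# For the 2D array, len() only counts the first axis (rows),
# while size still counts every element.
print(len(m), m.shape, m.size)   # 3 (3, 4) 12
```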

The word dimension is overloaded between number of axes and number of elements. **To avoid confusion, when we say *2D* array or *3D* array, we mean an array with 2 or 3 axes respectively. But if we say *$n$-dimensional* vector, we mean a vector of length $n$.**
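To make the two meanings concrete, the following sketch contrasts the number of axes (`.ndim`) with the length of a vector. The arrays `v` and `M` are made up for this illustration.

```python
import numpy as np

v = np.array([1.0, 3.0, 4.0, 2.0])    # a 4-dimensional vector: 1 axis, length 4
M = np.arange(12).reshape((3, 4))     # a 2D array: 2 axes

print(v.ndim, len(v))    # 1 4
print(M.ndim, M.shape)   # 2 (3, 4)
```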

In [8]:
a = 2
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])
print(a * x)
print(a * x + y)

[2 4 6]
[12 24 36]


## Matrices

Just as vectors generalize scalars from order $0$ to order $1$,
matrices generalize vectors from $1D$ to $2D$.
Matrices, which we'll typically denote with capital letters ($A$, $B$, $C$),
are represented in code as arrays with 2 axes.
Visually, we can draw a matrix as a table,
where each entry $a_{ij}$ belongs to the $i$-th row and $j$-th column.


$$A=\begin{pmatrix}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \\
\end{pmatrix}$$

In [9]:
print(np.arange(10))
A = np.arange(20).reshape((5,4))
print(A)

[0 1 2 3 4 5 6 7 8 9]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]


We can access an element $a_{ij}$ by specifying its row $i$ and column $j$. Using `:` in place of an index selects all elements along the respective axis.
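A minimal indexing sketch, rebuilding the same $5 \times 4$ matrix so the snippet runs on its own. Note that NumPy indices start at 0, whereas the mathematical notation $a_{ij}$ starts at 1.

```python
import numpy as np

A = np.arange(20).reshape((5, 4))

print(A[1, 2])   # single element in row 1, column 2 (zero-based): 6
print(A[1, :])   # the whole of row 1: [4 5 6 7]
print(A[:, 2])   # the whole of column 2: [ 2  6 10 14 18]
```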

We can transpose the matrix through its `.T` attribute. That is, if $B = A^T$, then $b_{ij} = a_{ji}$ for any $i$ and $j$.

In [10]:
print(A.T)

[[ 0  4  8 12 16]
 [ 1  5  9 13 17]
 [ 2  6 10 14 18]
 [ 3  7 11 15 19]]
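The defining property $b_{ij} = a_{ji}$ can be checked directly. The sketch below does so with an explicit loop over all index pairs; the variable `all_match` is just an illustrative name.

```python
import numpy as np

A = np.arange(20).reshape((5, 4))
B = A.T                      # transpose: shape goes from (5, 4) to (4, 5)

print(B.shape)               # (4, 5)

# Check b_ij == a_ji for every valid index pair.
all_match = all(
    B[i, j] == A[j, i]
    for i in range(B.shape[0])
    for j in range(B.shape[1])
)
print(all_match)             # True

# Transposing twice gives back the original matrix.
print(np.array_equal(B.T, A))   # True
```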


## Tensors

Just as vectors generalize scalars, and matrices generalize vectors, tensors generalize matrices to arrays with an arbitrary number of axes. For example, when working with color images, the three axes correspond to the height, width, and the three (RGB) color channels.

In [11]:
X = np.arange(24).reshape((2, 3, 4))
print('X.shape =', X.shape)
print('X =', X)

X.shape = (2, 3, 4)
X = [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
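As a rough illustration of the image case, the sketch below builds a tiny all-red image as an array with 3 axes. The sizes and the (height, width, channel) axis order are assumptions chosen for this example; some libraries store the channel axis first.

```python
import numpy as np

height, width, channels = 4, 6, 3      # hypothetical tiny RGB image
image = np.zeros((height, width, channels), dtype=np.uint8)

image[:, :, 0] = 255                   # set the red channel of every pixel

print(image.ndim)     # 3
print(image.shape)    # (4, 6, 3)
print(image[0, 0])    # top-left pixel: [255   0   0]
```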


## Basic properties of tensor arithmetic

Given two tensors $X$ and $Y$ with the same shape,
$\alpha X + Y$ has the same shape
(numerical mathematicians call this the AXPY operation).

In [12]:
a = 2
x = np.ones(3)
y = np.zeros(3)
print(x.shape)
print(y.shape)
print((a * x).shape)
print((a * x + y).shape)

(3,)
(3,)
(3,)
(3,)
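The same holds for arrays with more axes. A small sketch with two $2 \times 3$ matrices, where `alpha`, `X`, `Y` and `Z` are illustrative names:

```python
import numpy as np

alpha = 0.5
X = np.arange(6, dtype=float).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]
Y = np.ones((2, 3))

Z = alpha * X + Y    # element-wise scaling followed by element-wise addition

print(Z.shape)       # (2, 3), same as X and Y
print(Z)
```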


## Sums and means

In math we express sums using the $\sum$ symbol.
To express the sum of the elements in a vector $\mathbf{u}$ of length $d$,
we can write $\sum_{i=1}^d u_i$. In code, we can just call ``np.sum()``.

In [13]:
print(x)
print(np.sum(x))

[1. 1. 1.]
3.0


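We can similarly express sums over the elements of tensors of arbitrary shape. For example, the sum of the elements of an $m \times n$ matrix $A$ could be written $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$. Passing an axis as the second argument, as in ``np.sum(A, 0)``, sums only along that axis; for axis 0 this yields one sum per column.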

In [14]:
print(A)
print(np.sum(A,0))

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[40 45 50 55]
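To connect the double-sum notation with the code, the sketch below computes the total with explicit loops and compares it with ``np.sum()``, with and without an axis argument.

```python
import numpy as np

A = np.arange(20).reshape((5, 4))

# Literal translation of the double sum over rows i and columns j.
total = 0
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        total += A[i, j]

print(total)               # 190
print(np.sum(A))           # 190, sum over all elements
print(np.sum(A, axis=0))   # column sums: [40 45 50 55]
print(np.sum(A, axis=1))   # row sums: [ 6 22 38 54 70]
```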


A related quantity is the *mean*. 
We calculate the mean by dividing the sum by the total number of elements. In code this is ``np.mean()``.

$$\mathrm{mean}(\mathbf{u}) = \frac{1}{d} \sum_{i=1}^{d} u_i \text{ and }
\mathrm{mean}(A) = \frac{1}{n \cdot m} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$$

In [15]:
print(np.mean(A))
print(np.sum(A) / A.size)

9.5
9.5


## Dot products

Given two vectors $\mathbf{u}$ and $\mathbf{v}$, the dot product $\mathbf{u}^T \mathbf{v}$ is a sum over the products of the corresponding elements: $\mathbf{u}^T \mathbf{v} = \sum_{i=1}^{d} u_i \cdot v_i$.

In [16]:
x = np.arange(4) + 1.0
y = np.ones(4)
print(x, y, np.dot(x, y))

[1. 2. 3. 4.] [1. 1. 1. 1.] 10.0


Note that we can express the dot product of two vectors ``np.dot(u, v)`` equivalently by performing an element-wise multiplication and then a sum:

In [17]:
np.sum(x * y)

10.0

## Matrix-vector products

$$A=\begin{pmatrix}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \\
\end{pmatrix},\quad\mathbf{x}=\begin{pmatrix}
 x_{1}  \\
 x_{2} \\
\vdots\\
 x_{m}\\
\end{pmatrix} $$

$$A\mathbf{x}=
\begin{pmatrix}
\cdots & \mathbf{a}^T_{1} & \cdots \\
\cdots & \mathbf{a}^T_{2} & \cdots \\
 & \vdots &  \\
 \cdots &\mathbf{a}^T_n & \cdots \\
\end{pmatrix}
\begin{pmatrix}
 x_{1}  \\
 x_{2} \\
\vdots\\
 x_{m}\\
\end{pmatrix}
= \begin{pmatrix}
 \mathbf{a}^T_{1} \mathbf{x}  \\
 \mathbf{a}^T_{2} \mathbf{x} \\
\vdots\\
 \mathbf{a}^T_{n} \mathbf{x}\\
\end{pmatrix}
$$

So you can think of multiplication by a matrix $A\in \mathbb{R}^{n \times m}$ as a transformation that maps vectors from $\mathbb{R}^{m}$ to $\mathbb{R}^{n}$.


We can also use matrix-vector products to describe the calculations of each layer in a neural network.
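As a hedged illustration of the neural network remark, the sketch below computes the affine part of a single fully connected layer, $\mathbf{z} = W\mathbf{x} + \mathbf{b}$. The layer sizes, the random weights `W` and the zero bias `b` are assumptions made up for this example, and any nonlinearity a real layer would apply afterwards is left out.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable

n_inputs, n_outputs = 4, 3              # hypothetical layer sizes
W = rng.normal(size=(n_outputs, n_inputs))   # weight matrix, one row per output unit
b = np.zeros(n_outputs)                      # bias vector

x = np.array([1.0, 2.0, 3.0, 4.0])      # input vector with 4 features

z = np.dot(W, x) + b                    # matrix-vector product plus bias

print(z.shape)                          # (3,)
```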

To express matrix-vector products in code with ``ndarray``, we use the same ``np.dot()`` function as for dot products.

In [18]:
A = A.reshape((5,4))
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [19]:
x

array([1., 2., 3., 4.])

In [20]:
np.dot(A, x)

array([ 20.,  60., 100., 140., 180.])
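The row-by-row view from the formula above can be reproduced directly: each entry of $A\mathbf{x}$ is the dot product of one row of $A$ with $\mathbf{x}$. The name `by_rows` is illustrative.

```python
import numpy as np

A = np.arange(20).reshape((5, 4))
x = np.arange(4) + 1.0                  # [1. 2. 3. 4.]

# One dot product per row of A.
by_rows = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])

print(by_rows)                              # [ 20.  60. 100. 140. 180.]
print(np.allclose(by_rows, np.dot(A, x)))   # True
```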

## Matrix-matrix multiplication

If you've gotten the hang of dot products and matrix-vector multiplication, then matrix-matrix multiplications should be pretty straightforward.

Say we have two matrices, $A \in \mathbb{R}^{n \times k}$ and $B \in \mathbb{R}^{k \times m}$:

$$A=\begin{pmatrix}
 a_{11} & a_{12} & \cdots & a_{1k} \\
 a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nk} \\
\end{pmatrix},\quad
B=\begin{pmatrix}
 b_{11} & b_{12} & \cdots & b_{1m} \\
 b_{21} & b_{22} & \cdots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
 b_{k1} & b_{k2} & \cdots & b_{km} \\
\end{pmatrix}$$

$$AB = \begin{pmatrix}
\cdots & \mathbf{a}^T_{1} & \cdots \\
\cdots & \mathbf{a}^T_{2} & \cdots \\
 & \vdots &  \\
 \cdots &\mathbf{a}^T_n & \cdots \\
\end{pmatrix}
\begin{pmatrix}
\vdots & \vdots &  & \vdots \\
 \mathbf{b}_{1} & \mathbf{b}_{2} & \cdots & \mathbf{b}_{m} \\
 \vdots & \vdots &  &\vdots\\
\end{pmatrix}
= \begin{pmatrix}
\mathbf{a}^T_{1} \mathbf{b}_1 & \mathbf{a}^T_{1}\mathbf{b}_2& \cdots & \mathbf{a}^T_{1} \mathbf{b}_m \\
 \mathbf{a}^T_{2}\mathbf{b}_1 & \mathbf{a}^T_{2} \mathbf{b}_2 & \cdots & \mathbf{a}^T_{2} \mathbf{b}_m \\
 \vdots & \vdots & \ddots &\vdots\\
\mathbf{a}^T_{n} \mathbf{b}_1 & \mathbf{a}^T_{n}\mathbf{b}_2& \cdots& \mathbf{a}^T_{n} \mathbf{b}_m
\end{pmatrix}
$$

You can think of the matrix-matrix multiplication $AB$ as simply performing $m$ matrix-vector products and stitching the results together.

In [21]:
B = np.ones(shape=(4, 3))
np.dot(A, B)

array([[ 6.,  6.,  6.],
       [22., 22., 22.],
       [38., 38., 38.],
       [54., 54., 54.],
       [70., 70., 70.]])
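The "stitching" view can be sketched as follows: multiply $A$ by each column of $B$ separately and stack the resulting vectors as columns. The names `columns` and `C` are illustrative. The `@` operator is an equivalent way to write the product for 2D arrays.

```python
import numpy as np

A = np.arange(20).reshape((5, 4))
B = np.ones(shape=(4, 3))

# One matrix-vector product per column of B.
columns = [np.dot(A, B[:, j]) for j in range(B.shape[1])]

# Stack the result vectors side by side as the columns of C.
C = np.stack(columns, axis=1)

print(C.shape)                          # (5, 3)
print(np.allclose(C, np.dot(A, B)))     # True
print(np.allclose(C, A @ B))            # True
```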

## Norms

A norm $\|\cdot\|$ measures the size of a vector or matrix. All norms must satisfy a handful of properties:

$$\|\alpha A\| = |\alpha| \|A\|$$
$$\|A + B\| \leq \|A\| + \|B\|$$
$$\|A\| \geq 0$$
$$\|A\| = 0 \; \text{if and only if} \; \forall {i,j}, a_{ij} = 0$$

To calculate the $\ell_2$ norm, we can just call ``np.linalg.norm()``.

In [22]:
np.linalg.norm(x, 2)

5.477225575051661

To calculate the $\ell_1$ norm, we can simply take the absolute value of each element and then sum over the elements.

In [23]:
np.sum(np.abs(x))

10.0

Or just call ``np.linalg.norm()`` with order 1:

In [24]:
np.linalg.norm(x, 1)

10.0
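As a sanity check, the sketch below computes the $\ell_2$ norm by hand as the square root of the sum of squares, and verifies the norm properties listed earlier numerically for the $\ell_2$ norm. The vector `y` and the scalar `alpha` are arbitrary values chosen for illustration.

```python
import numpy as np

x = np.arange(4) + 1.0                   # [1. 2. 3. 4.]
y = np.array([-1.0, 0.5, 2.0, -3.0])     # arbitrary second vector
alpha = -2.5                             # arbitrary scalar

# l2 norm written out: square root of the sum of squared elements.
l2_by_hand = np.sqrt(np.sum(x ** 2))
print(np.isclose(l2_by_hand, np.linalg.norm(x, 2)))   # True

# Scaling: the norm of alpha * x equals |alpha| times the norm of x.
print(np.isclose(np.linalg.norm(alpha * x), abs(alpha) * np.linalg.norm(x)))  # True

# Triangle inequality.
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))         # True

# The norm of the zero vector is zero.
print(np.linalg.norm(np.zeros(4)) == 0.0)                                     # True
```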