# Linear Algebra

In [3]:
from mxnet import nd

## Scalars
Scalars are single numbers, denoted by lower-case letters ($x$, $y$, $z$). The space of all real-valued
scalars is denoted $\mathbb{R}$. To state that $x$ is a scalar, write $x \in \mathbb{R}$.
The symbol $\in$ reads "in" and denotes membership of a set.


In [3]:
# Arithmetic on single-element NDArrays (scalars)
x = nd.array([3.0])
y = nd.array([2.0])

print('x + y = ', x + y)
print('x * y = ', x * y)
print('x / y = ', x / y)
print('x ** y = ', nd.power(x,y))

# can convert to Python value using .asscalar()
x.asscalar()

x + y =  
[5.]
<NDArray 1 @cpu(0)>
x * y =  
[6.]
<NDArray 1 @cpu(0)>
x / y =  
[1.5]
<NDArray 1 @cpu(0)>
x ** y =  
[9.]
<NDArray 1 @cpu(0)>


3.0

## Vectors
A vector is an array of scalars; each scalar is an entry (or component) of the vector. Vectors are
denoted by bold lower-case letters ($\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$). In MXNet they are 1D arrays.


In [4]:
x = nd.arange(4)
print('x = ', x)


x =  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>


In [5]:
# you can index into any vector
x[3]


[3.]
<NDArray 1 @cpu(0)>

## Length, Dimensionality and Shape
A vector $\mathbf{x}$ consists of $n$ scalars and can be expressed as $\mathbf{x} \in \mathbb{R}^n$. For vectors, length and dimension
are synonymous. The length can be accessed with the ``len()`` function or through the `.shape` attribute.

In [6]:
x.shape



(4,)

## Matrices
- Vectors generalize scalars from order $0$ to order $1$.
- Matrices generalize vectors from order $1$ to order $2$.
- Matrices are denoted with capitals ($A$, $B$, $C$) and are arrays with 2 axes.
- Each entry $a_{ij}$ belongs to the $i$-th row and $j$-th column.

$$A=\begin{pmatrix}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \\
\end{pmatrix}$$

To create a matrix, specify a shape with two components `(n, m)`: $n$ rows and $m$ columns.

In [8]:
A = nd.arange(20).reshape((5,4))
print(A)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]]
<NDArray 5x4 @cpu(0)>


- The scalar elements $a_{ij}$ of a matrix $A$ can be accessed by specifying the row index ($i$) and the column index ($j$).
- A matrix can be transposed with `.T`.
- If $B = A^T$, then $b_{ij} = a_{ji}$ for all $i$ and $j$.

In [9]:
print(A.T)




[[ 0.  4.  8. 12. 16.]
 [ 1.  5.  9. 13. 17.]
 [ 2.  6. 10. 14. 18.]
 [ 3.  7. 11. 15. 19.]]
<NDArray 4x5 @cpu(0)>
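
The rule $b_{ij} = a_{ji}$ can be spelled out without any library. The following is a minimal sketch in plain Python, using nested lists in place of NDArrays; the helper name `transpose` and the variables `A_rows` and `B_rows` are illustrative and not part of MXNet.

```python
def transpose(matrix):
    """Return the transpose of a non-empty, rectangular list-of-rows matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Row j of the result collects column j of the input: b[j][i] = a[i][j].
    return [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]

# Same 5x4 layout as nd.arange(20).reshape((5, 4)), but with Python ints.
A_rows = [[4 * i + j for j in range(4)] for i in range(5)]
B_rows = transpose(A_rows)

print(len(B_rows), len(B_rows[0]))  # 4 5
print(B_rows[1])                    # [1, 5, 9, 13, 17]
```

The second row of `B_rows` matches the second row of `A.T` printed above, up to the float formatting of NDArrays.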


## Tensors
Tensors give a generic way of discussing arrays with an arbitrary number of axes. Vectors are first-order tensors and matrices are second-order tensors.

In [10]:
X = nd.arange(24).reshape((2, 3, 4))
print('X.shape = ', X.shape)
print('X = ', X)



X.shape =  (2, 3, 4)
X =  
[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]
<NDArray 2x3x4 @cpu(0)>
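
To make the link between order and shape concrete, here is a small plain-Python sketch. It assumes regularly nested, non-empty lists as a stand-in for NDArrays, and the helper `shape_of` is hypothetical. The order of a tensor is simply the number of entries in its shape.

```python
def shape_of(nested):
    """Return the shape of a regularly nested, non-empty list as a tuple."""
    shape = []
    # Descend along the first element of each axis until a non-list is reached.
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

vector = [0, 1, 2, 3]
matrix = [[0, 1], [2, 3], [4, 5]]
tensor = [[[0] * 4 for _ in range(3)] for _ in range(2)]

for name, value in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    s = shape_of(value)
    print(name, "shape:", s, "order:", len(s))
```

The last line of output reports shape `(2, 3, 4)` and order 3, the same shape as `X` above.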


## Basic Properties of Tensor Arithmetic
- An element-wise operation on operands with the same shape produces a tensor of that same shape.
- Multiplication by a scalar also produces a tensor of the same shape.
- In math, given two tensors $X$ and $Y$ with the same shape, $\alpha X + Y$ has the same shape (numerical mathematicians call this the AXPY operation).

In [11]:
a = 2
x = nd.ones(3)
y = nd.zeros(3)
print(x.shape)
print(y.shape)
print((a * x).shape)
print((a * x + y).shape)


(3,)
(3,)
(3,)
(3,)
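
The AXPY operation itself is short enough to write by hand. This is an illustrative plain-Python sketch for vectors stored as lists; the function name `axpy` and the variable names are assumptions made for the example, not an MXNet API.

```python
def axpy(alpha, x, y):
    """Compute alpha * x + y element-wise for two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]

x_vals = [1.0, 1.0, 1.0]   # mirrors nd.ones(3)
y_vals = [0.0, 0.0, 0.0]   # mirrors nd.zeros(3)
result = axpy(2, x_vals, y_vals)

print(result)       # [2.0, 2.0, 2.0]
print(len(result))  # 3, the same length as the inputs
```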


## Sums and Means
- Sums are expressed using the $\sum$ symbol.
- The sum of the elements of a vector $\mathbf{u}$ of length $d$ is written $\sum_{i=1}^d u_i$.
- In code, use ``nd.sum()``.

In [12]:
print(x)
print(nd.sum(x))


[1. 1. 1.]
<NDArray 3 @cpu(0)>

[3.]
<NDArray 1 @cpu(0)>


- The sum of the elements of an $m \times n$ matrix $A$ is written $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$.

In [13]:
print(A)
print(nd.sum(A))


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]]
<NDArray 5x4 @cpu(0)>

[190.]
<NDArray 1 @cpu(0)>


- The mean is the sum divided by the number of elements.
- For a vector $\mathbf{u}$ of length $d$: $\frac{1}{d} \sum_{i=1}^{d} u_i$.
- For an $m \times n$ matrix $A$: $\frac{1}{m \cdot n} \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$.
- Use ``nd.mean()`` on a tensor of any shape.

In [14]:
print(nd.mean(A))
print(nd.sum(A) / A.size)


[9.5]
<NDArray 1 @cpu(0)>

[9.5]
<NDArray 1 @cpu(0)>
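
The double sum and the mean can be checked by hand with a short plain-Python sketch. It rebuilds the same 5×4 values as `A` in a nested list (the names `A_rows`, `total`, and `mean` are illustrative) and follows the formulas literally.

```python
# Same values as nd.arange(20).reshape((5, 4)), stored as a list of rows.
A_rows = [[4 * i + j for j in range(4)] for i in range(5)]
n_rows, n_cols = len(A_rows), len(A_rows[0])

# Double sum over all row and column indices.
total = sum(A_rows[i][j] for i in range(n_rows) for j in range(n_cols))

# Mean = sum divided by the number of elements.
mean = total / (n_rows * n_cols)

print(total, mean)  # 190 9.5
```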


## Dot Products
Given two vectors $\mathbf{u}$ and $\mathbf{v}$, the dot product $\mathbf{u}^T \mathbf{v}$ is a sum over the products of the corresponding elements: $\mathbf{u}^T \mathbf{v} = \sum_{i=1}^{d} u_i \cdot v_i$.



In [16]:
x = nd.arange(4)
y = nd.ones(4)
print(x, y, nd.dot(x, y))

# For two vectors this is the same as 
nd.sum(x * y)


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)> 
[1. 1. 1. 1.]
<NDArray 4 @cpu(0)> 
[6.]
<NDArray 1 @cpu(0)>



[6.]
<NDArray 1 @cpu(0)>

- Given a set of weights $\mathbf{w}$, the weighted sum of some values $\mathbf{u}$ can be expressed as the dot product $\mathbf{u}^T \mathbf{w}$. When the weights are non-negative and sum to one $\left(\sum_{i=1}^{d} {w_i} = 1\right)$, the dot product expresses a *weighted average*.
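
The weighted-average interpretation can be illustrated with made-up numbers. The sketch below uses plain Python lists; the helper `dot` and the sample `values` and `weights` are assumptions chosen only for the example.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))

values = [2.0, 4.0, 8.0]
weights = [0.5, 0.25, 0.25]  # non-negative and summing to one

weighted_average = dot(values, weights)
print(weighted_average)  # 4.0

# With equal weights 1/d, the weighted average reduces to the ordinary mean.
uniform = [1 / len(values)] * len(values)
print(abs(dot(values, uniform) - sum(values) / len(values)) < 1e-12)  # True
```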

## Matrix-Vector Products
Consider a matrix $A \in \mathbb{R}^{n \times m}$ and a vector $\mathbf{x} \in \mathbb{R}^{m}$:

$$A=\begin{pmatrix}
 a_{11} & a_{12} & \cdots & a_{1m} \\
 a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
 a_{n1} & a_{n2} & \cdots & a_{nm} \\
\end{pmatrix},\quad\mathbf{x}=\begin{pmatrix}
 x_{1}  \\
 x_{2} \\
\vdots\\
 x_{m}\\
\end{pmatrix} $$

The matrix can be expressed in terms of its row vectors:

$$A=
\begin{pmatrix}
\mathbf{a}^T_{1} \\
\mathbf{a}^T_{2} \\
\vdots \\
\mathbf{a}^T_n \\
\end{pmatrix},$$

where each $\mathbf{a}^T_{i} \in \mathbb{R}^{m}$
is a row vector representing the $i$-th row of the matrix $A$.

Then the matrix-vector product $\mathbf{y} = A\mathbf{x}$ is simply a column vector $\mathbf{y} \in \mathbb{R}^n$ in which each entry $y_i$ is the dot product $\mathbf{a}^T_i \mathbf{x}$.

$$A\mathbf{x}=
\begin{pmatrix}
\mathbf{a}^T_{1}  \\
\mathbf{a}^T_{2}  \\
 \vdots  \\
\mathbf{a}^T_n \\
\end{pmatrix}
\begin{pmatrix}
 x_{1}  \\
 x_{2} \\
\vdots\\
 x_{m}\\
\end{pmatrix}
= \begin{pmatrix}
 \mathbf{a}^T_{1} \mathbf{x}  \\
 \mathbf{a}^T_{2} \mathbf{x} \\
\vdots\\
 \mathbf{a}^T_{n} \mathbf{x}\\
\end{pmatrix}
$$

- Multiplication by a matrix $A\in \mathbb{R}^{n \times m}$ can be thought of as a transformation that projects vectors from $\mathbb{R}^{m}$ to $\mathbb{R}^{n}$.
- Rotations can be represented as multiplications by a square matrix, and matrix-vector products can describe the calculations of each layer in a neural network.
- In code with ``ndarray``, we use the same ``nd.dot()`` function as for dot products.
- When ``nd.dot(A, x)`` is called with a matrix ``A`` and a vector ``x``, MXNet performs a matrix-vector product. The column dimension of ``A`` (its number of columns) must equal the dimension of ``x``.

In [None]:
nd.dot(A, x)
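
The row-by-row definition can be written out directly. The following plain-Python sketch assumes that `A` is the 5×4 matrix from earlier and that `x` is still `nd.arange(4)`, and mirrors both as lists. The helpers `dot` and `matvec` are illustrative, not MXNet functions.

```python
def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(matrix, vector):
    """Matrix-vector product: entry i is the dot product of row i with the vector."""
    for row in matrix:
        if len(row) != len(vector):
            raise ValueError("number of columns must equal the vector length")
    return [dot(row, vector) for row in matrix]

A_rows = [[4 * i + j for j in range(4)] for i in range(5)]  # 5x4, values 0..19
x_vals = [0, 1, 2, 3]

y_vals = matvec(A_rows, x_vals)
print(y_vals)  # [14, 38, 62, 86, 110]
```

The result has one entry per row of the matrix, so a 5×4 matrix maps a vector of length 4 to a vector of length 5.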
