# [Linear Algebra](http://gluon.mxnet.io/chapter01_crashcourse/linear-algebra.html)

Now that you can store and manipulate data, let’s briefly review the subset of basic linear algebra that you’ll need to understand most of the models.  
We’ll introduce all the basic concepts, the corresponding mathematical notation, and their realization in code all in one place.  
If you’re already confident in your basic linear algebra, feel free to skim or skip this chapter.  
Also, here is a [LaTeX cheat sheet](https://users.dickinson.edu/~richesod/latex/latexcheatsheet.pdf) for the mathematical notation as well as a [MathJax Quick Reference](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference).

```python
from mxnet import nd
```

## Scalars

If you have never studied linear algebra or machine learning, you’re probably used to working with one number at a time. You also know how to do basic things like add them together or multiply them.  
For example, in Palo Alto, the temperature is $52$ degrees Fahrenheit.  
Formally, we call single numbers like this *scalars*.  
If you wanted to convert this value to Celsius, you’d evaluate the expression $c = (f - 32) \times 5/9$, setting $f$ to $52$.  
In this equation, each of the terms $32$, $5$, and $9$ is a scalar value.  
The placeholders $c$ and $f$ are called *variables*; they stand in for unknown scalar values.
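As a quick check of the arithmetic, here is the conversion evaluated in plain Python rather than MXNet. In this sketch `f` and `c` are ordinary Python floats, not NDArrays.

```python
# Fahrenheit reading from the example above
f = 52.0
# 32, 5, and 9 are scalar constants; c is the scalar result
c = (f - 32) * 5 / 9
print(round(c, 2))  # 11.11
```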

In mathematical notation, we represent scalars with ordinary lowercase letters ($x$, $y$, $z$).  
We also denote the space of all (real-valued) scalars as $\mathcal{R}$.  
For expedience, we’re going to punt a bit on what precisely a space is, but for now, remember that if you want to say that $x$ is a scalar, you can simply write $x \in \mathcal{R}$.  
The symbol $\in$ can be pronounced "in" and simply denotes membership in a set.

In MXNet, we work with scalars by creating NDArrays with just one element.  
In this snippet, we instantiate two scalars and perform some familiar arithmetic operations with them.

```python
# Instantiate two scalars
x = nd.array([3.0])
y = nd.array([2.0])

# Add them
print('x + y = ', x + y)

# Multiply them
print('x * y = ', x * y)

# Divide x by y
print('x / y = ', x / y)

# Raise x to the power y
print('x ** y = ', nd.power(x, y))
```

```
x + y =  
[ 5.]
<NDArray 1 @cpu(0)>
x * y =  
[ 6.]
<NDArray 1 @cpu(0)>
x / y =  
[ 1.5]
<NDArray 1 @cpu(0)>
x ** y =  
[ 9.]
<NDArray 1 @cpu(0)>
```


We can convert a single-element NDArray to a Python float by calling its `asscalar` method.

```python
x.asscalar()
```

```
3.0
```

## Vectors

You can think of a vector as simply a list of numbers, for example `[1.0,3.0,4.0,2.0]`.  
Each number in the vector is a single scalar value.  
We call these values the *entries* or *components* of the vector.  
Often, we’re interested in vectors whose values hold some real-world significance.  
For example, if we’re studying the risk that loans default, we might associate each applicant with a vector whose components correspond to their income, length of employment, number of previous defaults, etc.  
If we were studying the risk of heart attacks in hospital patients, we might represent each patient with a vector whose components capture their most recent vital signs, cholesterol levels, minutes of exercise per day, etc.  
In math notation, we’ll usually denote vectors as bold lowercase letters ($\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$).  
In MXNet, we work with vectors via 1D NDArrays with an arbitrary number of components.

```python
u = nd.arange(4)
print('u = ', u)
```

```
u =  
[ 0.  1.  2.  3.]
<NDArray 4 @cpu(0)>
```


We can refer to any element of a vector by using a subscript.  
For example, we can refer to the 4th element of $\mathbf{u}$ by $u_4$.  
Note that the element $u_4$ is a scalar, so we don’t bold-face the font when referring to it.  
In code, we access any element $i$ by indexing into the NDArray. Keep in mind that math notation usually counts from 1, while Python and MXNet count from 0, so $u_4$ corresponds to `u[3]`.

```python
u[3]
```

```
[ 3.]
<NDArray 1 @cpu(0)>
```

## Length, Dimensionality, and Shape

A vector is just an array of numbers.  
And just as every array has a length, so does every vector.  
In math notation, if we want to say that a vector $\mathbf{x}$ consists of $n$ real-valued scalars, we can express this as $\mathbf{x} \in \mathcal{R}^n$.  
The length of a vector is commonly called its *dimension*.  
As with an ordinary Python list, we can access the length of an NDArray by calling Python’s built-in `len()` function.

```python
len(u)
```

```
4
```

We can also access a vector’s length via its `.shape` attribute.  
The shape is a tuple that lists the length of the NDArray along each of its axes.  
Because a vector can only be indexed along one axis, its shape has just one element.

```python
u.shape
```

```
(4,)
```

Note that the word *dimension* is overloaded, and this tends to confuse people.  
Some use the *dimensionality* of a vector to refer to its length (the number of components).  
However, others use the word *dimensionality* to refer to the number of axes that an array has.  
In this sense, a scalar would have 0 dimensions and a vector would have 1 dimension.  
**To avoid confusion, when we say *2D* array or *3D* array, we mean an array with 2 or 3 axes respectively.  
But if we say *$n$-dimensional* vector, we mean a vector of length $n$.**
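To make the two meanings concrete, here is a small plain-Python sketch that reports both the shape and the number of axes of nested lists. The helper `shape_of` is hypothetical (it is not an MXNet function) and assumes rectangular nesting; it only mirrors what an NDArray’s `.shape` attribute reports.

```python
def shape_of(data):
    """Return the shape of a rectangular nested list as a tuple.

    Hypothetical helper for illustration; assumes every sublist at a
    given nesting level has the same length.
    """
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        # Descend into the first element (None stops the loop for empty lists)
        data = data[0] if data else None
    return tuple(shape)

scalar = 3.0
vector = [1.0, 2.0, 3.0]
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    s = shape_of(value)
    # len(s) is the number of axes; s lists the length along each axis
    print(name, "shape:", s, "axes:", len(s))
```

Under the convention above, `vector` is a *3-dimensional* vector (length 3) but a *1D* array (one axis).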

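Scalars and vectors can be combined: multiplying a vector by a scalar multiplies each of its components, and adding two vectors of the same length adds their corresponding components.

```python
a = 2
x = nd.array([1, 2, 3])
y = nd.array([10, 20, 30])
```

```python
a * x
```

```
[ 2.  4.  6.]
<NDArray 3 @cpu(0)>
```

```python
a * x + y
```

```
[ 12.  24.  36.]
<NDArray 3 @cpu(0)>
```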
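Written out component by component in plain Python, the same computation looks like this. The name `axpy` ("a times x plus y") is borrowed from the BLAS naming convention and used here for illustration only; it is a sketch, not an MXNet function.

```python
def axpy(a, x, y):
    """Compute a * x + y for a scalar a and equal-length lists x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Scale each component of x by a, then add the matching component of y
    return [a * xi + yi for xi, yi in zip(x, y)]

print(axpy(2, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```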

## Matrices

Just as vectors generalize scalars from order 0 to order 1, matrices generalize vectors from order 1 to order 2 (from 1D to 2D).  
Matrices, which we’ll denote with capital letters ($A$, $B$, $C$), are represented in code as arrays with 2 axes.  
Visually, we can draw a matrix as a table, where each entry $x_{ij}$ belongs to the $i$-th row and $j$-th column.

$$X = \begin{bmatrix}
    x_{11} & x_{12} & x_{13} & \dots  & x_{1n} \\
    x_{21} & x_{22} & x_{23} & \dots  & x_{2n} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    x_{d1} & x_{d2} & x_{d3} & \dots  & x_{dn}
\end{bmatrix}$$

We can create a matrix with $n$ rows and $m$ columns in MXNet by specifying a shape with two components `(n, m)` when calling any of our favorite functions for instantiating an `ndarray`, such as `ones` or `zeros`.

```python
A = nd.zeros((5, 4))
A
```

```
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
<NDArray 5x4 @cpu(0)>
```

We can also reshape any 1D array into a 2D array by calling the NDArray’s `reshape` method and passing in the desired shape.  
Note that the product of the shape components, $n \times m$, must equal the length of the original vector.

```python
x = nd.arange(20)
x
```

```
[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.  16.  17.  18.  19.]
<NDArray 20 @cpu(0)>
```

```python
A = x.reshape((5, 4))
A
```

```
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]
 [ 16.  17.  18.  19.]]
<NDArray 5x4 @cpu(0)>
```
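The output shows that `reshape` fills the new matrix row by row: entry $(i, j)$ of the $5 \times 4$ result is entry $4i + j$ of the original vector (with indices starting at 0). A minimal plain-Python sketch of that row-major mapping follows; `reshape_rows` is a hypothetical helper, not MXNet’s implementation.

```python
def reshape_rows(flat, n, m):
    """Arrange a flat list into n rows of m entries each (row-major order)."""
    if len(flat) != n * m:
        raise ValueError("n * m must equal the length of the input")
    # Row i takes the m consecutive entries starting at position i * m
    return [[flat[i * m + j] for j in range(m)] for i in range(n)]

x = [float(k) for k in range(20)]
A = reshape_rows(x, 5, 4)
print(A[2])     # row 2: [8.0, 9.0, 10.0, 11.0]
print(A[2][3])  # 11.0, which was x[2 * 4 + 3]
```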

Matrices are useful data structures: they allow us to organize data that have different modalities of variation.  
For example, returning to the medical data example, rows in our matrix might correspond to different patients, while columns might correspond to different attributes.  
We can access the scalar elements $a_{ij}$ of a matrix $A$ by specifying the indices for the row ($i$) and column ($j$) respectively.  
Let’s grab the element at row index 2 and column index 3 of the matrix `A` we created above. Because indexing starts at 0, this is the entry in the third row and fourth column.

```python
print('A[2, 3] = ', A[2, 3])
```

```
A[2, 3] =  
[ 11.]
<NDArray 1 @cpu(0)>
```


We can also grab the vectors corresponding to an entire row $a_{[i,:]}$ or a column $a_{[:,j]}$.

```python
print('row 2', A[2, :])
```

```
row 2 
[  8.   9.  10.  11.]
<NDArray 4 @cpu(0)>
```


```python
print('column 3', A[:, 3])
```

```
column 3 
[  3.   7.  11.  15.  19.]
<NDArray 5 @cpu(0)>
```
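If a matrix is stored as a plain-Python list of rows (a stand-in for an NDArray, used here only for illustration), grabbing a row is a single lookup, whereas a column has to be gathered from every row:

```python
# The same 5 x 4 matrix as above, built as a list of rows
A = [[float(4 * i + j) for j in range(4)] for i in range(5)]

row_2 = A[2]                   # one lookup returns the whole row
col_3 = [row[3] for row in A]  # gather entry 3 from every row

print(row_2)  # [8.0, 9.0, 10.0, 11.0]
print(col_3)  # [3.0, 7.0, 11.0, 15.0, 19.0]
```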


We can transpose the matrix through its `T` attribute.  
That is, if $B = A^T$, then $b_{ij} = a_{ji}$ for any $i$ and $j$.

```python
A.T
```

```
[[  0.   4.   8.  12.  16.]
 [  1.   5.   9.  13.  17.]
 [  2.   6.  10.  14.  18.]
 [  3.   7.  11.  15.  19.]]
<NDArray 4x5 @cpu(0)>
```
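For a matrix stored as a list of rows, the transpose can be sketched in plain Python with the built-in `zip`, which groups the $j$-th entries of all rows together. This only illustrates the definition $b_{ij} = a_{ji}$; it is not MXNet’s implementation of `T`.

```python
A = [[float(4 * i + j) for j in range(4)] for i in range(5)]  # 5 x 4

# zip(*A) yields one tuple per column of A; each becomes a row of the transpose
A_T = [list(col) for col in zip(*A)]

print(len(A_T), len(A_T[0]))  # 4 5
print(A_T[0])                 # [0.0, 4.0, 8.0, 12.0, 16.0]
```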

## Tensors

Just as vectors generalize scalars, and matrices generalize vectors, we can build data structures with even more axes.  
Tensors give us a generic way of discussing arrays with an arbitrary number of axes.  
Vectors, for example, are first-order tensors, and matrices are second-order tensors.  
Tensors will become more important when we start working with images, which arrive as 3D data structures, with axes corresponding to the height, the width, and the three (RGB) color channels.  
For now, we’ll skip the more advanced material and make sure you know the basics.

```python
X = nd.arange(24).reshape((2, 3, 4))
X
```

```
[[[  0.   1.   2.   3.]
  [  4.   5.   6.   7.]
  [  8.   9.  10.  11.]]

 [[ 12.  13.  14.  15.]
  [ 16.  17.  18.  19.]
  [ 20.  21.  22.  23.]]]
<NDArray 2x3x4 @cpu(0)>
```

```python
X.shape
```

```
(2, 3, 4)
```
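A third-order tensor can be pictured as a list of matrices. The plain-Python sketch below (nested lists standing in for an NDArray) rebuilds the same $2 \times 3 \times 4$ layout shown above; in row-major order, entry `[i][j][k]` holds the value `i * 12 + j * 4 + k`, where $12 = 3 \times 4$ is the number of entries in each matrix.

```python
# A 2 x 3 x 4 nested list with the same values as the X printed above
X = [[[float(i * 12 + j * 4 + k) for k in range(4)] for j in range(3)] for i in range(2)]

print(len(X), len(X[0]), len(X[0][0]))  # 2 3 4
print(X[1][2])     # last row of the second matrix: [20.0, 21.0, 22.0, 23.0]
print(X[1][0][2])  # 14.0
```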

## Element-wise operations

Oftentimes, we want to apply functions to arrays.  
Some of the simplest and most useful functions are the element-wise functions.  
These operate by performing a single scalar operation on the corresponding elements of two arrays.  
We can create an element-wise function from any function that maps a pair of scalars to a scalar.  

In math notation, we would denote such a function as $f: \mathcal{R} \times \mathcal{R} \rightarrow \mathcal{R}$.  
Given any two vectors $\mathbf{u}$ and $\mathbf{v}$ of the same shape, and the function $f$, we can produce a vector $\mathbf{c} = F(\mathbf{u},\mathbf{v})$ by setting $c_i \gets f(u_i, v_i)$ for all $i$.  
Here, we produced the vector-valued function $F: \mathcal{R}^d \times \mathcal{R}^d \rightarrow \mathcal{R}^d$ by lifting the scalar function to an element-wise vector operation.  
In MXNet, the standard arithmetic operators (`+`, `-`, `*`, `/`, `**`) have all been lifted to element-wise operations for identically shaped tensors of arbitrary shape.  
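The lifting described above can be sketched in plain Python: `lift` takes a scalar function `f` of two arguments and returns the element-wise vector function `F`. Both names are hypothetical and chosen to match the notation in the text; the sketch only illustrates the definition and is not how MXNet implements its operators.

```python
import operator

def lift(f):
    """Turn a scalar function f(a, b) into an element-wise vector function F."""
    def F(u, v):
        if len(u) != len(v):
            raise ValueError("u and v must have the same shape")
        # c_i <- f(u_i, v_i) for every index i
        return [f(ui, vi) for ui, vi in zip(u, v)]
    return F

add = lift(operator.add)
power = lift(operator.pow)

u = [1.0, 2.0, 4.0, 8.0]
v = [2.0, 2.0, 2.0, 2.0]
print(add(u, v))    # [3.0, 4.0, 6.0, 10.0]
print(power(u, v))  # [1.0, 4.0, 16.0, 64.0]
```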

```python
u = nd.array([1, 2, 4, 8])
u
```

```
[ 1.  2.  4.  8.]
<NDArray 4 @cpu(0)>
```

```python
v = nd.ones_like(u) * 2
v
```

```
[ 2.  2.  2.  2.]
<NDArray 4 @cpu(0)>
```

```python
u + v
```

```
[  3.   4.   6.  10.]
<NDArray 4 @cpu(0)>
```

```python
u * v
```

```
[  2.   4.   8.  16.]
<NDArray 4 @cpu(0)>
```

```python
u - v
```

```
[-1.  0.  2.  6.]
<NDArray 4 @cpu(0)>
```

```python
u / v
```

```
[ 0.5  1.   2.   4. ]
<NDArray 4 @cpu(0)>
```

```python
u ** v
```

```
[  1.   4.  16.  64.]
<NDArray 4 @cpu(0)>
```

We can call element-wise operations on any two tensors of the same shape, including matrices.

```python
B = nd.ones_like(A) * 3
B
```

```
[[ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]
 [ 3.  3.  3.  3.]]
<NDArray 5x4 @cpu(0)>
```

```python
A + B
```

```
[[  3.   4.   5.   6.]
 [  7.   8.   9.  10.]
 [ 11.  12.  13.  14.]
 [ 15.  16.  17.  18.]
 [ 19.  20.  21.  22.]]
<NDArray 5x4 @cpu(0)>
```

```python
A * B
```

```
[[  0.   3.   6.   9.]
 [ 12.  15.  18.  21.]
 [ 24.  27.  30.  33.]
 [ 36.  39.  42.  45.]
 [ 48.  51.  54.  57.]]
<NDArray 5x4 @cpu(0)>
```
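For matrices stored as nested lists, the same idea applies one level deeper: pair up corresponding rows, then corresponding entries. A minimal plain-Python sketch, assuming both inputs have identical shape; `elementwise` is a hypothetical helper, not an MXNet function.

```python
import operator

def elementwise(f, A, B):
    """Apply a scalar function f entry by entry to two equally shaped matrices."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must have the same shape")
    return [[f(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[float(4 * i + j) for j in range(4)] for i in range(5)]  # same values as A above
B = [[3.0] * 4 for _ in range(5)]                              # same values as B above

print(elementwise(operator.add, A, B)[0])  # [3.0, 4.0, 5.0, 6.0]
print(elementwise(operator.mul, A, B)[4])  # [48.0, 51.0, 54.0, 57.0]
```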

## Basic properties of tensor arithmetic