# NumPy 101 (a crash course for students)

A notebook to practice with ***NumPy***. 

*DISCLAIMER: This material is intended as a pragmatic shortcut to get you started, and should NOT stop you from attending a more complete course/tutorial.*

## Python list vs NumPy array

Get familiar with _ndarrays_. Here is a comparison between a **Python list** (L) and a **NumPy array** (A).

In [None]:
L = [1,2,3]
for i in L:
    print(i)

1
2
3


In [None]:
import numpy as np

In [None]:
A = np.array([1,2,3])

In [None]:
for i in A:
    print(i)

1
2
3


They look identical. Are they?!

Try to do something to your list/array, e.g. append an item.

In [None]:
L.append(4)
L

[1, 2, 3, 4]

In [None]:
A.append(4)

AttributeError: ignored

This does not work because a NumPy array has no `append` method.

Try to join lists.

In [None]:
L = [1,2,3,4]
L = L + [5]
L

[1, 2, 3, 4, 5]

In [None]:
A = np.array([1,2,3])
A = A + [4,5]    

ValueError: ignored

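This also does _not_ work on NumPy arrays: for arrays, `+` means element-wise addition, so the two operands must have compatible shapes (here 3 elements vs. 2).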

It seems that NumPy cannot do even the most trivial things that a Python list can do! **Be patient...**
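
For the record, both operations do exist in NumPy, just under different names. The sketch below uses `np.append` and `np.concatenate`; the variable names `A_appended` and `A_joined` are only illustrative. Note that both calls return a new array instead of modifying `A` in place.

```python
import numpy as np

A = np.array([1, 2, 3])

# np.append is a module-level function (not a method of the array).
# It returns a NEW array and leaves A untouched.
A_appended = np.append(A, 4)

# np.concatenate joins a sequence of arrays (or array-likes) end to end.
A_joined = np.concatenate([A, [4, 5]])

print(A)           # [1 2 3]
print(A_appended)  # [1 2 3 4]
print(A_joined)    # [1 2 3 4 5]
```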

Try vector addition now. 

In [None]:
L = [1,2,3,4,5]
L+L     # mmh, is it the right way to go?

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

Not really OK..

In [None]:
L = [1,2,3,4,5]
L2 = []
for i in L:
    L2.append(i+i)
L2

[2, 4, 6, 8, 10]

Ok, a bit ugly, but it works.

What about adding a vector to itself in NumPy, then? Let's try using the "+" sign as we wanted to do before with lists.

In [None]:
A = np.array([1,2,3])
A+A

array([2, 4, 6])

**Cool! Done, and so intuitively!**

This concept naturally extends to more dimensions: if you have an N-dimensional array (e.g. a matrix, which is 2D), doing A+A works. It is still element-wise addition.
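
As a quick illustration of the 2D case, here is a minimal sketch with a small 2x2 array (the name `M` is chosen here just for the example):

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

# Each entry is added to the entry in the same position.
M_sum = M + M
print(M_sum)
# [[2 4]
#  [6 8]]
```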

Another way to get the same result is to multiply by 2, i.e. to multiply a vector by a scalar.

In [None]:
2*L

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

Oops... let's try with NumPy now.

In [None]:
2*A

array([2, 4, 6])

Done!

With L, `2*L` repeats (concatenates) the list; with A, `2*A` multiplies every element by 2.

Try an element-wise squaring of every element.

In [None]:
L**2

TypeError: ignored

Ok, not a good way to go. You can do it with a loop:

In [None]:
L2 = []
for i in L:
    L2.append(i*i) # or "i**2"
L2

[1, 4, 9, 16, 25]

OK. As you may have guessed already, this is much simpler with NumPy arrays. Let's try:

In [None]:
A**2

array([1, 4, 9])

Done!

Other examples?
   * `np.sqrt` computes the element-wise square root of the input vector
   * `np.log` computes the element-wise natural logarithm
   * `np.exp` computes the element-wise exponential, etc.

In [None]:
np.sqrt(A)

array([1.        , 1.41421356, 1.73205081])

In [None]:
np.log(A)

array([0.        , 0.69314718, 1.09861229])

In [None]:
np.exp(A)

array([ 2.71828183,  7.3890561 , 20.08553692])

Lists are useful too, and sometimes a list is all you need. You can usually treat a list as an array in the container sense; with NumPy you can treat an array as a vector, i.e. a mathematical object.

Element-wise operations on lists require a `for` loop. Python loops are slow compared with NumPy's vectorized operations and should be avoided as much as possible for performance reasons.
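
To get a feeling for the difference, the sketch below times the element-wise squaring from earlier, once with a Python loop and once with a NumPy expression, using the standard-library `timeit` module. The helper names and the array size are arbitrary choices for this example, and the actual timings depend on your machine.

```python
import timeit
import numpy as np

n = 10_000
L_big = list(range(n))     # Python list
A_big = np.arange(n)       # NumPy array with the same content

def square_with_loop():
    # Explicit Python loop over the list
    out = []
    for x in L_big:
        out.append(x * x)
    return out

def square_with_numpy():
    # Vectorized: the loop runs inside NumPy's compiled code
    return A_big ** 2

# Run each version 20 times and report the total time in seconds.
t_loop = timeit.timeit(square_with_loop, number=20)
t_numpy = timeit.timeit(square_with_numpy, number=20)

print("loop :", t_loop)
print("numpy:", t_numpy)
```

On most machines the NumPy version should come out clearly faster, but treat the numbers as indicative only.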

## Dot product (or "inner product")

The dot product is an important type of multiplication you can perform on vectors; it returns a scalar.

$\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T\mathbf{b} = \sum_{i=1}^N a_i b_i $

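It is related to the magnitudes of the two vectors and the angle between them:

$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos \theta_{ab}$

i.e.

$\cos \theta_{ab} = \frac{\mathbf{a}^T \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$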

The transpose notation implies that vectors are columns by default: transposition turns the column vector $\mathbf{a}$ into a row vector, which is then multiplied element by element with $\mathbf{b}$, and the products are summed.
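
The "multiply element by element, then sum" reading of the formula can be checked directly. This is a small sketch with two example vectors `u` and `v` (names and values chosen here just for illustration):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Sum of element-wise products: 1*4 + 2*5 + 3*6
dot_by_hand = np.sum(u * v)

# The same quantity through NumPy's dot product
dot_numpy = np.dot(u, v)

print(dot_by_hand, dot_numpy)   # 32 32
```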

## <font color='red'>Exercise 1</font>

Suppose you have the vectors a and b below. How would you compute the cosine above using NumPy? Can you do it in the fewest lines of code?

In [None]:
a = np.array([1,2])
b = np.array([2,1])

## <font color='green'>Solution to Exercise 1</font>

In [None]:
# add your solution here

## <font color='red'>Exercise 2</font>: speed comparison

Given a and b as in the next cell, use the timing hint in the cell after it to write code that compares the performance of the dot product computed with Python lists in a loop, with np.dot(a,b) and with a.dot(b).

In [None]:
a = np.random.randn(100) # 100 random entries, standard normal distribution
b = np.random.randn(100)

In [None]:
from datetime import datetime

t0 = datetime.now()
print("this is the instruction I want to measure completion time of")
dt = datetime.now() - t0

print("It took: ", dt)

this is the instruction I want to measure completion time of
It took:  0:00:00.001641


## <font color='green'>Solution to Exercise 2</font>

In [None]:
# add your solution here

What we saw is useful for vectors, i.e. 1D arrays. What about nD arrays?

# Matrices

A matrix can be thought of as a 2D array. Alternatively, it can be thought of as a list of lists (indeed, you can use a list of lists to initialize a matrix, for example). Convention: the first index is the row, the second is the column.

A list of lists is as follows:

In [None]:
L = [[1,2], [3,4]]
L

[[1, 2], [3, 4]]

A matrix in NumPy is as follows:

In [None]:
M = np.array([[1,2], [3,4]])   # NOTE: the 2 lists must be of the same size
M

array([[1, 2],
       [3, 4]])

Or:

In [None]:
M = np.array(L)   # NOTE: the 2 lists must be of the same size
M

array([[1, 2],
       [3, 4]])

_NOTE: the printout seems to indicate already that it is adequate for matrices.._

Let's now see how to get one element of the matrix.

With a Python list of lists:

In [None]:
L[0]   # first element of the list, which is a list itself

[1, 2]

In [None]:
L[0][0]   # first element of the first list of the 2 lists in the list of lists..

1

In [None]:
L[0,0]   # error! and an interesting one!

TypeError: ignored

The same works with a NumPy array, but there is a much better notation, which helps A LOT when dealing with more complicated matrices and related tasks.

In [None]:
M[0][0]

1

In [None]:
M[0,0]   # while L[0,0] did not work (as list indices must be integers, not tuple), this works!

1

Cool! It works exactly with the easy syntax I would like to see working!

Side note: NumPy tends to overdo it: there is also a dedicated data type for matrices.

In [None]:
M2 = np.matrix([[1,2], [3,4]])
M2

matrix([[1, 2],
        [3, 4]])

NOTE: The official NumPy documentation actually recommends _against_ using `matrix` (!). A matrix can be converted back to a regular array:

In [None]:
A2 = np.array(M2)
A2

array([[1, 2],
       [3, 4]])

Why is it so cool? See below..

One operation is the Transpose (T).

In [None]:
M3 = np.array([[1,2,3], [3,4,5]])
M3

array([[1, 2, 3],
       [3, 4, 5]])

In [None]:
M3.T

array([[1, 3],
       [2, 4],
       [3, 5]])

Done! `.T` and it works. Wow!

An n×m matrix is just a 2D NumPy array. A vector is just a 1D NumPy array (mathematically, you can also see it as an n×1 matrix). So a matrix is, loosely speaking, just a 2D vector. Actually, it is better to think that **a vector / matrix is a 1D / 2D mathematical object that contains numbers**.

> From a NumPy standpoint, the only thing you should care about is that you are dealing with 1D, 2D, nD NumPy arrays, regardless of whether they represent vectors or matrices.

## Different ways to generate arrays of data

Type in a list as array content (integers here):

In [None]:
A1 = np.array([1,2,3,4,5])
A1

array([1, 2, 3, 4, 5])

Create an array of all zeros (floating point here):

In [None]:
Z = np.zeros(10)
Z

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Create a matrix of all zeros (floating point here): easy, just indicate the dimensions.

In [None]:
Z = np.zeros((10,2))
Z

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

NOTE: the input to both calls above is the shape: a single integer for a 1D array, or a tuple containing the size of each dimension.

Equivalent function to create arrays/matrices filled with 1s:

In [None]:
O = np.ones(10)
O

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [None]:
O = np.ones((10,2))
O

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

Create an array (or a matrix) of random numbers:

In [None]:
R = np.random.random(10)   # uniformly distributed numbers in the [0, 1) interval
R
# run this cell several times...

array([0.12037475, 0.89347665, 0.58361663, 0.64910343, 0.73290824,
       0.72619157, 0.92852114, 0.22261852, 0.76975294, 0.31571784])

In [None]:
G = np.random.randn(10)   # Gaussian-distributed numbers with mean 0 and variance 1
G

array([ 0.32277303, -0.57532071,  0.21614009, -0.96114619,  0.80004796,
       -1.26276121, -0.72569263, -0.28043311,  1.30228416,  1.19066893])

NOTE: For `random.randn`, the dimensions must be passed as separate integers, not as a tuple (the same holds for `random.rand`). Most other functions, such as `np.random.random`, `np.zeros` and `np.ones`, take a tuple with no problem.

So, this fails:

In [None]:
G = np.random.randn((10,10))   # gaussian distributed numbers with mean 0 and variance 1 - but with a mistake
G

TypeError: ignored

Whereas this works:

In [None]:
G = np.random.randn(100,100)   # gaussian distributed numbers with mean 0 and variance 1
G

array([[ 0.81672388, -0.59807542,  0.82273115, ..., -0.5547134 ,
         0.14755986, -0.92604466],
       [ 0.05768864,  0.38861949,  0.34427014, ..., -0.58838094,
        -2.79274728, -1.31865387],
       [ 1.26526202,  0.1947656 , -1.20343908, ..., -1.43179222,
        -0.32189814, -0.17570001],
       ...,
       [-0.30986353, -1.01633221,  0.40539997, ..., -0.05438191,
         0.11466811, -1.72935412],
       [-0.25327333,  0.22643492,  0.21713536, ...,  1.37370012,
        -0.86664678,  0.57864801],
       [-1.46350674, -0.01684094,  0.44200209, ..., -1.00508827,
        -0.87148637, -0.31609668]])
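
If you prefer to pass the shape as a tuple, NumPy also provides `np.random.standard_normal`, which draws from the same distribution. The sketch below shows both styles side by side, plus the unpacking trick for `randn`; the variable names are only illustrative.

```python
import numpy as np

shape = (10, 10)

# Shape given as a tuple
G_tuple = np.random.standard_normal(shape)

# Shape given as separate integers
G_ints = np.random.randn(10, 10)

# A tuple can be unpacked into separate arguments with *
G_unpacked = np.random.randn(*shape)

print(G_tuple.shape, G_ints.shape, G_unpacked.shape)
```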

NumPy arrays also have methods to compute statistics, e.g.:

In [None]:
G.mean()   # mean

-0.0055443057161795786

In [None]:
G.var()   # variance

1.007861340905237

NOTE: the values are pretty close to the true values (mean 0 and variance 1).



## <font color='red'>Exercise 3</font>

Increase the number of random values so that the sample mean and variance match the exact values more closely.

# Matrix products

In matrix multiplication, the inner dimensions must match. If I have a matrix A of size (2,3) and a matrix B of size (3,3), I am allowed to multiply A·B but not B·A. This requirement comes from the definition of the matrix product:

$ C(i,j) = \sum_{k=1}^N A(i,k) \cdot B(k,j)$

where the $(i,j)$th element of C is the dot product between row $A(i,:)$ and column $B(:,j)$, with $k$ acting as the dummy (summation) index. In NumPy, this is

    np.dot(A,B)
    
or

    A.dot(B)

Often one wants to multiply matrix elements, i.e. element-wise multiplication.

$C(i,j) = A(i,j) * B(i,j)$

We saw that this "asterisk" operator works for vectors; we want the same for matrices and, in general, for nD NumPy arrays. Both multidimensional arrays must have the same shape, though (or shapes that can be broadcast together; see the broadcasting example near the end).

> In NumPy, the "asterisk" ($C(i,j) = A(i,j) * B(i,j)$) means element-by-element (element-wise) multiplication, while the "dot" ($C(i,j) = \sum_{k=1}^N A(i,k) \cdot B(k,j)$) means dot product, or **inner product**, i.e. matrix multiplication.
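
The shape rule and the two kinds of product can be tried out on the (2,3) and (3,3) case mentioned above. This is a minimal sketch; the array contents are arbitrary and the flag `ba_failed` exists only to record what happened.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.arange(9).reshape(3, 3)   # shape (3, 3)

# Matrix product: inner dimensions are 3 and 3, so this works.
C = np.dot(A, B)
print(C.shape)                   # (2, 3)

# The other way round the inner dimensions are 3 and 2: NumPy raises an error.
try:
    np.dot(B, A)
    ba_failed = False
except ValueError as err:
    ba_failed = True
    print("np.dot(B, A) failed:", err)

# Element-wise product: both operands have the same shape.
E = A * A
print(E)
```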

## Other common matrix operations

Create a matrix to use.

In [None]:
A = np.array([[1,2],[3,4]])
A

array([[1, 2],
       [3, 4]])

## Matrix inverse

(plus check it is ok):

In [None]:
Ainv = np.linalg.inv(A)
Ainv

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [None]:
np.dot(A,Ainv)   # check that their dot product is the identity matrix - equivalent to A.dot(Ainv) or Ainv.dot(A)

array([[1.0000000e+00, 0.0000000e+00],
       [8.8817842e-16, 1.0000000e+00]])
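
The tiny non-zero entry above is floating-point round-off, so an exact comparison with the identity matrix would fail. One common way to check "equal up to round-off" is `np.allclose`, sketched here:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
Ainv = np.linalg.inv(A)

# Compare A·Ainv with the 2x2 identity, allowing for a small numerical tolerance.
is_identity = np.allclose(np.dot(A, Ainv), np.eye(2))
print(is_identity)   # True
```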

## Matrix determinant

In [None]:
np.linalg.det(A)

-2.0000000000000004
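
For a 2x2 matrix the determinant can be verified by hand with the formula $ad - bc$. The sketch below compares it with NumPy's result, which differs only by floating-point round-off:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# 2x2 determinant by hand: a*d - b*c = 1*4 - 2*3
det_by_hand = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]

det_numpy = np.linalg.det(A)

print(det_by_hand)                          # -2
print(np.isclose(det_by_hand, det_numpy))   # True
```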

## Diagonal of a matrix

In [None]:
np.diag(A)   # it returns the diagonal as a vector

array([1, 4])

Given a vector, `np.diag` also builds a matrix that is all zeros except for the diagonal, which holds the vector:

In [None]:
diag = [1,2]
np.diag(diag)

array([[1, 0],
       [0, 2]])

## Outer product (+ review of previously introduced product types)

Outer product comes up when e.g. you calculate the covariance of some sample vector.

$E \{(x-\bar{x})(x-\bar{x})^T\} \approx \frac{1}{N-1} \sum_{i=1}^{N} (x_{i}-\bar{x})(x_{i}-\bar{x})^T$

Let's review all we saw, plus add the aforementioned product.

In [None]:
a = np.array([1,2])
b = np.array([3,4])

Element-wise product: $C(i,j) = A(i,j) * B(i,j)$

In [None]:
a*b

array([3, 8])

Inner (dot) product: $C(i,j) = \sum_{k=1}^N A(i,k) \cdot B(k,j)$

In [None]:
np.dot(a,b)

11

Outer product: $C(i,j) = A(i) B(j)$ (whereas the inner product is the sum over $i$ of $A(i)B(i)$)

In [None]:
np.outer(a,b)

array([[3, 4],
       [6, 8]])

Note that its diagonal is the element-wise product:

In [None]:
np.diag(np.outer(a,b))

array([3, 8])

And the trace gives the inner product:

In [None]:
np.trace(np.outer(a,b))

11
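
Coming back to the covariance mentioned at the start of this section, the sample covariance can be built as a sum of outer products of the centered samples. The sketch below does this on made-up random data and compares the result with `np.cov`; the data, the seed and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded generator, only for reproducibility
X = rng.standard_normal((500, 2))    # 500 samples of a 2-dimensional vector x

x_bar = X.mean(axis=0)               # sample mean, one value per component
N = X.shape[0]

# Sum of outer products of the centered samples, divided by N - 1.
cov_by_hand = sum(np.outer(x - x_bar, x - x_bar) for x in X) / (N - 1)

# np.cov expects one variable per row by default, hence rowvar=False here.
cov_numpy = np.cov(X, rowvar=False)

print(cov_by_hand.shape)                     # (2, 2)
print(np.allclose(cov_by_hand, cov_numpy))   # True
```

The explicit Python loop is there only to mirror the formula; for real data `np.cov` is the simpler choice.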

> In case of explorations of nD operations, with n>1, take note of the following:
> 
> **numpy.dot**: for 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b.<br>
>
> **numpy.inner**: ordinary inner product of vectors for 1-D arrays (without complex conjugation), in higher dimensions a sum product over the last axes.
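
The difference between the two functions shows up as soon as the inputs are 2D. A small sketch with two arbitrary 2x2 arrays `P` and `Q` (names chosen for this example):

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

# np.dot: rows of P combined with COLUMNS of Q (matrix multiplication)
print(np.dot(P, Q))
# [[19 22]
#  [43 50]]

# np.inner: rows of P combined with ROWS of Q (sum over the last axes)
print(np.inner(P, Q))
# [[17 23]
#  [39 53]]

# For 2D inputs, np.inner(P, Q) matches np.dot(P, Q.T)
same = np.array_equal(np.inner(P, Q), np.dot(P, Q.T))
print(same)   # True
```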

## Back to the start

Let's redo a few things to consolidate the concepts.

Here is how easily one converts a **Python list** into a **NumPy array**.

In [None]:
import numpy   # NOTE: I am deliberately NOT adding "as np" so you get familiar with seeing this too..

In [None]:
mylist = [1, 2, 3]                    # Python list
myarray = numpy.array(mylist)         # NumPy array

print(mylist)
print(myarray)
print(myarray.shape)                  # shape: size along each dimension
print(myarray[0])

[1, 2, 3]
[1 2 3]
(3,)
1


Array notation and ranges can be used to efficiently access data in a NumPy array.

In [None]:
# access values
mylist = [[1, 2, 3], [3, 4, 5]]
myarray = np.array(mylist)

print(myarray)
print(myarray.ndim)
print(myarray.size)
print(myarray.shape)
print(myarray.dtype)


print("\nFirst row: %s" % myarray[0])
print("Last row: %s" % myarray[-1])
print("Specific row and col: %s" % myarray[0, 2])        # myarray[row, column]
print("Whole col: %s" % myarray[:, 2])

[[1 2 3]
 [3 4 5]]
2
6
(2, 3)
int64

First row: [1 2 3]
Last row: [3 4 5]
Specific row and col: 3
Whole col: [3 5]


In [None]:
myarray = np.arange(4)

print(myarray)

[0 1 2 3]


In [None]:
myarray = np.arange(1,25)

print(myarray)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]


In [None]:
myarray = np.arange(1,25).reshape(2,3,4)

print(myarray)

[[[ 1  2  3  4]
  [ 5  6  7  8]
  [ 9 10 11 12]]

 [[13 14 15 16]
  [17 18 19 20]
  [21 22 23 24]]]


In [None]:
myarray = np.arange(6).reshape(1,3,2)

print(myarray)

[[[0 1]
  [2 3]
  [4 5]]]
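
In the reshapes above, the product of the new dimensions always equals the number of elements (2·3·4 = 24 and 1·3·2 = 6). The sketch below shows what happens otherwise, and how `-1` lets NumPy infer one dimension; the variable names are only illustrative.

```python
import numpy as np

flat = np.arange(24)            # 24 elements

cube = flat.reshape(2, 3, 4)    # 2 * 3 * 4 = 24, so this works
auto = flat.reshape(2, -1)      # -1 lets NumPy infer the missing dimension (12)
print(cube.shape, auto.shape)   # (2, 3, 4) (2, 12)

# A shape whose product is not 24 is rejected.
try:
    flat.reshape(5, 5)          # 5 * 5 = 25, which does not match 24
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```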


NumPy arrays can be used directly in arithmetic.

In [None]:
myarray1 = np.array([2, 2, 2])
myarray2 = np.array([3, 3, 3])

print("Addition: %s" % (myarray1 + myarray2))
print("Multiplication: %s" % (myarray1 * myarray2))

Addition: [5 5 5]
Multiplication: [6 6 6]


In [None]:
myarray = np.arange(9).reshape(3,3)
print(myarray)
myarray+myarray

[[0 1 2]
 [3 4 5]
 [6 7 8]]


array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [None]:
myarray*myarray

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]])

In [None]:
myarray+1

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
myarray*4+1

array([[ 1,  5,  9],
       [13, 17, 21],
       [25, 29, 33]])

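Broadcasting: when two arrays have different but compatible shapes, NumPy stretches the smaller one to match the larger one. Here a 1D array with 3 elements is added to every row of a 3x3 array.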

In [None]:
myarray2 = np.array([10,10,10])

print(myarray)
print(myarray2)
print(myarray+myarray2)

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[10 10 10]
[[10 11 12]
 [13 14 15]
 [16 17 18]]
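
Broadcasting also works along the other axis if the smaller array is shaped as a column. A minimal sketch, with array names and values chosen just for this example:

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)

row = np.array([10, 20, 30])             # shape (3,): added to every row
col = np.array([[100], [200], [300]])    # shape (3, 1): added to every column

print(grid + row)
# [[10 21 32]
#  [13 24 35]
#  [16 27 38]]

print(grid + col)
# [[100 101 102]
#  [203 204 205]
#  [306 307 308]]
```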


> There is a lot more to NumPy arrays, but these examples give you a flavor of the efficiency they provide when you work with lots of numerical data.

## Done. That's all for the NumPy 101 crash course.

The appetizer is over. Learn more from a good NumPy course and/or tutorial.

## What we have learnt

Basics of NumPy.

Understanding the effective use of NumPy arrays is fundamental to effective numerical computing in Python.

## Reading material

* SciPy Lecture Notes, http://www.scipy-lectures.org/
* NumPy User Guide, http://docs.scipy.org/doc/numpy/user/