# Introduction to Numpy
## Luca de Alfaro, 2021


Prepared on: Tue Sep 14 20:08:33 2021

This is a book chapter; it is not a homework assignment.  
Do not submit it as a solution to a homework assignment; you would receive no credit.


In Python, if you have two numbers, you can add them (or subtract, multiply them, etc.): 

In [1]:
a = 3
b = 4
a + b


7

In [2]:
a - b


-1

However, if you have two _lists_ of numbers of the same length, you cannot _add_ or _subtract_ them element-wise, as if they were vectors: `+` concatenates the two lists, and `-` is not defined for lists at all: 

In [3]:
a = [3, 4, 5]
b = [1, 6, 2]
a + b


[3, 4, 5, 1, 6, 2]

In [4]:
import traceback

try:
    a - b
except TypeError:
    traceback.print_exc()


Traceback (most recent call last):
  File "<ipython-input-4-f64bb40f6ed3>", line 4, in <module>
    a - b
TypeError: unsupported operand type(s) for -: 'list' and 'list'


With plain lists, the way to do such operations is to loop explicitly over the elements: 

In [5]:
r = []
for i, x in enumerate(a):
    r.append(x + b[i])
r


[4, 10, 7]

Well, technically you can do it in one line: 

In [6]:
list(map(lambda x: x[0] + x[1], zip(a, b)))


[4, 10, 7]

... but that is frankly horrific. 

What we would like to have is a data type for arrays, and indeed for $n$-dimensional arrays (2-dimensional arrays are what we call matrices), so that we can operate on all their elements in a single step. 

And that's what Numpy gives us, among other things. 

## Numpy

[Numpy](https://numpy.org/) is the pre-eminent Python package for numerical computation; most other numerical and mathematical packages, such as [scipy](https://www.scipy.org/), are built on top of it.  Numpy's implementation is _very_ sophisticated, with a lot of attention devoted to speed and accuracy; the simple rule of thumb is: _if you can do it in Numpy, you should do it in Numpy._ 

If you are new to Numpy, after reading this very brief introduction, please head over to [learning Numpy](https://numpy.org/learn/), and in particular, to [the absolute basics for beginners](https://numpy.org/devdocs/user/absolute_beginners.html), which is in fact a fairly comprehensive introduction. 

The first step in using Numpy is to [install it](https://numpy.org/devdocs/user/absolute_beginners.html#installing-numpy) (it comes pre-installed in Colab), and then import it.  Numpy is traditionally imported as `np`: 

In [7]:
import numpy as np


Here is how we write the above in Numpy.

In [8]:
a = np.array([3, 4, 5])
b = np.array([1, 6, 2])
a + b


array([ 4, 10,  7])

In [9]:
a * b


array([ 3, 24, 10])
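The other arithmetic operators also work element-wise. The minimal sketch below redefines the same two arrays so that it runs on its own. Note that `/` is true division and always produces floating-point results, even for integer arrays:

```python
import numpy as np

a = np.array([3, 4, 5])
b = np.array([1, 6, 2])

diff = a - b   # element-wise subtraction: 2, -2, 3
quot = a / b   # element-wise true division, result is float: 3.0, 0.666..., 2.5
sq = a ** 2    # element-wise power: 9, 16, 25
print(diff, quot, sq)
```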

As you see, `np.array( ... )` turns a list into a Numpy array.  You can also generate 2D arrays in this way: 

In [10]:
np.array([[2, 3], [4, 5]])


array([[2, 3],
       [4, 5]])
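A Numpy array knows its own shape and number of dimensions, and a 2D array is indexed by row and then column. Here is a short sketch on the same 2×2 matrix:

```python
import numpy as np

m = np.array([[2, 3], [4, 5]])
print(m.shape)   # (2, 2): 2 rows, 2 columns
print(m.ndim)    # 2: the number of dimensions
print(m[0, 1])   # 3: row 0, column 1
print(m[1])      # [4 5]: the whole second row
```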

You can also easily generate arrays of specified dimensions, filled with zeros, ones, or random numbers: 

In [11]:
z = np.zeros((4, 5))
z


array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [12]:
u = np.ones((4, 5))
r = np.random.random((4, 5))
r


array([[0.06716521, 0.23352666, 0.84960545, 0.91169941, 0.57243929],
       [0.13305689, 0.98448076, 0.97101174, 0.80094473, 0.14393591],
       [0.8111104 , 0.72155706, 0.73145156, 0.83966825, 0.95896137],
       [0.63570908, 0.00734718, 0.39092722, 0.71318365, 0.50115166]])

Notice that an array of size (4, 5) has _4 rows and 5 columns_. 
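The rows-first convention also shows up in indexing: `z[i]` selects row `i`, and `z[:, j]` selects column `j`. Here is a small sketch on a matrix of zeros:

```python
import numpy as np

z = np.zeros((4, 5))
print(z.shape)         # (4, 5)
print(len(z))          # 4: len() counts along the first dimension, the rows
print(z[0].shape)      # (5,): one row has 5 elements
print(z[:, 0].shape)   # (4,): one column has 4 elements
```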

### Broadcasting

In Numpy, if you have a matrix (a 2D array) of size $4 \times 5$, such as `r`:

In [13]:
r.shape


(4, 5)

you can add to it (or subtract from it, multiply it by, etc.) any of the following:

* A number.  In this case, the number is _broadcast_ to every element of the matrix, that is, the number is added to all of them:

In [14]:
r + 3


array([[3.06716521, 3.23352666, 3.84960545, 3.91169941, 3.57243929],
       [3.13305689, 3.98448076, 3.97101174, 3.80094473, 3.14393591],
       [3.8111104 , 3.72155706, 3.73145156, 3.83966825, 3.95896137],
       [3.63570908, 3.00734718, 3.39092722, 3.71318365, 3.50115166]])

* An array of size 5.  In this case, each of its 5 elements is _broadcast_ (repeated) down all 4 rows of the corresponding column: 

In [15]:
r + np.array([1, 2, 3, 4, 5])


array([[1.06716521, 2.23352666, 3.84960545, 4.91169941, 5.57243929],
       [1.13305689, 2.98448076, 3.97101174, 4.80094473, 5.14393591],
       [1.8111104 , 2.72155706, 3.73145156, 4.83966825, 5.95896137],
       [1.63570908, 2.00734718, 3.39092722, 4.71318365, 5.50115166]])

* An array of size (4, 5), of course, in which case the operation is simply element-wise. 

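Thus, in [broadcasting](https://numpy.org/devdocs/user/absolute_beginners.html#broadcasting), Numpy (conceptually) replicates the array with fewer dimensions along the missing dimensions, and along any dimension of size 1, so that it matches the shape of the higher-dimensional array.  The shapes are compared starting from the last dimension: two dimensions are compatible if they are equal, or if one of them is 1. 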
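Shape compatibility can be checked directly. The minimal sketch below uses a matrix of zeros in place of `r`:

* An array of shape (5,) combines with a (4, 5) matrix.
* A column of shape (4, 1), built here with `reshape`, is repeated across all 5 columns.
* A plain array of 4 elements is rejected with a `ValueError`, because its only dimension is compared against the *last* dimension of the matrix (5), not the first.

```python
import numpy as np

r = np.zeros((4, 5))

row = np.array([1, 2, 3, 4, 5])                   # shape (5,): matches the last dimension
print((r + row).shape)                            # (4, 5)

col = np.array([10, 20, 30, 40]).reshape((4, 1))  # shape (4, 1)
print((r + col)[:, 0])                            # [10. 20. 30. 40.]: each value fills its row

broadcast_failed = False
try:
    r + np.array([10, 20, 30, 40])                # shape (4,): 4 != 5, incompatible
except ValueError as e:
    broadcast_failed = True
    print("ValueError:", e)
```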

### Comparisons

In Numpy, you can compare two arrays, or an array and a number.  The result is an array of booleans (True/False):

In [16]:
r > 0.5


array([[False, False,  True,  True,  True],
       [False,  True,  True,  True, False],
       [ True,  True,  True,  True,  True],
       [ True, False, False,  True,  True]])

In [17]:
u = np.random.random(r.shape)
r > u


array([[False,  True, False,  True, False],
       [False,  True,  True,  True, False],
       [ True,  True,  True,  True,  True],
       [ True, False,  True, False,  True]])
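These boolean arrays are useful in their own right. Summing one counts its `True` entries, since `True` counts as 1. Indexing an array with one selects the matching elements. Here is a short sketch with a small matrix whose values are chosen for illustration:

```python
import numpy as np

r = np.array([[0.2, 0.7, 0.9],
              [0.6, 0.1, 0.4]])

mask = r > 0.5     # boolean array with the same shape as r
print(mask.sum())  # 3: the number of elements above 0.5
print(r[mask])     # [0.7 0.9 0.6]: the selected elements, row by row
```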

### Max, min and company

In Numpy, you can find the maximum and minimum elements of an array as follows: 

In [18]:
r.max()


0.9844807613238397

or also:

In [19]:
np.max(r)


0.9844807613238397

In [20]:
r.min()


0.007347183948525737

In [21]:
r.sum()


11.97893351653547
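Because `r` is random, its values change at every run. On a small fixed matrix, you can check that the method form (`m.max()`) and the function form (`np.max(m)`) give the same results:

```python
import numpy as np

m = np.array([[1, 8, 3],
              [6, 2, 7]])

print(m.max(), np.max(m))  # 8 8
print(m.min(), np.min(m))  # 1 1
print(m.sum(), np.sum(m))  # 27 27
```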

You can also take maxima (and minima, sums) along rows or columns.  You choose the direction via the `axis` argument: `axis=0` refers to the first coordinate, `axis=1` to the second, and so forth.  So if you have a matrix of size (4, 5), with 4 rows and 5 columns, such as `r`, `axis=0` takes the max along the first dimension, the one with 4 elements.  The result has 5 elements: 

In [22]:
r.max(axis=0)


array([0.8111104 , 0.98448076, 0.97101174, 0.91169941, 0.95896137])

Thus, the axis specifies the dimension to be eliminated.  Above, the maximum was taken along each column, eliminating the rows.  If you want to take the maximum along each row, eliminating the columns, you do: 

In [23]:
r.max(axis=1)


array([0.91169941, 0.98448076, 0.95896137, 0.71318365])
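The same rule applies on a small fixed matrix. The `keepdims=True` argument keeps the eliminated axis with length 1 instead of dropping it. The result then broadcasts back against the original matrix, which is handy, for instance, to subtract each row's maximum:

```python
import numpy as np

m = np.array([[1, 8, 3],
              [6, 2, 7]])  # shape (2, 3)

print(m.max(axis=0))  # [6 8 7]: max of each column, the rows are eliminated
print(m.max(axis=1))  # [8 7]: max of each row, the columns are eliminated
print(m.sum(axis=1))  # [12 15]: row sums

# keepdims=True gives shape (2, 1), which broadcasts across the 3 columns.
row_max = m.max(axis=1, keepdims=True)
print(m - row_max)    # [[-7  0 -5]
                      #  [-1 -5  0]]
```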

Just as you can take the max and min of an array, you can also take its standard deviation, mean (average), etc.: 

In [24]:
np.std(r)


0.31717657010560657

In [25]:
r.mean()


0.5989466758267735
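By default, `np.std` computes the population standard deviation $\sigma = \sqrt{\frac{1}{N}\sum_i (x_i - \bar{x})^2}$, dividing by the number of elements $N$ (this corresponds to the parameter `ddof=0`). Passing `ddof=1` divides by $N - 1$ instead, giving the sample standard deviation. A quick check of the formula on a small array:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = x.mean()                             # 5.0
manual = np.sqrt(((x - mean) ** 2).mean())  # population std, computed from the formula
print(mean, np.std(x), manual)              # 5.0 2.0 2.0
print(np.std(x, ddof=1))                    # sample std: divides by N - 1 = 7
```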

### Elementwise maxima and minima

You can also compute the _elementwise maxima_ between two matrices: 

In [26]:
u = np.random.random(u.shape)
r


array([[0.06716521, 0.23352666, 0.84960545, 0.91169941, 0.57243929],
       [0.13305689, 0.98448076, 0.97101174, 0.80094473, 0.14393591],
       [0.8111104 , 0.72155706, 0.73145156, 0.83966825, 0.95896137],
       [0.63570908, 0.00734718, 0.39092722, 0.71318365, 0.50115166]])

In [27]:
u


array([[0.67999056, 0.46723918, 0.75116126, 0.60940779, 0.32537677],
       [0.31515462, 0.18487964, 0.70956537, 0.12925641, 0.42620673],
       [0.7550092 , 0.30916278, 0.44060395, 0.40360701, 0.848729  ],
       [0.32562806, 0.09036037, 0.57615683, 0.37699371, 0.63463777]])

In [28]:
np.maximum(r, u)


array([[0.67999056, 0.46723918, 0.84960545, 0.91169941, 0.57243929],
       [0.31515462, 0.98448076, 0.97101174, 0.80094473, 0.42620673],
       [0.8111104 , 0.72155706, 0.73145156, 0.83966825, 0.95896137],
       [0.63570908, 0.09036037, 0.57615683, 0.71318365, 0.63463777]])

Note the difference between `np.max`, which returns the largest element of a single array, and `np.maximum`, which compares two arrays element by element and returns an array containing the larger of each pair of corresponding elements. 
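The difference is easy to see on two small arrays. `np.maximum` also broadcasts, so comparing with the number 0 replaces every negative entry by 0:

```python
import numpy as np

x = np.array([1, 5, -3])
y = np.array([4, 2, -1])

print(np.max(x))         # 5: a single number, the largest element of x
print(np.maximum(x, y))  # [ 4  5 -1]: the larger of x[i] and y[i], for each i
print(np.maximum(x, 0))  # [1 5 0]: the 0 is broadcast to every element
```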

### A word on types

Numpy has its own [type system](https://numpy.org/doc/stable/user/basics.types.html), so you can create matrices and arrays of integers, floating-point numbers, unsigned integers, etc., by specifying the `dtype`: 

In [29]:
a = np.array([3, 4, 5], dtype=np.uint8)
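The `dtype` determines both the memory each element takes and how arithmetic behaves. Here is a sketch with `np.uint8`, which holds unsigned 8-bit integers from 0 to 255. Array arithmetic on this type wraps around modulo 256 rather than raising an error. Converting with `astype` to a wider type avoids this:

```python
import numpy as np

a = np.array([3, 4, 5], dtype=np.uint8)
print(a.dtype)     # uint8
print(a.itemsize)  # 1: each element takes one byte

# 250 + 10 = 260 does not fit in 8 bits, so it wraps to 260 - 256 = 4.
wrapped = np.array([250], dtype=np.uint8) + np.array([10], dtype=np.uint8)
print(wrapped)     # [4]

# Convert to a wider integer type before operations that may overflow.
safe = a.astype(np.int64) * 100
print(safe)        # [300 400 500]
```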
