# The Python ecosystem - The NumPy library

[NumPy](http://www.numpy.org/) is the fundamental package for scientific computing with Python. It contains among other things:
* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

_Please note that this walkthrough is heavily inspired by a [tutorial](http://cs231n.github.io/python-numpy-tutorial/) by [Justin Johnson](https://cs.stanford.edu/people/jcjohns/)._

In [1]:
import numpy as np

### The array object

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. 

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [2]:
x = np.array([1,3,5,7,9,11,13])
x

array([ 1,  3,  5,  7,  9, 11, 13])

The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [3]:
x.shape

(7,)
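
The array above comes from a flat list, so it has rank 1. The following is a minimal sketch (the name `m` is only illustrative) showing that a nested list produces a rank-2 array, whose rank and shape can be read from the `ndim` and `shape` attributes:

```python
import numpy as np

# A nested list gives a rank-2 array: one inner list per row
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)    # 2 -> the rank (number of dimensions)
print(m.shape)   # (2, 3) -> 2 rows, 3 columns
print(m[1, 2])   # 6 -> row index 1, column index 2

# Square brackets also work on the left of an assignment
m[0, 0] = 10
print(m[0])      # [10  2  3]
```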

Numpy also provides many functions to create arrays:

In [4]:
# Create an array of all zeros
a = np.zeros((2,2))   
print(a.shape)
a                       

(2, 2)


array([[0., 0.],
       [0., 0.]])

In [5]:
# Create an array of all ones
b = np.ones((4,2))
print(b.shape)
b

(4, 2)


array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [6]:
# Create a constant array
c = np.full((2,2), 12)  
c

array([[12, 12],
       [12, 12]])

In [7]:
# Create a 4x4 identity matrix
d = np.eye(4)
d

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [8]:
# Create an array filled with random values
# np.random.seed(111)  # uncomment for reproducible results
e = np.random.random((3,3))  
e

array([[0.81768162, 0.85405222, 0.65261923],
       [0.31569218, 0.76027126, 0.12403365],
       [0.67056707, 0.57487074, 0.39317592]])
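
The cell above uses the legacy `np.random.random` function, which is seeded through `np.random.seed`. Newer NumPy releases (1.17 and later; we assume such a version is installed) also provide a `Generator` object created with `np.random.default_rng`. A minimal sketch of reproducible draws with it; note that the values will generally differ from those produced by the legacy `np.random.seed(111)`:

```python
import numpy as np

# A Generator seeded with a fixed value yields the same numbers on every run
rng = np.random.default_rng(seed=111)
e1 = rng.random((3, 3))   # uniform floats in [0, 1)

# Re-creating the generator with the same seed reproduces the draws
rng = np.random.default_rng(seed=111)
e2 = rng.random((3, 3))

print(np.array_equal(e1, e2))  # True
```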

There are many more ways to create an array (see [the documentation](https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#routines-array-creation) for further details).
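
As a brief sketch of a few of these other creation routines (the variable names are illustrative), `np.arange` and `np.linspace` produce evenly spaced values, `np.zeros_like` copies the shape and type of an existing array, and the `dtype` argument sets the element type:

```python
import numpy as np

# Evenly spaced values with a fixed step (the stop value is excluded)
r = np.arange(0, 10, 2)
print(r)                 # [0 2 4 6 8]

# A fixed number of evenly spaced values (the stop value is included by default)
l = np.linspace(0, 1, 5)
print(l)                 # the values 0.0, 0.25, 0.5, 0.75 and 1.0

# Zeros with the same shape and dtype as an existing array
z = np.zeros_like(r)
print(z)                 # [0 0 0 0 0]

# The dtype argument controls the element type
f = np.ones(3, dtype=np.int8)
print(f.dtype)           # int8
```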

### Array indexing

Numpy offers several ways to index into arrays.

* __Integer array indexing__: Integer array indexing allows you to construct arbitrary arrays using the data from another array.

* __Slicing__: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array.

* __Boolean array indexing__: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. 

In [9]:
aa = np.array([[1,2], [3, 4], [5, 6]])
print(aa.shape)
aa

(3, 2)


array([[1, 2],
       [3, 4],
       [5, 6]])

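#### Integer array indexing

A single integer per axis selects a row or an individual element, while lists of integers, one list per axis, select an arbitrary set of elements.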

In [10]:
aa[0]

array([1, 2])

In [11]:
aa[0,1]

2

In [12]:
aa[[0,1], [1,1]]

array([2, 4])

In [13]:
aa

array([[1, 2],
       [3, 4],
       [5, 6]])

In [14]:
aa[[0, 1, 2], [0, 1, 0]]

array([1, 4, 5])
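
The index lists in the previous cell are paired up elementwise: `aa[[0, 1, 2], [0, 1, 0]]` gathers `aa[0, 0]`, `aa[1, 1]` and `aa[2, 0]`. A minimal sketch (the names `rows` and `cols` are illustrative) that uses the same pairing to select, and then modify, one element from each row:

```python
import numpy as np

aa = np.array([[1, 2], [3, 4], [5, 6]])

# One column index per row: row 0 -> col 0, row 1 -> col 1, row 2 -> col 0
rows = np.arange(3)
cols = np.array([0, 1, 0])
print(aa[rows, cols])   # [1 4 5]

# The same index arrays can appear on the left of an assignment
aa[rows, cols] += 10
print(aa)
# [[11  2]
#  [ 3 14]
#  [15  6]]
```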

#### Slicing

In [15]:
bb = np.array(range(12)).reshape(3,4)
print(bb.shape)
bb

(3, 4)


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Pull out the subarray consisting of the first 2 rows and of the 2$^\text{nd}$ and 3$^\text{rd}$ columns:

In [16]:
bb[:2, 1:3]

array([[1, 2],
       [5, 6]])

We can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. 

In [17]:
# Rank 1 view of the second row of bb
bb_ = bb[1, :]
print(bb_.shape)
bb_

(4,)


array([4, 5, 6, 7])

In [18]:
# Rank 2 view of the second row of bb
_bb = bb[1:2, :]
print(_bb.shape)
_bb

(1, 4)


array([[4, 5, 6, 7]])

We can make the same distinction when accessing columns of an array:

In [19]:
bb

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [20]:
_bb = bb[:, 1]
print(_bb.shape)
_bb

(3,)


array([1, 5, 9])

In [21]:
_bb = bb[:, 1:2]
print(_bb.shape)
_bb

(3, 1)


array([[1],
       [5],
       [9]])
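
The comments in the cells above describe these slices as views. In NumPy, basic slicing returns a view that shares memory with the original array, so writing to the slice also writes to the original, whereas integer (and boolean) array indexing returns a copy. A minimal sketch (variable names are illustrative):

```python
import numpy as np

bb = np.arange(12).reshape(3, 4)

# Basic slicing returns a view that shares memory with bb
row_view = bb[1, :]
row_view[0] = 100
print(bb[1, 0])          # 100 -> the original array changed

# Integer array indexing returns a copy
picked = bb[[0, 2], :]
picked[0, 0] = -1
print(bb[0, 0])          # 0 -> the original array is unchanged

# .copy() makes an independent slice explicitly
safe = bb[1, :].copy()
safe[0] = 7
print(bb[1, 0])          # still 100
```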

#### Boolean array indexing

In [22]:
cc = np.linspace(start=5, stop=25, num=16).reshape((4,4))
print(cc.shape)
cc

(4, 4)


array([[ 5.        ,  6.33333333,  7.66666667,  9.        ],
       [10.33333333, 11.66666667, 13.        , 14.33333333],
       [15.66666667, 17.        , 18.33333333, 19.66666667],
       [21.        , 22.33333333, 23.66666667, 25.        ]])

In [23]:
cc > 10

array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])

In [24]:
cc[cc > 10]

array([10.33333333, 11.66666667, 13.        , 14.33333333, 15.66666667,
       17.        , 18.33333333, 19.66666667, 21.        , 22.33333333,
       23.66666667, 25.        ])

In [25]:
cc[(cc > 10) & (cc < 22)]

array([10.33333333, 11.66666667, 13.        , 14.33333333, 15.66666667,
       17.        , 18.33333333, 19.66666667, 21.        ])
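
Boolean masks are combined with the elementwise operators `&` (and), `|` (or) and `~` (not); Python's `and`/`or` keywords do not work elementwise on arrays. Because these operators bind more tightly than comparisons, each comparison needs its own parentheses, as in the cell above. A short sketch:

```python
import numpy as np

cc = np.linspace(start=5, stop=25, num=16).reshape((4, 4))

# Parenthesize each comparison: & and | bind more tightly than > and <
inside = (cc > 10) & (cc < 22)      # both conditions hold
outside = (cc <= 10) | (cc >= 22)   # at least one condition holds
negated = ~inside                   # elementwise NOT

print(np.array_equal(outside, negated))  # True
print(inside.sum())                      # 9 -> number of True entries
```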

Boolean indexing can also appear on the left-hand side of an assignment, which modifies the selected elements in place:

In [26]:
cc[(cc > 10) & (cc < 22)] = -999
cc

array([[   5.        ,    6.33333333,    7.66666667,    9.        ],
       [-999.        , -999.        , -999.        , -999.        ],
       [-999.        , -999.        , -999.        , -999.        ],
       [-999.        ,   22.33333333,   23.66666667,   25.        ]])

### Array math

Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module.

In [27]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print("x:\n", x)
print("--------------")
print("y:\n", y)

x:
 [[1. 2.]
 [3. 4.]]
--------------
y:
 [[5. 6.]
 [7. 8.]]


In [28]:
# Elementwise sum; both lines produce the same array
print(x + y)
print("--------------")
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
--------------
[[ 6.  8.]
 [10. 12.]]


In [29]:
# Elementwise difference; both lines produce the same array
print(x - y)
print("--------------")
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
--------------
[[-4. -4.]
 [-4. -4.]]


In [30]:
# Elementwise product; both lines produce the same array
print(x * y)
print("--------------")
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
--------------
[[ 5. 12.]
 [21. 32.]]


In [31]:
# Elementwise division; both lines produce the same array
print(x / y)
print("--------------")
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
--------------
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [32]:
# Elementwise square root of every element
np.sqrt(x)

array([[1.        , 1.41421356],
       [1.73205081, 2.        ]])

In numpy `*` is elementwise multiplication, not matrix multiplication. We instead use the `dot` function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. `dot` is available both as a function in the numpy module and as an instance method of array objects.
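
Since Python 3.5, the `@` operator also performs matrix multiplication. It calls `np.matmul`, which agrees with `dot` for 1-D and 2-D arrays but treats higher-dimensional arrays differently. A minimal sketch contrasting it with `*`, redefining the `x` and `y` matrices from above so the snippet runs on its own:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

elementwise = x * y   # multiplies matching entries
matrix = x @ y        # matrix product, same as np.matmul(x, y) here

print(matrix)
# [[19. 22.]
#  [43. 50.]]
print(np.array_equal(matrix, x.dot(y)))   # True
```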

In [33]:
# vectors v and w
v = np.array([9,10])
w = np.array([11, 12])
# 2x2 matrix
bb = y

print("v:\n", v)
print("dim:\n", v.shape)
print("--------------")
print("w:\n", w)
print("dim:\n", w.shape)
print("--------------")
print("--------------")
print("aa:\n", aa)
print("dim:\n", aa.shape)
print("--------------")
print("bb:\n", bb)
print("dim:\n", bb.shape)
print("--------------")


v:
 [ 9 10]
dim:
 (2,)
--------------
w:
 [11 12]
dim:
 (2,)
--------------
--------------
aa:
 [[1 2]
 [3 4]
 [5 6]]
dim:
 (3, 2)
--------------
bb:
 [[5. 6.]
 [7. 8.]]
dim:
 (2, 2)
--------------


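For a matrix product the inner dimensions must agree: multiplying a $p\times n$ matrix by an $n\times q$ matrix yields a $p\times q$ matrix. For example, `aa` ($3\times 2$) times `bb` ($2\times 2$) gives the $3\times 2$ result computed below.

$$M_{p\times q} = A_{p\times n} \times B_{n \times q}$$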
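
To check the shape rule concretely, here is a minimal sketch with arrays of ones (the names `A`, `B` and `M` mirror the formula and are only illustrative). Swapping the operands breaks the rule, and NumPy raises a `ValueError`:

```python
import numpy as np

A = np.ones((3, 2))   # p = 3, n = 2
B = np.ones((2, 4))   # n = 2, q = 4

M = A.dot(B)
print(M.shape)        # (3, 4) -> p x q

# B.dot(A) would pair inner dimensions 4 and 3, which do not match
try:
    B.dot(A)
except ValueError as err:
    print("ValueError:", err)
```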

In [34]:
# Inner product of vectors
print(v.dot(w))
print("--------------")
print(np.dot(v, w))

219
--------------
219


In [35]:
# Matrix / vector product
print(aa.dot(v))
print("--------------")
print(np.dot(aa, v))

[ 29  67 105]
--------------
[ 29  67 105]


In [36]:
# Matrix / matrix product
print(aa.dot(bb))
print("--------------")
print(np.dot(aa, bb))

[[19. 22.]
 [43. 50.]
 [67. 78.]]
--------------
[[19. 22.]
 [43. 50.]
 [67. 78.]]


Numpy provides many useful functions for performing computations on arrays, such as `sum`, `mean`, `max`, `min` and others. You can find the full list of mathematical functions provided by numpy in [the documentation](https://docs.scipy.org/doc/numpy/reference/routines.math.html).



In [37]:
x

array([[1., 2.],
       [3., 4.]])

In [38]:
print(np.sum(x))  # Compute sum of all elements
print(np.sum(x, axis=0))  # Compute sum of each column
print(np.sum(x, axis=1))  # Compute sum of each row

10.0
[4. 6.]
[3. 7.]


In [39]:
print(np.mean(x))  # Compute mean of all elements
print(np.mean(x, axis=0))  # Compute mean of each column
print(np.mean(x, axis=1))  # Compute mean of each row

2.5
[2. 3.]
[1.5 3.5]


In [40]:
print(np.min(x))  # Compute minimum of all elements
print(np.min(x, axis=0))  # Compute minimum of each column
print(np.min(x, axis=1))  # Compute minimum of each row

1.0
[1. 2.]
[1. 3.]


In [41]:
print(np.max(x))  # Compute maximum of all elements
print(np.max(x, axis=0))  # Compute maximum of each column
print(np.max(x, axis=1))  # Compute maximum of each row

4.0
[3. 4.]
[2. 4.]
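
With a square matrix it is easy to mix up the two axes. A minimal sketch with a non-square array (the name `t` is illustrative) makes the rule visible: `axis` names the dimension that is collapsed, so that dimension disappears from the result. The same reductions are also available as array methods:

```python
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column
print(t.sum(axis=0))        # [5 7 9]
# axis=1 collapses the columns, leaving one value per row
print(t.sum(axis=1))        # [ 6 15]

# Method form of the same reductions
print(t.max(axis=1))        # [3 6]
```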


### Some more useful basic numpy array methods

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays.

In [42]:
dd = np.arange(start=1, stop=2100, step=100).reshape((7,3))
dd

array([[   1,  101,  201],
       [ 301,  401,  501],
       [ 601,  701,  801],
       [ 901, 1001, 1101],
       [1201, 1301, 1401],
       [1501, 1601, 1701],
       [1801, 1901, 2001]])

**Transpose an array**

In [43]:
print("Dimensions: ", dd.shape)
dd_transposed = dd.T
print("Dimensions after transpose ", dd_transposed.shape)
dd_transposed

Dimensions:  (7, 3)
Dimensions after transpose  (3, 7)


array([[   1,  301,  601,  901, 1201, 1501, 1801],
       [ 101,  401,  701, 1001, 1301, 1601, 1901],
       [ 201,  501,  801, 1101, 1401, 1701, 2001]])

**Reshape an array**

In [44]:
print(dd.shape)
dd_reshaped = dd.reshape(-1,1)
print(dd_reshaped.shape)
dd_reshaped

(7, 3)
(21, 1)


array([[   1],
       [ 101],
       [ 201],
       [ 301],
       [ 401],
       [ 501],
       [ 601],
       [ 701],
       [ 801],
       [ 901],
       [1001],
       [1101],
       [1201],
       [1301],
       [1401],
       [1501],
       [1601],
       [1701],
       [1801],
       [1901],
       [2001]])
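
In `dd.reshape(-1, 1)`, the `-1` asks NumPy to infer that dimension from the total number of elements. A minimal sketch of this, also showing that a reshape which keeps the same shape as the transpose is not the same as transposing, and that the element count must be preserved:

```python
import numpy as np

dd = np.arange(start=1, stop=2100, step=100).reshape((7, 3))

# -1 asks NumPy to infer that dimension from the total size (21 elements)
wide = dd.reshape(3, -1)
print(wide.shape)                    # (3, 7)
print(dd.reshape(-1).shape)          # (21,)

# Same shape as the transpose, but the elements keep their row-major order
print(np.array_equal(wide, dd.T))    # False

# The total number of elements must stay the same: 4 * 5 != 21
try:
    dd.reshape(4, 5)
except ValueError as err:
    print("ValueError:", err)
```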

In [45]:
# returns the array, flattened
print(dd.shape)
dd_flat = dd.ravel()  
print(dd_flat.shape)
dd_flat

(7, 3)
(21,)


array([   1,  101,  201,  301,  401,  501,  601,  701,  801,  901, 1001,
       1101, 1201, 1301, 1401, 1501, 1601, 1701, 1801, 1901, 2001])
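
`ravel` returns a view of the original data whenever possible, while the related `flatten` method always returns a new copy. A minimal sketch of the difference (here `dd` is contiguous, so `ravel` can return a view):

```python
import numpy as np

dd = np.arange(start=1, stop=2100, step=100).reshape((7, 3))

flat_view = dd.ravel()     # a view when possible
flat_copy = dd.flatten()   # always an independent copy

flat_view[0] = -1
print(dd[0, 0])            # -1 -> changed through the view
flat_copy[1] = -2
print(dd[0, 1])            # 101 -> unchanged by the copy
```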

**Stacking together different arrays**

Several arrays can be stacked together along different axes.

In [46]:
ee = np.floor(10*np.random.random((2,2)))
print("ee:\n", ee)

ff = np.floor(10*np.random.random((2,2)))
print("ff:\n", ff)

ee:
 [[5. 6.]
 [2. 1.]]
ff:
 [[2. 5.]
 [2. 8.]]


In [47]:
# vertical stack
np.vstack((ee,ff))

array([[5., 6.],
       [2., 1.],
       [2., 5.],
       [2., 8.]])

In [48]:
# horizontal stack
np.hstack((ee,ff))

array([[5., 6., 2., 5.],
       [2., 1., 2., 8.]])
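
For 2-D arrays, `vstack` and `hstack` are equivalent to `np.concatenate` along axis 0 and axis 1, respectively. A minimal sketch that uses fixed copies of the values printed above instead of new random draws:

```python
import numpy as np

ee = np.array([[5., 6.], [2., 1.]])
ff = np.array([[2., 5.], [2., 8.]])

# vstack joins along axis 0 (rows), hstack along axis 1 (columns)
print(np.array_equal(np.vstack((ee, ff)), np.concatenate((ee, ff), axis=0)))  # True
print(np.array_equal(np.hstack((ee, ff)), np.concatenate((ee, ff), axis=1)))  # True

print(np.vstack((ee, ff)).shape)   # (4, 2)
print(np.hstack((ee, ff)).shape)   # (2, 4)
```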

>  __Final note:__ Please be aware that we have only scratched the surface of the numpy library's functionality. Check out the official [numpy tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) for a deeper dive into numpy.

Furthermore, you can easily explore numpy's modules and submodules by typing

    np.

into the cell below and pressing the <kbd>TAB</kbd> key for tab completion.

In [49]:
import numpy as np
# np.       