## <center>Scientific Programming - 7MRI0020 - 2021/2022</center>


## <center>Week 05 - Scientific Libraries - Part 01</center>


### <center>School of Biomedical Engineering & Imaging Sciences</center>
### <center>King's College London</center>

# Numpy

* Numpy provides fast numeric arrays
* The `ndarray` type stores numeric values in a contiguous block of memory rather than as individual Python objects
* Computation with Numpy is typically much faster than pure Python, which stores every number as a separate object
* Operators for `ndarray` implement a range of mathematical operations, allowing computation to be vectorised

* Python array of arrays:

In [None]:
pyarr = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

* Defined as a `list` containing 3 more lists
* Each inner list contains 3 numbers stored as individual Python objects

* Convert to a Numpy array, storing values as a single object:

In [None]:
import numpy as np

a = np.array(pyarr, np.float32)
print(repr(a))

* Stored as a contiguous block of data in memory; each item is a raw value, not a Python object

* Arrays have properties describing their shape and contents:

In [None]:
print(a.size)  # total number of values
print(a.shape)  # array shape, size of each dimension
print(a.dtype)  # data type, all stored values have this type
print(a.ndim)  # number of dimensions, i.e. len(a.shape)
print(a.itemsize)  # size of each data value in bytes
print(a.nbytes)  # size of whole array in bytes, i.e. a.size*a.itemsize
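
* A rough sketch of the memory difference between a list and an array, assuming CPython (the exact list size depends on interpreter and platform, so no figure is quoted for it):

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int32)

# the array stores 1000 raw 4-byte integers in one block
print(arr.nbytes)  # size * itemsize

# the list stores 1000 references, and each int is a separate Python object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)
```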

* Every array has a dtype defining the type of values stored
* Integer values (`np.int32`, `np.uint8`, etc.)
* Float values (`np.float32`, `np.float64`)
* Boolean values (`np.bool_`; the old `np.bool` alias was deprecated and then removed in some Numpy versions)
* Complex values, structure types, strings, etc.

In [None]:
print(np.array([[-1.2, 2], [3.4, -4]], dtype=np.float64))  # double
print(np.array([[-1.2, 2], [3.4, -4]], dtype=np.uint16))  # unsigned short
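
* The dtype also fixes the range of values that can be stored. A minimal sketch, with values chosen only for illustration, of integer wraparound in a small type:

```python
import numpy as np

# uint8 stores values 0..255; arithmetic wraps around on overflow
x = np.array([250, 100], dtype=np.uint8)
y = np.array([10, 10], dtype=np.uint8)
wrapped = x + y  # 260 does not fit in uint8, so it wraps to 4
print(wrapped, wrapped.dtype)

# converting to a wider type first avoids the wraparound
widened = x.astype(np.int32) + y
print(widened, widened.dtype)
```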

* Various functions exist for creating arrays:

In [None]:
print(np.eye(2))  # identity matrix of given rank
print(np.full((2, 2), 42))  # matrix of given size filled with value
print(np.random.rand(2, 2))  # random array
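
* A few more creation functions that are commonly used, shown here as a short sketch (the variable names are only for illustration):

```python
import numpy as np

zeros = np.zeros((2, 3))                # 2x3 array of 0.0 (float64 by default)
ones = np.ones((2, 3), dtype=np.int32)  # 2x3 array of integer 1s
steps = np.arange(0, 10, 2)             # like range(): 0, 2, 4, 6, 8
points = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced values, endpoints included

print(zeros.shape, ones.dtype)
print(steps)
print(points)
```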

* Why use Numpy?
* Fast math operations!

In [None]:
large_list = list(range(1000000))
%timeit sum(large_list)

large_array = np.array(large_list, np.int64)  # int64 so the total cannot overflow on any platform
%timeit np.sum(large_array)

* Various functions exist to reshape arrays into new ones containing the same data:

In [None]:
a = np.arange(6)
print(a.reshape(2, 3))  # dimensions (6,) to (2,3)
print(a[:,None])  # dimensions (6,) to (6,1)
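
* A small sketch of two related points: `-1` can stand in for one dimension in `reshape`, and reshaping a contiguous array like this one gives a view onto the same data:

```python
import numpy as np

seq = np.arange(6)
grid = seq.reshape(2, -1)  # -1 lets Numpy work out the missing dimension (3 here)
print(grid.shape)

grid[0, 0] = 100           # grid shares memory with seq, so seq changes too
print(seq)

print(grid.ravel().shape)  # flatten back to one dimension
```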

* The type of an array can be changed with `astype` with the associated type conversions done internally:

In [None]:
print(np.array([1.2, 3.4]).astype(np.int32))
print(np.array([1.2, 3.4]).astype(np.float16))
print(np.array([1.2, 0, 3.4]).astype(np.bool_))  # np.bool is unavailable in some Numpy versions

* Note that this creates a new array with newly allocated memory
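
* A minimal sketch of both effects, using made-up values: float to integer conversion drops the fractional part, and the result does not share memory with the source:

```python
import numpy as np

floats = np.array([1.7, -1.7, 3.2])
ints = floats.astype(np.int32)  # fractional part is discarded (truncation towards zero)
print(ints)

# the converted array owns its own memory, so editing it leaves the source untouched
ints[0] = 99
print(floats)
print(np.shares_memory(floats, ints))
```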

* We're familiar with array indexing using the `:`, `::` syntax
* Ellipsis `...` can be used to indicate `:` for zero or more dimensions
* E.g. given an n-dimensional array `arr`, the expression `arr[0,...,1]` takes index 0 in the first dimension, everything in each intermediate dimension, and index 1 in the last dimension

In [None]:
a = np.random.rand(5, 6, 7, 8)  # 4D array
print(a[0, :, :, 1].shape)
print(a[0, ..., 1].shape)  # equivalent to above
print(a[..., 1, 2].shape)  # take all of first 2 dimensions
print(a[1, 2, ...].shape)  # take all of last 2 dimensions

* Indexing can be done programmatically with the `slice` class
* Constructor accepts `start`, `stop`, and `step` index values, so index `a:b:c` is equivalent to `slice(a,b,c)`
* `None` used when omitting an index, so `:b:c` (meaning from the start up to `b` in steps of `c`) is `slice(None,b,c)`
* `slice(None)` equivalent to `:`

* A tuple of slices can be used to provide the indices to an array:

In [None]:
a = np.diag(np.arange(1, 10))  # 1-9 on the diagonal of a 9x9 array

print(a[1:8:2, 3::1].shape)
slices = (slice(1, 8, 2), slice(3, None, 1))
print(a[slices].shape)  # equivalent to above

slices = (slice(None, None, 3),) * a.ndim
print(a[slices])  # take every 3rd value from every dimension
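
* Building the tuple in code is useful when the axis is only known at run time. A sketch using a hypothetical helper, `take_index`, which is not part of Numpy:

```python
import numpy as np


def take_index(arr, axis, index):
    """Hypothetical helper: apply `index` along `axis`, keeping all other axes whole."""
    slices = [slice(None)] * arr.ndim  # start with ':' for every dimension
    slices[axis] = index
    return arr[tuple(slices)]  # the index must be a tuple, not a list


vol = np.arange(24).reshape(2, 3, 4)
print(take_index(vol, 1, 0).shape)  # same as vol[:, 0, :]
print(take_index(vol, 2, slice(0, 2)).shape)  # same as vol[:, :, 0:2]
```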

* Tuples of indices can also be provided:

In [None]:
print(a[2, 2])  # get single value
print(a[[2], [2]])  # get array with single value

# get array of 2 values containing a[2,2] and a[3,3]
print(a[(2, 3), (2, 3)])

print(a[(0, 1, 2, 3), (0, 1, 2, 3)])
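
* Indexing with sequences of integers pairs the indices up position by position and returns a new array with its own data. A short sketch, with names chosen only for illustration:

```python
import numpy as np

grid = np.diag(np.arange(1, 10))
rows = [0, 1, 2, 3]
cols = [0, 1, 2, 3]

diag_values = grid[rows, cols]  # grid[0,0], grid[1,1], grid[2,2], grid[3,3]
print(diag_values)

scratch = grid[rows, cols]
scratch[:] = 0  # modifies only the copy
print(grid[0, 0])
print(np.shares_memory(grid, scratch))
```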

* When Numpy arrays are sliced, a view is returned
* A view is a new array object that shares the original's allocated memory, so no data is copied
* Changes to the view affect the original
* An independent (deep) copy can be made with the `copy` method

In [None]:
a = np.arange(10)
print(a)

b = a[3:6]
b[:] = 0  # assign 0 to every position in b

print(a)

* Views prevent unnecessary memory copying, but be aware of the side-effects of sharing data
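
* A minimal sketch of how to tell a view from a copy, using the `base` attribute and `np.shares_memory`:

```python
import numpy as np

a = np.arange(10)
view = a[3:6]        # shares memory with a
dup = a[3:6].copy()  # independent data

print(view.base is a)  # a view records the array it was taken from
print(np.shares_memory(a, view))
print(np.shares_memory(a, dup))

dup[:] = -1  # leaves a unchanged
print(a)
```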

* Multiple dimensions can be specified within one pair of `[]` brackets, which invokes a single indexing operation
* Using multiple bracket pairs means multiple operations, each creating an intermediate array (a view or a copy), so prefer the single-bracket form

In [None]:
a = np.random.rand(1000, 1000)
print(a[500, 500])  # get single value
print(a[500][500])  # get view of row then get value in view

%timeit a[500, 500]
%timeit a[500][500]

* Operators are overloaded on arrays to implement vectorized mathematical operations
* E.g. `__add__` (`+`) implements element-wise addition, `-` is element-wise subtraction, etc.
* Boolean operators (`==`, `<=`, etc.) are element-wise and produce arrays of boolean values

In [None]:
a = np.random.rand(1000, 1000)

b = a + a  # add arrays together
b = a + 10  # add constant values to arrays
b = 10 + a  # right-hand side operations work too

* These implement per-element operations and produce new arrays
* Matrix operators also provided:

In [None]:
b = a @ a  # matrix product
b = a.dot(a)  # same thing
b = a.T  # transpose
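
* The difference between `*` and `@` is easy to miss. A small sketch on a 2x2 matrix, with values chosen only for illustration:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

elementwise = m * m  # each value multiplied by itself
matrix_prod = m @ m  # rows of the left operand times columns of the right

print(elementwise)
print(matrix_prod)
print(m.T)  # transpose swaps rows and columns
```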

* Augmented assignment operators (`+=`, `*=`, etc.) modify existing arrays in place rather than creating new ones
* This is another important efficiency concern; choose in-place or new-array operations depending on need

In [None]:
b += 10  # add to b, nothing new created
b *= a  # multiply, semantically equivalent to b=b*a but faster
# b @= a  # in-place matrix product, only supported from Numpy 1.25 onwards
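
* A short sketch of the practical difference, using a second name (`alias`, chosen only for illustration) bound to the same array:

```python
import numpy as np

x = np.ones(3)
alias = x  # a second name for the same array object

x += 1  # in place: the existing memory is updated, so alias sees the change
print(alias)

x = x + 1  # builds a new array and rebinds x; alias keeps the old one
print(x, alias)
print(x is alias)
```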

* Many more operations are defined as methods of `ndarray` or as library functions:

In [None]:
b = a.sum()  # sum of all values
b = a.min(axis=1)  # minimum values along axis 1
b = np.linalg.inv(a)  # matrix inverse
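
* The `axis` argument names the dimension that is collapsed. A sketch on a small 2x3 array, with values chosen only for illustration:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = table.sum()            # all values
col_sums = table.sum(axis=0)   # collapse rows: one result per column
row_sums = table.sum(axis=1)   # collapse columns: one result per row
row_mins = table.min(axis=1)   # smallest value in each row

print(total, col_sums, row_sums, row_mins)
```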

* Universal functions (ufunc) operate elementwise on arrays
* They accept Python scalars and sequences as well as Numpy arrays
* These implement a range of important mathematical functions

In [None]:
print(np.exp(1))
print(np.exp([1, 2]))
print(np.exp(np.arange(1, 3)))
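
* A brief sketch of a few other ufuncs, including one with two inputs and the optional `out` argument for writing into an existing array:

```python
import numpy as np

print(isinstance(np.exp, np.ufunc))  # exp, sin, sqrt, maximum are ufunc objects

angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)  # one-input ufunc applied to each element
print(sines)

largest = np.maximum([1, 5, 2], [4, 3, 6])  # two-input ufunc, plain lists accepted
print(largest)

out = np.empty(3)
np.sqrt([1.0, 4.0, 9.0], out=out)  # results are written into `out`
print(out)
```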

* Boolean operators produce arrays of bool values
* Useful for testing properties of arrays, especially with helper functions

In [None]:
a = np.arange(10)
print(a > 4)  # contains True for values satisfying inequality
print(np.all(a > 4))  # test if all values are True
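
* Some other helpers that work on boolean arrays, shown as a short sketch:

```python
import numpy as np

vals = np.arange(10)
mask = vals > 4

any_pass = np.any(mask)             # True if at least one value passes
n_pass = np.count_nonzero(mask)     # how many values pass
replaced = np.where(mask, vals, 0)  # keep passing values, replace the rest with 0

print(any_pass, n_pass)
print(replaced)
```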

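* A boolean array of the same shape as the target can be used to index values
* Reading with such an index produces a new array (a copy, not a view) of the values where the index array is True; assigning through it modifies the target in place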

In [None]:
a = np.arange(10)
testop = (a % 2) == 0  # True for all even values
print(testop)

print(a[testop])  # get values where testop is True
a[testop] = -1  # can assign to these places
print(a)
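
* Masks can be combined with `&` and `|`. The sketch below also shows that editing the selected values leaves the original alone, while assigning through the mask does not:

```python
import numpy as np

vals = np.arange(10)
mask = (vals > 2) & (vals % 2 == 0)  # use & and | with parentheses, not `and`/`or`

selected = vals[mask]  # new array holding 4, 6, 8
selected[:] = 0        # changes the copy only
after_copy_edit = vals.copy()
print(after_copy_edit)

vals[mask] = 0         # assignment through the mask does change vals
print(vals)
```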

* Arrays can be treated like iterables:

In [None]:
a = np.diag(np.arange(1, 5))
for i, row in enumerate(a):
    print(i, row)

* Unpacking syntax with `*` works for arrays, passing each item along the first dimension as a separate argument:

In [None]:
def print_shapes(*arrays):
    for i in arrays:
        print(i.shape)


print_shapes(*a)

* Iterating over every value in an array can be done in various ways:

In [None]:
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        print(a[i, j], end=" ")

In [None]:
for i in a.flat:
    print(i, end=" ")

* An iterator of indices can be created with an array's shape:

In [None]:
for i, j in np.ndindex(*a.shape):
    print(a[i, j], end=" ")

* `np.nditer` is a complex object capable of many iteration operations:

In [None]:
print(list(np.nditer(a)))

* Can be used to modify arrays as well:

In [None]:
a = np.diag([1, 2, 3])
with np.nditer(a, op_flags=["readwrite"]) as it:
    for i in it:
        i[...] += 10

print(a)

* Remember that numeric operators/functions are vector operations, so you don't ever have to do this:

In [None]:
a = np.diag([1, 2, 3])
b = np.ones_like(a)
c = np.zeros_like(a)

for i, j in np.ndindex(*a.shape):
    c[i, j] = a[i, j] + (1 / b[i, j] + 1)

In [None]:
# Just do this
c = a + (1 / b + 1)
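
* A quick check that the two approaches agree, sketched with a float output array for the loop so that no values are truncated on assignment:

```python
import numpy as np

a = np.diag([1, 2, 3])
b = np.ones_like(a)

looped = np.zeros(a.shape)  # float64 output
for i, j in np.ndindex(*a.shape):
    looped[i, j] = a[i, j] + (1 / b[i, j] + 1)

vectorised = a + (1 / b + 1)
print(np.allclose(looped, vectorised))
```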

* Many more features and functions provided by the library
* Reading documentation strongly encouraged: https://docs.scipy.org/doc/numpy/index.html
* Getting help in Jupyter can be done by giving an object name in a cell with `?` after it:

      np.array?
      
* This will bring up a documentation window

# That's it! Questions?

## Next: Exercises

## Tomorrow: matplotlib and pandas

