# __NumPy__

NumPy is the fundamental Python package for scientific computing and data science. It implements __multi-dimensional, homogeneous arrays__ together with a wide variety of mathematical functions, linear algebra routines, random number generators, and more. The central data structure is the `ndarray` object (n-dimensional array), which wraps arrays of arbitrary structure in a coherent interface. This interface allows most array operations to be expressed in a rather __mathematical syntax__, i.e. without loops and independent of the number and size of dimensions. These operations are executed with __compiled and vectorized code__, which relieves the programmer of the burden of tedious low-level optimization.

In [None]:
import numpy as np

## Motivation

__Naive Python__ matrix multiplication

In [None]:
def matrix_multiplication(A, B, C):
    for k in range(len(A[0])):
        for i in range(len(A)):
            t = A[i][k]
            for j in range(len(B[0])):
                C[i][j] += t * B[k][j]

__Dot-product NumPy__ matrix multiplication

In [None]:
def matrix_multiplication_dot(A, B, C):
    for i in range(np.shape(A)[0]):
        for j in range(np.shape(B)[1]):
            C[i,j] = np.dot(A[i,:], B[:,j])

__Matmul NumPy__ matrix multiplication

In [None]:
def matrix_multiplication_matmul(A, B, C):
    # C is unused here and only kept for a uniform signature;
    # np.matmul(A, B, out=C) would write the result into C instead
    return A @ B

__Benchmark__
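
The timing cells in this section reference `An`, `Bn`, `Cn` (NumPy arrays) and `Ap`, `Bp`, `Cp` (nested Python lists), whose definitions are not shown. A minimal setup sketch could look as follows. The size `n = 50` and the random contents are assumptions made for illustration only. The sketch also repeats the naive triple loop inline and checks that it agrees with `@`.

```python
import numpy as np

n = 50  # hypothetical size, chosen small so the pure-Python loop stays fast
rng = np.random.default_rng(0)

# NumPy operands and a zero-initialized result array
An = rng.random((n, n))
Bn = rng.random((n, n))
Cn = np.zeros((n, n))

# The same data as nested Python lists for the naive version
Ap = An.tolist()
Bp = Bn.tolist()
Cp = [[0.0] * n for _ in range(n)]

# Naive triple loop, repeated here so that the sketch is self-contained
for k in range(n):
    for i in range(n):
        t = Ap[i][k]
        for j in range(n):
            Cp[i][j] += t * Bp[k][j]

# Both approaches should agree up to floating-point rounding
print(np.allclose(Cp, An @ Bn))
```

Because the naive version accumulates into `Cp` with `+=`, the result list has to start out filled with zeros.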

In [None]:
%timeit matrix_multiplication_matmul(An, Bn, Cn)

In [None]:
%timeit matrix_multiplication_dot(An, Bn, Cn)

In [None]:
%timeit matrix_multiplication(Ap, Bp, Cp)

## Array Creation

NumPy provides a large number of functions to create arrays of any structure and data type, be it by native array creation, by converting other data structures, or by joining existing arrays. A complete list is given in the [array creation routines](https://numpy.org/doc/stable/reference/routines.array-creation.html) reference. Routines such as `empty`, `ones`, and `full` create arrays from scratch with a given __value__ and __shape__. The __shape__ of an array stores its dimensional structure in a tuple of integers, such that the n-th number stands for the number of elements in the n-th dimension, e.g. the shape `(1,2,3)` stands for a three-dimensional array with one, two, and three elements in the first, second, and third dimension, respectively. The initial __value__ is either given by the function's name, e.g. `ones`, or passed as a parameter to `full`. Uninitialized arrays can be created with `empty`.

In [None]:
a = np.empty(10)  # uninitialized, so the printed values are arbitrary
b = np.ones((5,5))
c = np.full((1,2,3), 7)

In [None]:
print(a)

In [None]:
print(c.shape)

***
Functions such as `ones_like` create an array with the same shape and, unless specified otherwise, the same data type as an already existing array. Specifically, these functions expect an __array-like__ object, which is essentially anything that can be read as an ordinary array, whether it is a NumPy array, a Python scalar, or a list of lists. NumPy tries its best to convert arbitrary objects into arrays.

In [None]:
d = np.ones_like(c)

In [None]:
print(d)
print(d.shape)
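
To illustrate what array-like means in practice, here is a small sketch that passes a list of lists and a Python scalar to the `*_like` functions. The variable names are chosen for illustration only.

```python
import numpy as np

# A list of lists is array-like: the result takes its shape (2, 3)
x = np.ones_like([[1, 2, 3], [4, 5, 6]])
print(x.shape, x.dtype)

# A Python scalar is array-like too and yields a zero-dimensional result
y = np.ones_like(2.5)
print(y.shape, y.dtype)

# The dtype keyword overrides the data type inferred from the input
z = np.zeros_like(x, dtype=np.float32)
print(z.shape, z.dtype)
```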

In [None]:
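dic = {'a': 1, 'b': 2, 'c': 3}
print(dic)
# A dict view is not a sequence, so NumPy wraps it in a 0-d object array
print(np.array(dic.items()))
# Converting the view to a list first yields a two-dimensional array
print(np.array(list(dic.items())))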

***
In addition to filling a newly created array with a given value, arrays can also be created by specifying a __sequence__.

* `arange` returns evenly spaced values in the half-open interval [`start`, `stop`) with `step` between consecutive values,
* `linspace` returns `num` evenly spaced values in the closed interval [`start`, `stop`] (the endpoint can be excluded with `endpoint=False`), and
* `logspace` returns `num` values spaced evenly on a log scale in the interval [`base ** start`, `base ** stop`].

`linspace` and `logspace` also work in multiple dimensions if array-like values are passed for `start` and `stop`.

In [None]:
m = np.linspace((1,2,3), (8,16,24), 8)
print(m)
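
The interval conventions of the three sequence routines are easy to mix up. The following short sketch, with arbitrarily chosen arguments, shows which endpoints are included.

```python
import numpy as np

# arange excludes the stop value
ar = np.arange(0, 10, 2)
print(ar)          # 0, 2, 4, 6, 8

# linspace includes the stop value unless endpoint=False is passed
closed = np.linspace(0, 1, 5)
half_open = np.linspace(0, 1, 5, endpoint=False)
print(closed)
print(half_open)

# logspace interprets start and stop as exponents of base
lg = np.logspace(0, 3, 4, base=10)
print(lg)          # 1, 10, 100, 1000
```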

***
Conveniently, arrays can be __created from Python sequences__ with `np.array`. A list, a list of lists, or even deeper nested lists result in a one-dimensional, a two-dimensional, or a higher-dimensional NumPy array, respectively. NumPy does not differentiate between __lists and tuples__ in this case. Ragged nested sequences, however, should be handled with care.

In [None]:
t = ((1,3), (6,7))
npt = np.array(t)
print(npt)

In [None]:
lt = [(1,3), [8,7]]
print(np.array(lt))

In [None]:
l = [[1,2], [[4,5], [6, 7]]]
print(np.array(l, dtype=object))

In [None]:
rns = [(1,2,3), [4]]
print(np.array(rns, dtype=object))

***
The following __array properties__ can be queried directly:

* numeric type of elements,
* number of dimensions,
* array shape,
* number of bytes per element, and
* number of bytes for whole array,

while several others are available via the `flags` property.

In [None]:
a.dtype

In [None]:
a.ndim

In [None]:
a.shape

In [None]:
a.itemsize

In [None]:
a.nbytes

In [None]:
a.flags

***

## Array Manipulation: Reshaping, Joining, and Splitting

While NumPy arrays have a __fixed size at creation__, they can be reshaped, extended, and split, either by implicitly creating a new array or by simply creating a view on the existing one. __Reshaping__ gives an array a new dimensional structure without changing its underlying memory layout (if possible). One-dimensional arrays can be reshaped into any shape with the same number of elements, and multi-dimensional ones can be __flattened__ into one dimension.

In [None]:
a = np.arange(0, 20)
b = np.reshape(a, (4,5))
print(b)

In [38]:
c = np.reshape(a, (2,10), order='F')  # fill in column-major (Fortran) order
print(c)

[[ 0  2  4  6  8 10 12 14 16 18]
 [ 1  3  5  7  9 11 13 15 17 19]]


To flatten an array, either `flatten` or `ravel` can be used. The former always creates a copy, while the latter returns a view whenever possible.

In [None]:
f = a.ravel()
print(f)

In [None]:
f[3] = 666
print(f)
print(a)
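
Whether two arrays share their underlying buffer can be checked with `np.shares_memory`. The following sketch uses it to contrast `ravel` and `flatten`; the small example arrays are chosen for illustration.

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)   # view on a

r = b.ravel()         # view, because b is contiguous in memory
f = b.flatten()       # always a copy

print(np.shares_memory(a, r))   # True
print(np.shares_memory(a, f))   # False

# For a non-contiguous input such as a transpose, ravel has to copy as well
rt = b.T.ravel()
print(np.shares_memory(a, rt))  # False
```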

__Additional axes__ can be added to an array with `newaxis`.

In [None]:
print(a)
b = a[:,np.newaxis]
print(b)

Naturally, combining and splitting existing arrays is possible as well. Joining arrays along certain axes is done with `concatenate`, `stack`, and others, but note that most of them are simply convenience functions built on top of `concatenate`. `concatenate` joins arrays along an existing axis, while `stack` inserts a new axis. The inverse operation `split` separates an array into multiple subarrays.

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
aa = np.array([[1, 2, 3], [4, 5, 6]])
bb = np.array([[7, 8, 9], [10, 11, 12]])

In [None]:
np.concatenate((a, b))

In [None]:
np.stack((a, b), axis=0)

In [None]:
np.stack((a, b), axis=1)

In [None]:
np.vstack([a, b])

In [None]:
np.hstack([a, b])
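
The inverse operation `split` has no example above. Here is a minimal sketch that joins the two-dimensional arrays `aa` and `bb` and splits the result again; the split positions are picked for illustration.

```python
import numpy as np

aa = np.array([[1, 2, 3], [4, 5, 6]])
bb = np.array([[7, 8, 9], [10, 11, 12]])

# Join along the existing first axis: shape (4, 3)
joined = np.concatenate((aa, bb), axis=0)

# Split again into two equally sized parts along the same axis
first, second = np.split(joined, 2, axis=0)
print(np.array_equal(first, aa), np.array_equal(second, bb))

# A list of indices splits at those positions: columns [0:1] and [1:3]
left, right = np.split(aa, [1], axis=1)
print(left.shape, right.shape)
```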

## Indexing

__Indexing__ extracts or references contiguous or non-contiguous segments of an array. NumPy generalizes Python's native indexing from one-dimensional to multi-dimensional arrays and distinguishes between basic and advanced indexing. __Basic indexing__ (a.k.a. slicing) creates __views__ of an array with the slice notation `[start:stop:step]` known from native Python. Views behave like references to the array from which they are created, and thus writing to a view means directly writing to the underlying array. __Advanced indexing__ returns a __copy__ of the segment and is used whenever the selection is not performed with slices, e.g. with an integer array.

In [None]:
a = np.arange(0, 10)

In [None]:
b = a[2:7]
print(b)

In [None]:
b[3] = 123
print(a)

In [None]:
c = a[0:4:2]
print(c)

__Negative indices__ are a convenient way to index arrays in reverse, i.e. beginning from the last element.

In [None]:
last = a[-1]
print(last)

As in native Python, the start or end index can be omitted, leaving only the __colon__.

In [None]:
print(a[:5])
print(a[3::3])

Comparison operators test each array element against a given condition and return a boolean array, which is the basis of __boolean indexing__.

In [None]:
aleqfive = (a <= 5)
print(aleqfive)

Such boolean arrays can serve as a selection __mask__.

In [None]:
agrtfive = np.invert(aleqfive)
print(a[agrtfive])

In [None]:
ind = [1, 3, 9]
aind = a[ind]
print(aind)

In [None]:
aind[1] = 1337
print(aind)
print(a)
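
Reading through a mask or an index list returns a copy, but the same expressions can also appear on the left-hand side of an assignment, where they write into the original array. The following sketch contrasts both cases with an arbitrarily chosen condition.

```python
import numpy as np

a = np.arange(10)

# Reading with a mask returns a copy, so changing it leaves a untouched
selected = a[a > 5]
selected[0] = -1
print(selected)
print(a)

# Assigning through a mask writes into a itself
a[a > 5] = 0
print(a)
```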

These indexing techniques naturally generalize to multidimensional arrays by using __commas__ to separate dimensions.

In [None]:
b = np.arange(25).reshape(5, 5)  # values 0..24 arranged in five rows
print(b)

In [None]:
print(b[::2,::2])

In [None]:
print(b[1::3,1::3])

__Fancy indexing__ facilitates access and modification of complicated subarrays by using arrays instead of scalars as indices. The resulting array has the same shape as the index array rather than the shape of the accessed array.

In [None]:
a = np.arange(0, 20)
ind = [1, 3, 15]
print(a[ind])
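
The statement about the result shape becomes clearer with a multi-dimensional index array. The sketch below also shows the common case of one index array per dimension; the index values are arbitrary.

```python
import numpy as np

a = np.arange(0, 20)

# A 2x2 index array selects four elements and arranges them as 2x2
ind = np.array([[1, 3],
                [15, 19]])
picked = a[ind]
print(picked.shape)   # (2, 2)

# On a 2-D array, one index array per dimension picks individual elements
b = np.arange(25).reshape(5, 5)
rows = np.array([0, 2, 4])
cols = np.array([1, 3, 0])
elements = b[rows, cols]
print(elements)       # elements b[0,1], b[2,3], b[4,0]
```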

An __ellipsis__ expands to as many colons as are needed to index all dimensions.

In [None]:
c = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
print(c[0])

In [None]:
print(c[..., 1])

In [None]:
print(c[:,:,1])

## Data Types

NumPy extends Python's type system by several variations of [numerical data types](https://numpy.org/doc/stable/reference/arrays.scalars.html). Most of them have platform-specific definitions, which is why they are represented by fixed-size aliases. With some exposure to C or C++, these types are quite familiar. By default, a Python `int` gets converted to `np.int32` or `np.int64`, depending on the platform and the NumPy version. Python's `float` is an IEEE 754-standardized C `double` and therefore represented in NumPy as `np.float64`.

NumPy's array creation functions have a keyword argument `dtype` that sets the data type.

In [None]:
np.array([2, 3, 4], dtype=np.uint16).nbytes
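
The effect of `dtype` on memory use and value range can be inspected directly. The following sketch uses `itemsize`, `np.iinfo`, `np.finfo`, and `astype`; the chosen types are examples only.

```python
import numpy as np

# Each element of a uint16 array occupies 2 bytes
u = np.array([2, 3, 4], dtype=np.uint16)
print(u.itemsize, u.nbytes)      # 2 6

# iinfo and finfo report the limits of integer and floating-point types
print(np.iinfo(np.uint16).max)   # 65535
print(np.finfo(np.float64).eps)

# astype converts to another type and returns a new array
f = u.astype(np.float64)
print(f.dtype, f.nbytes)         # float64 24
```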

NumPy stores boolean arrays far more compactly than Python stores lists of booleans. NumPy requires 1 byte for each boolean value, while a Python list stores a pointer to one of the constants `True` or `False` and therefore requires 4 to 8 bytes per value, depending on the platform.

In [None]:
import sys

sys.getsizeof([True, False, False])

In [None]:
np.array([True, False, False]).nbytes

## Array Computations and Universal Functions

A great many reductions on arrays are already included in NumPy.
Universal functions are functions that operate element-wise on arrays, i.e. they essentially vectorize functions that usually operate on scalars. Arithmetic operations are an example of such functions. Additional universal functions can also be implemented either as a C extension of NumPy or with the help of `np.vectorize`.
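
As a brief illustration of both concepts, the sketch below shows that an arithmetic operator corresponds to a universal function and applies a few built-in reductions along different axes. The example array is arbitrary.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# Arithmetic operators map to universal functions
print(np.array_equal(a + a, np.add(a, a)))
print(isinstance(np.add, np.ufunc))

# Reductions collapse one or all axes
total = a.sum()
col_sums = a.sum(axis=0)
row_max = a.max(axis=1)
print(total)      # 21
print(col_sums)   # column sums: [5 7 9]
print(row_max)    # row maxima: [3 6]
```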

In order to deal with arrays whose shapes do not match, NumPy __broadcasts__ the smaller array across the larger one. NumPy compares the arrays' shapes element-wise, working from right to left through the dimensions. Two dimensions are compatible if they are equal or if one of them is 1. The arrays do not need to have the same number of dimensions: missing leading dimensions are treated as having size 1.

In [None]:
a = np.array([1, 2, 3])
b = 5
a * b

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([1, 2, 3])
a * b

In [None]:
a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
a * b  # raises ValueError: shapes (3,) and (4,) cannot be broadcast together
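
The compatibility rule can be tried out on shapes alone with `np.broadcast_shapes`. The following sketch combines a column and a row into a table and shows a failing case; the shapes are chosen for illustration.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

# Dimensions of size 1 are stretched, the result has shape (3, 4)
table = col * 10 + row
print(table.shape)

# broadcast_shapes applies the same rule to shapes alone
print(np.broadcast_shapes((3, 1), (4,)))   # (3, 4)

# Incompatible trailing dimensions (3 versus 4) raise a ValueError
try:
    np.broadcast_shapes((3,), (4,))
except ValueError as err:
    print("not broadcastable:", err)
```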

Writing a new universal function is done by first defining the function and then converting it with `np.vectorize`.

However, the following statement is taken from the [NumPy documentation](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html):
> “The `vectorize` function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”

In [None]:
def addTo(x, y):
    return x+y


vaddTo = np.vectorize(addTo)

a = np.array([[1, 2, 3], [4, 5, 6]])
vaddTo(a, 5)

In [None]:
def someFun(v, ls):
    rst = v
    for x in ls:
        rst = rst + x
    for x in ls:
        rst = rst * x
    return rst


vsomeFun = np.vectorize(someFun, excluded=['ls'])

a = np.array([[1, 2, 3], [4, 5, 6]])
vsomeFun(v=a, ls=[1, 1, 2])
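
Since `np.vectorize` is essentially a loop, it is worth checking whether a function already works on whole arrays. The sketch below repeats the logic of `someFun` under the hypothetical name `some_fun` and compares three ways of computing the same result.

```python
import numpy as np

def some_fun(v, ls):
    # Same logic as someFun above: add all entries, then multiply by all
    rst = v
    for x in ls:
        rst = rst + x
    for x in ls:
        rst = rst * x
    return rst

a = np.array([[1, 2, 3], [4, 5, 6]])
ls = [1, 1, 2]

# Element by element through np.vectorize
looped = np.vectorize(some_fun, excluded=['ls'])(v=a, ls=ls)

# The function only uses + and *, so it also accepts the whole array
direct = some_fun(a, ls)

# Closed form: (v + sum(ls)) * prod(ls)
closed = (a + np.sum(ls)) * np.prod(ls)

print(np.array_equal(looped, direct), np.array_equal(direct, closed))
```

Functions built only from arithmetic operators broadcast on their own, so `np.vectorize` is mainly useful for functions that contain scalar-only logic such as branching on a value.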

## Matrices

`np.matrix` is a strictly two-dimensional subclass of the n-dimensional `np.ndarray` and thus inherits all attributes and methods. However, some methods are overridden to be more convenient for matrix operations. Several matrix-specific operations are also available and form a small [Matrix library](https://numpy.org/doc/stable/reference/routines.matlib.html). NumPy's documentation recommends using `np.ndarray` wherever possible and switching to `np.matrix` only for very good reasons; a decision guide can be found here: [Matrix or Array?](https://numpy.org/devdocs/user/numpy-for-matlab-users.html#array-or-matrix-which-should-i-use)

In [None]:
# np.mat was removed in NumPy 2.0; np.asmatrix is its replacement
m = np.asmatrix([[1, 2, 3], [4, 9, 6], [7, 8, 9]])
n = np.asmatrix([[7, 9, 8], [3, 1, 2], [5, 6, 94]])
assert issubclass(np.matrix, np.ndarray)

In [None]:
m.T

In [None]:
m.I

In [None]:
m.H

In [None]:
m**2

In [None]:
m*n
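
Following the recommendation to prefer `np.ndarray`, the matrix operations above can be written with plain arrays as well. The mapping below is a sketch of one possible set of equivalents, not an exhaustive list.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 9, 6], [7, 8, 9]])
n = np.array([[7, 9, 8], [3, 1, 2], [5, 6, 94]])

mt = m.T                             # transpose, like m.T
mi = np.linalg.inv(m)                # inverse, like m.I
mh = m.conj().T                      # conjugate transpose, like m.H
m2 = np.linalg.matrix_power(m, 2)    # matrix power, like m**2
mn = m @ n                           # matrix product, like m*n

# On plain arrays, * and ** act element-wise instead
print(np.array_equal(m2, m @ m))
print(np.allclose(m @ mi, np.eye(3)))
```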