# NumPy

The [documentation](https://numpy.org/doc/stable/) describes NumPy as follows:

> NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object,
> various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays,
> including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms,
> basic linear algebra, basic statistical operations, random simulation and much more.

This notebook provides a short introduction to the basic concepts of NumPy. The main goal is to enable you to
understand the NumPy code that is used in the following notebooks. This introduction is by no means complete.
Additional learning resources can be found in the [References](#References) at the end of this notebook.

In order to use NumPy, it first needs to be installed. This can be done, for example, with `pip install numpy` in the shell
or by executing a cell containing `!pip install numpy` inside a Jupyter notebook. 

In [None]:
!pip install numpy

## Why NumPy?
One of the core features of NumPy is improved performance compared to plain Python when working with large vectors or matrices. 
Large multi-dimensional matrices and mathematical operations on them are core building blocks of artificial 
[Neural Networks](https://en.wikipedia.org/wiki/Neural_network_(machine_learning)).

The performance improvement of NumPy is demonstrated in the following cells. 

In [None]:
import numpy as np

numpy_array = np.arange(1000000)
python_list = list(range(1000000))

In [None]:
%timeit squares = numpy_array ** 2

In [None]:
%timeit squares = [x ** 2 for x in python_list]
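
The `%timeit` magic is only available in IPython and Jupyter. As a rough sketch, the same comparison can be run in plain Python with the standard-library `timeit` module. The names `small_array` and `small_list` and the reduced size of 10,000 elements are assumptions made for this illustration, and the measured times depend on the machine.

```python
import timeit

import numpy as np

# Hypothetical, smaller inputs so that the sketch runs quickly.
n = 10_000
small_array = np.arange(n)
small_list = list(range(n))

# Time 10 repetitions of each variant.
numpy_time = timeit.timeit(lambda: small_array ** 2, number=10)
list_time = timeit.timeit(lambda: [x ** 2 for x in small_list], number=10)

print(f"NumPy: {numpy_time:.4f} s, list comprehension: {list_time:.4f} s")

# Both variants compute the same values.
same_values = (small_array ** 2).tolist() == [x ** 2 for x in small_list]
print(same_values)
```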

## NumPy arrays

The central data structure in NumPy is the multi-dimensional array, often referred to as `ndarray`. In these arrays all elements have
the same data type. This data type is called the array's `dtype`. The ndarrays support most of the
[common sequence operations](https://docs.python.org/3/library/stdtypes.html#typesseq-common).

In [None]:
a = np.array([1,2,3,4,5,6])
b = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [None]:
print(a[1])
print(b[2][1])

print(b[1:])

print(5 in a)
print(min(a))
# print(min(b))  # raises a ValueError because b is two-dimensional

However, not all operations are supported by all ndarrays. For example, the built-in `min()` and `max()` functions only work
for one-dimensional arrays. For arrays with two or more dimensions the corresponding `numpy.min()` and `numpy.max()`
functions need to be used. These functions accept an `axis` argument that selects the dimension along which the minimum
and maximum values are calculated.

In [None]:
print(np.min(b))
print(np.min(b,axis=0))
print(np.max(b,axis=1))
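
The effect of the `axis` argument is easiest to see on a small example. The sketch below uses an illustrative array named `grid` (a name chosen here, with the same values as `b` above): `axis=0` reduces over the rows and yields one value per column, while `axis=1` reduces over the columns and yields one value per row.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

overall_min = np.min(grid)       # minimum over all elements: 1
col_min = np.min(grid, axis=0)   # one value per column: [1 2 3]
row_max = np.max(grid, axis=1)   # one value per row: [3 6 9]

print(overall_min)
print(col_min)
print(row_max)
```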

### Creating arrays
The simplest way to create ndarrays is from Python lists. This approach was already shown above. In addition, NumPy provides functions
to create arrays filled with zeros, arrays filled with ones or arrays from ranges. Furthermore, the `np.linspace()` function
creates arrays with linearly spaced values in a specific interval. 

In [None]:
print(np.zeros(3))
print(np.ones([3,4]))

print(np.arange(10))
print(np.linspace(0,20,9))
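
The difference between `np.arange()` and `np.linspace()` can be subtle: `np.arange()` takes a step size and excludes the stop value, whereas `np.linspace()` takes the number of values and includes the stop value by default. The following sketch contrasts the two. The variable names `stepped` and `spaced` are chosen only for this illustration.

```python
import numpy as np

# start, stop (excluded), step size
stepped = np.arange(0, 20, 2.5)
# start, stop (included), number of values
spaced = np.linspace(0, 20, 9)

print(stepped)  # 8 values from 0.0 to 17.5
print(spaced)   # 9 values from 0.0 to 20.0
```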

### Shape and size of an array
NumPy provides special attributes to get the shape, size or data type of an array. As an example, let's create a three-dimensional array:

In [None]:
a = np.array([[[0,1,2,3],
               [4,5,6,7]],
              [[0,1,2,3],
               [4,5,6,7]],
              [[0,1,2,3],
               [4,5,6,7]]])

The `ndim` attribute gives the number of dimensions of the array, `size` the total number of elements and `shape` the size of each dimension.

In [None]:
print(f"The array has {a.ndim} dimensions.")
print(f"It contains {a.size} elements.")
print(f"The shape of the array is: {a.shape}")
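
The data type of an array can be inspected in the same way, using the `dtype` attribute. The sketch below uses three small arrays whose names are chosen only for this illustration. The exact integer type depends on the platform, so no specific output is shown for it.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
mixed = np.array([1, 2.5, 3])  # the integers are converted to floats

print(ints.dtype)    # a platform-dependent integer type
print(floats.dtype)  # float64
print(mixed.dtype)   # float64
```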

## Array operations

### Basic arithmetic operations
One of the interesting features of ndarrays is that basic arithmetic operations are applied to every element.
With Python lists, a loop or a list comprehension (just a shorthand notation for a loop) is necessary to perform
an operation on every element.

In [None]:
l = list(range(10))

squares = []
for x in l:
    squares.append(x ** 2)

squares = [x ** 2 for x in l]


With NumPy, the same operation can be expressed in a single line.

In [None]:
squares = a ** 2
squares

The same is true for all other basic operations.

In [None]:
print(a - 5)
print((a ** 2) < 30)
print(10 * a)
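
Element-wise operations also work between two arrays of the same shape, and comparisons return an array of boolean values. A minimal sketch, using two small arrays `x` and `y` that are introduced only for this example:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

total = x + y        # [11 22 33]
product = x * y      # [10 40 90]
mask = (x ** 2) < 5  # [ True  True False]

print(total)
print(product)
print(mask)
```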

### Matrix operations
NumPy also provides an implementation of the [dot product](https://en.wikipedia.org/wiki/Matrix_multiplication) for matrices.
The dot product can be calculated using the `@` operator or the `dot()` method. The cell below shows a few examples of calculating the dot product.

In [None]:
m = np.array([[1,2,3,4], [5,6, 7, 8]])
n = m.T
v = np.array([1,2,3,4])
print(m @ v)
print(n.dot(m))
print(m.dot(n))
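
For the dot product, the inner dimensions of the two operands have to match, and the shape of the result follows from the outer dimensions. The following sketch makes the shapes explicit. The names `mat`, `mat_t` and `vec` are chosen for this illustration and hold the same values as `m`, `n` and `v` above.

```python
import numpy as np

mat = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # shape (2, 4)
mat_t = mat.T                                 # shape (4, 2)
vec = np.array([1, 2, 3, 4])                  # shape (4,)

matrix_vector = mat @ vec   # (2, 4) @ (4,)   -> shape (2,)
small = mat @ mat_t         # (2, 4) @ (4, 2) -> shape (2, 2)
large = mat_t @ mat         # (4, 2) @ (2, 4) -> shape (4, 4)

print(matrix_vector)              # [30 70]
print(small)                      # [[ 30  70] [ 70 174]]
print(large.shape)                # (4, 4)
```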

## Mathematical formulas
One of the biggest advantages of using NumPy is the ease of implementing mathematical formulas. Consider, for example, the formula
for the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error):

$\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$.

This formula can be implemented using the Python code below. Note that for the purpose of demonstration
the arrays `predictions` and `labels` are initialized with some dummy values.

In [None]:
predictions = np.ones(3)
labels = np.array([1,2,3])

error = (1/predictions.size) * np.sum(np.square(predictions - labels))
print(error)
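
Since the formula is the mean of the squared differences, it can also be written with `np.mean()`. The sketch below is one possible alternative, not the only way to implement the formula. It uses the same dummy values as above, and the names `mse_sum` and `mse_mean` are chosen for this illustration.

```python
import numpy as np

predictions = np.ones(3)
labels = np.array([1, 2, 3])

# Direct translation of the formula: (1/n) * sum of squared differences.
mse_sum = (1 / predictions.size) * np.sum(np.square(predictions - labels))

# Equivalent formulation using the mean of the squared differences.
mse_mean = np.mean(np.square(predictions - labels))

print(mse_sum, mse_mean)
```

For these dummy values the squared differences are 0, 1 and 4, so both expressions evaluate to 5/3.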

## References
The content of this notebook is mainly based on the following documents:
- [NumPy Quickstart](https://numpy.org/devdocs/user/quickstart.html)
- [NumPy: the absolute basics for beginners](https://numpy.org/doc/stable/user/absolute_beginners.html)

A curated list of [NumPy](https://numpy.org/) learning resources is available [here](https://numpy.org/learn/).

The complete list of functionality provided by NumPy can be found in the [NumPy API Reference](https://numpy.org/doc/stable/reference/index.html).