# s01: numpy
<br>
<br>
<img src="img/numpy_logo.png" width="200px">
<br>
<br>

This is a quick introduction to the NumPy package.

## Objectives of this session:

- Understand the Numpy array object
- Be able to use basic NumPy functionality
- Understand enough of NumPy to search for answers to the rest of your questions ;)

We already know about Python lists and that they can hold all kinds of objects. In scientific computing, however, lists are often not enough. They are slow for large numerical calculations, and their arithmetic operators do not work element by element.

```python
# first things first, import the package
import numpy as np
```

## What is an array?

For example, consider `[1, 2.5, 'asdf', False, [1.5, True]]`. This is a valid Python list, but every element has a different type. Any math on such a list has to handle each element separately, one at a time, in the Python interpreter.
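A minimal illustration of this point, using a hypothetical list `values` of plain numbers. On a list, `*` acts on the list as a container, so per-element math needs an explicit Python loop. A NumPy array applies the operation to every element.

```python
import numpy as np

values = [1.0, 2.5, 4.0]                 # illustrative list of numbers

# On a list, * means "repeat the list", not "multiply each element"
repeated = values * 2                    # [1.0, 2.5, 4.0, 1.0, 2.5, 4.0]

# Doubling each element of a list requires a Python-level loop
doubled_list = [x * 2 for x in values]   # [2.0, 5.0, 8.0]

# With an array, the same operation is applied element by element
doubled_array = np.array(values) * 2     # array([2., 5., 8.])
```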

NumPy is the most widely used library for scientific computing in Python. Even if you do not use it directly, chances are high that a library you do use relies on it in the background. NumPy provides a high-performance multidimensional array object, `ndarray`, and tools for working with it.

An array is a "grid" of values that all have the same type. It is indexed by a tuple of non-negative integers, one per dimension, so the same object can represent data with any number of dimensions. An array has:

- [dtype](https://numpy.org/doc/stable/reference/arrays.dtypes.html#arrays-dtypes): the data type. An array always contains elements of a single type.
- [shape](https://numpy.org/doc/stable/glossary.html#term-shape): the shape of the data, for example 3×2, 3×2×500, 500 (one-dimensional) or `()` (zero-dimensional).
- `data` - raw data storage in memory. This can be passed to C or Fortran code for efficient calculations.
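The attributes listed above can be inspected directly. In this sketch the array `m` and the other names are arbitrary. The integer dtype NumPy infers is platform-dependent, so the code does not assume a specific width.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.dtype)    # element type shared by all entries (an integer type here)
print(m.shape)    # (2, 3): 2 rows, 3 columns
print(m.ndim)     # 2 dimensions
print(m.nbytes)   # size of the raw data buffer in bytes = m.size * m.itemsize

# Mixing ints and floats upcasts everything to one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                   # float64

# A zero-dimensional array holds a single value and has the empty shape ()
scalar = np.array(42)
print(scalar.shape, scalar.ndim)     # () 0
```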

## Creating arrays

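The most basic way to create an array is `numpy.array()`, which converts a (nested) Python list. The attributes `numpy.ndarray.shape` and `numpy.ndarray.size` then tell you its dimensions and its total number of elements: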

```python
a = np.array([1, 2, 3])                 # 1-dimensional array
b = np.array([[1, 2, 3], [4, 5, 6]])    # 2-dimensional array

b.shape                                 # the shape (rows, columns): (2, 3)
b.size                                  # number of elements: 6
```
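Here are a few more illustrative cases of `np.array()`. The array names are arbitrary. The nesting depth of the input lists determines the number of dimensions. The optional `dtype` argument fixes the element type instead of letting NumPy infer it.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])
t = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # 3 levels of nesting

print(a.shape, a.size)   # (3,) 3  (a 1-D shape is a one-element tuple)
print(b.shape, b.size)   # (2, 3) 6
print(t.shape, t.size)   # (2, 2, 2) 8

# Request a specific element type instead of the inferred one
f = np.array([1, 2, 3], dtype=np.float32)
print(f.dtype)           # float32
```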

Besides `numpy.array()`, NumPy has many functions that create arrays with a particular content, such as `numpy.zeros()`, `numpy.ones()`, `numpy.full()`, `numpy.eye()`, `numpy.arange()` and `numpy.linspace()`:

```python
np.zeros((2, 3))             # 2x3 array with all elements 0
np.ones((1, 2))              # 1x2 array with all elements 1
np.full((2, 2), 7)           # 2x2 array with all elements 7
np.eye(2)                    # 2x2 identity matrix

np.arange(10)                # evenly spaced integers 0, 1, ..., 9 (stop excluded)
np.linspace(0, 9, 10)        # same values as above, but as floats; see exercise

c = np.ones((3, 3))          # 3x3 array of 1.0
d = np.ones((3, 2), 'bool')  # 3x2 boolean array, all True
```
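The sketch below inspects the results of some of these constructors. The extra arguments (a step of 0.25, 5 points for `linspace`) are illustrative choices. Note that `arange` excludes its stop value, while `linspace` includes it by default.

```python
import numpy as np

z = np.zeros((2, 3))           # 2x3 array of 0.0 (default dtype float64)
sevens = np.full((2, 2), 7)    # every element is 7
ident = np.eye(2)              # 1.0 on the diagonal, 0.0 elsewhere

# arange(start, stop, step): stop is excluded
steps = np.arange(0, 1, 0.25)
print(steps)                   # 0.0, 0.25, 0.5, 0.75 (1.0 is not included)

# linspace(start, stop, num): num points, stop included by default
points = np.linspace(0, 1, 5)
print(points)                  # 0.0, 0.25, 0.5, 0.75, 1.0

mask = np.ones((3, 2), dtype=bool)
print(mask.all())              # True: every element of a boolean "ones" array is True
```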

## Array maths and vectorization

Clearly, you can do math on arrays. Math in NumPy is fast because its core loops are implemented in compiled code (mostly C, with some linear algebra routines coming from Fortran libraries), as in other high-level languages such as R and MATLAB.

By default, basic arithmetic (`+`, `-`, `*`, `/`) in NumPy is element-by-element. The operation is applied to each element of the array without you having to write a loop. We call an operation "vectorized" when NumPy carries out the loop over the elements internally. These compiled loops can use specialized CPU instructions and avoid the per-element overhead of the Python interpreter, so they greatly outperform a regular Python loop.
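This minimal sketch compares an explicit Python loop with its vectorized equivalent. The array size and the timing with `time.perf_counter` are illustrative choices. The actual speed-up depends on your machine, but both versions produce identical numbers.

```python
import time
import numpy as np

x = np.arange(100_000, dtype=np.float64)

# Explicit Python loop: each element goes through the interpreter
start = time.perf_counter()
loop_result = np.empty_like(x)
for i in range(x.size):
    loop_result[i] = 2.0 * x[i] + 1.0
loop_seconds = time.perf_counter() - start

# Vectorized: the loop over elements runs inside NumPy's compiled code
start = time.perf_counter()
vec_result = 2.0 * x + 1.0
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
print(np.array_equal(loop_result, vec_result))   # True: same results either way
```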

Note that unlike MATLAB, where `*` means matrix multiplication, NumPy uses `*` for element-by-element multiplication and the `@` operator for matrix multiplication:

```python
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Addition: the + operator and np.add give the same result
c = a + b
d = np.add(a, b)

# Standard stats: mean over all elements of d
d_mean = d.mean()
print(f"The mean of d is {d_mean}")
```

```
The mean of d is 9.0
```
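The cell above only demonstrates addition. You can check the difference between `*` and `@` directly on the same two matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiplies matching entries: [[5, 12], [21, 32]]
matrix = a @ b        # matrix product, same as np.matmul(a, b): [[19, 22], [43, 50]]

print(elementwise)
print(matrix)
```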
