
# NumPy

## Introduction

A NumPy array is a data container. It is similar to a Python list, but it is specialised for numerical data. NumPy is at the center of the scientific Python ecosystem and is the workhorse of many scientific libraries, including scikit-learn, scikit-image, matplotlib and SciPy.

To use NumPy, start the Python interpreter and import the `numpy` package. It's customary to use the following import statement, which makes all NumPy functions available under the `np` prefix:

In [2]:
import numpy as np

If `numpy` was installed correctly, this should not produce any messages. Let's create a simple three-element NumPy array:

In [None]:
x = np.array([2, 1, 5])
x
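
Every NumPy array stores elements of a single data type and has a fixed shape, and both can be inspected through attributes. A minimal check follows; note that the exact integer type printed (for example `int64` or `int32`) depends on the platform and NumPy version:

```python
import numpy as np

x = np.array([2, 1, 5])
print(x.shape)   # number of elements along each axis: (3,)
print(x.ndim)    # number of dimensions: 1
print(x.dtype)   # element type shared by all elements (platform dependent, e.g. int64)

# A single float in the input makes NumPy store all elements as floats
y = np.array([2, 1, 5.5])
print(y.dtype)   # float64
```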

One of the advantages of NumPy is that it lets you apply functions (called universal functions, or `ufunc`s) to all elements of an array without writing for loops:

In [None]:
np.sin(x)
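
To make the comparison with a for loop concrete, here is a small sketch that computes the same values both ways, using Python's standard `math.sin` for the explicit loop:

```python
import math
import numpy as np

x = np.array([2, 1, 5])

# Vectorised: np.sin is applied to every element in a single call
vectorised = np.sin(x)

# Equivalent explicit loop over the elements
looped = np.array([math.sin(v) for v in x])

print(np.allclose(vectorised, looped))  # True
print(np.sqrt(np.array([1, 4, 9])))     # another ufunc: square roots 1, 2, 3
```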

This is not only convenient but also more efficient than iterating through the elements using for loops. Similarly, we can add scalars to all elements or multiply them by a constant:


In [None]:
x + 1
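
Multiplication by a constant and the other arithmetic operators work the same way, elementwise. A short sketch:

```python
import numpy as np

x = np.array([2, 1, 5])

times_three = x * 3     # elementwise: 6, 3, 15
halves = x / 2          # true division gives floats: 1.0, 0.5, 2.5
squares = x ** 2        # elementwise power: 4, 1, 25
combined = 2 * x + 1    # operators can be chained: 5, 3, 11

print(times_three, halves, squares, combined)
```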

To construct an array with predefined elements, we can also use one of the built-in helper functions. `np.arange` works like Python's built-in `range`, but returns an array; `np.ones` and `np.zeros` return arrays of 1s and 0s, respectively; `np.random.rand` creates an array of random numbers drawn uniformly from the half-open interval [0, 1):

In [None]:
np.arange(5)

In [None]:
np.ones(5)

In [None]:
np.zeros(5)

In [None]:
np.random.rand(5)
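
A few details of these helpers are worth checking directly: `np.arange` accepts `start`, `stop` and `step` like `range`, `np.ones` and `np.zeros` produce floats unless a `dtype` is requested, and `np.random.rand` never returns 1.0. A small sketch:

```python
import numpy as np

stepped = np.arange(2, 10, 3)       # start 2, stop 10 (excluded), step 3: 2, 5, 8
print(stepped)

print(np.ones(3).dtype)             # float64 by default
int_zeros = np.zeros(3, dtype=int)  # request integer zeros instead
print(int_zeros)

r = np.random.rand(1000)
print(r.min() >= 0.0, r.max() < 1.0)  # all values lie in [0, 1)
```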

We can also construct arrays with two or more dimensions:

In [None]:
x = np.array([[1, 2], [5, 6]])
x

In [None]:
np.ones((2, 2))

Alternatively, an n-dimensional array can be obtained by reshaping a 1-D array:

In [None]:
a = np.arange(9)
a

In [None]:
a.reshape(3,3)

Note that in this case we used a method of the array itself, called `reshape`, rather than a function from the NumPy module (`np.reshape`). Both ways are possible, and it's usually only a matter of convenience which one we choose in a particular case.
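
The sketch below checks that both forms give the same result, and shows that reshaping only works when the total number of elements stays the same:

```python
import numpy as np

a = np.arange(9)

m1 = a.reshape(3, 3)        # method of the array
m2 = np.reshape(a, (3, 3))  # function from the NumPy module
print(np.array_equal(m1, m2))  # True
print(m1.shape)                # (3, 3)
print(a.shape)                 # the original array keeps its shape: (9,)

# 9 elements cannot be arranged into a 2x4 array
try:
    a.reshape(2, 4)
except ValueError as err:
    print("ValueError:", err)
```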

In [None]:
### EXERCISE

# Create a 5x5 square array with the number 5 on the diagonal and zeros elsewhere. Consider using the `np.eye` function (you can check its docstring).

## Operations

Multiplication of two arrays is elementwise. For example, to calculate the square of each element we may use:

In [5]:
a = np.arange(3)
print("a = ", a)

b = a * a
b

a =  [0 1 2]


array([0, 1, 4])

Matrix products are calculated using the `np.dot` function. For 1-D arrays, `np.dot` computes the inner (dot) product:

In [None]:
np.dot(a, a)

For 1-D arrays the same result can be obtained by:

In [None]:
np.sum(a * a)
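
For 2-D arrays, `np.dot` computes the matrix product, which differs from elementwise `*`. The sketch below uses two small matrices chosen for illustration and also shows the `@` operator, which gives the same result as `np.dot` for 2-D arrays:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B    # [[ 5, 12], [21, 32]]
matrix = np.dot(A, B)  # rows times columns: [[19, 22], [43, 50]]
same = A @ B           # matrix-multiplication operator

print(elementwise)
print(matrix)
print(np.array_equal(matrix, same))  # True
```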

### Axis-based reductions

The `np.sum` function sums all elements regardless of the number of array dimensions:


In [6]:
b = np.arange(9).reshape(3,3)
b

In [None]:
np.sum(b)

To sum only the columns or only the rows, pass the index of the axis over which to sum. Summing over axis 0 collapses the rows and gives one sum per column; summing over axis 1 gives one sum per row:

In [None]:
np.sum(b, 0)

In [None]:
np.sum(b, 1)
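
The axis can also be passed by keyword, which makes the intent easier to read, and the same reduction is available as an array method. A short sketch with the same `b`:

```python
import numpy as np

b = np.arange(9).reshape(3, 3)  # rows [0 1 2], [3 4 5], [6 7 8]

col_sums = np.sum(b, axis=0)  # collapse axis 0 (the rows): one sum per column, 9 12 15
row_sums = np.sum(b, axis=1)  # collapse axis 1 (the columns): one sum per row, 3 12 21

print(col_sums)
print(row_sums)

# The same reduction written as an array method
print(np.array_equal(b.sum(axis=0), col_sums))  # True
```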

Other similar reduction functions include `np.min`, `np.max` and `np.mean`:

In [None]:
np.min(b)

In [None]:
np.min(b, 0)

In [None]:
np.min(b, 1)
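
`np.max` and `np.mean` accept the axis argument in the same way. For example:

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

overall_max = np.max(b)          # 8
col_max = np.max(b, axis=0)      # column maxima: 6, 7, 8
row_mean = np.mean(b, axis=1)    # row means: 1.0, 4.0, 7.0

print(overall_max, col_max, row_mean)
```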

You can also find the index of the minimum element along a given axis:

In [None]:
np.argmin(b, 0)
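
Without an axis, `np.argmin` returns an index into the flattened array; `np.unravel_index` converts it back into a (row, column) pair. `np.argmax` works the same way for the maximum. A sketch with a small example array:

```python
import numpy as np

b = np.array([[4, 2, 7],
              [1, 9, 3]])

flat = np.argmin(b)                         # index in the flattened array [4 2 7 1 9 3]
row, col = np.unravel_index(flat, b.shape)  # convert back to (row, column)
print(flat, row, col)                       # 3 1 0

print(np.argmin(b, axis=0))  # position of the minimum in each column: 1, 0, 1
print(np.argmax(b, axis=1))  # position of the maximum in each row: 2, 1
```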

### Sorting

NumPy also implements various sorting algorithms. To sort the elements of an array, you can use the `np.sort` function:

In [None]:
a = np.random.rand(4)
a

In [None]:
np.sort(a)
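
`np.sort` returns a sorted copy and leaves the original array untouched, whereas the array method `sort` sorts in place. A small sketch with fixed values so the result is predictable:

```python
import numpy as np

a = np.array([0.3, 0.1, 0.2])

s = np.sort(a)  # sorted copy; `a` still holds 0.3, 0.1, 0.2
print(a, s)

a.sort()        # the method sorts `a` itself in place and returns None
print(a)        # 0.1, 0.2, 0.3
```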

Similarly to the reduction functions, you can also pass the axis index to sort along: 

In [None]:
b = a.reshape(2, 2)
b

In [None]:
np.sort(b, 0)

In [None]:
np.sort(b, 1)

`np.argsort` returns the indices that would sort the array, i.e. the positions of its elements in sorted order:

In [None]:
np.argsort(a)
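
With random input the result is hard to interpret, so here is the same call on a fixed example array:

```python
import numpy as np

v = np.array([30, 10, 20])
order = np.argsort(v)  # smallest is at index 1, then index 2, then index 0
print(order)           # [1 2 0]
```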

### Special modules

NumPy also provides extra modules implementing basic numerical methods:

* `np.linalg` -- linear algebra,
* `np.fft` -- fast Fourier transform,
* `np.random` -- random number generators.
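
A few illustrative calls, one per module, are sketched below. They are only examples of what each module offers; `np.random.default_rng` is the newer random-generator interface (available since NumPy 1.17) and is used here as an assumption about your NumPy version:

```python
import numpy as np

# np.linalg: determinant and inverse of a small diagonal matrix
M = np.array([[2.0, 0.0], [0.0, 4.0]])
print(np.linalg.det(M))   # approximately 8.0
print(np.linalg.inv(M))   # diagonal entries 0.5 and 0.25

# np.fft: the transform of a constant signal has a single non-zero term
spectrum = np.fft.fft(np.ones(4))
print(spectrum)           # complex values 4, 0, 0, 0

# np.random: a seeded generator gives reproducible random numbers in [0, 1)
rng = np.random.default_rng(seed=42)
print(rng.random(3))
```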

In [None]:
### EXERCISE

# Generate a 10 x 3 array of random numbers (using `np.random.rand`). 
# From each row, find the column index of the element closest to 0.75. Make use of np.abs and np.argmin. 
# The result should be a one-dimensional array of integers from 0 to 2.


In [None]:
### EXERCISE - hard

# Solve the following system of linear equations using `np.linalg.solve`. Test the solution.
# x + 3y = 3
# 5x - y = 6

## Indexing

### Integer indexing and slicing

Individual items of an array can be accessed by the integer index of the element (starting with 0):

In [None]:
a = np.arange(10)
print("a =", a)
np.array([a[0], a[2], a[-1]])

For arrays with two or more dimensions, specify one index per axis, separated by commas:

In [None]:
b = np.arange(6).reshape(2,3)
b

In [None]:
b[1,2]
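
A few variations on multi-dimensional indexing, sketched with the same `b`: chained indexing gives the same element, negative indices count from the end, and a single index selects a whole row:

```python
import numpy as np

b = np.arange(6).reshape(2, 3)  # rows [0 1 2] and [3 4 5]

print(b[1, 2])    # row 1, column 2: 5
print(b[1][2])    # also 5, but indexes twice (first row 1, then element 2)
print(b[-1, 0])   # last row, first column: 3
print(b[0])       # a single index selects a whole row: [0 1 2]
```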

Slicing lets you extract sub-arrays of multiple elements from an array. A slice is defined by up to three integers separated by colons, `start:end:increment`, where the element at `end` itself is excluded. Any of the integers can be omitted, in which case it takes its default value (0 for `start`, the end of the array for `end`, and 1 for `increment`):

In [None]:
c = np.arange(9)
c

In [None]:
c[1:3]

In [None]:
c[:3]

In [None]:
c[1:]
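
The `increment` can be larger than 1 or negative, and slices can be combined across axes of a multi-dimensional array. A short sketch:

```python
import numpy as np

c = np.arange(9)
every_second = c[::2]   # 0, 2, 4, 6, 8
stepped = c[1:7:3]      # start 1, stop before 7, step 3: 1, 4
reversed_c = c[::-1]    # negative step walks backwards: 8, 7, ..., 0
last_three = c[-3:]     # 6, 7, 8

m = np.arange(12).reshape(3, 4)
block = m[:2, 1:3]      # rows 0-1, columns 1-2: [[1, 2], [5, 6]]
first_col = m[:, 0]     # first column of every row: 0, 4, 8

print(every_second, stepped, reversed_c, last_three)
print(block)
print(first_col)
```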

You can also assign new values to elements selected with slices or indices:

In [None]:
print("c = ", c)
c[1:8:2] = 1000  # every second element from index 1 up to (but not including) 8
c

In [None]:
### EXERCISE 

# Create a 3x4 array x of values from 0 to 11. Create another array y as follows: y = x[2]
# Try to guess: what happens when you modify y? Does `x` also change?
# Now try `y = x[:2]` and modify its first element. What happens now? Can you explain why?

In [None]:
### EXERCISE - hard

# Create an 8x8 array of zeros and fill it with a checkerboard pattern.
 

![](https://github.com/paris-swc/advanced-numpy-lesson/raw/gh-pages/fig/checkerboard.svg)

### Boolean mask

Sometimes we may want to select array elements based on their values. In this case a boolean mask is very useful. A mask is an array of the same shape as the indexed array, containing only `True` or `False` values; indexing with it selects the elements where the mask is `True`:

In [None]:
a = np.arange(4)
print("a =", a)
mask = np.array([False, True, True, False])
a[mask]

In most cases the mask is constructed from the values of the array itself. For example, to select only odd numbers we could use the following mask:


In [None]:
odd = (a % 2) == 1
print("odd =", odd)
a[odd]

This can also be done in a single step:


In [None]:
a[(a % 2) == 1]

Indexing with a mask is also useful for assigning new values to the selected elements:

In [None]:
a[(a % 2) == 1] = -1
a
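
Conditions can be combined with the elementwise operators `&` (and), `|` (or) and `~` (not); Python's `and`/`or` do not work elementwise on arrays and raise a `ValueError`. The parentheses around each comparison are needed because these operators bind more tightly than comparisons. A sketch on a fresh array:

```python
import numpy as np

a = np.arange(10)

between = a[(a > 2) & (a < 7)]   # 3, 4, 5, 6
outside = a[(a < 2) | (a > 7)]   # 0, 1, 8, 9
even = a[~(a % 2 == 1)]          # negated "odd" mask: 0, 2, 4, 6, 8

# Summing a boolean mask counts the True values
n_large = np.sum(a > 5)          # 4 elements are greater than 5

print(between, outside, even, n_large)
```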

In [None]:
### EXERCISE

# Generate a 3x4 array of random numbers from the standard normal distribution (with `np.random.randn`) and rectify it (replace negative elements with zeros) using boolean indexing.

#### Exercise

What are the final values of `a` and `b` at the end of the following program? Explain why.

```
a = np.arange(5)
b = a[a < 3]
b[::2] = 0
```

1. `a = [0, 1, 2, 3, 4], b = [0, 1, 2]`

1. `a = [0, 1, 0, 3, 4], b = [0, 1, 0]`

1. `a = [0, 0, 2, 3, 4], b = [0, 0, 2]`

1. `a = [0, 1, 2, 3, 4], b = [0, 1, 0]`

1. `a = [0, 1, 2, 3, 4], b = [0, 1, 0, 3, 0]`

### Fancy indexing

Indexing can also be done with an array (or list) of integers; this is known as fancy indexing. In this case the same index can be repeated several times:

In [None]:
a = np.arange(0, 100, 10)
print("a =", a)
a[[2, 3, 2, 4, 2]]

New values can also be assigned with this kind of indexing:

In [None]:
a[[9, 7]] = -100
a

When a new array is created by indexing with an array of integers, the new array has the same shape as the index array. Note that fancy indexing returns a copy, not a view.

In [None]:
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
print("idx.shape =", idx.shape)
a[idx]
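
The difference between a copy and a view can be checked directly: writing to a slice changes the original array, while writing to the result of fancy indexing does not. `np.shares_memory` reports whether two arrays overlap in memory:

```python
import numpy as np

a = np.arange(10)

view = a[2:5]        # slicing returns a view of the same memory
copy = a[[2, 3, 4]]  # fancy indexing returns a new array

view[0] = -1         # also changes a[2]
copy[1] = -2         # leaves `a` unchanged

print(a)                          # a[2] is now -1, everything else unchanged
print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
```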

Fancy indexing is often used to re-order or sort data. You can easily obtain the indices required to sort data using `np.argsort`:

In [None]:
a = np.random.randint(10, size=5)
print("a =", a)
i = np.argsort(a)
a[i]
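
A common use of this pattern is to sort several related arrays by the values of one of them. The names and scores below are hypothetical example data:

```python
import numpy as np

# Hypothetical example data: names and their corresponding scores
names = np.array(["carol", "alice", "bob"])
scores = np.array([72, 95, 88])

order = np.argsort(scores)  # indices that sort the scores ascending: 0, 2, 1
print(names[order])         # carol, bob, alice
print(scores[order])        # 72, 88, 95

desc = order[::-1]          # reverse the index array for descending order
print(names[desc])          # alice, bob, carol
```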

Let `x = np.array([1, 5, 10])`.

Which of the following will return `[1, 10]`?

1. `x[::2]`

1. `x[[1, 3]]`
 
1. `x[[0, 2]]`
 
1. `x[0, 2]`
 
1. `x[[1, -1]]`

1. `x[[False, True, False]]`

For each statement predict whether it returns a copy or a view.