# Introduction to NumPy Arrays

In this notebook we showcase some basic to intermediate NumPy array manipulations, which form the basis of many data processing approaches in science.
We start by importing the packages we plan to use.

In [None]:
import numpy as np
import matplotlib
import cv2

## Python lists

Python already has a basic container for arrays. To demonstrate how NumPy arrays differ from the built-in Python lists, we start by reviewing some basic use cases of Python lists.
We can declare a new list by enclosing a collection of objects in brackets `[` `]`:

In [44]:
a = [1, 2, 3]
a

[1, 2, 3]

In [8]:
b = ['a', 'b', 'c']
b

['a', 'b', 'c']

Python lists can accept mixed type elements. 

In [10]:
c = [1, 2, 'c']
c

[1, 2, 'c']

If we want to create a matrix (a 2D list) we create a list of lists.

In [12]:
d = [[1, 2, 3], [4, 5, 6]]
d

[[1, 2, 3], [4, 5, 6]]

We access elements of a list by using the getitem operator, denoted by adding `[]` after a variable name:

In [14]:
a[0]

1

For multi-dimensional lists this returns a row.

In [15]:
d[0]

[1, 2, 3]

So to access an element we need to give the row index and then the column index of the element we want.

In [16]:
d[0][0]

1

In [17]:
d[1][0]

4

A major benefit of built-in lists is that they are mutable and resizable. This means we can dynamically add more elements to a list.
For example, to add a single number to the list `a=[1, 2, 3]` we can do:

In [18]:
a.append(4)
a

[1, 2, 3, 4]

To add a collection of elements we can use the `extend` method:

In [19]:
a.extend([5, 6, 7])
a

[1, 2, 3, 4, 5, 6, 7]

We can even add elements whose type differs from that of the other elements in the list:

In [20]:
a.append('a')
a

[1, 2, 3, 4, 5, 6, 7, 'a']

### So what's the issue?

This is all fantastic and grand. So why do we need NumPy arrays then and what's the difference?

For starters, what if we wanted to multiply all elements of a list by some number?
What do you think the following command will return as a result?

In [21]:
a = [1]*7
a

[1, 1, 1, 1, 1, 1, 1]

Why do you think this is what they decided the result should be?

What about the following?

In [46]:
a = [1, 2, 3]
a + a

[1, 2, 3, 1, 2, 3]

Think also, how would you select a column of elements from a big array?

In [72]:
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
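
One possible way to approach the questions above with plain lists is an explicit loop or a list comprehension. The sketch below is only an illustration; the names `values`, `matrix`, `scaled` and `first_column` are made up for this example.

```python
# Element-wise work on plain lists has to be spelled out by hand.
values = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Multiply every element by 7 (values * 7 would repeat the list instead).
scaled = [x * 7 for x in values]

# Select the column at index 0 by taking that element from every row.
first_column = [r[0] for r in matrix]

print(scaled)        # [7, 14, 21]
print(first_column)  # [1, 4, 7]
```

This works, but it gets verbose quickly, which is part of the motivation for the arrays introduced next.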

## NumPy Arrays

There are many (many) ways to create a NumPy array.
For example, we can create them from lists:

In [22]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [26]:
b = np.array( [[1,2,3], [4,5,6]] )
b

array([[1, 2, 3],
       [4, 5, 6]])

What do you think the following will result in?

In [25]:
b = np.array([1, 2, 'a'])
b

array(['1', '2', 'a'], dtype='<U21')

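The first difference between NumPy arrays and lists is that a NumPy array can only contain elements of a single type. Given mixed input, NumPy converts every element to a common type, which is why the numbers above became strings.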
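
The common type of an array can be inspected through its `dtype` attribute. The following is a small sketch; the variable names are illustrative, and the exact integer type depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.5])   # the integers are converted to floats
text = np.array([1, 2, 'a'])    # everything is converted to strings

print(ints.dtype)   # a platform-dependent integer type, e.g. int64
print(mixed.dtype)  # float64
print(text.dtype)   # a Unicode string type, e.g. <U21
```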

You are not expected to know every way of creating NumPy arrays. Let's take a look at a few more of them nonetheless.
We can create an array of zeros by specifying the "shape", i.e. the dimensions, of the matrix:

In [28]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

An array of ones in some shape:

In [29]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

We can create an array of N evenly spaced numbers in some range (endpoints included) as:

In [32]:
np.linspace(0, 10, 6)

array([ 0.,  2.,  4.,  6.,  8., 10.])

Or we can arrange numbers in some range by specifying a step (the end of the range is excluded):

In [33]:
np.arange(0, 10, 2)

array([0, 2, 4, 6, 8])
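
The difference in endpoint handling between the two functions is easy to trip over. As a small sketch (variable names are illustrative), `linspace` can be asked to drop the endpoint through its `endpoint` parameter, in which case it produces the same values as the `arange` call above:

```python
import numpy as np

with_endpoint = np.linspace(0, 10, 6)                     # 0, 2, ..., 10
without_endpoint = np.linspace(0, 10, 5, endpoint=False)  # 0, 2, ..., 8
stepped = np.arange(0, 10, 2)                             # 0, 2, ..., 8

print(with_endpoint)
print(np.array_equal(without_endpoint, stepped))  # True
```

Note that `linspace` returns floats while `arange` with integer arguments returns integers; `np.array_equal` compares values only.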

The last two functions, `linspace` and `arange`, are already pretty useful. But they are even more powerful when combined with another method.
The `reshape` method rearranges the elements of the given array into the given shape. Note that the total number of elements must equal the product of the dimensions of the new shape.

In [60]:
a = np.arange(0, 10, 1)
a.reshape((5, 2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

A neat trick is to let NumPy pick the size of one of the dimensions:

In [61]:
a = a.reshape((2, -1))
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
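
To illustrate the size requirement, here is a short sketch of what happens when the requested shape does not match the number of elements. The name `values` is used only for this example.

```python
import numpy as np

values = np.arange(10)                # 10 elements

print(values.reshape((2, 5)).shape)   # (2, 5), because 2 * 5 == 10
print(values.reshape((-1, 2)).shape)  # (5, 2), NumPy infers the 5

try:
    values.reshape((3, 4))            # 3 * 4 == 12, which is not 10
except ValueError as err:
    print("reshape failed:", err)
```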

This is hardly all of the possible ways to create NumPy arrays. For all the cool specialized functions, check out the NumPy manual:
https://numpy.org/doc/stable/user/basics.creation.html
which I heartily recommend as a reference if you ever get stuck. If you need something, chances are NumPy already has it and you just have to find it.

Here are two more ways that I find (oddly) practical:

In [84]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [86]:
np.diag((1, 2, 3))

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

## NumPy Arrays do math

The big reason why NumPy arrays are more useful to us than lists is that they actually do math. 

In [63]:
a = np.ones((10, ))
b = np.arange(1, 11, 1)
print(a)
print(b)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1  2  3  4  5  6  7  8  9 10]


In [52]:
a + b

array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])

In [64]:
a + 1

array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])

In [65]:
a*7

array([7., 7., 7., 7., 7., 7., 7., 7., 7., 7.])

In [53]:
a/b

array([1.        , 0.5       , 0.33333333, 0.25      , 0.2       ,
       0.16666667, 0.14285714, 0.125     , 0.11111111, 0.1       ])

In [54]:
b.max()

10

In [55]:
b.min()

1

In [56]:
b.mean()

5.5

In [57]:
b.std()

2.8722813232690143
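
To tie this back to the questions asked in the lists section, here is a side-by-side sketch of the same operators applied to a list and to an array. The names `as_list` and `as_array` are illustrative.

```python
import numpy as np

as_list = [1, 2, 3]
as_array = np.array(as_list)

print(as_list * 2)          # [1, 2, 3, 1, 2, 3]  (repetition)
print(as_array * 2)         # [2 4 6]             (element-wise)

print(as_list + as_list)    # [1, 2, 3, 1, 2, 3]  (concatenation)
print(as_array + as_array)  # [2 4 6]             (element-wise)
```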

## NumPy Arrays do better indexing

Now this is one of those things that really make NumPy arrays significantly better. 

Indexing refers to selecting elements of an array. NumPy is **really** good at helping us select all kinds of combinations of elements in a very concise way.
Starting from the basic element access:

In [74]:
a = np.arange(0, 100, 1).reshape((10, 10))
a

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Same as lists, we can select rows and elements:

In [75]:
row = a[0]
row

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [70]:
a[0][0]

0

In [71]:
# or a shorter way
a[0, 0]

0

By using something called a **slice** we can also select ranges. 

For example, from index 2 up to (but not including) index 8 of a row, i.e. the 3rd through 8th elements:

In [76]:
row[2:8]

array([2, 3, 4, 5, 6, 7])

Or, selecting the first two rows of our matrix:

In [77]:
a[:2]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

Or, selecting all rows from index 2 (the 3rd row) onward:

In [78]:
a[2:]

array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

Like we showed above, we can access elements as `a[0,0]`. So would slices work the same way?

In [79]:
a[2:, 0]

array([20, 30, 40, 50, 60, 70, 80, 90])

So that returns the values in the first column (index 0) for all rows from index 2 onward.

Or, as a clearer and more useful example, let's select the second column (index 1) of our matrix:

In [83]:
col = a[:,1]
col

array([ 1, 11, 21, 31, 41, 51, 61, 71, 81, 91])
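
Slices can be used in both dimensions at once, and they also accept a step and negative indices. The following sketch uses a stand-in array called `grid` (an illustrative name) built the same way as the 10 by 10 matrix above.

```python
import numpy as np

grid = np.arange(0, 100, 1).reshape((10, 10))

block = grid[2:4, 5:8]      # rows 2-3, columns 5-7
every_other = grid[0, ::2]  # row 0, every second column
last_column = grid[:, -1]   # negative indices count from the end

print(block)
# [[25 26 27]
#  [35 36 37]]
print(every_other)          # [0 2 4 6 8]
print(last_column)          # [ 9 19 29 39 49 59 69 79 89 99]
```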

NumPy arrays take this a step further: we can use some very complicated indices. For example, other arrays and boolean arrays can serve as indices too!

In [90]:
col[[5, 6, 7]]

array([51, 61, 71])

In [92]:
a[[1, 2, 3], [1, 2, 3]]

array([11, 22, 33])

How would we combine this with something we saw above to select all elements on the diagonal?

How would we create an array with a plus sign across it?
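
There are several ways to answer these two questions. One possible sketch, assuming `np.arange` is the "something we saw above", is shown here; `grid`, `positions` and `plus` are names made up for the example.

```python
import numpy as np

grid = np.arange(0, 100, 1).reshape((10, 10))

# Diagonal: pair row index k with column index k for every k.
positions = np.arange(10)
diagonal = grid[positions, positions]
print(diagonal)   # [ 0 11 22 33 44 55 66 77 88 99]

# Plus sign: start from zeros, then fill the middle row and middle column.
plus = np.zeros((5, 5), dtype=int)
plus[2, :] = 1
plus[:, 2] = 1
print(plus)
# [[0 0 1 0 0]
#  [0 0 1 0 0]
#  [1 1 1 1 1]
#  [0 0 1 0 0]
#  [0 0 1 0 0]]
```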

A change of pace a bit. What are booleans?

We already mentioned we can index arrays using booleans. So let's take a look:

In [104]:
row

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [108]:
idx = [False] * 10
idx[1] = True
idx

[False, True, False, False, False, False, False, False, False, False]

In [107]:
row[idx]

array([1])

Here is how this translates if we made the same selection on our big matrix `a`:

In [109]:
a[idx]

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

So why go through the bother of creating a boolean array instead of just asking for `a[0, 1]`, `row[1]` or `a[1]`?

Most of the time it won't be us creating the array manually. Let's take a look:

In [120]:
a = np.zeros((9, 9))
a

array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [123]:
a[2:7, 2:7] = np.ones((5, 5))
a

array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 0., 0.],
       [0., 0., 1., 1., 1., 1., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [124]:
a > 0

array([[False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False, False],
       [False, False,  True,  True,  True,  True,  True, False, False],
       [False, False,  True,  True,  True,  True,  True, False, False],
       [False, False,  True,  True,  True,  True,  True, False, False],
       [False, False,  True,  True,  True,  True,  True, False, False],
       [False, False,  True,  True,  True,  True,  True, False, False],
       [False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False, False]])

In [125]:
a[a>0]

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.])

What did we get back in the previous line? The indices or elements?

Let's take a look at one more **really** useful function:

In [126]:
np.where(a>0)

(array([2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6,
        6, 6, 6]),
 array([2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3,
        4, 5, 6]))

What did we get back in the previous line this time?
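
As a hint for the two questions above, the sketch below compares both approaches on a stand-in array named `square` (an illustrative name) built like the 9 by 9 matrix. It also shows that a boolean mask can be used for assignment.

```python
import numpy as np

square = np.zeros((9, 9))
square[2:7, 2:7] = 1

# np.where with a single argument returns one index array per dimension.
rows, cols = np.where(square > 0)
print(rows[:3], cols[:3])   # [2 2 2] [2 3 4]

# Indexing with those index arrays selects the same elements as the mask.
same = np.array_equal(square[rows, cols], square[square > 0])
print(same)                 # True

# A boolean mask can also appear on the left-hand side of an assignment.
square[square > 0] = 5
print(square.sum())         # 125.0
```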

## What is this useful for?

Images are also represented by arrays. Grayscale images are 2D arrays. Color images are 2D arrays in which each element is itself an array of 3 numbers (R, G and B) or 4 numbers (RGB and alpha), which in NumPy means a 3D array.
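
In terms of array shapes, this could look like the following sketch. The images here are synthetic and the names `gray` and `color` are illustrative; which channel holds which color depends on the library that loaded the image.

```python
import numpy as np

gray = np.zeros((4, 6), dtype=np.uint8)      # 4 rows, 6 columns
color = np.zeros((4, 6, 3), dtype=np.uint8)  # same size, 3 channels per pixel

color[:, :, 0] = 255   # fill the first channel of every pixel

print(gray.shape, color.shape)  # (4, 6) (4, 6, 3)
print(color[0, 0])              # [255   0   0]
```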

A lot of operators in image processing are really just multiplication and division operations involving a kernel of some kind. Kernels are just 2D matrices that give some kind of weights or a map over neighbouring pixels that are involved in the operation.
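
As a rough illustration of one common kind of kernel operation, the sketch below slides a 3 by 3 averaging kernel over a tiny synthetic image using only slicing. It is a simplified example, not how OpenCV implements its filters, and the names `image`, `kernel` and `out` are made up for it.

```python
import numpy as np

image = np.zeros((5, 5))
image[2, 2] = 9.0                # a single bright pixel in the center

kernel = np.ones((3, 3)) / 9.0   # equal weights: a 3x3 average

# Visit every position where the kernel fits entirely inside the image.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]      # 3x3 neighbourhood
        out[i, j] = (patch * kernel).sum()   # weighted sum of the patch

print(out)   # every entry is 1 (up to rounding): the bright pixel is smeared out
```

Every 3 by 3 neighbourhood in this small image contains the bright pixel, so each output value is the same average.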

Image segmentation is the process of partitioning a digital image into multiple image segments that mark distinct objects. For example, distinguishing stars from galaxies and lines.

Basically, most of what we will do involves arrays of some kind, so it's good to understand how they work beneath the surface.
Here, for example, are some of the kernels that exist in OpenCV, the computer vision library that we will use.

In [129]:
cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=uint8)

In [130]:
cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))

array([[0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]], dtype=uint8)

In [131]:
cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

array([[0, 0, 1, 0, 0],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0]], dtype=uint8)