# Introduction to NumPy Arrays

In this notebook we will showcase some basic to intermediate NumPy array manipulations which are the basis of many data processing approaches in science.
We start by importing the packages we plan to use.

In [None]:
import numpy as np
import matplotlib
import cv2

## Python lists

Python already has a basic container for arrays. To demonstrate how NumPy arrays differ from the built-in Python lists, we start by reviewing some basic use cases of Python lists.  
We can declare a new list by enclosing a collection of objects in brackets `[` `]`:

In [None]:
a = [1, 2, 3]
a

In [None]:
b = ['a', 'b', 'c']
b

In [None]:
c = [1, 2, 'c']
c

If we want to create a matrix (a 2D list) we create a list of lists.

In [None]:
d = [[1, 2, 3], [4, 5, 6]]
d

We access elements of a list by using the getitem operator, denoted by adding `[]` after a variable name:

In [None]:
a[0]

For multi-dimensional lists this returns a row.

In [None]:
d[0]

So to access an element we need to access the row-index and then the column-index of the element we want.

In [None]:
d[0][0]

In [None]:
d[1][0]

A main benefit of built-in lists is that they are mutable and can grow dynamically. This means we can add more elements to a list after it has been created.  
For example, to add a single number to list `a=[1, 2, 3]` we can do:

In [None]:
a.append(4)
a

To add a collection of elements we can use the `extend` method:

In [None]:
a.extend([5, 6, 7])
a
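
The difference between `append` and `extend` is easiest to see side by side. Here is a minimal sketch; the names `nested` and `flat` are only used for illustration and are not part of the notebook.

```python
# append adds its argument as ONE new element, even if that argument is a list
nested = [1, 2, 3]
nested.append([4, 5])
print(nested)  # [1, 2, 3, [4, 5]]

# extend adds each element of its argument separately
flat = [1, 2, 3]
flat.extend([4, 5])
print(flat)  # [1, 2, 3, 4, 5]
```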

We can even add elements that do not contain the same type as others in a list:

In [None]:
a.append('a')
a

### So what's the issue with the builtins?

This is all fantastic and grand. So why do we need NumPy arrays then and what's the difference?

For starters, let's say we wanted to multiply all elements of a list by some number.  
What do you think the following command will return as a result?

In [None]:
a = [1]*7
a

Why do you think this is what they decided the result should be?

What about the following?

In [None]:
a + a
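
If we really want element-wise arithmetic with plain lists, we have to spell it out ourselves. One common way, shown here as a sketch with illustrative names (`values`, `scaled`, `summed`), is a list comprehension:

```python
values = [1, 2, 3]

# Multiply every element by 7, one element at a time
scaled = [x * 7 for x in values]
print(scaled)  # [7, 14, 21]

# Add two lists element by element by walking through them in pairs
summed = [x + y for x, y in zip(values, values)]
print(summed)  # [2, 4, 6]
```

This works, but it gets verbose quickly, especially for lists of lists.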

Think also, how would you select a column of elements from a big array?

In [None]:
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
a
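
One possible answer with plain lists, as a sketch (the names `grid` and `second_column` are illustrative): a column has to be collected row by row, because the slice syntax only acts on the outer list.

```python
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# grid[:] is just a copy of the outer list, so grid[:][1] is the same as
# grid[1]: the second ROW, not the second column
print(grid[:][1])  # [4, 5, 6]

# To get a column we pick the same position out of every row
second_column = [r[1] for r in grid]
print(second_column)  # [2, 5, 8]
```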

## NumPy Arrays

There are many (many) ways to create a NumPy array.         
For example, we can create them from lists:

In [None]:
a = np.array([1, 2, 3])
a

In [None]:
b = np.array( [[1,2,3], [4,5,6]] )
b

What do you think the following will result in?

In [None]:
b = np.array([1, 2, 'a'])
b

Obviously, the first difference between NumPy arrays and lists is that NumPy arrays can only contain elements of the same type. 
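
We can check the common type through the `dtype` attribute of an array. The following is a small sketch with illustrative names (`mixed`, `numbers`); it shows that NumPy converts the elements to a single type that can hold all of them.

```python
import numpy as np

mixed = np.array([1, 2, 'a'])
# All elements were converted to strings; dtype kind 'U' means Unicode string
print(mixed.dtype.kind)  # U
print(mixed[0] + mixed[1])  # string concatenation, not addition: 12

# Mixing integers and floats converts everything to floats
numbers = np.array([1, 2, 3.5])
print(numbers.dtype)  # float64
```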

You are not expected to know all of the ways to create NumPy arrays, but let's take a look at a few more of them nonetheless.  
We can create an array of zeros by specifying the "shape", i.e. the dimensions, of the matrix:

In [None]:
np.zeros((3, 2))

An array of ones in some shape:

In [None]:
np.ones((2, 3))

We can create an array of N equally spaced numbers in some range (both endpoints included) as:

In [None]:
np.linspace(0, 10, 6)

Or we can arrange numbers in some range by specifying a step:

In [None]:
np.arange(0, 10, 2)
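
The two functions treat the end of the range differently, which is a common source of off-by-one surprises. A short sketch comparing them (the names `points` and `steps` are illustrative):

```python
import numpy as np

# linspace: we ask for 6 points, and both 0 and 10 are included
points = np.linspace(0, 10, 6)
print(points.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

# arange: we ask for a step of 2, and the stop value 10 is NOT included
steps = np.arange(0, 10, 2)
print(steps.tolist())  # [0, 2, 4, 6, 8]
```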

The last two functions, `linspace` and `arange`, are already pretty useful. But they are even more powerful when combined with another method.  
The `reshape` method rearranges the elements of an array into the given shape. Note that the total number of elements has to match the product of the dimensions of the new shape.

In [None]:
a = np.arange(0, 10, 1)
a.reshape((5, 2))

A neat trick is to let NumPy pick the size of one of the dimensions:

In [None]:
a = a.reshape((2, -1))
a
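
To make the rule about matching sizes concrete, here is a small sketch (the array `numbers` is an illustrative name). It shows a valid reshape, the `-1` trick, and what happens when the sizes do not match.

```python
import numpy as np

numbers = np.arange(12)

# 12 elements fit exactly into 3 rows of 4
print(numbers.reshape((3, 4)).shape)  # (3, 4)

# -1 asks NumPy to work out that dimension itself: 12 / 2 = 6 columns
print(numbers.reshape((2, -1)).shape)  # (2, 6)

# 12 elements cannot fill a 5 x 3 array, so NumPy raises a ValueError
try:
    numbers.reshape((5, 3))
except ValueError as err:
    print("reshape failed:", err)
```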

This is hardly an exhaustive list of the ways to create NumPy arrays. For all the cool specialized functions, check out the NumPy manual:  
https://numpy.org/doc/stable/user/basics.creation.html  
I heartily recommend it as a reference if you ever get stuck. If you need something, chances are NumPy already has it and you just have to find it.

Here are two more ways that I find (oddly) practical:

In [None]:
np.eye(3, 3)

In [None]:
np.diag((1, 2, 3))

## NumPy Arrays do math

The big reason why NumPy arrays are more useful to us than lists is that they actually do math. 

In [None]:
a = np.ones((10, ))
b = np.arange(1, 11, 1)
print(a)
print(b)

In [None]:
a + b

In [None]:
a + 1

In [None]:
a*7

In [None]:
a/b

In [None]:
b.max()

In [None]:
b.min()

In [None]:
b.mean()

In [None]:
b.std()
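
To see the contrast with lists directly, we can apply the same operators to a list and to an array holding the same numbers. A minimal sketch with illustrative names (`values_list`, `values_array`):

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# On a list, * repeats the list; on an array, * multiplies each element
print(values_list * 2)              # [1, 2, 3, 1, 2, 3]
print((values_array * 2).tolist())  # [2, 4, 6]

# On a list, + concatenates; on an array, + adds element by element
print(values_list + values_list)               # [1, 2, 3, 1, 2, 3]
print((values_array + values_array).tolist())  # [2, 4, 6]
```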

## NumPy Arrays do better indexing

Now this is one of those things that really make NumPy arrays significantly better. 

Indexing refers to selecting elements of an array. NumPy is **really** good at helping us select all kinds of combinations of elements in a very concise way.  
Starting from the basic element access:

In [None]:
a = np.arange(0, 100, 1).reshape((10, 10))
a

Same as lists, we can select rows and elements:

In [None]:
row = a[0]
row

In [None]:
a[0][0]

In [None]:
# or a shorter way
a[0, 0]

By using something called a **slice** we can also select ranges. 

For example, from the 3rd to the 8th element of a row (indices 2 through 7, since counting starts at 0 and the stop index is excluded):

In [None]:
row[2:8]

Or, selecting the first two rows of our matrix:

In [None]:
a[:2]

Or, selecting all rows from the 3rd onward (index 2 and up):

In [None]:
a[2:]

Like we showed above, we can access elements as `a[0,0]`. So would slices work the same way?

In [None]:
a[2:, 0]

So that returns all the values in the first column, taken from the rows with index 2 and higher.

Or, for a clearer and more useful example, let's select a whole column of our matrix, here the second one (index 1):

In [None]:
col = a[:,1]
col
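
Slices can be used on both axes at once, and they accept an optional third number, the step. A short sketch on the same kind of 10 x 10 matrix (rebuilt here under the illustrative name `grid` so the example is self-contained):

```python
import numpy as np

grid = np.arange(0, 100, 1).reshape((10, 10))

# Rows 2 and 3, columns 5 through 7 (the stop index is always excluded)
block = grid[2:4, 5:8]
print(block.tolist())  # [[25, 26, 27], [35, 36, 37]]

# start:stop:step, here every other element of the first row
every_other = grid[0, ::2]
print(every_other.tolist())  # [0, 2, 4, 6, 8]
```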

NumPy arrays take this even a step further - we can use some very complicated indices. For example other arrays and Boolean arrays can serve as indices too!

In [None]:
col[[5, 6, 7]]

In [None]:
a[[1, 2, 3], [1, 2, 3]]

How would we combine this with something we saw above to select all elements on the diagonal?

How would we create an array with a plus sign across it?
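
One possible answer to both questions, as a sketch (try it yourself first). The names `grid`, `idx` and `plus` are illustrative, and there are other ways to get the same result.

```python
import numpy as np

grid = np.arange(0, 100, 1).reshape((10, 10))

# Diagonal: pair row index i with column index i, for every i
idx = np.arange(10)
diagonal = grid[idx, idx]
print(diagonal.tolist())  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]

# Plus sign: start from zeros, then fill the middle row and the middle column
plus = np.zeros((5, 5))
plus[2, :] = 1
plus[:, 2] = 1
print(plus)
```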

A change of pace a bit. What are booleans?

We already mentioned we can index arrays using booleans. So let's take a look:

In [None]:
row

In [None]:
idx = [False] * 10
idx[1] = True
idx

In [None]:
row[idx]

Here is how this translates if we made the same selection on our big matrix `a`:

In [None]:
a[idx]

So why go through the bother of creating a Boolean array instead of just asking for `row[1]` (that is, `a[0, 1]`) or `a[1]` directly?

Most of the time it won't be us creating the array manually. Let's take a look:

In [None]:
a = np.zeros((9, 9))
a

In [None]:
a[2:7, 2:7] = np.ones((5, 5))
a

In [None]:
a > 0

In [None]:
a[a>0]

What did we get back in the previous line? The indices or elements?
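
To check our answer, we can look at the shapes involved. The sketch below rebuilds the same array under the illustrative name `grid`; the names `mask` and `selected` are also just for illustration.

```python
import numpy as np

grid = np.zeros((9, 9))
grid[2:7, 2:7] = 1

mask = grid > 0        # Boolean array with the same shape as grid
selected = grid[mask]  # 1D array holding the selected ELEMENTS

print(mask.shape)      # (9, 9)
print(selected.shape)  # (25,)
print(mask.sum())      # 25, because every True counts as 1

# A mask can also be used to assign new values to the selected elements
grid[mask] = 255
print(grid.max())  # 255.0
```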

Let's take a look at one more **really** useful function:

In [None]:
np.where(a>0)

What did we get back in the previous line this time?
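
When called with only a condition, `np.where` returns a tuple of index arrays, one per dimension. A small sketch on a 4 x 4 array (the names `grid`, `rows` and `cols` are illustrative):

```python
import numpy as np

grid = np.zeros((4, 4))
grid[1:3, 1:3] = 1

# One array of row indices and one array of column indices
rows, cols = np.where(grid > 0)
print(rows.tolist())  # [1, 1, 2, 2]
print(cols.tolist())  # [1, 2, 1, 2]

# The index arrays can be fed straight back into the array
print(grid[rows, cols].tolist())  # [1.0, 1.0, 1.0, 1.0]
```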

# What is this useful for?

Images are also represented by arrays. Grayscale ("black and white") images are 2D arrays. Color images are 2D arrays in which each element is itself an array of 3 numbers (R, G and B) or 4 numbers (R, G, B and alpha).
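
In NumPy terms, a color image is usually stored as a 3D array of shape (height, width, channels). The sketch below builds two tiny images by hand; the sizes and the names `gray` and `rgb` are illustrative, and treating channel 0 as red is an assumption (OpenCV, for example, loads images in B, G, R order).

```python
import numpy as np

# Grayscale: one number per pixel, shape (height, width)
gray = np.zeros((4, 6), dtype=np.uint8)

# Color: three numbers per pixel, shape (height, width, 3)
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[:, :, 0] = 255  # set the first channel of every pixel to its maximum

print(gray.shape)          # (4, 6)
print(rgb.shape)           # (4, 6, 3)
print(rgb[0, 0].tolist())  # [255, 0, 0]
```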

A lot of operators in image processing are really just summation and multiplication operations involving a kernel of some kind. Kernels are just 2D matrices that give some kind of weights or a map over neighboring pixels that are involved in the operation.
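
As a sketch of what "summation and multiplication with a kernel" means, here is a single step of an averaging filter written with the indexing tools from above. The image, the 3 x 3 averaging kernel and all names are illustrative; real libraries repeat this step for every pixel and handle the image borders too.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape((5, 5))

# A 3 x 3 averaging kernel: every neighbor gets the same weight
kernel = np.ones((3, 3)) / 9

# The 3 x 3 neighborhood around the center pixel image[2, 2]
patch = image[1:4, 1:4]

# Multiply weights and pixels element-wise, then sum everything up
new_value = (patch * kernel).sum()
print(new_value)  # the mean of the neighborhood, approximately 12
```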

Image segmentation is the process of partitioning a digital image into multiple image segments that mark distinct objects. For example, telling stars apart from galaxies and lines.
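
The simplest form of segmentation is thresholding, which is just a Boolean mask. The sketch below uses a hypothetical tiny image and a hand-picked threshold; it is meant as an illustration of the idea, not necessarily the approach we will use later.

```python
import numpy as np

# Hypothetical tiny image: dark background with two bright pixels
image = np.array([[0,  10, 0,   0],
                  [5, 200, 0,   0],
                  [0,   0, 0, 180]])

threshold = 100           # illustrative value, chosen by hand
mask = image > threshold  # True where a pixel belongs to a bright object

print(mask.sum())  # 2 bright pixels

# Positions of the bright pixels as (row, column) pairs
rows, cols = np.where(mask)
positions = list(zip(rows.tolist(), cols.tolist()))
print(positions)  # [(1, 1), (2, 3)]
```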

Basically, most of what we will do will involve arrays of some kind, so it's good to understand how they work underneath the surface.  
Here are some of the kernels, for example, that exist in OpenCV - the computer vision library that we will use.

In [None]:
cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

In [None]:
cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))

In [None]:
cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

## Quick Matplotlib rundown

To prove to you that images are basically just arrays of numbers, we can display one. To do this, though, we'll need a plotting library. The most popular one is called `matplotlib`. With it, it's really easy to dish out some basic plots quickly, but also to create some really detailed and intricate ones.

It would be pretty hard to cover the complete matplotlib tutorial in short form here, so I will only show a couple of the most basic plots below. For additional resources you should definitely go and look at their premade tutorials: https://matplotlib.org/stable/tutorials/index.html

Specifically their most [basic one](https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py) and their tutorial on [showing images](https://matplotlib.org/stable/tutorials/introductory/images.html#sphx-glr-tutorials-introductory-images-py) are good ones to start with. Here I'll only use `imshow` to plot an image. 

Can you guess in advance what it would look like based on the slice only?

In [None]:
import matplotlib.pyplot as plt

img = np.zeros((100, 100))
img[::6] = 255
plt.imshow(img, cmap="gray")
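
We can also confirm what the plot shows by inspecting the array itself. The sketch below rebuilds the same image and lists which rows were set to 255; the name `bright_rows` is illustrative.

```python
import numpy as np

img = np.zeros((100, 100))
img[::6] = 255  # every 6th row, starting from row 0

# Look down the first column to find the rows that were set to 255
bright_rows = np.where(img[:, 0] == 255)[0]
print(bright_rows[:5].tolist())  # [0, 6, 12, 18, 24]
print(len(bright_rows))          # 17 bright horizontal lines
```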

# Summary

* Images are basically 2D arrays of values
    * manipulating them requires manipulating arrays
* Python built-in data types are pretty cool
    * Lists are not only mutable, they are also dynamically allocated - which means we can change element values as well as shorten or extend the list and have a list of things that don't have the same type
    * Lists don't behave in expected ways when mathematical operators are used on them
* NumPy is a powerful array and math library
    * "Numerical Python" pretty much covers all your numerical requirements
    * nearly all-powerful indexing
    * behaves as expected when mathematical operators are used on it (usually they apply element-wise)
* Matplotlib plots things 
    * pretty cool