# Image Analysis with Python
## Session 2: Scientific Python Primer

__Toby Hodges__  
[toby.hodges@embl.de](mailto:toby.hodges@embl.de)  
[bio-it.embl.de](https://bio-it.embl.de)

In this session:
- Give beginners some experience of working with specialised modules
- Learn to work with `numpy` arrays
- Learn some basic plotting with `matplotlib`
- __If we have time__ look at some `pandas`

### Part 1: Arrays & `numpy`

In [None]:
# A reminder about import statements
import numpy as np

In [None]:
# The numpy.ndarray object
numbers = list(range(8))
print(numbers)
numarray = np.array(numbers)  # convert the list into a numpy.ndarray
print(type(numarray))

In [None]:
# some more arrays
print(numarray) # can be considered a 'row vector'
column = np.array([[10],
                   [11],
                   [12],
                   [13],
                   [14],
                   [15]]) # a 'column vector'

In [None]:
# ndarray stands for "n-dimensional array"
array3d = np.array(
    [
        [[0, 1, 2], [3, 4, 5]],
        [[6, 7, 8], [9, 0, 1]],
        [[2, 3, 4], [5, 6, 7]]
    ])

Monochromatic image files can be considered as 2D numeric arrays, where the value at each position corresponds to the intensity of the associated pixel in the image.

<img alt="A section of an image file shown as an array of intensity values"  src="images/figC.png">

Often, you will have intensity information for more than one colour. The arrays for these different "channels" can be treated as layers in an additional dimension of a single array. E.g. an RGB image will have three 2D arrays, one for each of the colours. Image arrays may become 4-dimensional when 'Z-stack' images, of multiple layers in space, are considered, and yet another dimension may be added when time-series images (i.e. video) are captured.
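As a rough sketch of how those dimensions stack up, the cell below builds some empty arrays and prints their shapes. The variable names and the 48 x 64 pixel size are made up for illustration, and the axis order used here (rows, columns, channels) is only one common convention: other tools may order the axes differently.

```python
import numpy as np

# hypothetical image sizes, chosen only for illustration
mono = np.zeros((48, 64))               # 2D: rows x columns
rgb = np.zeros((48, 64, 3))             # 3D: rows x columns x colour channels
zstack = np.zeros((10, 48, 64, 3))      # 4D: z-slices x rows x columns x channels
movie = np.zeros((5, 10, 48, 64, 3))    # 5D: time points x z-slices x rows x columns x channels

for name, arr in [("mono", mono), ("rgb", rgb), ("zstack", zstack), ("movie", movie)]:
    print(name, arr.ndim, arr.shape)
```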

But wait! Couldn't we achieve the same thing with the `list` objects that are built into Python? What's the advantage of using arrays for this? And, while we're on the subject, what the heck is Numpy anyway?

Numpy is a "numeric Python" library, designed to work with numeric data. Where the Python Standard Library contains (lots of) general-purpose functionality for writing Python scripts, Numpy was developed with specialist numeric applications in mind. As such, many of the functions and objects that Numpy contains are much better-suited to the kind of operations that you will regularly perform while working with image data.

For example, Numpy arrays take up much less RAM than an equivalent `list` object, and mathematical operations on arrays tend to work much faster than for a standard `list`.
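If you'd like to check the memory claim for yourself, here is a rough sketch. The names `number_list` and `number_array` are made up for this example, and the list figure is only an estimate (it adds up the size of the list and of every integer object that it points to). Speed is not measured here, since timings vary from machine to machine.

```python
import sys
import numpy as np

n = 100_000
number_list = list(range(n))
number_array = np.arange(n, dtype=np.int64)

# a list stores references to separate Python int objects,
# so count the list itself plus every object it points to
list_bytes = sys.getsizeof(number_list) + sum(sys.getsizeof(x) for x in number_list)
array_bytes = number_array.nbytes   # size of the array's data: 8 bytes per int64 value

print("list: ", list_bytes, "bytes (estimate)")
print("array:", array_bytes, "bytes")
```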

You may not care about these things now, and I admit that they're not particularly exciting. There are other, more practical advantages to using arrays though...

To demonstrate, let's keep looking at some more simple arrays. __But first!__ an exercise to make sure that everyone's awake.

##### Exercise 1

Fill in the blanks in the code below, to triple every value in the list, `list2d`.

In [None]:
list2d = [[0.1, 3.8, 8.9],
          [4.4, 9.0, 5.2],
          [3.1, 8.4, 5.2]]
print(list2d)

In [None]:
pos    = ___
subpos = ___
for sublist ___ list2d:
    ___ number in ___:
        list2d[pos][subpos] = ___ ___ 3
        ___ += 1
    subpos = 0
    ___ ___ ___

In [None]:
# Optional: ask Toby to show you how to achieve the same thing with enumerate()

Now, let's look at how to achieve the same thing with an array.

In [None]:
# using arrays for element-wise multiplications (copy the list2d above)


Much easier! You can do lots of mathematical operations, element-wise, on arrays just as easily. And matrix arithmetic too! (_Disclaimer: I am not a mathematician._)
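Here is a small sketch of the difference between element-wise and matrix arithmetic. The arrays `a` and `b` are invented for the example, and the matrix product is calculated with the `@` operator on ordinary arrays, which is one of several ways to do it.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a * 3)   # every element multiplied by 3
print(a + b)   # element-wise addition
print(a * b)   # element-wise multiplication (NOT a matrix product): [[5, 12], [21, 32]]
print(a @ b)   # matrix product, same as np.matmul(a, b): [[19, 22], [43, 50]]
```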

In [None]:
# matrices


In [None]:
# back to arrays


What is this `float64`? Floating point numbers (numbers represented with decimal places) can be stored with varying levels of precision. A 64-bit number is more precise than a 32-bit number, which is more precise than a 16-bit number, and so on. The same applies to integers. On a computer, integers can only exist within a limited range, which depends on how many bits they are stored in. The exact range depends on whether the integer is 'signed' (includes information about whether the value is above or below zero) or 'unsigned' (positive or zero values only).
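Numpy can report these limits itself, via `np.iinfo()` for integer types and `np.finfo()` for floating point types. The short sketch below uses a made-up array called `values` and a handful of datatypes picked for illustration.

```python
import numpy as np

values = np.array([0.1, 3.8, 8.9])
print(values.dtype)   # float64 is the default for numbers with decimal places

# the smallest and largest value that each integer type can hold
for dtype in (np.uint8, np.int8, np.uint16, np.int16):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# fewer bits means fewer reliable decimal digits
for dtype in (np.float16, np.float32, np.float64):
    print(dtype.__name__, np.finfo(dtype).precision)
```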

The datatype of your array determines how precise your values are, and how much memory the array is using up in the background. You also need to be aware of what happens with integer values when you exceed their limits in either direction...
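As a minimal sketch of what "exceeding the limits" looks like, the example below does arithmetic on some invented `uint8` values (which can only hold 0 to 255). The array names are assumptions made for this example. Notice that Numpy does not raise an error here: the values silently wrap around.

```python
import numpy as np

pixels = np.array([0, 100, 200, 255], dtype=np.uint8)
step = np.full_like(pixels, 100)   # an array of 100s with the same shape and dtype as pixels
zeros = np.zeros_like(pixels)      # an array of 0s with the same shape and dtype as pixels
ones = np.ones_like(pixels)        # an array of 1s with the same shape and dtype as pixels

too_high = pixels + step   # 300 and 355 do not fit, so they wrap around to 44 and 99
too_low = zeros - ones     # -1 does not fit either, so it wraps around to 255

print(too_high)
print(too_low)
```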

In [None]:
# looking at the effects of going outside the possible range of a uint8


That `np.zeros_like()` function is pretty useful. We'll come back to it later. For now, let's have a look at a few other functions that Numpy provides for creating arrays.

In [None]:
np.arange(25)

In [None]:
range_array = np.arange(25).reshape((5, 5))  # reshape() is generally preferred over assigning to .shape
print(range_array)

In [None]:
np.ones((8,8), dtype='int16')

In [None]:
np.random.normal(size=(4,4))

In [None]:
np.linspace(0, 10, 10, dtype='float16')

##### Exercise 2

You will need to look at the documentation to complete some of the steps below... Use `help(thing)` or `thing?` to see the help page for an object/function/module called `thing`. `dir(thing)` shows you a list of all the attributes and methods available for `thing`. You can also find the Numpy documentation online at https://docs.scipy.org/doc/numpy-1.14.2/.

a) create a random array of 12 integers between 0 and 1000, store it as a variable (you can choose the name)  
b) change the data in the array to 32-bit floating point numbers  
c) divide the data in the array by 3.3  
d) raise every element in the array to the power of 2  

### Part 2: Array Indexing

Select parts of an array in the same way as you might do with a standard `list`.
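For reference, here is a short sketch of the most common kinds of selection, using a small made-up array called `grid` (not one of the arrays defined elsewhere in this notebook).

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))   # hypothetical 3x4 example array
print(grid)

print(grid[1])         # the second row (counting starts at 0): [4 5 6 7]
print(grid[1][2])      # a single cell, list-style: 6
print(grid[1, 2])      # the same cell, using the array-style [row, column] index
print(grid[:2, 1:3])   # a slice: the first two rows, columns 1 and 2
print(grid[:, 0])      # every row, first column only: [0 4 8]
```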

In [None]:
array2d = np.array(list2d)  # assumes list2d from Exercise 1 is still defined
print(array2d)


In [None]:
# accessing a row

In [None]:
# accessing a single field/cell

In [None]:
# making 'slices'

Ok... But the real power comes from 'masking', or 'Boolean indexing'. This is where that `zeros_like()` function becomes relevant.
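A minimal sketch of the idea, using a small invented array called `data`: build a Boolean array of the same shape, switch on the positions that you are interested in, then use it as an index.

```python
import numpy as np

data = np.arange(16).reshape((4, 4))     # hypothetical example array
mask = np.zeros_like(data, dtype=bool)   # same shape as data, every value False
mask[1:3, 1:3] = True                    # switch on the central 2x2 block

print(mask)
selected = data[mask]    # a 1D array of the values where mask is True
print(selected)          # [ 5  6  9 10]
```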

In [None]:
big_random_2darray = np.random.randint(100, size=(16,16))
print(big_random_2darray)

In [None]:
# get an equivalent array of zeros

In [None]:
# selecting using this zeros array


In [None]:
# Boolean selection


That example was pretty contrived, so let's look at something more realistic, masking some background value.
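One way that this could look is sketched below. The random `image` array is a stand-in for real intensity data, and the cutoff of 20 is an arbitrary value picked for the example, not a recommendation.

```python
import numpy as np

image = np.random.randint(100, size=(6, 6))   # stand-in for real intensity data
background_cutoff = 20                        # arbitrary threshold, chosen for illustration

is_background = image < background_cutoff     # Boolean array: True where the value is below the cutoff
cleaned = image.copy()                        # work on a copy, to keep the original data intact
cleaned[is_background] = 0                    # set every 'background' position to zero

print(is_background.sum(), "values masked")
print(cleaned)
```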

In [None]:
# creating a Boolean mask based on a condition


### Part 3: Visualising arrays with `matplotlib`

Ok, now that we're familiar with arrays and Numpy, it's time to look at a bit of plotting. Despite the fact that images can be thought of as numbers, they're still easiest to understand when visualised.

In [None]:
# this is only necessary when working in a Jupyter notebook
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt

In [None]:
# using plt.imshow to visualise an array
plt.imshow(big_random_2darray)

In [None]:
# even larger array


### Part 4: Plotting with `matplotlib.pyplot`

The general approach to drawing plots with Matplotlib is to build up the plot before finally rendering it with `plt.show()`. In the Jupyter notebook, things work a little differently, and you will probably see the `Figure` object appearing under each cell as you build it up. In this case, it's important that you remember that __you will need to run `plt.show()` when working in a less interactive environment/writing your own scripts__.
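As a minimal sketch of that build-then-show pattern, here is a basic bar plot. The data are random, and the name `heights` is made up for this example. The call to `plt.show()` is commented out so that the cell behaves the same everywhere, but you would need it in a script.

```python
import numpy as np
import matplotlib.pyplot as plt

heights = np.random.randint(1, 20, size=6)   # six random bar heights
bar_positions = np.arange(len(heights))      # 0 to n-1: one x-axis position per bar

plt.figure()                                 # start a fresh, empty figure
bars = plt.bar(bar_positions, heights)       # add the bars to it

# plt.show()   # uncomment when running as a script, to render the finished plot
```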

In [None]:
# make some random numbers for plotting

# we also need a variable, which we'll call bar_positions, to determine where the bars should appear on the x-axis.
# the value of this variable should be an array of numbers, from 0 to n-1 where n is the number of data points we have


In [None]:
# plot those random numbers as bars


Ok, great. We have some bars. But the plot is pretty rudimentary: it's lacking axis labels, for a start. Let's add some, and put alphabetical letters as the x-axis tick labels...
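A sketch of how those additions might look is below. The data are random again, and the axis label text ("Sample" and "Count") is invented for the example.

```python
import string
import numpy as np
import matplotlib.pyplot as plt

heights = np.random.randint(1, 20, size=6)
bar_positions = np.arange(len(heights))
letters = list(string.ascii_uppercase[:len(heights)])   # ['A', 'B', 'C', 'D', 'E', 'F']

plt.figure()
plt.bar(bar_positions, heights)
plt.xticks(bar_positions, letters)   # put a letter under each bar
plt.xlabel("Sample")                 # label text is made up for this example
plt.ylabel("Count")
```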

In [None]:
# customising our plot


We can add more customisations to our plot, e.g. adding a title, changing the colour of the bars.
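For instance, a title and a different bar colour could be added along these lines. The colour names and the title text are arbitrary choices for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

heights = np.random.randint(1, 20, size=6)
bar_positions = np.arange(len(heights))

plt.figure()
plt.bar(bar_positions, heights, color="darkorange", edgecolor="black")
plt.title("Six random numbers")
plt.ylim(0, 25)   # fix the y-axis range, leaving some space above the tallest bar
```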

In [None]:
# further customisation


That's a bit better, and this is barely scratching the surface of the options for plotting in Matplotlib. It's not the most user-friendly plotting library (check out Seaborn, Pandas, Plotnine, and others if you'd like to explore the options), but Matplotlib provides absolute control over how your plots look. Check out the documentation for more information: https://matplotlib.org/api/pyplot_summary.html.

##### Exercise 3

Let's make a second data series, again of random numbers, and add that to our plot too, so that the bars are stacked on top of one another. You will need to look at the documentation of the `plt.bar()` function to figure out how to stack the bars correctly. Create the plot, containing the stacked bars, with the second series plotted in a different colour from the first.

__Bonus:__ for extra points (you've been keeping score, right?), add a legend to the figure. Give the two data series whatever names you like. (I think Clark and Audrey are good names.)

We'll close the Matplotlib section out with a look at scatter plots. Let's plot the same two series that we made before.
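A minimal sketch of this, with two freshly generated random series standing in for the ones from the exercise (the variable names are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

series_one = np.random.randint(1, 20, size=6)   # stand-ins for the two series from the exercise
series_two = np.random.randint(1, 20, size=6)
positions = np.arange(len(series_one))

plt.figure()
plt.scatter(positions, series_one)
plt.scatter(positions, series_two)   # a second call adds a second series, in a new colour
```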

In [None]:
# two series on a scatter plot

Wow, that was easy! What if we want to use our two series as X & Y coordinates for a single data series?
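In that case, each pair of values becomes one point. Here is a rough sketch, with random data and made-up names:

```python
import numpy as np
import matplotlib.pyplot as plt

x_values = np.random.randint(1, 20, size=6)   # first series, used as X coordinates
y_values = np.random.randint(1, 20, size=6)   # second series, used as Y coordinates

plt.figure()
points = plt.scatter(x_values, y_values)      # one point per (x, y) pair
plt.xlabel("first series")
plt.ylabel("second series")
```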

In [None]:
# plotting points as a single series on X- & Y-axes

Ok, the last thing that we'll do with Matplotlib today is look at how to arrange multiple plots into a single figure.
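One way to do this is with `plt.subplots()`, which returns a `Figure` and an array of `Axes` objects that you can draw on individually. The sketch below uses random data and a layout (one row, two columns) chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.random.randint(1, 20, size=6)
positions = np.arange(len(values))

# one row and two columns of plots, in a single figure
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

axes[0].bar(positions, values)       # left-hand plot
axes[0].set_title("Bars")

axes[1].scatter(positions, values)   # right-hand plot
axes[1].set_title("Points")

fig.tight_layout()   # adjust the spacing so that titles and labels do not overlap
```

Note that the methods on an `Axes` object are often named slightly differently from their `plt` equivalents, e.g. `set_title()` instead of `plt.title()`.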

In [None]:
# subplots!

In [None]:
# do we still have time to look at some pandas?

<img src='images/panda1.png' alt='a panda' width=500 /> <img src='images/panda2.png' alt='another panda' width=500 /> <img src='images/panda3.png' alt='a third panda' width=500 />
_Image credit:_

1. _This image was originally posted to Flickr by popofatticus at https://www.flickr.com/photos/49503214348@N01/2478623520. It was reviewed on 10 August 2008 by FlickreviewR and was confirmed to be licensed under the terms of the cc-by-2.0._
2. _This image was originally posted to Flickr by No Dust at https://www.flickr.com/photos/30073301@N00/3171164610. It was reviewed on 20 August 2009 by FlickreviewR and was confirmed to be licensed under the terms of the cc-by-2.0._
3. _This image was originally posted to Flickr by greggoconnell at https://www.flickr.com/photos/72511655@N00/146912493. It was reviewed on 4 August 2007 by FlickreviewR and was confirmed to be licensed under the terms of the cc-by-2.0._