# Change Detection

In this workshop, we're going to load up and compare data from before, during and after the 2003 Canberra bushfires - and hopefully detect some changes.  (You may have guessed that bit.  Good work!)  Specifically, we'll work with imagery to cover:

- thresholds in reflectance and indices
- calculating the difference between two images
- various ways to select subsets of our data
- calculating the mean and variance

Let's get started (WIP notebook)

In [None]:
import numpy as np
import scipy.io

import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

Wait a second - we'll need to fetch our data first!  Today we're using the exact same data as the MatLab tutorials, so [go to this dropbox page](https://www.dropbox.com/sh/443m0s4e9x0m4d4/AAAv_HKT9F1yCm9j4D4T2wDqa/data?dl=0) and download the three `NDVI_(year)_(month).mat` files.

In [None]:
# Checks that the files exist:
from os import path
if all(path.isfile(f + '.mat') for f in ['NDVI_2002-12', 'NDVI_2003-01', 'NDVI_2003-02']):
    print('Yep, you downloaded all the data.  Good work!')
else:
    raise FileNotFoundError("You're missing some of the data!")

As Albert notes in the Matlab tutorials, this is a proprietary file format designed for Matlab and awkward for anyone else to use.  We're going to use it anyway - to demonstrate that Python is awesome and show you why proper file formats and metadata are important (use them yourself, and complain to people who don't!).

We'll be using the `scipy.io` module, which holds the input/output tools for [SciPy](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html) - Python's general answer to Matlab for basic scientific and engineering computation.  SciPy operates on Numpy arrays (or anything that can be converted to one), and is very generalised.  Remember: it's usually best to use high-level or specialised tools, and fill any gaps with the more fundamental tools.  But for loading odd formats like Matlab files, it's hard to beat:

In [None]:
scipy.io.loadmat('NDVI_2002-12')

That's... not much metadata, is it?  We know that we just want the NDVI though, so let's load that from each of the `.mat` files we downloaded:

In [None]:
# NDVI relative to the 2003 Canberra bushfires
before = scipy.io.loadmat('NDVI_2002-12')['NDVI']
during = scipy.io.loadmat('NDVI_2003-01')['NDVI']
after = scipy.io.loadmat('NDVI_2003-02')['NDVI']
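
If you'd like to check what a `.mat` file contains before loading it, `scipy.io.whosmat` lists the saved variables.  Here's a minimal sketch: it writes a small stand-in file first (the file name `demo_ndvi.mat` and the 3x4 array are made up for illustration) so that it runs without the workshop downloads.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small stand-in file (hypothetical name and contents) so this
# runs on its own; the variable name 'NDVI' mirrors the workshop files.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_ndvi.mat')
scipy.io.savemat(demo_path, {'NDVI': np.zeros((3, 4), dtype='f4')})

# whosmat lists (name, shape, Matlab class) for each saved variable
contents = scipy.io.whosmat(demo_path)
print(contents)

# loadmat returns a dict: keys like '__header__' are file-level bookkeeping,
# everything else is a saved variable
loaded = scipy.io.loadmat(demo_path)
variable_names = [key for key in loaded if not key.startswith('__')]
print(variable_names)
```

Try the same two calls on `'NDVI_2002-12'` to see what the real files hold.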

In [None]:
# What can we inspect about this data?
# Try typing a name, then a ".", and pressing the tab key to see a list of attributes and attached functions,
# arrows to move (or type a letter) and enter to fill it out.  'tab' completes partial names too!
# e.g.: .shape, .size, .max(), .min(), .mean()
before.dtype

**A brief digression on digital number formats**

What you need to know: *use integer types for integer data, floats for fractional data or if you have missing values.  Use dtypes with plenty of bits to avoid errors or imprecise calculations.*

Why does the data type matter - aren't all numbers basically the same?

That's true in maths class, where you probably learned about integers (whole numbers), rationals (fractions), and stranger things like irrational or complex numbers (which aren't relevant today).  These have lovely properties like "can have any number of digits" and "adding, multiplying, or dividing any two fractions results in a fraction".  Here's where it gets tricky:

- Computers represent everything as bits, meaning that all integers must be represented in binary (aka base-2).  In binary you count `0, 1, 10, 11, 100, 101, ...` instead of `0, 1, 2, 3, 4, 5, ...`.
- Computers represent *everything* as bits.  This has to include the sign (positive or negative?), and where one number starts and ends.
- The most efficient way to do this is to have an array of numbers, all of which take a fixed number of bytes (a group of eight bits).  The *array* is then labelled as either unsigned (positive-only) or signed (first bit is one == negative).  *This is a good time to ask any questions.*
- Choosing your data type is therefore very important - are negative values possible?  Will your maximum value ever be above `2**8 - 1 = 255`?  If so, you need more than one byte!  If you do a calculation with an unrepresentable output, results may vary: NumPy integer arrays silently "wrap around" to the other end of the range (breaking relationships like `n + 1 > n`!), while in languages like C, overflowing a signed int is "undefined" (ie you get an error, or maybe Bad Things happen).
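
To see what happens at the edge of an integer type, here's a small sketch using one-byte NumPy arrays.  The values are made up; the point is what NumPy does when array arithmetic runs past the end of the range.

```python
import numpy as np

# The representable range of a one-byte unsigned integer
print(np.iinfo(np.uint8))   # min = 0, max = 255

# Adding one to the largest unsigned byte wraps around to zero
unsigned = np.array([254, 255], dtype=np.uint8)
unsigned_plus_one = unsigned + np.ones_like(unsigned)
print(unsigned_plus_one)    # [255   0]

# Adding one to the largest signed byte wraps to the most negative value
signed = np.array([126, 127], dtype=np.int8)
signed_plus_one = signed + np.ones_like(signed)
print(signed_plus_one)      # [ 127 -128]

# Converting to a wider type first avoids the problem
wider = unsigned.astype(np.int16) + 1
print(wider)                # [255 256]
```

Note that NumPy raises no error for the array operations above, so it's up to you to pick a type with enough room.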

The `before` array isn't integers though - it holds floating point numbers (to be exact: little-endian four-byte floats, often called 'single-precision', google terms or ask me for details you don't need).  [Floating-point numbers are complicated (wikipedia)](https://en.wikipedia.org/wiki/Floating-point_arithmetic), but the gist is that they use scientific notation in binary, with some special cases for infinity and 'not a number' (eg `1/0 -> inf`, `0/0 -> nan`).
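
A quick sketch of those special cases, using a made-up three-element `f4` array rather than the NDVI data:

```python
import numpy as np

# '<f4' means little-endian, four-byte float
example = np.zeros(3, dtype='<f4')
print(example.dtype.itemsize)   # 4 (bytes per value)

numerator = np.array([1.0, 0.0, -1.0], dtype='f4')
denominator = np.zeros(3, dtype='f4')

# Silence the divide-by-zero warnings so we can look at the results
with np.errstate(divide='ignore', invalid='ignore'):
    result = numerator / denominator
print(result)                   # [ inf  nan -inf]

# nan is not equal to anything, including itself - use np.isnan to find it
print(result[1] == result[1])   # False
print(np.isnan(result))         # [False  True False]
```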

Consider representing numbers in base-10 scientific notation, with at most three characters: one each for the sign, significand, and exponent.  So zero is `0.0 * 10^0`, or `+00`.  Three is `+30` (`3 * 10^0`), and negative three hundred is `-32`... but so is negative three-hundred-and-four!
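
Here's a toy version of that three-character format, just to make the loss of precision concrete.  `encode` and `decode` are hypothetical helpers written for this example (integers only, truncating rather than rounding); real floats work in binary and are much more careful.

```python
def encode(value):
    """Encode an integer as sign + one significand digit + one exponent digit."""
    sign = '-' if value < 0 else '+'
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 10:
        magnitude //= 10    # drop the last digit (truncates, doesn't round)
        exponent += 1
    if exponent > 9:
        raise ValueError('exponent does not fit in one digit')
    return f'{sign}{magnitude}{exponent}'


def decode(text):
    """Turn the three-character form back into an integer."""
    sign = -1 if text[0] == '-' else 1
    return sign * int(text[1]) * 10 ** int(text[2])


for number in [0, 3, -300, -304]:
    print(number, '->', encode(number), '->', decode(encode(number)))
# 0 -> +00 -> 0
# 3 -> +30 -> 3
# -300 -> -32 -> -300
# -304 -> -32 -> -300
```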

This loss of precision is cumulative over your calculations, but luckily the standard sizes of 32 and 64 bit floats are pretty good - with 32 bits you can measure the distance from Canberra to the equator with a precision of a meter or so, and with 64 bits you can measure from the Sun to Pluto... to about a *millimeter*.  The usual approach is to store data in 32bit (`f4`), and calculate in `f4` unless you're doing very many or very sensitive calculations, where 64bit (`f8`) is the safer choice.
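
You can check precision figures like these yourself with `np.spacing`, which gives the gap between a number and the next representable one.  The two distances below are my own round approximations, not precise values.

```python
import numpy as np

# Rough distances in meters (approximations, assumed for illustration)
canberra_to_equator_m = 3.9e6
sun_to_pluto_m = 5.9e12

# Gap to the next representable number at each magnitude
spacing_f4 = np.spacing(np.float32(canberra_to_equator_m))
spacing_f8 = np.spacing(np.float64(sun_to_pluto_m))
print(spacing_f4)   # 0.25 (meters)
print(spacing_f8)   # 0.0009765625 (meters), i.e. about a millimeter

# Why precision matters when accumulating: at 2**24, adding 1.0 to an
# f4 value no longer changes it
big = np.float32(2 ** 24)
print(big + np.float32(1) == big)   # True
```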

For NDVI, which is mostly between zero and one (and occasionally -4 .. +8), `f4` is verging on overkill.  (We use it anyway because it's standard, and handles missing data natively)

## Time to make maps again

Well, look at the image at least - it's not a map without metadata!

In [None]:
# This is the terrible way to do it:  can you see the missing data?
plt.imshow(during)  # Add your own before and after maps if you want - "Insert > Insert Cell Below"

## This is a prettier way: remove the "#" and rerun to see it.
# plt.imshow(during, cmap='viridis')
# plt.colorbar()
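
If the missing data is hard to spot by eye, you can count it instead.  This sketch assumes missing pixels are stored as `nan` (an assumption - check your own arrays), and uses a tiny made-up array in place of `during`.

```python
import numpy as np

# Hypothetical stand-in for `during`, with two missing pixels
demo = np.array([[0.2, np.nan, 0.7],
                 [np.nan, 0.5, 0.9]], dtype='f4')

missing = np.isnan(demo)              # True wherever data is missing
n_missing = int(missing.sum())        # number of missing pixels
fraction_missing = missing.mean()     # share of the image that is missing
print(n_missing, fraction_missing)

# The nan-aware functions skip missing values; plain .min() would return nan
valid_min = np.nanmin(demo)
valid_max = np.nanmax(demo)
print(valid_min, valid_max)
```

Swap `demo` for `during` to try it on the real image.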

You will also see some strange features in the during image, including an apparently 
very straight boundary between low and high NDVI areas. This is because of the 
monthly 'compositing' done here. There is often more than one image available 
for any given month, because the Landsat revisit time is 16 days. If an area 
cannot be seen in one image because of cloud or smoke, the other image in that 
month can sometimes be used. While useful, it can lead to potentially confusing 
artefacts such as the line and some 'out of place' pixels.
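
To see how compositing can produce a straight edge, here's a toy sketch with two made-up scenes.  The rule used here ("take the first scene, fill its gaps from the second") is an assumption for illustration; the actual compositing method behind these files may differ.

```python
import numpy as np

# Early scene: high NDVI before the fire, but the right half is
# hidden by smoke and so recorded as missing (nan)
scene_early = np.full((4, 6), 0.8, dtype='f4')
scene_early[:, 3:] = np.nan

# Late scene: everything visible, but NDVI is low after the fire
scene_late = np.full((4, 6), 0.2, dtype='f4')

# Composite: use the early scene where valid, otherwise the late one
composite = np.where(np.isnan(scene_early), scene_late, scene_early)
print(composite)
```

The left half of `composite` comes from before the fire and the right half from after it, so a perfectly straight boundary appears where the early scene's valid data ends.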

I'll also show the before, during, and after views with the same shape and colour map for a side-by-side comparison.

In [None]:
# Make the figures larger, as if for a presentation
seaborn.set_context('talk')

# Create a figure with several subplots - three columns
figure, ax_s = plt.subplots(ncols=3)
# Use a figure-level title; plt.title would only label the last subplot
figure.suptitle('NDVI in Canberra before, during, and after a bushfire')
# For each (data, subplot) pair, show an image of the data on the axis
for data, ax in zip([before, during, after], ax_s):
    # Note that if we don't specify a colourmap and min and max values,
    # missing data is invisible and we can't compare images
    ax.imshow(data, cmap='viridis', vmin=0, vmax=0.9)

The first image is an NDVI image of the Cotter and Namadgi National Park area 
in the ACT, derived from a composite for the month of December 2002, a month before 
bushfires swept through the area. This monthly composite was calculated by combining 
all available and valid surface reflectances from individual Landsat scenes 
in Geoscience Australia's atmospherically corrected Landsat Data Cube.

You will now see the (former) pine plantations around the Cotter dam in 
the brightest yellow near the top of the image, and the various dams in dark blue. 
You will further notice that the pastures outside Namadgi NP have a dark blue 
colour (i.e. they have low NDVI). That is because the grass had dried off in 
December 2002.  You will notice that NDVI dropped across most of the area after the fire.
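
That drop can be put into numbers by differencing two images.  Below is a minimal sketch on small made-up arrays (stand-ins for `before` and `after`); the `-0.4` threshold is a hypothetical value chosen for the example, not a recommended one.

```python
import numpy as np

# Hypothetical stand-ins for the before and after images
before_demo = np.array([[0.8, 0.7, 0.2],
                        [0.9, 0.6, np.nan]], dtype='f4')
after_demo = np.array([[0.2, 0.6, 0.2],
                       [0.1, 0.1, 0.3]], dtype='f4')

# Difference image: negative values mean NDVI went down
change = after_demo - before_demo

# Threshold the difference to flag large drops (nan compares as False)
threshold = -0.4
with np.errstate(invalid='ignore'):
    large_drop = change < threshold
print(int(large_drop.sum()), 'pixels dropped by more than', abs(threshold))

# Mean and variance of the change, skipping missing pixels
mean_change = np.nanmean(change)
variance_change = np.nanvar(change)
print(mean_change, variance_change)

# A boolean mask also selects a subset: mean change over flagged pixels only
mean_drop = change[large_drop].mean()
print(mean_drop)
```

Replace the stand-ins with `before` and `after` and try `plt.imshow(change)` to see where the change is concentrated.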