# Image Analysis with Python

In this workshop, we're going to do some simple analyses of a Landsat image.

TODO: link to xarray docs, outline learning outcomes / data theme / techniques for project

Below, I've written some demonstration code to:

1. load a Landsat image
2. slice the array (e.g. monochrome images, mean colour)
3. view the image
4. calculate and view NDVI
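
Steps 2 and 4 can be tried on a small synthetic dataset before touching real imagery. This is a minimal sketch: the random reflectance values, the 4×5 grid and the `x`/`y` dimension names are made up for illustration. NDVI uses its usual definition, `(nir - red) / (nir + red)`.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a Landsat scene: random reflectances on a small y/x grid
rng = np.random.default_rng(0)
shape = (4, 5)
ds = xr.Dataset(
    {band: (("y", "x"), rng.uniform(0.01, 0.5, size=shape))
     for band in ["blue", "green", "red", "nir"]},
    coords={"y": np.arange(shape[0]), "x": np.arange(shape[1])},
)

# A monochrome image is just one band (a 2D DataArray)
red_image = ds["red"]

# The mean colour is the average of each band over both spatial dimensions
mean_colour = ds.mean(dim=("y", "x"))

# NDVI = (NIR - red) / (NIR + red); with positive reflectances it lies in (-1, 1)
ndvi = (ds["nir"] - ds["red"]) / (ds["nir"] + ds["red"])

print(red_image.shape)  # (4, 5)
print(float(mean_colour["nir"]))
print(float(ndvi.min()), float(ndvi.max()))
```

Because the bands are named variables, the NDVI line reads almost exactly like the formula. xarray matches pixels by their `y`/`x` labels rather than by position.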

Once you've run this demonstration code, the second half of the workshop is to write your own analysis of a MODIS image!

First, let's import (load up) the libraries (packages of code) that we want to use for this task:

In [None]:
# NumPy for arrays, and Xarray for gridded geospatial datasets
import numpy as np
import xarray as xr

# To draw some images, and with nice styles
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Next, we'll open our dataset for this workshop: a tile from the Landsat 7 ETM+ sensor, with a short time axis.
We'll also tidy the metadata a bit. We relabel the numbered bands with the names of their wavelength ranges (blue, green, red, near-infrared and two shortwave-infrared bands), and we discard the coordinate reference system variable so that every remaining data variable is an image band.

*Note that `xarray`, our 'library' for working with gridded data in NetCDF files, is perfectly happy with a URL instead of a filename.
Because this data is served via OPeNDAP (which uses the **D**ata **A**ccess **P**rotocol), `xarray` won't download any data values until you need them. Opening even a very large collection of files only transfers a little metadata, and taking subsets is usually quite efficient.*
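
Because reading is lazy, it pays to select the part of the image you need *before* asking for its values. The sketch below shows that pattern. It runs on an in-memory stand-in, so nothing is actually fetched here. The `x`/`y` dimension names, the hypothetical `load_window` helper and the 100-pixel window are illustrative assumptions, not properties of the workshop file. With a dataset opened from an OPeNDAP URL, the same `.isel(...)` followed by `.load()` should only transfer the selected window.

```python
import numpy as np
import xarray as xr


def load_window(ds, size=100, x_dim="x", y_dim="y"):
    """Hypothetical helper: select the top-left size x size pixels, then load them.

    x_dim and y_dim are assumed dimension names; check `ds.dims` for your file.
    """
    # Indexing is lazy for file/OPeNDAP-backed datasets: no values are read yet
    window = ds.isel({y_dim: slice(0, size), x_dim: slice(0, size)})
    # .load() reads only the selected values into memory (a no-op for in-memory data)
    return window.load()


# In-memory stand-in for a lazily opened remote dataset
demo = xr.Dataset(
    {"red": (("y", "x"), np.zeros((400, 300)))},
    coords={"y": np.arange(400), "x": np.arange(300)},
)
small = load_window(demo, size=100)
print(dict(small.sizes))
```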

In [None]:
# Sample URL - full data expected mid-2017
data_url = 'http://dapds00.nci.org.au/thredds/dodsC/uc0/rs0_dev/multiple_band_variables/LS7_ETM_NBAR_P54_GANBAR01-002_089_078_2015_153_-26.nc'
# Open the dataset, but only download contents as needed - at this stage, metadata
data = xr.open_dataset(data_url)

# Save the metadata, then clear it
attrs = data.attrs
data.attrs = {}
# Drop the coordinate reference system variable (it's not a band)
data = data.drop_vars('crs')
# Translate band numbers to names (rename returns a new Dataset, so reassign it)
data = data.rename({
    'band1': 'blue',
    'band2': 'green',
    'band3': 'red',
    'band4': 'nir',
    'band5': 'swir1',
    'band6': 'swir2'
})

# Display a quick summary
data

Hmm, what's this? Most Python objects, such as numbers or strings, display their entire contents. That wouldn't be useful for a whole dataset, and we'd also have to download a lot more data. If you see `...` for the data variables, that's because the values haven't been fetched to your computer yet!

The summary has three parts: dimensions, coordinates, and data variables.
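
Each of these three parts is also available as an attribute, which is handy when a summary is too long to read. The sketch below builds a tiny made-up dataset with two bands, one invented time step and a 3×4 grid, so that it runs offline. The same attributes exist on the `data` object opened above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical miniature dataset with a time/y/x layout like the Landsat tile
ds = xr.Dataset(
    {
        "red": (("time", "y", "x"), np.ones((1, 3, 4))),
        "nir": (("time", "y", "x"), np.full((1, 3, 4), 2.0)),
    },
    coords={
        "time": pd.to_datetime(["2015-01-01"]),  # made-up date
        "y": np.arange(3),
        "x": np.arange(4),
    },
)

print(dict(ds.sizes))      # dimensions: name -> length
print(list(ds.coords))     # coordinates: the labels along each dimension
print(list(ds.data_vars))  # data variables: the bands themselves
```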

TODO expand this explanation

In [None]:
# TODO:  explain how this is stacking arrays along a new axis
arr = data.isel(time=0).to_array(dim='band')
arr
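
`to_array(dim='band')` takes every data variable and stacks them along a new leading dimension named `band`. Once `time` is selected, each variable is a 2D `y`×`x` image, and the variable names become the labels of the `band` coordinate. A small sketch on made-up data shows the shape change. Recent xarray releases also offer this operation as `to_dataarray`, so which name you use may depend on your installed version.

```python
import numpy as np
import xarray as xr

# Made-up three-band dataset on a 2 x 3 grid (one time step already selected);
# each band is filled with a constant so the stacking is easy to see
ds = xr.Dataset(
    {band: (("y", "x"), np.full((2, 3), float(value)))
     for value, band in enumerate(["blue", "green", "red"])},
)

# Stack the variables along a new 'band' dimension, which is placed first
arr = ds.to_array(dim="band")

print(arr.dims)                       # ('band', 'y', 'x')
print(arr.shape)                      # (3, 2, 3)
print(arr["band"].values.tolist())    # ['blue', 'green', 'red']
```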

In [None]:
arr.plot.imshow(robust=True, col='band', col_wrap=3)
# TODO: illustrate and explain difference between plt.imshow(...) and (...).plot.imshow()
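
One difference between the two is how much is done for you. `plt.imshow` draws the raw array with colour limits taken from the data's minimum and maximum, and adds no labels. xarray's `.plot.imshow` labels the axes from the coordinates and adds a colourbar. With `robust=True`, it also sets the colour limits from the 2nd and 98th percentiles, so a few extreme pixels don't wash out the rest of the image. The sketch below reproduces that robust scaling by hand for plain Matplotlib. The random test image with a bright outlier patch is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

rng = np.random.default_rng(42)
# Made-up single-band image: mostly moderate values plus a few very bright outliers
image = rng.normal(0.2, 0.05, size=(50, 60))
image[:2, :2] = 5.0
da = xr.DataArray(image, dims=("y", "x"), name="red",
                  coords={"y": np.arange(50), "x": np.arange(60)})

# Robust colour limits: the 2nd and 98th percentiles rather than min and max
vmin, vmax = np.nanpercentile(image, [2, 98])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Plain Matplotlib: we choose the limits and add the colourbar ourselves
im = ax1.imshow(image, vmin=vmin, vmax=vmax)
fig.colorbar(im, ax=ax1)
ax1.set_title("plt.imshow, manual robust limits")

# xarray: labels the axes from the coordinates, adds a colourbar,
# and computes the robust limits itself
artist = da.plot.imshow(ax=ax2, robust=True)
ax2.set_title("DataArray.plot.imshow(robust=True)")

plt.show()
print(vmin, vmax)
```

Try removing `vmin=vmin, vmax=vmax` from the left panel. The four outlier pixels then stretch the colour scale up to 5.0, and the rest of the image becomes almost uniform.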

In [None]:
# Let's check out the time axis (todo - explore metadata in more detail further up or down)
data.time
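
When xarray decodes the time coordinate (its default for CF-style time units), the result is a NumPy `datetime64` array. Both the `.dt` accessor and label-based selection with date strings then work on it. The dates below are made up for illustration. Run the same calls on `data.time` to see the real acquisition dates.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical time axis with three acquisitions
times = pd.to_datetime(["2015-01-10", "2015-02-11", "2015-03-15"])
ds = xr.Dataset(
    {"red": (("time", "y", "x"), np.zeros((3, 2, 2)))},
    coords={"time": times},
)

# Components of each timestamp via the .dt accessor
print(ds.time.dt.month.values.tolist())      # [1, 2, 3]
print(ds.time.dt.dayofyear.values.tolist())  # [10, 42, 74]

# Label-based selection: a partial date string selects a whole month
february = ds.sel(time="2015-02")
print(february.sizes["time"])  # 1
```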