# Trend Analysis

In this workshop, we're going to explore a 'data cube' - a medium-sized vegetation dataset with x, y, and time dimensions. Along the way we'll cover:

- What is a data cube anyway?  Is there higher-dimensional data?
- Reducing data to a 1D timeseries
- Calculating summaries along various dimensions
- Dealing with unevenly-spaced observations and missing data
- Drawing awesome plots

In [None]:
# First things first - as usual, we import the tools
import xarray as xr
import numpy as np
import pandas as pd  # named for "panel data" (and "Python data analysis") - like Excel, but better

import matplotlib.pyplot as plt
import seaborn
%matplotlib inline

The course theory covered how passive microwave observations from a series of satellite radiometers can be used to build time series of a measure called Vegetation Optical Depth (VOD), and from these, global annual maps of above-ground biomass. We did this for a journal article that you can find in the reading material (Liu et al., 2015). Here we'll look at those time series and do some trend analysis.

In [None]:
# I put the data on NCI for us, so you don't have to download it again.
data = xr.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/ub8/au/RegionTimeSeries/VOD_NCC2015_VOD_1993-2012.nc')
data

By now, you should be familiar with the display above.  Perhaps you recognise the `creator_name` in the attributes metadata?

An excellent piece of free software for visualising, exploring and mapping NetCDF data is *Panoply*, developed by NASA. [You can download it here](http://www.giss.nasa.gov/tools/panoply/). To avoid any problems with downloading and installing, we won't use it in this tutorial, but if you'll be working with NetCDF files it is strongly recommended for visual exploration and even for publishing nice-looking maps - it's *much* easier than MATLAB, and still faster than Python (grumble grumble).

*We're skipping a lot of stuff here, where xarray automatically handles things that are tedious and error-prone in MATLAB.  Nice choice to use Python instead!*

Because the dataset has only one variable we're interested in, let's work with the data array instead of the dataset (conceptually, a set of data arrays that happens to have only one entry).  Because this data is relatively small at 14 MB, we'll also load the lot into memory now to save time later.

In [None]:
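VOD = data.VOD  # pick the single data variable out of the dataset
VOD.load()      # fetch all the values into memory now, instead of lazily on demand
VOD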
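
If the dataset / data array distinction is still fuzzy, here is a minimal sketch using a tiny made-up cube rather than the real file. The name `toy`, the coordinate values and the random numbers are all assumptions for illustration; only the variable name `VOD` is borrowed from the data above.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the real file: a tiny cube of random values.
rng = np.random.default_rng(0)
toy = xr.Dataset(
    {"VOD": (("time", "lon", "lat"), rng.random((3, 4, 2)))},
    coords={
        "time": [1993, 1994, 1995],
        "lon": [110.0, 120.0, 130.0, 140.0],
        "lat": [-20.0, -30.0],
    },
)

# A Dataset is a dict-like collection of named DataArrays.
print(list(toy.data_vars))

# toy["VOD"] and toy.VOD both return the same DataArray.
toy_vod = toy["VOD"]
print(toy_vod.dims, toy_vod.shape)

# Size of the values in memory, useful before deciding to load everything.
print(toy_vod.nbytes, "bytes")
```

Checking `nbytes` first is a cheap way to decide whether loading the whole array is sensible.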

Remember that this array still has plenty of metadata - for example, you can see the time of each of the time steps by inspecting `VOD.time` in a new cell ('Insert > Insert Cell Below' in the menu).
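
As a rough sketch of what inspecting the time coordinate looks like, here is a toy array with made-up values. It assumes one timestamp per year stored as datetimes; the real file may encode time differently, so check `VOD.time` yourself before relying on the `.dt` accessor.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical annual timestamps; the real VOD file may store time differently.
times = pd.to_datetime(["1993-01-01", "1994-01-01", "1995-01-01", "1996-01-01"])
toy_vod = xr.DataArray(
    np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2),
    dims=("time", "lon", "lat"),
    coords={"time": times, "lon": [110.0, 120.0], "lat": [-20.0, -30.0]},
    name="VOD",
)

# The coordinate is itself a DataArray, so it displays and indexes like one.
print(toy_vod.time.values)

# For datetime coordinates, the .dt accessor exposes components such as the year.
years = toy_vod.time.dt.year.values
print(years)

# Coordinates also let us select by label instead of by integer position.
first_two = toy_vod.sel(time=slice("1993", "1994"))
print(first_two.sizes["time"])
```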

In [None]:
# Just to show off, let's make a grid with a VOD map for every timestep
VOD.plot.imshow(robust=True, col='time', col_wrap=5)

Why is the world map on its side?  That's just how the data is stored in this file!  You can change the order of dimensions using [`VOD.transpose('time', 'lat', 'lon')`](http://xarray.pydata.org/en/stable/reshaping.html#reordering-dimensions) (for example), but in this notebook we're going to be reducing the dimensionality of the data and analysing it in a more traditional format (tables! timeseries! statistics!) so it doesn't matter much.  This is of course only possible because we can use the metadata to operate on dimensions by name - much better than having to remember whether latitude is axis `1` or `2` in every file!
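
To make "operating on dimensions by name" concrete, here is a small sketch on a made-up array (the values and coordinates are assumptions, not the real VOD data). It reorders the dimensions, then collapses the two spatial dimensions into a 1D timeseries. Note that this is a plain unweighted mean, which ignores the fact that grid cells shrink towards the poles.

```python
import numpy as np
import xarray as xr

# Hypothetical cube stored as (time, lon, lat), like the sideways map above.
toy_vod = xr.DataArray(
    np.arange(3 * 4 * 2, dtype=float).reshape(3, 4, 2),
    dims=("time", "lon", "lat"),
    coords={
        "time": [1993, 1994, 1995],
        "lon": [110.0, 120.0, 130.0, 140.0],
        "lat": [-20.0, -30.0],
    },
    name="VOD",
)

# Reorder dimensions by name; the values are unchanged, only the layout differs.
reordered = toy_vod.transpose("time", "lat", "lon")
print(reordered.dims)

# Average over both spatial dimensions by name, leaving one value per timestep.
# This is an unweighted mean: no correction for grid-cell area.
series = toy_vod.mean(dim=["lat", "lon"])
print(series.dims, series.values)

# Hand the result to pandas for the "tables and statistics" style of analysis.
table = series.to_series()
print(table)
```

The same `mean` call works whichever order the dimensions are stored in, which is the point of using names rather than axis numbers.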

## This workshop notebook is incomplete.  

Please ask Zac for suggestions if you're working this far ahead!