# Seasonal Vegetation Anomalies



## Background

Understanding how the vegetated landscape responds to longer-term environmental drivers, such as the El Nino Southern Oscillation (ENSO) or climate change, requires the calculation of seasonal anomalies. A seasonal anomaly is calculated by subtracting the long-term seasonal mean from a time series, which removes seasonal variability and highlights change related to longer-term drivers.

## Description

This notebook calculates the seasonal anomaly for any given season and year. The long-term seasonal climatologies (both mean and standard deviation) for the vegetation index `NDVI` have been pre-calculated and are stored on disk. Given an AOI, season, and year, the script calculates the seasonal mean NDVI and subtracts the long-term climatological mean from it, resulting in a map of standardised vegetation anomalies for your AOI. Optionally, the script will output a geotiff of the result.
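
As a rough illustration of the calculation described above, the sketch below computes a standardised anomaly on a tiny synthetic grid. It assumes that "standardised" means the departure from the climatological mean divided by the climatological standard deviation; the array names and values are hypothetical, and the actual logic in `anomalies.py` may differ.

```python
import numpy as np

# Hypothetical stand-ins for the pre-computed climatology on a 1 x 2 pixel grid.
clim_mean = np.array([[0.5, 0.5]])  # long-term seasonal mean NDVI
clim_std = np.array([[0.1, 0.1]])   # long-term seasonal standard deviation

# Hypothetical NDVI observations for one season, shaped (time, y, x).
# NaN marks a pixel removed by cloud masking.
ndvi = np.array([
    [[0.3, 0.6]],
    [[0.3, np.nan]],
    [[0.3, 0.8]],
])

# Seasonal mean for the year of interest, ignoring masked pixels.
seasonal_mean = np.nanmean(ndvi, axis=0)

# Assumed standardisation: departure from the climatological mean in units of
# the climatological standard deviation.
std_anomalies = (seasonal_mean - clim_mean) / clim_std

print(std_anomalies)  # approximately -2 (drier/browner) and +2 (greener)
```

Values near zero indicate a season close to the long-term average, while values beyond about ±2 would be unusual relative to the climatology.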

**IMPORTANT NOTES:** 

* It is a convention to establish climatologies based on a 30-year time range to account for inter-annual and inter-decadal modes of climate variability (often 1980-2010). As the Landsat archive only goes back to 1987, the climatologies here have been calculated using the date range `1988 - 2010` (inclusive). While this is not ideal, a 23-year climatology should suffice to capture the bulk of inter-annual and inter-decadal variability; for example, both a major El Nino (1998) and a major La Nina (2010) are captured by this time range.

* Files & scripts for running datacube stats to calculate vegetation climatologies are located here: `'/g/data/r78/cb3058/dea-notebooks/vegetation_anomlies/dcstats'`. 

* The pre-computed climatologies are stored here: `/g/data/r78/cb3058/dea-notebooks/vegetation_anomalies/results/NSW_NDVI_climatologies_<mean>`. The script below reads the data from this hard-coded path, so moving the climatology mosaics to another location will require editing the `anomalies.py` script.

* So far, NDVI climatologies have been produced for the full extent of NSW only.

## Technical details

* **Products used:** 'ga_ls5t_ard_3', 'ga_ls7e_ard_3', 'ga_ls8c_ard_3'


## Getting Started

To run this analysis, go to the `Analysis Parameters` section and enter the relevant details, then run all the cells in the notebook. If running the analysis multiple times, only run the `Import libraries` and `Set up local dask cluster` cells once.

## Import libraries

In [1]:
import xarray as xr
from datacube.helpers import write_geotiff
import matplotlib.pyplot as plt
import geopandas as gpd
import sys
import os

sys.path.append('../Scripts')
from dea_plotting import display_map, map_shapefile
from anomalies import calculate_anomalies, load_ard
from dea_dask import create_local_dask_cluster

%load_ext autoreload
%autoreload 2

### Set up local dask cluster

Dask will create a local cluster of CPUs for running this analysis in parallel. If you'd like to see what the dask cluster is doing, click on the dashboard hyperlink that prints after you run the cell to watch the cluster run.

In [2]:
create_local_dask_cluster()

Client: Scheduler: tcp://127.0.0.1:35296, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 1, Cores: 8, Memory: 30.67 GB


## Analysis Parameters

The following cell sets the parameters, which define the area of interest and the season to conduct the analysis over. The parameters are:

* `shp_fpath`: A filepath to a shapefile that defines your AOI. If not using a shapefile, put `None` here.
* `lat`, `lon`, `buff`: If not using a shapefile to define the AOI, use a latitude, longitude, and buffer to define a query 'box'.
* `year`: The year of interest, e.g. `'2018'`
* `season`: The season of interest, e.g. `'DJF'`, `'JFM'`, `'FMA'` etc.
* `name`: A string value used to name the output geotiff, e.g. `'NSW'`
* `dask_chunks` : dictionary of values to chunk the data using dask e.g. `{'x':3000, 'y':3000}`
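
To make the query 'box' concrete, here is a minimal sketch of how a centre point and buffer could translate into latitude and longitude ranges. The helper `box_from_centre` is hypothetical and only mirrors the `lat ± buff`, `lon ± buff` arithmetic used when displaying the map in this notebook; it assumes the buffer is expressed in degrees.

```python
def box_from_centre(lat, lon, buff):
    """Return (lat_range, lon_range) for a box extending `buff` degrees
    either side of the centre point. Hypothetical helper for illustration."""
    lat_range = (lat - buff, lat + buff)
    lon_range = (lon - buff, lon + buff)
    return lat_range, lon_range


lat_range, lon_range = box_from_centre(lat=-33.999, lon=150.258, buff=0.5)
print("latitude range: ", lat_range)
print("longitude range:", lon_range)
```

With a buffer of 0.5, the box spans one degree in each direction, which is on the order of 100 km at these latitudes.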

In [3]:
shp_fpath = "data/NSW_and_ACT.shp" #"data/nmdb_individual_catchments/PAROO RIVER.shp"
lat, lon, buff = -33.999, 150.258, 0.5
year = '2018'
season = 'JJA'
name='NSW'
dask_chunks = {'x':3000, 'y':3000}
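
Season codes such as `'DJF'` or `'JJA'` follow the common convention of concatenating the initials of consecutive months. The sketch below shows one way to translate such a code into month numbers; `season_months` is a hypothetical helper, and `anomalies.py` may parse the code differently.

```python
MONTH_INITIALS = "JFMAMJJASOND"  # January to December


def season_months(season):
    """Map a season code such as 'DJF' to its month numbers (1-12)."""
    wrapped = MONTH_INITIALS * 2  # lets a window run past December
    starts = [i for i in range(12) if wrapped[i:i + len(season)] == season]
    if len(starts) != 1:
        raise ValueError(f"unrecognised or ambiguous season code: {season!r}")
    return [(starts[0] + k) % 12 + 1 for k in range(len(season))]


print(season_months("JJA"))  # [6, 7, 8]
print(season_months("DJF"))  # [12, 1, 2]
```

Note that a season such as `'DJF'` spans two calendar years; how the `year` parameter is matched to such a season is decided by the script and is not assumed here.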

### Examine your area of interest

In [4]:
if shp_fpath is not None:
    map_shapefile(gpd.read_file(shp_fpath), attribute='NSW_STAT_1')
else:
    display_map(y=(lat - buff, lat + buff), x=(lon - buff, lon + buff))


## Calculate the anomaly for the AOI

For large queries (e.g. > 10,000 x 10,000 pixels), the code will take several minutes to run. Queries larger than ~25,000 x 25,000 pixels may start to fail due to memory limitations, although several runs covering all of NSW (42,000 x 35,000 x 52) have completed successfully on the VDI. Check the x, y dimensions in the lazily loaded output to get an idea of how big your result will be before you run the `.compute()` cell.
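
A back-of-the-envelope size estimate can help when judging whether a query is likely to fit in memory. The sketch below assumes values are stored as 4-byte floats (`float32`) and that the third dimension quoted above is the number of time steps; both are assumptions, and `array_size_gb` is a hypothetical helper.

```python
def array_size_gb(nx, ny, nt=1, bytes_per_value=4):
    """Approximate in-memory size, in GB (1e9 bytes), of a dense
    nt x ny x nx array. Assumes 4-byte values unless told otherwise."""
    return nx * ny * nt * bytes_per_value / 1e9


# A single 25,000 x 25,000 result layer.
print(f"{array_size_gb(25_000, 25_000):.2f} GB")

# A 42,000 x 35,000 stack with 52 layers, if it were held in memory at once.
print(f"{array_size_gb(42_000, 35_000, nt=52):.2f} GB")

# One 3000 x 3000 dask chunk of a single layer.
print(f"{array_size_gb(3_000, 3_000) * 1000:.0f} MB")
```

Under these assumptions a single 25,000 x 25,000 layer is about 2.5 GB, while the full NSW stack would be roughly 306 GB if loaded at once, which is why the lazy, chunked dask workflow matters: only a subset of chunks needs to be in memory at any moment.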

In [5]:
#Lazily run calculations, this will check for errors before
# we actually compute the results
anomalies = calculate_anomalies(shp_fpath=shp_fpath,
                                query_box=(lat,lon,buff),
                                year=year,
                                season=season,
                                dask_chunks=dask_chunks)

print(anomalies)

extracting data based on shapefile extent
Finding datasets
    ga_ls5t_ard_3
    ga_ls7e_ard_3 (ignoring SLC-off observations)
    ga_ls8c_ard_3
Applying pixel quality/cloud mask
Returning 58 time steps as a dask array
['2018-06-01T23:59:50.903914000' '2018-06-03T23:47:04.911934000'
 '2018-06-05T00:30:44.280167000' '2018-06-05T23:35:07.106422000'
 '2018-06-07T00:18:23.569419000' '2018-06-09T00:06:03.483406000'
 '2018-06-10T23:53:43.324239000' '2018-06-12T23:40:59.245962000'
 '2018-06-14T00:24:39.608354000' '2018-06-16T00:12:19.389808000'
 '2018-06-17T23:59:59.102667000' '2018-06-19T23:47:14.858499000'
 '2018-06-21T00:30:55.122730000' '2018-06-21T23:35:18.440436000'
 '2018-06-23T00:18:34.761697000' '2018-06-25T00:06:14.371763000'
 '2018-06-26T23:53:53.917409000' '2018-06-28T23:41:09.503032000'
 '2018-06-30T00:24:49.719017000' '2018-07-02T00:12:29.170689000'
 '2018-07-04T00:00:08.577198000' '2018-07-05T23:47:24.016040000'
 '2018-07-07T00:31:04.121990000' '2018-07-07T23:35:27.317186000'
 

In [6]:
%%time
anomalies = anomalies.compute()

CPU times: user 18min 51s, sys: 1min 17s, total: 20min 8s
Wall time: 1h 9min 7s


## Export geotiff

In [None]:
# Write geotiff to a location
write_geotiff(name+'_ndvi_'+year+"_"+season+ '_standardised_anomalies.tif', anomalies)

## Plot the result

If your AOI is very large, plotting the result can crash the notebook. In that case, it's better to export the geotiff and view it in QGIS or ArcGIS.

In [None]:
anomalies.std_anomalies.plot(figsize=(10,10), vmin=-2.0, vmax=2.0, cmap='BrBG')

plt.title(season+ ", " +year)
plt.show()
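
If a quick preview is still wanted for a very large AOI, one option is to thin the array before plotting. The sketch below uses a plain NumPy array and a hypothetical helper, `thin`, to keep every n-th pixel so that neither dimension exceeds a chosen limit. This is simple subsampling rather than averaging, so fine detail is lost.

```python
import numpy as np


def thin(array, max_pixels=2_000):
    """Subsample a 2D array so neither dimension exceeds `max_pixels`.
    Hypothetical helper: keeps every `step`-th pixel, no averaging."""
    ny, nx = array.shape
    step = max(1, -(-max(ny, nx) // max_pixels))  # ceiling division
    return array[::step, ::step]


# Small synthetic example standing in for a large anomaly grid.
big = np.zeros((1_200, 900), dtype="float32")
small = thin(big, max_pixels=200)
print(small.shape)  # (200, 150)
```

With xarray, the equivalent subsampling could be written with `isel`, for example `anomalies.std_anomalies.isel(x=slice(None, None, step), y=slice(None, None, step))`, before calling `.plot()`.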