# Seasonal Vegetation Anomalies

This notebook is only compatible with the `NCI`, as it uses Landsat Collection 2.

## Background

Understanding how the vegetated landscape responds to longer-term environmental drivers, such as the El Nino Southern Oscillation (ENSO) or climate change, requires the calculation of seasonal anomalies. Seasonal anomalies subtract the long-term seasonal mean from a time series, thus removing seasonal variability and highlighting change related to longer-term drivers.
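
As a minimal sketch of this idea, the example below builds a per-season long-term mean from a toy series and subtracts it from each observation. The values are invented for illustration, and the plain-Python structures stand in for the gridded data the notebook actually works with.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical seasonal-mean vegetation index values, keyed by (year, season).
# The numbers are invented purely for illustration.
series = {
    (2016, "DJF"): 0.30, (2016, "JJA"): 0.50,
    (2017, "DJF"): 0.34, (2017, "JJA"): 0.54,
    (2018, "DJF"): 0.26, (2018, "JJA"): 0.40,
}

# Long-term mean for each season (the "climatology").
by_season = defaultdict(list)
for (year, season), value in series.items():
    by_season[season].append(value)
climatology = {season: mean(values) for season, values in by_season.items()}

# Anomaly = observed seasonal value minus the long-term mean for that season.
anomalies = {
    key: round(value - climatology[key[1]], 3) for key, value in series.items()
}

for key in sorted(anomalies):
    print(key, anomalies[key])
```

Because each value is compared only against the mean of its own season, the regular winter/summer difference drops out and only the departures from normal remain.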

## Description

This notebook calculates the seasonal anomaly for any given season and year. The long-term seasonal climatologies for the vegetation indices `MSAVI` and `NDVI` (NDVI hasn't been run yet) have been pre-calculated and are stored on disk. Given an area of interest (AOI), season, and year, the script calculates the seasonal mean for one of these indices and subtracts the long-term climatology from it, resulting in a map of vegetation anomalies for your AOI. Optionally, the script will output a geotiff of the result.

**IMPORTANT NOTES:** 

* It is a convention to establish climatologies based on a 30-year time range (often 1980-2010) to account for inter-annual and inter-decadal modes of climate variability. As the Landsat archive only goes back to 1987, the climatologies here have been calculated using the date range `Dec 1987 - Feb 2011` (inclusive). While this is not ideal, a 22-year climatology should suffice to capture the bulk of inter-annual and inter-decadal variability. For example, both a major El Nino (1998) and a major La Nina (2010) are captured by this time range.
* Currently, the only pre-computed climatology is the `MSAVI mean`. The `MSAVI Standard Deviation` has not been pre-computed, so it is not yet possible to calculate `standardised anomalies`. `NDVI anomalies` have not yet been run.
* The pre-computed climatologies do not include LS7 after the SLC failure (2003).
* `.yaml` files for running datacube stats to calculate vegetation climatologies are located here: `'/g/data/r78/cb3058/dea-notebooks/vegetation_anomlies/dcstats'`. There is a different .yaml file for each season, and a corresponding custom datacube-statistics python file for each season; these are located here: `'/g/data/r78/datacube_stats/custom_stats'` 
* The pre-computed climatologies are stored here: `/g/data/r78/cb3058/dea-notebooks/vegetation_anomalies/results`.  The script below will use this string location to grab the data, so shifting the climatology mosaics to another location will require editing the `anomalies.py` script in the `src` folder. 
* So far, climatologies have been produced for the `Northern Murray-Darling Basin` for the `SON` and `JJA` seasons; the `DJF` season has a mosaic that is about 2/3 complete (59 out of 90 Albers tiles). The `NW Queensland` region has mosaicked climatologies for `JJA` and `DJF`, but these are also only about 2/3 complete (60 out of 90 Albers tiles processed).
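
The notes above mention standardised anomalies, which need a per-season standard deviation in addition to the mean. The sketch below assumes the usual definition (the anomaly divided by the climatological standard deviation) and uses invented values for a single pixel. The function name `standardised_anomaly` is hypothetical and is not taken from `anomalies.py`.

```python
from statistics import mean, pstdev

# Hypothetical MSAVI seasonal means for one pixel over the climatology period.
# Values are invented for illustration only.
climatology_values = [0.20, 0.30, 0.40, 0.30, 0.30]

clim_mean = mean(climatology_values)    # long-term seasonal mean
clim_std = pstdev(climatology_values)   # long-term seasonal standard deviation

def standardised_anomaly(observed, clim_mean, clim_std):
    """Return (observed - mean) / std, or None where std is zero."""
    if clim_std == 0:
        return None  # no variability, so the ratio is undefined
    return (observed - clim_mean) / clim_std

z = standardised_anomaly(0.20, clim_mean, clim_std)
print(round(z, 2))
```

The population standard deviation (`pstdev`) is used here as an assumption; a sample standard deviation would give slightly smaller magnitudes for short records.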

## Technical details

* **Products used:** ls5_nbart_albers, ls7_nbart_albers, ls8_nbart_albers
* **Analyses used:** NDVI, MSAVI, seasonal anomalies
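
For reference, the two indices can be sketched per pixel as below. This assumes the common definitions: NDVI as the normalised difference of near-infrared and red reflectance, and MSAVI in the MSAVI2 form of Qi et al. (1994). Whether `anomalies.py` uses exactly this form is an assumption. Inputs are taken to be surface reflectance scaled to the 0-1 range.

```python
import math

def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def msavi(nir, red):
    """Modified Soil Adjusted Vegetation Index (MSAVI2 form)."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2

# Example reflectance values for a moderately vegetated pixel (illustrative only).
nir, red = 0.5, 0.1
print(round(ndvi(nir, red), 3), round(msavi(nir, red), 3))
```

Note that `ndvi` divides by zero when both bands are zero, so real data would need such pixels masked first.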

## Getting Started

To run this analysis, go to the `Analysis Parameters` section and enter the relevant details, then run all the cells in the notebook. If running the analysis multiple times, only run the `Import libraries` and `Set up local dask cluster` cells once.

## Import libraries

In [1]:
import xarray as xr
from datacube.helpers import write_geotiff
import matplotlib.pyplot as plt
import geopandas as gpd
import sys
import os
sys.path.append('src')
from anomalies import calculate_anomalies, load_ard
sys.path.append('../Scripts')
from dea_plotting import display_map, map_shapefile
from dea_dask import create_local_dask_cluster

%load_ext autoreload
%autoreload 2

### Set up local dask cluster

Dask will create a local cluster of CPUs for running this analysis in parallel. To see what the dask cluster is doing, click on the hyperlink that prints after you run the cell and watch the cluster run.

In [2]:
#delete old client if one still exists
create_local_dask_cluster()

Client: Scheduler tcp://127.0.0.1:46305, Dashboard http://127.0.0.1:8787/status

Cluster: Workers 1, Cores 8, Memory 30.67 GB


## Analysis Parameters

The following cell sets the parameters, which define the area of interest and the season to conduct the analysis over. The parameters are:

* `from_shape`: If providing a shapefile to define the area of interest, set this parameter to `True`; otherwise set it to `False`.
* `shp_fpath`: If you set `from_shape` to `True`, provide a filepath to the shapefile.
* `lat`, `lon`, `buff`: If not using a shapefile to define the AOI, use a latitude and longitude point and a buffer to define a query 'box'.
* `year`: The year of interest, e.g. `'2018'`
* `season`: The season of interest, e.g. `'DJF'`, `'JFM'`, `'FMA'`, etc.

In [3]:
#input parameters
from_shape = True
shp_fpath = "data/nmdb_individual_catchments/BORDER_RIVERS_NSW.shp"
lat, lon, buff = -33.999, 150.258, 0.5
year = '2018'
season = 'JJA'

#dask chunk size, shouldn't need to change
dask_chunks = {'x':'auto', 'y':'auto'}
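
To make the `season` and `lat`, `lon`, `buff` parameters concrete, the sketch below shows one way a three-letter season code could be mapped to month numbers, and how a point plus buffer becomes a bounding box. The helper names `season_to_months` and `query_box_bounds` are hypothetical and are not taken from `anomalies.py`, which may handle these inputs differently.

```python
MONTH_INITIALS = "JFMAMJJASOND"

def season_to_months(season):
    """Map a three-letter season code such as 'DJF' to month numbers."""
    # Search a doubled string so seasons that wrap the year end (e.g. 'NDJ') match.
    start = (MONTH_INITIALS * 2).find(season.upper())
    if len(season) != 3 or start == -1:
        raise ValueError(f"Unrecognised season code: {season!r}")
    return [(start + i) % 12 + 1 for i in range(3)]

def query_box_bounds(lat, lon, buff):
    """Return ((lat_min, lat_max), (lon_min, lon_max)) around a point."""
    return (lat - buff, lat + buff), (lon - buff, lon + buff)

print(season_to_months("DJF"))
print(season_to_months("JJA"))
print(query_box_bounds(-34.0, 150.0, 0.5))
```

Every run of three consecutive month initials is unique, which is why a simple substring search is enough to identify the season.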

### Examine your area of interest

In [4]:
# If you're using a shapefile, run this cell
# map_shapefile(gpd.read_file(shp_fpath), attribute='BNAME')

In [5]:
# If you're specifying a lat, lon and buffer, run this cell
# display_map(y=(lat - buff, lat + buff), x=(lon - buff, lon + buff))

## Calculate the anomaly for the AOI

For large queries (e.g. > 4000 x 4000 pixels), the code will take several minutes to run. Queries much larger than ~15,000 x 15,000 pixels will start to fail due to memory limitations (that said, a 19,000 x 15,000 pixel catchment has been successfully run on the VDI). Check the x, y dimensions in the lazily loaded output to get an idea of how big your result will be before you run the `.compute()` cell.
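
A rough way to judge whether a query will fit in memory is to multiply the pixel dimensions by the bytes per value. The sketch below assumes `float32` data (4 bytes per value) and ignores dask overheads and intermediate arrays, so actual peak memory use will differ. The function name `estimate_gigabytes` is hypothetical, and the 24 time steps are only an illustrative count.

```python
def estimate_gigabytes(x_pixels, y_pixels, time_steps=1, bytes_per_value=4):
    """Rough in-memory size of an array in gigabytes (float32 by default)."""
    return x_pixels * y_pixels * time_steps * bytes_per_value / 1e9

# A single 15,000 x 15,000 float32 layer.
print(round(estimate_gigabytes(15_000, 15_000), 2))

# The same area with 24 time steps held in memory at once.
print(round(estimate_gigabytes(15_000, 15_000, time_steps=24), 2))
```

Comparing these figures with the memory reported by the dask cluster gives a quick indication of whether the `.compute()` step is likely to succeed.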

In [10]:
#Lazily run calculations, this will check for errors before
# we actually compute the results
anomalies = calculate_anomalies(from_shape=from_shape,
                                shp_fpath=shp_fpath,
                                query_box=(lat,lon,buff),
                                year=year,
                                season=season,
                                products=['ga_ls8c_ard_3'],
                                dask_chunks=dask_chunks)

print(anomalies)

extracting data based on shapefile extent
Finding datasets
    ga_ls8c_ard_3
Applying pixel quality/cloud mask
Returning 24 time steps as a dask array
calculating vegetation indice
calculating anomalies
<xarray.Dataset>
Dimensions:        (x: 10974, y: 6210)
Coordinates:
  * x              (x) float64 1.598e+06 1.598e+06 ... 1.927e+06 1.927e+06
  * y              (y) float64 -3.219e+06 -3.219e+06 ... -3.405e+06 -3.405e+06
    band           int64 1
Data variables:
    std_anomalies  (y, x) float32 dask.array<chunksize=(1696, 4347), meta=np.ndarray>
Attributes:
    crs:      epsg:3577
    units:    1


In [11]:
#this cell will actually compute the above calculations
anomalies = anomalies.compute()

## Export geotiff

In [None]:
# Write geotiff to a location
write_geotiff('ndvi_'+year+"_"+season+ '_standardised_anomalies.tif', anomalies)

## Plot the result

If your AOI is very large, plotting the result can crash the notebook. In this case, it's better to export the geotiff and view it in QGIS or ArcGIS.

In [None]:
anomalies.std_anomalies.plot(figsize=(10,10), vmin=-2.0, vmax=2.0, cmap='BrBG')

plt.title(season+ ", " +year)
plt.show()