# Annual updating of AusEFlux <img align="right" src="https://github.com/cbur24/AusEFlux/blob/master/results/banner_picture.png?raw=True" width="40%">

This notebook runs the annual update of AusEFlux. Run the four steps below in order.

Before running each step, check its `Analysis Parameters` section and make sure the paths and other settings are correct.

***
**Ideal compute environment:**

Assuming 5-km resolution

- NCI's 'normal' queue
- X-large (24 cores, 95GiB)
- Python 3.10.0
- Python venv: `/g/data/os22/chad_tmp/AusEFlux/env/py310`
- Folders: `gdata/os22+gdata/ub8+gdata/xc0+gdata/gh70`
***
> **Expected completion time to run all steps: ~3 hours**

## Import libraries and set up Dask

In [None]:
import warnings
warnings.simplefilter(action='ignore')

import sys
sys.path.append('/g/data/os22/chad_tmp/AusEFlux/src/')
from _utils import start_local_dask

In [None]:
client = start_local_dask(mem_safety_margin='2Gb')
client

## Step 1: Spatiotemporal harmonisation of input datasets

Most datasets are originally from here: https://dapds00.nci.org.au/thredds/catalog/ub8/au/catalog.html

**Expected completion time ~2hrs**

### Analysis Parameters

* `base`: Path to where most of the input data is stored.
* `results`: Path to store interim datasets after they have undergone harmonisation.
* `year_start`: The first year in the series to process. If processing a single year, make `year_start` and `year_end` the same.
* `year_end`: The last year in the series to process. If processing a single year, make `year_start` and `year_end` the same.

In the cell below, a single `year` variable supplies both values.

In [None]:
base = '/g/data/ub8/au/'
results='/g/data/os22/chad_tmp/AusEFlux/data/interim/'
year = 2023
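
Because this step runs for about two hours, it can be worth confirming that the configured paths exist before starting. The helper below is a minimal sketch of such a check; `missing_paths` is a hypothetical name and is not part of the AusEFlux source.

```python
from pathlib import Path

def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in input order."""
    return [p for p in paths if not Path(p).exists()]

# Hypothetical use in this notebook, before calling the harmonisation:
# problems = missing_paths([base, results])
# if problems:
#     raise FileNotFoundError(f"Missing paths: {problems}")
```

An empty list means every path was found.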

### Run harmonisation



In [None]:
from _harmonisation import spatiotemporal_harmonisation

In [None]:
spatiotemporal_harmonisation(year_start=year,
                             year_end=year,
                             base_path=base,
                             results_path=results,
                             verbose=True
                             )

## Step 2: Create feature datasets

Combine the results of the spatiotemporal harmonisation into temporally stacked netCDF files, and create new features/variables from the climate (e.g. anomalies) and remote sensing (e.g. vegetation fractions) datasets.

**Expected completion time 5 mins**
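
As an illustration of the kind of derived feature mentioned above, the sketch below computes monthly anomalies for one pixel's time series by subtracting each calendar month's long-term mean. It is a pure-Python illustration under stated assumptions: the implementation inside `create_feature_datasets` is not shown in this notebook and presumably operates on gridded arrays, and `monthly_anomalies` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

def monthly_anomalies(months, values):
    """Subtract each calendar month's long-term mean from its values.

    months: calendar month (1-12) of each time step
    values: the variable's value at each time step (same length as months)
    """
    by_month = defaultdict(list)
    for m, v in zip(months, values):
        by_month[m].append(v)
    # Long-term mean for each calendar month present in the series
    climatology = {m: mean(vs) for m, vs in by_month.items()}
    return [v - climatology[m] for m, v in zip(months, values)]

# Toy series: two Januaries and two Julys of a made-up climate variable
months = [1, 7, 1, 7]
series = [100.0, 20.0, 140.0, 10.0]
anoms = monthly_anomalies(months, series)
```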

### Analysis Parameters

* `base`: Path to where the harmonised datasets output from Step 1 are stored.
* `results`: Path to store the temporally stacked netCDF files, i.e. where the outputs of Step 2 will be stored.
* `exclude`: Variables to exclude from combining, i.e. variables output to `/interim` in Step 1 that are not needed hereafter.

In [None]:
base = '/g/data/os22/chad_tmp/AusEFlux/data/interim/'
results='/g/data/os22/chad_tmp/AusEFlux/data/5km/'
exclude = ['.ipynb_checkpoints', 'kTavg', 'Tmax', 'Tmin', 'EVI']
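
The sketch below shows one way an exclusion list like this could be applied to a folder listing. It assumes exact name matching, which may differ from what `create_feature_datasets` does internally; `select_variables` and the `var_a`/`var_b` names are hypothetical.

```python
def select_variables(names, exclude):
    """Keep the entries of `names` that are not listed in `exclude`, sorted."""
    excluded = set(exclude)
    return sorted(n for n in names if n not in excluded)

# Hypothetical listing of the interim folder (var_a and var_b are made-up names)
listing = ['.ipynb_checkpoints', 'var_b', 'EVI', 'var_a', 'Tmax']
kept = select_variables(
    listing, ['.ipynb_checkpoints', 'kTavg', 'Tmax', 'Tmin', 'EVI']
)
```

Entries in the exclusion list that are absent from the folder (here `kTavg` and `Tmin`) are simply ignored.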

### Run Step 2

In [None]:
from _feature_datasets import create_feature_datasets

In [None]:
create_feature_datasets(base=base,
                       results_path=results,
                       exclude=exclude,
                       verbose=True
                       )

## Step 3: Predict ensemble

Using the ensemble of models, we will generate an ensemble of gridded predictions.

**Expected completion time 30 mins**

### Analysis Parameters

* `model_var`: The variable to predict, e.g. `'GPP'`.
* `base`: Path to the root AusEFlux directory; the other paths below are built from it.
* `results_path`: Path to store the ensemble predictions, i.e. where the outputs of Step 3 will be stored.
* `year_start`: The first year in the series to predict. If predicting for a single year, make `year_start` and `year_end` the same.
* `year_end`: The last year in the series to predict. If predicting for a single year, make `year_start` and `year_end` the same.
* `models_folder`: Path to the folder where the ensemble of models is stored.
* `features_list`: Path to the text file listing the features used by the models.

In [None]:
model_var = 'GPP'
base = '/g/data/os22/chad_tmp/AusEFlux/'
year_start, year_end = '2023', '2023'
results_path = f'{base}results/predictions/ensemble/annual_update/{year_start}/{model_var}/'
models_folder = f'{base}results/models/ensemble/{model_var}/'
features_list = f'{base}results/variables.txt'
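
The `year_start`/`year_end` convention described above is an inclusive range, and the years appear as integers in Step 1 but as strings here. The sketch below normalises both forms into a list of years. It illustrates the convention only; `years_to_predict` is a hypothetical helper, and how `predict_ensemble` handles these values internally is not shown in this notebook.

```python
def years_to_predict(year_start, year_end):
    """Inclusive list of years; accepts ints or numeric strings such as '2023'."""
    start, end = int(year_start), int(year_end)
    if start > end:
        raise ValueError(f"year_start ({start}) is after year_end ({end})")
    return list(range(start, end + 1))

single = years_to_predict('2023', '2023')   # one-year update
several = years_to_predict(2021, '2023')    # mixed int/str input
```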

### Run Step 3


In [None]:
from _ensemble_prediction import predict_ensemble

In [None]:
predict_ensemble(
   base=base,
   model_var=model_var,
   models_folder=models_folder,
   features_list=features_list,
   results_path=results_path,
   year_start=year_start,
   year_end=year_end,
   compute_early=True,
   verbose=True
)

## Step 4: Combine ensembles

The previous step produced an ensemble of predictions; now we compute the ensemble median and the uncertainty range.

This step also outputs production-ready datasets with appropriate metadata.

**Expected completion time 10 mins**
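
The median-and-range summary described above can be sketched for a single pixel as follows. The 5th and 95th percentile bounds are an assumption made for illustration (the text above only says "uncertainty range"), and `percentile` and `summarise_ensemble` are hypothetical names rather than AusEFlux functions.

```python
from statistics import median

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0-100) of an ascending list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def summarise_ensemble(members, lower_q=5, upper_q=95):
    """Return (median, lower, upper) across ensemble members for one pixel."""
    vals = sorted(members)
    return median(vals), percentile(vals, lower_q), percentile(vals, upper_q)

# Toy example: five ensemble members predicting the same pixel and month
mid, low, high = summarise_ensemble([10.0, 12.0, 11.0, 15.0, 9.0])
```

On gridded data the same reduction would be applied along the ensemble dimension for every pixel and time step.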

### Analysis Parameters

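* `model_var`: The variable to combine, e.g. `'GPP'`.
* `base`: Path to the root AusEFlux directory.
* `year`: The year being updated.
* `predictions_folder`: Path to the ensemble predictions output by Step 3.
* `full_name`, `version`, `units`: Metadata written to the exported datasets.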

In [None]:
model_var = 'GPP'
base = '/g/data/os22/chad_tmp/AusEFlux/'
year = '2023'
predictions_folder= f'{base}results/predictions/ensemble/annual_update/{year}/{model_var}/'

#metadata for export
full_name = 'Gross Primary Productivity'
version = 'v1.2'
units = 'gC/m2/month'