# Workflow Guide

This guide walks through the main functions of the rainfall workflow and shows how each one is executed.

The main file of this project is "workflow.py". When it is run, these functions execute:

    daily_jobs()
    check_for_bad_smips()
    create_parameter_files()
    create_forecast_files(date)
    create_shuffled_forecasts()
    
These functions cannot be run in full for the sake of an example, so this guide demonstrates their equivalents instead. Where the real functions run over all (or a bounded subset of) the data, consisting of thousands of sets of coordinates, this guide uses a single set of coordinates.



In [1]:
import workflow

## 0. Update data: daily_jobs()

Before working with the data, update it. This function pulls ACCESS-G files from NCI and restricts them to Australia, opens SMIPSv.05 and regrids its latest data to match the ACCESS-G grid, then saves both types of data as individual files and in an aggregated file. The aggregated versions are used in processing.

The locations of all file types are set in settings.py.

When daily_jobs() is run on 18/11/2019, as below, ACCESS-G data is collected up to the previous day (17/11/2019) and SMIPS data up to the day before that (16/11/2019).
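
The date offsets described above can be sketched with the standard library. This is only an illustration: `latest_dates` is a hypothetical helper, not a function in workflow.py, and it assumes the `YYYYMMDD` date strings seen in the output of daily_jobs().

```python
import datetime


def latest_dates(run_date):
    """Return the latest expected ACCESS-G and SMIPS dates for a run date.

    Hypothetical helper for illustration; not part of workflow.py.
    """
    # ACCESS-G is available up to the day before the run date.
    access_g_date = run_date - datetime.timedelta(days=1)
    # SMIPS lags one further day behind.
    smips_date = run_date - datetime.timedelta(days=2)
    return access_g_date.strftime('%Y%m%d'), smips_date.strftime('%Y%m%d')


access_g, smips = latest_dates(datetime.date(2019, 11, 18))
print(access_g, smips)  # 20191117 20191116
```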

In [8]:
workflow.daily_jobs()

['20191117']
Connection succesfully established ... 
File: 2019/ACCESS_G_accum_prcp_fc_2019111712.nc written




['20191116']
/OSM/CBR/LW_SOILDATAREPO/work/SMIPSRegrid/2019/SMIPS_blnd_prcp_regrid_20191116.nc saved
SMIPS aggregation is already up to date
ACCESS-G aggregation is already up to date
Daily jobs done


## 0.5 check_for_bad_smips()
This function, hopefully temporary, deals with errors found in several SMIPS dates. SMIPS rainfall is compared against a maximum threshold of 900; if a date has values above this threshold, that date's data is removed from SMIPS.nc.
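
A minimal sketch of this kind of threshold check, using NumPy on a small synthetic array rather than SMIPS.nc. The function names, the `(time, lat, lon)` array layout, and the sample values are assumptions made for illustration; the real function may be implemented differently.

```python
import numpy as np

MAX_RAINFALL = 900  # maximum threshold quoted in this guide


def find_bad_dates(rain, dates, threshold=MAX_RAINFALL):
    """Return the dates whose grid has at least one value above threshold.

    rain is assumed to be shaped (time, lat, lon), with one entry in
    dates per time step. Hypothetical helper, not part of workflow.py.
    """
    bad = (rain > threshold).any(axis=(1, 2))
    return [d for d, is_bad in zip(dates, bad) if is_bad]


def remove_dates(rain, dates, bad_dates):
    """Drop the time steps listed in bad_dates from rain and dates."""
    keep = np.array([d not in bad_dates for d in dates])
    return rain[keep], [d for d in dates if d not in bad_dates]


# Synthetic data: three dates on a 2x2 grid, one date with a bad value.
dates = ['20191114', '20191115', '20191116']
rain = np.zeros((3, 2, 2))
rain[1, 0, 1] = 1200.0

bad_dates = find_bad_dates(rain, dates)
cleaned, kept_dates = remove_dates(rain, dates, bad_dates)
print(bad_dates)      # ['20191115']
print(cleaned.shape)  # (2, 2, 2)
```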

In [2]:
import time
import settings

coords = [-35.85938, 148.3594]
lat = coords[0]
lon = coords[1]

## 1. Fit model: create_parameter_files()

The first step in the workflow is to fit the bjpmodel to the data and save the parameters associated with each grid cell.

In [6]:
import parameter_cube

fname = settings.params_filename(lat, lon)
start = time.time()
parameter_cube.generate_forecast_parameters(lat, lon)
print('time to generate parameters: ', time.time()-start, ' seconds')

Timezone found


  censor_idx = fit_data <= censor
  missing_idx = np.abs(fit_data - self.MISSING_DATA_VALUE) < 1E-6


NetCDF Cube doesn't exist at  temp/params/grids/params_-35.85938_148.3594.nc
time to generate parameters:  76.63444638252258  seconds


## 2. Forecast: create_forecast_files(date)

The next step is to read the saved parameters back into memory and create and save a forecast for your chosen (probably today's) date.

In [7]:
import datetime
import transform
import forecast_cube

date = datetime.date(2019, 11, 17)

start = time.time()
mu, cov, tp = parameter_cube.read_parameters(lat, lon)
fname = settings.forecast_filename(date, lat, lon)
transform.transform_forecast(lat, lon, date, mu, cov, tp)
print('time to generate forecast: ', time.time()-start, ' seconds')

Timezone found
NetCDF Cube doesn't exist at  temp/forecast/grids/forecast_20191117_-35.85938_148.3594.nc
Timezone found
Timezone found
Timezone found
Timezone found
Timezone found
Timezone found
Timezone found
Timezone found
time to generate forecast:  1.478586196899414  seconds


## 2.5 Restore spatial correlations in forecast: create_shuffled_forecasts()
The forecast is made up of 1000 ensemble members. The next step uses the Schaake shuffle to restore spatial correlations between grid points, so that areas close to each other have similar precipitation.
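
The idea behind the Schaake shuffle can be sketched in a few lines of NumPy. This is the textbook form of the technique, not necessarily what transform.shuffle does internally: `schaake_shuffle` is a hypothetical function, and it assumes one historical observation per ensemble member, taken on the same sampled dates at every grid point. At each grid point the ensemble values are reordered so that their ranks follow the ranks of the historical observations, which carries the observed spatial pattern over to the ensemble.

```python
import numpy as np


def schaake_shuffle(forecast, history):
    """Reorder ensemble members to follow the rank structure of history.

    forecast and history are both shaped (n_members, n_points). Row i of
    history holds observations from the i-th sampled date at every point.
    Hypothetical sketch for illustration; not the project's own code.
    """
    if forecast.shape != history.shape:
        raise ValueError('forecast and history must have the same shape')
    # Sort the ensemble values at each point, smallest first.
    sorted_fc = np.sort(forecast, axis=0)
    # Rank of each historical value within its column (0 = smallest).
    # Ties, such as repeated zero rainfall, are broken arbitrarily.
    ranks = np.argsort(np.argsort(history, axis=0), axis=0)
    # Member i receives the forecast value whose rank matches history[i].
    return np.take_along_axis(sorted_fc, ranks, axis=0)


# Three members at two neighbouring points (synthetic values).
forecast = np.array([[3.0, 10.0], [1.0, 30.0], [2.0, 20.0]])
history = np.array([[0.0, 0.5], [5.0, 9.0], [2.0, 4.0]])
shuffled = schaake_shuffle(forecast, history)
print(shuffled)
# [[ 1. 10.]
#  [ 3. 30.]
#  [ 2. 20.]]
```

Each column still contains the same forecast values, so the distribution at every point is unchanged; only the pairing of values across points within a member changes.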

In [8]:
import source_cube

start = time.time()
date_sample = source_cube.sample_date_indices()
lat_dict, lon_dict = source_cube.get_lat_lon_indices()
lat_i = lat_dict[round(float(lat), 2)]
lon_i = lon_dict[round(float(lon), 2)]
transform.shuffle(lat_i, lon_i, date_sample)
print('time to shuffle forecast: ', time.time()-start, ' seconds')

NetCDF Cube doesn't exist at  temp/forecast/shuffled/shuffledforecast_20191101_-35.859375_148.35938.nc
time to shuffle forecast:  3.279794216156006  seconds


And that is all the work in the workflow at the moment. 

Next: hydrological model, soil moisture API. 