# Load Post-Event Evaluation Data
This notebook walks through loading National Water Model (NWM) operational forecasts and the corresponding verifying observations into a cache of parquet files, which enables interactive, visual evaluation of the forecasts. The loading process has the following steps:

1. Start a cluster
2. Initialize configurations and select a new or prior event
3. Define/view event specs (name, dates, and regional extents)
4. Choose datasets to load
5. Load the data

(Repeat steps 4-5 for each forecast configuration.)

### Load packages 

In [None]:
import post_event_utils as pu
from pathlib import Path
import panel as pn
pn.extension()

## 1. Start a cluster
Before loading data, start a cluster of nodes for distributed computing to speed up loading. If running on the TEEHR Hub (detected automatically from the JupyterHub global username 'jovyan'), use 4 workers for a small instance and 16 workers for a large instance. (Automatic detection and worker setting are still to be added.) The cluster remains active until you shut it down manually (```client.close()```) or shut down the server.
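
The worker-count rule described above can be sketched in plain Python. This is only an illustration of the stated rule, not the code inside `post_event_utils`: the function names, the `"small"`/`"large"` labels, the local default of 2 workers, and reading the username from the `USER` environment variable are all assumptions made for the example.

```python
import os

# Worker counts quoted in the text; the "small"/"large" keys are illustrative labels.
WORKERS_BY_INSTANCE = {"small": 4, "large": 16}


def is_teehr_hub(username: str) -> bool:
    """Treat the JupyterHub default username 'jovyan' as a sign of running on the Hub."""
    return username == "jovyan"


def choose_n_workers(username: str, instance_size: str = "small", local_default: int = 2) -> int:
    """Return a worker count: Hub sizing on the Hub, a hypothetical default elsewhere."""
    if not is_teehr_hub(username):
        return local_default
    return WORKERS_BY_INSTANCE[instance_size]


# Assumption: the username is available in the USER environment variable.
current_user = os.environ.get("USER", "")
n_workers = choose_n_workers(current_user, "small")
print(f"user={current_user!r} -> {n_workers} workers")
```

A value chosen this way could then be passed to whatever creates the cluster; whether `pu.get_client()` accepts such an argument is not stated in this notebook.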

**To monitor data loading progress**:
- Click on the Dashboard URL that appears after running the cell below - **Not currently working on TEEHR Hub**
- Go to the dashboard after launching the data loading step further below

In [None]:
if 'client' not in locals():
    client = pu.get_client()
client

## 2. Initialize configurations and select a new or prior event

- Read the directory configurations and filenames from ```post_event_config.json``` (see the repository README for instructions on customizing the config file and downloading the necessary geometry files if running locally)
- Read the event definitions file ```ROOT_DIR/post-event/events/event_definitions.json```, then choose 'define new event' or select a previously defined event

In [None]:
config = pu.Config(Path("../config/post_event_config.json"))
config.select_event

## 3. Define/view event specs (name, event dates, and regional extents)

- The event name is used as the directory name to organize parquet files by event (for faster querying) and for reference in subsequent notebooks
- The dates selected in this step are the start and end dates **of the event** (use the same date for a single-day event)
- The regional extent can be selected as one or more HUC2s, a lat/lon bounding box, or both (e.g., to select a portion of a HUC2)  -  **currently CONUS only**
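
To make the three pieces of an event definition concrete, here is a minimal sketch of an event spec with basic validation. The `EventSpec` class, its field names, and the bounding-box ordering are hypothetical and chosen for illustration; the actual structure used by `post_event_utils` and `event_definitions.json` may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class EventSpec:
    """Hypothetical container for an event's name, dates, and regional extent."""
    name: str
    start_date: date
    end_date: date
    huc2_list: List[str] = field(default_factory=list)
    # Assumed ordering: (min_lon, min_lat, max_lon, max_lat)
    bbox: Optional[Tuple[float, float, float, float]] = None

    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("event name must not be empty")
        if self.end_date < self.start_date:
            raise ValueError("end date precedes start date")
        if not self.huc2_list and self.bbox is None:
            raise ValueError("select at least one HUC2 or a lat/lon bounding box")
        if self.bbox is not None:
            min_lon, min_lat, max_lon, max_lat = self.bbox
            if min_lon >= max_lon or min_lat >= max_lat:
                raise ValueError("bounding box limits are out of order")

    def n_days(self) -> int:
        """Number of event days, counting both the start and end dates."""
        return (self.end_date - self.start_date).days + 1


# A single-day event uses the same start and end date (example values only).
spec = EventSpec("example_event", date(2023, 1, 9), date(2023, 1, 9), huc2_list=["18"])
spec.validate()
print(spec.n_days())
```

Combining `huc2_list` with `bbox` in one spec corresponds to selecting a portion of a HUC2, as described in the last bullet.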

In [None]:
config.get_event_specs()
config.get_geometry()
event_selector = pu.get_event_selector(config)
pu.build_event_selector_dashboard(event_selector)

# Known issues with the selection map:
# - The BoxEdit HoloViews stream tool is not working; the lat/lon limits are a workaround
# - Re-rendering is slow after changes to the lat/lon limits
# - Selected HUC2s disappear if the lat/lon limits are changed AFTER the HUC2s are selected - see notes in post_event_utils.py

## 4. Choose datasets to load
- Define the datasets you want to load for the event defined above
- All data available on a given **date** will be loaded (if an hour is specified, it is ignored).
- If the NWM forecast configuration is not 'none', dates are reference/issue dates. If forecast_config is 'none', dates are value dates
  
- Currently only set up for CONUS; other domains to be added
- Currently only set up for streamflow and precipitation; other variables to be added (if warranted)
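
The date rules above can be illustrated with a short sketch: any hour component is dropped so that whole days are loaded, and the meaning of a date depends on the forecast configuration. The helper names are hypothetical, and `"short_range"` is used only as an example configuration string; this is not the notebook's own implementation.

```python
from datetime import date, datetime, timedelta
from typing import List


def whole_days(start: datetime, end: datetime) -> List[date]:
    """Drop any hour component and list every calendar day from start to end, inclusive."""
    first, last = start.date(), end.date()
    if last < first:
        raise ValueError("end date precedes start date")
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def date_meaning(forecast_config: str) -> str:
    """Dates are value dates when no forecast configuration is chosen, otherwise reference dates."""
    return "value" if forecast_config == "none" else "reference"


# The hours (18:00 and 06:00) are ignored; all three days are loaded in full.
days = whole_days(datetime(2023, 1, 9, 18), datetime(2023, 1, 11, 6))
print(days, date_meaning("short_range"), date_meaning("none"))
```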

In [None]:
config.update_event_specs(event_selector)
data_selector = pu.DataSelector(config=config)
pu.build_data_selector_dashboard(data_selector)

## 5. Load data

**Streamflow**:  
- Load the streamflow data for the data sources (forecast and/or observed) and reach set defined above.

**Mean areal precipitation**:  
- Load the forcing (mean areal precipitation) data for the data sources (forecast and/or observed) and polygons defined above. 
- MAP polygon options in this notebook are currently HUC10s and USGS gage basins; however, grid weights can be calculated externally in TEEHR for any polygon layer.
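
As background on how grid weights relate to mean areal precipitation, the sketch below computes a generic weighted average of grid-cell values for one polygon. It assumes each weight is the fraction of a grid cell covered by the polygon; the function name and the dictionary layout are hypothetical, and this is not presented as TEEHR's actual implementation.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell


def mean_areal_precip(cell_values: Dict[Cell, float], cell_weights: Dict[Cell, float]) -> float:
    """Weighted mean of gridded precipitation over the cells that overlap one polygon."""
    total_weight = sum(cell_weights.values())
    if total_weight <= 0:
        raise ValueError("polygon has no overlapping grid cells")
    weighted_sum = sum(cell_values[cell] * w for cell, w in cell_weights.items())
    return weighted_sum / total_weight


# Example values only: one fully covered cell and two half-covered cells.
values = {(0, 0): 2.0, (0, 1): 4.0, (1, 0): 6.0}
weights = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5}
print(mean_areal_precip(values, weights))
```

Because the weights depend only on the grid and the polygon layer, they can be computed once and reused for every timestep and forecast.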

In [None]:
pu.launch_teehr_streamflow_loading(config, data_selector)
pu.launch_teehr_precipitation_loading(config, data_selector)

In [None]:
client.close()