# Load Single Event Evaluation Data
This notebook loads NWM operational forecasts and the observations used to verify them into a cache of parquet files, enabling interactive, visual evaluation of the forecasts. The loading process has the following steps:

<ol>
    <li>Start a cluster</li>
    <li>Initialize configurations and select a new or prior event</li>
    <li>Define/view event specs (name, dates, and regional extents)</li>
    <li>Choose datasets to load</li>
    <li>Load the data</li>
</ol>
(Repeat steps 4-5 for each forecast configuration)  


### Load packages 

In [None]:
from pathlib import Path
import panel as pn
pn.extension()

from postevent import config
from postevent.setup import (
    load,
    class_event,
    build_event,
    class_data,
    build_data,
)

# temporary
import importlib

## 1. Start a cluster
Before loading data, start a cluster of nodes for distributed computing to speed up loading. If running on the TEEHR Hub (detected automatically from the JupyterHub global username 'jovyan'), use 4 workers for a small instance and 16 workers for a large instance. The cluster remains active until you shut it down manually (<code>client.close()</code>) or shut down the server.
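
The worker-count guidance above can be written down as a small helper. This is only a sketch: `choose_n_workers`, `HUB_WORKERS`, and the `local_default` value are hypothetical names and values for illustration, and nothing here reflects how `load.get_client()` actually sizes the cluster.

```python
# Hypothetical helper (not part of postevent): encode the worker counts
# recommended above for the TEEHR Hub.
HUB_WORKERS = {"small": 4, "large": 16}


def choose_n_workers(username: str, instance_size: str = "small",
                     local_default: int = 2) -> int:
    """Return a worker count for the given user and instance size.

    The hub is assumed to be identified by the username 'jovyan';
    any other username falls back to `local_default` (an assumed value).
    """
    if username == "jovyan":
        if instance_size not in HUB_WORKERS:
            raise ValueError(f"unknown instance size: {instance_size!r}")
        return HUB_WORKERS[instance_size]
    return local_default


# Example: a large hub instance.
n_workers = choose_n_workers("jovyan", "large")
```

In a live session the username could be obtained with `getpass.getuser()` from the standard library rather than passed as a literal.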

In [None]:
if 'client' not in locals():
    client = load.get_client()
client

## 2. Initialize configurations and select a new or prior event

<ul>
    <li>Read path configurations and filenames from <code>post_event_config.json</code> (see the repository README for instructions to customize the config file and download the necessary geometry files if running locally)</li>
    <li>Read the event definitions file <code>ROOT_DIR/post-event/events/event_definitions.json</code>, then choose 'define new event' or select a previously defined event to load more data for it</li>
</ul>
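
To illustrate the second bullet, the sketch below builds the list of selector options from an event definitions file. It assumes the file is a JSON object keyed by event name, which the notebook does not confirm; `load_event_options` and `NEW_EVENT_LABEL` are hypothetical names, not part of the `config` module.

```python
import json
from pathlib import Path

# Label for the option that starts a new event (taken from the text above).
NEW_EVENT_LABEL = "define new event"


def load_event_options(definitions_path: Path) -> list:
    """Return selector options: the new-event label, then saved event names.

    Assumption: the definitions file is a JSON object whose keys are
    event names. A missing file yields only the new-event option.
    """
    if not definitions_path.exists():
        return [NEW_EVENT_LABEL]
    with definitions_path.open() as f:
        definitions = json.load(f)
    # Iterating a dict yields its keys, so sorted() returns sorted event names.
    return [NEW_EVENT_LABEL] + sorted(definitions)
```

Putting the new-event option first means the selector always has a valid choice, even before any event has been saved.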

In [None]:
importlib.reload(config)
paths = config.Paths("post_event_config_teehrhub.json")                   
paths.event_name_selector = paths.event_name_selector_with_new
paths.event_name_selector

#### Update based on event selections and proceed

In [None]:
paths.update_loading_options()
event = config.Event(paths) 
geo = config.Geo(paths, event)

## 3. Define event information (name, event dates, and regional extents)

<ul>
    <li>The event name is used as the directory name to organize parquet files by event (for faster querying) and for reference in subsequent notebooks</li>
    <li>The dates selected in this step are the start and end dates <b>of the event</b> (use the same date for a single-day event)</li>
    <li>The regional extent can be selected as one or more HUC2s, a lat/lon bounding box, or both (e.g., to select a portion of a HUC2) - currently CONUS only</li>
</ul>
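
The three pieces of event information listed above can be pictured as one small record. The sketch below is illustrative only: `EventSpec` and its fields are hypothetical and are not the classes in `class_event`. It assumes that at least one regional extent is required.

```python
from dataclasses import dataclass, field
from datetime import date
from pathlib import Path
from typing import List, Optional, Tuple


@dataclass
class EventSpec:
    """Hypothetical container for an event's name, dates, and region."""
    name: str
    start_date: date
    end_date: date
    huc2_list: List[str] = field(default_factory=list)
    # Assumed order: (min_lon, min_lat, max_lon, max_lat)
    bbox: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self) -> None:
        # A single-day event uses the same start and end date.
        if self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")
        # Assumption: at least one regional extent must be given.
        if not self.huc2_list and self.bbox is None:
            raise ValueError("select at least one HUC2 or a bounding box")

    def parquet_dir(self, root: Path) -> Path:
        """Directory for this event's parquet files, named after the event."""
        return root / self.name


# Example: a single-day event restricted to one HUC2 region.
spec = EventSpec("example_event", date(2023, 1, 10), date(2023, 1, 10),
                 huc2_list=["03"])
```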


In [None]:
importlib.reload(class_event)
importlib.reload(build_event)

event_selector = class_event.EventSelector(
    dir_name = event.dir_name, 
    event_start_date = event.event_start_date, 
    event_end_date = event.event_end_date,
    event=event,
    paths=paths,
    region=class_event.RegionSelector(
        geo=geo)
)
build_event.build(event_selector)

#### Update event specs based on the selections above to proceed with data loading

In [None]:
event_selector.update_event_specs()

## 4. Choose datasets to load

<ul>
    <li>Define the datasets you want to load for the event defined above</li>
    <li>All data available on a given <b>date</b> will be loaded (if an hour is specified, it is ignored)</li>
    <li>If the NWM forecast configuration is not 'none', the dates are reference (issue) dates of the forecasts. If the forecast configuration is 'none', the dates are value dates</li>
</ul>
  
*Currently only set up for CONUS - other domains to be added  
*Currently only set up for streamflow and precipitation - other variables to be added (if warranted)
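
The date rules above can be summarized in a few lines. This is a sketch under stated assumptions: `normalize_load_date` and `date_kind` are hypothetical helpers, and the configuration name used in the example is illustrative rather than a value confirmed by this notebook.

```python
from datetime import date, datetime


def normalize_load_date(value: datetime) -> date:
    """Drop the time of day, since loading operates on whole dates."""
    return value.date()


def date_kind(forecast_config: str) -> str:
    """Say how selected dates are interpreted for a forecast configuration.

    'none' means no forecast is selected, so dates are value dates;
    any other configuration treats them as reference (issue) dates.
    """
    return "value" if forecast_config == "none" else "reference"


# Example: an hour is supplied but ignored.
load_date = normalize_load_date(datetime(2023, 5, 1, 13, 0))
kind = date_kind("short_range")  # illustrative configuration name
```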

In [None]:
importlib.reload(class_data)
importlib.reload(build_data)

data_selector = class_data.DataSelector_NWMOperational(
    paths=paths,
    event=event,
    dates=config.Dates(
        paths=paths, 
        event=event)
)
build_data.build(data_selector, geo)

## 5. Create the parquet filesets

##### Streamflow:
<ul>
    <li>Load the streamflow data for the data sources (forecast and/or observed) and reach set defined above</li>
</ul>

##### Mean areal precipitation:  
<ul>
    <li>Load the forcing (mean areal precipitation) data for the data sources (forecast and/or observed) and polygons defined above</li>
    <li>MAP polygon options in this notebook are currently HUC10s and USGS gage basins; however, grid weights can be calculated externally in TEEHR for any polygon layer</li>
</ul>
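
To show what grid weights are for, the sketch below computes a mean areal precipitation value as a weighted average of grid-cell values over one polygon. It assumes each weight is the fraction of a cell that overlaps the polygon; `mean_areal_precipitation` is a hypothetical function and is not TEEHR's implementation.

```python
from typing import Sequence


def mean_areal_precipitation(cell_values: Sequence[float],
                             cell_weights: Sequence[float]) -> float:
    """Weighted mean of grid-cell precipitation over a single polygon.

    Assumption: each weight is the fraction of the grid cell that
    overlaps the polygon, so cells fully inside count the most.
    """
    if len(cell_values) != len(cell_weights):
        raise ValueError("cell_values and cell_weights must have equal length")
    total_weight = sum(cell_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(v * w for v, w in zip(cell_values, cell_weights))
    return weighted_sum / total_weight


# Example: two cells, one fully inside the polygon and one half inside.
map_value = mean_areal_precipitation([6.0, 3.0], [1.0, 0.5])
```

Dividing by the sum of the weights keeps the result in the same units as the cell values even when the weights do not sum to one.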

In [None]:
importlib.reload(load)
load.launch_teehr_streamflow_loading(paths, event, data_selector)
load.launch_teehr_precipitation_loading(paths, event, geo, data_selector)