# Postpro-model notebook
This notebook post-processes model output data in a HEC-DSS file. Model output data must be post-processed before creating plots using the plotting notebook. Post-processing does the following:
1. Read DSS data matching each station (identified by DSS B part). For each constituent (Flow, Stage, EC), for each station, all matching DSS paths will be identified.
2. After reading data in the matching DSS paths, the data in the paths will be combined into a single DSS data set, with priority given to shorter time interval data. If there are 15MIN data, all of it will be used. Any gaps will be filled with 1HOUR data. Any remaining gaps will be filled with 1DAY data. The data will be written to a new DSS file with "_calib_postpro" appended to the end of the filename (before the extension). Example: if the input DSS file is "flow.dss", the output DSS file will be "flow_calib_postpro.dss".
3. The output DSS file will contain the following:
    a. The original data set
    b. An amplitude data set
    c. A Godin filtered data set
    d. A high value data set (local maximum values)
    e. A low value data set (local minimum values)
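
The priority rule in step 2 and the output naming convention can be sketched in plain Python. This is an illustration only, not the pydsm implementation: `merge_by_priority` and `postpro_filename` are hypothetical helpers, and each series is assumed to be a dict mapping timestamps to values, with `None` marking a gap.

```python
from datetime import datetime
from pathlib import Path


def merge_by_priority(*series):
    """Merge series given in priority order (e.g. 15MIN, 1HOUR, 1DAY).

    A value from a lower-priority series is used only at timestamps where
    no higher-priority series supplied a value (missing key or None).
    """
    merged = {}
    for data in series:
        for timestamp, value in data.items():
            if value is not None and timestamp not in merged:
                merged[timestamp] = value
    return dict(sorted(merged.items()))


def postpro_filename(dssfile):
    """Insert "_calib_postpro" between the file stem and its extension."""
    path = Path(dssfile)
    return str(path.with_name(f"{path.stem}_calib_postpro{path.suffix}"))


# Made-up sample values, for illustration only
t0 = datetime(2020, 1, 1, 0, 0)
t1 = datetime(2020, 1, 1, 0, 15)
t2 = datetime(2020, 1, 1, 1, 0)
fifteen_min = {t0: 1.0, t1: None}   # gap at 00:15
hourly = {t0: 9.0, t2: 2.0}         # fills the 01:00 timestamp
daily = {t0: 5.0}                   # never needed here

merged = merge_by_priority(fifteen_min, hourly, daily)
print(merged)
print(postpro_filename("flow.dss"))
```

Note that this sketch only fills gaps at timestamps that the coarser series actually contains; it does not resample or interpolate.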

Required input
=============
1. One or more location (.csv) files, containing information about the data to be processed. There should be 3 fields in this file: "DSM2 ID", "CDEC ID", and "Station Name". Headers for these fields must be included. Each value in the "DSM2 ID" column must match a B part in the DSS file. Here is an excerpt from an example location file:
    DSM2 ID	CDEC ID	Station Name
    RSAC101	RVB	Sacramento River at Rio Vista
    RSAC092	EMM	Sacramento River at Emmaton 
    RSAC081	CLL	Sacramento River at Collinsville 
2. One or more data (.dss) files, containing the data you need to process. The data in a file will only be processed if the location file contains a "DSM2 ID" matching its B part.
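
How the location file selects DSS paths can be sketched with the standard library. This is an illustration, not the pydsm code: the sample data assumes a comma-delimited file, the pathnames (including their A, C, E, and F parts) are made up, and `load_dsm2_ids`, `b_part`, and `matching_paths` are hypothetical helpers. The sketch relies only on the /A/B/C/D/E/F/ layout of a DSS pathname and on the rule that the B part must equal a "DSM2 ID".

```python
import csv
import io


def load_dsm2_ids(csv_text):
    """Return the set of "DSM2 ID" values from location-file text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["DSM2 ID"].strip() for row in reader}


def b_part(pathname):
    """Extract the B part of a DSS pathname of the form /A/B/C/D/E/F/."""
    # "/A/B/C/D/E/F/".split("/") gives ["", A, B, C, D, E, F, ""]
    return pathname.split("/")[2]


def matching_paths(pathnames, dsm2_ids):
    """Keep only the pathnames whose B part is listed in the location file."""
    return [p for p in pathnames if b_part(p) in dsm2_ids]


# Sample location data (comma-delimited is an assumption)
location_csv = """DSM2 ID,CDEC ID,Station Name
RSAC101,RVB,Sacramento River at Rio Vista
RSAC092,EMM,Sacramento River at Emmaton
"""

# Hypothetical pathnames; only the B part matters for the match
pathnames = [
    "/STUDY/RSAC101/EC//15MIN/RUN/",
    "/STUDY/RSAC092/EC//1HOUR/RUN/",
    "/STUDY/NOT_LISTED/EC//15MIN/RUN/",
]

ids = load_dsm2_ids(location_csv)
selected = matching_paths(pathnames, ids)
print(selected)
```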

Usage
======
1. In the "Specify input DSS files", "Setup for EC", "Setup for FLOW", and "Setup for STAGE" cells, set the variables to correctly identify the data to be processed.
2. Execute the entire notebook.

In [None]:
import pydsm
from pydsm import postpro

# Dask related functions
Dask uses parallel processing, which will significantly reduce runtime.
However, when dask is used, messages printed to stdout will not be displayed in the notebook.
This includes messages indicating that plots will not be created for
certain locations due to missing DSS data. These messages will be displayed
in the conda prompt window instead. The use of dask with network drives is not
recommended because some processes may fail.
This notebook writes DSS files, which does not work on Windows when
using dask but works well on Linux. For that reason, use_dask is set to False by default.
On Linux, setting use_dask to True will increase speed.
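
One way to follow the platform guidance above without editing the flag by hand is to derive it from the operating system. This is a sketch, not part of the original notebook: `default_use_dask` is a hypothetical helper, and it assumes Linux is the only platform on which dask should be enabled.

```python
import platform


def default_use_dask(system_name=None):
    """Return True only on Linux; False on Windows and any other platform."""
    if system_name is None:
        # platform.system() returns e.g. "Linux", "Windows", or "Darwin"
        system_name = platform.system()
    return system_name == "Linux"


print(default_use_dask())
```

To use it, the assignment in the next cell would become `use_dask = default_use_dask()`. The caveat about network drives still applies on Linux.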

In [None]:
# Must be False on Windows; can be set to True on Linux
use_dask = False

# Create Dask cluster

In [None]:
import dask
from dask.distributed import Client, LocalCluster

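class DaskCluster:
    """Thin wrapper that starts and stops a local dask cluster."""
    def __init__(self):
        self.client=None
    def start_local_cluster(self):
        # threads_per_worker=1 is needed if using numba
        cluster = LocalCluster(n_workers=8, threads_per_worker=1, memory_limit='6G')
        self.client = Client(cluster)
    def stop_local_cluster(self):
        # Do nothing if the cluster was never started or was already stopped
        if self.client is not None:
            self.client.shutdown()
            self.client=None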
        
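def run_all(processors):
    """Run every processor: on the dask cluster if use_dask is True, otherwise sequentially."""
    tasks=[dask.delayed(postpro.run_processor)(
               processor,
               dask_key_name=f'{processor.study.name}::{processor.location.name}/{processor.vartype.name}')
           for processor in processors]
    if use_dask:
        dask.compute(tasks)
    else:
        # Run the tasks one at a time in the notebook process
        dask.compute(tasks, scheduler='single-threaded')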

# Start Dask cluster

In [None]:
cluster = DaskCluster()
cluster.start_local_cluster()
cluster.client

# Specify input DSS files

In [None]:
# Maps each study name to the DSS file containing its model output
study_file_map = {
    'DSM2v8.2.0_noSMCD': './model_output/historical_v82b1.dss',
    'DSM2v8.2.0_SMCD': './model_output/v8_2_0_cal_extTo2019_smcd/hist_v82_19.dss',
}

# Setup for EC

In [None]:
for study_name in study_file_map:
    dssfile=study_file_map[study_name]
    locationfile='./location_info/calibration_ec_stations.csv'
    vartype='EC'
    units='mmhos/cm'
    observed=False
    processors=postpro.build_processors(dssfile, locationfile, vartype, units, study_name, observed)
    print(f'Processing {vartype} for study: {study_name}')
    run_all(processors)

# Setup for FLOW

In [None]:
for study_name in study_file_map:
    dssfile=study_file_map[study_name]
    locationfile='./location_info/calibration_flow_stations.csv'
    vartype='FLOW'
    units='cfs'
    observed=False
    processors=postpro.build_processors(dssfile, locationfile, vartype, units, study_name, observed)
    print(f'Processing {vartype} for study: {study_name}')
    run_all(processors)

# Setup for STAGE

In [None]:
for study_name in study_file_map:
    dssfile=study_file_map[study_name]
    locationfile='./location_info/calibration_stage_stations.csv'
    vartype='STAGE'
    units='feet'
    observed=False
    processors=postpro.build_processors(dssfile, locationfile, vartype, units, study_name, observed)
    print(f'Processing {vartype} for study: {study_name}')
    run_all(processors)

# Stop the Dask cluster. Make sure this always runs at end of processing.

In [None]:
cluster.stop_local_cluster()