Susan is running various configurations of SalishSeaCast version 202111 that include a simulation of the Iona Island Wastewater Treatment Plant Deep Sea Outfall. Because those are "research run results", in contrast to collections of daily results files from long-running hindcasts, the handling of the results files and the Reshapr model profile(s) is a little different.
.. note::

   This section serves as a guide to using Reshapr for other "research run" applications.
Notable differences include:
- The research runs are executed on an HPC cluster in multi-day segments. For the Iona wastewater case the runs were done on :kbd:`graham`. Initial runs were 5 days long for debugging, tuning, and initial analysis development by Jake. Subsequent runs were 1 month long because that fits well in the 12-hour walltime scheduler partition on :kbd:`graham`.
- The run results are downloaded from the HPC cluster to research storage on :file:`/ocean/$USER/` or :file:`/data/$USER/`. For the Iona wastewater case the results were downloaded to directory trees in :file:`/data/sallen/results/MEOPAR/wastewater/` such as :file:`/data/sallen/results/MEOPAR/wastewater/long_run/`.
- The multi-day run results files like :file:`/data/sallen/results/MEOPAR/wastewater/long_run/SalishSea_1h_20180101_20180131_grid_T.nc` must be split into 1-day files stored in date-named subdirectories like :file:`/data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1h_20180101_20180101_grid_T.nc`. At the moment, the best way to do that is via the SalishSeaCast automation :py:mod:`nowcast.workers.split_results` worker. Only Doug and Susan have the necessary permissions to run that worker. Please ask them for help if you need to split results from another research run.
- The Reshapr model profile is maintained by the user doing the analysis rather than it being included in the Reshapr code repository. Please see the :ref:`IonaWastewaterModelProfile` section below for details.
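The naming convention in the list above can be sketched in Python. This is not the :py:mod:`nowcast.workers.split_results` worker and it does not split anything; it only shows, assuming the file name layout in the examples above, which date-named subdirectories and 1-day file names a multi-day results file maps to. The ``ddmmmyy`` and ``daily_paths`` functions are hypothetical helpers.

```python
import re
from datetime import date, datetime, timedelta
from pathlib import Path

# Lower-case month abbreviations for date-named subdirectories like 01jan18;
# hard-coded so the result does not depend on the locale, as strftime("%b") would
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]

# Multi-day results file names like SalishSea_1h_20180101_20180131_grid_T.nc
MULTI_DAY_NAME = re.compile(r"^(?P<prefix>.+)_(?P<start>\d{8})_(?P<end>\d{8})_(?P<suffix>.+)\.nc$")


def ddmmmyy(day: date) -> str:
    """Format a date like 01jan18."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}{day.year % 100:02d}"


def daily_paths(multi_day_file: Path) -> list[Path]:
    """List the 1-day file paths that a multi-day results file corresponds to after splitting."""
    match = MULTI_DAY_NAME.match(multi_day_file.name)
    if match is None:
        raise ValueError(f"unexpected results file name: {multi_day_file.name}")
    day = datetime.strptime(match["start"], "%Y%m%d").date()
    last = datetime.strptime(match["end"], "%Y%m%d").date()
    paths = []
    while day <= last:
        yyyymmdd = f"{day:%Y%m%d}"
        name = f"{match['prefix']}_{yyyymmdd}_{yyyymmdd}_{match['suffix']}.nc"
        paths.append(multi_day_file.parent / ddmmmyy(day) / name)
        day += timedelta(days=1)
    return paths


paths = daily_paths(
    Path("/data/sallen/results/MEOPAR/wastewater/long_run/SalishSea_1h_20180101_20180131_grid_T.nc")
)
print(len(paths))  # 31
print(paths[0])  # /data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1h_20180101_20180101_grid_T.nc
```

A list like this can be handy for confirming, after the split, that every expected daily file is present.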
Store your model profile and extraction configuration YAML files in a Git repository, such as your analysis repository, so that you can commit your changes to them and push them to GitHub to document your analysis history and make it reproducible. Here is an example from :file:`analysis-doug`:

.. code-block:: text

    analysis-doug/
    ├── ...
    ├── notebooks
    │   ├── ...
    │   └── wastewater
    │       ├── extract_biology.yaml
    │       └── model_profiles
    │           └── SalishSeaCast-202111-wastewater-salish.yaml
Store the results of your extractions outside of a Git repository, for example, in :file:`/ocean/dlatorne/MOAD/extractions/`. Extracted netCDF files are large binary files, so do not try to push them to GitHub. If you commit them and push them to GitHub you will quickly exceed file and repository size limits. They are products of the extraction process described by your model profile and extraction configuration YAML files. So, having those YAML files under version control is sufficient to enable you to reproduce the extracted netCDF files.
Grab a copy of the model profile YAML file that Doug created from https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml, store your copy of that file in your analysis repository, and commit it.

Grab a copy of the sample extraction configuration YAML file that Doug created from https://github.com/SalishSeaCast/analysis-doug/blob/main/notebooks/wastewater/extract_biology.yaml and store your copy of that file in your analysis repository. Edit 2 lines of that file:

- line 5, which starts with :kbd:`model profile:`, to set the absolute path to your copy of the model profile YAML file
- line 33, which starts with :kbd:`dest dir:`, to set the absolute path to the directory where you will store the results of your extractions
Commit your modified file.
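If you prefer to make those two edits with a script, here is a minimal standard-library sketch. It assumes each key appears on its own ``key: value`` line, as in the example file, and rewrites only the value, so comments, indentation, and line order are kept (a round trip through a YAML library would typically drop the comments). The ``set_yaml_value`` helper and the ``/ocean/yourname/...`` paths are hypothetical.

```python
import re


def set_yaml_value(text: str, key: str, value: str) -> str:
    """Replace the value on the first `key: value` line, keeping its indentation."""
    # [ \t]* rather than \s* so the match cannot run across line breaks
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}:)[ \t].*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda match: f"{match.group(1)} {value}", text, count=1)
    if count != 1:
        raise KeyError(f"no line with a value for {key!r} found")
    return new_text


sample = """\
dataset:
  model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
  time base: day

extracted dataset:
  name: SalishSeaCast_wastewater_day_avg_biology
  dest dir: /ocean/dlatorne/MOAD/extractions/
"""
# Hypothetical paths; substitute your own
edited = set_yaml_value(
    sample, "model profile", "/ocean/yourname/analysis/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml"
)
edited = set_yaml_value(edited, "dest dir", "/ocean/yourname/extractions/")
print(edited)
```

For the real file you would read it with ``Path(...).read_text()``, pass the text through ``set_yaml_value``, write it back with ``write_text()``, and review the change with ``git diff`` before committing.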
In a terminal session on :kbd:`salish`, activate your :kbd:`reshapr` conda environment, and do a test extraction. For Doug, that looks like:
.. code-block:: text

    cd /ocean/dlatorne/MEOPAR/analysis-doug/
    analysis-doug$ conda activate reshapr
    (/home/dlatorne/conda_envs/reshapr) analysis-doug$ reshapr extract notebooks/wastewater/extract_biology.yaml
    2023-10-19 12:13:43 [info ] loaded config config_file=notebooks/wastewater/extract_biology.yaml
    2023-10-19 12:13:43 [info ] loaded model profile model_profile_yaml=/ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
    2023-10-19 12:13:48 [info ] dask cluster dashboard dashboard_link=http://127.0.0.1:8787/status dask_config_yaml=/ocean/dlatorne/MOAD/Reshapr-10jul23/cluster_configs/salish_cluster.yaml
    2023-10-19 12:13:49 [info ] extracting variables
    2023-10-19 12:13:49,882 - distributed.nanny - WARNING - Restarting worker
    2023-10-19 12:13:50 [info ] wrote netCDF4 file nc_path=/ocean/dlatorne/MOAD/extractions/SalishSeaCast_wastewater_day_avg_biology_20180101_20180102.nc
    2023-10-19 12:13:50 [info ] total time t_total=7.281958341598511
Be sure to use the path (relative or absolute) to your extraction YAML file in the :command:`reshapr extract` command.
Here are the contents of the example :file:`extract_biology.yaml` file:

.. code-block:: yaml

    # Reshapr configuration to extract day-averages of interesting biology variables
    # near Iona Island wastewater outfall

    dataset:
      model profile: /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml
      time base: day
      variables group: biology

    dask cluster: salish_cluster.yaml
    start date: 2018-01-01
    end date: 2018-01-02

    extract variables:
      - ammonium
      - nitrate
      - diatoms

    selection:
      depth:
        # NOTE: use depth level numbers, not depths in meters
        depth max: 30
      grid y:
        y min: 430
        y max: 471
      grid x:
        x min: 280
        x max: 321

    extracted dataset:
      name: SalishSeaCast_wastewater_day_avg_biology
      description: Day-averaged ammonium, nitrate & diatoms extracted from SalishSeaCast v202111
        NEMO model with wastewater outfalls
      dest dir: /ocean/dlatorne/MOAD/extractions/
As you build your collection of extraction YAML files, remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.
You can change the start and/or end dates for the extraction by editing the :kbd:`start date:` and/or :kbd:`end date:` lines in the YAML file. Alternatively, you can use the :kbd:`--start-date` and/or :kbd:`--end-date` command-line options of the :command:`reshapr extract` command to override the start and/or end dates in the YAML file. Use :command:`reshapr extract --help` to see the details of how to do that.
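To run the same extraction over several periods without editing the YAML file, you could build the command lines in Python. This is a sketch that assumes the :kbd:`--start-date` and :kbd:`--end-date` options take ``YYYY-MM-DD`` values like the :kbd:`start date:` and :kbd:`end date:` keys, and that they can precede the YAML file path; confirm both with :command:`reshapr extract --help`. The ``extract_command`` helper is hypothetical, and the sketch only builds the argument lists.

```python
from datetime import date, timedelta
from typing import Optional


def extract_command(
    config_yaml: str, start: Optional[date] = None, end: Optional[date] = None
) -> list[str]:
    """Build a `reshapr extract` command line, optionally overriding the YAML file's dates."""
    if start is not None and end is not None and end < start:
        raise ValueError("end date must not be before start date")
    command = ["reshapr", "extract"]
    # Assumption: the options take ISO dates (YYYY-MM-DD), like the YAML start/end date keys
    if start is not None:
        command += ["--start-date", start.isoformat()]
    if end is not None:
        command += ["--end-date", end.isoformat()]
    command.append(config_yaml)
    return command


# Hypothetical example: one extraction per month for January to March 2018;
# each end date is the day before the 1st of the following month
months = [(date(2018, m, 1), date(2018, m + 1, 1) - timedelta(days=1)) for m in (1, 2, 3)]
commands = [extract_command("notebooks/wastewater/extract_biology.yaml", s, e) for s, e in months]
for command in commands:
    print(" ".join(command))
```

Each list could then be passed to ``subprocess.run(command, check=True)`` to run the extractions one after another.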
You can change the variables that you extract by changing the :kbd:`variables group:` name in line 7, and the list of variable names in the lines following the :kbd:`extract variables:` key at line 13.
To learn the names of the available variable groups and the variables in them, use the :command:`reshapr info` command with the path and file name of your model profile. For example:

.. code-block:: text

    reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml

    /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
      SalishSeaCast version 202111 NEMO with wastewater outfalls results
      on storage accessible from salish.

    variable groups from time intervals in this model:
      day
        biology
        chemistry
        biology growth rates
        grazing
        light
        mortality
        physics tracers
        vvl grid
      hour
        biology
        chemistry
        light
        physics tracers
        turbulence
        u velocity
        v velocity
        vvl grid
        w velocity

    Please use reshapr info model-profile time-interval variable-group
    (e.g. reshapr info SalishSeaCast-201905 hour biology)
    to get the list of variables in a variable group.

    Please use reshapr info --help to learn how to get other information,
    or reshapr --help to learn about other sub-commands.
The output shows the lists of variable groups, divided into day-averaged and hour-averaged collections. From that, we can see the list of variables in the day-averaged physics tracers variable group with:
.. code-block:: text

    reshapr info /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml day physics tracers

    /ocean/dlatorne/MEOPAR/analysis-doug/notebooks/wastewater/model_profiles/SalishSeaCast-202111-wastewater-salish.yaml:
      SalishSeaCast version 202111 NEMO with wastewater outfalls results
      on storage accessible from salish.

    day-averaged variables in physics tracers group:
     - sossheig : Sea Surface Height [m]
     - votemper : Conservative Temperature [degree_C]
     - vosaline : Reference Salinity [g kg-1]
     - sigma_theta : Potential Density (sigma_theta) [kg m-3]
     - e3t : T-cell Thickness [m]

    Please use reshapr info --help to learn how to get other information,
    or reshapr --help to learn about other sub-commands.
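When you edit the :kbd:`extract variables:` list, it can help to check the names against the group listing that :command:`reshapr info` prints. Here is a sketch, assuming the ``- name : description [units]`` line format shown in the output above; ``parse_variable_listing`` and ``check_requested`` are hypothetical helpers, the listing is pasted in as a string rather than captured from the command, and ``salinity`` is a deliberately wrong name.

```python
import re

# Matches lines like " - votemper : Conservative Temperature [degree_C]"
VARIABLE_LINE = re.compile(r"^\s*-\s+(\S+)\s+:\s+(.*?)\s*$")


def parse_variable_listing(listing: str) -> dict[str, str]:
    """Map variable names to descriptions in `reshapr info ... time-interval group` output."""
    variables = {}
    for line in listing.splitlines():
        match = VARIABLE_LINE.match(line)
        if match:
            variables[match.group(1)] = match.group(2)
    return variables


def check_requested(requested: list[str], available: dict[str, str]) -> list[str]:
    """Return the requested variable names that are not in the group."""
    return [name for name in requested if name not in available]


listing = """\
day-averaged variables in physics tracers group:
 - sossheig : Sea Surface Height [m]
 - votemper : Conservative Temperature [degree_C]
 - vosaline : Reference Salinity [g kg-1]
 - sigma_theta : Potential Density (sigma_theta) [kg m-3]
 - e3t : T-cell Thickness [m]
"""
available = parse_variable_listing(listing)
print(list(available))  # ['sossheig', 'votemper', 'vosaline', 'sigma_theta', 'e3t']
print(check_requested(["votemper", "vosaline", "salinity"], available))  # ['salinity']
```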
You can change the depth, y direction, and x direction limits of your extraction by editing the :kbd:`selection:` section that starts on line 18.
Remember that Python uses 0-based indexing and that Python slices are open on the right (the upper limit is excluded). So, to get the y grid points from 430 to 470, you need to use:

.. code-block:: yaml

    selection:
      grid y:
        y min: 430
        y max: 471
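The half-open rule can be checked with an ordinary Python slice. In this sketch, ``selection_limits`` is a hypothetical helper that turns an inclusive range of grid points into the min/max pair to put in the YAML file, assuming (as described above) that the max value is excluded; the list of 898 indices is only a stand-in for the y grid point indices.

```python
def selection_limits(first: int, last: int, axis: str) -> dict[str, int]:
    """Convert an inclusive range of grid indices to min/max selection values."""
    if last < first:
        raise ValueError("last must not be less than first")
    # The max value is exclusive, so it is one past the last grid point you want
    return {f"{axis} min": first, f"{axis} max": last + 1}


y_indices = list(range(898))  # stand-in for the y grid point indices
limits = selection_limits(430, 470, "y")
selected = y_indices[limits["y min"]:limits["y max"]]
print(limits)  # {'y min': 430, 'y max': 471}
print(len(selected), selected[0], selected[-1])  # 41 430 470
```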
You can change the beginning of the name of the file that your extracted netCDF dataset will be written to, and the description in its metadata, by editing the :kbd:`name:` and :kbd:`description:` values in lines 30 and 31. With :kbd:`SalishSeaCast_wastewater_day_avg_biology` as the value of :kbd:`name:`, an extraction for 2018-01-01 to 2018-01-31 will produce a netCDF file called :file:`SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc`.
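The naming rule can be written as a one-line sketch: the :kbd:`name:` value followed by the start and end dates as ``YYYYMMDD``. This is inferred from the example above and the test extraction log, not taken from Reshapr's code; ``extracted_file_name`` is a hypothetical helper.

```python
from datetime import date


def extracted_file_name(name: str, start: date, end: date) -> str:
    """Build the extracted netCDF file name from the `name:` value and the date range."""
    return f"{name}_{start:%Y%m%d}_{end:%Y%m%d}.nc"


print(extracted_file_name("SalishSeaCast_wastewater_day_avg_biology", date(2018, 1, 1), date(2018, 1, 31)))
# SalishSeaCast_wastewater_day_avg_biology_20180101_20180131.nc
```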
You can change the directory where your extracted netCDF dataset files will be written by editing the :kbd:`dest dir:` value in line 33. As noted in :ref:`FileOrganizationAndExecutingExtractions`, do not store extracted netCDF dataset files in a Git repository or try to commit and push them to GitHub; they are too large.
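Before running an extraction you could guard against that with a quick check that your :kbd:`dest dir:` path is not inside a Git working tree. Here is a sketch, assuming a repository is marked by a ``.git`` entry in the directory or one of its parents; ``inside_git_repo`` is a hypothetical helper, and the demonstration uses temporary directories rather than your real paths.

```python
import tempfile
from pathlib import Path


def inside_git_repo(path: Path) -> bool:
    """Return True if `path` or any of its parent directories contains a .git entry."""
    path = path.resolve()
    # .git is usually a directory, but it is a file in Git worktrees and submodules
    return any((candidate / ".git").exists() for candidate in (path, *path.parents))


# Throwaway directories standing in for an analysis repository and an extractions directory
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp, "analysis-repo")
    (repo / ".git").mkdir(parents=True)
    extractions = Path(tmp, "extractions")
    extractions.mkdir()
    in_repo = inside_git_repo(repo / "notebooks")
    outside = inside_git_repo(extractions)
print(in_repo, outside)  # True False, provided the temporary directory is not itself in a Git repository
```

For a real check you would call it with your own :kbd:`dest dir:` value, for example ``inside_git_repo(Path("/ocean/dlatorne/MOAD/extractions/"))``.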
Here are the contents of the :file:`SalishSeaCast-202111-wastewater-salish.yaml` file:

.. code-block:: yaml

    description: SalishSeaCast version 202111 NEMO with wastewater outfalls results
      on storage accessible from salish.

    time coord:
      name: time_counter

    y coord:
      name: y

    x coord:
      name: x

    # Chunking scheme used for the netCDF4 files
    # Note that coordinate names (keys) are conceptual here.
    # They are replaced with actual coordinate names in files in the code;
    # e.g. time is replaced by time_counter for dataset loading
    chunk size:
      time: 24
      depth: 40
      y: 898
      x: 398

    geo ref dataset:
      path: https://salishsea.eos.ubc.ca/erddap/griddap/ubcSSnBathymetryV21-08
      y coord: gridY
      x coord: gridX

    extraction time origin: 2007-01-01

    results archive:
      path: /data/sallen/results/MEOPAR/wastewater/long_run/
      datasets:
        day:
          biology:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
            depth coord: deptht
          chemistry:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
            depth coord: deptht
          biology growth rates:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_prod_T.nc"
            depth coord: deptht
          grazing:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
            depth coord: deptht
          light:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
            depth coord: deptht
          mortality:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_graz_T.nc"
            depth coord: deptht
          physics tracers:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
            depth coord: deptht
          vvl grid:
            file pattern: "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
            depth coord: deptht
        hour:
          biology:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_biol_T.nc"
            depth coord: deptht
          chemistry:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
            depth coord: deptht
          light:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_chem_T.nc"
            depth coord: deptht
          physics tracers:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
            depth coord: deptht
          turbulence:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
            depth coord: depthw
          u velocity:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_U.nc"
            depth coord: depthu
          v velocity:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_V.nc"
            depth coord: depthv
          vvl grid:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_T.nc"
            depth coord: deptht
          w velocity:
            file pattern: "{ddmmmyy}/SalishSea_1h_{yyyymmdd}_{yyyymmdd}_grid_W.nc"
            depth coord: depthw
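To see how the :kbd:`results archive:` path and a :kbd:`file pattern:` value combine into the path of one day's results file, here is a sketch. It assumes that ``{ddmmmyy}`` and ``{yyyymmdd}`` are filled in with the date formatted like ``01jan18`` and ``20180101``, matching the split-file layout described at the top of this section; Reshapr's own pattern handling may differ in detail. ``results_file`` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path

# Lower-case month abbreviations, hard-coded to avoid locale-dependent strftime("%b")
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]


def results_file(archive_path: str, file_pattern: str, day: date) -> Path:
    """Expand a model profile `file pattern:` for one day and join it to the archive path."""
    relative = file_pattern.format(
        ddmmmyy=f"{day.day:02d}{MONTHS[day.month - 1]}{day.year % 100:02d}",
        yyyymmdd=f"{day:%Y%m%d}",
    )
    return Path(archive_path) / relative


path = results_file(
    "/data/sallen/results/MEOPAR/wastewater/long_run/",
    "{ddmmmyy}/SalishSea_1d_{yyyymmdd}_{yyyymmdd}_biol_T.nc",
    date(2018, 1, 1),
)
print(path)
# /data/sallen/results/MEOPAR/wastewater/long_run/01jan18/SalishSea_1d_20180101_20180101_biol_T.nc
```

On :kbd:`salish`, calling ``path.exists()`` on the result is a quick way to confirm that a profile's patterns point at files that are really there.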
When you create new model profile YAML files, remember to give them descriptive names and to commit them with messages that explain what they are for. That ensures that your analysis progress will be well documented and reproducible.
To work with model results in a different directory tree, change the value of :kbd:`path:` in the :kbd:`results archive:` section on line 31. For example, if Susan does model runs with alkalinity added to the Iona wastewater discharge, she might store the run results in :file:`/data/sallen/results/MEOPAR/wastewater/alkalinity_added/`.
If you are changing the model results path in a model profile, you should seriously consider storing the profile in a new file with a different name, updating the :kbd:`description:` at the top of the file, and committing it to version control.
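Before pointing a model profile at a new results directory, such as the hypothetical :file:`alkalinity_added/` tree above, it may be worth checking that the split, date-named daily subdirectories exist for the dates you plan to extract. Here is a sketch, assuming the ``ddmmmyy`` directory naming described above; ``missing_days`` is a hypothetical helper and the demonstration builds a throwaway directory tree.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path

# Lower-case month abbreviations, hard-coded to avoid locale-dependent strftime("%b")
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]


def ddmmmyy(day: date) -> str:
    """Format a date like 01jan18, as in the daily results subdirectory names."""
    return f"{day.day:02d}{MONTHS[day.month - 1]}{day.year % 100:02d}"


def missing_days(results_dir: Path, start: date, end: date) -> list[str]:
    """Return the date-named subdirectories from start to end (inclusive) that do not exist."""
    missing = []
    day = start
    while day <= end:
        if not (results_dir / ddmmmyy(day)).is_dir():
            missing.append(ddmmmyy(day))
        day += timedelta(days=1)
    return missing


# Throwaway tree that has 01jan18 to 03jan18 but nothing after
with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp)
    for day_of_month in (1, 2, 3):
        (results_dir / ddmmmyy(date(2018, 1, day_of_month))).mkdir()
    gaps = missing_days(results_dir, date(2018, 1, 1), date(2018, 1, 5))
print(gaps)  # ['04jan18', '05jan18']
```

Against real storage you would pass something like ``Path("/data/sallen/results/MEOPAR/wastewater/alkalinity_added/")``; any names it returns are days that still need to be downloaded or split before extraction.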