Simple CLI wrapper #18
Conversation
…isplay run directory during dry-run and modify 00_fetch_s3_and_prepare_run_dir.sh to resolve and use absolute run directory for artifact downloads and extractions.
Staging to workflow CLI:
- magic-ensemble: --config is now required; supports use_apptainer (run prepare steps inside a container) and pecan_dispatch (select how ensemble members are submitted/executed)
- workflow_manifest.yaml: defines available dispatch modes (local-gnu-parallel, slurm-dispatch) with appropriate host XML for native and apptainer execution; S3 resources consolidated
- Prep scripts: accept CLI flags instead of env vars; stage user-provided external files (e.g. template.xml) into the run directory before prepare steps run
- tools/patch_xml.py: utility to patch elements in PEcAn XML config files in-place
- 01_ERA5_nc_to_clim.R: ERA5 met inputs now looked up by grid cell center rather than site id
- example_user_config.yaml: documents new user-facing options (use_apptainer, pecan_dispatch, external_paths)

Relates to: https://github.com/orgs/ccmmf/discussions/182
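As a sketch of the user-facing options the summary lists for example_user_config.yaml — key names come from the summary; the values and the shape of the external_paths entries are illustrative assumptions, not the file's actual contents:

```yaml
# Illustrative values only; key names taken from the PR summary.
use_apptainer: false
pecan_dispatch: local-gnu-parallel   # or slurm-dispatch, per workflow_manifest.yaml
external_paths:
  # each key must match a manifest.paths entry; staged into the run directory
  template_xml: /path/to/my/template.xml
```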
dlebauer left a comment
This looks like a good step toward standardizing the workflow interfaces. The pattern of separating user-facing config from internal workflow details seems like the right direction.
A few things would help clarify the intent and direction. Not to bog down this PR - this works well as MVP proof of concept, but anything not addressed here should be captured in one or more follow up issues.
- In the past we've discussed separating data prep from the rest of the analysis workflows. Is that still a viable option, or is there a rationale for keeping data prep combined with ensembles?
- It's unclear why this lives under 2a_grass - what is the path from here to using this in the workflows that are our core deliverables?
- A README that explains the approach would make it easier to understand and adapt. Including the overall design, what is the role of the cli, config files, execution graph; boundaries between config files, manifest, and template.xml. What general patterns and specific components will be reused when adapting this to other workflows? This can wait until the next iteration, but wanted to make sure it is on the map.
- What is the plan for testing individual components and overall integration?
If this works now and is ready to implement, I'm good with the general pattern. My main question is how robust and extensible this will be. After implementing both the targets and custom workflows, do you have any insights on what to look for and how we would know if this gets to a level of complexity where we would consider a more standard workflow solution?
I'm going to answer some of these while working on some other stuff, as I can. Re:
2a_grass: it was the most mature ensembles workflow I had. I definitely feel that we should attempt to extract the pieces of the various stages and put them together under the "ensembles" concept, but I didn't want to make heavy edits to the workflow itself at the same time I was putting together the CLI.
- Stage external inputs to manifest-defined destinations rather than source basename; enforce that each external_paths key has a matching manifest.paths entry and error if not
- Make get_val() fall through to defaults for missing config keys instead of erroring; add explicit post-resolution required check for run_dir only
- Remove spurious check_aws calls from prepare and run-ensembles commands
- Reorganize workflow_manifest.yaml: move steps block to top, normalize to 4-space indentation throughout
- Add magic-ensemble-DEVELOPERS.md (architecture, internals, dispatch) and magic-ensemble-README.md
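The get_val() change described above (fall through to a default instead of erroring) could look like this minimal sketch. The associative array standing in for the parsed config, and the key names, are illustrative assumptions — not the PR's actual parsing code:

```shell
#!/usr/bin/env bash
# Illustrative sketch only: a config lookup that returns the configured value
# when the key exists and otherwise falls through to the supplied default,
# rather than exiting with an error.
declare -A CONFIG=( [pecan_dispatch]="local-gnu-parallel" )

get_val() {
  local key="$1" default="${2:-}"
  if [[ -n "${CONFIG[$key]+x}" ]]; then
    echo "${CONFIG[$key]}"
  else
    echo "$default"
  fi
}

get_val pecan_dispatch          # present: prints the configured value
get_val use_apptainer false     # missing: falls through to the default
```

A required-after-resolution check (as the summary describes for run_dir) would then be a separate explicit test on the resolved value, rather than an error inside the getter.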
Addressed repo organization in the proposal at https://github.com/orgs/ccmmf/discussions/217
dlebauer left a comment
Looks good to me!
Thanks for adding the READMEs. They are a great start. I think that we can iteratively improve these as we go and as we get feedback.
Will CLI tools be separated into different repositories? It seems we would want to avoid maintaining multiple places that document the overall architecture.
<revision>git</revision>
<delete.raw>TRUE</delete.raw>
- <binary>sipnet.git</binary>
+ <binary>/usr/local/bin/sipnet.git</binary>
Is this change intended to require a) that the binary is located at /usr/local/bin, and/or b) to assume that this runs within a container?
This is a good catch. I believe this will induce a bug in some situations, and we will require the use of the XML patch tool to ensure functionality in different run contexts.
I will add a change to this PR.
Can you say more about why this would need a patch tool? Certainly we need to be able to set the binary path to match the run context, but why not set it in xml_build.R?
Replaces the host-only `patch_dispatch()` function with a generic `patch_xml_block()` that accepts an XML tag name and yq paths for both plain and Apptainer variants. Uses this to patch both the `<host>` block (dispatch config) and the new `<model>` block (SIPNET binary path) in a single prepare pass. Adds `sipnet_model.model_xml` and `sipnet_model.model_xml_apptainer` to the workflow manifest, selecting the Apptainer variant (absolute binary path inside the container) when `use_apptainer=true`. Updates developer docs to reflect the new calling convention and extensibility pattern.
# Show path for user: relative to INVOCATION_CWD if under it, else absolute
report_path() {
  local abs_path="$1"
  if [[ -n "$INVOCATION_CWD" && "$abs_path" == "$INVOCATION_CWD"/* ]]; then
    echo "${abs_path#"$INVOCATION_CWD"/}"
  else
    echo "$abs_path"
  fi
}
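To illustrate the behavior of report_path, here is a self-contained usage sketch (the function is reproduced from the hunk above; the INVOCATION_CWD value and paths are hypothetical):

```shell
#!/usr/bin/env bash
# report_path reproduced verbatim from the diff so the example runs standalone.
report_path() {
  local abs_path="$1"
  if [[ -n "$INVOCATION_CWD" && "$abs_path" == "$INVOCATION_CWD"/* ]]; then
    echo "${abs_path#"$INVOCATION_CWD"/}"
  else
    echo "$abs_path"
  fi
}

INVOCATION_CWD="/home/user/project"           # hypothetical invocation directory
report_path "/home/user/project/runs/exp1"    # under CWD: printed relative
report_path "/tmp/other/run"                  # elsewhere: printed absolute
```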
if [[ ! -f "$MANIFEST" ]]; then
  echo "00_fetch_s3_and_prepare_run_dir: Manifest not found: $MANIFEST" >&2
  exit 1
fi
if ! command -v yq &>/dev/null; then
  echo "00_fetch_s3_and_prepare_run_dir: yq is required to read the manifest." >&2
  exit 1
fi
cd "$REPO_ROOT"
# Resolve a path relative to run_dir (RUN_DIR may be absolute or relative to REPO_ROOT).
resolve_run_path() {
Is there an inconsistency between "absolute or REPO_ROOT" and "absolute or INVOCATION_CWD" here? The latter sounds more like how the PEcAn functions expect to work, but maybe in this CLI they'll wind up being the same?
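For context, a minimal sketch of how the truncated resolve_run_path above might be completed, assuming only the convention its comment states (RUN_DIR is absolute, or relative to REPO_ROOT). This is illustrative, not the PR's implementation, and the variable values are placeholders:

```shell
#!/usr/bin/env bash
# Illustrative sketch: resolve a path inside the run directory, where RUN_DIR
# may be absolute or relative to REPO_ROOT (assumption from the hunk's comment).
REPO_ROOT="/repo"        # placeholder
RUN_DIR="runs/exp1"      # placeholder; may also be an absolute path

resolve_run_path() {
  local rel="$1" base
  if [[ "$RUN_DIR" == /* ]]; then
    base="$RUN_DIR"
  else
    base="$REPO_ROOT/$RUN_DIR"
  fi
  echo "$base/$rel"
}

resolve_run_path "artifacts/met.nc"
```

Whether the base for relative RUN_DIR should be REPO_ROOT or INVOCATION_CWD is exactly the inconsistency the comment above raises.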
# LandTrendr TIFs: bucket + key from s3.median_tif and s3.stdv_tif
median_key_prefix=$(yq eval '.s3.median_tif.key_prefix' "$MANIFEST")
median_filename=$(yq eval '.s3.median_tif.filename' "$MANIFEST")
stdv_key_prefix=$(yq eval '.s3.stdv_tif.key_prefix' "$MANIFEST")
stdv_filename=$(yq eval '.s3.stdv_tif.filename' "$MANIFEST")
median_s3_key=$(s3_key "$median_key_prefix" "$median_filename")
stdv_s3_key=$(s3_key "$stdv_key_prefix" "$stdv_filename")
median_s3_uri="s3://${s3_bucket}/${median_s3_key}"
stdv_s3_uri="s3://${s3_bucket}/${stdv_s3_key}"
Heads up that this may eventually need to support multiple years for the validation workflow. Not certain yet, though
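The s3_key helper called in the hunk above is not shown in this excerpt. A plausible minimal version — purely an assumption about its behavior, namely joining a key prefix and filename while tolerating a trailing slash on the prefix — would be:

```shell
#!/usr/bin/env bash
# Hypothetical s3_key: join a key prefix and a filename into one S3 key,
# stripping any trailing slash from the prefix. Illustrative sketch only.
s3_key() {
  local prefix="${1%/}" filename="$2"
  echo "${prefix}/${filename}"
}

s3_key "landtrendr/2020/" "median.tif"
```

If multiple years are needed later (per the comment above), parameterizing the key_prefix by year in the manifest would keep this helper unchanged.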
    ((lat + 0.25) %/% 0.5) * 0.5, "N_",
    ((abs(lon) + 0.25) %/% 0.5) * 0.5, "W"
  )
)
Historical note since this is the only R file touched here: These are changes backported from the phase 3 version of this script. 👍
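The R fragment above snaps a coordinate to the nearest 0.5° grid-cell center via `((x + 0.25) %/% 0.5) * 0.5` (floor division), which is how the ERA5 met lookup by grid cell center works. The same arithmetic, reproduced in awk for illustration (valid for the positive values used here, since awk's int() truncates toward zero):

```shell
#!/usr/bin/env bash
# Reproduces the R rounding ((x + 0.25) %/% 0.5) * 0.5 for positive coords:
# snap a latitude (or abs(longitude)) to its nearest 0.5-degree cell center.
grid_center() {
  awk -v v="$1" 'BEGIN { printf "%.1f\n", int((v + 0.25) / 0.5) * 0.5 }'
}

grid_center 38.7    # below the midpoint of the cell: rounds down
grid_center 121.3
```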
# resources. Fixed paths, S3 resources, and step I/O are defined in
# workflow_manifest.yaml and are not overridden here.
What are the upper and lower bounds on number of distinct config files a user will need to look at to understand the full workflow at this point?
n_ens: 20
n_met: 10
ic_ensemble_size: 100
n_ens and ic_ensemble_size can probably be combined here
<prerun>cp data/events.in @RUNDIR@</prerun>
</model>
# Apptainer (not in user config)
Is the "not in user config" part a convenience or a requirement? eg would editing this to point to a different Docker org / tag be a valid way of testing with alternate releases?
Should the filename be magic-ensemble.sh?
set -euo pipefail
# --- Repo root, manifest, and invocation CWD ---
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  landtrendr_raw_files="${landtrendr_raw_files}${run_dir}/${segment}"
done < <(yq eval '.paths.landtrendr_raw_files' "$MANIFEST" | tr ',' '\n')
# --- Pre-execution: AWS S3 tools check ---
Probably a bigger question than this workflow, but: Once CARB has all the files we've delivered them onto their own servers, it should conceptually be possible for the whole workflow to run without any S3 access. How many changes would be needed at the CLI level to achieve that?
## Architecture Overview
The CLI is built on a three-layer configuration model:
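The README's own list of layers is truncated in this excerpt, but the three layers discussed elsewhere in this PR (user config, workflow manifest, XML template) suggest a split along these lines. This annotated sketch is an inference from the PR discussion, not the README's actual text:

```yaml
# Layer 1 - user config (example_user_config.yaml): user-facing choices only
use_apptainer: true
pecan_dispatch: slurm-dispatch
# Layer 2 - workflow manifest (workflow_manifest.yaml): fixed paths, S3
#   resources, dispatch modes, and step I/O; not edited per run
# Layer 3 - template.xml: PEcAn settings, patched in place for the run
#   context by tools/patch_xml.py during prepare
```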
This answers my "how many configs" question above. Thanks!
Relating to https://github.com/orgs/ccmmf/discussions/182
This would be the example that would be adapted to fit downscaling, and other workflows as we go.
Note: currently the ccmmf compute nodes are not responsive, so this can only be seen in local execution mode. (login node is fine; Rob is working on the compute nodes)