Simple CLI wrapper #18
Conversation
…isplay run directory during dry-run and modify 00_fetch_s3_and_prepare_run_dir.sh to resolve and use absolute run directory for artifact downloads and extractions.
Staging to workflow CLI:
- magic-ensemble: --config is now required; supports use_apptainer (run prepare steps inside a container) and pecan_dispatch (select how ensemble members are submitted/executed)
- workflow_manifest.yaml: defines available dispatch modes (local-gnu-parallel, slurm-dispatch) with appropriate host XML for native and apptainer execution; S3 resources consolidated
- Prep scripts: accept CLI flags instead of env vars; stage user-provided external files (e.g. template.xml) into the run directory before prepare steps run
- tools/patch_xml.py: utility to patch elements in PEcAn XML config files in-place
- 01_ERA5_nc_to_clim.R: ERA5 met inputs now looked up by grid cell center rather than site id
- example_user_config.yaml: documents new user-facing options (use_apptainer, pecan_dispatch, external_paths)

Relates to: https://github.com/orgs/ccmmf/discussions/182
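As a sketch of the user-facing options the summary lists for example_user_config.yaml — key names come from the summary; the values and the shape of the external_paths entries are illustrative assumptions, not the file's actual contents:

```yaml
# Illustrative values only; key names taken from the PR summary.
use_apptainer: false
pecan_dispatch: local-gnu-parallel   # or slurm-dispatch, per workflow_manifest.yaml
external_paths:
  # each key must match a manifest.paths entry; staged into the run directory
  template_xml: /path/to/my/template.xml
```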
dlebauer left a comment
This looks like a good step toward standardizing the workflow interfaces. The pattern of separating user-facing config from internal workflow details seems like the right direction.
A few things would help clarify the intent and direction. Not to bog down this PR - this works well as MVP proof of concept, but anything not addressed here should be captured in one or more follow up issues.
- In the past we've discussed separating data prep from the rest of the analysis workflows. Is that still a viable option, or is there a rationale for keeping data prep combined with ensembles?
- It's unclear why this lives under 2a_grass - what is the path from here to using this in the workflows that are our core deliverables?
- A README that explains the approach would make it easier to understand and adapt. Including the overall design, what is the role of the cli, config files, execution graph; boundaries between config files, manifest, and template.xml. What general patterns and specific components will be reused when adapting this to other workflows? This can wait until the next iteration, but wanted to make sure it is on the map.
- What is the plan for testing individual components and overall integration?
If this works now and is ready to implement, I'm good with the general pattern. My main question is how robust and extensible this will be. After implementing both the targets and custom workflows, do you have any insights on what to look for and how we would know if this gets to a level of complexity where we would consider a more standard workflow solution?
I'm going to answer some of these while working on some other stuff, as I can. Re:
2a_grass: it was the most mature ensembles workflow I had. I definitely feel that we should attempt to extract the pieces of the various stages and put them together under the "ensembles" concept, but I didn't want to make heavy edits to the workflow itself at the same time I was putting together the CLI.
- Stage external inputs to manifest-defined destinations rather than source basename; enforce that each external_paths key has a matching manifest.paths entry and error if not
- Make get_val() fall through to defaults for missing config keys instead of erroring; add explicit post-resolution required check for run_dir only
- Remove spurious check_aws calls from prepare and run-ensembles commands
- Reorganize workflow_manifest.yaml: move steps block to top, normalize to 4-space indentation throughout
- Add magic-ensemble-DEVELOPERS.md (architecture, internals, dispatch) and magic-ensemble-README.md
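The get_val() change described above (fall through to a default instead of erroring) could look like this minimal sketch. The associative array standing in for the parsed config, and the key names, are illustrative assumptions — not the PR's actual parsing code:

```shell
#!/usr/bin/env bash
# Illustrative sketch only: a config lookup that returns the configured value
# when the key exists and otherwise falls through to the supplied default,
# rather than exiting with an error.
declare -A CONFIG=( [pecan_dispatch]="local-gnu-parallel" )

get_val() {
  local key="$1" default="${2:-}"
  if [[ -n "${CONFIG[$key]+x}" ]]; then
    echo "${CONFIG[$key]}"
  else
    echo "$default"
  fi
}

get_val pecan_dispatch          # present: prints the configured value
get_val use_apptainer false     # missing: falls through to the default
```

A required-after-resolution check (as the summary describes for run_dir) would then be a separate explicit test on the resolved value, rather than an error inside the getter.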
Addressed repo organization in the proposal at https://github.com/orgs/ccmmf/discussions/217
dlebauer left a comment
Looks good to me!
Thanks for adding the READMEs. They are a great start. I think that we can iteratively improve these as we go and as we get feedback.
Will CLI tools be separated into different repositories? It seems we would want to avoid maintaining multiple places that document the overall architecture.
<revision>git</revision>
<delete.raw>TRUE</delete.raw>
- <binary>sipnet.git</binary>
+ <binary>/usr/local/bin/sipnet.git</binary>
Is this change intended to require a) that the binary is located at /usr/local/bin, and/or b) to assume that this runs within a container?
This is a good catch. I believe this will induce a bug in some situations, and we will require the use of the XML patch tool to ensure functionality in different run contexts.
I will add a change to this PR.
Can you say more about why this would need a patch tool? Certainly we need to be able to set the binary path to match the run context, but why not set it in xml_build.R?
Replaces the host-only `patch_dispatch()` function with a generic `patch_xml_block()` that accepts an XML tag name and yq paths for both plain and Apptainer variants. Uses this to patch both the `<host>` block (dispatch config) and the new `<model>` block (SIPNET binary path) in a single prepare pass. Adds `sipnet_model.model_xml` and `sipnet_model.model_xml_apptainer` to the workflow manifest, selecting the Apptainer variant (absolute binary path inside the container) when `use_apptainer=true`. Updates developer docs to reflect the new calling convention and extensibility pattern.
# Show path for user: relative to INVOCATION_CWD if under it, else absolute
report_path() {
  local abs_path="$1"
  if [[ -n "$INVOCATION_CWD" && "$abs_path" == "$INVOCATION_CWD"/* ]]; then
    echo "${abs_path#"$INVOCATION_CWD"/}"
  else
    echo "$abs_path"
  fi
}
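To illustrate the behavior of report_path, here is a self-contained usage sketch (the function is reproduced from the hunk above; the INVOCATION_CWD value and paths are hypothetical):

```shell
#!/usr/bin/env bash
# report_path reproduced verbatim from the diff so the example runs standalone.
report_path() {
  local abs_path="$1"
  if [[ -n "$INVOCATION_CWD" && "$abs_path" == "$INVOCATION_CWD"/* ]]; then
    echo "${abs_path#"$INVOCATION_CWD"/}"
  else
    echo "$abs_path"
  fi
}

INVOCATION_CWD="/home/user/project"           # hypothetical invocation directory
report_path "/home/user/project/runs/exp1"    # under CWD: printed relative
report_path "/tmp/other/run"                  # elsewhere: printed absolute
```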
if [[ ! -f "$MANIFEST" ]]; then
  echo "00_fetch_s3_and_prepare_run_dir: Manifest not found: $MANIFEST" >&2
  exit 1
fi
if ! command -v yq &>/dev/null; then
  echo "00_fetch_s3_and_prepare_run_dir: yq is required to read the manifest." >&2
  exit 1
fi
cd "$REPO_ROOT"
# Resolve a path relative to run_dir (RUN_DIR may be absolute or relative to REPO_ROOT).
resolve_run_path() {
Is there an inconsistency between "absolute or REPO_ROOT" and "absolute or INVOCATION_CWD" here? The latter sounds more like how the PEcAn functions expect to work, but maybe in this CLI they'll wind up being the same?
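For context, a minimal sketch of how the truncated resolve_run_path above might be completed, assuming only the convention its comment states (RUN_DIR is absolute, or relative to REPO_ROOT). This is illustrative, not the PR's implementation, and the variable values are placeholders:

```shell
#!/usr/bin/env bash
# Illustrative sketch: resolve a path inside the run directory, where RUN_DIR
# may be absolute or relative to REPO_ROOT (assumption from the hunk's comment).
REPO_ROOT="/repo"        # placeholder
RUN_DIR="runs/exp1"      # placeholder; may also be an absolute path

resolve_run_path() {
  local rel="$1" base
  if [[ "$RUN_DIR" == /* ]]; then
    base="$RUN_DIR"
  else
    base="$REPO_ROOT/$RUN_DIR"
  fi
  echo "$base/$rel"
}

resolve_run_path "artifacts/met.nc"
```

Whether the base for relative RUN_DIR should be REPO_ROOT or INVOCATION_CWD is exactly the inconsistency the comment above raises.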
# LandTrendr TIFs: bucket + key from s3.median_tif and s3.stdv_tif
median_key_prefix=$(yq eval '.s3.median_tif.key_prefix' "$MANIFEST")
median_filename=$(yq eval '.s3.median_tif.filename' "$MANIFEST")
stdv_key_prefix=$(yq eval '.s3.stdv_tif.key_prefix' "$MANIFEST")
stdv_filename=$(yq eval '.s3.stdv_tif.filename' "$MANIFEST")
median_s3_key=$(s3_key "$median_key_prefix" "$median_filename")
stdv_s3_key=$(s3_key "$stdv_key_prefix" "$stdv_filename")
median_s3_uri="s3://${s3_bucket}/${median_s3_key}"
stdv_s3_uri="s3://${s3_bucket}/${stdv_s3_key}"
Heads up that this may eventually need to support multiple years for the validation workflow. Not certain yet, though
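The s3_key helper called in the hunk above is not shown in this excerpt. A plausible minimal version — purely an assumption about its behavior, namely joining a key prefix and filename while tolerating a trailing slash on the prefix — would be:

```shell
#!/usr/bin/env bash
# Hypothetical s3_key: join a key prefix and a filename into one S3 key,
# stripping any trailing slash from the prefix. Illustrative sketch only.
s3_key() {
  local prefix="${1%/}" filename="$2"
  echo "${prefix}/${filename}"
}

s3_key "landtrendr/2020/" "median.tif"
```

If multiple years are needed later (per the comment above), parameterizing the key_prefix by year in the manifest would keep this helper unchanged.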
    ((lat + 0.25) %/% 0.5) * 0.5, "N_",
    ((abs(lon) + 0.25) %/% 0.5) * 0.5, "W"
  )
)
Historical note since this is the only R file touched here: These are changes backported from the phase 3 version of this script. 👍
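The R fragment above snaps a coordinate to the nearest 0.5° grid-cell center via `((x + 0.25) %/% 0.5) * 0.5` (floor division), which is how the ERA5 met lookup by grid cell center works. The same arithmetic, reproduced in awk for illustration (valid for the positive values used here, since awk's int() truncates toward zero):

```shell
#!/usr/bin/env bash
# Reproduces the R rounding ((x + 0.25) %/% 0.5) * 0.5 for positive coords:
# snap a latitude (or abs(longitude)) to its nearest 0.5-degree cell center.
grid_center() {
  awk -v v="$1" 'BEGIN { printf "%.1f\n", int((v + 0.25) / 0.5) * 0.5 }'
}

grid_center 38.7    # below the midpoint of the cell: rounds down
grid_center 121.3
```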
# resources. Fixed paths, S3 resources, and step I/O are defined in
# workflow_manifest.yaml and are not overridden here.
What are the upper and lower bounds on number of distinct config files a user will need to look at to understand the full workflow at this point?
n_ens: 20
n_met: 10
ic_ensemble_size: 100
n_ens and ic_ensemble_size can probably be combined here
<prerun>cp data/events.in @RUNDIR@</prerun>
</model>
# Apptainer (not in user config)
Is the "not in user config" part a convenience or a requirement? eg would editing this to point to a different Docker org / tag be a valid way of testing with alternate releases?
Should the filename be magic-ensemble.sh?
set -euo pipefail
# --- Repo root, manifest, and invocation CWD ---
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
  landtrendr_raw_files="${landtrendr_raw_files}${run_dir}/${segment}"
done < <(yq eval '.paths.landtrendr_raw_files' "$MANIFEST" | tr ',' '\n')
# --- Pre-execution: AWS S3 tools check ---
Probably a bigger question than this workflow, but: Once CARB has all the files we've delivered them onto their own servers, it should conceptually be possible for the whole workflow to run without any S3 access. How many changes would be needed at the CLI level to achieve that?
## Architecture Overview
The CLI is built on a three-layer configuration model:
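The README's own list of layers is truncated in this excerpt, but the three layers discussed elsewhere in this PR (user config, workflow manifest, XML template) suggest a split along these lines. This annotated sketch is an inference from the PR discussion, not the README's actual text:

```yaml
# Layer 1 - user config (example_user_config.yaml): user-facing choices only
use_apptainer: true
pecan_dispatch: slurm-dispatch
# Layer 2 - workflow manifest (workflow_manifest.yaml): fixed paths, S3
#   resources, dispatch modes, and step I/O; not edited per run
# Layer 3 - template.xml: PEcAn settings, patched in place for the run
#   context by tools/patch_xml.py during prepare
```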
This answers my "how many configs" question above. Thanks!
Relating to https://github.com/orgs/ccmmf/discussions/182
This would be the example that would be adapted to fit downscaling, and other workflows as we go.
Note: currently the ccmmf compute nodes are not responsive, so this can only be seen in local execution mode. (login node is fine; Rob is working on the compute nodes)