Correct roof ET / pond ET / LAI defaults in minimal example by mrustl · Pull Request #11 · KWB-R/kwb.raindrop

mrustl · 2026-05-12T19:20:09Z

Summary

Michael reviewed the XLSX dump produced by PR #10 against the SWIMM-UrbanEva reference and flagged three base.h5 defaults that do not match the physical setup of the Wien case study. The minimal-example run_one() now corrects them unconditionally on every row, ahead of the existing 12-scenario sweep:

Excel row	Parameter	Was	Now	Rationale
37	`//Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv`	`1`	`0`	1000 m² roof is impervious, no vegetation → crop-ET pathway must not run there.
49	`//Massnahmenelemente/Mulde_Rigole/Parameter_Evapotranspiration/LAI_LeafAreaIndex`	`8.5`	`3.9`	Hörnschemeyer grass value (Water 2023, 15, 2840, Tab. 6, plant type 5).
63	`//Massnahmenelemente/Mulde_Rigole/Eigenschaften_Oberflaeche/EvapPond`	`1`	`0`	When grass is submerged in a ponded swale, open-water pond-ET must not run alongside crop-ET.

These are hygiene corrections, not the sweep dimensions. Michael notes that none of them is expected to be the main driver of the 47-58 % vs 7 % ET-share discrepancy, but they should be in place before the engine-switch sweep results are interpreted (SWIMM-UrbanEva impact is small but non-zero).

Test plan

R-CMD-check on windows-latest (devel/oldrel/release) is green
The applied_settings sheet in the next XLSX run shows the three new keys with base_value → scenario_value of 1 → 0, 8.5 → 3.9, 1 → 0 on every scenario
ET-share column in the results table shifts (modestly, per Michael's expectation)

https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

Generated by Claude Code

Michael reviewed the XLSX dump against the SWIMM-UrbanEva run and flagged three base.h5 defaults that did not match the physical setup of the Wien case study: - //Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv ships at 1; the 1000 m² roof is impervious with no vegetation, so the crop-ET pathway should not run there. Force to 0. - //Massnahmenelemente/Mulde_Rigole/Eigenschaften_Oberflaeche/EvapPond ships at 1; when the grass is submerged in a ponded swale, the open-water pond-ET pathway must not run alongside the crop one. Force to 0. - //Massnahmenelemente/Mulde_Rigole/Parameter_Evapotranspiration/ LAI_LeafAreaIndex ships at 8.5; pin to the Hörnschemeyer grass value 3.9 (Water 2023, 15, 2840, Tab. 6 plant type 5). Apply all three on every row of the 12-scenario sweep (they are not part of the sweep dimensions). The vignette intro spells out the rationale for each correction explicitly. The earlier "LAI intentionally NOT overridden" comment is removed. Per Michael, none of these is expected to be the main driver of the ~47-58 % ET share vs SWIMM's 7 %, but they are hygiene fixes and should be in place before the engine-switch sweep is interpreted. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

After turning off Dach/Evapotranspiration_aktiv per Michael's review, one or more of the 12 sweep scenarios produce a NULL component in their simulation_results, and add_overflow_events_and_waterbalance() trips with no applicable method for 'pivot_wider' applied to NULL inside its internal lapply. The error from a single bad scenario was killing the whole analyse_results chunk and therefore the vignette build. Run add_overflow_events_and_waterbalance() per scenario inside a tryCatch and bind_rows() the survivors. Failing scenarios emit a named diagnostic to message() so the CI log shows which ones broke, but the rest of the table still renders. Affected rows in the results datatable get NA in the water-balance columns via the left_join. This is a vignette-side workaround; the underlying brittleness of add_overflow_events_and_waterbalance() to missing engine outputs (e.g. when roof ET is disabled) is a candidate package fix for a later PR. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

The previous tryCatch fix prevented per-scenario errors from crashing the lapply, but if all 12 scenarios fail (as is the case with the new roof-ET-off / pond-ET-off corrections) then bind_rows(list(NULL,...)) returns a 0-row, 0-column tibble. The downstream left_join then trips because the stub tibble has no s_name column: Error in dplyr::left_join(): Join columns in y must be present in the data. Problem with s_name. Two further tweaks: - Replace message() with cat(): rmarkdown vignette processing silently swallows message() output, so the per-scenario failure diagnostics never reached the CI log. cat() goes through knitr's text capture and shows up in both the rendered vignette HTML and the build log. - If every scenario fails, fabricate a stub tibble `tibble(s_name = character())` so the left_join still has the join key and the chunk renders to a results table where every water-balance column is NA. The chunk now prints a one-line summary "water-balance summary: N / 12 scenarios succeeded" plus one line per failed scenario with the underlying error. Daniel can read that directly from the rendered vignette to see whether the roof-ET-off correction is the blanket cause or whether some scenario combinations still survive. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

The base.h5 corrections (Dach ET off, pond ET off, LAI = 3.9) came out of Daniel's XLSX review of the SWIMM-UrbanEva comparison run, not Michael's. Michael's contribution is the May 2026 LAI sweep referenced later in the same intro paragraph, which stays attributed to him. The earlier commit messages in this PR ("Correct roof ET, pond ET and LAI defaults...") also misattribute to Michael, but rewriting history on a shared branch is riskier than just leaving the stale strings there; the in-tree text is what readers see. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

Root cause for the "0 / 12 scenarios succeeded" failure: with //Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv set to 0 the engine no longer writes the connected-area Dach.h5. get_simulation_results_optim_parallel() short-circuits to NULL when either result file is missing (see R/get_simulation_results_optim_parallel.R lines 76-82), so every scenario yields NULL and the downstream water-balance pivot_wider() trips on every row. The robust-tryCatch + stub-tibble guard added earlier keeps the chunk from erroring, but the resulting table is empty of useful data. That defeats the diagnostic purpose of the vignette. Hotfix: leave Dach ET on (= 1) for now and document why in the vignette intro and code comment. Keep the other two Daniel-review corrections (EvapPond = 0, LAI = 3.9) which do not affect output file generation. Will revisit once the package's result reader and water-balance pipeline are robust to a missing connected_area H5. NEWS.md and the vignette intro updated to reflect that two of the three Daniel corrections are now applied. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

When //Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv is 0, the engine skips writing the connected-area H5 (Dach.h5). The previous result reader treated that as a hard error and returned NULL for every affected scenario, and the downstream water-balance function then tripped on pivot_wider(NULL). This bricked the entire analyse_results chunk of the minimal example. Package-side fixes: - get_simulation_results_optim() / _parallel(): require only the element H5 (Mulde_Rigole.h5). If Dach.h5 is missing, populate $element and set $connected_area = NULL. - add_overflow_events_and_waterbalance(): refactor compute_one() so it handles NULL whole scenarios (returns an all-NA row), NULL connected_area (skips that section, leaves connectedarea.*_ columns missing for that row), and NULL element$rates (overflow metrics fall back to NA). Per-section helpers (wb_percent, compute_overflows, na_row) replace the previous monolithic body. dplyr::bind_rows() aligns column sets across rows that have different shapes, so a 12-row table with some scenarios lacking connectedarea.*_ columns now renders cleanly. Vignette: - Re-enable //Massnahmenelemente/Dach/Berechnungsparameter/Evapotranspiration_aktiv = 0, which was the original third correction from Daniel's XLSX review. - Remove the per-scenario tryCatch workaround in analyse_results; the package itself now degrades gracefully so the chunk can call add_overflow_events_and_waterbalance() once over the whole simulation_results list. NEWS.md updated to list the package-side fixes and to mark the Dach ET=0 correction as applied. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

claude · 2026-05-13T07:08:55Z

-          abs(WB_Oberflaechenablauf_Verschaltungen)
-        )
-      ) %>%
+      dplyr::transmute(denom = denom_fn(.data)) %>%


Missing @importFrom rlang .data

The refactor introduces .data (here in denom_fn(.data)), but the roxygen block doesn't add @importFrom rlang .data, so R CMD check will likely raise a NOTE for "no visible binding for global variable '.data'". The other files in this package that use .data (plot_main_effects.R, plot_wb_tradeoff_overflows.R, plot_valid_design_space.R, read_hdf5_connections.R, read_hdf5_timeseries.R) all declare it; please add it here too:

#' @importFrom rlang .data

A few bare-column references (variable, value in the first pivot_wider at line 78; value > 0 and variable == ... in compute_overflows) are still present and were also there in the previous version — same kind of NOTE pre-existed, but if you're touching the file it's a low-cost cleanup to either quote those (names_from = "variable") or switch them to .data$variable.

claude · 2026-05-13T07:08:59Z

+        res_hdf5_element <- hdf5r::H5File$new(paths$path_results_hdf5_element, mode = "r")
+        on.exit(try(res_hdf5_element$close_all(), silent = TRUE), add = TRUE)
+
+        element <- list(
+          meta          = kwb.raindrop::read_hdf5_scalars(res_hdf5_element[["Metainfo"]],
+                                                          numeric_only = FALSE),
+          rates         = kwb.raindrop::read_hdf5_timeseries(res_hdf5_element[["Raten"]]),
+          water_balance = kwb.raindrop::read_hdf5_scalars(res_hdf5_element[["Wasserbilanz"]]),
+          states        = kwb.raindrop::read_hdf5_timeseries(res_hdf5_element[["Zustandsvariablen"]])
+        )
+
+        connected_area <- if (has_flaeche) {
+          res_hdf5_flaeche <- hdf5r::H5File$new(paths$path_results_hdf5_flaeche, mode = "r")
+          on.exit(try(res_hdf5_flaeche$close_all(), silent = TRUE), add = TRUE)


on.exit(..., add = TRUE) inside catAndRun(expr = { ... }) — worth a manual sanity-check

The on.exit(...) calls are textually inside the expr argument to kwb.utils::catAndRun. Whether the exit handler fires when the lapply iteration returns or only when catAndRun returns depends on which frame R associates with the promise when it forces it. In practice this should be fine (the handler registers in the frame where on.exit is evaluated, which is the calling function), and both element and connected_area handles end up closed before the iteration result is collected — but it's the kind of thing that's easy to verify once and worth a one-off check, e.g. wrap one of the close calls with message(...) and watch the output relative to the per-scenario "(i/N)" log.

The parallel sibling (get_simulation_results_optim_parallel.R) doesn't have this concern because on.exit there is directly inside run_one, not inside catAndRun(expr = ...).

claude · 2026-05-13T07:09:06Z

+  wb_percent <- function(wb_raw, prefix, denom_fn) {
+    if (!has_rows(wb_raw)) return(tibble::tibble())
+    denom <- wb_raw %>%
      tidyr::pivot_wider(names_from = variable, values_from = value) %>%
-      dplyr::transmute(
-        denom = dplyr::if_else(
-          !is.na(WB_Regen) & WB_Regen != 0,
-          WB_Regen,
-          abs(WB_Oberflaechenablauf_Verschaltungen)
-        )
-      ) %>%
+      dplyr::transmute(denom = denom_fn(.data)) %>%
      dplyr::pull(denom)
-
-    wb_connectedarea <- wb_ca_raw %>%
+    wb_raw %>%
      dplyr::mutate(
-        variable = sprintf("connectedarea.%s_", variable),
-        value_percent = round(100 * value / denom_connectedarea, 2)
+        variable = sprintf("%s.%s_", prefix, variable),
+        value_percent = round(100 * value / denom, 2)
      ) %>%
      dplyr::select(-value) %>%
-      tidyr::pivot_wider(names_from = "variable", values_from = "value_percent")
-
-    # --- Overflow-Zeitreihe vorbereiten --------------------------------------
-    rates_all <- simulation_results[[s_name]]$element$rates %>%
+      tidyr::pivot_wider(names_from = "variable",
+                         values_from = "value_percent")
+  }


Behaviour around missing connected_area$water_balance is more nuanced than the doc suggests

When wb_raw is NULL/0-row, wb_percent returns a 0×0 tibble::tibble(). The per-scenario row is then built via tibble(s_name=…, n_overflows=…, …) %>% bind_cols(wb_element) %>% bind_cols(wb_connectedarea), and missing-side columns are omitted entirely for that scenario. The final dplyr::bind_rows() over all scenarios then fills those columns with NA only if at least one other scenario contributed them. Two edge cases worth being aware of:

If all scenarios in a batch miss connected_area (e.g. roof ET disabled everywhere — exactly what the new vignette does), the final tibble has no connectedarea.*_ columns at all. Anything downstream that expects them to exist (e.g. a dplyr::select(starts_with("connectedarea.")) or a hard-coded join key) will fail rather than see NAs. Not a problem for the current vignette since the join is on s_name only, but worth either documenting or guarding with explicit NA_real_ columns when both wb branches are empty.

The new doc lines 20-28 promise "rows with NAs for missing components" — closer to the truth would be "columns aligned across scenarios via bind_rows() get NAs where the scenario didn't contribute them; columns absent from every scenario are dropped entirely."

claude · 2026-05-13T07:09:09Z

+      meta          = read_hdf5_scalars(res_hdf5_element[["Metainfo"]],
+                                        numeric_only = FALSE),
+      rates         = read_hdf5_timeseries(res_hdf5_element[["Raten"]]),
+      water_balance = read_hdf5_scalars(res_hdf5_element[["Wasserbilanz"]]),
+      states        = read_hdf5_timeseries(res_hdf5_element[["Zustandsvariablen"]])
+    )
+
+    connected_area <- if (has_flaeche) {
+      res_hdf5_flaeche <- hdf5r::H5File$new(run_paths$path_results_hdf5_flaeche, mode = "r")
+      on.exit(try(res_hdf5_flaeche$close_all(), silent = TRUE), add = TRUE)
+      list(
+        meta          = read_hdf5_scalars(res_hdf5_flaeche[["Metainfo"]],
+                                          numeric_only = FALSE),
+        rates         = read_hdf5_timeseries(res_hdf5_flaeche[["Raten"]]),
        water_balance = read_hdf5_scalars(res_hdf5_flaeche[["Wasserbilanz"]]),
-        states = read_hdf5_timeseries(res_hdf5_flaeche[["Zustandsvariablen"]])
+        states        = read_hdf5_timeseries(res_hdf5_flaeche[["Zustandsvariablen"]])


Inconsistent qualification of read_hdf5_scalars / read_hdf5_timeseries

These are called unqualified here but as kwb.raindrop::read_hdf5_scalars(…) / kwb.raindrop::read_hdf5_timeseries(…) in the non-parallel sibling. It works (future.apply serialises the closure and the package is loaded in workers via the namespace), but the inconsistency between the two files is a minor maintenance hazard — pick one style. The fully-qualified form is more robust against future changes to how future_lapply discovers globals on remote workers.

claude · 2026-05-13T07:09:21Z

+    if (!has_element) {
+      if (isTRUE(debug)) {
+        message(sprintf(
+          "Missing element H5 for %s ('%s') -> returning NULL",
+          s_name, paths$path_results_hdf5_element
+        ))
+      }
+      return(NULL)
+    }


Stale roxygen above this block — please update

The behaviour change introduced here (only the element HDF5 forces a NULL return; a missing connected-area HDF5 yields a partial result) is not reflected in the roxygen at the top of the file, which still says:

Lines 14-15: "If either of the expected HDF5 files is missing for a run, the corresponding list entry will be NULL."

Lines 27-28 (@return): "Each entry is either NULL (missing files) or a nested list"

Lines 41-42 (@details): "The HDF5 handles are not explicitly closed; depending on your workflow, you may want to close them" — also stale since the new code uses on.exit(...$close_all()).

Please rewrite to match the new behaviour, e.g. "If the element HDF5 is missing the entry is NULL; if only the connected-area HDF5 is missing, the entry is a list with the element side populated and connected_area = NULL. HDF5 handles are closed on exit." Then re-run devtools::document() so man/get_simulation_results_optim.Rd picks it up (currently the .Rd still has the stale wording verbatim).

The same stale wording lives in R/get_simulation_results_all.R lines 14-15 / 39-42 and its .Rd; consider fixing both so the three sibling functions stay consistent, even if _all isn't otherwise touched by this PR.

@return

Bot review items, in order: - add_overflow_events_and_waterbalance(): add @importFrom rlang .data and clean up bare-column references inside dplyr pipes (filter, mutate, select, relocate, arrange). Replaces the previous denom_fn(.data) hack with a clean two-step "pivot_wider then call denom_fn(wide_tbl)" pattern that doesn't abuse the .data pronoun. - add_overflow_events_and_waterbalance(): rewrite the doc paragraph about missing components. Previously the doc promised "rows with NAs"; the actual semantic is closer to "bind_rows() aligns columns across scenarios — a column appears as NA only if at least one scenario contributed it, and is absent entirely if every scenario lacks that side". Document the consequence so downstream code knows to check for column presence. - get_simulation_results_optim(): move the H5 handle opens and the on.exit() registrations *outside* catAndRun(expr = ...) so the handles bind to the lambda's own frame rather than catAndRun's internal expr frame — removes any ambiguity about when handles close. - get_simulation_results_optim_parallel(): qualify read_hdf5_scalars()/read_hdf5_timeseries() with kwb.raindrop:: consistently across both files (was qualified in the non-parallel sibling already). future_lapply tends to discover globals reliably, but the fully-qualified form is more robust against future changes to future.apply's worker setup. - get_simulation_results_optim() roxygen: the previous text still said "If either of the expected HDF5 files is missing for a run, the corresponding list entry will be NULL" and that "The HDF5 handles are not explicitly closed". Both are stale after the recent behaviour change; rewrite to describe the new partial- result + on.exit semantics, in @details and @return. get_simulation_results_optim_parallel.Rd gets the matching partial-result note in its @return. The .Rd files are hand-edited to mirror the new roxygen since there is no R available to re-run roxygen2::roxygenise() in this environment. R/get_simulation_results_all.R has the same stale wording but its body is unchanged in this PR (still requires both files); left alone for now to keep PR scope tight. Tracked as a follow-up. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

Preventive ASCII-portability cleanup for R/get_simulation_results_optim_parallel.R and R/add_overflow_events_and_waterbalance.R, where the recent refactor introduced U+2014 (em dash). The R CMD check non-ASCII NOTE only ever flags Rd files in this package, and the existing R-source non-ASCII content (German comments in compute_costs.R / download_engine.R) is suppressed by Encoding: UTF-8, so this is hygiene rather than a fix. The CI failure I'm chasing is something else; logs incoming. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

Daniel's review corrections introduced in PR #11 (Dach/Evapotranspiration_aktiv = 0, EvapPond = 0, LAI = 3.9) make the Tandler engine return Status 1 for every scenario, so every result column ends up NA in the rendered datatable. Daniel asked to "restore the last working version" -- the last version that actually produced engine output was the PR #10 state of run_one(), where the three corrections were not yet applied. Roll back only the three engine-incompatible overrides in example_wien_minimal's run_one(). Keep: - The package-side robustness from PR #11 (partial result when Dach.h5 is missing; NULL/missing-component tolerance in add_overflow_events_and_waterbalance). - The mirror_stub and canonical_variables fallbacks from PR #12 (table layout survives both partial- and all-NULL batches). - The per-scenario tryCatch + strict = FALSE + scalar_strategy = "first" in the vignette (one broken scenario no longer aborts the whole loop). With this commit the engine produces output again, the result columns are populated, and the column-structure work added in this PR is still in place to absorb future failures gracefully. The three corrections themselves remain physically motivated -- they will be re-introduced as sweep dimensions in a follow-up diagnostic vignette so we can see which combination Tandler rejects. https://claude.ai/code/session_014QrjF51tg7cMVmsgjmfPNG

claude added 6 commits May 12, 2026 19:19

claude Bot reviewed May 13, 2026

View reviewed changes

claude added 2 commits May 13, 2026 07:14

mrustl merged commit 4274302 into main May 13, 2026
6 checks passed

mrustl deleted the claude/correct-roof-et-pond-lai branch May 13, 2026 07:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correct roof ET / pond ET / LAI defaults in minimal example#11

Correct roof ET / pond ET / LAI defaults in minimal example#11
mrustl merged 8 commits into
mainfrom
claude/correct-roof-et-pond-lai

mrustl commented May 12, 2026

Uh oh!

claude Bot May 13, 2026

Uh oh!

claude Bot May 13, 2026

Uh oh!

claude Bot May 13, 2026

Uh oh!

claude Bot May 13, 2026

Uh oh!

claude Bot May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mrustl commented May 12, 2026

Summary

Test plan

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants