fix(db): return data unchanged instead of NULL when no covariates found by anshul23102 · Pull Request #4000 · PecanProject/pecan

anshul23102 · 2026-05-18T06:38:58Z

Bug

arrhenius.scaling.traits() and filter_sunleaf_traits() both set data <- NULL when no matching covariates are found, contradicting their own documentation.

`arrhenius.scaling.traits()`

The docstring says:

Note that data with no matching covariates will be unchanged.

But the else-branch does:

} else {
  data <- NULL   # crashes the caller
}
return(data)

This function is called in query.trait.data() for Vcmax, root_respiration_rate, leaf_respiration_rate_m2, and stem_respiration_rate. When no temperature covariate is recorded in BETYdb for a species, data becomes NULL, and the very next line if (nrow(result) == 0) throws:

Error in nrow(result) : argument is of length zero

The entire trait query pipeline crashes silently. The missing.temp = 25 parameter was designed to handle exactly this case (assume 25°C → scaling is a no-op), but it is never reached.

`filter_sunleaf_traits()`

The same data <- NULL pattern: when no canopy_layer covariate exists, all observations should pass through unfiltered. Instead they are all discarded.

Fix

Remove data <- NULL from both else-branches so the functions fall through to return(data) unchanged, as documented.

Impact

Any PEcAn meta-analysis run that queries a temperature-sensitive trait for a species with no temperature covariate in BETYdb will crash at query.trait.data(). This is a common situation for less-studied species or older database entries.

arrhenius.scaling.traits() and filter_sunleaf_traits() both set data <- NULL in their else-branch, contradicting the documented behaviour ("data with no matching covariates will be unchanged"). When no temperature covariates exist for any observation, the correct response is to return the input data unchanged (equivalent to assuming all measurements were taken at missing.temp = 25 degC, making the Arrhenius scaling a no-op). Returning NULL instead caused a hard crash in query.trait.data() at nrow(NULL) whenever Vcmax, root_respiration_rate, leaf_respiration_rate_m2, or stem_respiration_rate were queried for species with no temperature covariate recorded in BETYdb. filter_sunleaf_traits() has the identical bug: when no canopy_layer covariate is available, all measurements should pass through unfiltered, not be silently discarded.

mdietze · 2026-05-18T12:38:29Z

@dlebauer since I think you originally wrote these bits many many years ago, I was looking for your feedback on which is wrong here, the code or the documentation (i.e., do we want to remove these traits if they lack covariates, or assume they correspond to defaults?). If the documentation is wrong, sounds like there are other fixed downstream that are needed

dlebauer · 2026-05-20T03:44:08Z

@mdietze

When I implemented this, the default of 25C for missing measurement temperature seemed more defensible, given the dataset that we had.

But now, for a temperature-dependent trait with unknown measurement or reference temperature, I think it is safer to either filter those rows out or error rather than silently assume 25C; leave it to the user to explicitly make the decision to fill in missing temperature covariate data.

I think that the sunleaf case is different because the method of taking these measurements from top of canopy seems more standardized.

anshul23102 · 2026-05-20T04:26:58Z

Thanks for the context @dlebauer. That distinction makes sense.
For filter_sunleaf_traits, happy to keep the fix as-is (return data unchanged when no canopy_layer covariate found).
For arrhenius.scaling.traits, I agree silent assumption of 25°C is risky. Would you prefer:

Warn + proceed - emit a logger.warn() when falling back to missing.temp, so the user sees it but the run doesn't crash
Filter - drop rows that have no temperature covariate rather than assigning the default
Error - stop with an informative message telling the user to provide temperature covariates

I can update the PR whichever way you'd like. The original crash (argument is of length zero in nrow()) still needs addressing regardless of which policy we pick.

mdietze · 2026-05-20T13:25:24Z

I vote for 2 + warning: drop rows missing covariates, but send the user a warning letting them know that you've done so (e.g. "X rows of data on Y are dropped due to missing covariate date")

dlebauer · 2026-05-20T16:21:14Z

I agree with @mdietze that there values should be dropped with warning. Thanks @anshul23102 !

anshul23102 · 2026-05-20T19:08:21Z

Thanks for the decision. Here's my plan before I update the code:

arrhenius.scaling.traits: drop rows with no temperature covariate and emit a logger.warn() with the count (e.g. "X rows dropped due to missing temperature covariate"). Returns an empty data frame instead of NULL, which also fixes the nrow() crash downstream.
filter_sunleaf_traits: keep returning data unchanged when no canopy_layer covariate is found, based on @dlebauer's note that the sunleaf measurement method is more standardized, so passing all observations through seems correct here.

Does this sound right to both of you, or should filter_sunleaf_traits follow the same drop + warn approach?

dlebauer · 2026-05-20T20:06:45Z

Looks good to me except:

Returns an empty data frame instead of NULL, which also fixes the nrow() crash downstream.

Returns empty data frame iff no input rows have covariates, else returns with rows that had covariates.

And please make sure to update docs accordingly and add tests - these don't have to comprehensively cover all inputs, just the key functionality.

…aling.traits Previously the function silently filled any missing measurement temperature with the `missing.temp` default (25 C) before applying Arrhenius scaling. For a temperature-dependent trait whose measurement temperature is unknown this produces silently wrong values, and when NO covariates at all were found the function returned NULL, crashing query.trait.data() with 'argument is of length zero'. New behaviour agreed with maintainers: - Rows that have no temperature covariate are dropped and a logger.warn() is emitted with the count of dropped rows. - If no observations have any temperature covariate, an empty data frame (zero rows, same columns) is returned instead of NULL. - filter_sunleaf_traits() continues to return data unchanged when no canopy_layer covariate is found (measurement protocol is more standardised for sun-leaf traits). Changes: - covariate.functions.R: implement drop-and-warn logic; keep missing.temp argument for backward compatibility but mark it as unused in docs. - test.covariate.functions.R: add four focused tests covering all rows present, partial drop with warning, all rows dropped, and the filter_sunleaf_traits empty-covariate path. - NEWS.md: correct the entry to describe the actual new behaviour.

anshul23102 · 2026-05-27T01:32:34Z

Thanks @dlebauer and @mdietze. I have pushed the agreed implementation. Here is a summary of what changed:

arrhenius.scaling.traits() (drop + warn)

Rows whose measurement temperature covariate is missing are now dropped. A logger.warn() is emitted with the count, e.g. "2 row(s) of trait data dropped due to missing temperature covariate.".
If no temperature covariates are found for any observation (the outer else branch), all rows are dropped and the function returns an empty data frame with the same columns as the input, eliminating the NULL return that caused the nrow() crash downstream.
The missing.temp parameter is kept in the signature for backward compatibility but is no longer applied; the doc string marks it as unused.

filter_sunleaf_traits()

No change to logic. Returns data unchanged when no canopy_layer covariate is found, as agreed.

Tests added (test.covariate.functions.R)

Four new tests:

All rows have a temperature covariate: all three rows returned, scaled correctly, no temp column in output.
Partial covariates (rows 2 and 4 have none): those two rows dropped, warning fired, only rows 1 and 3 returned.
No covariates at all: empty data frame returned with columns preserved.
filter_sunleaf_traits with empty covariates: input returned unchanged.

NEWS.md

Entry corrected to describe the drop-and-warn behaviour rather than the previous "return unchanged" description.

All four test scenarios pass locally with the updated code. CI will cover R CMD check.

mdietze · 2026-05-28T16:52:18Z

Failing checks look like they need a make doc and commit of updated Rmd files

…rn behaviour The roxygen docstring changed in the previous commit (missing.temp is now unused; covariates param note updated; description updated). Regenerate the corresponding Rd file so CI tree-is-clean check passes.

mdietze · 2026-05-28T18:15:02Z

@anshul23102 I appreciate your efforts to address the GH Actions token issue, but I'd request that you submit that as a separate PR so it can be evaluated on its own merits rather than as part of the trait workflow. When doing so please tag @infotroph and @robkooper as I'll want at least one of them evaluating the proposed change.

anshul23102 · 2026-05-28T18:28:15Z

Dropped the `ci(render-quarto)` commit from this branch as requested. The GH Actions token fix will go up as a separate PR tagged @infotroph and @robkooper .

What remains here:

`fix(db)`: `arrhenius.scaling.traits()` drops rows with no temperature covariate and warns; returns an empty data frame (not NULL) when no covariates exist for any observation
`fix(db)`: `filter_sunleaf_traits()` returns data unchanged when no `canopy_layer` covariate is found
`docs(db)`: regenerated `arrhenius.scaling.traits.Rd` to match the updated roxygen

All R package checks were passing before the push; removing the Dockerfile change should clear the Docker build failures.

infotroph · 2026-05-28T18:36:59Z

Thanks for splitting the patches! I'll be happy to review when it's ready. For the record this is a general rule: CI changes should always go in separately from code changes.

For this PR, fortunately the rate limit is a transient issue -- if it fails again, at worst we'll wait an hour and then re-trigger the checks.

github-actions Bot added the base label May 18, 2026

anshul23102 force-pushed the fix/covariate-functions-null-return branch from 6ed1fc0 to 823ed4f Compare May 18, 2026 12:34

mdietze assigned dlebauer May 18, 2026

Merge branch 'develop' into fix/covariate-functions-null-return

8635bb7

Merge branch 'develop' into fix/covariate-functions-null-return

a318bfc

anshul23102 and others added 2 commits May 27, 2026 00:51

Merge branch 'develop' into fix/covariate-functions-null-return

09f89ce

github-actions Bot added the tests label May 27, 2026

mdietze approved these changes May 28, 2026

View reviewed changes

mdietze enabled auto-merge May 28, 2026 17:08

mdietze added this pull request to the merge queue May 28, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 28, 2026

github-actions Bot added dockerfile github-actions labels May 28, 2026

anshul23102 force-pushed the fix/covariate-functions-null-return branch from 4b6e15a to bcf30c6 Compare May 28, 2026 18:28

dlebauer added this pull request to the merge queue May 28, 2026

Merged via the queue into PecanProject:develop with commit e52c594 May 28, 2026
38 of 50 checks passed

Conversation

anshul23102 commented May 18, 2026

Bug

arrhenius.scaling.traits()

filter_sunleaf_traits()

Fix

Impact

Uh oh!

mdietze commented May 18, 2026

Uh oh!

dlebauer commented May 20, 2026

Uh oh!

anshul23102 commented May 20, 2026

Uh oh!

mdietze commented May 20, 2026

Uh oh!

dlebauer commented May 20, 2026

Uh oh!

anshul23102 commented May 20, 2026

Uh oh!

dlebauer commented May 20, 2026

Uh oh!

anshul23102 commented May 27, 2026

Uh oh!

mdietze commented May 28, 2026

Uh oh!

Uh oh!

mdietze commented May 28, 2026

Uh oh!

anshul23102 commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

infotroph commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

`arrhenius.scaling.traits()`

`filter_sunleaf_traits()`

anshul23102 commented May 28, 2026 •

edited

Loading