fix: seed sample_calc_dataset fixture to eliminate flaky pressure test #367

Merged
darothen merged 1 commit into develop from fix/flaky-calc-test-seed on Apr 24, 2026

@darothen
Collaborator

Summary

  • Seeds the sample_calc_dataset pytest fixture with np.random.default_rng(42) to produce bit-reproducible data across Python versions
  • Fixes a flaky test_pressure_at_surface failure on Python 3.13 where unseeded random draws for orographic heights could produce extreme negative values, pushing barometric-formula surface pressure above the 105,000 Pa test bound

Root cause

The fixture used the global numpy RNG (np.random.normal, np.random.uniform) without seeding. Python 3.13 initializes the global RNG state differently, allowing draws like −320 m for surface heights that are physically impossible but statistically valid — causing the derived surface pressure to exceed the test assertion bound.
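
To make the arithmetic concrete, here is a minimal sketch of how a negative surface height maps to pressure. It assumes the standard-atmosphere form of the barometric formula with ISA constants; the formula and constants actually used by the fixture are not shown in this PR, so the helper name `surface_pressure` and every constant below are illustrative assumptions.

```python
# Illustrative only: the ISA constants and this helper are assumptions,
# not the repository's actual implementation.
P0 = 101_325.0   # sea-level standard pressure, Pa
T0 = 288.15      # sea-level standard temperature, K
LAPSE = 0.0065   # temperature lapse rate, K/m
G = 9.80665      # gravitational acceleration, m/s^2
M = 0.0289644    # molar mass of dry air, kg/mol
R = 8.31446      # universal gas constant, J/(mol K)


def surface_pressure(height_m: float) -> float:
    """Pressure (Pa) at a height above sea level in the standard atmosphere."""
    return P0 * (1.0 - LAPSE * height_m / T0) ** (G * M / (R * LAPSE))


p_typical = surface_pressure(500.0)   # mean of the fixture's height distribution
p_extreme = surface_pressure(-320.0)  # the kind of outlier described above

print(f"{p_typical:.0f} Pa at 500 m, {p_extreme:.0f} Pa at -320 m")
```

Under these assumed constants, the 500 m case stays well below the 105,000 Pa bound while the -320 m case lands slightly above it, which is consistent with the failure described here.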

Test plan

  • Run pytest tests/test_calc.py::TestPressureCalculations::test_pressure_at_surface — should pass consistently
  • Run full test suite — no other test should be affected by the fixture seed change

🤖 Generated with Claude Code

np.random.normal(500, 200) for orographic heights occasionally draws
extreme negative values (e.g. -320 m below sea level), which push the
barometric-formula surface pressure above the test's 105 000 Pa bound.
The failure was latent but surfaced on Python 3.13 due to a different
RNG initialisation state.

Replace all np.random.* calls in the fixture with a local seeded
np.random.default_rng(42) instance so generated data is bit-reproducible
across Python versions and the pressure range assertion holds reliably.
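
The change described in this commit message can be sketched roughly as follows. This is not the fixture from the repository: the function name `make_sample_heights`, the array shape, and the second `uniform` draw are hypothetical, and in the test suite the body would sit inside a `@pytest.fixture` rather than a plain function.

```python
import numpy as np


def make_sample_heights(shape=(4, 5), seed=42):
    """Build hypothetical orography data from a local, seeded generator."""
    # A local Generator leaves the global np.random state untouched.
    rng = np.random.default_rng(seed)
    # Generator.normal / Generator.uniform stand in for the legacy
    # np.random.normal / np.random.uniform calls.
    heights = rng.normal(loc=500.0, scale=200.0, size=shape)
    noise = rng.uniform(low=-1.0, high=1.0, size=shape)
    return heights, noise


first = make_sample_heights()
second = make_sample_heights()

# Same seed and same call order: the arrays match element for element.
assert np.array_equal(first[0], second[0])
assert np.array_equal(first[1], second[1])
```

Because the generator is created inside the function, the order in which other tests consume random numbers cannot change the data this function returns.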
@darothen
Collaborator Author

This arises from an issue I encountered in #366 where we had flaky tests on Python 3.13 that I traced back to some changes in random number generation. The fix was extremely straightforward so I delegated to an agent. Will merge after checks pass.

darothen merged commit 446cff9 into develop on Apr 24, 2026
8 checks passed
darothen deleted the fix/flaky-calc-test-seed branch on April 24, 2026 at 18:54
aaTman mentioned this pull request on Apr 30, 2026
aaTman added a commit that referenced this pull request Apr 30, 2026
* Add pressure_dimension_str arg to geopotential_thickness (#297)

* `DurationMeanError` memory fix and add time resolution option (#296)

* update duration with handling spatial dims, remove compute, fix sparse lead time dim generation

* update name on metric in tests

* add docstring for time res arg

* Move parallel config check outside of function (#301)

* move function out of run, move cache mkdir to init

* add tests for new func

* ruff

* update parallel_config passthrough and tests

* feat: Forecast wrapper for custom xarray datasets (#302)

* implements a new Forecast object that can wrap existing xarray datasets

* Revise per copilot review

* Simplify IBTrACS polars subset (#303)

* Update `geopotential_thickness` var names and docstring (#306)

* update docstrings and var namings

* rename vars, add test

* ruff

* Clarify default preprocess function names; geopotential division fix (#305)

* update naming

* default preprocess for applied_tc

* ruff

* ruff

* Remove "cases" key requirement in yamls and dicts (#308)

* remove cases top level of yaml and fix code to handle this

* remove old load events yaml function

* update validation precommit and formatting

* remove out-of-date notebook from docs

* CIRA Icechunk store (#310)

* dependencies and generate store file started

* in-flight, added and cleaned filter funcs

* add icechunk + obstore and cira icechunk generation script

* remove cira gen script no longer used

* code cleanup

* add icechunk datatree forecast class object

* uv lock

* add documentation, group helper func, and add repository kwargs passthrough

* remove icechunk forecast object

* typo

* ruff

* update pyproject and uv lock

* add TODO

* update PR template

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* PyPI Preparation (#315)

* update build-system and project

* update workflows, publish, and pyproject

* add justfile and twine

* update publish yaml

* change to python 3.10 as minimum requirement

* kerchunk needs 3.11, swapping pyproject and tests to remove 3.10

* change workflows to use version matrix

* align workflows

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* swap pyproject tools to hatch; add if and packages-dir to publish

* update pyproject version for release

* Remove duplicate function and fixtures (#326)

* chore: remove duplicate function and fixtures

- Remove duplicate _parallel_serial_config_check function from evaluate.py
  (was defined twice at lines 189 and 982 with identical implementation)
- Remove duplicate runner fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove duplicate temp_config_dir fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove unused tempfile import from test_evaluate_cli.py

* ruff

* update install docs for pypi and fix typo in link

* Update ar calcs with improved parallelization (#329)

* update ar calcs with improved parallelization

* ruff formatting

* update attr name

* remove diagnostic logging; ruff formatting; update ar bounds code to work, add zarr 3 warning catch, add optional time dimension str arg

* more ar_bounds work to get it moving, looks good here

* mypy

* mypy

* update time to valid_time to follow ewb conventions

* bump pyproject version (#331)

* Add ROCSS (#300)

* Add ROC skill metrics and tests

* format

---------

Co-authored-by: Rodrigo Almeida <rodrigo.almeida@hhi.fraunhofer.de>

* update pyproject with icechunk (#335)

* move dask and distributed to core deps (#336)

* Make new AR parallelization work for forecasts (>1 time dim) (#334)

* allow for lead time as additional dim for binary dilation function

* update deps to include icechunk and dask

* update comment

* update comments, remove unneeded list loop

* Update to `icechunk` for example scripts (#333)

* update preprocessers and refs to icechunk stores

* remove print statements

* swap freeze climatology to 0.15

* run -> run evaluation

* ruff

* Make numba (more) threadsafe working with dask (#337)

* swap target to cpu, 4d -> nd naming

* add coverage for numba funcs + check for thread safety

* ruff

* Landfall metric fixes (#338)

* ensure dimension exists when returning _interpolate_and_format_landfalls

* fix first landfall, make data flow schema more consistent

* shift landfall choice logic away from calc

* clean up landfall logic in metrics and calc

* update tests and formatting

* mypy typing catch (if a float managed to get in here i'd be impressed, but okay)

* update landfall intensity mae metric to remove nans

* add landfall number aggregation; remove nans

* Landfall test dim fix (#341)

* align dims on tests

* add dim fix to calc tests

* `EarlySignal` fixes (#340)

* mid-flight using mean in detection mask

* update early signal to include more robust aggregations

* add tests

* Relax TC tracking criteria (#339)

* match TempestExtremes better w/ distance between peaks and peak wind radius

* relax default surface pressure threshold

* ruff

* match defaults

* update default in test

* set minimum wind >= 10m/s to ten steps by default

* Catch non-ocean landfalls (#342)

* match TE better w/ distance between peaks and peak wind radius

* relax default surface pressure threshold

* ruff

* match defaults

* update default in test

* add ocean filter to avoid erroneous landfalls

* add ocean geoms to tests

* swap to 50m to avoid spurious reefs

* update comments and typing

* add dedupe method to prevent 50km consecutive landfalls

* enforce level chunk for nantrapezoid_pressure_levels (#343)

* Update ARs with pressure ceiling and cleaner derived variable (#344)

* add max level arg and move code into derive_variable for more clarity

* ruff

* Update GHCNh filtering (#346)

* first pass on ghcnh filtering update + change GHCN-H to GHCNh

* add return url option to circumvent downloading

* add ghcnh remote input (in-flight, scan_parquet cast broken)

* update ghcnh script for more permissive filtering

* ruff

* remove class now that typing is fixed in parq

* update ES to avoid failing on dask-backed arrays and add a check in tests (#347)

* Add preprocess func for heatwaves to remove ocean gridpoints (#348)

* add preprocessors to defaults and include in applied_heatwave script

* add preprocess to remove ocean gridpoints

* don't need two functions doing the same thing

* Add temperature event finder scripts (#345)

* add in all files, plots, and tests

* add ignore for test

* change file names, add plotting scripts, improve parallelization

* freeze -> cold

* update imports

* add min_gridpoints arg and return empty df

* remove tests for data prep

* update file names in comments

* remove old heat and cold bounds script

* manage longitude conversions

* longitude shenanigans

* Add `overlap_target_threshold` to `EarlySignal` (#350)

* add optional arg and conditional

* update with tests to confirm functionality

* update tests

* Climatology scripting (#349)

* add scripts

* rename script, combine into one

* ruff

* added marginal severe cases (#351)

* add updated 2m temp climatology quantiles (#354)

* Temperature event bounds update (#353)

* update heat and cold bounds, update docstrings and methodology. add base_temp_events.yaml for reproducibility

* add defaults; remove top python line

* add lat min and max args

* ruff + mypy

* add restriction criteria and method

* add animation code

* simplify to include climatology bound(s), fix func names

* update animate

* ruff

* remove animate script

* rename

* update with rename

* Adjust some TC case bounds (#355)

* first pass bounds updates

* second pass to expand bounds

* Tune duration logic (#357)

* add new duration method

* update duration logic with consecutive pair logic

* adjust test to match new duration logic

* v1.0.2

* Update global temp bound method; fix event case number mismatch (#359)

* add some parallelism and stricter event criteria

* smaller batching

* update default batch

* update event bounds; case shift issue

* rename

* ruff

* enable plot to be used in an axes

* add p15-p85 events yaml

* update with more explicit blob criteria

* add PeriodicBoundaryIndex (h/t TomNicholas) to avoid lon shenanigans

* rename marginal severe with underscores; add marginal temperature events for p15-p85 cases without antarctica

* remove marginal severe with dashes

* update nz heatwave (200k km2 limit doesnt work on small island)

* swap marginal temp event type to heat wave

* remove no landfalling tc's (-8 TCs) (#361)

* Manage landfall edge cases (#362)

* modify criteria slightly

* make event_type heat_wave in marginal temp events

* update tracker to fill in-between lead time points for a valid track

* update defaults

* update return next landfall logic to metric

* add neighborhood max wind func and add track gap filler

* update stitching criteria and tests

* add extra column allowance

* add a bunch of landfall methods

* add empty init time array method

* update landfall metrics methods and add metadata inclusion (small data increase. lots of value)

* ruff

* add landfall filter to within window of valid landfall

* cleanup

* update consecutive days label formatting

* move label position to right for consecutive plots

* update TC count in readme after removing non-landfall TCs

* Add > 15 degree latitude filter for ARs (#365)

* add > 15 degree filter for ARs

* ruff

* Consolidate temperature data_prep scripts into temperature_events.py (#364)

* refactor: consolidate temp data_prep scripts into temperature_events.py

Merges heat_cold_bounds_case.py, plot_temperature_events.py, and
temperature_bounds_global.py into a single temperature_events.py with
three subcommands (plot / case / global). Preserves develop behavior
exactly: original _wrap_periodically formula, no ndimage landmass
isolation, numpy-based detect_events / expand_event_bounds /
include_temps_with_events, and no --event-quantiles flag.

Made-with: Cursor

* ruff

* cleanup unused funcs and args

* consistent operators

* ruff

* fix: seed sample_calc_dataset fixture to eliminate flaky pressure test (#367)

np.random.normal(500, 200) for orographic heights occasionally draws
extreme negative values (e.g. -320 m below sea level), which push the
barometric-formula surface pressure above the test's 105 000 Pa bound.
The failure was latent but surfaced on Python 3.13 due to a different
RNG initialisation state.

Replace all np.random.* calls in the fixture with a local seeded
np.random.default_rng(42) instance so generated data is bit-reproducible
across Python versions and the pressure range assertion holds reliably.

* fix: correct three CIN bugs in mixed-layer CAPE/CIN computation (#366)

* fix: correct three CIN bugs in mixed-layer CAPE/CIN computation

Three bugs in _cape.py caused CIN to be unreliable (sign wrong, magnitude
wrong, and false-LFC suppression). Discovered by benchmarking against xcape
and atmos on 6,400 ERA5 profiles.

Bug 1 — Sign convention: compute_buoyancy_energy_inline returns negative
values below the LFC; the raw accumulation was returned without negation,
producing negative CIN where callers expect a positive inhibition energy
(matching MetPy and xcape conventions). Fixed by negating cin at both return
sites.

Bug 2 — Integration scope: the LFC branch accumulated all buoyancy energy
unconditionally (including positively-buoyant layers between surface and LFC),
reducing CIN when a parcel had brief near-surface positive buoyancy before
encountering a cap. Added `if energy < 0` guard to match the no-LFC branch
and the behaviour of reference implementations.
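
The combined effect of the Bug 1 and Bug 2 fixes can be illustrated with a small sketch. It does not reproduce `_cape.py`: the function `accumulate_cin` and the per-layer energies are hypothetical, and only the `energy < 0` guard and the final negation come from the description above.

```python
def accumulate_cin(layer_energies):
    """Sum only negatively buoyant layer energies (J/kg); return a non-negative CIN."""
    cin = 0.0
    for energy in layer_energies:
        if energy < 0:  # Bug 2 guard: skip positively buoyant layers below the LFC
            cin += energy
    return -cin  # Bug 1 fix: report inhibition as a positive magnitude


# Hypothetical layer energies below the LFC: brief positive buoyancy, then a cap.
layers = [15.0, -40.0, -35.0]
cin = accumulate_cin(layers)

# Without the guard, the +15 J/kg layer would cancel part of the cap.
unguarded = -sum(layers)
assert cin > unguarded
```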

Bug 3 — Spurious LCL buoyancy (critical): insert_lcl_level set the inserted
level's environment temperature to t_lcl (the parcel's saturation temperature)
instead of interpolating the actual atmospheric temperature in log-pressure.
This made env_tv ≈ parcel_tv at the LCL, injecting a +1-2 K buoyancy spike
that was detected as an LFC, hiding the real capping inversion above and
driving CIN to near-zero for profiles with low LCLs and strong cap inversions.
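
As a rough illustration of the interpolation named in Bug 3, the sketch below interpolates the environment temperature linearly in log-pressure between the two levels that bracket the LCL. The function name, argument order, and sample values are assumptions for illustration, not the signature of `insert_lcl_level`.

```python
import math


def interp_env_temp_log_p(p_lcl, p_below, t_below, p_above, t_above):
    """Linearly interpolate environment temperature in ln(pressure) at p_lcl."""
    weight = (math.log(p_below) - math.log(p_lcl)) / (
        math.log(p_below) - math.log(p_above)
    )
    return t_below + weight * (t_above - t_below)


# Hypothetical bracketing levels (hPa, K) around an LCL at 925 hPa.
t_env_lcl = interp_env_temp_log_p(925.0, 950.0, 292.0, 900.0, 289.0)

# The interpolated value lies between the two bracketing temperatures,
# independent of the parcel's saturation temperature at the LCL.
assert 289.0 < t_env_lcl < 292.0
```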

Adds TestCINSignConvention, TestCINIntegrationScope, and
TestLCLTemperatureInterpolation test classes with profiles that specifically
exercise each bug path and will catch regressions.

* fix: address Bugbot review comments on CIN convention consistency

- Document CIN sign convention in compute_ml_cape_cin_from_profile docstring:
  returned as a non-negative inhibition magnitude matching MetPy/xcape.
- Negate MetPy's negative-signed CIN in the reference data generator so
  stored cin_reference values match the implementation's positive convention;
  update both era5_reference.npz and pathological_profiles.npz accordingly.
- Rename expected_cin_range to expected_cin_magnitude_range in TestKnownProfile
  to make the positive convention explicit at the call site.
- Strengthen TestCINIntegrationScope: replace the weak cin >= 0 assertion with
  a quantitative lower bound (>50 J/kg) that would catch the original bug
  where positive near-surface buoyancy cancelled cap-layer CIN.

* fix: remove pytest.mark.flaky (pytest-rerunfailures not in deps)

---------

Co-authored-by: Daniel Rothenberg <daniel@danielrothenberg.com>
Co-authored-by: Rodrigo Almeida <rodrigo.almeida94@outlook.pt>
Co-authored-by: Rodrigo Almeida <rodrigo.almeida@hhi.fraunhofer.de>
Co-authored-by: Amy McGovern <amcgovern@ou.edu>
Co-authored-by: Daniel Rothenberg <daniel@brightband.com>
aaTman added a commit that referenced this pull request Apr 30, 2026
* Add pressure_dimension_str arg to geopotential_thickness (#297)

* `DurationMeanError` memory fix and add time resolution option (#296)

* update duration with handling spatial dims, remove compute, fix sparse lead time dim generation

* update name on metric in tests

* add docstring for time res arg

* Move parallel config check outside of function (#301)

* move function out of run, move cache mkdir to init

* add tests for new func

* ruff

* update parallel_config passthrough and tests

* feat: Forecast wrapper for custom xarray datasets (#302)

* implements a new Forecast object that can wrap existing xarray datasets

* Revise per copilot review

* Simplify IBTrACS polars subset (#303)

* Update `geopotential_thickness` var names and docstring (#306)

* update docstrings and var namings

* rename vars, add test

* ruff

* Clarify default preprocess function names; geopotential division fix (#305)

* update naming

* default preprocess for applied_tc

* ruff

* ruff

* Remove "cases" key requirement in yamls and dicts (#308)

* remove cases top level of yaml and fix code to handle this

* remove old load events yaml function

* update validation precommit and formatting

* remove out-of-date notebook from docs

* CIRA Icechunk store (#310)

* dependencies and generate store file started

* in-flight, added and cleaned filter funcs

* add icechunk + obstore and cira icechunk generation script

* remove cira gen script no longer used

* code cleanup

* add icechunk datatree forecast class object

* uv lock

* add documentation, group helper func, and add repository kwargs passthrough

* remove icechunk forecast object

* typo

* ruff

* update pyproject and uv lock

* add TODO

* update PR template

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasnt called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* PyPI Preparation (#315)

* update build-system and project

* update workflows, publish, and pyproject

* add justfile and twine

* update publish yaml

* change to python 3.10 as minimum requirement

* kerchunk needs 3.11, swapping pyproject and tests to remove 3.10

* change workflows to use version matrix

* align workflows

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* Golden tests (#323)

* first pass for gt test infra + yaml

* use shapefile for severe convection and catch latitude swap

* add ignore for golden test when running pytest by default

* ruff

* move pytest addopts and markers to pyproject.toml

* Remove `IndividualCaseCollection` (#317)

* update all references to IndividualCaseCollection and convert dicts/ "cases": keys to lists

* update template

* make questions bold

* add whitespace

* remove indent error and typo from evaluate_cli

* make load_individual_cases include passthrough for existing dataclasses

* ruff

* add comment for clarification on list comp

* ruff (again)

* remove all references to collection, replace with list

* ruff

* rename collection -> list

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* Bump version from 0.2.0 to 0.3.0 (#324)

* Updated API (#321)

* move cache dir creation to init, rename funcs, add parallel/serial check function, update test names

* update naming

* add run method for backwards compatibility

* update tests

* add tests and cover if serial and parallel_config is not None

* feat: redesign public API with hierarchical namespace submodules

- Add ewb.evaluation() as main entry point (alias for ExtremeWeatherBench)
- Create namespace submodules: ewb.targets, ewb.forecasts, ewb.metrics,
  ewb.derived, ewb.regions, ewb.cases, ewb.defaults
- Expose all classes at top level for convenience (ewb.ERA5, etc.)
- Add ewb.load_cases() convenience alias
- Update all example files to use new import pattern
- Update usage.md documentation
- Maintain backward compatibility with existing imports

* ruff/linting. add utils to init

* add test coverage for module loading patterns

* ruff

* Cleanup docstrings in repo (#318)

* update these docstrings

* remove docstring changes markdown

* update docstrings

* update other docstrings

* remove individualcasecollection reference, update based on develop changes

* add explanation for dim reqs (#320)

* Update `defaults` and `inputs` to include new CIRA icechunk store (#319)

* more explicit naming, add func and model names var

* add test coverage, ruff, linting

* update readme for new cira approach

* move cira func and model ref to inputs

* update docs

* module wasn't called for moved func

* update tests for moving func and var

* ruff

* fix mock typos

* update defaults var refs

* remove to_csv

* swap pyproject tools to hatch; add if and packages-dir to publish

* update pyproject version for release

* Remove duplicate function and fixtures (#326)

* chore: remove duplicate function and fixtures

- Remove duplicate _parallel_serial_config_check function from evaluate.py
  (was defined twice at lines 189 and 982 with identical implementation)
- Remove duplicate runner fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove duplicate temp_config_dir fixture from test_evaluate_cli.py
  (already defined in conftest.py)
- Remove unused tempfile import from test_evaluate_cli.py

* ruff

* update install docs for pypi and fix typo in link

* Update ar calcs with improved parallelization (#329)

* update ar calcs with improved parallelization

* ruff formatting

* update attr name

* remove diagnostic logging; ruff formatting; update ar bounds code to work, add zarr 3 warning catch, add optional time dimension str arg

* more ar_bounds work to get it moving, looks good here

* mypy

* mypy

* update time to valid_time to follow ewb conventions

* bump pyproject version (#331)

* Add ROCSS (#300)

* Add ROC skill metrics and tests

* format

---------

Co-authored-by: Rodrigo Almeida <rodrigo.almeida@hhi.fraunhofer.de>

* update pyproject with icechunk (#335)

* move dask and distributed to core deps (#336)

* Make new AR parallelization work for forecasts (>1 time dim) (#334)

* allow for lead time as additional dim for binary dilation function

* update deps to include icechunk and dask

* update comment

* update comments, remove unneeded list loop

* Update to `icechunk` for example scripts (#333)

* update preprocessors and refs to icechunk stores

* remove print statements

* swap freeze climatology to 0.15

* run -> run evaluation

* ruff

* Make numba (more) threadsafe working with dask (#337)

* swap target to cpu, 4d -> nd naming

* add coverage for numba funcs + check for thread safety

* ruff

* Landfall metric fixes (#338)

* ensure dimension exists when returning _interpolate_and_format_landfalls

* fix first landfall, make data flow schema more consistent

* shift landfall choice logic away from calc

* clean up landfall logic in metrics and calc

* update tests and formatting

* mypy typing catch (if a float managed to get in here i'd be impressed, but okay)

* update landfall intensity mae metric to remove nans

* add landfall number aggregation; remove nans

* Landfall test dim fix (#341)

* align dims on tests

* add dim fix to calc tests

* `EarlySignal` fixes (#340)

* mid-flight using mean in detection mask

* update early signal to include more robust aggregations

* add tests

* Relax TC tracking criteria (#339)

* match TempestExtremes better w/ distance between peaks and peak wind radius

* relax default surface pressure threshold

* ruff

* match defaults

* update default in test

* set minimum wind >= 10m/s to ten steps by default

* Catch non-ocean landfalls (#342)

* match TE better w/ distance between peaks and peak wind radius

* relax default surface pressure threshold

* ruff

* match defaults

* update default in test

* add ocean filter to avoid erroneous landfalls

* add ocean geoms to tests

* swap to 50m to avoid spurious reefs

* update comments and typing

* add dedupe method to prevent 50km consecutive landfalls

* enforce level chunk for nantrapezoid_pressure_levels (#343)

* Update ARs with pressure ceiling and cleaner derived variable (#344)

* add max level arg and move code into derive_variable for more clarity

* ruff

* Update GHCNh filtering (#346)

* first pass on ghcnh filtering update + change GHCN-H to GHCNh

* add return url option to circumvent downloading

* add ghcnh remote input (in-flight, scan_parquet cast broken)

* update ghcnh script for more permissive filtering

* ruff

* remove class now that typing is fixed in parq

* update ES to avoid failing on dask-backed arrays and add a check in tests (#347)

* Add preprocess func for heatwaves to remove ocean gridpoints (#348)

* add preprocessors to defaults and include in applied_heatwave script

* add preprocess to remove ocean gridpoints

* don't need two functions doing the same thing

* Add temperature event finder scripts (#345)

* add in all files, plots, and tests

* add ignore for test

* change file names, add plotting scripts, improve parallelization

* freeze -> cold

* update imports

* add min_gridpoints arg and return empty df

* remove tests for data prep

* update file names in comments

* remove old heat and cold bounds script

* manage longitude conversions

* longitude shenanigans

* Add `overlap_target_threshold` to `EarlySignal` (#350)

* add optional arg and conditional

* update with tests to confirm functionality

* update tests

* Climatology scripting (#349)

* add scripts

* rename script, combine into one

* ruff

* added marginal severe cases (#351)

* add updated 2m temp climatology quantiles (#354)

* Temperature event bounds update (#353)

* update heat and cold bounds, update docstrings and methodology. add base_temp_events.yaml for reproducibility

* add defaults; remove top python line

* add lat min and max args

* ruff + mypy

* add restriction criteria and method

* add animation code

* simplify to include climatology bound(s), fix func names

* update animate

* ruff

* remove animate script

* rename

* update with rename

* Adjust some TC case bounds (#355)

* first pass bounds updates

* second pass to expand bounds

* Tune duration logic (#357)

* add new duration method

* update duration logic with consecutive pair logic

* adjust test to match new duration logic

* v1.0.2

* Update global temp bound method; fix event case number mismatch (#359)

* add some parallelism and stricter event criteria

* smaller batching

* update default batch

* update event bounds; case shift issue

* rename

* ruff

* enable plot to be used in an axes

* add p15-p85 events yaml

* update with more explicit blob criteria

* add PeriodicBoundaryIndex (h/t TomNicholas) to avoid lon shenanigans

* rename marginal severe with underscores; add marginal temperature events for p15-p85 cases without antarctica

* remove marginal severe with dashes

* update nz heatwave (200k km2 limit doesn't work on small island)

* swap marginal temp event type to heat wave

* remove non-landfalling TCs (-8 TCs) (#361)

* Manage landfall edge cases (#362)

* modify criteria slightly

* make event_type heat_wave in marginal temp events

* update tracker to fill in-between lead time points for a valid track

* update defaults

* update return next landfall logic to metric

* add neighborhood max wind func and add track gap filler

* update stitching criteria and tests

* add extra column allowance

* add a bunch of landfall methods

* add empty init time array method

* update landfall metrics methods and add metadata inclusion (small data increase. lots of value)

* ruff

* add landfall filter to within window of valid landfall

* cleanup

* update consecutive days label formatting

* move label position to right for consecutive plots

* update TC count in readme after removing non-landfall TCs

* Add > 15 degree latitude filter for ARs (#365)

* add > 15 degree filter for ARs

* ruff

* Consolidate temperature data_prep scripts into temperature_events.py (#364)

* refactor: consolidate temp data_prep scripts into temperature_events.py

Merges heat_cold_bounds_case.py, plot_temperature_events.py, and
temperature_bounds_global.py into a single temperature_events.py with
three subcommands (plot / case / global). Preserves develop behavior
exactly: original _wrap_periodically formula, no ndimage landmass
isolation, numpy-based detect_events / expand_event_bounds /
include_temps_with_events, and no --event-quantiles flag.

Made-with: Cursor

* ruff

* cleanup unused funcs and args

* consistent operators

* ruff

* fix: seed sample_calc_dataset fixture to eliminate flaky pressure test (#367)

np.random.normal(500, 200) for orographic heights occasionally draws
extreme negative values (e.g. -320 m, i.e. 320 m below sea level), which push the
barometric-formula surface pressure above the test's 105 000 Pa bound.
The failure was latent but surfaced on Python 3.13 due to a different
RNG initialisation state.

Replace all np.random.* calls in the fixture with a local seeded
np.random.default_rng(42) instance so generated data is bit-reproducible
across Python versions and the pressure range assertion holds reliably.
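
The failure mode and the fix can be illustrated with a minimal sketch. It assumes the standard-atmosphere barometric formula with textbook constants and a hypothetical `make_heights` helper; the project's actual fixture and pressure calculation may differ.

```python
import numpy as np

# Standard-atmosphere constants (assumed here, not taken from the repo).
P0 = 101325.0      # sea-level pressure, Pa
T0 = 288.15        # sea-level temperature, K
LAPSE = 0.0065     # temperature lapse rate, K/m
EXPONENT = 5.25588  # g*M / (R*LAPSE), dimensionless


def surface_pressure(height_m):
    """Barometric-formula pressure (Pa) at a given surface height (m)."""
    return P0 * (1.0 - LAPSE * np.asarray(height_m) / T0) ** EXPONENT


def make_heights(seed=42, shape=(4, 5)):
    """Hypothetical stand-in for the fixture: heights from a local seeded RNG."""
    rng = np.random.default_rng(seed)  # local generator, no global state
    return rng.normal(500.0, 200.0, size=shape)


# The same seed yields the same heights on every call.
heights_a = make_heights()
heights_b = make_heights()

# A -320 m draw pushes the derived pressure past the 105 000 Pa bound.
extreme_pressure = surface_pressure(-320.0)
```

Because the generator is created inside the helper, other tests that touch the global `np.random` state cannot change what this fixture produces.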

* fix: correct three CIN bugs in mixed-layer CAPE/CIN computation (#366)

* fix: correct three CIN bugs in mixed-layer CAPE/CIN computation

Three bugs in _cape.py caused CIN to be unreliable (sign wrong, magnitude
wrong, and false-LFC suppression). Discovered by benchmarking against xcape
and atmos on 6,400 ERA5 profiles.

Bug 1 — Sign convention: compute_buoyancy_energy_inline returns negative
values below the LFC; the raw accumulation was returned without negation,
producing negative CIN where callers expect a positive inhibition energy
(matching MetPy and xcape conventions). Fixed by negating cin at both return
sites.

Bug 2 — Integration scope: the LFC branch accumulated all buoyancy energy
unconditionally (including positively-buoyant layers between surface and LFC),
reducing CIN when a parcel had brief near-surface positive buoyancy before
encountering a cap. Added `if energy < 0` guard to match the no-LFC branch
and the behaviour of reference implementations.

Bug 3 — Spurious LCL buoyancy (critical): insert_lcl_level set the inserted
level's environment temperature to t_lcl (the parcel's saturation temperature)
instead of interpolating the actual atmospheric temperature in log-pressure.
This made env_tv ≈ parcel_tv at the LCL, injecting a +1-2 K buoyancy spike
that was detected as an LFC, hiding the real capping inversion above and
driving CIN to near-zero for profiles with low LCLs and strong cap inversions.

Adds TestCINSignConvention, TestCINIntegrationScope, and
TestLCLTemperatureInterpolation test classes with profiles that specifically
exercise each bug path and will catch regressions.
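
A minimal sketch of the corrected behaviour described above, using hypothetical helper names (`cin_magnitude`, `interpolate_env_temperature`) rather than the actual `_cape.py` functions. It assumes per-layer buoyancy energies are already available: only negatively buoyant layers contribute to CIN, the sum is negated into a positive magnitude, and the environment temperature at an inserted LCL level is interpolated linearly in log-pressure.

```python
import numpy as np


def cin_magnitude(layer_energies):
    """Sum only negative layer energies (J/kg) and return a non-negative CIN."""
    cin = 0.0
    for energy in layer_energies:
        if energy < 0:  # skip positively buoyant layers below the LFC
            cin += energy
    return -cin  # negate so inhibition is reported as a positive magnitude


def interpolate_env_temperature(p_levels, t_levels, p_target):
    """Interpolate environment temperature linearly in log-pressure."""
    log_p = np.log(np.asarray(p_levels, dtype=float))
    t = np.asarray(t_levels, dtype=float)
    order = np.argsort(log_p)  # np.interp needs increasing x values
    return float(np.interp(np.log(p_target), log_p[order], t[order]))


# Illustrative layer energies: brief positive buoyancy, then a cap.
energies = [5.0, -40.0, -30.0]
cin = cin_magnitude(energies)

# Illustrative sounding (hPa, K) with an LCL assumed at 950 hPa.
t_env_lcl = interpolate_env_temperature(
    [1000.0, 900.0, 800.0], [300.0, 294.0, 288.0], 950.0
)
```

With these made-up numbers the guard keeps the full 70 J/kg of inhibition, whereas summing every layer would give 65 J/kg. The interpolated LCL temperature falls between the two bracketing environment values instead of being pinned to the parcel temperature.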

* fix: address Bugbot review comments on CIN convention consistency

- Document CIN sign convention in compute_ml_cape_cin_from_profile docstring:
  returned as a non-negative inhibition magnitude matching MetPy/xcape.
- Negate MetPy's negative-signed CIN in the reference data generator so
  stored cin_reference values match the implementation's positive convention;
  update both era5_reference.npz and pathological_profiles.npz accordingly.
- Rename expected_cin_range to expected_cin_magnitude_range in TestKnownProfile
  to make the positive convention explicit at the call site.
- Strengthen TestCINIntegrationScope: replace the weak cin >= 0 assertion with
  a quantitative lower bound (>50 J/kg) that would catch the original bug
  where positive near-surface buoyancy cancelled cap-layer CIN.

* fix: remove pytest.mark.flaky (pytest-rerunfailures not in deps)

* Update docs + zensical (#369)

* add data and update mkdocs

* add /site to gitignore

* updated pages and case format

* updated event types and descriptions

* severe days -> severe convection, move cases to top, ERA-5 -> ERA5

* add full examples, fix heatwave -> heat wave

* tweak to severe docs' references to spc

* data prep module

* general issue fixing and updates

* update case studies language; change file case style

* modify descriptions

* some more small edits

* add colab code and examples

* remove notebooks from repo

* update colab links

* update tc's

* zensical

* rtd fix

* remove colab, swap case number. update tc count. zensical warnings

* update version

---------

Co-authored-by: Daniel Rothenberg <daniel@danielrothenberg.com>
Co-authored-by: Rodrigo Almeida <rodrigo.almeida94@outlook.pt>
Co-authored-by: Rodrigo Almeida <rodrigo.almeida@hhi.fraunhofer.de>
Co-authored-by: Amy McGovern <amcgovern@ou.edu>
Co-authored-by: Daniel Rothenberg <daniel@brightband.com>