SIP279 SIPNET Restart MVP #276
Merged
Commits
67 commits
d9bb2ca Add restart MVP integration tests and fixtures (dlebauer)
3d8243c Implement strict full-state restart checkpoint MVP (dlebauer)
17c609d Fix CLI argument parsing for file-name and required option args (dlebauer)
336c3fa Refactor restart logic into dedicated module (dlebauer)
5194b07 Harden restart load checks and checkpoint failure tests (dlebauer)
2b6e8e1 Split restart docs between user and developer guides (dlebauer)
70bf334 Switch restart checkpoints to ASCII schema 1.0 (dlebauer)
aa19995 Make restart schema fully named and update SIPNET_RESTART header (dlebauer)
ccea1e9 Format restart text schema sections and accept blank separator lines (dlebauer)
0006d1b Relax restart boundary validation to timestamp keys (dlebauer)
7206a51 Switch restart resume contract to post-boundary timestamp (dlebauer)
2eda169 Polish restart formatting and align developer guide sequence (dlebauer)
a0e37a6 Clarify user-guide restart boundary timestamp wording (dlebauer)
84ebdf7 doc fix: dump-config default is off (dlebauer)
fca2ef1 Merge branch 'master' into codex/restart-mvp-master (dlebauer)
43a4397 Update docs/user-guide/model-inputs.md (dlebauer)
8978ab3 Add minimal changelog entry for restart MVP (dlebauer)
dfbd746 Remove restart strict flag, add events-file option, and include resta… (dlebauer)
625a9aa Complete Step 1 restart parser robustness (dlebauer)
321322e Add restart build-info mismatch warning acceptance test (dlebauer)
27abb02 Enforce midnight restart boundaries (dlebauer)
64a1822 Remove restart event cursor/hash state (dlebauer)
594a8dc Move cumulative GDD tracking to trackers for restart continuity (dlebauer)
d2b2ad2 Cleanup + Step 6: tests, GDD consolidation, event validation, and sch… (dlebauer)
b385710 Step 7: trim restart boundary metadata to timestamp-only fields (dlebauer)
8944baa Step 8: rename mean restart namespace to mean.npp and harden schema l… (dlebauer)
719b69f Docs: align restart and runtime guides with current behavior (dlebauer)
b672b81 Step 9: drop balance restart state and reject legacy balance keys (dlebauer)
410704e Harden restart validation and align docs/tests (dlebauer)
9b7373a Merge origin/master and resolve tracker conflicts (dlebauer)
421cdfe Merge branch 'master' into codex/restart-mvp-master (dlebauer)
936556d conditionally call restartNoteProcessedClimateStep (dlebauer)
b7a7e61 Apply suggestions from code review (dlebauer)
9b290a7 Harden restart checkpoint validation and regression tests (dlebauer)
e1ee530 Address Mike's PR 276 restart and events feedback (dlebauer)
3200032 Merge origin/master into codex/restart-mvp-master (dlebauer)
9ad90a9 Update smoke config baselines for restart defaults (dlebauer)
4a33317 Fix broken restart doc example reference (dlebauer)
3d1f6f1 Fix merged harvest test expectations (dlebauer)
5fb5fee Fix cpp-linter failure step (dlebauer)
c19f5a4 Guard frontend output file close (dlebauer)
419a100 Preserve frontend cleanup semantics (dlebauer)
788b8b1 Test updates for restart redo (Alomir)
1e7423f Updates for restart (Alomir)
56dc673 Added restart exit code (Alomir)
b73610e Move some function from test_restart here (Alomir)
3cb1d37 Rearranged tracker vars (Alomir)
234225b Added #def for num model flags (Alomir)
c9addc8 Major redo of restart code (Alomir)
595db63 Test fixes (Alomir)
6daa25a Added NOLINT to copied header (Alomir)
79eccea Tweaks/reorg for clang-tidy happiness (Alomir)
9737e1f Update error messages for linux static_assert (Alomir)
532363c Attempt to silence clang-tidy (Alomir)
3dd5b8e Merge branch 'master' into codex/restart-mvp-master (Alomir)
2b49f44 Add check for no events before first climate (Alomir)
3dcbcf5 Add function to check first event vs. climate year/day (Alomir)
984dd86 Convert several errors to warnings (Alomir)
99f7ae8 Add docs/html (Alomir)
e8a92d3 Cleanup (Alomir)
8fe00bd Add no-events guard to isFirstEventBefore (Alomir)
3556616 Restart doc updates (Alomir)
47a212d Minor restart doc updates (Alomir)
c8cfdf2 Apply minor tweaks, copilot feedback (Alomir)
ff2b6dc Update restart file keys (Alomir)
db84146 Merge branch 'master' into codex/restart-mvp-master (Alomir)
72ea8e0 Reorg for more clarity (Alomir)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,94 @@ | ||
| # Restart Checkpoint Spec | ||
|
|
||
| This page documents SIPNET's restart checkpoint implementation. | ||
|
|
||
| ## Scope and Intent | ||
|
|
||
| SIPNET restart is designed for segmented orchestration: | ||
|
|
||
| - stop at end of one climate segment | ||
| - write full runtime state at segment end (`RESTART_OUT`) | ||
| - restore full runtime state at next segment start (`RESTART_IN`) | ||
| - fail fast on incompatible restart/configuration inputs | ||
|
|
||
| SIPNET itself does not stitch outputs across segments. | ||
|
|
||
| ## Runtime Sequence | ||
|
|
||
| On resume, SIPNET executes: | ||
|
|
||
| 1. Normal setup (`setupModel`, `setupEvents`) | ||
| 2. Load checkpoint and overwrite runtime state | ||
| 3. Run compatibility checks and restart boundary checks | ||
| 4. Continue run from resumed climate input | ||
|
|
||
| ## Restart Schema v1.0 Overview | ||
|
|
||
| The checkpoint format is ASCII text with one key/value pair per line: | ||
|
|
||
| - header: `SIPNET_RESTART 1.0` | ||
|
||
| - metadata: `meta_info.model_version`, `meta_info.build_info`, `meta_info.checkpoint_utc_epoch`, `meta_info.processed_steps` | ||
| - schema layout guard metadata: `schema_layout.envi_size`, `schema_layout.trackers_size`, `schema_layout.phenology_trackers_size`, `schema_layout.event_trackers_size` | ||
| - mode flags: `flags.*` | ||
| - boundary metadata: `boundary.year`, `boundary.day`, `boundary.time`, `boundary.length` | ||
| - mean tracker metadata: `mean.npp.*` | ||
| - full runtime state: `envi.*`, `trackers.*`, `phenology.*`, `event_trackers.*` | ||
| - mean ring buffers: `mean.npp.values.<idx>`, `mean.npp.weights.<idx>` | ||
| - end marker: `end_restart 1` | ||
|
|
||
| Example checkpoint content is exercised in | ||
| `tests/sipnet/test_restart_infrastructure/testRestartMVP.c`. | ||
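
The key/value layout listed above can be illustrated with a small line splitter. This is a minimal sketch, not SIPNET's actual parser: the helper names `splitRestartLine` and `isRestartHeader` and the buffer sizes are hypothetical, and it assumes that keys and values are single whitespace-separated tokens, as in `SIPNET_RESTART 1.0` and `end_restart 1`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KEY_MAX 128 /* hypothetical buffer sizes, not taken from restart.c */
#define VAL_MAX 128

/* Hypothetical helper: split one "key value" line into two buffers.
 * Returns 1 on success, 0 for a blank line or a line that does not
 * contain exactly two whitespace-separated tokens. */
int splitRestartLine(const char *line, char *key, char *val) {
  char extra;
  assert(line != NULL && key != NULL && val != NULL);
  /* %127s stops at whitespace; the trailing " %c" catches a third token */
  int n = sscanf(line, "%127s %127s %c", key, val, &extra);
  return n == 2;
}

/* Hypothetical helper: check a line against the magic string and the
 * schema version named in the spec. */
int isRestartHeader(const char *line) {
  char key[KEY_MAX];
  char val[VAL_MAX];
  if (!splitRestartLine(line, key, val)) {
    return 0;
  }
  return strcmp(key, "SIPNET_RESTART") == 0 && strcmp(val, "1.0") == 0;
}
```

A caller would typically read lines with `fgets`, skip blank separator lines, check the first non-blank line with `isRestartHeader`, and stop once it sees the `end_restart` key.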
|
|
||
| ## Validation Contract | ||
|
|
||
| On load, SIPNET enforces the following. Items marked (warning) log a warning instead of raising an error. | ||
|
|
||
| - magic header match | ||
| - schema version match | ||
| - model numeric version match | ||
| - `schema_layout.*` values exactly match the expected struct sizes for the running build | ||
| - (warning) build info mismatch | ||
| - context flag compatibility | ||
| - first-row climate timestamp strictly after checkpoint boundary (`year`, `day`, `time`) | ||
| - (warning) resumed segment starts on the day after the checkpoint boundary and within one timestep after midnight | ||
| - mean tracker shape/cursor validity | ||
| - lines appearing after `end_restart` are ignored | ||
| - integer values must fit in signed 32-bit range | ||
| - floating-point values must be finite (`nan`/`inf` are rejected) | ||
|
|
||
| All mismatches above are hard errors except as indicated. | ||
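
The two numeric rules in the list above (signed 32-bit integers, finite floating-point values) can be sketched with the standard `strtoll` and `strtod` functions. The helper names `parseRestartInt` and `parseRestartDouble` are hypothetical, and the sketch assumes the value token has already been trimmed of whitespace; the checks in `src/sipnet/restart.c` may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 integer that must fit in int32_t.
 * Returns 1 and stores the value on success, 0 otherwise. */
int parseRestartInt(const char *text, int32_t *out) {
  char *end = NULL;
  assert(text != NULL && out != NULL);
  errno = 0;
  long long v = strtoll(text, &end, 10);
  if (end == text || *end != '\0' || errno == ERANGE) {
    return 0; /* empty token, trailing junk, or long long overflow */
  }
  if (v < INT32_MIN || v > INT32_MAX) {
    return 0; /* outside the signed 32-bit range */
  }
  *out = (int32_t)v;
  return 1;
}

/* Hypothetical helper: parse a floating-point value that must be finite.
 * Returns 1 and stores the value on success, 0 otherwise. */
int parseRestartDouble(const char *text, double *out) {
  char *end = NULL;
  assert(text != NULL && out != NULL);
  double v = strtod(text, &end);
  if (end == text || *end != '\0') {
    return 0; /* empty token or trailing junk */
  }
  if (!isfinite(v)) {
    return 0; /* rejects nan, inf, and values that overflow to infinity */
  }
  *out = v;
  return 1;
}
```

Parsing into `long long` first and then range-checking keeps the 32-bit limit independent of the platform's `long` width.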
|
|
||
| ## Climate and Event Boundaries | ||
|
|
||
| A restart write always emits a checkpoint. If the last processed climate step is | ||
| more than one timestep before midnight, SIPNET logs a warning, because any run resumed from that file will have a | ||
| time gap. | ||
|
|
||
| Resumed climate segments must begin on the day after the checkpoint boundary. If they start more than one timestep | ||
| after midnight (using the first resumed climate row's timestep length), SIPNET logs a warning. | ||
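
The ordering and post-midnight checks described above can be sketched as a lexicographic comparison on `(year, day, time)`. Everything named here is hypothetical (`Stamp`, `ResumeStatus`, `compareStamps`, `checkResume`), and the sketch assumes that `time` is the hour of day and that the timestep length is also given in hours; SIPNET's own units and the next-day check are not reproduced.

```c
#include <assert.h>

/* Hypothetical timestamp triple mirroring boundary.year/day/time. */
typedef struct {
  int year;
  int day;     /* day of year */
  double time; /* assumed unit: hour of day in [0, 24) */
} Stamp;

typedef enum { RESUME_OK, RESUME_WARN, RESUME_ERROR } ResumeStatus;

/* Lexicographic order: negative if a < b, zero if equal, positive if a > b. */
int compareStamps(Stamp a, Stamp b) {
  if (a.year != b.year) return a.year < b.year ? -1 : 1;
  if (a.day != b.day) return a.day < b.day ? -1 : 1;
  if (a.time != b.time) return a.time < b.time ? -1 : 1;
  return 0;
}

/* stepHours: timestep length of the first resumed climate row, in hours
 * (assumed unit). Error if the first row is not strictly after the
 * boundary; warn if it starts more than one timestep after midnight. */
ResumeStatus checkResume(Stamp boundary, Stamp first, double stepHours) {
  assert(stepHours > 0.0);
  if (compareStamps(first, boundary) <= 0) {
    return RESUME_ERROR;
  }
  if (first.time > stepHours) {
    return RESUME_WARN;
  }
  return RESUME_OK;
}
```

Comparing year before day keeps the check correct across a year rollover, where the resumed day number is smaller than the boundary day number.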
|
|
||
| Event files must be segmented to the same time boundaries as climate segments. | ||
|
|
||
| ## When Saved State Changes | ||
|
|
||
| If you add saved state or change an existing saved payload: | ||
|
|
||
| 1. Update the serialized payload type and restart read/write logic in `src/sipnet/restart.c`. | ||
| 2. Update the `RESTART_SCHEMA_LAYOUT_*` constants, static asserts, and runtime schema-layout validation. | ||
| 3. Update restart docs/tests and bump `RESTART_SCHEMA_VERSION`. | ||
|
|
||
| ## Struct Drift Guards | ||
|
|
||
| Restart schema v1.0 includes compile-time and runtime drift guards so struct layout changes cannot silently pass: | ||
|
|
||
| - Compile-time guards: `_Static_assert` checks in `src/sipnet/restart.c` for `Envi`, `Trackers`, `PhenologyTrackers`, `EventTrackers`, and the expected number of model flags in `Context`. | ||
| - Runtime guards: `schema_layout.*` fields in each checkpoint are validated on load. | ||
| - Test guardrails: `tests/sipnet/test_restart_infrastructure/testRestartMVP.c` verifies schema layout keys are present and rejects tampered values. | ||
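
The pairing of compile-time and runtime guards listed above can be sketched as follows. `DemoEnvi`, `DEMO_SCHEMA_LAYOUT_ENVI_SIZE`, and `layoutMatches` are hypothetical stand-ins for illustration; SIPNET's real structs and `RESTART_SCHEMA_LAYOUT_*` constants differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a saved-state struct; SIPNET's Envi differs. */
typedef struct {
  double plantWoodC;
  double plantLeafC;
  double soilWater;
} DemoEnvi;

/* Expected size recorded with the schema; must be updated with the struct. */
#define DEMO_SCHEMA_LAYOUT_ENVI_SIZE (3 * sizeof(double))

/* Compile-time guard: the build fails if the struct size drifts. */
_Static_assert(sizeof(DemoEnvi) == DEMO_SCHEMA_LAYOUT_ENVI_SIZE,
               "DemoEnvi layout changed: update the restart schema");

/* Runtime guard: compare the size stored in a checkpoint (for example a
 * schema_layout.envi_size value) with the size in the running build. */
int layoutMatches(size_t storedEnviSize) {
  return storedEnviSize == sizeof(DemoEnvi);
}
```

The compile-time check catches a struct edit made without touching the schema constant, while the runtime check catches a checkpoint written by a build with a different layout. A loader would call `layoutMatches` on the parsed value and treat a zero return as a hard error.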
|
|
||
| ## Schema Bump Checklist | ||
|
||
|
|
||
| When intentionally changing the restart schema version: | ||
|
|
||
| 1. Update `src/sipnet/restart.c` in all schema touchpoints: `RESTART_SCHEMA_VERSION`, `RESTART_SCHEMA_LAYOUT_*`, `_Static_assert` layout guards, and checkpoint read/write + required-key validation logic. | ||
| 2. Update restart examples/fixtures to the new header and key set, including the restart fixtures in `tests/sipnet/test_restart_infrastructure/testRestartMVP.c`. | ||
| 3. Update docs that name schema version or key expectations: `docs/developer-guide/restart-checkpoint.md` and `docs/user-guide/running-sipnet.md`. | ||