Skip to content

fix: drop withDefaultValue from StatisticsManager so checkpoint state round-trips#5150

Merged
Yicong-Huang merged 6 commits into
apache:mainfrom
Ma77Ball:fix/checkPointState
May 23, 2026
Merged

fix: drop withDefaultValue from StatisticsManager so checkpoint state round-trips#5150
Yicong-Huang merged 6 commits into
apache:mainfrom
Ma77Ball:fix/checkPointState

Conversation

@Ma77Ball
Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

StatisticsManager declared its input/output stats maps as mutable.Map.empty.withDefaultValue((0L, 0L)). The resulting Map.WithDefault wrapper does not survive a Kryo round-trip (its inner map deserializes as null), so
chkpt.load(CP_STATE_KEY) on a default-state ControllerProcessor throws KryoException: NullPointerException, blocking Controller.loadFromCheckpoint from ever rehydrating a checkpointed controller. This PR removes the wrapper and inlines getOrElse(portId, (0L, 0L)) at the two write sites; behavior is unchanged.

Any related issues, documentation, or discussions?

closes: #4686

How was this PR tested?

Replaced the two existing should be serializable cases in CheckpointSpec with full save then load round-trips (controller + worker) that assert restored.actorId == original.actorId; the new tests reproduce the original NPE on main and pass after the fix. Verified locally with sbt 'WorkflowExecutionService / Test / testOnly org.apache.texera.amber.engine.faulttolerance.CheckpointSpec' (3/3 pass).

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF

@Ma77Ball
Copy link
Copy Markdown
Contributor Author

/request-review @aglinxinyuan

@github-actions github-actions Bot requested a review from aglinxinyuan May 22, 2026 08:40
@github-actions github-actions Bot added engine fix ci changes related to CI labels May 22, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.34%. Comparing base (bf2f92c) to head (322b82d).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5150      +/-   ##
============================================
- Coverage     43.36%   43.34%   -0.03%     
+ Complexity     2217     2211       -6     
============================================
  Files          1049     1049              
  Lines         40560    40558       -2     
  Branches       4322     4320       -2     
============================================
- Hits          17589    17580       -9     
- Misses        21883    21888       +5     
- Partials       1088     1090       +2     
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø) Carriedforward from c1870e3
agent-service 33.76% <ø> (ø) Carriedforward from c1870e3
amber 43.79% <100.00%> (-0.06%) ⬇️
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from c1870e3
config-service 0.00% <ø> (ø) Carriedforward from c1870e3
file-service 32.18% <ø> (ø) Carriedforward from c1870e3
frontend 34.61% <ø> (ø) Carriedforward from c1870e3
python 90.50% <ø> (ø) Carriedforward from c1870e3
workflow-compiling-service 56.81% <ø> (ø) Carriedforward from c1870e3

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions github-actions Bot removed the ci changes related to CI label May 22, 2026
Copy link
Copy Markdown
Contributor

@Yicong-Huang Yicong-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes direction looks correct. I wonder if it has issues with existing DBs (for the checkpointed and persisted states).

Added a comment regarding DataProcessor in StatisticsManager.

Signed-off-by: Matthew B. <mgball@uci.edu>
@Yicong-Huang Yicong-Huang added this pull request to the merge queue May 23, 2026
Merged via the queue into apache:main with commit a7d4bc6 May 23, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ControllerProcessor save→load round-trip through CheckpointState throws Kryo NPE on StatisticsManager.inputStatistics

3 participants