fix(spec): don't panic on overwrite-truncate when previous totals exceed i32::MAX or are non-numeric #2511
Open
SreeramGarlapati wants to merge 1 commit into
Conversation
…e, widen totals to u64
`update_snapshot_summaries` on the overwrite-with-truncate path called
`truncate_table_summary(...).map_err(...).unwrap()`, turning a recoverable
parse error into a process-level panic. Two ways to hit it from normal API
usage:
1. `get_prop` parsed previous totals as `i32` (max ~2.15B). The rest of
this file already parses these counters as `u64`, so `get_prop` was
the odd one out and any table that legitimately accumulated more than
`i32::MAX` data files / records / file-size bytes panicked the writer
on the next overwrite-with-truncate.
2. Any non-numeric previous-total value (corruption, manual edits, a
foreign implementation) produced a parse `Err` and the same panic.
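Both failure modes can be reproduced with nothing but the standard library's integer parsing. The sketch below is illustrative only: `parse_as_i32` and `parse_as_u64` are hypothetical helpers, not functions from this crate, and stand in for the old and new parsing behaviour respectively.

```rust
use std::num::{IntErrorKind, ParseIntError};

/// Hypothetical stand-in for the old behaviour: parse a summary value as i32.
fn parse_as_i32(value: &str) -> Result<i32, ParseIntError> {
    value.parse::<i32>()
}

/// Hypothetical stand-in for the new behaviour: parse a summary value as u64.
fn parse_as_u64(value: &str) -> Result<u64, ParseIntError> {
    value.parse::<u64>()
}

fn main() {
    // One past i32::MAX, as the string a summary map would hold.
    let big = (i32::MAX as u64 + 1).to_string();

    // Failure mode 1: the value is a valid count but does not fit in i32.
    let err = parse_as_i32(&big).unwrap_err();
    assert_eq!(err.kind(), &IntErrorKind::PosOverflow);
    println!("i32 parse of {big}: {err}");

    // The same value parses cleanly as u64.
    assert_eq!(parse_as_u64(&big).unwrap(), 2_147_483_648);

    // Failure mode 2: a non-numeric value fails regardless of the width.
    let err = parse_as_u64("not_a_number").unwrap_err();
    assert_eq!(err.kind(), &IntErrorKind::InvalidDigit);
    println!("u64 parse of not_a_number: {err}");
}
```

Widening the type removes the first failure mode only; the second still yields an `Err`, which is why the `.unwrap()` also has to go.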
Switch `get_prop` to `u64` to match the rest of the file and give a more
specific error message including the offending property name and value.
Replace the `.unwrap()` on the truncation result with `?`, preserving the
existing "Failed to truncate table summary." wrapper as caller context.
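The shape of the change can be sketched roughly as follows. This is not the crate's code: `SummaryError`, the `get_prop` signature, the treatment of a missing property as 0, and the `truncated_data_files` caller are all assumptions made so the example is self-contained. Only the `u64` parse, the property name and value in the inner error, the "Failed to truncate table summary." wrapper, and the `?` come from the description above.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type standing in for the crate's own error.
#[derive(Debug)]
struct SummaryError {
    message: String,
    source: Option<Box<SummaryError>>,
}

impl fmt::Display for SummaryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)?;
        if let Some(source) = &self.source {
            write!(f, " Caused by: {source}")?;
        }
        Ok(())
    }
}

/// Parse a previous total as u64, naming the property and value on failure.
/// Treating a missing property as 0 is an assumption of this sketch.
fn get_prop(summary: &HashMap<String, String>, prop: &str) -> Result<u64, SummaryError> {
    match summary.get(prop) {
        None => Ok(0),
        Some(raw) => raw.parse::<u64>().map_err(|e| SummaryError {
            message: format!("failed to parse property '{prop}' with value '{raw}': {e}"),
            source: None,
        }),
    }
}

/// Hypothetical caller: wrap the inner error for context, then propagate
/// it with `?` instead of calling `.unwrap()`.
fn truncated_data_files(previous: &HashMap<String, String>) -> Result<u64, SummaryError> {
    let total = get_prop(previous, "total-data-files").map_err(|e| SummaryError {
        message: "Failed to truncate table summary.".to_string(),
        source: Some(Box::new(e)),
    })?;
    Ok(total)
}

fn main() {
    let mut previous = HashMap::new();
    previous.insert("total-data-files".to_string(), "not_a_number".to_string());
    match truncated_data_files(&previous) {
        Ok(n) => println!("deleted-data-files = {n}"),
        Err(e) => println!("error: {e}"),
    }
}
```

With `?`, the malformed value surfaces to the caller as an error naming both the attempted operation and the offending property, rather than aborting the process.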
Two unit tests cover the two failure modes, each pinned via the `expect`/
`expect_err` API so a future regression to the panic path fails the test
loudly rather than silently re-introducing the crash. Verified that both
tests panic at `snapshot_summary.rs:356:18` against the unfixed code
(red-state proof) and pass cleanly against the fix.
Closes apache#2510.
Author
blackmwk
approved these changes
May 26, 2026
Contributor
blackmwk
left a comment
Thanks @SreeramGarlapati for this pr, just one minor point!
@@ -409,16 +407,16 @@ pub(crate) fn update_snapshot_summaries(
}

#[allow(dead_code)]
Contributor
Should we remove this attribute?
Author
Good catch. I'm keeping this PR's scope tight to the panic fix, so I opened #2514 as a follow-up that drops all four stale `#[allow(dead_code)]` attributes in this file (this one plus the ones on `update_snapshot_summaries`, `truncate_table_summary`, and `update_totals`, all stale for the same reason).
I verified that `cargo check` and `cargo clippy --all-features --tests -- -D warnings` stay clean after the removals.
@blackmwk, given that the follow-up addresses the comment, could you merge the current PR?
Which issue does this PR close?
Closes #2510.
What changes are included in this PR?
`update_snapshot_summaries` on the overwrite-with-truncate path was doing `truncate_table_summary(...).map_err(...).unwrap()`. The `unwrap()` turned a recoverable parse error into a process panic, and there are two ways to trip it from normal API usage.

The first is a sneaky one: `get_prop` parsed the previous snapshot's totals as `i32`, even though every other counter path in the same file already uses `u64`. So once a table accumulated more than ~2.15B data files / records / bytes (i.e. crossed `i32::MAX`), the next overwrite-with-truncate would parse-fail, hit the unwrap, and crash the writer. Quietly reachable too: nothing in the public API hints that `i32` is the secret ceiling.

The second is corruption / cross-implementation interop: any non-numeric value in a previous `total-*` property (a foreign implementation, a manual edit, a half-corrupt metadata file) produces a parse error and the same panic.

The fix is small. Widen `get_prop` to `u64` so it matches the rest of the file and lifts the artificial ceiling. Replace the `.unwrap()` with `?`, keeping the existing "Failed to truncate table summary." wrapper as caller context so the error chain still tells you what was being attempted when the inner parse blew up. `get_prop`'s error message now includes the offending property name and value, which is what you actually need when triaging a malformed metadata file.

Scope-wise this only touches the overwrite-truncate path. `update_totals` has a structurally similar `unwrap()` antipattern that is a separate (lower-severity) issue and not included here.

Are these changes tested?
Yes, two unit tests in `snapshot_summary.rs`, both red-state proven against the unfixed code:

The first asserts that a previous summary with `total-data-files = i32::MAX + 1` (and `total-records` similarly) survives the truncation and lands in `deleted-data-files` / `deleted-records` intact. Against the unfixed code this panics at `snapshot_summary.rs:356:18` with "number too large to fit in target type", which is the i32-ceiling bug demonstrated directly.

The second feeds `"not_a_number"` as a previous total and asserts the function returns an `Err` whose message carries the "truncate table summary" wrapper. Against the unfixed code this panics at the same site with "invalid digit found in string", which is the malformed-input bug.

I ran each test against the unfixed code, observed the panic, applied the fix, and watched them go green. Existing `test_truncate_table_summary` and `test_update_snapshot_summaries_append` continue to pass, and `cargo fmt --check` / `cargo clippy --all-features --tests -- -D warnings` are clean.
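For readers who want the shape of the two tests without opening the diff, here is a self-contained approximation. `truncate_totals` is a hypothetical stand-in for the truncate path, and the check functions are plain functions called from `main` so the sketch runs on its own; in the crate they would be `#[test]` functions exercising the real code. The pattern of interest is pinning the success case with `expect` and the failure case with `expect_err`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the truncate path: moves each previous total
/// into the matching deleted-* counter, returning Err on unparsable input.
fn truncate_totals(
    previous: &HashMap<String, String>,
) -> Result<HashMap<String, String>, String> {
    let mut out = HashMap::new();
    for (total, deleted) in [
        ("total-data-files", "deleted-data-files"),
        ("total-records", "deleted-records"),
    ] {
        if let Some(raw) = previous.get(total) {
            let n = raw.parse::<u64>().map_err(|e| {
                format!("Failed to truncate table summary: '{total}' = '{raw}': {e}")
            })?;
            out.insert(deleted.to_string(), n.to_string());
        }
    }
    Ok(out)
}

/// Totals above i32::MAX must survive truncation intact.
fn check_large_totals_survive() {
    let big = (i32::MAX as u64 + 1).to_string();
    let mut previous = HashMap::new();
    previous.insert("total-data-files".to_string(), big.clone());
    previous.insert("total-records".to_string(), big.clone());

    let out = truncate_totals(&previous).expect("totals above i32::MAX must not fail");
    assert_eq!(out.get("deleted-data-files"), Some(&big));
    assert_eq!(out.get("deleted-records"), Some(&big));
}

/// A non-numeric total must come back as Err, not as a panic.
fn check_non_numeric_total_is_err() {
    let mut previous = HashMap::new();
    previous.insert("total-records".to_string(), "not_a_number".to_string());

    let err = truncate_totals(&previous).expect_err("non-numeric total must be an Err");
    assert!(err.contains("truncate table summary"));
}

fn main() {
    check_large_totals_survive();
    check_non_numeric_total_is_err();
    println!("both checks passed");
}
```

Because `expect` panics on `Err` and `expect_err` panics on `Ok`, a regression in either direction fails the check immediately.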