Skip to content

Conversation

jasonstack
Copy link

@jasonstack jasonstack commented Sep 22, 2025

What is the issue

https://github.com/riptano/cndb/issues/15141

What does this PR fix and why was it fixed

Added compaction task verification before transaction commit to validate if all source sstables' min/max keys are present in output sstables or expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy skipping some subranges of source sstables HCD-130.

It works as following:

  1. Extract all boundary keys (min/max) from source sstable
  2. Check if boundary keys exist in output sstables
    a If all present, validation passes
    b. If any keys are missing, continue validation
  3. For missing keys, read partition data from source sstables
  4. Apply tombstone purging by using gc_grace_seconds=0
  5. Check purged content
    a If there is no live data, validation passes. Keys are properly obsoleted
    b If there is live data, validation fails and throws to abort compaction if it's not dry run.

Configs:

  • cassandra.compaction_validation_mode=NONE, available options: NONE/WARN/ABORT

@github-actions
Copy link

github-actions bot commented Sep 22, 2025

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

return;

// Early-open modifies obsolete SSTables' start keys, making validation unreliable
if (DatabaseDescriptor.getSSTablePreemptiveOpenIntervalInMB() > 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside of CNDB this is a very significant limitation as the database is expected to always run with early open enabled. Would it work to look at the matching entry for inputs in the CANONICAL sstable set?

(If not, we need to issue a warning somewhere when the combination of these two options is set.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think it's possible to get the original sstables with original start position from LifecycleTransaction

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed e96829ba514609e3ca4cf3e4361bedb962cca284 to use the original sstables instead of logged.obsolete which are modified by early-open

@jasonstack jasonstack force-pushed the cndb-15141-main branch 2 times, most recently from 021d5f8 to 8a98d05 Compare September 25, 2025 07:04
/**
* Whether to enable compaction validation to detect potential data loss and abort compaction
*/
COMPACTION_VALIDATION_ENABLED("cassandra.compaction_validation_enabled", "true"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to be turned on by default everywhere? And do we want it in non-abort mode?

Related to this, I would change the flag to be an enum (e.g NONE/WARN/ABORT) to make the configuration cleaner.

Copy link
Author

@jasonstack jasonstack Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed 3ab5de9f134 to change config to enum (NONE/WARN/ABORT).

For CC, the default is NONE.
For CNDB side, it would be ABORT for compactor and NONE for all other services.

wdyt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasonstack jasonstack force-pushed the cndb-15141-main branch 2 times, most recently from b5d44cf to 3ab5de9 Compare September 26, 2025 05:28
@sonarqubecloud
Copy link

@jasonstack jasonstack merged commit 5ea1221 into main Sep 26, 2025
426 of 463 checks passed
@jasonstack jasonstack deleted the cndb-15141-main branch September 26, 2025 07:42
@cassci-bot
Copy link

✔️ Build ds-cassandra-pr-gate/PR-2008 approved by Butler


Approved by Butler
See build details here

michaelsembwever pushed a commit that referenced this pull request Oct 9, 2025
…lidation before txn prepare (#2008)

riptano/cndb#15141

Added compaction task verification before transaction commit to validate
if all source sstables' min/max keys are present in output sstables or
expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy
skipping some subranges of source sstables HCD-130.

It works as following:
1. Extract all boundary keys (min/max) from source sstable
2. Check if boundary keys exist in output sstables
  a If all present, validation passes
  b. If any keys are missing, continue validation
3. For missing keys, read partition data from source sstables
4. Apply tombstone purging by using `gc_grace_seconds=0`
5. Check purged content
a If there is no live data, validation passes. Keys are properly
obsoleted
b If there is live data, validation fails and throws to abort compaction
if it's configured to abort

Configs:
* `cassandra.compaction_validation_mode=NONE`, available options:
NONE/WARN/ABORT
michaelsembwever pushed a commit that referenced this pull request Oct 10, 2025
…lidation before txn prepare (#2008)

riptano/cndb#15141

Added compaction task verification before transaction commit to validate
if all source sstables' min/max keys are present in output sstables or
expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy
skipping some subranges of source sstables HCD-130.

It works as following:
1. Extract all boundary keys (min/max) from source sstable
2. Check if boundary keys exist in output sstables
  a If all present, validation passes
  b. If any keys are missing, continue validation
3. For missing keys, read partition data from source sstables
4. Apply tombstone purging by using `gc_grace_seconds=0`
5. Check purged content
a If there is no live data, validation passes. Keys are properly
obsoleted
b If there is live data, validation fails and throws to abort compaction
if it's configured to abort

Configs:
* `cassandra.compaction_validation_mode=NONE`, available options:
NONE/WARN/ABORT
michaelsembwever pushed a commit that referenced this pull request Oct 14, 2025
…lidation before txn prepare (#2008)

riptano/cndb#15141

Added compaction task verification before transaction commit to validate
if all source sstables' min/max keys are present in output sstables or
expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy
skipping some subranges of source sstables HCD-130.

It works as following:
1. Extract all boundary keys (min/max) from source sstable
2. Check if boundary keys exist in output sstables
  a If all present, validation passes
  b. If any keys are missing, continue validation
3. For missing keys, read partition data from source sstables
4. Apply tombstone purging by using `gc_grace_seconds=0`
5. Check purged content
a If there is no live data, validation passes. Keys are properly
obsoleted
b If there is live data, validation fails and throws to abort compaction
if it's configured to abort

Configs:
* `cassandra.compaction_validation_mode=NONE`, available options:
NONE/WARN/ABORT
michaelsembwever pushed a commit that referenced this pull request Oct 15, 2025
…lidation before txn prepare (#2008)

riptano/cndb#15141

Added compaction task verification before transaction commit to validate
if all source sstables' min/max keys are present in output sstables or
expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy
skipping some subranges of source sstables HCD-130.

It works as following:
1. Extract all boundary keys (min/max) from source sstable
2. Check if boundary keys exist in output sstables
  a If all present, validation passes
  b. If any keys are missing, continue validation
3. For missing keys, read partition data from source sstables
4. Apply tombstone purging by using `gc_grace_seconds=0`
5. Check purged content
a If there is no live data, validation passes. Keys are properly
obsoleted
b If there is live data, validation fails and throws to abort compaction
if it's configured to abort

Configs:
* `cassandra.compaction_validation_mode=NONE`, available options:
NONE/WARN/ABORT
michaelsembwever pushed a commit that referenced this pull request Oct 15, 2025
…lidation before txn prepare (#2008)

riptano/cndb#15141

Added compaction task verification before transaction commit to validate
if all source sstables' min/max keys are present in output sstables or
expired if missing from output sstables.

This is mainly to detect data loss caused by compaction strategy
skipping some subranges of source sstables HCD-130.

It works as following:
1. Extract all boundary keys (min/max) from source sstable
2. Check if boundary keys exist in output sstables
  a If all present, validation passes
  b. If any keys are missing, continue validation
3. For missing keys, read partition data from source sstables
4. Apply tombstone purging by using `gc_grace_seconds=0`
5. Check purged content
a If there is no live data, validation passes. Keys are properly
obsoleted
b If there is live data, validation fails and throws to abort compaction
if it's configured to abort

Configs:
* `cassandra.compaction_validation_mode=NONE`, available options:
NONE/WARN/ABORT
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants