Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: assert single deletes do indeed delete keys #115881

Closed
jbowens opened this issue Dec 8, 2023 · 0 comments
Closed

storage: assert single deletes do indeed delete keys #115881

jbowens opened this issue Dec 8, 2023 · 0 comments
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented Dec 8, 2023

Cockroach's single deletes are expected to delete exactly one key. In Pebble if a single delete is being elided but does not drop any keys, Cockroach's invariants have been broken somehow. We can add a runtime assertion in this case and fatal the process.

Related to #114492.

Jira issue: CRDB-34266

@jbowens jbowens added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Dec 8, 2023
@jbowens jbowens added this to Incoming in (Deprecated) Storage via automation Dec 8, 2023
@nicktrav nicktrav moved this from Incoming to In Progress (this milestone) in (Deprecated) Storage Dec 12, 2023
sumeerbhola added a commit to sumeerbhola/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
cockroachdb#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to sumeerbhola/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
cockroachdb#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to sumeerbhola/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
cockroachdb#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to cockroachdb/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to sumeerbhola/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
cockroachdb#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to cockroachdb/pebble that referenced this issue Dec 13, 2023
Two callbacks are added to Options:
- IneffectualSingleDeleteCallback is called when a SingleDelete is
  being elided without encountering a Set/Merge/SetWithDelete.
- SingleDeleteInvariantViolationCallback is called when a SingleDelete
  has at least a pair of Set/Merge/SetWithDelete below it, since it
  violates the usage requirement of a SingleDelete.

The metamorphic test now fails on the SingleDeleteInvariantViolationCallback.

The support for Merge to be SingleDeleted, that was introduced in
#3118, is slightly improved
in that SingleDelete consumes a single Merge instead of turning into
a Delete.

A subsequent change in CockroachDB will fatal or increment a metric
(based on cluster settings) for these callbacks.

Informs cockroachdb/cockroach#115881
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 14, 2023
93aaf52c3be db: add compaction callbacks for unexpected SingleDelete behavior
ba91febb8d4 db: compactionIter.nextInStripe skips RANGEDEL covered keys in the same stripe
f065d7a4d05 db: fix SINGLEDEL, MERGE + DEL interaction

Informs cockroachdb#115881
Informs cockroachdb#114421

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 14, 2023
cockroachdb/pebble@ab4952c5 metamorphic: track object key bounds
cockroachdb/pebble@288bf0fb db: add compaction callbacks for unexpected SingleDelete behavior

Release note: None

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none
blathers-crl bot pushed a commit that referenced this issue Dec 14, 2023
93aaf52c3be db: add compaction callbacks for unexpected SingleDelete behavior
ba91febb8d4 db: compactionIter.nextInStripe skips RANGEDEL covered keys in the same stripe
f065d7a4d05 db: fix SINGLEDEL, MERGE + DEL interaction

Informs #115881
Informs #114421

Release note: None
craig bot pushed a commit that referenced this issue Dec 14, 2023
116103: logictest: uncomment composite types test that passes r=rafiss a=rafiss

This was broken for the legacy schema changer, but there's no need to skip it now that it works in the new schema changer.

informs #91972
Release note: None

116383: opt: disallow locking table references on null-extended side of outer join r=rytaft a=rytaft

In #115795 we disallowed locking tables on the null-extended side of outer joins, but did not make the same change for numeric table references. This commit fixes that oversight.

Informs #97434

There is no release note since #115795 has not yet been included in any release.

Release note: None

116385: ui: default to descending hot ranges order r=koorosh a=kvoli

The hot ranges page default sort setting changed from descending to ascending in b06b6f0. Change it back to descending, so that the hottest ranges appear first by default.

Epic: none
Release note: None

116452: roachtest: pass startopts to SetDefaultPort helpers by reference r=srosenberg,RaduBerinde,renatolabs a=DarrylWong

Startopts was being passed by value, effectively making the helpers noops.

Release note: none
Epic: none
Fixes: #116432 
Fixes: #116433 
Fixes: #116485

116463: go.mod: bump Pebble to ab4952c5f87b r=jbowens,aadityasondhi a=sumeerbhola

cockroachdb/pebble@ab4952c5 metamorphic: track object key bounds
cockroachdb/pebble@288bf0fb db: add compaction callbacks for unexpected SingleDelete behavior

Release note: None

Informs #115881
Informs #114421

Epic: none

116477: bazel: remove spare line r=rail a=rickystewart

--incompatible_strict_action_env is set regardless, so we don't need to specify it again.

Epic: CRDB-8308
Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: DarrylWong <darryl@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 15, 2023
93aaf52c3be db: add compaction callbacks for unexpected SingleDelete behavior
ba91febb8d4 db: compactionIter.nextInStripe skips RANGEDEL covered keys in the same stripe
f065d7a4d05 db: fix SINGLEDEL, MERGE + DEL interaction

Release note: None

Informs cockroachdb#115881
Informs cockroachdb#114421

Release note: None

Release justification: additional invariant checking to help uncover cockroachdb#114421.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 15, 2023
Handlers are added for SingleDeleteInvariantViolationCallback and
IneffectualSingleDeleteCallback options in Pebble. By default, they crash
the process. If crashing is turned off, they increment metrics,
storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled default to true
since these can be indicative of a serious bug that could corrupt
data.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 15, 2023
Handlers are added for SingleDeleteInvariantViolationCallback and
IneffectualSingleDeleteCallback options in Pebble. By default, they crash
the process. If crashing is turned off, they increment metrics,
storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled default to true
since these can be indicative of a serious bug that could corrupt
data.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 15, 2023
Handlers are added for SingleDeleteInvariantViolationCallback and
IneffectualSingleDeleteCallback options in Pebble. By default, they crash
the process. If crashing is turned off, they increment metrics,
storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled default to true
since these can be indicative of a serious bug that could corrupt
data.
craig bot pushed a commit that referenced this issue Dec 15, 2023
116544: storage: add single delete callbacks to crash, or increment metrics r=jbowens a=sumeerbhola

Handlers are added for SingleDeleteInvariantViolationCallback and IneffectualSingleDeleteCallback options in Pebble. By default, they crash the process. If crashing is turned off, they increment metrics, storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and storage.single_delete.crash_on_ineffectual.enabled default to true since these can be indicative of a serious bug that could corrupt data.

116549: sql,opt: don't validate AOST during session migration r=rafiss a=rafiss

When re-preparing a statement for a session migration, we want to skip evaluating and validating the AS OF SYSTEM TIME clause. During session migrations, we know that the statement will just be prepared, and not executed, and each statement could have different AOST timestamps. Therefore it is incorrect to evaluate the AOST clause and fix the transaction timestamp.

No release note since this fixes a bug that only affects Serverless. See [here](https://cockroachlabsgcp.splunkcloud.com/en-US/app/search/search?earliest=1702141875&latest=1702401075&q=search%20index%3Dcc-app-crdb*%20%20%22could%20not%20prepare%20statement%20during%20session%20migration%22%20host_ip%3D%2210.4.0.19%22&display.page.search.mode=verbose&dispatch.sample_ratio=1&display.general.type=events&display.page.search.tab=events&workload_pool=standard_perf&sid=1702579453.274591) for an example of the error.

Epic: None
Release note: None

116567: roachtest: run workload after 8tb OR roachtest r=adityamaru a=msbutler

Previously the tpce workload would not run after the 8tb OR test, because of a misconfiguration in the test specs. This fixes this bug.

Epic: none

Release note: none

116580: roachtest: fix mininum version calculation in `UploadWorkload` r=rickystewart,DarrylWong a=renatolabs

In ARM64 builds, the `workload` binary is only available in 23.2+.

Epic: none

Release note: None

116587: roachtest: don't run `schemachange` workload with upgrade migrations r=annrpom,DarrylWong a=renatolabs

This has been flaking on all branches.

Informs: #116586.
Fixes: #116304.
Fixes: #116425.
Fixes: #116357.

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Michael Butler <butler@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue Dec 15, 2023
Handlers are added for SingleDeleteInvariantViolationCallback and
IneffectualSingleDeleteCallback options in Pebble. By default, they crash
the process. If crashing is turned off, they increment metrics,
storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled default to true
since these can be indicative of a serious bug that could corrupt
data.
blathers-crl bot pushed a commit that referenced this issue Dec 15, 2023
Handlers are added for SingleDeleteInvariantViolationCallback and
IneffectualSingleDeleteCallback options in Pebble. By default, they crash
the process. If crashing is turned off, they increment metrics,
storage.single-delete.invariant-violation and
storage.single-delete.ineffectual.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled default to true
since these can be indicative of a serious bug that could corrupt
data.
(Deprecated) Storage automation moved this from In Progress (this milestone) to Done Dec 18, 2023
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 20, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 20, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Dec 20, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
jbowens pushed a commit to jbowens/cockroach that referenced this issue Dec 20, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs cockroachdb#115881
Informs cockroachdb#114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
craig bot pushed a commit that referenced this issue Dec 21, 2023
116889: storage: disable crashing for single delete callbacks r=sumeerbhola a=sumeerbhola

As elaborated in code comments, these can be false positives caused by delete-only compactions. The metrics are not disabled, under the rare chance that we see a problem in a cluster and one of the metrics (if it happens to not be a false positive) gives us a hint about the cause of the problem. For the same reason, we log such callbacks every 5 minutes, so we can correlate the key in such logs with an actual problem. False positives should be rare, especially for the invariant violation callback.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and storage.single_delete.crash_on_ineffectual.enabled are disabled and must not be enabled.

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue Dec 21, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
blathers-crl bot pushed a commit that referenced this issue Dec 21, 2023
As elaborated in code comments, these can be false positives caused by
delete-only compactions. The metrics are not disabled, under the rare
chance that we see a problem in a cluster and one of the metrics (if
it happens to not be a false positive) gives us a hint about the cause
of the problem. For the same reason, we log such callbacks every 5
minutes, so we can correlate the key in such logs with an actual
problem. False positives should be rare, especially for the invariant
violation callback.

Informs #115881
Informs #114421

Epic: none

Release note (ops change): The cluster settings
storage.single_delete.crash_on_invariant_violation.enabled and
storage.single_delete.crash_on_ineffectual.enabled are disabled and
must not be enabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-storage Storage Team
Projects
Archived in project
Development

No branches or pull requests

2 participants