sql: DROP DATABASE CASCADE leaves dependent objects orphaned #51782

Closed
arulajmani opened this issue Jul 22, 2020 · 1 comment · Fixed by #51813
Labels
A-schema-changes · C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.)

Comments

@arulajmani
Collaborator

When we drop a database, we queue up a single job to perform the schema changes for all objects in the database, instead of having each object queue up its own job. The job's list of objects is built from a filtered view of the database's contents: objects that depend on other objects (and would therefore be handled by the cascade of those objects' drops) are excluded. As a result, objects that depend on other objects within the database are left with orphaned namespace/descriptor table entries.

To Reproduce

CREATE DATABASE db;
CREATE TABLE db.foo (a INT);
CREATE VIEW db.v AS SELECT a FROM db.foo;
DROP DATABASE db CASCADE;
SELECT * FROM system.namespace;

There are orphaned entries for v that shouldn't be there.
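For illustration, the leftover namespace row for v looks roughly like this (the IDs below are invented and will differ on any given cluster):

```
SELECT * FROM system.namespace WHERE name = 'v';
--   parentID | parentSchemaID | name | id
-- -----------+----------------+------+-----
--         52 |             29 | v    | 54
-- The database and the view were supposed to be gone, but this namespace
-- entry (and the matching system.descriptor row) are left behind.
```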

@arulajmani arulajmani added the C-bug and A-schema-changes labels Jul 22, 2020
@arulajmani
Collaborator Author

This problem also arises with owned sequences when the sequence-table ownership is cross-database, so whatever solution we construct for dependent objects should cover them as well. We should find such dependencies and add them to the DroppedTableDetails list passed to createDropDatabaseJob, too.
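A hedged sketch of the cross-database ownership case (all object names are invented, and this assumes the OWNED BY sequence syntax available at the time):

```
CREATE DATABASE db1;
CREATE DATABASE db2;
CREATE SEQUENCE db1.seq;
-- A table in db2 uses, and then owns, a sequence that lives in db1.
CREATE TABLE db2.t (a INT DEFAULT nextval('db1.seq'));
ALTER SEQUENCE db1.seq OWNED BY db2.t.a;
-- Dropping db2 should also drop db1.seq, because the owning column goes away.
-- If the sequence isn't in the drop job's list, its namespace/descriptor
-- entries are orphaned.
DROP DATABASE db2 CASCADE;
SELECT * FROM system.namespace WHERE name = 'seq';
```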

@arulajmani arulajmani self-assigned this Jul 22, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 23, 2020
There were a few things going on here because of the way we filtered
tables during `DROP DATABASE CASCADE`. The intention was to filter
out objects that depended on other objects also being deleted (and would
therefore be deleted by the CASCADE nature of object drops). This
assumption is correct for table-view and view-view dependencies -- but
not for sequences. We also switched from individual schema change jobs
during a database drop to a single job, which doesn't play well with
this filtering -- as no new jobs are queued, all objects that were
filtered would leave orphaned namespace/descriptor entries behind.
It's interesting to note that the filtering didn't directly cause this
issue; it just made the underlying issue more visible -- the single drop
database schema change job relies on knowing about all object
descriptors that need to be dropped upfront. Orphaned entries that
previously would only have occurred for cross-database dependencies can
occur in the same database because of the filtering. This explains the
shape of the fix.

As part of this patch, I've tried to make the preparation steps for
dropping databases more explicit. First, we accumulate all objects
that need to be deleted. This includes objects that will be implicitly
deleted, such as owned sequences and dependent views (both in the
database being deleted and in other databases that aren't). The
schema change job uses this new list to ensure entries are appropriately
cleaned up. We still perform the filter step as before to identify
objects that are the "root" of a drop, and only call drop on these
objects. The only change here is that instead of accumulating dependent
objects, we explicitly accumulate cascading views.

Fixes cockroachdb#51782
Fixes cockroachdb#50997

Release note (bug fix): Before this change, we would leave orphaned
system.namespace/system.descriptor entries if we ran a
`DROP DATABASE CASCADE` and the database contained "dependency"
relations. For example, if the database included a view which
depended on a table in the database, dropping the database would result
in an orphaned entry for the view. Same thing for a sequence that was
used by a table in the database. (See cockroachdb#51782 for repro steps). This bug
is now fixed, and cleanup happens as expected.
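To make the dependent-view cases above concrete, here is a rough example of a view outside the dropped database depending on a table inside it (names invented; cross-database references were still permitted at the time):

```
CREATE DATABASE db1;
CREATE DATABASE db2;
CREATE TABLE db1.t (a INT);
-- db2.v lives outside db1 but depends on db1.t.
CREATE VIEW db2.v AS SELECT a FROM db1.t;
-- The cascade must also drop db2.v, so the single schema change job has to
-- include it in the list of descriptors it cleans up.
DROP DATABASE db1 CASCADE;
SELECT * FROM system.namespace WHERE name = 'v';
```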
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 24, 2020
craig bot pushed a commit that referenced this issue Jul 24, 2020
51813: sql: correctly identify and cleanup dependent objects in drop database r=ajwerner a=arulajmani

@craig craig bot closed this as completed in 5063b61 Jul 24, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 24, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this issue Oct 20, 2020
There are bugs in previous versions (cockroachdb#51782, cockroachdb#54861, and surely others) that
can leave descriptors in an invalid state. When this occurs, the only recourse
is to inject either repaired versions directly or other descriptors that allow
normal DDL statements to reach the desired state. Prior to this change,
the mechanism by which this repair could be achieved was a bit of a hack
whereby the "node" user, which is used internally to muck with certain system
tables, could be granted to the "root" user for the purpose of updating entries
in tables like system.namespace and system.descriptor. Even this was
problematic because those tables do not use all of their column families
when written to using the KV API. That made writing new rows to such tables
extremely dangerous, as low-level code for determining zone configurations in
kv assumes that there will be appropriate values in all kv rows in those
tables (which is not the case for certain column families).

Fixes cockroachdb#50948.

Release note (sql change): Added admin-only, crdb_internal functions to enable
descriptor repair in dire circumstances.
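As a rough sketch of how such a repair might look with these builtins (the function names and signatures are my reading of this PR's description and may differ by version; all IDs are invented):

```
-- Remove the orphaned namespace entry and descriptor left behind for a view
-- named 'v'. Here 52 = the dropped database's ID, 29 = the public schema, and
-- 54 = the view's descriptor ID -- hypothetical values for illustration only.
SELECT crdb_internal.unsafe_delete_namespace_entry(52, 29, 'v', 54);
SELECT crdb_internal.unsafe_delete_descriptor(54);
```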
ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 18, 2020
craig bot pushed a commit that referenced this issue Nov 18, 2020
55699: sql,builtins: add builtin functions to perform descriptor repair r=lucy-zhang a=ajwerner

ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 20, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 20, 2020