sql: DROP DATABASE CASCADE leaves dependent objects orphaned #51782

Closed
arulajmani opened this issue Jul 22, 2020 · 1 comment · Fixed by #51813
Labels
A-schema-changes · C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.)

Comments

@arulajmani
Collaborator

When we drop a database, we queue up a single job to perform the schema changes for all objects in the database, instead of having each object queue up its own job. The job's list of objects is built from a filtered view of the database's contents: objects that depend on other objects (and would therefore be handled by the cascade of those objects' drops) are excluded. As a result, objects that depend on other objects within the database are left with orphaned namespace/descriptor table entries.

To Reproduce

CREATE DATABASE db;
CREATE TABLE db.foo (a INT);
CREATE VIEW db.v AS SELECT a FROM db.foo;
DROP DATABASE db CASCADE;
SELECT * FROM system.namespace;

There are orphaned entries for v that shouldn't be there.
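For illustration, the leftover namespace row for v looks roughly like this (the IDs below are invented and will differ on any given cluster):

```
SELECT * FROM system.namespace WHERE name = 'v';
--   parentID | parentSchemaID | name | id
-- -----------+----------------+------+-----
--         52 |             29 | v    | 54
-- The database and the view were supposed to be gone, but this namespace
-- entry (and the matching system.descriptor row) are left behind.
```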

@arulajmani arulajmani added the C-bug and A-schema-changes labels Jul 22, 2020
@arulajmani
Collaborator Author

This problem also arises with owned sequences when the sequence-table ownership is cross-database, so whatever solution we construct for dependent objects should cover them as well. We should find such dependencies and add them to the DroppedTableDetails list passed to createDropDatabaseJob, too.
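A hedged sketch of the cross-database ownership case (all object names are invented, and this assumes the OWNED BY sequence syntax available at the time):

```
CREATE DATABASE db1;
CREATE DATABASE db2;
CREATE SEQUENCE db1.seq;
-- A table in db2 uses, and then owns, a sequence that lives in db1.
CREATE TABLE db2.t (a INT DEFAULT nextval('db1.seq'));
ALTER SEQUENCE db1.seq OWNED BY db2.t.a;
-- Dropping db2 should also drop db1.seq, because the owning column goes away.
-- If the sequence isn't in the drop job's list, its namespace/descriptor
-- entries are orphaned.
DROP DATABASE db2 CASCADE;
SELECT * FROM system.namespace WHERE name = 'seq';
```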

@arulajmani arulajmani self-assigned this Jul 22, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 23, 2020
There were a few things going on here because of the way we filtered
tables during `DROP DATABASE CASCADE`. The intention was to filter
out objects that depended on other objects also being deleted (and would
therefore be deleted by the CASCADE nature of object drops). This
assumption is correct for table-view and view-view dependencies -- but
not for sequences. We also switched from individual schema change jobs
during a database drop to a single job, which doesn't play well with
this filtering -- as no new jobs are queued, all objects that were
filtered would leave orphaned namespace/descriptor entries behind.
It's interesting to note that the filtering didn't directly cause this
issue; it just made the underlying issue more visible -- the single drop
database schema change job relies on knowing about all object
descriptors that need to be dropped upfront. Orphaned entries that
previously would only have occurred for cross-database dependencies can
occur in the same database because of the filtering. This explains the
shape of the fix.

As part of this patch, I've tried to make the preparation steps for
dropping databases more explicit. First, we accumulate all objects
that need to be deleted. This includes objects that will be implicitly
deleted, such as owned sequences and dependent views (both in the
database being deleted and in other databases that aren't). The
schema change job uses this new list to ensure entries are appropriately
cleaned up. We still perform the filter step as before to identify
objects that are the "root" of a drop, and only call drop on these
objects. The only change here is that instead of accumulating dependent
objects, we explicitly accumulate cascading views.

Fixes cockroachdb#51782
Fixes cockroachdb#50997

Release note (bug fix): Before this change, we would leave orphaned
system.namespace/system.descriptor entries if we ran a
`DROP DATABASE CASCADE` and the database contained "dependency"
relations. For example, if the database included a view which
depended on a table in the database, dropping the database would result
in an orphaned entry for the view. Same thing for a sequence that was
used by a table in the database. (See cockroachdb#51782 for repro steps). This bug
is now fixed, and cleanup happens as expected.
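To make the dependent-view cases above concrete, here is a rough example of a view outside the dropped database depending on a table inside it (names invented; cross-database references were still permitted at the time):

```
CREATE DATABASE db1;
CREATE DATABASE db2;
CREATE TABLE db1.t (a INT);
-- db2.v lives outside db1 but depends on db1.t.
CREATE VIEW db2.v AS SELECT a FROM db1.t;
-- The cascade must also drop db2.v, so the single schema change job has to
-- include it in the list of descriptors it cleans up.
DROP DATABASE db1 CASCADE;
SELECT * FROM system.namespace WHERE name = 'v';
```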
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 24, 2020
craig bot pushed a commit that referenced this issue Jul 24, 2020
51813: sql: correctly identify and cleanup dependent objects in drop database r=ajwerner a=arulajmani

@craig craig bot closed this as completed in 5063b61 Jul 24, 2020
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jul 24, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this issue Oct 20, 2020
There are bugs in previous versions (cockroachdb#51782, cockroachdb#54861, and surely others) that
can leave descriptors in an invalid state. When this occurs, the only recourse
is to inject either repaired versions directly or other descriptors that allow
normal DDL statements to reach the desired state. Prior to this change,
the mechanism by which this repair could be achieved was a bit of a hack
whereby the "node" user, which is used internally to muck with certain system
tables, could be granted to the "root" user for the purpose of updating entries
in tables like system.namespace and system.descriptor. Even this was
problematic because those tables do not use all of their column families
when written to using the KV API. That made writing new rows to such tables
extremely dangerous, as low-level code for determining zone configurations in
kv assumes that there will be appropriate values in all kv rows in those
tables (which is not the case for certain column families).

Fixes cockroachdb#50948.

Release note (sql change): Added admin-only, crdb_internal functions to enable
descriptor repair in dire circumstances.
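As a rough sketch of how such a repair might look with these builtins (the function names and signatures are my reading of this PR's description and may differ by version; all IDs are invented):

```
-- Remove the orphaned namespace entry and descriptor left behind for a view
-- named 'v'. Here 52 = the dropped database's ID, 29 = the public schema, and
-- 54 = the view's descriptor ID -- hypothetical values for illustration only.
SELECT crdb_internal.unsafe_delete_namespace_entry(52, 29, 'v', 54);
SELECT crdb_internal.unsafe_delete_descriptor(54);
```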
ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 18, 2020
craig bot pushed a commit that referenced this issue Nov 18, 2020
55699: sql,builtins: add builtin functions to perform descriptor repair r=lucy-zhang a=ajwerner

ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 20, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this issue Nov 20, 2020