Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: foreign key relationships should encode columns, not indexes #37255

Closed
jordanlewis opened this issue May 1, 2019 · 7 comments
Closed

sql: foreign key relationships should encode columns, not indexes #37255

jordanlewis opened this issue May 1, 2019 · 7 comments
Assignees
Labels
A-sql-fks C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

Comments

@jordanlewis
Copy link
Member

Currently, foreign key relationships are encoded on IndexDescriptors via the ForeignKey and ReferencedBy fields. This encoding has several unfortunate side-effects:

  1. foreign keys are "tied" to indexes: instead of directly stating the columns of the referenced or referencing tables that make up the relationship, they specify a particular index. This currently limits checks to occur on that index, even though there might be a better one to look at depending on locality or other optimizer-driven choices, once the optimizer is involved in planning FK checks.
  2. a table that references another table must have an index on the columns that are referencing, to permit checked deletes from the referenced table. For example, a customers -> zipcodes foreign key must have an index on the customers.zipcode column, even though the zipcodes table might never be deleted from in reality - this is unfortunate because each index has an overhead on insert.
  3. an index can't have more than one outgoing foreign key

I'm starting this issue to centralize any ongoing discussions. I'm assigning it to the Schema changes project, but it's possible that another team will ultimately do the work.

To avoid this situation, we should move the knowledge about foreign keys out of IndexDescriptor and into TableDescriptor. Further, the ForeignKeyReference protobuf should be changed to encode columns instead of indexes. Here's a proposed diff:

diff --git a/pkg/sql/sqlbase/structured.proto b/pkg/sql/sqlbase/structured.proto
index 28e917fbc7..6e53b71584 100644
--- a/pkg/sql/sqlbase/structured.proto
+++ b/pkg/sql/sqlbase/structured.proto
@@ -50,18 +50,25 @@ message ForeignKeyReference {
     PARTIAL = 2; // Note: not actually supported, but we reserve the value for future use.
   }

-  optional uint32 table = 1 [(gogoproto.nullable) = false, (gogoproto.casttype) = "ID"];
-  optional uint32 index = 2 [(gogoproto.nullable) = false, (gogoproto.casttype) = "IndexID"];
+  optional uint32 referenced_table = 1 [(gogoproto.nullable) = false, (gogoproto.casttype) = "ID"];
+  optional uint32 index = 2 [(gogoproto.nullable) = false, (gogoproto.casttype) = "IndexID", deprecated=true];
   optional string name = 3 [(gogoproto.nullable) = false];
   optional ConstraintValidity validity = 4 [(gogoproto.nullable) = false];
   // If this FK only uses a prefix of the columns in its index, we record how
   // many to avoid spuriously counting the additional cols as used by this FK.
-  optional int32 shared_prefix_len = 5 [(gogoproto.nullable) = false];
+  optional int32 shared_prefix_len = 5 [(gogoproto.nullable) = false, deprecated=true];
   optional Action on_delete = 6 [(gogoproto.nullable) = false];
   optional Action on_update = 7 [(gogoproto.nullable) = false];
   // This is only important for composite keys. For all prior matches before
   // the addition of this value, MATCH SIMPLE will be used.
   optional Match match = 8 [(gogoproto.nullable) = false];
+  // self_column_ids contains the column ids in this table that make up the
+  // reference.
+  repeated uint32 self_column_ids = 9;
+  // other_column_ids contains the column ids in referenced_table that
+  // correspond to the self_column_ids in this table. This field must be the
+  // same length as self_column_ids.
+  repeated uint32 other_column_ids = 10;
 }

 message ColumnDescriptor {
@@ -306,8 +313,8 @@ message IndexDescriptor {
   repeated uint32 composite_column_ids = 13
       [(gogoproto.customname) = "CompositeColumnIDs", (gogoproto.casttype) = "ColumnID"];

-  optional ForeignKeyReference foreign_key = 9 [(gogoproto.nullable) = false];
-  repeated ForeignKeyReference referenced_by = 10 [(gogoproto.nullable) = false];
+  optional ForeignKeyReference foreign_key = 9 [(gogoproto.nullable) = false, deprecated=true];
+  repeated ForeignKeyReference referenced_by = 10 [(gogoproto.nullable) = false, deprecated=true];

   // Interleave, if it's not the zero value, describes how this index's data is
   // interleaved into another index's data.
@@ -729,6 +736,23 @@ message TableDescriptor {
   // index case. Also use for dropped interleaved indexes and columns.
   repeated GCDescriptorMutation gc_mutations = 33 [(gogoproto.nullable) = false,
                                                   (gogoproto.customname) = "GCMutations"];
+
+  // out_fks contains the foreign key references from this table to other
+  // tables, which is needed so inserts into this table can trigger checks on
+  // the referenced tables to ensure that the new data has corresponding rows.
+  repeated ForeignKeyReference out_fks = 34;
+  // in_fks contains the foreign key references from other tables to this
+  // table, which is needed so deletions or updates to this table can trigger
+  // checks on the referencing tables to ensure that no foreign key references
+  // are broken by the changes. These are the "backward references".
+  //
+  // As an example of how out_fks and in_fks work, consider two tables:
+  // customers (id int, zipcode int) and zipcodes (zipcode int), with a foreign
+  // key from customers.zipcode to zipcodes.zipcode. The out_fks field on
+  // customers would contain a ForeignKeyReference with self_column_ids = 1 and
+  // other_column_ids = 0, and the in_fks field on zipcodes would contain a
+  // ForeignKeyReference with self_column_ids = 0 and other_column_ids = 1.
+  repeated ForeignKeyReference in_fks = 35;
 }

 // DatabaseDescriptor represents a namespace (aka database) and is stored

Note that the old fields are deprecated and not deleted, because the code for the release that includes this upgrades will have to migrate descriptors from the old format to the new format, while preserving the old format so that mixed-version clusters can still work properly. The release after can delete the code that understands the old format, as well as change the deprecated fields to reserved (empty) ones.

@jordanlewis jordanlewis added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-sql-fks labels May 1, 2019
@jordanlewis jordanlewis added this to Triage in Disaster Recovery Backlog via automation May 1, 2019
@jordanlewis
Copy link
Member Author

Whoops, I forgot that I already filed a very similar issue to this one: #36859. Nevertheless, I'm going to close that one and replace with this one, as this one has the benefit of a couple more weeks of thinking, @lucy-zhang's input, and a concrete proposal.

@thoszhang
Copy link
Contributor

I wonder if it would be better to create a new protobuf entirely for the foreign key references, to avoid ambiguity during migration.

Relatedly, I think the fact that the ForeignKeyReference protobuf is also used for the back-references is troublesome. References and back-references are asymmetric and are never treated equivalently, and none of the fields are actually relevant for the backreference except the referencing table and columns (though I suppose that could change in the future). At least, I'd prefer the column id slices to be source and target (or referencing and referenced, or something), instead of self and other, to avoid the swap.

So, I think having two new protobufs for reference and back-reference would be ideal, though I can understand reusing the existing one for at least the (forward) reference.

@jordanlewis
Copy link
Member Author

Great points, I agree with everything you said.

@thoszhang
Copy link
Contributor

Another question is what to do when dropping indexes on the referenced columns, if the backreferences aren't tied to specific indexes anymore, and there could be more than one usable index. Presumably, we'd need to enforce the existence of at least one unique index on (some subset of?) the referenced columns. (And if so, would we allow deleting the FK constraint automatically if there's only one such index and CASCADE is set?)

Postgres just associates a specific unique index to the foreign key, which can't be dropped without CASCADE:

lucy=# create table t (a int, b int);
CREATE TABLE
lucy=# create unique index i_b_a on t (b, a);
CREATE INDEX
lucy=# create unique index i_a_b on t (a, b);
CREATE INDEX
lucy=# create table u (a int, b int);
CREATE TABLE
lucy=# alter table u add foreign key (a, b) references t (a, b);
ALTER TABLE
lucy=# drop index i_b_a;
2019-05-01 22:25:15.269 EDT [58846] ERROR:  cannot drop index i_b_a because other objects depend on it
2019-05-01 22:25:15.269 EDT [58846] DETAIL:  constraint u_a_fkey on table u depends on index i_b_a
2019-05-01 22:25:15.269 EDT [58846] HINT:  Use DROP ... CASCADE to drop the dependent objects too.
2019-05-01 22:25:15.269 EDT [58846] STATEMENT:  drop index i_b_a;
ERROR:  cannot drop index i_b_a because other objects depend on it
DETAIL:  constraint u_a_fkey on table u depends on index i_b_a
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
lucy=# drop index i_a_b;
DROP INDEX
lucy=# drop index i_b_a cascade;
NOTICE:  drop cascades to constraint u_a_fkey on table u
DROP INDEX

@dt dt moved this from Triage to Backlog in Disaster Recovery Backlog May 21, 2019
@dt dt moved this from Backlog to Backlog: FK Refactor in Disaster Recovery Backlog May 21, 2019
@jordanlewis jordanlewis self-assigned this Jun 4, 2019
@dt dt moved this from Backlog: FK Refactor to Current Milestone (July 22) in Disaster Recovery Backlog Jun 27, 2019
@ajwerner
Copy link
Contributor

Just of curiosity, when this is done will we allow foreign key relationships between columns of different tables which are unique but don't have the exact index to prove it? For example:

CREATE TABLE foo (a STRING, b STRING, PRIMARY KEY (a, b));
CREATE TABLE bar (b STRING, a STRING, PRIMARY KEY (b, a));
ALTER TABLE foo ADD FOREIGN KEY (a, b) REFERENCES bar (a, b);
pq: there is no unique constraint matching given keys for referenced table bar

Is there extra work required to deduce that this foreign key could indeed be easily checked and if so should that be tracked elsewhere?

@thoszhang
Copy link
Contributor

Yeah, what you're describing (allowing unique indexes regardless of order) is part of the planned work, I think.

@jordanlewis
Copy link
Member Author

This is done now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-fks C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
Projects
None yet
Development

No branches or pull requests

3 participants