New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schemachanger: prettier EXPLAIN (DDL) output #103930
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this; this is super valuable! I don't have any opposition of the technical solution. I have some questions on the wordings of the explain output and a few minor questions/comments.
- Why do you want to call this new flag "SHAPE"? I didn't immediately see the meaning.
- Do we want to make "explain_shape" directory work similarly to "explain" and "explain_verbose"? That is, for every tested statement in an end-to-end test, we have one file for it under the "explain_shape" directory. Furthermore, when we run the TestEndToEnd_xxx test with a "rewrite" flag, the corresponding file under "explain_shape" gets rewritten (just as the file under "explain" and "explain_verbose")
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/schemachanger/scbuild/build.go
line 324 at r1 (raw file):
scpb.ForEachNamespace(b, func(_ scpb.Status, ts scpb.TargetStatus, e *scpb.Namespace) { dnm, isNew := getOrCreate(e.DescriptorID) if isNew || ts == scpb.InvalidTarget || ts == scpb.ToPublic {
why do we want the ts==scpb.InvalidTarget
filter?
pkg/sql/schemachanger/scpb/name_mappings.go
line 132 at r1 (raw file):
return name } return NameMappings(ts.NameMappings).ConstraintName(relationID, constraintID)
Did I understand this correctly: the purpose of the last return clause is for cases where certain name-related elements are needed but not involved in the plan (ts.Target); For example, altering a table won't touch the parent database but we probably want to know its parent database name, which will be stored in the builder state as an element with InvalidTargetStatus by the decomp logic.
pkg/sql/schemachanger/scplan/plan_explain.go
line 205 at r1 (raw file):
var en treeprinter.Node var estimatedMemAlloc int accountFor := func(label string) string {
I saw this accountFor
helper is defined multiple times. Will it be better to make it a function func accountFor(label string, memAcc *int) string
and define it once?
pkg/sql/schemachanger/scplan/plan_explain.go
line 377 at r1 (raw file):
// ExplainShape returns a human-readable plan rendering for // EXPLAIN (DDL, SHAPE) statements.
Can you add a bit more comments here about the main purpose of this new EXPLAIN flag? something like "it gives a condensed summary of the operations involved in the schema change with highlights on expensive operations (backfill, validation, etc.), which helps users understand where most of the time will be spent if the schema change is to be executed."
pkg/sql/schemachanger/scplan/plan_explain.go
line 446 at r1 (raw file):
func (p Plan) explainBackfillsAndMerges(root treeprinter.Node, ops []scop.Op) error { gbs, gms := groupBackfillsAndMerges(ops)
Are we respecting the real ordering of the backfill and merges if the plan involves >1 backfills and merges?
IIUC, backfill and merge always appears in pairs. If we have a plan where the real ordering of the operations are
- backfill_1
- merge_1
- backfill_2
- merge_2
Will the logic print out operations in that order? Or, will it be all backfills first and then all merges (i.e. backfill_1 followed by backfill_2 followed by merge_1 followed by merge_2)?
pkg/sql/schemachanger/scplan/plan_explain.go
line 596 at r1 (raw file):
sort.Slice(gbs, func(i, j int) bool { if gbs[i].relationID == gbs[j].relationID { return gbs[i].srcIndexID < gbs[j].srcIndexID
I have a gut feeling that this might not work properly, because it's possible that we have three backfill/merge groups:
index_a <-- index_b <-- index_c <-- index_d
where we backfill index_b
from index_a
, backfill index_c
from index_b
, and so on.
It's possible, depending on the implementation, that index_b.GetID()
is bigger than index_c.GetID()
(do you remember in my support-add-drop-column-alterPK PR we inflate this chain, de-duplicate, and re-inflate again for the next ALTER TABLE stmt, and so on. I think this makes what I described here possible)
In my PR, I need to do similar things to order the primary indexes and I had logic there to sort by the location of srcIndex (i.e. if 'B.srcIdx = A', then A comes before B in the ordering)
pkg/ccl/schemachangerccl/testdata/explain/create_index
line 14 at r1 (raw file):
│ └── Stage 1 of 1 in StatementPhase │ ├── 7 elements transitioning toward PUBLIC │ │ ├── ABSENT → BACKFILL_ONLY SecondaryIndex:{DescID 104 "t1", IndexID 2 "+id1", TemporaryIndexID 3, SourceIndexID 1 "t1_pkey"}
what is this "+id1" name for IndexID 2? Saw a bunch of them
pkg/ccl/schemachangerccl/testdata/explain_shape/create_index
line 13 at r1 (raw file):
├── execute 2 system table mutations transactions ├── backfill using primary index t1_pkey in relation t1 │ ⇒ +id1: id↗, name↗, money
How to understand/interpret this line? Can we pretty print this step with something like backfill primary index (4) "idx4(i, j, k)" from primary index (2) "idx2(i,j)" in relation "t1"
, or, if we want to use symbols/shapes, maybe something like "idx2(i,j) <-- idx4(i,j,k)`.
pkg/ccl/schemachangerccl/testdata/explain_shape/create_index
line 16 at r1 (raw file):
├── execute 2 system table mutations transactions ├── merge temporary indexes into backfilled indexes in relation t1 │ ~t1@[3] ⇒ +id1
Similarly, can we format merging step with something easier to read/understand?
30cac99
to
a67417d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for having taken a look! This helps.
- This flag already exists for DML plans and is used to synthetise them somewhat.
- Indeed, yes, already done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Xiang-Gu)
pkg/sql/schemachanger/scbuild/build.go
line 324 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
why do we want the
ts==scpb.InvalidTarget
filter?
The builder state includes elements which don't have targets set, that will be their target value. I rewrote this clause into isNotDropping
which is closer to what I really want to express here.
pkg/sql/schemachanger/scpb/name_mappings.go
line 132 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Did I understand this correctly: the purpose of the last return clause is for cases where certain name-related elements are needed but not involved in the plan (ts.Target); For example, altering a table won't touch the parent database but we probably want to know its parent database name, which will be stored in the builder state as an element with InvalidTargetStatus by the decomp logic.
Yes basically. I added more commentary to clarify.
pkg/sql/schemachanger/scplan/plan_explain.go
line 205 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
I saw this
accountFor
helper is defined multiple times. Will it be better to make it a functionfunc accountFor(label string, memAcc *int) string
and define it once?
I don't feel strongly about this but I don't think so, to me it's basically the same as this, and I have a slight preference for capturing the variable to update in a closure rather than dealing with pointers.
pkg/sql/schemachanger/scplan/plan_explain.go
line 377 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Can you add a bit more comments here about the main purpose of this new EXPLAIN flag? something like "it gives a condensed summary of the operations involved in the schema change with highlights on expensive operations (backfill, validation, etc.), which helps users understand where most of the time will be spent if the schema change is to be executed."
Done! Thanks.
pkg/sql/schemachanger/scplan/plan_explain.go
line 446 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Are we respecting the real ordering of the backfill and merges if the plan involves >1 backfills and merges?
IIUC, backfill and merge always appears in pairs. If we have a plan where the real ordering of the operations are
- backfill_1
- merge_1
- backfill_2
- merge_2
Will the logic print out operations in that order? Or, will it be all backfills first and then all merges (i.e. backfill_1 followed by backfill_2 followed by merge_1 followed by merge_2)?
There is no ordering during execution, these operations are performed concurrently. The ordering here is only to ensure that the explain output remains stable.
pkg/sql/schemachanger/scplan/plan_explain.go
line 596 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
I have a gut feeling that this might not work properly, because it's possible that we have three backfill/merge groups:
index_a <-- index_b <-- index_c <-- index_d
where we backfillindex_b
fromindex_a
, backfillindex_c
fromindex_b
, and so on.It's possible, depending on the implementation, that
index_b.GetID()
is bigger thanindex_c.GetID()
(do you remember in my support-add-drop-column-alterPK PR we inflate this chain, de-duplicate, and re-inflate again for the next ALTER TABLE stmt, and so on. I think this makes what I described here possible)In my PR, I need to do similar things to order the primary indexes and I had logic there to sort by the location of srcIndex (i.e. if 'B.srcIdx = A', then A comes before B in the ordering)
I don't understand any of this, sorry. I'm assuming this relates to the previous comment.
pkg/ccl/schemachangerccl/testdata/explain/create_index
line 14 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
what is this "+id1" name for IndexID 2? Saw a bunch of them
I put a plus before elements that are added in the schema change. I changed this to a suffix. These little symbols make the most sense when looking at the EXPLAIN outputs of PK changes, because the new primary index is named the same as the old one (and the intermediate one too for that matter). We might have to revisit this syntax with your PR. I'd be fine with that, I don't expect this to be definitive.
pkg/ccl/schemachangerccl/testdata/explain_shape/create_index
line 13 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
How to understand/interpret this line? Can we pretty print this step with something like
backfill primary index (4) "idx4(i, j, k)" from primary index (2) "idx2(i,j)" in relation "t1"
, or, if we want to use symbols/shapes, maybe something like "idx2(i,j) <-- idx4(i,j,k)`.
You're right it's incomprehensible. Fixed.
pkg/ccl/schemachangerccl/testdata/explain_shape/create_index
line 16 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
Similarly, can we format merging step with something easier to read/understand?
Fixed
a67417d
to
d6b4b64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining and addressing my feedback. LGTM!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @postamar)
pkg/sql/schemachanger/scplan/plan_explain.go
line 446 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
There is no ordering during execution, these operations are performed concurrently. The ordering here is only to ensure that the explain output remains stable.
I see now. Input of this function explainBackfillAndMerges
are all operations within the same stage!
pkg/sql/schemachanger/scplan/plan_explain.go
line 596 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
I don't understand any of this, sorry. I'm assuming this relates to the previous comment.
Correct, I was confused by something else. The sorting here simply ensures a stable output ordering.
This commit adds ID -> name mappings to the scpb.TargetState which lies at the heart of the declarative schema changer, these mappings are then used (when present) to decorate the output of EXPLAIN (DDL) in an effort to make it more understandable. This commit also adds support for the SHAPE flag in conjunction with EXPLAIN (DDL). The generated output informs on expensive operations such as backfills or validations. Fixes cockroachdb#104043. Release note (sql change): EXPLAIN (DDL) statements now have descriptor, index, column, constraint and other ID values decorated with names when available. There is now also a new EXPLAIN (DDL, SHAPE) statement which provides information on costly operations planned by the declarative schema changer, like which index backfills and validations will get performed.
d6b4b64
to
6024f4b
Compare
Thanks for the review! Rebasing to get rid of flaky test. |
bors r+ |
Build succeeded: |
This commit adds ID -> name mappings to the scpb.TargetState which lies at the heart of the declarative schema changer, these mappings are then used (when present) to decorate the output of EXPLAIN (DDL) in an effort to make it more understandable.
This commit also adds support for the SHAPE flag in conjunction with EXPLAIN (DDL). The generated output informs on expensive operations such as backfills or validations.
Fixes #104043.
Release note (sql change): EXPLAIN (DDL) statements now have descriptor, index, column, constraint and other ID values decorated with names when available. There is now also a new EXPLAIN (DDL, SHAPE) statement which provides information on costly operations planned by the declarative schema changer, like which index backfills and validations will get performed.