Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemachanger: prettier EXPLAIN (DDL) output #103930

Merged
merged 1 commit into from May 30, 2023

Conversation

postamar
Copy link
Contributor

@postamar postamar commented May 25, 2023

This commit adds ID -> name mappings to the scpb.TargetState which lies at the heart of the declarative schema changer, these mappings are then used (when present) to decorate the output of EXPLAIN (DDL) in an effort to make it more understandable.

This commit also adds support for the SHAPE flag in conjunction with EXPLAIN (DDL). The generated output informs on expensive operations such as backfills or validations.

Fixes #104043.

Release note (sql change): EXPLAIN (DDL) statements now have descriptor, index, column, constraint and other ID values decorated with names when available. There is now also a new EXPLAIN (DDL, SHAPE) statement which provides information on costly operations planned by the declarative schema changer, like which index backfills and validations will get performed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@Xiang-Gu Xiang-Gu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this; this is super valuable! I don't have any opposition of the technical solution. I have some questions on the wordings of the explain output and a few minor questions/comments.

  1. Why do you want to call this new flag "SHAPE"? I didn't immediately see the meaning.
  2. Do we want to make "explain_shape" directory work similarly to "explain" and "explain_verbose"? That is, for every tested statement in an end-to-end test, we have one file for it under the "explain_shape" directory. Furthermore, when we run the TestEndToEnd_xxx test with a "rewrite" flag, the corresponding file under "explain_shape" gets rewritten (just as the file under "explain" and "explain_verbose")

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/schemachanger/scbuild/build.go line 324 at r1 (raw file):

	scpb.ForEachNamespace(b, func(_ scpb.Status, ts scpb.TargetStatus, e *scpb.Namespace) {
		dnm, isNew := getOrCreate(e.DescriptorID)
		if isNew || ts == scpb.InvalidTarget || ts == scpb.ToPublic {

why do we want the ts==scpb.InvalidTarget filter?


pkg/sql/schemachanger/scpb/name_mappings.go line 132 at r1 (raw file):

		return name
	}
	return NameMappings(ts.NameMappings).ConstraintName(relationID, constraintID)

Did I understand this correctly: the purpose of the last return clause is for cases where certain name-related elements are needed but not involved in the plan (ts.Target); For example, altering a table won't touch the parent database but we probably want to know its parent database name, which will be stored in the builder state as an element with InvalidTargetStatus by the decomp logic.


pkg/sql/schemachanger/scplan/plan_explain.go line 205 at r1 (raw file):

			var en treeprinter.Node
			var estimatedMemAlloc int
			accountFor := func(label string) string {

I saw this accountFor helper is defined multiple times. Will it be better to make it a function func accountFor(label string, memAcc *int) string and define it once?


pkg/sql/schemachanger/scplan/plan_explain.go line 377 at r1 (raw file):

// ExplainShape returns a human-readable plan rendering for
// EXPLAIN (DDL, SHAPE) statements.

Can you add a bit more comments here about the main purpose of this new EXPLAIN flag? something like "it gives a condensed summary of the operations involved in the schema change with highlights on expensive operations (backfill, validation, etc.), which helps users understand where most of the time will be spent if the schema change is to be executed."


pkg/sql/schemachanger/scplan/plan_explain.go line 446 at r1 (raw file):

func (p Plan) explainBackfillsAndMerges(root treeprinter.Node, ops []scop.Op) error {
	gbs, gms := groupBackfillsAndMerges(ops)

Are we respecting the real ordering of the backfill and merges if the plan involves >1 backfills and merges?

IIUC, backfill and merge always appears in pairs. If we have a plan where the real ordering of the operations are

  1. backfill_1
  2. merge_1
  3. backfill_2
  4. merge_2

Will the logic print out operations in that order? Or, will it be all backfills first and then all merges (i.e. backfill_1 followed by backfill_2 followed by merge_1 followed by merge_2)?


pkg/sql/schemachanger/scplan/plan_explain.go line 596 at r1 (raw file):

	sort.Slice(gbs, func(i, j int) bool {
		if gbs[i].relationID == gbs[j].relationID {
			return gbs[i].srcIndexID < gbs[j].srcIndexID

I have a gut feeling that this might not work properly, because it's possible that we have three backfill/merge groups:
index_a <-- index_b <-- index_c <-- index_d
where we backfill index_b from index_a, backfill index_c from index_b, and so on.

It's possible, depending on the implementation, that index_b.GetID() is bigger than index_c.GetID() (do you remember in my support-add-drop-column-alterPK PR we inflate this chain, de-duplicate, and re-inflate again for the next ALTER TABLE stmt, and so on. I think this makes what I described here possible)

In my PR, I need to do similar things to order the primary indexes and I had logic there to sort by the location of srcIndex (i.e. if 'B.srcIdx = A', then A comes before B in the ordering)


pkg/ccl/schemachangerccl/testdata/explain/create_index line 14 at r1 (raw file):

 │    └── Stage 1 of 1 in StatementPhase
 │         ├── 7 elements transitioning toward PUBLIC
 │         │    ├── ABSENT → BACKFILL_ONLY    SecondaryIndex:{DescID 104 "t1", IndexID 2 "+id1", TemporaryIndexID 3, SourceIndexID 1 "t1_pkey"}

what is this "+id1" name for IndexID 2? Saw a bunch of them


pkg/ccl/schemachangerccl/testdata/explain_shape/create_index line 13 at r1 (raw file):

 ├── execute 2 system table mutations transactions
 ├── backfill using primary index t1_pkey in relation t1
 │     ⇒ +id1: id↗, name↗, money

How to understand/interpret this line? Can we pretty print this step with something like backfill primary index (4) "idx4(i, j, k)" from primary index (2) "idx2(i,j)" in relation "t1", or, if we want to use symbols/shapes, maybe something like "idx2(i,j) <-- idx4(i,j,k)`.


pkg/ccl/schemachangerccl/testdata/explain_shape/create_index line 16 at r1 (raw file):

 ├── execute 2 system table mutations transactions
 ├── merge temporary indexes into backfilled indexes in relation t1
 │     ~t1@[3] ⇒ +id1

Similarly, can we format merging step with something easier to read/understand?

@postamar postamar force-pushed the explain-ddl-cost branch 2 times, most recently from 30cac99 to a67417d Compare May 29, 2023 15:34
Copy link
Contributor Author

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for having taken a look! This helps.

  1. This flag already exists for DML plans and is used to synthetise them somewhat.
  2. Indeed, yes, already done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Xiang-Gu)


pkg/sql/schemachanger/scbuild/build.go line 324 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

why do we want the ts==scpb.InvalidTarget filter?

The builder state includes elements which don't have targets set, that will be their target value. I rewrote this clause into isNotDropping which is closer to what I really want to express here.


pkg/sql/schemachanger/scpb/name_mappings.go line 132 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Did I understand this correctly: the purpose of the last return clause is for cases where certain name-related elements are needed but not involved in the plan (ts.Target); For example, altering a table won't touch the parent database but we probably want to know its parent database name, which will be stored in the builder state as an element with InvalidTargetStatus by the decomp logic.

Yes basically. I added more commentary to clarify.


pkg/sql/schemachanger/scplan/plan_explain.go line 205 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

I saw this accountFor helper is defined multiple times. Will it be better to make it a function func accountFor(label string, memAcc *int) string and define it once?

I don't feel strongly about this but I don't think so, to me it's basically the same as this, and I have a slight preference for capturing the variable to update in a closure rather than dealing with pointers.


pkg/sql/schemachanger/scplan/plan_explain.go line 377 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Can you add a bit more comments here about the main purpose of this new EXPLAIN flag? something like "it gives a condensed summary of the operations involved in the schema change with highlights on expensive operations (backfill, validation, etc.), which helps users understand where most of the time will be spent if the schema change is to be executed."

Done! Thanks.


pkg/sql/schemachanger/scplan/plan_explain.go line 446 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Are we respecting the real ordering of the backfill and merges if the plan involves >1 backfills and merges?

IIUC, backfill and merge always appears in pairs. If we have a plan where the real ordering of the operations are

  1. backfill_1
  2. merge_1
  3. backfill_2
  4. merge_2

Will the logic print out operations in that order? Or, will it be all backfills first and then all merges (i.e. backfill_1 followed by backfill_2 followed by merge_1 followed by merge_2)?

There is no ordering during execution, these operations are performed concurrently. The ordering here is only to ensure that the explain output remains stable.


pkg/sql/schemachanger/scplan/plan_explain.go line 596 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

I have a gut feeling that this might not work properly, because it's possible that we have three backfill/merge groups:
index_a <-- index_b <-- index_c <-- index_d
where we backfill index_b from index_a, backfill index_c from index_b, and so on.

It's possible, depending on the implementation, that index_b.GetID() is bigger than index_c.GetID() (do you remember in my support-add-drop-column-alterPK PR we inflate this chain, de-duplicate, and re-inflate again for the next ALTER TABLE stmt, and so on. I think this makes what I described here possible)

In my PR, I need to do similar things to order the primary indexes and I had logic there to sort by the location of srcIndex (i.e. if 'B.srcIdx = A', then A comes before B in the ordering)

I don't understand any of this, sorry. I'm assuming this relates to the previous comment.


pkg/ccl/schemachangerccl/testdata/explain/create_index line 14 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

what is this "+id1" name for IndexID 2? Saw a bunch of them

I put a plus before elements that are added in the schema change. I changed this to a suffix. These little symbols make the most sense when looking at the EXPLAIN outputs of PK changes, because the new primary index is named the same as the old one (and the intermediate one too for that matter). We might have to revisit this syntax with your PR. I'd be fine with that, I don't expect this to be definitive.


pkg/ccl/schemachangerccl/testdata/explain_shape/create_index line 13 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

How to understand/interpret this line? Can we pretty print this step with something like backfill primary index (4) "idx4(i, j, k)" from primary index (2) "idx2(i,j)" in relation "t1", or, if we want to use symbols/shapes, maybe something like "idx2(i,j) <-- idx4(i,j,k)`.

You're right it's incomprehensible. Fixed.


pkg/ccl/schemachangerccl/testdata/explain_shape/create_index line 16 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

Similarly, can we format merging step with something easier to read/understand?

Fixed

@postamar postamar marked this pull request as ready for review May 29, 2023 15:41
@postamar postamar requested review from a team as code owners May 29, 2023 15:41
@postamar postamar requested a review from yuzefovich May 29, 2023 15:41
@yuzefovich yuzefovich removed their request for review May 30, 2023 07:39
Copy link
Contributor

@Xiang-Gu Xiang-Gu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining and addressing my feedback. LGTM!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar)


pkg/sql/schemachanger/scplan/plan_explain.go line 446 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

There is no ordering during execution, these operations are performed concurrently. The ordering here is only to ensure that the explain output remains stable.

I see now. Input of this function explainBackfillAndMerges are all operations within the same stage!


pkg/sql/schemachanger/scplan/plan_explain.go line 596 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I don't understand any of this, sorry. I'm assuming this relates to the previous comment.

Correct, I was confused by something else. The sorting here simply ensures a stable output ordering.

This commit adds ID -> name mappings to the scpb.TargetState
which lies at the heart of the declarative schema changer,
these mappings are then used (when present) to decorate the
output of EXPLAIN (DDL) in an effort to make it more understandable.

This commit also adds support for the SHAPE flag in conjunction with
EXPLAIN (DDL). The generated output informs on expensive operations
such as backfills or validations.

Fixes cockroachdb#104043.

Release note (sql change): EXPLAIN (DDL) statements now have
descriptor, index, column, constraint and other ID values
decorated with names when available. There is now also a new
EXPLAIN (DDL, SHAPE) statement which provides information on
costly operations planned by the declarative schema changer,
like which index backfills and validations will get performed.
@postamar
Copy link
Contributor Author

Thanks for the review!

Rebasing to get rid of flaky test.

@postamar
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented May 30, 2023

Build succeeded:

@craig craig bot merged commit 9fc6199 into cockroachdb:master May 30, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

schemachanger: names in EXPLAIN (DDL) output instead of IDs
3 participants