Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support indexes in regclass cast and pg_table_is_visible #90649

Merged
merged 4 commits into from
Dec 10, 2022

Conversation

rafiss
Copy link
Collaborator

@rafiss rafiss commented Oct 25, 2022

Epic: CRDB-23454
fixes #88097

sql: remove deprecated function from DatabaseCatalog

The ParseQualifiedTableName function was deprecated and unused.

sql: support casts from index name to regclass

Release note (sql change): Casts from index name to regclass are now
supported. Previously, only table names could be cast to regclass.

sql: add index on pg_namespace(oid)

This will allow us to efficiently join to this table.

sql: fix pg_table_is_visible to handle indexes

This uses the internal executor now to do the introspection using
pg_class where everything can be looked up all at once.

There's an increase in number of round trips, but it's a constant
factor.

Release note (bug fix): Fixed the pg_table_is_visible builtin function
so it correctly reports visibility of indexes based on the current
search_path.

@rafiss rafiss requested a review from ajwerner October 25, 2022 19:59
@rafiss rafiss requested a review from a team as a code owner October 25, 2022 19:59
@rafiss rafiss requested review from a team and msbutler and removed request for a team October 25, 2022 19:59
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss
Copy link
Collaborator Author

rafiss commented Oct 25, 2022

@ajwerner i added you here since this is going to intersect a bit with some changes you did in #90116

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change may have surprising performance implications. When this cast is executed, it's now no longer going to be able to leverage leases. Before we'd be doing mostly point-lookups in the happy path from in-memory data structures. Now we're going to be fetching a bunch of descriptors from the key-value store and then materializing a virtual table.

I get the motivation, but it just generally scares me a lot.

The answer I keep coming to in my head is that we need a toolkit to cache and version whole hierarchies of descriptors and we need to build data structures over them to make this sort of thing cheap. We're totally not there and I don't know when we'll get the time to be there.

How do we validate that this change is not going to trash performance of some workload?

@@ -178,12 +179,54 @@ func ParseDOid(ctx context.Context, evalCtx *Context, s string, t *types.T) (*tr
if err != nil {
return nil, err
}
id, err := evalCtx.Planner.ResolveTableName(ctx, &tn)

if tn.ExplicitCatalog {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you encapsulate this new logic? It's not very simple and we'll want to comment it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we'd be doing mostly point-lookups in the happy path from in-memory data structures. Now we're going to be fetching a bunch of descriptors from the key-value store and then materializing a virtual table.

I've modified this so the old happy path remains. The more expensive cast logic should only happen for indexes now. I feel like the index cast is useful for debugging, but isn't something that would be as commonly used in a hot path of a workload. Also, with the new internal executor work, it's sharing more of the descriptor leases.

How do we validate that this change is not going to trash performance of some workload?

I was hoping to see some results in the RTT benchmarks.

@@ -178,12 +179,54 @@ func ParseDOid(ctx context.Context, evalCtx *Context, s string, t *types.T) (*tr
if err != nil {
return nil, err
}
id, err := evalCtx.Planner.ResolveTableName(ctx, &tn)

if tn.ExplicitCatalog {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ajwerner
Copy link
Contributor

How do we validate that this change is not going to trash performance of some workload?

I was hoping to see some results in the RTT benchmarks.

This is a problem with the rttanalysis benchmark. It can show decreases in round-trips if your change leads to scanning all descriptors because we do that in one round-trip. It's often very expensive.

@rafiss
Copy link
Collaborator Author

rafiss commented Oct 26, 2022

oh no

    rtt_analysis_bench.go:175: ORMQueries/django_table_introspection_4_tables: got 100, expected 2

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool. Thanks for adding it!

JOIN %[1]spg_catalog.pg_namespace AS n ON c.relnamespace = n.oid
JOIN current_schemas AS cs ON cs.scname = n.nspname
WHERE c.relname = $1
AND $2 IS NOT NULL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is $2 NULL here? I think this is tn.Schema() but I think that always returns a string value. What's the intention here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's never null and it should get constant-folded away

there's a comment up above:

// There is an unused $2 placeholder in the query so that the call to QueryRow can
// be consolidated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can make it less confusing by making an args... slice that is different in the conditional branches, and passing it through

@rafiss
Copy link
Collaborator Author

rafiss commented Oct 27, 2022

ORMQueries/django_table_introspection_4_tables: got 100, expected 2 is caused by the join to pg_namespace. The extra lookups theoretically could be avoided if we added a pg_namespace(oid) index, but right now that is not possible because schema OIDs are made through the OID hasher. But I don't see why we can't change the OIDs for schemas to be the descriptor IDs.

@rafiss
Copy link
Collaborator Author

rafiss commented Oct 31, 2022

This PR is currently blocked on #91012

@yuzefovich
Copy link
Member

I opened up #92713 to fix #91012, it'd be nice to double check that it unblocks this PR.

@rafiss rafiss force-pushed the fix-index-casts-builtins branch 3 times, most recently from 45efe60 to 73c9dd7 Compare December 2, 2022 21:45
@rafiss
Copy link
Collaborator Author

rafiss commented Dec 2, 2022

thanks @yuzefovich - your fix has unblocked this

@rafiss
Copy link
Collaborator Author

rafiss commented Dec 5, 2022

@ajwerner if you're still up to review this, this is RFAL. the failure should be resolved by #92993

The ParseQualifiedTableName function was deprecated and unused.

Release note: None
Release note (sql change): Casts from index name to regclass are now
supported. Previously, only table names could be cast to regclass.
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but see how you feel about my suggestion regarding temp schemas

Reviewed 1 of 16 files at r14, 1 of 16 files at r20, 3 of 3 files at r21, 4 of 6 files at r22, 8 of 8 files at r23, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msbutler and @rafiss)


pkg/sql/pg_catalog.go line 2083 at r23 (raw file):

	indexes: []virtualIndex{
		{
			// incomplete is true because this index does not contain temporary schemas.

Does this matter? I think, if we can, it'd be better to say this is complete and just say our temp table implementation is somewhat broken. Given nobody ought to use it, that seems like a better tradeoff to me. Also, this will find the local session's temp namespace, right? WDYT?


pkg/sql/pg_catalog.go line 2107 at r23 (raw file):

						// No schema found, so no rows.
						//nolint:returnerrcheck
						return false, nil

nit: assign err to nil here and you can avoid having to defeat the linter

Code quote:

						// No schema found, so no rows.
						//nolint:returnerrcheck
						return false, nil

pkg/sql/pg_catalog.go line 2120 at r23 (raw file):

				} else if sc.SchemaKind() == catalog.SchemaPublic {
					// admin is the owner of the public schema.
					ownerOID = h.UserOid(username.MakeSQLUsernameFromPreNormalizedString("admin"))

nit: can we pull this out to a var rather than re-hashing every time?

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tftr! added the changes

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @msbutler)


pkg/sql/pg_catalog.go line 2083 at r23 (raw file):
i'm fine with that tradeoff.

Also, this will find the local session's temp namespace, right?

you're right, i didn't realize. that makes me feel better about marking it as complete. added a test.


pkg/sql/pg_catalog.go line 2107 at r23 (raw file):

Previously, ajwerner wrote…

nit: assign err to nil here and you can avoid having to defeat the linter

done


pkg/sql/pg_catalog.go line 2120 at r23 (raw file):

Previously, ajwerner wrote…

nit: can we pull this out to a var rather than re-hashing every time?

done

Copy link
Collaborator Author

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r=ajwerner

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @msbutler)

@rafiss
Copy link
Collaborator Author

rafiss commented Dec 9, 2022

whoops missed a test ...

bors r-

@craig
Copy link
Contributor

craig bot commented Dec 9, 2022

Canceled.

This will allow us to efficiently join to this table.

Release note: None
This uses the internal executor now to do the introspection using
pg_class where everything can be looked up all at once.

There's an increase in number of round trips, but it's a constant
factor.

Release note (bug fix): Fixed the pg_table_is_visible builtin function
so it correctly reports visibility of indexes based on the current
search_path.
@rafiss
Copy link
Collaborator Author

rafiss commented Dec 9, 2022

turns out our implementation of the pg_is_other_temp_schema builtin directly depends on on the pg_namespace(oid) index containing temp schemas. so i've just added the fallback logic to the populate function.

bors r=ajwerner

@craig
Copy link
Contributor

craig bot commented Dec 10, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: pg_table_is_visible() doesn't work for indexes
4 participants