Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: introduce crdb_internal.statement_statistics virtual table #68715

Merged
merged 1 commit into from
Aug 18, 2021

Conversation

Azhng
Copy link
Contributor

@Azhng Azhng commented Aug 11, 2021

Depends on #68675
Related to: #64743

This commit introduces crdb_internal.statement_statistics virtual
table that exposes both cluster-wide in-memory statement statistics
as well as persited statement statistics. This new virtual table
will be used to replace crdb_internal.node_statement_statistics
virtual table, which only surface node-local in-memory statement
statsitics.

Release note (sql change): introduced new crdb_internal.statement_statistics
virtual table that surfaces both cluster-wide in-memory statement statistics
as well as persited statement statistics.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch 4 times, most recently from d7f32e2 to 3d3e3ad Compare August 12, 2021 14:44
@Azhng Azhng requested a review from a team August 12, 2021 16:05
@Azhng Azhng marked this pull request as ready for review August 12, 2021 16:05
@Azhng Azhng requested review from a team as code owners August 12, 2021 16:05
@Azhng Azhng changed the title [draft] sql: introduce crdb_internal.statement_statistics virtual table sql: introduce crdb_internal.statement_statistics virtual table Aug 12, 2021
@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch from 3d3e3ad to 2429c2b Compare August 12, 2021 22:29
@Azhng
Copy link
Contributor Author

Azhng commented Aug 13, 2021

cc: @kevin-v-ngo . Can you take a look at this new schema and let me know if you have any questions or concerns? This is what I hope will be replacing the current node_statement_statistics with.

Copy link

@kevin-v-ngo kevin-v-ngo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - left a few comments.

app_name STRING NOT NULL,
metadata JSONB NOT NULL,
statistics JSONB NOT NULL,
plan JSONB NOT NULL

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the plan for now? Since we have a plan_hash which doesn't correlate to to this plan, it might cause confusion. Users might assume a change in the hash indicates a change in the plan which wouldn't always be the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This view will also be used to serve the DB Console. So this column kind of has to here in order for that to happen. I think what we can do here is rename the current virtual table to something like crdb_internal.<some_name_suggests_it_should_not_be_used_by_end_user and we create a crdb_internal view on top of it to expose the schema that we want user to query.

Do you have any thoughts on the name we should use? I'm thinking something like crdb_internal.private_statement_statistics or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline and renamed the column to sampled_plan.

@@ -1042,6 +1042,23 @@ CREATE TABLE crdb_internal.session_variables (
value STRING NOT NULL,
hidden BOOL NOT NULL
) {} {}
CREATE TABLE crdb_internal.statement_statistics (

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Seems like the exact same schema as system.statement_statistics right?

What's your rationale for making this a table vs a view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a table here is necessary because we are exposing cluster-wide in-memory data. This data is not accessible via SQL anywhere in the system and that is the reason why we need to create a virtual table to populate the data ourselves in Golang instead of SQL.

fingerprint_id BYTES NOT NULL,
plan_hash INT8 NOT NULL,
app_name STRING NOT NULL,
metadata JSONB NOT NULL,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we may want to consider a view which would normalize the JSONB COLUMNS (statistics and metadata) so users wouldn't have to do this on their own.

It would make it easier to consume as well for those who rely on the DDL to understand what fields are available to them. Currently, you need to query the JSONB columns to know what's available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I think this is another point that suggest us to use switch this to a internal virtual table and exposes a user friendly virtual view on top of it.

@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch from 2429c2b to 89f8768 Compare August 16, 2021 19:57
@Azhng Azhng requested a review from a team August 16, 2021 19:57
@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch from 89f8768 to 977b7c7 Compare August 17, 2021 02:01
@Azhng Azhng removed request for a team August 17, 2021 02:01
@Azhng
Copy link
Contributor Author

Azhng commented Aug 17, 2021

Rebased to remove the merged commit.

metadata JSONB NOT NULL,
statistics JSONB NOT NULL,
sampled_plan JSONB NOT NULL
) CREATE TABLE crdb_internal.statement_statistics (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you duplicate the table definition here intentionally?

Comment on lines 346 to 377
// If unsafe flag is set to true, then the new entry is fetched/created without
// locking onto the mutex. This should only be used when caller is absolutely
// sure that the Container is not being used concurrently by multiple goroutines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than calling it unsafe, what if we called it doNotLock or alreadyLocked. It might be the case that it's not explicitly locked above this call but it might be implicitly locked. It's not actually "unsafe" to not lock it, it just means that that locking is being handled at a different level.

Alternatively, much of the rest of the cockroach codebase would split this into two methods, one called getStatsForStmtWithKey and another called getStatsForStmtWithKeyLocked where the first is:

func (s *Container) getStatsForStmtWithKey(
	key stmtKey, stmtFingerprintID roachpb.StmtFingerprintID, createIfNonexistent bool, unsafe bool,
) (stats *stmtStats, created, throttled bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    return  s.getStatsForStmtWithKey(key, stmtFingerprintID, createIfNonexistent)
}

Comment on lines 34 to 37
s.mu.apps = make(map[string]*ssmemstorage.Container)

for i := range statistics {
container := s.unsafeGetStatsContainerForAppName(statistics[i].Key.KeyData.App)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if you sorted the statistics slice by App and then made a form of ssmemstorage that took a slice of either these things or of some of its fields and then we could stop talking about unsafe in any exported APIs.

Comment on lines 27 to 52
// * it bypasses the mutex locking when it's inserting new data since
// it SHOULD NOT be used concurrently by multiple goroutines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean SHOULD NOT be used concurrently here. What should not be used concurrently?

Comment on lines +4983 to +4984
// Perform RPC fanout.
stats, err := p.extendedEvalCtx.SQLStatusServer.Statements(ctx, &serverpb.StatementsRequest{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any plans to do either streaming or memory accounting here?

@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch 2 times, most recently from b02ef9b to f763a8c Compare August 17, 2021 15:01
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @kevin-v-ngo)


pkg/sql/crdb_internal.go, line 4984 at r3 (raw file):

Previously, ajwerner wrote…

Do we have any plans to do either streaming or memory accounting here?

Yes: #69032. We would be introducing memory accounting for the RPC endpoint.

For streaming, do you mean the gRPC streaming RPC?


pkg/sql/logictest/testdata/logic_test/create_statements, line 1055 at r3 (raw file):

Previously, ajwerner wrote…

did you duplicate the table definition here intentionally?

This is an intentional duplication of the system table definition (with some book-keeping columns removed. e.g. node_id)


pkg/sql/sqlstats/sslocal/temp_sql_stats.go, line 28 at r3 (raw file):

Previously, ajwerner wrote…

What do you mean SHOULD NOT be used concurrently here. What should not be used concurrently?

Hmm this comment actually doesn't make sense. Removed.


pkg/sql/sqlstats/sslocal/temp_sql_stats.go, line 37 at r3 (raw file):

Previously, ajwerner wrote…

what if you sorted the statistics slice by App and then made a form of ssmemstorage that took a slice of either these things or of some of its fields and then we could stop talking about unsafe in any exported APIs.

Updated. Now this constructor will sort the slice by App field and creates ssmemstorage.Container to ingest that slice.


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 348 at r3 (raw file):

Previously, ajwerner wrote…

Rather than calling it unsafe, what if we called it doNotLock or alreadyLocked. It might be the case that it's not explicitly locked above this call but it might be implicitly locked. It's not actually "unsafe" to not lock it, it just means that that locking is being handled at a different level.

Alternatively, much of the rest of the cockroach codebase would split this into two methods, one called getStatsForStmtWithKey and another called getStatsForStmtWithKeyLocked where the first is:

func (s *Container) getStatsForStmtWithKey(
	key stmtKey, stmtFingerprintID roachpb.StmtFingerprintID, createIfNonexistent bool, unsafe bool,
) (stats *stmtStats, created, throttled bool) {
    s.mu.Lock()
    defer s.mu.Unlock()
    return  s.getStatsForStmtWithKey(key, stmtFingerprintID, createIfNonexistent)
}

Done.

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 15 files at r2, 4 of 17 files at r3, 2 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @kevin-v-ngo)


pkg/sql/logictest/testdata/logic_test/create_statements, line 1055 at r3 (raw file):

Previously, Azhng (Archer Zhang) wrote…

This is an intentional duplication of the system table definition (with some book-keeping columns removed. e.g. node_id)

I don't understand the need for the duplication (and it's exactly the same, none of the two definitions have node_id as you mentioned)


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 348 at r3 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

The comment on the function needs to be updated. You're not passing any unsafe flag on this function anymore


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 210 at r4 (raw file):

}

// NewTempContainerFromExistingData creates a new Container by ingests a slice

nit: by ingesting

something is missing on this comment, creates by ingesting the slice into what?

@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch 2 times, most recently from 6379927 to d9209ba Compare August 17, 2021 17:36
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @kevin-v-ngo)


pkg/sql/logictest/testdata/logic_test/create_statements, line 1055 at r3 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

I don't understand the need for the duplication (and it's exactly the same, none of the two definitions have node_id as you mentioned)

Oh wait you meant the duplicate in the test file? This is because this is selecting from two different columns from the crdb_internal.create_statements. (create_statements and create_nofks).


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 348 at r3 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

The comment on the function needs to be updated. You're not passing any unsafe flag on this function anymore

Done.


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 210 at r4 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: by ingesting

something is missing on this comment, creates by ingesting the slice into what?

Done. The comment says it creates a new Container by ingesting the slice. Do you want to word it differently?

Copy link
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

please get the approval from @ajwerner before merging

Reviewed 1 of 17 files at r3, 1 of 4 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @kevin-v-ngo)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits left from me

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Azhng, @kevin-v-ngo, and @maryliag)


pkg/sql/crdb_internal.go, line 4984 at r3 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Yes: #69032. We would be introducing memory accounting for the RPC endpoint.

For streaming, do you mean the gRPC streaming RPC?

I'm not exactly sure, I suppose that that's an approach. It might not be a good one. Some form of pagination in unary RPCs might also be useful.

The more I think about it, the more I think accounting matters more. Each node shouldn't have more than some number of MBs. Maybe we do something like pre-reserve the memory before sending the RPC and have the server only send up to that many bytes or perhaps fail. Anyway, good you have a tracking issue.


pkg/sql/sqlstats/sslocal/temp_sql_stats.go, line 39 at r6 (raw file):

Quoted 4 lines of code…

	tmp := s[i]
	s[i] = s[j]
	s[j] = tmp

no need for this in go. This should be:

func (s stmtResponseList) Swap(i, j int) { s[i], s[j] = s[j], s[i] }

pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 384 at r6 (raw file):

}

func (s *Container) getStatsForStmtWithKeyNotLocked(

I think this should be called Locked as in it implies that the data structure is locked. You'll find this idiom all over the kvserver code base and those methods are often used also during construction. Calling this NotLocked I find very confusing.

This commit introduces crdb_internal.statement_statistics virtual
table that exposes both cluster-wide in-memory statement statistics
as well as persited statement statistics. This new virtual table
will be used to replace crdb_internal.node_statement_statistics
virtual table, which only surface node-local in-memory statement
statsitics.

Release note (sql change): introduced new crdb_internal.statement_statistics
 virtual table that surfaces both cluster-wide in-memory statement statistics
 as well as persited statement statistics.
@Azhng Azhng force-pushed the sqlstats-new-stmtstats-view branch from 6348748 to 7c27f3f Compare August 17, 2021 21:22
Copy link
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kevin-v-ngo and @maryliag)


pkg/sql/crdb_internal.go, line 4984 at r3 (raw file):

Previously, ajwerner wrote…

I'm not exactly sure, I suppose that that's an approach. It might not be a good one. Some form of pagination in unary RPCs might also be useful.

The more I think about it, the more I think accounting matters more. Each node shouldn't have more than some number of MBs. Maybe we do something like pre-reserve the memory before sending the RPC and have the server only send up to that many bytes or perhaps fail. Anyway, good you have a tracking issue.

I see. Sounds good,


pkg/sql/sqlstats/sslocal/temp_sql_stats.go, line 39 at r6 (raw file):

Previously, ajwerner wrote…

	tmp := s[i]
	s[i] = s[j]
	s[j] = tmp

no need for this in go. This should be:

func (s stmtResponseList) Swap(i, j int) { s[i], s[j] = s[j], s[i] }

Done.


pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 384 at r6 (raw file):

Previously, ajwerner wrote…

I think this should be called Locked as in it implies that the data structure is locked. You'll find this idiom all over the kvserver code base and those methods are often used also during construction. Calling this NotLocked I find very confusing.

Done.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @kevin-v-ngo and @maryliag)

@Azhng
Copy link
Contributor Author

Azhng commented Aug 17, 2021

TFTR!

bors r=maryliag,ajwerner

@craig
Copy link
Contributor

craig bot commented Aug 18, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 18, 2021

Build succeeded:

@craig craig bot merged commit 472008d into cockroachdb:master Aug 18, 2021
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 18, 2021
This commit introduces crdb_internal.transaction_statistics virtual
table that exposes both cluster-wide in-memory transaction statistics
as well as persited transaction statistics. This new virtual table
will be used to replace crdb_internal.node_transaction_statistics
virtual table, which only surface node-local in-memory transaction
statsitics.

Follow up to cockroachdb#68715

Release note (sql change): introduced new crdb_internal.transaction_statistics
 virtual table that surfaces both cluster-wide in-memory transaction statistics
 as well as persisted transaction statistics.
Azhng added a commit to Azhng/cockroach that referenced this pull request Aug 23, 2021
This commit introduces crdb_internal.transaction_statistics virtual
table that exposes both cluster-wide in-memory transaction statistics
as well as persited transaction statistics. This new virtual table
will be used to replace crdb_internal.node_transaction_statistics
virtual table, which only surface node-local in-memory transaction
statsitics.

Follow up to cockroachdb#68715

Release note (sql change): introduced new crdb_internal.transaction_statistics
 virtual table that surfaces both cluster-wide in-memory transaction statistics
 as well as persisted transaction statistics.

Release justification: Category 4
craig bot pushed a commit that referenced this pull request Aug 23, 2021
69049: sql: implement crdb_internal.transaction_statistics r=ajwerner,maryliag a=Azhng

Depends on #68715 and #69022

# First Commit 

roachpb,sql,server: add TransactionFingerprintID to
 roachpb.CollectedTransactionStatistics

Previously, roachpb.CollectedTransactionStatistics does not contain
TransactionFingerprintID. This results in SQLStatusServer's
/_status/statements endpoint not able to return transaction
fingerprint id in the response.
This commit adds transaction fingerprint id into CollectedTransactionStatistics
and simplifies the API in multiple layers.

Release note: None

# Second Commit 

sql: introduce crdb_internal.transaction_statistics virtual table

This commit introduces crdb_internal.transaction_statistics virtual
table that exposes both cluster-wide in-memory transaction statistics
as well as persited transaction statistics. This new virtual table
will be used to replace crdb_internal.node_transaction_statistics
virtual table, which only surface node-local in-memory transaction
statsitics.

Follow up to #68715

Release justification: Category 4

Release note (sql change): introduced new crdb_internal.transaction_statistics
 virtual table that surfaces both cluster-wide in-memory transaction statistics
 as well as persisted transaction statistics.

69151: sql/resolver: wrong error when db does not exist for virtual schemas r=ajwerner a=fqazi

Fixes: #68060

Previously, when a database did not exist under a virtual
schema the code would fall through do a normal look up.
Before a recent refactor of the some of the internals,
we accidentally changed behavior, so that an undefined
relation error was returned. To address this, this patch
adds a check inside the resolver layer to return the correct
error instead of falling through.

Release note: None

69212: workload: fix pgx error cast in kv95 and schemachange r=RichardJCai a=rafiss

fixes #69189
#69100 is failing but i'm not sure if this is the cause. (cc @ajwerner)

This was done incorrectly after the recent upgrade to pgx4.
`pgconn.PgError` does not implement `error`, but `*pgconn.PgError`
does.

Release justification: test only change
Release note: None

69223: changefeedccl: Rework webhook sink flushing implementation. r=miretskiy a=miretskiy

Stop relying on wait group to implement flush logic in webhook sink.
The wait group does not respect context cancellation.  Because of that,
it is possible that the caller blocks, waiting for Flush to complete,
while immediately after blocking, the context is cancelled.
When this happens the go routines running responsible for processing
the messages may terminate prior to decrementing wait group counts.

Instead of relying on waitgroup, use synchronization provided by the
channels themselves, and introduce a new type of worker request (flush)
which correctly flushes and waits for flush to complete, while respecting
context cancellation.

Fixes #69175

Release Notes: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants