
sql: periodically flush sqlstats #67090

Merged
merged 1 commit into from Aug 10, 2021

Conversation

Azhng
Contributor

@Azhng Azhng commented Jun 30, 2021

Previous PR: #67866

This commit introduces a new persisted sql stats subsystem
that wraps the existing node-local sql stats subsystem.
This new subsystem is responsible for flushing the in-memory
statistics into the system table periodically, or when it
detects memory pressure.
This replaces sql.Server's in-memory sqlStats provider.

Release note (sql change): SQL stats now can be persisted into
system.statement_statistics and system.transaction_statistics
tables by enabling the sql.stats.flush.enable cluster setting.
The interval of persistence is determined by the new
sql.stats.flush.interval cluster setting which defaults to 1 hour.
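
For reference, the two settings named in the release note can be toggled like so (a sketch of the commands, assuming standard `SET CLUSTER SETTING` syntax):

```sql
-- Enable periodic persistence of SQL stats into the system tables.
SET CLUSTER SETTING sql.stats.flush.enable = true;
-- Flush every 30 minutes instead of the default 1 hour.
SET CLUSTER SETTING sql.stats.flush.interval = '30m';
```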

@cockroach-teamcity
Member

This change is Reviewable

@Azhng Azhng force-pushed the sqlstat-flush branch 9 times, most recently from e2e4ae2 to 6049ab3 Compare July 5, 2021 20:25
@Azhng Azhng requested a review from a team July 5, 2021 21:19
@Azhng Azhng marked this pull request as ready for review July 5, 2021 21:19
@Azhng Azhng requested a review from a team July 5, 2021 21:19
@Azhng Azhng requested a review from a team as a code owner July 5, 2021 21:19
@Azhng Azhng requested review from dt and removed request for a team and dt July 5, 2021 21:19
@Azhng Azhng requested a review from ajwerner July 6, 2021 17:53
@Azhng Azhng force-pushed the sqlstat-flush branch 6 times, most recently from f5e5b0d to 70815a2 Compare July 12, 2021 21:48
Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_decoding.go, line 136 at r1 (raw file):

	metadata json.JSON, result *roachpb.CollectedStatementStatistics,
) error {
	err := decodeString(metadata, "stmtTyp", &result.Stats.SQLType)

You've built a lot of machinery here but it still ends up being quite verbose to use. I think I'd rework it to be more composable. You can probably also rearrange it such that the decoding and encoding can be symmetrical for each of your top-level data types. Consider the following approach:

type jsonDecoder interface {
    decodeJSON(js json.JSON) error
}

type decimal tree.DDecimal

func (d *decimal) decodeJSON(js json.JSON) error {
    dec, ok := js.AsDecimal()
    if !ok {
        return errors.New("unable to decode decimal")
    }
    d.Decimal = *dec
    return nil
}

type jsonFloat float64

func (f *jsonFloat) decodeJSON(js json.JSON) error {
    var d decimal
    if err := d.decodeJSON(js); err != nil {
        return err
    }
    asFloat, err := d.Float64()
    if err != nil {
        return err
    }
    *f = jsonFloat(asFloat)
    return nil
}

type jsonField struct {
    field string
    dec   jsonDecoder
}

func (jf jsonField) decodeJSON(js json.JSON) (err error) {
    defer func() {
        if err != nil {
            err = errors.Wrapf(err, "decoding field %s", jf.field)
        }
    }()
    field, err := safeFetchVal(js, jf.field)
    if err != nil {
        return err
    }
    return jf.dec.decodeJSON(field)
}

type jsonFields []jsonField

func (jf jsonFields) decodeJSON(js json.JSON) error {
    for i := range jf {
        if err := jf[i].decodeJSON(js); err != nil {
            return err
        }
    }
    return nil
}

func (n *numeric) decodeJSON(obj json.JSON) error {
    fields := jsonFields{
        {"mean", (*jsonFloat)(&n.Mean)},
        {"sqDiff", (*jsonFloat)(&n.SquaredDiffs)},
    }
    return fields.decodeJSON(obj)
}

//...

@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from ea32d16 to b34c11c Compare July 14, 2021 22:13
@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from cc95d99 to 92f2f92 Compare July 26, 2021 21:37
@Azhng
Contributor Author

Azhng commented Jul 26, 2021

TPCC-1000 Benchmark on 3-node roachprod (16 vCPU each):

This PR:

Audit check 9.2.1.7: PASS
Audit check 9.2.2.5.1: PASS
Audit check 9.2.2.5.2: PASS
Audit check 9.2.2.5.3: PASS
Audit check 9.2.2.5.4: PASS
Audit check 9.2.2.5.5: PASS
Audit check 9.2.2.5.6: PASS

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
 7200.0s    12586.2  97.9%     30.3     22.0     28.3     35.7     60.8   4563.4


Audit check 9.2.1.7: PASS
Audit check 9.2.2.5.1: PASS
Audit check 9.2.2.5.2: PASS
Audit check 9.2.2.5.3: PASS
Audit check 9.2.2.5.4: PASS
Audit check 9.2.2.5.5: PASS
Audit check 9.2.2.5.6: PASS

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
 7200.0s    12583.8  97.9%     28.3     21.0     28.3     33.6     58.7   3892.3

It seems the impact is roughly a 2ms increase in average latency.

TPCC runs for 2 hours, and I changed the flush interval to 15 mins so it's easier to observe through the custom time series charts:

Total flush count:

Screen Shot 2021-07-26 at 19 10 19

Average flush duration:

Screen Shot 2021-07-26 at 19 10 36

In-memory footprint

Screen Shot 2021-07-26 at 19 10 41

cc: @jordanlewis

@jordanlewis
Member

Very nice!

The average flush duration seems a little suspicious - 20 microseconds is too fast for an entire roundtrip to SQL. Can you double check that?

@Azhng Azhng requested a review from a team as a code owner July 27, 2021 16:26
@Azhng
Contributor Author

Azhng commented Jul 27, 2021

Ah good catch. Seems like I used the wrong Histogram. Updated!

Screen Shot 2021-07-27 at 12 25 02

Flush duration chart with 10s flush interval ^

@tbg tbg removed the request for review from a team July 28, 2021 08:34
@Azhng
Contributor Author

Azhng commented Aug 3, 2021

I gathered more data from the master branch and the flush branch (with different flush intervals configured) using TPCC-1000.

It seems like we are not creating too much impact in terms of latency. Master averages 59.05ms p99 latency, while flush15 (15 min flush interval) and flush60 average 59.79ms and 60.27ms p99 latency respectively. The increase in p99 latency is below 2ms.

I'm choosing to ignore the pMax latency here because its value fluctuates hugely even on the master branch, so it serves as a poor indicator of overall performance.

# experiment   elapsed    tpmC     efc       avg(ms)  p50(ms)  p90(ms)  p95(ms)  p99(ms) pMax(ms)
master 1       7200.0s    12583.8  97.9%     28.3     21.0     28.3     33.6     58.7   3892.3
master 2       7200.0s    12588.1  97.9%     31.0     22.0     29.4     33.6     60.8   6979.3
master 3       7200.0s    12588.8  97.9%     30.1     22.0     29.4     35.7     58.7   4563.4
master 4       7200.0s    12585.5  97.9%     33.5     21.0     28.3     33.6     60.8   6442.5
master 5       7200.0s    12589.9  97.9%     28.9     19.9     25.2     30.4     54.5   6710.9
master 6       7200.0s    12592.3  97.9%     29.3     22.0     28.3     33.6     60.8   4295.0

flush 15min 1  7200.0s    12586.2  97.9%     30.3     22.0     28.3     35.7     60.8   4563.4
flush 15min 2  7200.0s    12587.8  97.9%     33.6     21.0     28.3     32.5     58.7   8053.1
flush 15min 3  7200.0s    12589.8  97.9%     27.5     21.0     27.3     31.5     54.5   3758.1
flush 15min 4  7200.0s    12582.1  97.8%     30.3     22.0     29.4     35.7     62.9   4563.4
flush 15min 5  7200.0s    12584.7  97.9%     34.8     22.0     29.4     35.7     62.9   7247.8

flush 60min 1  7200.0s    12586.6  97.9%     30.6     22.0     30.4     37.7     65.0   4563.4
flush 60min 2  7200.0s    12591.6  97.9%     27.9     19.9     26.2     30.4     54.5   4295.0
flush 60min 3  7200.0s    12589.6  97.9%     32.9     21.0     27.3     32.5     56.6   6442.5
flush 60min 4  7200.0s    12587.4  97.9%     29.0     21.0     28.3     33.6     65.0   4563.4

Overall I think this is good news for us. I think the next step for me is to try to overload the cluster and see how the performance varies.

@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from caee1af to 55b15b0 Compare August 3, 2021 18:09
craig bot pushed a commit that referenced this pull request Aug 5, 2021
67866: sql: implement SQL Stats flush logic r=Azhng a=Azhng

Previous PR: #67805
Next Chained PR: #67090

## First Commit

sql: remove `count` from stmt/txn stats system table

Previously, the system.statement_statistics and
system.transaction_statistics tables included a `count` column
corresponding to the `roachpb.StatementStatistics.Count` and
`roachpb.TransactionStatistics.Count` fields respectively.
The objective of that column was to make
`INSERT ON CONFLICT DO UPDATE` style queries easy. However,
since early prototyping has shown that
`INSERT ON CONFLICT DO UPDATE` style statements are quite inefficient,
the SQL Stats flush mechanism will be implemented using
separate INSERT and UPDATE statements.
This column is no longer useful and would require special handling.
Removing it simplifies the flush logic and removes the
need for special handling.

Release note (sql change): count column is removed from
 system.statement_statistics and system.transaction_statistics
 tables.

## Second Commit

sql: implement persistedsqlstats flush logic

This commit implements the initial flush logic of the
persisted sql stats subsystem.

Release note: None

68426: kv: assert txn unused in SetFixedTimestamp r=nvanbenschoten a=nvanbenschoten

This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from #68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for #68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```

68442: kv: include RangeID in rangefeed goroutine stacks r=nvanbenschoten a=nvanbenschoten

This commit includes the RangeID in each of a rangefeed processor and
its registrations' associated goroutine stacks. This is a cheap and easy
way to get better observability into the ranges that have active
rangefeeds. It also tells us where those goroutines are spending their
time.

This will also become easier to use in Go 1.17, which improved the
format of stack traces.

68443: parser: add VIRTUAL syntax to help r=RaduBerinde a=rafiss

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Contributor

@matthewtodd matthewtodd left a comment


nit: Maybe remove the big First Commit header from the PR description now?

Reviewed 5 of 7 files at r23, 23 of 23 files at r24.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

			select {
			case <-resetIntervalChanged:
				// Recalculate flush interval.

IIUC, in this case we stop waiting, flush immediately, and then the post statement of the for loop calculates the next flush interval / wait period, yes? That might be worth spelling out a bit more in this comment. I had to scratch my head for a bit.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 103 at r24 (raw file):

				continue
			case <-timer.C:
				timer.Read = true

This is the "normal" case, right? Can we put it first in the list?


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 104 at r24 (raw file):

			case <-timer.C:
				timer.Read = true
			case <-s.memoryPressureSignal:

I could use a comment here, too. Would it be something like, "We're experiencing memory pressure, so flush SQL stats to disk immediately, rather than waiting the full flush interval, in an attempt to relieve some of that pressure?"


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 120 at r24 (raw file):

// [(1 - SQLStatsFlushJitter) * SQLStatsFlushInterval),
//  (1 + SQLStatsFlushJitter) * SQLStatsFlushInterval)]
func (s *PersistedSQLStats) getWaitPeriod() time.Duration {

I see how (with the jitter) this is different from the "flush interval," but I wonder if reusing those words would help clarify a little more. So maybe this method could be called nextFlushInterval or something?

This commit introduces a new persisted sql stats subsystem
that wraps the existing node-local sql stats subsystem.
This new subsystem is responsible for flushing the in-memory
statistics into the system table periodically, or when it
detects memory pressure.
This replaces sql.Server's in-memory sqlStats provider.

Release note (sql change): SQL stats now can be persisted into
system.statement_statistics and system.transaction_statistics
tables by enabling the sql.stats.flush.enable cluster setting.
The interval of persistence is determined by the new
sql.stats.flush.interval cluster setting which defaults to 1 hour.
Contributor Author

@Azhng Azhng left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @maryliag, and @matthewtodd)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

IIUC, in this case we stop waiting, flush immediately, and then the post statement of the for loop calculates the next flush interval / wait period, yes? That might be worth spelling out a bit more in this comment. I had to scratch my head for a bit.

In this case, we would stop waiting but we do not flush immediately. The continue here would cause the loop to restart and skip the flush. I'll add a comment here.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 103 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

This is the "normal" case, right? Can we put it first in the list?

Done.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 104 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I could use a comment here, too. Would it be something like, "We're experiencing memory pressure, so flush SQL stats to disk immediately, rather than waiting the full flush interval, in an attempt to relieve some of that pressure?"

Done.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 120 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I see how (with the jitter) this is different from the "flush interval," but I wonder if reusing those words would help clarify a little more. So maybe this method could be called nextFlushInterval or something?

Done.

Contributor

@matthewtodd matthewtodd left a comment


Reviewed 7 of 7 files at r25.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, Azhng (Archer Zhang) wrote…

In this case, we would stop waiting but we do not flush immediately. The continue here would cause the loop to restart and skip the flush. I'll add a comment here.

Oh, I see. That's interesting. I'm curious to know if there's a particular reason we don't flush here; is there something we're trying to protect against?

Contributor Author

@Azhng Azhng left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @maryliag, and @matthewtodd)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Oh, I see. That's interesting. I'm curious to know if there's a particular reason we don't flush here; is there something we're trying to protect against?

The reset interval change occurs when users run the SET CLUSTER SETTING sql.stats.flush.interval = ... command. I think it would be a bit unexpected if changing this cluster setting caused a flush, since neither the timer has fired nor has the memory limit been reached.

Contributor

@matthewtodd matthewtodd left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, Azhng (Archer Zhang) wrote…

The reset interval change occurs when users run SET CLUSTER SETTING sql.stats.flush.interval = ... command. I think this would be a bit of unexpected if resetting this cluster setting causes flush to happen, since neither the timer has fired nor has the memory limit has been reached.

👍

Contributor

@matthewtodd matthewtodd left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @maryliag)

@Azhng
Contributor Author

Azhng commented Aug 10, 2021

TFTR!

bors r+

@craig craig bot merged commit f28f98f into cockroachdb:master Aug 10, 2021
@craig
Contributor

craig bot commented Aug 10, 2021

Build succeeded:

craig bot pushed a commit that referenced this pull request Aug 24, 2021
68434: sql,jobs: create SQL Stats Compaction Job and resumer r=ajwerner,maryliag a=Azhng

Previous PR #67090
Followed by #68401

## First Commit 

sql,util: refactor checking for running jobs to its own util package

Release note: None


## Second Commit

sql,jobs: create SQL Stats Compaction Job and resumer

This commit introduces the SQL Stats Compaction job type
and a barebones implementation of the SQL Stats compaction.

Release justification: category 4

Release note: None

69294: changefeedccl: small test changes r=stevendanna a=stevendanna

Two small test changes:

- Use log.TestingClearServerIdentifiers() to avoid opening a new log
  scope.

- Don't start a schema registry when the kafka sink will already start
  one for us.

Release justification: low risk, test only changes (category 1)
Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>