
sql: periodically flush sqlstats #67090

Merged
merged 1 commit into from Aug 10, 2021

Conversation

Azhng
Contributor

@Azhng Azhng commented Jun 30, 2021

Previous PR: #67866

This commit introduces a new persisted sql stats subsystem
that wraps the existing node-local sql stats subsystem.
This new subsystem is responsible for flushing the in-memory
statistics into the system table periodically, or when it
detects memory pressure.
This replaces sql.Server's in-memory sqlStats provider.

Release note (sql change): SQL stats now can be persisted into
system.statement_statistics and system.transaction_statistics
tables by enabling the sql.stats.flush.enable cluster setting.
The interval of persistence is determined by the new
sql.stats.flush.interval cluster setting which defaults to 1 hour.
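
For reference, the two settings named in the release note can be toggled like so (a sketch of the commands, assuming standard `SET CLUSTER SETTING` syntax):

```sql
-- Enable periodic persistence of SQL stats into the system tables.
SET CLUSTER SETTING sql.stats.flush.enable = true;
-- Flush every 30 minutes instead of the default 1 hour.
SET CLUSTER SETTING sql.stats.flush.interval = '30m';
```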

@cockroach-teamcity
Member

This change is Reviewable

@Azhng Azhng force-pushed the sqlstat-flush branch 9 times, most recently from e2e4ae2 to 6049ab3 Compare July 5, 2021 20:25
@Azhng Azhng requested a review from a team July 5, 2021 21:19
@Azhng Azhng marked this pull request as ready for review July 5, 2021 21:19
@Azhng Azhng requested a review from a team July 5, 2021 21:19
@Azhng Azhng requested a review from a team as a code owner July 5, 2021 21:19
@Azhng Azhng requested review from dt and removed request for a team and dt July 5, 2021 21:19
@Azhng Azhng requested a review from ajwerner July 6, 2021 17:53
@Azhng Azhng force-pushed the sqlstat-flush branch 6 times, most recently from f5e5b0d to 70815a2 Compare July 12, 2021 21:48
Contributor

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng)


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_decoding.go, line 136 at r1 (raw file):

	metadata json.JSON, result *roachpb.CollectedStatementStatistics,
) error {
	err := decodeString(metadata, "stmtTyp", &result.Stats.SQLType)

You've built a lot of machinery here but it still ends up being quite verbose to use. I think I'd rework it to be more composable. You can probably also rearrange it such that the decoding and encoding can be symmetrical for each of your top-level data types. Consider the following approach:

type jsonDecoder interface {
    decodeJSON(js json.JSON) error
}

type decimal tree.DDecimal

func (d *decimal) decodeJSON(js json.JSON) error {
    dec, ok := js.AsDecimal()
    if !ok {
        return errors.New("unable to decode decimal")
    }
    d.Decimal = *dec
    return nil
}

type jsonFloat float64

func (f *jsonFloat) decodeJSON(js json.JSON) error {
    var d decimal
    if err := d.decodeJSON(js); err != nil {
        return err
    }
    asFloat, err := d.Float64()
    if err != nil {
        return err
    }
    *f = jsonFloat(asFloat)
    return nil
}

type jsonField struct {
    field string
    dec   jsonDecoder
}

func (jf jsonField) decodeJSON(js json.JSON) (err error) {
    defer func() {
        if err != nil {
            err = errors.Wrapf(err, "decoding field %s", jf.field)
        }
    }()
    field, err := safeFetchVal(js, jf.field)
    if err != nil {
        return err
    }
    return jf.dec.decodeJSON(field)
}

type jsonFields []jsonField

func (jf jsonFields) decodeJSON(js json.JSON) error {
    for i := range jf {
        if err := jf[i].decodeJSON(js); err != nil {
            return err
        }
    }
    return nil
}

func (n *numeric) decodeJSON(obj json.JSON) error {
    fields := jsonFields{
        {"mean", (*jsonFloat)(&n.Mean)},
        {"sqDiff", (*jsonFloat)(&n.SquaredDiffs)},
    }
    return fields.decodeJSON(obj)
}

//...

@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from ea32d16 to b34c11c Compare July 14, 2021 22:13
@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from cc95d99 to 92f2f92 Compare July 26, 2021 21:37
@Azhng
Contributor Author

Azhng commented Jul 26, 2021

TPCC-1000 Benchmark on 3-node roachprod (16 vCPU each):

This PR:

Audit check 9.2.1.7: PASS
Audit check 9.2.2.5.1: PASS
Audit check 9.2.2.5.2: PASS
Audit check 9.2.2.5.3: PASS
Audit check 9.2.2.5.4: PASS
Audit check 9.2.2.5.5: PASS
Audit check 9.2.2.5.6: PASS

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
 7200.0s    12586.2  97.9%     30.3     22.0     28.3     35.7     60.8   4563.4


Audit check 9.2.1.7: PASS
Audit check 9.2.2.5.1: PASS
Audit check 9.2.2.5.2: PASS
Audit check 9.2.2.5.3: PASS
Audit check 9.2.2.5.4: PASS
Audit check 9.2.2.5.5: PASS
Audit check 9.2.2.5.6: PASS

_elapsed_______tpmC____efc__avg(ms)__p50(ms)__p90(ms)__p95(ms)__p99(ms)_pMax(ms)
 7200.0s    12583.8  97.9%     28.3     21.0     28.3     33.6     58.7   3892.3

It seems the impact is roughly a 2ms increase in average latency.

TPCC runs for 2 hours, and I changed the flush interval to 15 mins so it's easier to observe through the custom time series charts:

Total flush count:

Screen Shot 2021-07-26 at 19 10 19

Average flush duration:

Screen Shot 2021-07-26 at 19 10 36

In-memory footprint

Screen Shot 2021-07-26 at 19 10 41

cc: @jordanlewis

@jordanlewis
Member

Very nice!

The average flush duration seems a little suspicious - 20 microseconds is too fast for an entire roundtrip to SQL. Can you double check that?

@Azhng Azhng requested a review from a team as a code owner July 27, 2021 16:26
@Azhng
Contributor Author

Azhng commented Jul 27, 2021

Ah good catch. Seems like I used the wrong Histogram. Updated!

Screen Shot 2021-07-27 at 12 25 02

Flush duration chart with 10s flush interval ^

@tbg tbg removed the request for review from a team July 28, 2021 08:34
@Azhng
Contributor Author

Azhng commented Aug 3, 2021

I gathered more data from the master branch and the flush branch (with different flush intervals configured) using TPCC-1000.

It seems like we are not creating too much impact in terms of latency. Master averages 59.05ms p99 latency, while flush15 (15 min flush interval) and flush60 average 59.79ms and 60.27ms p99 latency respectively. The increase in p99 latency is below 2ms.

I'm choosing to ignore the pMax latency here because its value fluctuates hugely even on the master branch, so it serves as a poor indicator of overall performance.

# experiment   elapsed    tpmC     efc       avg(ms)  p50(ms)  p90(ms)  p95(ms)  p99(ms) pMax(ms)
master 1       7200.0s    12583.8  97.9%     28.3     21.0     28.3     33.6     58.7   3892.3
master 2       7200.0s    12588.1  97.9%     31.0     22.0     29.4     33.6     60.8   6979.3
master 3       7200.0s    12588.8  97.9%     30.1     22.0     29.4     35.7     58.7   4563.4
master 4       7200.0s    12585.5  97.9%     33.5     21.0     28.3     33.6     60.8   6442.5
master 5       7200.0s    12589.9  97.9%     28.9     19.9     25.2     30.4     54.5   6710.9
master 6       7200.0s    12592.3  97.9%     29.3     22.0     28.3     33.6     60.8   4295.0

flush 15min 1  7200.0s    12586.2  97.9%     30.3     22.0     28.3     35.7     60.8   4563.4
flush 15min 2  7200.0s    12587.8  97.9%     33.6     21.0     28.3     32.5     58.7   8053.1
flush 15min 3  7200.0s    12589.8  97.9%     27.5     21.0     27.3     31.5     54.5   3758.1
flush 15min 4  7200.0s    12582.1  97.8%     30.3     22.0     29.4     35.7     62.9   4563.4
flush 15min 5  7200.0s    12584.7  97.9%     34.8     22.0     29.4     35.7     62.9   7247.8

flush 60min 1  7200.0s    12586.6  97.9%     30.6     22.0     30.4     37.7     65.0   4563.4
flush 60min 2  7200.0s    12591.6  97.9%     27.9     19.9     26.2     30.4     54.5   4295.0
flush 60min 3  7200.0s    12589.6  97.9%     32.9     21.0     27.3     32.5     56.6   6442.5
flush 60min 4  7200.0s    12587.4  97.9%     29.0     21.0     28.3     33.6     65.0   4563.4

Overall I think this is good news for us. I think the next step for me is to try to overload the cluster and see how the performance varies.

@Azhng Azhng force-pushed the sqlstat-flush branch 2 times, most recently from caee1af to 55b15b0 Compare August 3, 2021 18:09
craig bot pushed a commit that referenced this pull request Aug 5, 2021
67866: sql: implement SQL Stats flush logic r=Azhng a=Azhng

Previous PR: #67805
Next Chained PR: #67090

## First Commit

sql: remove `count` from stmt/txn stats system table

Previously, the system.statement_statistics and
system.transaction_statistics tables included a `count` column
corresponding to the `roachpb.StatementStatistics.Count` and
`roachpb.TransactionStatistics.Count` fields respectively.
The objective of that column was to make
`INSERT ON CONFLICT DO UPDATE` style queries easy. However,
since early prototyping has shown that
`INSERT ON CONFLICT DO UPDATE` style statements are quite inefficient,
the SQL Stats flush mechanism will be implemented using
separate INSERT and UPDATE statements.
This column is no longer useful and would require special handling.
Removing it simplifies the flush logic and removes the
need for special handling.

Release note (sql change): count column is removed from
 system.statement_statistics and system.transaction_statistics
 tables.

## Second Commit

sql: implement persistedsqlstats flush logic

This commit implements the initial flush logic of the
persisted sql stats subsystem.

Release note: None

68426: kv: assert txn unused in SetFixedTimestamp r=nvanbenschoten a=nvanbenschoten

This commit asserts that a transaction has not been used to read or to
write by the time that `SetFixedTimestamp` is called on it.

This was extracted from #68194 and modified to return an error from
`SetFixedTimestamp` on misuse instead of fatal-ing. This provides a
sufficient, temporary backstop for #68216 until the conn executor logic
is fixed:

```
root@127.0.0.1:26257/movr> create table t (x int);
CREATE TABLE

root@127.0.0.1:26257/movr> insert into t values (1);
INSERT 1

root@127.0.0.1:26257/movr> select crdb_internal_mvcc_timestamp, * from t;
   crdb_internal_mvcc_timestamp  | x
---------------------------------+----
  1628094563935439000.0000000000 | 1
(1 row)

root@127.0.0.1:26257/movr> begin as of system time (1628094563935439000.0000000000-1)::string;
BEGIN

root@127.0.0.1:26257/movr  OPEN> select * from t;
  x
-----
(0 rows)

root@127.0.0.1:26257/movr  OPEN> prepare y as select * from t as of system time 1628094563935439000.0000000000;
ERROR: internal error: cannot set fixed timestamp, txn "sql txn" meta={id=e5e81c19 pri=0.01517572 epo=0 ts=1628094563.935438999,0 min=1628094563.935438999,0 seq=0} lock=false stat=PENDING rts=1628094563.935438999,0 wto=false gul=1628094563.935438999,0 already performed reads
SQLSTATE: XX000
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:1016: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/kv/txn.go:1200: SetFixedTimestamp()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:278: populatePrepared()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:220: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:226: prepare()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_prepare.go:112: addPreparedStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:570: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:126: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1626: func1()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1628: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1550: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:627: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:645: func1()
runtime/asm_amd64.s:1371: goexit()

HINT: You have encountered an unexpected error.

Please check the public issue tracker to check whether this problem is
already tracked. If you cannot find it there, please report the error
with details by creating a new issue.

If you would rather not post publicly, please contact us directly
using the support form.

We appreciate your feedback.

root@127.0.0.1:26257/? ERROR>
```

68442: kv: include RangeID in rangefeed goroutine stacks r=nvanbenschoten a=nvanbenschoten

This commit includes the RangeID in each of a rangefeed processor and
its registrations' associated goroutine stacks. This is a cheap and easy
way to get better observability into the ranges that have active
rangefeeds. It also tells us where those goroutines are spending their
time.

This will also become easier to use in Go 1.17, which improved the
format of stack traces.

68443: parser: add VIRTUAL syntax to help r=RaduBerinde a=rafiss

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Contributor

@matthewtodd matthewtodd left a comment


nit: Maybe remove the big First Commit header from the PR description now?

Reviewed 5 of 7 files at r23, 23 of 23 files at r24.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

			select {
			case <-resetIntervalChanged:
				// Recalculate flush interval.

IIUC, in this case we stop waiting, flush immediately, and then the post statement of the for loop calculates the next flush interval / wait period, yes? That might be worth spelling out a bit more in this comment. I had to scratch my head for a bit.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 103 at r24 (raw file):

				continue
			case <-timer.C:
				timer.Read = true

This is the "normal" case, right? Can we put it first in the list?


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 104 at r24 (raw file):

			case <-timer.C:
				timer.Read = true
			case <-s.memoryPressureSignal:

I could use a comment here, too. Would it be something like, "We're experiencing memory pressure, so flush SQL stats to disk immediately, rather than waiting the full flush interval, in an attempt to relieve some of that pressure?"


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 120 at r24 (raw file):

// [(1 - SQLStatsFlushJitter) * SQLStatsFlushInterval),
//  (1 + SQLStatsFlushJitter) * SQLStatsFlushInterval)]
func (s *PersistedSQLStats) getWaitPeriod() time.Duration {

I see how (with the jitter) this is different from the "flush interval," but I wonder if reusing those words would help clarify a little more. So maybe this method could be called nextFlushInterval or something?

This commit introduces a new persisted sql stats subsystem
that wraps the existing node-local sql stats subsystem.
This new subsystem is responsible for flushing the in-memory
statistics into the system table periodically, or when it
detects memory pressure.
This replaces sql.Server's in-memory sqlStats provider.

Release note (sql change): SQL stats now can be persisted into
system.statement_statistics and system.transaction_statistics
tables by enabling the sql.stats.flush.enable cluster setting.
The interval of persistence is determined by the new
sql.stats.flush.interval cluster setting which defaults to 1 hour.
Contributor Author

@Azhng Azhng left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @maryliag, and @matthewtodd)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

IIUC, in this case we stop waiting, flush immediately, and then the post statement of the for loop calculates the next flush interval / wait period, yes? That might be worth spelling out a bit more in this comment. I had to scratch my head for a bit.

In this case, we would stop waiting but we do not flush immediately. The continue here would cause the loop to restart and skip the flush. I'll add a comment here.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 103 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

This is the "normal" case, right? Can we put it first in the list?

Done.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 104 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I could use a comment here, too. Would it be something like, "We're experiencing memory pressure, so flush SQL stats to disk immediately, rather than waiting the full flush interval, in an attempt to relieve some of that pressure?"

Done.


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 120 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

I see how (with the jitter) this is different from the "flush interval," but I wonder if reusing those words would help clarify a little more. So maybe this method could be called nextFlushInterval or something?

Done.

Contributor

@matthewtodd matthewtodd left a comment


Reviewed 7 of 7 files at r25.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, Azhng (Archer Zhang) wrote…

In this case, we would stop waiting but we do not flush immediately. The continue here would cause the loop to restart and skip the flush. I'll add a comment here.

Oh, I see. That's interesting. I'm curious to know if there's a particular reason we don't flush here; is there something we're trying to protect against?

Contributor Author

@Azhng Azhng left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @maryliag, and @matthewtodd)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, matthewtodd (Matthew Todd) wrote…

Oh, I see. That's interesting. I'm curious to know if there's a particular reason we don't flush here; is there something we're trying to protect against?

The reset interval change occurs when users run the SET CLUSTER SETTING sql.stats.flush.interval = ... command. I think it would be a bit unexpected if changing this cluster setting caused a flush, since neither the timer has fired nor has the memory limit been reached.

Contributor

@matthewtodd matthewtodd left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @maryliag)


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 100 at r24 (raw file):

Previously, Azhng (Archer Zhang) wrote…

The reset interval change occurs when users run SET CLUSTER SETTING sql.stats.flush.interval = ... command. I think this would be a bit of unexpected if resetting this cluster setting causes flush to happen, since neither the timer has fired nor has the memory limit has been reached.

👍

Contributor

@matthewtodd matthewtodd left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, and @maryliag)

@Azhng
Contributor Author

Azhng commented Aug 10, 2021

TFTR!

bors r+

@craig craig bot merged commit f28f98f into cockroachdb:master Aug 10, 2021
@craig
Contributor

craig bot commented Aug 10, 2021

Build succeeded:

craig bot pushed a commit that referenced this pull request Aug 24, 2021
68434: sql,jobs: create SQL Stats Compaction Job and resumer r=ajwerner,maryliag a=Azhng

Previous PR #67090
Followed by #68401

## First Commit 

sql,util: refactor checking for running jobs to its own util package

Release note: None


## Second Commit

sql,jobs: create SQL Stats Compaction Job and resumer

This commit introduces the SQL Stats Compaction job type
and a barebones implementation of the SQL Stats compaction.

Release justification: category 4

Release note: None

69294: changefeedccl: small test changes r=stevendanna a=stevendanna

Two small test changes:

- Use log.TestingClearServerIdentifiers() to avoid opening a new log
  scope.

- Don't start a schema registry when the kafka sink will already start
  one for us.

Release justification: low risk, test only changes (category 1)
Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>