-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload: add TPC-DS queries and workload #37465
Conversation
Note that the queries were generated for scale factor 1 (I forgot about that :/ ), but I think still it'll be useful to merge this since we're performing terrible even on 1 GB dataset on some queries (#37464). |
62d4821
to
33f6143
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really look at any of the individual queries in queries.go, but everything else looks pretty good. Just a few comment
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)
pkg/workload/tpcds/import.sql, line 5 at r1 (raw file):
-- 10 - about 1GB and 10GB respectively - are available at the moment): -- -- RESTORE DATABASE tpcds FROM 'gs://cockroach-fixtures/workload/tpcds/scalefactor=1/backup';
Can you include info here about how these were generated?
pkg/workload/tpcds/tpcds.go, line 67 at r1 (raw file):
g.flags.StringVar(&g.queriesToRunRaw, `queries`, ``, `Queries to run. Use a comma separated list of query numbers. If omitted, all queries are run`)
I'd document in one of these two flags what the precedence is if a query is in both of them
pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):
// Tables implements the Generator interface. func (w *tpcds) Tables() []workload.Table { return []workload.Table{} }
hmm, I haven't tested it but I think if you tried to run a workload init
, workload fixtures make
, or workload fixtures import
, this would look like it succeeded when really it did nothing. I think if you define the tables with only names and schemas (but leave InitialRows unset), it'll make those tools return a helpful error
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
start := timeutil.Now() if w.config.queryTimeLimit > 0 { ch := make(chan interface{}, 2)
ctxgroup is awesome for stuff like this
g := ctxgroup.WithContext(ctx)
g.GoCtx(func (ctx context.Context) error) {
rows, err = w.db.QueryContext(ctx, query)
return err
})
g.GoCtx(func (ctx context.Context) error {
select {
case <-ctx.Done():
return nil
case <- time.After(w.config.queryTimeLimit)
return errors.Errorf(`query %d timed out`, queryNum)
}
})
if err := g.Wait(); err != nil {
return err
}
If any of the GoCtx calls return an error, it will cancel the context passed to the rest of them. The QueryContext method on db will turn that into cancelling your query. I think you also want a time.Timer instead of the time.After so you can Stop it when the query is successful, but I don't recall how you use it offhand
pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):
return err } elapsed := timeutil.Since(start)
don't you want a histogram here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/workload/tpcds/import.sql, line 5 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Can you include info here about how these were generated?
I wrote up my steps on a confluence page (https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/122290457/DRAFT+Generating+TPC-DS+dataset). I added a short comment about that page.
pkg/workload/tpcds/tpcds.go, line 67 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
I'd document in one of these two flags what the precedence is if a query is in both of them
Good point, done.
pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
hmm, I haven't tested it but I think if you tried to run a
workload init
,workload fixtures make
, orworkload fixtures import
, this would look like it succeeded when really it did nothing. I think if you define the tables with only names and schemas (but leave InitialRows unset), it'll make those tools return a helpful error
You were right that previously it would silently or with little text do nothing that could look like a success, so I added the schemas as suggested. Now, workload init
still silently does nothing, workload fixtures make
gives a helpful Error: make fixture is not supported for workload tpcds
while workload fixtures import
gives weird Error: importing fixture: importing table web_sales: pq: syntax error at or near ")"
(with the name of the table being randomly chosen). Any suggestions on how to improve the errors further?
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
ctxgroup is awesome for stuff like this
g := ctxgroup.WithContext(ctx) g.GoCtx(func (ctx context.Context) error) { rows, err = w.db.QueryContext(ctx, query) return err }) g.GoCtx(func (ctx context.Context) error { select { case <-ctx.Done(): return nil case <- time.After(w.config.queryTimeLimit) return errors.Errorf(`query %d timed out`, queryNum) } }) if err := g.Wait(); err != nil { return err }
If any of the GoCtx calls return an error, it will cancel the context passed to the rest of them. The QueryContext method on db will turn that into cancelling your query. I think you also want a time.Timer instead of the time.After so you can Stop it when the query is successful, but I don't recall how you use it offhand
Hm, that doesn't seem to work - the query isn't getting cancelled. I ran through Goland debugger, and the second goroutine correctly returns error after the timeout, but g.Wait
appears to be blocked until all goroutines return. Also, the running query shows up in SHOW QUERIES
. I also tried timeoutCtx, cancel := context.WithCancel
, passing timeoutCtx
into ctxgroup.WithContext
, and cancelling timeoutCtx
in the second goroutine, but that didn't seem to do anything.
pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
don't you want a histogram here?
I think that histograms are useful when the queries perform about the same amount of work and we want to see the different percentiles of latency with different concurrency settings. Here, however, I think the most useful information is simply how long a particular query took to run, and the queries have very different running times (some finish in less than a second, some don't complete in 15-20 minutes), so at the moment I see a single message from a single query run (whether it's an error or the time it took to complete the query) as the most useful information with histograms only cluttering the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)
pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):
Previously, yuzefovich wrote…
You were right that previously it would silently or with little text do nothing that could look like a success, so I added the schemas as suggested. Now,
workload init
still silently does nothing,workload fixtures make
gives a helpfulError: make fixture is not supported for workload tpcds
whileworkload fixtures import
gives weirdError: importing fixture: importing table web_sales: pq: syntax error at or near ")"
(with the name of the table being randomly chosen). Any suggestions on how to improve the errors further?
Ah. The fixtures make error is coming from https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L243-L249. Mind incorporting that same check into https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L287? That will fix fixtures import.
Looks like init is trying to be too smart about this. Let's remove the NumBatches == 0
and continue
part of this if
: https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/workload.go#L469-L471
Sorry about that, thought this has all already been fixed.
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
Previously, yuzefovich wrote…
Hm, that doesn't seem to work - the query isn't getting cancelled. I ran through Goland debugger, and the second goroutine correctly returns error after the timeout, but
g.Wait
appears to be blocked until all goroutines return. Also, the running query shows up inSHOW QUERIES
. I also triedtimeoutCtx, cancel := context.WithCancel
, passingtimeoutCtx
intoctxgroup.WithContext
, and cancellingtimeoutCtx
in the second goroutine, but that didn't seem to do anything.
Yeah, g.Wait is supposed to block until all goroutines return. Looks like ctx cancellation still isn't hooked up through pgwire on a QueryContext call, so you'll have to do this the annoying way. Somewhere you'll have to grab the query id out of SHOW QUERIES and then make the case <- time.After(
branch call CANCEL QUERY. That should work.
the context.WithCancel
you added is unnecessary. returning a non-nil error from any of the GoCtx closures will do that for you
pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):
Previously, yuzefovich wrote…
I think that histograms are useful when the queries perform about the same amount of work and we want to see the different percentiles of latency with different concurrency settings. Here, however, I think the most useful information is simply how long a particular query took to run, and the queries have very different running times (some finish in less than a second, some don't complete in 15-20 minutes), so at the moment I see a single message from a single query run (whether it's an error or the time it took to complete the query) as the most useful information with histograms only cluttering the output.
hmm, i think the percentiles would still be useful, but i agree that the histogram would currently print out too often and would be spammy. mind leaving this as a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Ah. The fixtures make error is coming from https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L243-L249. Mind incorporting that same check into https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L287? That will fix fixtures import.
Looks like init is trying to be too smart about this. Let's remove the
NumBatches == 0
andcontinue
part of thisif
: https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/workload.go#L469-L471Sorry about that, thought this has all already been fixed.
Thanks for the pointers, now the errors are helpful :)
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
Yeah, g.Wait is supposed to block until all goroutines return. Looks like ctx cancellation still isn't hooked up through pgwire on a QueryContext call, so you'll have to do this the annoying way. Somewhere you'll have to grab the query id out of SHOW QUERIES and then make the
case <- time.After(
branch call CANCEL QUERY. That should work.the
context.WithCancel
you added is unnecessary. returning a non-nil error from any of the GoCtx closures will do that for you
As it turns out, there is an extremely easy way to enforce time limit - with statement_timeout
session variable, so no need for any of these complications we were considering.
pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
hmm, i think the percentiles would still be useful, but i agree that the histogram would currently print out too often and would be spammy. mind leaving this as a TODO?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)
pkg/testutils/lint/lint_test.go, line 972 at r2 (raw file):
stream.GrepNot(`.*\.lock`), stream.GrepNot(`^storage\/engine\/rocksdb_error_dict\.go$`), stream.GrepNot(`^workload/tpcds/tpcds.go$`),
There is a spelling error in tpcdsStoreSchema
- "s_tax_precentage", so I disabled the misspelling check on the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two last comments. feel free to merge without another review from me once you've addressed them. great work!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
Previously, yuzefovich wrote…
As it turns out, there is an extremely easy way to enforce time limit - with
statement_timeout
session variable, so no need for any of these complications we were considering.
oh nice! as always, there's a hiccup. the database/sql.DB object you're using here pools connections, so there's nothing to guarantee that your SET runs on the same connection as your query. maybe the SetMaxOpenConns/SetMaxIdleConns means this does actually always work, but even if so, it's really brittle.
However, the good news is you can pass SET parameters as part of your connection string, so just stick this into your gosql.Open call above instead. I think you have to add it to each of the urls that go into the strings.Join call
pkg/workload/tpcds/tpcds.go, line 300 at r2 (raw file):
log.Infof(ctx, "[Q%d] return %d rows after %4.2f seconds", queryNum, numRows, elapsed.Seconds()) return rows.Close()
defer this instead so it gets called in the various error paths
pkg/workload/tpch/tpch.go, line 211 at r2 (raw file):
log.Infof(ctx, "[%s] return %d rows after %4.2f seconds:\n %s", queryName, numRows, elapsed.Seconds(), query) return rows.Close()
ditto. good catch by the way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz)
pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
oh nice! as always, there's a hiccup. the database/sql.DB object you're using here pools connections, so there's nothing to guarantee that your SET runs on the same connection as your query. maybe the SetMaxOpenConns/SetMaxIdleConns means this does actually always work, but even if so, it's really brittle.
However, the good news is you can pass SET parameters as part of your connection string, so just stick this into your gosql.Open call above instead. I think you have to add it to each of the urls that go into the strings.Join call
Thanks, this does the trick.
pkg/workload/tpcds/tpcds.go, line 300 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
defer this instead so it gets called in the various error paths
Thanks, done.
pkg/workload/tpch/tpch.go, line 211 at r2 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
ditto. good catch by the way!
Done.
Adds TPC-DS queries and a simple workload to run them. The dataset must be imported manually prior to running the workload. Some queries need modifications before they can be parsed by CRDB, so they are disabled. Release note: None
Thanks for the reviews and suggestions! bors r+ |
37465: workload: add TPC-DS queries and workload r=yuzefovich a=yuzefovich Adds TPC-DS queries and a simple workload to run them. The dataset must be imported manually prior to running the workload. Some queries need modifications before they can be parsed by CRDB, so they are disabled. Addresses: #37436. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Build succeeded |
Adds TPC-DS queries and a simple workload to run them. The dataset
must be imported manually prior to running the workload. Some
queries need modifications before they can be parsed by CRDB, so
they are disabled.
Addresses: #37436.
Release note: None