Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workload: add TPC-DS queries and workload #37465

Merged
merged 1 commit into from
May 16, 2019
Merged

Conversation

yuzefovich
Copy link
Member

Adds TPC-DS queries and a simple workload to run them. The dataset
must be imported manually prior to running the workload. Some
queries need modifications before they can be parsed by CRDB, so
they are disabled.

Addresses: #37436.

Release note: None

@yuzefovich yuzefovich requested review from danhhz and a team May 11, 2019 01:38
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

Note that the queries were generated for scale factor 1 (I forgot about that :/ ), but I think still it'll be useful to merge this since we're performing terrible even on 1 GB dataset on some queries (#37464).

@yuzefovich yuzefovich force-pushed the tpcds branch 5 times, most recently from 62d4821 to 33f6143 Compare May 12, 2019 03:19
Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really look at any of the individual queries in queries.go, but everything else looks pretty good. Just a few comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)


pkg/workload/tpcds/import.sql, line 5 at r1 (raw file):

-- 10 - about 1GB and 10GB respectively - are available at the moment):
--
--   RESTORE DATABASE tpcds FROM 'gs://cockroach-fixtures/workload/tpcds/scalefactor=1/backup';

Can you include info here about how these were generated?


pkg/workload/tpcds/tpcds.go, line 67 at r1 (raw file):

		g.flags.StringVar(&g.queriesToRunRaw, `queries`,
			``,
			`Queries to run. Use a comma separated list of query numbers. If omitted, all queries are run`)

I'd document in one of these two flags what the precedence is if a query is in both of them


pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):

// Tables implements the Generator interface.
func (w *tpcds) Tables() []workload.Table { return []workload.Table{} }

hmm, I haven't tested it but I think if you tried to run a workload init, workload fixtures make, or workload fixtures import, this would look like it succeeded when really it did nothing. I think if you define the tables with only names and schemas (but leave InitialRows unset), it'll make those tools return a helpful error


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

	start := timeutil.Now()
	if w.config.queryTimeLimit > 0 {
		ch := make(chan interface{}, 2)

ctxgroup is awesome for stuff like this

g := ctxgroup.WithContext(ctx)
g.GoCtx(func (ctx context.Context) error) {
    rows, err = w.db.QueryContext(ctx, query)
    return err
})
g.GoCtx(func (ctx context.Context) error {
    select {
    case <-ctx.Done():
        return nil
    case <- time.After(w.config.queryTimeLimit)
        return errors.Errorf(`query %d timed out`, queryNum)
    }
})
if err := g.Wait(); err != nil {
    return err
}

If any of the GoCtx calls return an error, it will cancel the context passed to the rest of them. The QueryContext method on db will turn that into cancelling your query. I think you also want a time.Timer instead of the time.After so you can Stop it when the query is successful, but I don't recall how you use it offhand


pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):

		return err
	}
	elapsed := timeutil.Since(start)

don't you want a histogram here?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/tpcds/import.sql, line 5 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Can you include info here about how these were generated?

I wrote up my steps on a confluence page (https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/122290457/DRAFT+Generating+TPC-DS+dataset). I added a short comment about that page.


pkg/workload/tpcds/tpcds.go, line 67 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

I'd document in one of these two flags what the precedence is if a query is in both of them

Good point, done.


pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

hmm, I haven't tested it but I think if you tried to run a workload init, workload fixtures make, or workload fixtures import, this would look like it succeeded when really it did nothing. I think if you define the tables with only names and schemas (but leave InitialRows unset), it'll make those tools return a helpful error

You were right that previously it would silently or with little text do nothing that could look like a success, so I added the schemas as suggested. Now, workload init still silently does nothing, workload fixtures make gives a helpful Error: make fixture is not supported for workload tpcds while workload fixtures import gives weird Error: importing fixture: importing table web_sales: pq: syntax error at or near ")" (with the name of the table being randomly chosen). Any suggestions on how to improve the errors further?


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

ctxgroup is awesome for stuff like this

g := ctxgroup.WithContext(ctx)
g.GoCtx(func (ctx context.Context) error) {
    rows, err = w.db.QueryContext(ctx, query)
    return err
})
g.GoCtx(func (ctx context.Context) error {
    select {
    case <-ctx.Done():
        return nil
    case <- time.After(w.config.queryTimeLimit)
        return errors.Errorf(`query %d timed out`, queryNum)
    }
})
if err := g.Wait(); err != nil {
    return err
}

If any of the GoCtx calls return an error, it will cancel the context passed to the rest of them. The QueryContext method on db will turn that into cancelling your query. I think you also want a time.Timer instead of the time.After so you can Stop it when the query is successful, but I don't recall how you use it offhand

Hm, that doesn't seem to work - the query isn't getting cancelled. I ran through Goland debugger, and the second goroutine correctly returns error after the timeout, but g.Wait appears to be blocked until all goroutines return. Also, the running query shows up in SHOW QUERIES. I also tried timeoutCtx, cancel := context.WithCancel, passing timeoutCtx into ctxgroup.WithContext, and cancelling timeoutCtx in the second goroutine, but that didn't seem to do anything.


pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

don't you want a histogram here?

I think that histograms are useful when the queries perform about the same amount of work and we want to see the different percentiles of latency with different concurrency settings. Here, however, I think the most useful information is simply how long a particular query took to run, and the queries have very different running times (some finish in less than a second, some don't complete in 15-20 minutes), so at the moment I see a single message from a single query run (whether it's an error or the time it took to complete the query) as the most useful information with histograms only cluttering the output.

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)


pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):

Previously, yuzefovich wrote…

You were right that previously it would silently or with little text do nothing that could look like a success, so I added the schemas as suggested. Now, workload init still silently does nothing, workload fixtures make gives a helpful Error: make fixture is not supported for workload tpcds while workload fixtures import gives weird Error: importing fixture: importing table web_sales: pq: syntax error at or near ")" (with the name of the table being randomly chosen). Any suggestions on how to improve the errors further?

Ah. The fixtures make error is coming from https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L243-L249. Mind incorporting that same check into https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L287? That will fix fixtures import.

Looks like init is trying to be too smart about this. Let's remove the NumBatches == 0 and continue part of this if: https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/workload.go#L469-L471

Sorry about that, thought this has all already been fixed.


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

Previously, yuzefovich wrote…

Hm, that doesn't seem to work - the query isn't getting cancelled. I ran through Goland debugger, and the second goroutine correctly returns error after the timeout, but g.Wait appears to be blocked until all goroutines return. Also, the running query shows up in SHOW QUERIES. I also tried timeoutCtx, cancel := context.WithCancel, passing timeoutCtx into ctxgroup.WithContext, and cancelling timeoutCtx in the second goroutine, but that didn't seem to do anything.

Yeah, g.Wait is supposed to block until all goroutines return. Looks like ctx cancellation still isn't hooked up through pgwire on a QueryContext call, so you'll have to do this the annoying way. Somewhere you'll have to grab the query id out of SHOW QUERIES and then make the case <- time.After( branch call CANCEL QUERY. That should work.

the context.WithCancel you added is unnecessary. returning a non-nil error from any of the GoCtx closures will do that for you


pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):

Previously, yuzefovich wrote…

I think that histograms are useful when the queries perform about the same amount of work and we want to see the different percentiles of latency with different concurrency settings. Here, however, I think the most useful information is simply how long a particular query took to run, and the queries have very different running times (some finish in less than a second, some don't complete in 15-20 minutes), so at the moment I see a single message from a single query run (whether it's an error or the time it took to complete the query) as the most useful information with histograms only cluttering the output.

hmm, i think the percentiles would still be useful, but i agree that the histogram would currently print out too often and would be spammy. mind leaving this as a TODO?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/workload/tpcds/tpcds.go, line 127 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Ah. The fixtures make error is coming from https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L243-L249. Mind incorporting that same check into https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/workloadccl/fixture.go#L287? That will fix fixtures import.

Looks like init is trying to be too smart about this. Let's remove the NumBatches == 0 and continue part of this if: https://github.com/cockroachdb/cockroach/blob/master/pkg/workload/workload.go#L469-L471

Sorry about that, thought this has all already been fixed.

Thanks for the pointers, now the errors are helpful :)


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

Yeah, g.Wait is supposed to block until all goroutines return. Looks like ctx cancellation still isn't hooked up through pgwire on a QueryContext call, so you'll have to do this the annoying way. Somewhere you'll have to grab the query id out of SHOW QUERIES and then make the case <- time.After( branch call CANCEL QUERY. That should work.

the context.WithCancel you added is unnecessary. returning a non-nil error from any of the GoCtx closures will do that for you

As it turns out, there is an extremely easy way to enforce time limit - with statement_timeout session variable, so no need for any of these complications we were considering.


pkg/workload/tpcds/tpcds.go, line 202 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

hmm, i think the percentiles would still be useful, but i agree that the histogram would currently print out too often and would be spammy. mind leaving this as a TODO?

Done.

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz)


pkg/testutils/lint/lint_test.go, line 972 at r2 (raw file):

			stream.GrepNot(`.*\.lock`),
			stream.GrepNot(`^storage\/engine\/rocksdb_error_dict\.go$`),
			stream.GrepNot(`^workload/tpcds/tpcds.go$`),

There is a spelling error in tpcdsStoreSchema - "s_tax_precentage", so I disabled the misspelling check on the file.

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: two last comments. feel free to merge without another review from me once you've addressed them. great work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz and @yuzefovich)


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

Previously, yuzefovich wrote…

As it turns out, there is an extremely easy way to enforce time limit - with statement_timeout session variable, so no need for any of these complications we were considering.

oh nice! as always, there's a hiccup. the database/sql.DB object you're using here pools connections, so there's nothing to guarantee that your SET runs on the same connection as your query. maybe the SetMaxOpenConns/SetMaxIdleConns means this does actually always work, but even if so, it's really brittle.

However, the good news is you can pass SET parameters as part of your connection string, so just stick this into your gosql.Open call above instead. I think you have to add it to each of the urls that go into the strings.Join call


pkg/workload/tpcds/tpcds.go, line 300 at r2 (raw file):

	log.Infof(ctx, "[Q%d] return %d rows after %4.2f seconds",
		queryNum, numRows, elapsed.Seconds())
	return rows.Close()

defer this instead so it gets called in the various error paths


pkg/workload/tpch/tpch.go, line 211 at r2 (raw file):

	log.Infof(ctx, "[%s] return %d rows after %4.2f seconds:\n  %s",
		queryName, numRows, elapsed.Seconds(), query)
	return rows.Close()

ditto. good catch by the way!

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz)


pkg/workload/tpcds/tpcds.go, line 170 at r1 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

oh nice! as always, there's a hiccup. the database/sql.DB object you're using here pools connections, so there's nothing to guarantee that your SET runs on the same connection as your query. maybe the SetMaxOpenConns/SetMaxIdleConns means this does actually always work, but even if so, it's really brittle.

However, the good news is you can pass SET parameters as part of your connection string, so just stick this into your gosql.Open call above instead. I think you have to add it to each of the urls that go into the strings.Join call

Thanks, this does the trick.


pkg/workload/tpcds/tpcds.go, line 300 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

defer this instead so it gets called in the various error paths

Thanks, done.


pkg/workload/tpch/tpch.go, line 211 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

ditto. good catch by the way!

Done.

Adds TPC-DS queries and a simple workload to run them. The dataset
must be imported manually prior to running the workload. Some
queries need modifications before they can be parsed by CRDB, so
they are disabled.

Release note: None
@yuzefovich
Copy link
Member Author

Thanks for the reviews and suggestions!

bors r+

craig bot pushed a commit that referenced this pull request May 16, 2019
37465: workload: add TPC-DS queries and workload r=yuzefovich a=yuzefovich

Adds TPC-DS queries and a simple workload to run them. The dataset
must be imported manually prior to running the workload. Some
queries need modifications before they can be parsed by CRDB, so
they are disabled.

Addresses: #37436.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 16, 2019

Build succeeded

@craig craig bot merged commit c54a228 into cockroachdb:master May 16, 2019
@yuzefovich yuzefovich deleted the tpcds branch May 16, 2019 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants