[KYUUBI #1496] Support tpcds benchmark
### _Why are the changes needed?_

Support the TPC-DS benchmark in the `dev/kyuubi-tpcds` module.

Add a `README.md` to the `dev/kyuubi-tpcds` module that shows how to use it.

The main code is from [databricks-spark-sql-perf](https://github.com/databricks/spark-sql-perf).

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/latest/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #1496 from ulysses-you/tpcds-benchmark.

Closes #1496

d4afe2d [ulysses-you] comment
54a146e [ulysses-you] pom
91e7169 [ulysses-you] docs
20eadc4 [ulysses-you] benchmark

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: ulysses-you <ulyssesyou@apache.org>
1 parent dad48c9, commit 37a4e5c

Showing 117 changed files with 7,419 additions and 563 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
<!-- | ||
- Licensed to the Apache Software Foundation (ASF) under one or more | ||
- contributor license agreements. See the NOTICE file distributed with | ||
- this work for additional information regarding copyright ownership. | ||
- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
- (the "License"); you may not use this file except in compliance with | ||
- the License. You may obtain a copy of the License at | ||
- | ||
- http://www.apache.org/licenses/LICENSE-2.0 | ||
- | ||
- Unless required by applicable law or agreed to in writing, software | ||
- distributed under the License is distributed on an "AS IS" BASIS, | ||
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
- See the License for the specific language governing permissions and | ||
- limitations under the License. | ||
--> | ||
|
||
# Introduction | ||
This module includes a TPC-DS data generator and a benchmark. | ||
|
||
# How to use | ||
|
||
Package the jar with the following command: | ||
`./build/mvn install -DskipTests -Ptpcds -pl dev/kyuubi-tpcds -am` | ||
|
||
## Data generator | ||
|
||
Supported options: | ||
|
||
| key | default | description | | ||
|-------------|---------|------------------------------| | ||
| db | default | the database to write data to | | ||
| scaleFactor | 1 | the scale factor of TPC-DS | | ||
|
||
Example: the following command generates 10 GB of data in a new database, `tpcds_sf10`. | ||
|
||
```shell | ||
$SPARK_HOME/bin/spark-submit \ | ||
--class org.apache.kyuubi.tpcds.DataGenerator \ | ||
kyuubi-tpcds-*.jar --db tpcds_sf10 --scaleFactor 10 | ||
``` | ||
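
To generate several scale factors in one go, the command above can be wrapped in a small loop. This is a minimal sketch, not part of the module: `run_or_print`, `generate_all`, `KYUUBI_TPCDS_JAR`, `DRY_RUN`, the `/opt/spark` fallback, and the `tpcds_sf<N>` naming scheme are all assumptions made for illustration. Only `--db` and `--scaleFactor` come from the options table. By default it prints the commands instead of running them.

```shell
# Print the command in dry-run mode (the default here); otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Generate one database per scale factor, e.g. `generate_all 1 10`.
# KYUUBI_TPCDS_JAR and the tpcds_sf<N> naming scheme are assumptions.
generate_all() {
  for sf in "$@"; do
    run_or_print "${SPARK_HOME:-/opt/spark}/bin/spark-submit" \
      --class org.apache.kyuubi.tpcds.DataGenerator \
      "${KYUUBI_TPCDS_JAR:-kyuubi-tpcds.jar}" \
      --db "tpcds_sf${sf}" --scaleFactor "$sf"
  done
}
```

Calling `generate_all 1 10` prints two `spark-submit` lines; setting `DRY_RUN=0` would submit them instead.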
|
||
## Run the benchmark | ||
|
||
Supported options: | ||
|
||
| key | default | description | | ||
|------------|----------------------|--------------------------------------------------------| | ||
| db | none (required) | the TPC-DS database | | ||
| benchmark | tpcds-v2.4-benchmark | the name of the application | | ||
| iterations | 3 | the number of iterations to run | | ||
| filter | a | filter on the name of the queries to run, e.g. q1-v2.4 | | ||
|
||
Example: the following command runs the TPC-DS benchmark at scale factor 10 against the existing database `tpcds_sf10`. | ||
|
||
```shell | ||
$SPARK_HOME/bin/spark-submit \ | ||
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \ | ||
kyuubi-tpcds-*.jar --db tpcds_sf10 | ||
``` | ||
|
||
A single TPC-DS query can also be run by passing `--filter`: | ||
```shell | ||
$SPARK_HOME/bin/spark-submit \ | ||
--class org.apache.kyuubi.tpcds.benchmark.RunBenchmark \ | ||
kyuubi-tpcds-*.jar --db tpcds_sf10 --filter q1-v2.4 | ||
``` | ||
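
The benchmark options and the two examples above can be combined into one command builder. The sketch below only prints the command line. It assumes that `iterations` is passed as `--iterations`, following the same `--key value` pattern as `--db` and `--filter` in the examples; that flag spelling is not confirmed here. `bench_cmd`, `KYUUBI_TPCDS_JAR`, and the `/opt/spark` fallback are hypothetical names.

```shell
# Hypothetical helper: print a RunBenchmark command line.
# Usage: bench_cmd <db> [iterations] [filter]
bench_cmd() {
  # db has no default in the options table, so refuse to continue without it.
  if [ -z "${1:-}" ]; then
    echo "db is required" >&2
    return 1
  fi
  cmd="${SPARK_HOME:-/opt/spark}/bin/spark-submit"
  cmd="$cmd --class org.apache.kyuubi.tpcds.benchmark.RunBenchmark"
  cmd="$cmd ${KYUUBI_TPCDS_JAR:-kyuubi-tpcds.jar} --db $1"
  # Assumption: --iterations mirrors the key name in the options table.
  if [ -n "${2:-}" ]; then cmd="$cmd --iterations $2"; fi
  if [ -n "${3:-}" ]; then cmd="$cmd --filter $3"; fi
  printf '%s\n' "$cmd"
}
```

For example, `bench_cmd tpcds_sf10 5 q1-v2.4` prints a command that ends in `--db tpcds_sf10 --iterations 5 --filter q1-v2.4`.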
|
||
The benchmark result looks like: | ||
|
||
| name | minTimeMs | maxTimeMs | avgTimeMs | stdDev | stdDevPercent | | ||
|---------|-----------|-------------|------------|----------|----------------| | ||
| q1-v2.4 | 50.522384 | 868.010383 | 323.398267 | 471.6482 | 145.8413108576 | |
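
To make the result columns concrete, the sketch below recomputes them from a list of per-iteration timings. In the sample row, `stdDevPercent` matches `100 * stdDev / avgTimeMs` (471.6482 / 323.398267 ≈ 1.4584). The sample (n - 1) form of the standard deviation is an assumption, since the formula is not stated here. `summarize_ms` is a hypothetical helper, not part of the module.

```shell
# Summarize per-iteration timings in milliseconds, one value per line on stdin.
# Prints: min max avg stdDev stdDevPercent
summarize_ms() {
  awk '
    {
      v[NR] = $1
      sum += $1
      if (NR == 1 || $1 < min) min = $1
      if (NR == 1 || $1 > max) max = $1
    }
    END {
      if (NR == 0) exit 1
      avg = sum / NR
      for (i = 1; i <= NR; i++) { d = v[i] - avg; ss += d * d }
      # Assumption: sample standard deviation (n - 1); 0 for a single run.
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      pct = (avg != 0) ? 100 * sd / avg : 0
      printf "%.2f %.2f %.2f %.2f %.2f\n", min, max, avg, sd, pct
    }'
}
```

For example, `printf '2\n4\n6\n' | summarize_ms` prints `2.00 6.00 4.00 2.00 50.00`. A large `stdDevPercent`, as in the sample row, means the iterations differed widely, which is consistent with one slow run next to much faster ones.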
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
-- | ||
-- Licensed to the Apache Software Foundation (ASF) under one or more | ||
-- contributor license agreements. See the NOTICE file distributed with | ||
-- this work for additional information regarding copyright ownership. | ||
-- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
-- (the "License"); you may not use this file except in compliance with | ||
-- the License. You may obtain a copy of the License at | ||
-- | ||
-- http://www.apache.org/licenses/LICENSE-2.0 | ||
-- | ||
-- Unless required by applicable law or agreed to in writing, software | ||
-- distributed under the License is distributed on an "AS IS" BASIS, | ||
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
-- See the License for the specific language governing permissions and | ||
-- limitations under the License. | ||
-- | ||
|
||
--q1.sql-- | ||
|
||
WITH customer_total_return AS | ||
(SELECT sr_customer_sk AS ctr_customer_sk, sr_store_sk AS ctr_store_sk, | ||
sum(sr_return_amt) AS ctr_total_return | ||
FROM store_returns, date_dim | ||
WHERE sr_returned_date_sk = d_date_sk AND d_year = 2000 | ||
GROUP BY sr_customer_sk, sr_store_sk) | ||
SELECT c_customer_id | ||
FROM customer_total_return ctr1, store, customer | ||
WHERE ctr1.ctr_total_return > | ||
(SELECT avg(ctr_total_return)*1.2 | ||
FROM customer_total_return ctr2 | ||
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk) | ||
AND s_store_sk = ctr1.ctr_store_sk | ||
AND s_state = 'TN' | ||
AND ctr1.ctr_customer_sk = c_customer_sk | ||
ORDER BY c_customer_id LIMIT 100 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
-- | ||
-- Licensed to the Apache Software Foundation (ASF) under one or more | ||
-- contributor license agreements. See the NOTICE file distributed with | ||
-- this work for additional information regarding copyright ownership. | ||
-- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
-- (the "License"); you may not use this file except in compliance with | ||
-- the License. You may obtain a copy of the License at | ||
-- | ||
-- http://www.apache.org/licenses/LICENSE-2.0 | ||
-- | ||
-- Unless required by applicable law or agreed to in writing, software | ||
-- distributed under the License is distributed on an "AS IS" BASIS, | ||
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
-- See the License for the specific language governing permissions and | ||
-- limitations under the License. | ||
-- | ||
|
||
--q10.sql-- | ||
|
||
select | ||
cd_gender, cd_marital_status, cd_education_status, count(*) cnt1, | ||
cd_purchase_estimate, count(*) cnt2, cd_credit_rating, count(*) cnt3, | ||
cd_dep_count, count(*) cnt4, cd_dep_employed_count, count(*) cnt5, | ||
cd_dep_college_count, count(*) cnt6 | ||
from | ||
customer c, customer_address ca, customer_demographics | ||
where | ||
c.c_current_addr_sk = ca.ca_address_sk and | ||
ca_county in ('Rush County','Toole County','Jefferson County', | ||
'Dona Ana County','La Porte County') and | ||
cd_demo_sk = c.c_current_cdemo_sk AND | ||
exists (select * from store_sales, date_dim | ||
where c.c_customer_sk = ss_customer_sk AND | ||
ss_sold_date_sk = d_date_sk AND | ||
d_year = 2002 AND | ||
d_moy between 1 AND 1+3) AND | ||
(exists (select * from web_sales, date_dim | ||
where c.c_customer_sk = ws_bill_customer_sk AND | ||
ws_sold_date_sk = d_date_sk AND | ||
d_year = 2002 AND | ||
d_moy between 1 AND 1+3) or | ||
exists (select * from catalog_sales, date_dim | ||
where c.c_customer_sk = cs_ship_customer_sk AND | ||
cs_sold_date_sk = d_date_sk AND | ||
d_year = 2002 AND | ||
d_moy between 1 AND 1+3)) | ||
group by cd_gender, | ||
cd_marital_status, | ||
cd_education_status, | ||
cd_purchase_estimate, | ||
cd_credit_rating, | ||
cd_dep_count, | ||
cd_dep_employed_count, | ||
cd_dep_college_count | ||
order by cd_gender, | ||
cd_marital_status, | ||
cd_education_status, | ||
cd_purchase_estimate, | ||
cd_credit_rating, | ||
cd_dep_count, | ||
cd_dep_employed_count, | ||
cd_dep_college_count | ||
LIMIT 100 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
-- | ||
-- Licensed to the Apache Software Foundation (ASF) under one or more | ||
-- contributor license agreements. See the NOTICE file distributed with | ||
-- this work for additional information regarding copyright ownership. | ||
-- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
-- (the "License"); you may not use this file except in compliance with | ||
-- the License. You may obtain a copy of the License at | ||
-- | ||
-- http://www.apache.org/licenses/LICENSE-2.0 | ||
-- | ||
-- Unless required by applicable law or agreed to in writing, software | ||
-- distributed under the License is distributed on an "AS IS" BASIS, | ||
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
-- See the License for the specific language governing permissions and | ||
-- limitations under the License. | ||
-- | ||
|
||
--q11.sql-- | ||
|
||
with year_total as ( | ||
select c_customer_id customer_id | ||
,c_first_name customer_first_name | ||
,c_last_name customer_last_name | ||
,c_preferred_cust_flag customer_preferred_cust_flag | ||
,c_birth_country customer_birth_country | ||
,c_login customer_login | ||
,c_email_address customer_email_address | ||
,d_year dyear | ||
,sum(ss_ext_list_price-ss_ext_discount_amt) year_total | ||
,'s' sale_type | ||
from customer, store_sales, date_dim | ||
where c_customer_sk = ss_customer_sk | ||
and ss_sold_date_sk = d_date_sk | ||
group by c_customer_id | ||
,c_first_name | ||
,c_last_name | ||
,c_preferred_cust_flag | ||
,c_birth_country | ||
,c_login | ||
,c_email_address | ||
,d_year | ||
union all | ||
select c_customer_id customer_id | ||
,c_first_name customer_first_name | ||
,c_last_name customer_last_name | ||
,c_preferred_cust_flag customer_preferred_cust_flag | ||
,c_birth_country customer_birth_country | ||
,c_login customer_login | ||
,c_email_address customer_email_address | ||
,d_year dyear | ||
,sum(ws_ext_list_price-ws_ext_discount_amt) year_total | ||
,'w' sale_type | ||
from customer, web_sales, date_dim | ||
where c_customer_sk = ws_bill_customer_sk | ||
and ws_sold_date_sk = d_date_sk | ||
group by | ||
c_customer_id, c_first_name, c_last_name, c_preferred_cust_flag, c_birth_country, | ||
c_login, c_email_address, d_year) | ||
select | ||
t_s_secyear.customer_id | ||
,t_s_secyear.customer_first_name | ||
,t_s_secyear.customer_last_name | ||
,t_s_secyear.customer_preferred_cust_flag | ||
from year_total t_s_firstyear | ||
,year_total t_s_secyear | ||
,year_total t_w_firstyear | ||
,year_total t_w_secyear | ||
where t_s_secyear.customer_id = t_s_firstyear.customer_id | ||
and t_s_firstyear.customer_id = t_w_secyear.customer_id | ||
and t_s_firstyear.customer_id = t_w_firstyear.customer_id | ||
and t_s_firstyear.sale_type = 's' | ||
and t_w_firstyear.sale_type = 'w' | ||
and t_s_secyear.sale_type = 's' | ||
and t_w_secyear.sale_type = 'w' | ||
and t_s_firstyear.dyear = 2001 | ||
and t_s_secyear.dyear = 2001+1 | ||
and t_w_firstyear.dyear = 2001 | ||
and t_w_secyear.dyear = 2001+1 | ||
and t_s_firstyear.year_total > 0 | ||
and t_w_firstyear.year_total > 0 | ||
and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else 0.0 end | ||
> case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end | ||
order by | ||
t_s_secyear.customer_id | ||
,t_s_secyear.customer_first_name | ||
,t_s_secyear.customer_last_name | ||
, | ||
t_s_secyear.customer_preferred_cust_flag | ||
LIMIT 100 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
-- | ||
-- Licensed to the Apache Software Foundation (ASF) under one or more | ||
-- contributor license agreements. See the NOTICE file distributed with | ||
-- this work for additional information regarding copyright ownership. | ||
-- The ASF licenses this file to You under the Apache License, Version 2.0 | ||
-- (the "License"); you may not use this file except in compliance with | ||
-- the License. You may obtain a copy of the License at | ||
-- | ||
-- http://www.apache.org/licenses/LICENSE-2.0 | ||
-- | ||
-- Unless required by applicable law or agreed to in writing, software | ||
-- distributed under the License is distributed on an "AS IS" BASIS, | ||
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
-- See the License for the specific language governing permissions and | ||
-- limitations under the License. | ||
-- | ||
|
||
--q12.sql-- | ||
|
||
select i_item_id, | ||
i_item_desc, i_category, i_class, i_current_price, | ||
sum(ws_ext_sales_price) as itemrevenue, | ||
sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over | ||
(partition by i_class) as revenueratio | ||
from | ||
web_sales, item, date_dim | ||
where | ||
ws_item_sk = i_item_sk | ||
and i_category in ('Sports', 'Books', 'Home') | ||
and ws_sold_date_sk = d_date_sk | ||
and d_date between cast('1999-02-22' as date) | ||
and (cast('1999-02-22' as date) + interval '30' day) | ||
group by | ||
i_item_id, i_item_desc, i_category, i_class, i_current_price | ||
order by | ||
i_category, i_class, i_item_id, i_item_desc, revenueratio | ||
LIMIT 100 | ||
|