Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34795][SQL][TESTS] Adds a new job in GitHub Actions to check the output of TPC-DS queries #31886

Closed
wants to merge 15 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented Mar 18, 2021

What changes were proposed in this pull request?

This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries.

NOTE: I've checked that the new job took 17m 35s in the GitHub Actions env.

Why are the changes needed?

There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

The new test added.

@@ -405,7 +382,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession with SQLHelper
val goldenOutput = {
s"-- Automatically generated by ${getClass.getSimpleName}\n" +
s"-- Number of queries: ${outputs.size}\n\n\n" +
outputs.zipWithIndex.map{case (qr, i) => qr.toString}.mkString("\n\n\n") + "\n"
outputs.mkString("\n\n\n") + "\n"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just removed the unnecessary code here.

@maropu
Copy link
Member Author

maropu commented Mar 18, 2021

NOTE: I saw the Xiao's comment in the recent release process and then I made this PR.

@SparkQA
Copy link

SparkQA commented Mar 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40809/

@dongjoon-hyun
Copy link
Member

@SparkQA
Copy link

SparkQA commented Mar 19, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40809/

@maropu
Copy link
Member Author

maropu commented Mar 19, 2021

Yea, but this PR is not for the benchmark purpose.

@HyukjinKwon
Copy link
Member

I prefer to have this in GA ... cc @wangyum who should be the best person to review.
cc @cloud-fan too

@cloud-fan
Copy link
Contributor

+1 to have this check!

@maropu
Copy link
Member Author

maropu commented Mar 21, 2021

NOTE: I'm checking if the result in the golden files is correct query-by-query.

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40878/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40878/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Test build #136296 has finished for PR 31886 at commit 4185e73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40889/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40889/

@SparkQA
Copy link

SparkQA commented Mar 21, 2021

Test build #136307 has finished for PR 31886 at commit 582c480.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 22, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40905/

@SparkQA
Copy link

SparkQA commented Mar 22, 2021

Test build #136323 has finished for PR 31886 at commit 7738415.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-- !query schema
struct<i_item_id:string,i_item_desc:string,s_state:string,store_sales_quantitycount:bigint,store_sales_quantityave:double,store_sales_quantitystdev:double,store_sales_quantitycov:double,as_store_returns_quantitycount:bigint,as_store_returns_quantityave:double,as_store_returns_quantitystdev:double,store_returns_quantitycov:double,catalog_sales_quantitycount:bigint,catalog_sales_quantityave:double,catalog_sales_quantitystdev:double,catalog_sales_quantitycov:double>
-- !query output

Copy link
Member Author

@maropu maropu Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be

+AAAAAAAAKPFEAAAA       Recently right  TN      1       99.0    NULL    NULL    1       66.0    NULL    NULL    1       32.0    NULL    NULL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be caused by the wrong schema of date_dim: https://github.com/apache/spark/pull/31012/files#r599447123

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #31012

-- !query schema
struct<i_product_name:string>
-- !query output

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

ableationableought                                
anticallyeingese                                  
callycallyeingese                                 
oughtationableought                               

A PR to fix this is: #31940

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to have two independent themes. Could you split this PR into two?

  • One is having build_and_test.yml change only.
  • One is the other test suite change and golden-file fix.

@maropu
Copy link
Member Author

maropu commented Mar 23, 2021

okay, @dongjoon-hyun ~ After I finish checking all the TPCDS query results, I'll do that (I'm still checking them)

@SparkQA
Copy link

SparkQA commented Mar 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40987/

@SparkQA
Copy link

SparkQA commented Mar 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40987/

@SparkQA
Copy link

SparkQA commented Mar 23, 2021

Test build #136403 has finished for PR 31886 at commit 20f9e64.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41257/

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41257/

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41260/

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41260/

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Test build #136675 has finished for PR 31886 at commit 2d891e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member

wangyum commented Mar 30, 2021

Please wait me a few hours. I will verify the result first.

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Test build #136678 has finished for PR 31886 at commit 0a19585.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@wangyum, don't worry. There are few more issues to address (#31886 (comment)). It will take more.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. I prefer to switch maropu/spark-tpcds-sf-1 to databricks/spark-sql-perf. I roughly checked it internally, and it seems fine to open source this repo in the future as well.

We could merge this one, and maybe @wangyum can follow-up to switch to databricks/spark-sql-perf? I will leave it to both of you to merge and how to proceed.

LGTM (pending #31886 (comment)) in any event.

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41297/

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Test build #136715 has finished for PR 31886 at commit 841005c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 46f96e9 Mar 31, 2021
@maropu
Copy link
Member Author

maropu commented Mar 31, 2021

Thanks, @HyukjinKwon @wangyum . Merged to master.

@maropu
Copy link
Member Author

maropu commented Mar 31, 2021

If someone finds the failure of the new GA job, please let me know. cc: @dongjoon-hyun @cloud-fan @viirya @gatorsmile

@maropu
Copy link
Member Author

maropu commented Mar 31, 2021

After we resolve the pending issues on databricks/spark-sql-perf, I'm planning to add this job into branch-3.1./3.0, too.

@HyukjinKwon
Copy link
Member

Thanks @maropu.

@viirya
Copy link
Member

viirya commented Mar 31, 2021

Thanks @maropu

@gatorsmile
Copy link
Member

This is awesome! We should do it 5 years ago. :-)

dongjoon-hyun pushed a commit that referenced this pull request May 8, 2021
… Adds a new job in GitHub Actions to check the output of TPC-DS queries

### What changes were proposed in this pull request?

This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries.

NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in #32243 and #32460.

This is the backport PR of #31886.

### Why are the changes needed?

There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The new test added.

Closes #32462 from maropu/TPCDSQUeryTestSuite-Branch3.1.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request May 9, 2021
… Adds a new job in GitHub Actions to check the output of TPC-DS queries

### What changes were proposed in this pull request?

This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries.

NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in #32243 and #32460.

This is the backport PR of #31886.

### Why are the changes needed?

There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The new test added.

Closes #32479 from maropu/TPCDSQueryTestSuite-Branch3.0.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
… Adds a new job in GitHub Actions to check the output of TPC-DS queries

### What changes were proposed in this pull request?

This PR proposes to add a new job in GitHub Actions to check the output of TPC-DS queries.

NOTE: To generate TPC-DS table data in GA jobs, this PR includes generator code implemented in apache#32243 and apache#32460.

This is the backport PR of apache#31886.

### Why are the changes needed?

There are some cases where we noticed runtime-realted bugs after merging commits (e.g. .SPARK-33822). Therefore, I think it is worth adding a new job in GitHub Actions to check query output of TPC-DS (sf=1).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The new test added.

Closes apache#32462 from maropu/TPCDSQUeryTestSuite-Branch3.1.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants