Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34130][SQL] Impove preformace for char varchar padding and length check with StaticInvoke #31199

Closed
wants to merge 17 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jan 15, 2021

What changes were proposed in this pull request?

This could reduce the generate.java size to prevent codegen fallback which causes performance regression.

here is a case from tpcds that could be fixed by this improvement
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133964/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

The original case generate 20K bytes, we are trying to reduce it to less than 8k

Why are the changes needed?

performance improvement as in the PR benchmark test, the performance w/ codegen is 2~3x better than w/o codegen.

Does this PR introduce any user-facing change?

no

How was this patch tested?

yes, it's a code reflect so the existing ut should be enough

cross-check with #31012 where the tpcds shall all pass

benchmark compared with master

================================================================================================
Char Varchar Read Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read with length 20, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20                         1571           1667          83         63.6          15.7       1.0X
read char with length 20                           1710           1764          58         58.5          17.1       0.9X
read varchar with length 20                        1774           1792          16         56.4          17.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read with length 40, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40                         1824           1927          91         54.8          18.2       1.0X
read char with length 40                           1788           1928         137         55.9          17.9       1.0X
read varchar with length 40                        1676           1700          40         59.7          16.8       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read with length 60, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60                         1727           1762          30         57.9          17.3       1.0X
read char with length 60                           1628           1674          43         61.4          16.3       1.1X
read varchar with length 60                        1651           1665          13         60.6          16.5       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read with length 80, hasSpaces: true:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80                         1748           1778          28         57.2          17.5       1.0X
read char with length 80                           1673           1678           9         59.8          16.7       1.0X
read varchar with length 80                        1667           1684          27         60.0          16.7       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Read with length 100, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100                        1709           1743          48         58.5          17.1       1.0X
read char with length 100                          1610           1664          67         62.1          16.1       1.1X
read varchar with length 100                       1614           1673          53         61.9          16.1       1.1X


================================================================================================
Char Varchar Write Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write with length 20, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 20                        2277           2327          67          4.4         227.7       1.0X
write char with length 20                          2421           2443          19          4.1         242.1       0.9X
write varchar with length 20                       2393           2419          27          4.2         239.3       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write with length 40, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 40                        2249           2290          38          4.4         224.9       1.0X
write char with length 40                          2386           2444          57          4.2         238.6       0.9X
write varchar with length 40                       2397           2405          12          4.2         239.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write with length 60, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 60                        2326           2367          41          4.3         232.6       1.0X
write char with length 60                          2478           2501          37          4.0         247.8       0.9X
write varchar with length 60                       2475           2503          24          4.0         247.5       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write with length 80, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 80                        9367           9773         354          1.1         936.7       1.0X
write char with length 80                         10454          10621         238          1.0        1045.4       0.9X
write varchar with length 80                      18943          19503         571          0.5        1894.3       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Write with length 100, hasSpaces: true:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 100                      11055          11104          59          0.9        1105.5       1.0X
write char with length 100                        12204          12275          63          0.8        1220.4       0.9X
write varchar with length 100                     21737          22275         574          0.5        2173.7       0.5X

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Test build #134122 has started for PR 31199 at commit 1f8326e.

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38701/

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a performance PR, could you provide a benchmark result, @yaooqinn ?

Or, do we need to have one benchmark suite on CHAR/VARCHAR, @cloud-fan ?

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38705/

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38701/

@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38705/

@github-actions github-actions bot added the SQL label Jan 15, 2021
@SparkQA
Copy link

SparkQA commented Jan 15, 2021

Test build #134118 has finished for PR 31199 at commit 0caf4ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

cc @cloud-fan @maropu

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38727/

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38727/

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Test build #134143 has finished for PR 31199 at commit 2010462.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Test build #134144 has finished for PR 31199 at commit 54f30ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

yaooqinn commented Jan 17, 2021

Codegen vs NonCodegen reslut from master

the performance itself does not change much, because the purpose here is to prevent queries from fallback to non-codegen

================================================================================================
Char Varchar Read Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read char with length 20:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read char with length 20 wholestage off            4830           5393         796         20.7          48.3       1.0X
read char with length 20 wholestage on             1693           1755          42         59.1          16.9       2.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read varchar with length 20:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
read varchar with length 20 wholestage off           4763           4884         171         21.0          47.6       1.0X
read varchar with length 20 wholestage on            1802           1888          59         55.5          18.0       2.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read char with length 40:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read char with length 40 wholestage off            4685           4718          46         21.3          46.9       1.0X
read char with length 40 wholestage on             1858           1904          39         53.8          18.6       2.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read varchar with length 40:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
read varchar with length 40 wholestage off           4730           4749          27         21.1          47.3       1.0X
read varchar with length 40 wholestage on            1738           1831          99         57.5          17.4       2.7X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read char with length 60:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read char with length 60 wholestage off            4936           4984          68         20.3          49.4       1.0X
read char with length 60 wholestage on             1787           1835          46         55.9          17.9       2.8X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read varchar with length 60:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
read varchar with length 60 wholestage off           4642           4740         138         21.5          46.4       1.0X
read varchar with length 60 wholestage on            1694           1755          48         59.0          16.9       2.7X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read char with length 80:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read char with length 80 wholestage off            4709           4736          38         21.2          47.1       1.0X
read char with length 80 wholestage on             1787           1869          76         56.0          17.9       2.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
read varchar with length 80:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------------------------------------
read varchar with length 80 wholestage off           4613           4721         153         21.7          46.1       1.0X
read varchar with length 80 wholestage on            1757           1814          59         56.9          17.6       2.6X


================================================================================================
Char Varchar Write Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write char with length 20:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write char with length 20 wholestage off           7312           7439         180          1.4         731.2       1.0X
write char with length 20 wholestage on            5957           6332         355          1.7         595.7       1.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write varchar with length 20:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
write varchar with length 20 wholestage off           9318          10265        1339          1.1         931.8       1.0X
write varchar with length 20 wholestage on            8722           9268         512          1.1         872.2       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write char with length 40:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write char with length 40 wholestage off          10051          10257         292          1.0        1005.1       1.0X
write char with length 40 wholestage on            8534           8917         316          1.2         853.4       1.2X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write varchar with length 40:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
write varchar with length 40 wholestage off          13713          14250         759          0.7        1371.3       1.0X
write varchar with length 40 wholestage on           12070          12609         516          0.8        1207.0       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write char with length 60:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write char with length 60 wholestage off          11728          12975        1763          0.9        1172.8       1.0X
write char with length 60 wholestage on           10211          10515         256          1.0        1021.1       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write varchar with length 60:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
write varchar with length 60 wholestage off          17845          17924         113          0.6        1784.5       1.0X
write varchar with length 60 wholestage on           16338          17353        1330          0.6        1633.8       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write char with length 80:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write char with length 80 wholestage off          14381          14656         390          0.7        1438.1       1.0X
write char with length 80 wholestage on           13991          14556         455          0.7        1399.1       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
write varchar with length 80:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------------------------------------
write varchar with length 80 wholestage off          21476          22547        1515          0.5        2147.6       1.0X
write varchar with length 80 wholestage on           21092          22393         899          0.5        2109.2       1.0X

* Trailing spaces do not count in the length check. We don't need to retain the trailing
* spaces, as we will pad char type columns/fields at read time.
*/
public static UTF8String charTypeWriteCheck(UTF8String inputStr, int limit) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charTypeWriteSideCheck to be consistent.

}.getOrElse(expr)
}

private def raiseError(typeName: String, length: Int): Expression = {
val errMsg = UTF8String.fromString(s"Exceeds $typeName type length limitation: $length")
RaiseError(Literal(errMsg, StringType), StringType)
RaiseError(Literal(errMsg, StringType), StringType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: identation is wrong

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it

StaticInvoke(
classOf[CharVarcharCodegenUtils],
StringType,
"varcharTypeWriteSidePadAndCheck",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just varcharTypeWriteSideCheck to show the general purpose. It also does trim but we didn't mention it in the name either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Test build #134185 has finished for PR 31199 at commit 3d82a79.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38784/

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Test build #134194 has finished for PR 31199 at commit 79c2ecd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38784/

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38787/

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38787/

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Test build #134196 has finished for PR 31199 at commit f665d33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Test build #134199 has finished for PR 31199 at commit 1e19de4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 18, 2021

Test build #134202 has finished for PR 31199 at commit 3f81d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

Ideally, we shouldn't backport perf improvement, but not passing TPCDSQuerySuite with char/varchar in table schema concerns me (see #31012). Since it only touches the char/varchar code path, I feel it's safe and better to have this "fix" in branch-3.1 as well. I won't treat it as a blocker though (won't fail rc1).

@HyukjinKwon what do you think?

@yaooqinn
Copy link
Member Author

+1 for @cloud-fan 's decision. As far as I'm concerned, query 41 from TPCDS is a quite normal case, 3.1 generates oversized codegen classes that let it fall back to the non-codegen mode and cause huge performance regession. Users can hit this regularly in real-world cases if they work on their pre-defined tables with char/varchar

@HyukjinKwon
Copy link
Member

Yes, I think it's fine to port back. It's not very common case but we can think it's like a perf regression for varchar cases compared to Spark 3.1 in a way.

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1!

cloud-fan pushed a commit that referenced this pull request Jan 19, 2021
…gth check with StaticInvoke

### What changes were proposed in this pull request?

This could reduce the `generate.java` size to prevent codegen fallback which causes performance regression.

here is a case from tpcds that could be fixed by this improvement
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133964/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

The original case generate 20K bytes, we are trying to reduce it to less than 8k
### Why are the changes needed?

performance improvement as in the PR benchmark test, the performance  w/ codegen is 2~3x better than w/o codegen.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

yes, it's a code reflect so the existing ut should be enough

cross-check with #31012 where the tpcds shall all pass

benchmark compared with master

```logtalk
================================================================================================
Char Varchar Read Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 20, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20                         1571           1667          83         63.6          15.7       1.0X
read char with length 20                           1710           1764          58         58.5          17.1       0.9X
read varchar with length 20                        1774           1792          16         56.4          17.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 40, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40                         1824           1927          91         54.8          18.2       1.0X
read char with length 40                           1788           1928         137         55.9          17.9       1.0X
read varchar with length 40                        1676           1700          40         59.7          16.8       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 60, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60                         1727           1762          30         57.9          17.3       1.0X
read char with length 60                           1628           1674          43         61.4          16.3       1.1X
read varchar with length 60                        1651           1665          13         60.6          16.5       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 80, hasSpaces: true:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80                         1748           1778          28         57.2          17.5       1.0X
read char with length 80                           1673           1678           9         59.8          16.7       1.0X
read varchar with length 80                        1667           1684          27         60.0          16.7       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 100, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100                        1709           1743          48         58.5          17.1       1.0X
read char with length 100                          1610           1664          67         62.1          16.1       1.1X
read varchar with length 100                       1614           1673          53         61.9          16.1       1.1X

================================================================================================
Char Varchar Write Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 20, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 20                        2277           2327          67          4.4         227.7       1.0X
write char with length 20                          2421           2443          19          4.1         242.1       0.9X
write varchar with length 20                       2393           2419          27          4.2         239.3       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 40, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 40                        2249           2290          38          4.4         224.9       1.0X
write char with length 40                          2386           2444          57          4.2         238.6       0.9X
write varchar with length 40                       2397           2405          12          4.2         239.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 60, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 60                        2326           2367          41          4.3         232.6       1.0X
write char with length 60                          2478           2501          37          4.0         247.8       0.9X
write varchar with length 60                       2475           2503          24          4.0         247.5       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 80, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 80                        9367           9773         354          1.1         936.7       1.0X
write char with length 80                         10454          10621         238          1.0        1045.4       0.9X
write varchar with length 80                      18943          19503         571          0.5        1894.3       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 100, hasSpaces: true:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 100                      11055          11104          59          0.9        1105.5       1.0X
write char with length 100                        12204          12275          63          0.8        1220.4       0.9X
write varchar with length 100                     21737          22275         574          0.5        2173.7       0.5X

```

Closes #31199 from yaooqinn/SPARK-34130.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 6fa2fb9)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this in 6fa2fb9 Jan 19, 2021
@yaooqinn
Copy link
Member Author

thanks for merging

@maropu
Copy link
Member

maropu commented Jan 20, 2021

late lgtm. Nice fix and thanks, @yaooqinn

@dongjoon-hyun
Copy link
Member

So, does it happen only with CBO and new CHAR/VARCHAR combination, @cloud-fan ?

@cloud-fan
Copy link
Contributor

It's unrelated to CBO. It's about the generated code is too big and stops whole-stage-codegen.

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…gth check with StaticInvoke

### What changes were proposed in this pull request?

This could reduce the `generate.java` size to prevent codegen fallback which causes performance regression.

here is a case from tpcds that could be fixed by this improvement
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133964/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

The original case generate 20K bytes, we are trying to reduce it to less than 8k
### Why are the changes needed?

performance improvement as in the PR benchmark test, the performance  w/ codegen is 2~3x better than w/o codegen.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

yes, it's a code reflect so the existing ut should be enough

cross-check with apache#31012 where the tpcds shall all pass

benchmark compared with master

```logtalk
================================================================================================
Char Varchar Read Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 20, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20                         1571           1667          83         63.6          15.7       1.0X
read char with length 20                           1710           1764          58         58.5          17.1       0.9X
read varchar with length 20                        1774           1792          16         56.4          17.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 40, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40                         1824           1927          91         54.8          18.2       1.0X
read char with length 40                           1788           1928         137         55.9          17.9       1.0X
read varchar with length 40                        1676           1700          40         59.7          16.8       1.1X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 60, hasSpaces: false:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60                         1727           1762          30         57.9          17.3       1.0X
read char with length 60                           1628           1674          43         61.4          16.3       1.1X
read varchar with length 60                        1651           1665          13         60.6          16.5       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 80, hasSpaces: true:     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80                         1748           1778          28         57.2          17.5       1.0X
read char with length 80                           1673           1678           9         59.8          16.7       1.0X
read varchar with length 80                        1667           1684          27         60.0          16.7       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Read with length 100, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100                        1709           1743          48         58.5          17.1       1.0X
read char with length 100                          1610           1664          67         62.1          16.1       1.1X
read varchar with length 100                       1614           1673          53         61.9          16.1       1.1X

================================================================================================
Char Varchar Write Side Perf
================================================================================================

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 20, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 20                        2277           2327          67          4.4         227.7       1.0X
write char with length 20                          2421           2443          19          4.1         242.1       0.9X
write varchar with length 20                       2393           2419          27          4.2         239.3       1.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 40, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 40                        2249           2290          38          4.4         224.9       1.0X
write char with length 40                          2386           2444          57          4.2         238.6       0.9X
write varchar with length 40                       2397           2405          12          4.2         239.7       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 60, hasSpaces: false:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 60                        2326           2367          41          4.3         232.6       1.0X
write char with length 60                          2478           2501          37          4.0         247.8       0.9X
write varchar with length 60                       2475           2503          24          4.0         247.5       0.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 80, hasSpaces: true:    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 80                        9367           9773         354          1.1         936.7       1.0X
write char with length 80                         10454          10621         238          1.0        1045.4       0.9X
write varchar with length 80                      18943          19503         571          0.5        1894.3       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Write with length 100, hasSpaces: true:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
write string with length 100                      11055          11104          59          0.9        1105.5       1.0X
write char with length 100                        12204          12275          63          0.8        1220.4       0.9X
write varchar with length 100                     21737          22275         574          0.5        2173.7       0.5X

```

Closes apache#31199 from yaooqinn/SPARK-34130.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants