Conversation

@Dooyoung-Hwang commented Aug 24, 2018

What changes were proposed in this pull request?

Spark SQL has only two options for managing Thrift Server memory: enabling spark.sql.thriftServer.incrementalCollect or not.

With spark.sql.thriftServer.incrementalCollect enabled:

  • Pros
    • The Thrift Server can handle large output without OOM.
  • Cons
    • Performance degrades because tasks execute partition by partition.
    • Queries with a count limit are handled inefficiently because all partitions are executed.
    • Results are not cached for FETCH_FIRST.

With spark.sql.thriftServer.incrementalCollect disabled:

  • Pros
    • Good performance for small output.
  • Cons
    • Peak memory usage is very high, because decompressed & deserialized rows are allocated in a "batch" manner, so OOM can occur for large output.
    • It is difficult to estimate a query's peak memory usage, which makes configuring spark.driver.maxResultSize very hard.
    • If the decompressed & deserialized rows fill up the Eden area of the JVM heap, they are promoted to the old generation, which increases the chance of a stop-the-world full GC.

The improvement idea for solving these problems is below (a rough sketch follows the list).

  • The Dataset does not decompress & deserialize the result; it just returns the total row count and an iterator to the SQL executor. That way only compressed data resides in memory, so memory usage is not only much lower than before but can also be controlled with the spark.driver.maxResultSize config.

  • After the SQL executor gets the total row count and the iterator from the Dataset, it can decide whether to deserialize the rows collectively or iteratively, considering the returned row count.
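
The decision the SQL executor makes can be sketched as follows. This is a minimal sketch with hypothetical names; the real row type and decode step live inside Spark:

```scala
// A minimal sketch of the proposed flow; names and signatures are
// illustrative, not the actual API added in this PR.
object ResultMaterialization {
  def choose[T](rowCount: Long,
                lazyRows: Iterator[T],
                batchLimit: Long): Iterator[T] = {
    if (rowCount <= batchLimit) {
      // Small result: decode everything up front for best fetch performance.
      lazyRows.toVector.iterator
    } else {
      // Large result: keep rows encoded and decode one at a time, so only
      // the compressed buffers stay resident on the driver.
      lazyRows
    }
  }
}
```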

How was this patch tested?

Added test cases.

@kiszk (Member) commented Aug 24, 2018

Did you verify this feature manually?

Member:

nit: two more spaces for indentation?

Member:

nit: Is it better to put them into one line?

Member:

nit: Is it better to put them into one line?

@Dooyoung-Hwang (Author):

Yes, I verified the results of a variety of queries, as well as memory usage and performance.

This patch passed all of our query tests, and there was no performance degradation in our test cases.

Below is the result of a memory test. I checked old-generation utilization of the JVM heap while executing a query returning 2,481,284 rows (I ran "jstat -gc thriftserver-pid" and checked the OU field).

After patch: 283910.0 KB -> 316108.3 KB (31.44 MB increase)
Before patch: 279425.6 KB -> 1511834.2 KB (1203.52 MB increase)

The memory improvement is very large because the compressed result buffer is surprisingly smaller than I expected. Decompressed InternalRows become garbage immediately after being sent and are reclaimed by young GC, so old-generation heap usage is much smaller than before.

Member:

Would it be good to make it public?

Member:

Yeah, why is it public?

Member:

And add a description for it.

Author:

I think it would be good to make it public. There is no "action" that decodes rows incrementally except toLocalIterator, and toLocalIterator performs poorly for data sources with many partitions or when selecting a limited number of rows. So providing another option for reducing the memory pressure of decompressing & deserializing result rows would be a good choice.
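
To illustrate the trade-off with the existing APIs, here is a toy comparison. It assumes a running SparkSession named `spark` and is only an illustration, not code from this PR:

```scala
// collect(): decodes every row on the driver at once; fast for small
// results, but peak memory grows with the full decoded output.
val df = spark.range(0L, 1000000L).toDF("id")
val all = df.collect()

// toLocalIterator(): fetches one partition at a time; low memory, but it
// runs a job per partition, which is slow for many partitions or LIMIT.
val it = df.toLocalIterator()
while (it.hasNext) {
  val row = it.next() // process rows incrementally
}
```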

Member:

We can always make it public later if there is such a requirement. We should be careful about adding public APIs.

Member:

Yeah, we need to consider the decision carefully. At least, if we decide to make it public, would it be better to add @Experimental?

Author:

Added a description to collectCountAndIterator, but didn't make it public or experimental yet.

@kiszk (Member) commented Aug 26, 2018

Would it be possible to prepare test cases? IIUC, this feature can be exercised without the Thrift Server by writing some test code.

Member:

Please add some description for this method.

Member:

And add a description for it.

Member:

Most of the above changes look like pure refactoring. They look fine, but could be avoided to reduce the diff.

Author:

@kiszk
Yeah, I'll prepare test cases.

@viirya
The changes above make decodeUnsafeRows execute lazily to reduce peak memory. Changing the type of numPartsToTry to a val is a refactoring that could be separated from this patch; if reviewers want it reverted, I can split it into another trivial pull request.

Author:

@kiszk
Should we revert this commit to reduce the diff?

Member:

It is also fine to revert this.

Member:

When incremental collect is disabled and users want to use FETCH_FIRST, we expect the returned rows to be cached so that we can get an iterator again. Now FETCH_FIRST triggers re-execution whether incremental collect is on or not. I think this may be a performance regression in some cases.

@Dooyoung-Hwang (Author), Aug 27, 2018:

Yes, the case you mention (incremental collect disabled & FETCH_FIRST) has a performance degradation. If the total row count is bigger than batchCollectLimit, I thought it was not a suitable case for caching decompressed rows, because of memory pressure. If FETCH_FIRST cached compressed rows (not decompressed rows) regardless of row count, results that exceed the batch limit could be cached too. But an Iterator return type may not be a good choice for that; Scala's view would be the proper choice, because an Iterator can be created again from a view of the compressed rows. That would require many more source changes, though, so I didn't do it.
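
To illustrate the single-pass limitation mentioned above, a toy Scala example (not the PR's code):

```scala
val it = Iterator(1, 2, 3)
println(it.toList) // List(1, 2, 3)
println(it.toList) // List(): the Iterator is exhausted, so serving
                   // FETCH_FIRST again would mean re-executing the query

val rows = Array(1, 2, 3).view.map(_ * 10) // pretend map is the row decoder
println(rows.toList) // List(10, 20, 30)
println(rows.toList) // List(10, 20, 30): a view can be traversed again,
                     // re-decoding from its cached backing data
```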

@Dooyoung-Hwang (Author), Aug 29, 2018:

@viirya
Let me share my idea for solving the problem you mentioned.

  1. Change the return type of collectCountAndIterator to a tuple of (Long, SeqView).
  2. The SeqView is created from the encoded result array (the result of getByteArrayRdd().collect() in SparkPlan) and holds the deserializing operations defined in Dataset.
  3. Change the type of resultList in SparkExecuteStatementOperation to Option[Iterable[SparkRow]], because both Array and SeqView are Iterable.
  4. The Thrift Server checks whether the row count exceeds THRIFTSERVER_BATCH_COLLECTION_LIMIT and decides (see the sketch below):
    -> if row count > THRIFTSERVER_BATCH_COLLECTION_LIMIT, resultList caches the SeqView;
    -> otherwise, resultList caches an Array collected from the SeqView.
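
A rough sketch of steps 1-4 (Scala 2.12 collections; encodedBatches and decode stand in for the encoded result of getByteArrayRdd().collect() and the Dataset's deserializer, so this is an illustrative outline, not PR code):

```scala
def buildResultCache[A, B](encodedBatches: Array[A],
                           decode: A => B,
                           totalRows: Long,
                           batchCollectionLimit: Long): Iterable[B] = {
  // The view holds only the encoded batches plus the pending decode step;
  // each traversal re-decodes, so FETCH_FIRST needs no re-execution.
  val lazyRows = encodedBatches.view.map(decode)
  if (totalRows > batchCollectionLimit) {
    lazyRows          // large result: cache the lazy view
  } else {
    lazyRows.toVector // small result: materialize decoded rows once
  }
}
```

Because the view keeps only encoded batches resident, the memory profile stays close to incremental mode while FETCH_FIRST avoids re-execution.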

What do you think about this idea?

Member:

I think we should try to cache the encoded result when the row count exceeds THRIFTSERVER_BATCH_COLLECTION_LIMIT and incremental collect is disabled. That sounds closer to what the mode is supposed to do.

Otherwise its behavior looks close to incremental collect mode, since it re-executes. Besides, it collects all data back to the driver in encoded format.

Author:

I will try to cache it. Thank you for the reply.

Member:

I don't think we should expose this as an API. This JIRA/PR doesn't target this API anyway, right? Shall we just leave it private?

Author:

Ok. I agree with you.

@Dooyoung-Hwang (Author):

Added test cases.

@Dooyoung-Hwang (Author):

Changed the accessor of collectCountAndIterator to private[sql], and updated the doc of the feature I defined in the Thrift Server.

@Dooyoung-Hwang (Author):

@kiszk @viirya @HyukjinKwon @cloud-fan
Could you review this patch?

@kiszk (Member) commented Sep 1, 2018

@Dooyoung-Hwang Would it be possible to add a test case that verifies the result with and without incremental collect by changing the value of spark.sql.thriftServer.batchDeserializeLimit?

Member:

nit: s"client'. Only valid if ${THRIFTSERVER_INCREMENTAL_COLLECT.key} is false. " +

Member:

If this is a Thrift Server-specific issue, can we do the same thing by fixing code only in the thriftserver package?
IMHO we'd better avoid modifying code in the sql package as much as possible.

Author:

Currently, there is no API in Dataset to deserialize results iteratively.

Member:

You mean there is no way to implement that functionality outside Dataset?

Author:

Yes, that's what I mean. The 'deserializer' is declared private, so there is no way to get the 'deserializer' outside of Dataset.

Member:

How about changing private to private[sql], then implementing this based on the deserializer?

Author:

OK, so you'd prefer not to add a function to Dataset. If withAction & deserializer are changed to private[sql], this implementation can be moved out. But is this function useful for other SQL servers to reduce the memory usage of query execution? I don't think it looks good, because the Projection would then be created outside of Dataset.

Member:

You'd better separate this PR into the two parts you proposed in the PR description.

Author:

Do you mean separating this PR into an sql part and a thriftserver part?

Member:

For example, do collectCountAndSeqView and executeTakeSeqView depend on each other? If not, please split them into separate PRs.

Member:

Can you keep the current behavior? Then, please implement the SeqView iteration model so that it is turned on/off by a new option.

Author:

Yeah, I'll change this feature to a boolean. Thank you for the review.

Author:

Done

@tooptoop4 (Contributor):

@HyukjinKwon can this be merged?

@HyukjinKwon (Member):

ok to test

@HyukjinKwon (Member):

Nope, not yet. It needs some more review iterations.

@SparkQA commented Oct 14, 2018

Test build #97363 has finished for PR 22219 at commit ffafd62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 15, 2018

Test build #97376 has finished for PR 22219 at commit f05570c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Oct 15, 2018

Test build #97388 has finished for PR 22219 at commit 2a41b70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Dooyoung-Hwang (Author):

Dear reviewers (cc: @dongjoon-hyun),

I updated the following:

  1. No behavior changes if the new config is off, so PR SPARK-25353 is no longer required for this PR.
  2. Applied review comments (defined the config as a boolean, and did not add a function to Dataset).
  3. Added a test case for FETCH_NEXT & FETCH_FIRST.

@SparkQA commented Oct 16, 2018

Test build #97452 has finished for PR 22219 at commit e5baa50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member):

cc @srinathshankar @yuchenhuo

@SparkQA commented Oct 18, 2018

Test build #97530 has finished for PR 22219 at commit 136a4f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Oct 23, 2018

ok to test

@SparkQA commented Oct 23, 2018

Test build #97893 has finished for PR 22219 at commit 136a4f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tooptoop4 (Contributor):

please merge

@Dooyoung-Hwang (Author) commented Feb 18, 2019

I cannot make further progress because there are no more review comments.
If committers are positively considering this PR, I will be happy to follow the review process.
This patch was applied at my company several months ago, and it has helped our infrastructure escape Thrift Server OutOfMemory errors.

@yassineazzouz:

I have been using this patch for some time too, and I can confirm that it helped a lot with OutOfMemory and GC issues on the Thrift Server. I believe it could benefit other users if merged.

@tooptoop4 (Contributor):

@Dooyoung-Hwang does sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala have a conflict?

Dooyoung Hwang added 6 commits February 25, 2019 15:35
When collecting results with this API, only compressed rows reside in heap memory,
so the heap usage of collection can be controlled adaptively by the caller side.
1. executeTakeIterator returns an iterator of decoded rows, but only
encoded rows reside in heap space.
2. executeTake now uses executeTakeIterator, so it decodes exactly n rows.
- The Thrift Server can thus decide whether to decompress & deserialize rows
all together or incrementally, considering the total row count.
If the total row count exceeds the configured threshold (spark.sql.thriftServer.
batchCollectionLimit), rows are decoded incrementally before being sent to the
client; otherwise they are collected all together for performance.

- Add a setting to SQLConf for configuring output result rows.
spark.sql.thriftServer.batchCollectionLimit: when the result row count exceeds this,
result rows are collected incrementally. Only valid for non-incremental collection.
The default is no limit.

- Use a SeqView return type so the cached result can serve FETCH_FIRST.
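
For reference, the config entry described in these commits might be declared roughly as follows inside SQLConf, where Spark's internal buildConf helper is in scope. This is a hedged sketch mirroring the commit message, not the merged code:

```scala
val THRIFTSERVER_BATCH_COLLECTION_LIMIT =
  buildConf("spark.sql.thriftServer.batchCollectionLimit")
    .doc("When the number of result rows exceeds this limit, the Thrift " +
      "Server deserializes result rows incrementally before sending them " +
      "to the client. Only valid when incremental collect is disabled.")
    .longConf
    .createWithDefault(Long.MaxValue) // default: no limit
```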
@SparkQA commented Feb 25, 2019

Test build #102744 has finished for PR 22219 at commit aa0cb41.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tooptoop4 (Contributor):

@HyukjinKwon pls merge

@gatorsmile (Member):

Can we implement something like JDBC's ResultSet?

@Dooyoung-Hwang (Author):

Maybe FETCH_REVERSE or previous() would be difficult, because this feature is based on Scala's Iterator.

@SparkQA commented Feb 27, 2019

Test build #102801 has finished for PR 22219 at commit c4c2177.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kaushikCanada:

Has this patch been introduced into Spark? Can I use it on the new Spark 2.4?

@maropu (Member) commented Oct 26, 2019

Not merged yet.

@AmplabJenkins:

Can one of the admins verify this patch?

@github-actions:

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 27, 2020
@github-actions github-actions bot closed this Feb 28, 2020
@RamakrishnaChilaka commented Nov 9, 2021

@Dooyoung-Hwang @maropu, is this merged? We are facing these issues in production. If it's merged, can we use it in Spark 3.2.0?

@dongjoon-hyun (Member):
This is not merged, @RamakrishnaChilaka .

@RamakrishnaChilaka commented Nov 9, 2021

@dongjoon-hyun, will this be merged? If yes, can we please update this patch so that it works for Spark 3.2.0? Please confirm. Thanks.
