
Conversation

@cxzl25 (Contributor) commented Dec 11, 2020

What changes were proposed in this pull request?

Change the hadoopJobMetadata cache from softValues to weakValues.

Why are the changes needed?

Reduce driver memory pressure, GC time and frequency, and job execution time.

HadoopRDD uses a soft-reference map to cache JobConfs (rdd_id -> JobConf).
When the driver reads a Hive table with many partitions, HadoopRDD.getPartitions creates many JobConfs and adds them to the cache.
Each executor also creates a JobConf, adds it to the cache, and shares it among the tasks running in that executor.

The JobConfs accumulating in the driver-side cache increase memory pressure. When the driver memory is configured low, full GCs become very frequent, and these JobConfs are hardly ever reused.
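
For context, hadoopJobMetadata is a Guava cache; here is a minimal sketch of the two reference policies this PR chooses between (the weakValues line is what the PR switches to):

import com.google.common.cache.CacheBuilder

// Soft values: entries survive until the JVM is under memory pressure,
// so rarely-reused JobConfs can pile up on the driver heap.
val softCache = CacheBuilder.newBuilder().softValues().build[String, AnyRef]().asMap()

// Weak values: an entry becomes collectable as soon as no strong reference
// to the value remains, so unused JobConfs are reclaimed much sooner.
val weakCache = CacheBuilder.newBuilder().weakValues().build[String, AnyRef]().asMap()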

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs
Manual test

@github-actions github-actions bot added the CORE label Dec 11, 2020
@AmplabJenkins

Can one of the admins verify this patch?


protected val jobConfCacheKey: String = "rdd_%d_job_conf".format(id)

protected val inputFormatCacheKey: String = "rdd_%d_input_format".format(id)
@cxzl25 (Contributor Author):

SPARK-9585 removed the InputFormat cache.

@xkrogen (Contributor) commented Dec 11, 2020

Do we really need a separate copy of the JobConf cached for each partition ID? Is there any opportunity for us to reduce the number of JobConfs to begin with? It seems like all of the partitions should be able to safely share the same conf object...?

Regardless, weak references seem more appropriate here than soft.

@cxzl25 (Contributor Author) commented Dec 13, 2020

> Do we really need a separate copy of the JobConf cached for each partition ID?

Yes, it is needed. HadoopRDD#getJobConf has the following comment.

protected def getJobConf(): JobConf = {
  val conf: Configuration = broadcastedConf.value.value
  if (shouldCloneJobConf) {
    // Hadoop Configuration objects are not thread-safe, which may lead to various problems if
    // one job modifies a configuration while another reads it (SPARK-2546). This problem occurs
    // somewhat rarely because most jobs treat the configuration as though it's immutable. One
    // solution, implemented here, is to clone the Configuration object. Unfortunately, this
    // clone can be very expensive. To avoid unexpected performance regressions for workloads and
    // Hadoop versions that do not suffer from these thread-safety issues, this cloning is
    // disabled by default.
    // ...
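
For reference, a simplified, hypothetical model of the non-cloning branch (a paraphrase, not the exact Spark source): on a cache miss, a new JobConf is built and published to the shared per-JVM cache.

import java.util.concurrent.ConcurrentHashMap
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

// Hypothetical sketch of the cloneConf=false path: reuse a cached JobConf
// for this RDD's key if present, otherwise create one and cache it.
object JobConfCacheSketch {
  private val cache = new ConcurrentHashMap[String, AnyRef]()

  def getOrCreate(jobConfCacheKey: String, conf: Configuration): JobConf = {
    val cached = cache.get(jobConfCacheKey)
    if (cached != null) {
      cached.asInstanceOf[JobConf]       // cache hit: share the existing JobConf
    } else {
      val newJobConf = new JobConf(conf) // cache miss: build a new JobConf
      cache.put(jobConfCacheKey, newJobConf)
      newJobConf
    }
  }
}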

@cxzl25 (Contributor Author) commented Dec 14, 2020

It seems that the GitHub tests pass, and the production environment test shows a performance improvement.
Can you review this PR when you have time? @cloud-fan

@xkrogen (Contributor) commented Dec 14, 2020

I see, thanks for the reference. So IIUC this patch is primarily targeting the spark.hadoop.cloneConf = true use case?

@cxzl25 (Contributor Author) commented Dec 15, 2020

> I see, thanks for the reference. So IIUC this patch is primarily targeting the spark.hadoop.cloneConf = true use case?

No.
When spark.hadoop.cloneConf=false, HadoopRDD#getPartitions creates a JobConf and adds it to the hadoopJobMetadata cache.
When the queried Hive table has many partitions, many JobConf objects are created and added to the cache.
When the driver memory is configured small, the driver fills its entire heap and then runs full GCs.

If your Hadoop client version is 2.7 or above, or carries the patch from HADOOP-11209, you can enable spark.hadoop.cloneConf=true; then the driver will not accumulate too many JobConf objects.
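
For example, a hedged sketch of setting this via SparkConf (equivalent to passing --conf spark.hadoop.cloneConf=true on the command line):

import org.apache.spark.SparkConf

// Enable per-call cloning of the JobConf instead of caching shared copies.
val conf = new SparkConf().set("spark.hadoop.cloneConf", "true")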

@xkrogen (Contributor) commented Dec 15, 2020

Thanks for the further explanation, that is very helpful. Seems like potentially the comment in HadoopRDD#getJobConf
should be updated since the concurrency bugs have been fixed in Hadoop since 2.7.0, a pretty old version.

There is still one point I don't understand. It seems that the key for the JobConf in the cache is based on the ID of the RDD, not a per-partition key:

protected val jobConfCacheKey: String = "rdd_%d_job_conf".format(id)

So I would expect there to be one cached entry in hadoopJobMetadata per RDD. How do we end up with one JobConf per partition? Is it because the check if jobConf in cache -> if not put into cache steps are not synchronized, and many threads simultaneously decide that the conf isn't present and then put many copies of the conf into the cache? Or have I missed something?

Thanks for bearing with me as I try to understand this issue!

@cxzl25 (Contributor Author) commented Dec 16, 2020

To clarify, the partition here refers to a partition of the Hive table, not an RDD partition.
For example, when Spark SQL reads a Hive table with 10,000 partitions,
HadoopTableReader#makeRDDForPartitionedTable creates 10,000 RDDs, which means there are 10,000 JobConfs.
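
A hypothetical illustration of the resulting key space (each HadoopRDD id yields its own jobConfCacheKey):

// 10,000 Hive partitions -> 10,000 HadoopRDDs -> 10,000 cache entries:
val keys = (0 until 10000).map(id => "rdd_%d_job_conf".format(id))
// Vector("rdd_0_job_conf", "rdd_1_job_conf", ..., "rdd_9999_job_conf")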

  // (e.g., HadoopRDD uses this to cache JobConfs).
  private[spark] val hadoopJobMetadata =
-   CacheBuilder.newBuilder().softValues().build[String, AnyRef]().asMap()
+   CacheBuilder.newBuilder().weakValues().build[String, AnyRef]().asMap()
@cloud-fan (Contributor) commented Dec 16, 2020

Is it better to put a size limit on this cache? Then soft references should also be fine.
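
A sketch of that suggestion, assuming a placeholder limit (not a tuned value):

import com.google.common.cache.CacheBuilder

// Bounded soft-value cache: the size cap acts as a backstop so the cache
// cannot grow without limit even before memory pressure triggers eviction.
val boundedCache = CacheBuilder.newBuilder()
  .maximumSize(1000) // placeholder limit, not a tuned value
  .softValues()
  .build[String, AnyRef]()
  .asMap()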

@cxzl25 (Contributor Author):

With a size limit, a soft-reference cache can reduce the number of young GCs compared with a weak-reference cache.
But what should the size limit be?
In fact, the driver rarely has a chance to reuse the cached JobConfs; sharing the JobConf only really makes sense on the executor side.

Member:

Whether or not a limit is in place (which could also be a good backstop to prevent a huge cache), this could be fine. I think the only risk is that weak references are quite readily reclaimed, so this risks losing most of the caching.

@xkrogen (Contributor) commented Dec 16, 2020

> To clarify, the partition here refers to a partition of the Hive table, not an RDD partition.

Now it all makes sense. Thanks for the clarification. Seems I needed to read your original message more carefully.

@github-actions bot commented Apr 1, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 1, 2021
@github-actions github-actions bot closed this Apr 2, 2021