
[SPARK-27232][SQL] Ignore file locality in InMemoryFileIndex if spark.locality.wait is set to zero #24175

Closed

Conversation

WangGuangxin
Contributor

What changes were proposed in this pull request?

InMemoryFileIndex requests file block location information so that TaskSetManager can do locality-aware scheduling.

This is usually a time-consuming task. For example, in our production environment one table has 24 partitions, with 149,925 files totaling 83 TB. It takes about 10 minutes to request the file block locations before the Spark job is even submitted. Even with spark.sql.sources.parallelPartitionDiscovery.threshold set to 24 to parallelize the discovery, it still takes 2 minutes.

This work is wasted if we don't care about file locality (for example, when storage and compute are separated).

So there should be a conf controlling whether we send getFileBlockLocations requests to the HDFS NameNode at all. If the user sets spark.locality.wait to 0, file block location information is meaningless.

In this PR, if spark.locality.wait is set to 0, InMemoryFileIndex no longer requests file block location information, which saves anywhere from several seconds to several minutes.
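The idea can be sketched roughly as follows. This is a hypothetical, simplified stand-in for the real InMemoryFileIndex code: `BlockLocation` here is a local case class rather than Hadoop's, and `lookup` stands in for the `fs.getFileBlockLocations` RPC to the NameNode.

```scala
// Simplified sketch of the PR's idea (not the actual Spark code):
// when spark.locality.wait is 0, skip the expensive block-location RPC
// and return an empty array instead.
case class BlockLocation(hosts: Seq[String], offset: Long, length: Long)

def blockLocations(
    lookup: () => Array[BlockLocation], // stands in for fs.getFileBlockLocations
    localityWaitMs: Long): Array[BlockLocation] =
  if (localityWaitMs == 0L) Array.empty[BlockLocation] // locality ignored: skip the RPC
  else lookup()

val remote = () => Array(BlockLocation(Seq("host1"), 0L, 134217728L))
println(blockLocations(remote, localityWaitMs = 0L).length)    // 0: no lookup made
println(blockLocations(remote, localityWaitMs = 3000L).length) // 1
```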

How was this patch tested?

Tested manually.

@AmplabJenkins

Can one of the admins verify this patch?

@WangGuangxin
Contributor Author

@srowen Could you please help review this patch?

Member

@srowen srowen left a comment


Seems plausible, but I don't really know this code well enough to endorse it. Your best bet is to contact people who may have worked on this code most recently. @peter-toth @adrian-ionescu do you have any thoughts? Maybe @cloud-fan for good measure.

@peter-toth
Contributor

peter-toth commented Mar 25, 2019

Sorry, I don't know this part of the code either. But it looks like PartitionedFileUtil.getBlockLocations can return an empty array even without this change, so this looks viable. I can't judge whether it should be tied to spark.locality.wait, though.

Contributor

@adrian-ionescu adrian-ionescu left a comment


I'm not very familiar with this code either, but at a high-level I'd say: let's try to come up with some micro-benchmark that can quantify the perf improvement this brings.

@@ -168,10 +168,12 @@ object InMemoryFileIndex extends Logging {
      filter: PathFilter,
      sparkSession: SparkSession): Seq[(Path, Seq[FileStatus])] = {

    val ignoreFileLocality = sparkSession.sparkContext.conf.get[Long](config.LOCALITY_WAIT) == 0L
Contributor


I think it'd be safer to check the more specific confs as well and only perform this optimization if they're all 0.

private[spark] val LOCALITY_WAIT_PROCESS = ConfigBuilder("spark.locality.wait.process")
  .fallbackConf(LOCALITY_WAIT)

private[spark] val LOCALITY_WAIT_NODE = ConfigBuilder("spark.locality.wait.node")
  .fallbackConf(LOCALITY_WAIT)

private[spark] val LOCALITY_WAIT_RACK = ConfigBuilder("spark.locality.wait.rack")
  .fallbackConf(LOCALITY_WAIT)
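The reviewer's suggestion could look something like the sketch below. It uses a plain `Map[String, String]` as a hypothetical stand-in for SparkConf (the real patch would read typed config entries); the fallback behavior mirrors the `fallbackConf(LOCALITY_WAIT)` definitions quoted above.

```scala
// Hypothetical sketch: skip the locality lookup only if the process-, node-,
// and rack-level waits are all zero. Each specific conf falls back to
// spark.locality.wait when unset, matching the ConfigBuilder definitions.
def ignoreFileLocality(conf: Map[String, String]): Boolean = {
  val default = conf.getOrElse("spark.locality.wait", "3s")
  Seq("spark.locality.wait.process",
      "spark.locality.wait.node",
      "spark.locality.wait.rack")
    .map(key => conf.getOrElse(key, default))
    .forall(v => v == "0" || v == "0s")
}

println(ignoreFileLocality(Map("spark.locality.wait" -> "0")))  // true
println(ignoreFileLocality(Map(
  "spark.locality.wait" -> "0",
  "spark.locality.wait.node" -> "3s")))                         // false
```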

fs.getFileBlockLocations(f, 0, f.getLen).map { loc =>
  // Store BlockLocation objects to consume less memory
  if (loc.getClass == classOf[BlockLocation]) {
    loc
  } else {
    new BlockLocation(loc.getNames, loc.getHosts, loc.getOffset, loc.getLength)
  }
}
Contributor


This part doesn't change in this PR. The new thing here is that we don't look up the block locations, but return an empty array instead.

Contributor


Oh, sorry, you're right about that. Please disregard.
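For context, the optimization the diff snippet shows can be sketched as follows. These are simplified, hypothetical stand-in types, not Hadoop's: HDFS can return a BlockLocation subclass that holds extra references, and copying into a plain BlockLocation keeps only the fields locality scheduling needs.

```scala
// Sketch of the existing memory optimization (stand-in types, not Hadoop's):
// a subclass may carry extra references; a plain copy drops them.
class BlockLocation(val hosts: Array[String], val offset: Long, val length: Long)
class HdfsBlockLocation(hosts: Array[String], offset: Long, length: Long,
                        val extra: AnyRef) // e.g. a reference to richer HDFS metadata
  extends BlockLocation(hosts, offset, length)

def slim(loc: BlockLocation): BlockLocation =
  if (loc.getClass == classOf[BlockLocation]) loc // already plain: reuse as-is
  else new BlockLocation(loc.hosts, loc.offset, loc.length) // drop the extras

val plain = new BlockLocation(Array("h1"), 0L, 128L)
val rich  = new HdfsBlockLocation(Array("h1"), 0L, 128L, new Object)
println(slim(plain) eq plain)                          // true: same object reused
println(slim(rich).getClass == classOf[BlockLocation]) // true: slimmed copy
```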


@squito
Contributor

squito commented Apr 1, 2019

@LantaoJin just pointed me at this based on some discussion in #23951. I totally understand the use case for this, but it needs to use a new config. Even with locality wait set to 0, Spark still tries to schedule tasks to take advantage of locality; it just means Spark won't wait for an offer with better locality. In fact, I regularly recommend users set locality wait to 0 even on colocated clusters.

Furthermore, even on disaggregated clusters, you don't necessarily want to set all locality waits to 0, right? You might still want to wait for locality for persisted data from cached RDDs.

#23951 pointed out a case for skipping rack resolution entirely on disaggregated clusters. This is another good case. I'm not entirely sure they should be controlled by the same thing... I wonder if there is some HDFS-specific setting that might be appropriate here. E.g. you might have "semi" disaggregated clusters with most data living remotely but some small local HDFS. I'm not sure there is an easy way to figure this out.

@WangGuangxin
Contributor Author

Closing this since there is a better solution: #24672.
