[SPARK-19705][SQL] Preferred location supporting HDFS cache for FileS… #17035
…canRDD
What changes were proposed in this pull request?
Added support for the HDFS cache using TaskLocation.inMemoryLocationTag.
NewHadoopRDD and HadoopRDD both already support the HDFS cache through TaskLocation.inMemoryLocationTag:
the "hdfs_cache_" tag is prepended to the hostname, which the scheduler then interprets.
With this enhancement, the same tag ("hdfs_cache_") is added to the hostname if a FilePartition contains only a single file and that file is cached on one or more hosts.
The current implementation does not cover a FilePartition that contains multiple files, because the preferredLocation calculation is more complex in that case.
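The single-file rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this patch: `FileInfo`, `preferredLocations`, and the ordering of cached hosts before disk-only hosts are assumptions made for the example, and the multi-file branch is only a placeholder for the more complex calculation the description mentions. The only detail taken from the description is the "hdfs_cache_" tag.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's actual classes.
object HdfsCacheLocations {
  // Same string as the "hdfs_cache_" tag quoted in the description.
  val InMemoryLocationTag = "hdfs_cache_"

  // Assumed view of one file: hosts storing its blocks, and hosts caching it in memory.
  final case class FileInfo(path: String, hosts: Seq[String], cachedHosts: Seq[String])

  def preferredLocations(files: Seq[FileInfo]): Seq[String] = files match {
    // Exactly one file that is cached somewhere: tag the cached hosts.
    case Seq(single) if single.cachedHosts.nonEmpty =>
      val cached = single.cachedHosts.map(InMemoryLocationTag + _)
      // Assumption: disk-only hosts are kept after the cached ones.
      val diskOnly = single.hosts.filterNot(single.cachedHosts.contains)
      cached ++ diskOnly
    // Multiple files (or no cached replica): no tagging; placeholder logic only.
    case _ =>
      files.flatMap(_.hosts).distinct
  }
}
```

For example, a partition holding one file stored on `h1` and `h2` and cached on `h2` would yield `Seq("hdfs_cache_h2", "h1")` under these assumptions, while a partition with two files would yield plain hostnames only.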
How was this patch tested?
Added unit tests in FileSourceStrategySuite.scala.