[CARBONDATA-2767][CarbonStore] Fix task locality issue #2528

QiangCai · 2018-07-19T10:35:05Z

If the Spark cluster and the Hadoop cluster run on two different sets of machines, the Spark tasks will run in RACK_LOCAL mode.

So there is no need to provide preferred locations to the task.
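
A minimal sketch of the idea, not the code from this PR: when locality is switched off, the scan hands the scheduler an empty list of preferred hosts, which leaves it free to place the task on any executor. The class and method names below are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper; the real logic lives in CarbonScanRDD and may differ. */
final class PreferredLocationsSketch {

  private PreferredLocationsSketch() { }

  /**
   * Returns the hosts to report as preferred locations for one split.
   * With locality disabled (or no host information) the list is empty,
   * meaning "no placement preference".
   */
  static List<String> preferredLocations(boolean taskLocality, String[] splitHosts) {
    if (!taskLocality || splitHosts == null) {
      return Collections.emptyList();
    }
    return Arrays.asList(splitHosts);
  }
}
```

With `taskLocality` set to false the result is always empty, which matches the case described above where the data nodes never host a Spark executor.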


CarbonDataQA · 2018-07-19T12:49:52Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7315/

CarbonDataQA · 2018-07-19T13:26:13Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6079/


CarbonDataQA · 2018-07-23T07:12:49Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7388/

CarbonDataQA · 2018-07-23T08:12:14Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6149/

ravipesala · 2018-07-23T13:16:51Z

SDV Build Failed, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5962/

jackylk · 2018-07-24T03:16:37Z

core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java

@@ -1882,6 +1882,13 @@

  public static final String CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT = "true";

+  /**
+   * config carbon scan task locality


Please provide more detail, such as which scheduling behavior is used for true and which for false.
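
One way the requested detail could look, as a sketch only: the Javadoc spells out what each value means, and the flag is read with an explicit default. The property key is hypothetical (the real constant is cut off in the diff excerpt above), and the described behavior is an assumption based on the PR description, not the merged documentation.

```java
import java.util.Properties;

/** Hypothetical sketch; key name and default handling are assumptions. */
final class TaskLocalityConfigSketch {

  /** Illustrative key only, not the constant added by this PR. */
  static final String TASK_LOCALITY_KEY = "example.scan.task.locality";

  private TaskLocalityConfigSketch() { }

  /**
   * Reads the scan task locality flag.
   * true: the scan reports the hosts of each split as preferred locations.
   * false: the scan reports no preferred locations, so tasks can be
   * scheduled on any executor.
   */
  static boolean isTaskLocality(Properties props, boolean defaultValue) {
    String raw = props.getProperty(TASK_LOCALITY_KEY);
    if (raw == null) {
      return defaultValue;
    }
    // Boolean.parseBoolean treats anything other than "true" (case-insensitive) as false.
    return Boolean.parseBoolean(raw.trim());
  }
}
```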

jackylk · 2018-07-24T03:17:20Z

integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala

@@ -87,6 +87,8 @@ class CarbonScanRDD[T: ClassTag](
  }
  private var vectorReader = false

+  private val isTaskLocality = CarbonProperties.isTaskLocality


It can be transient.

CarbonDataQA · 2018-07-24T09:51:08Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7444/

CarbonDataQA · 2018-07-24T11:05:58Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6199/

jackylk · 2018-07-24T11:14:17Z

store/sql/pom.xml

+    <dependency>
+      <groupId>com.amazonaws</groupId>
+      <artifactId>aws-java-sdk-s3</artifactId>
+      <version>	1.10.6</version>


CarbonDataQA · 2018-07-25T02:17:02Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7472/

xuchuanyin · 2018-07-25T02:51:40Z

integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala

+        .split
+        .value
+        .getLocations
+        .filter(_ != "localhost")


What happens if I enable task locality and run the job on a local machine or in local pseudo-distributed mode?
Besides, if you really want to exclude the local machine, why is only 'localhost' filtered and not the local machine's host name?

The filter is not meant to exclude the local machine.
Every machine may have "127.0.0.1 localhost" configured, so "localhost" does not identify a particular host and is not useful as a preferred location.
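
The filter discussed here can be sketched in Java as follows. This is an illustrative equivalent of the Scala `.filter(_ != "localhost")` in the diff, with a hypothetical class and method name; real host names, including the local machine's own, pass through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the host filter; not the code from CarbonScanRDD. */
final class LocalhostFilterSketch {

  private LocalhostFilterSketch() { }

  /**
   * Drops only the literal name "localhost", which every machine resolves
   * to itself and which therefore carries no placement information.
   */
  static List<String> dropLocalhost(String[] hosts) {
    List<String> kept = new ArrayList<>();
    for (String host : hosts) {
      if (!"localhost".equals(host)) {
        kept.add(host);
      }
    }
    return kept;
  }
}
```

If a split reports only "localhost", as can happen in local mode, the result is empty and the task simply has no placement preference.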

CarbonDataQA · 2018-07-25T02:58:18Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6229/

jackylk · 2018-07-25T07:54:15Z

pom.xml

@@ -110,7 +110,7 @@
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <snappy.version>1.1.2.6</snappy.version>
-    <hadoop.version>2.7.2</hadoop.version>
+    <hadoop.version>2.8.3</hadoop.version>


Please add this in the Horizon profile only.

CarbonDataQA · 2018-07-25T08:45:06Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7492/

jackylk · 2018-07-25T12:47:43Z

LGTM


CarbonDataQA · 2018-07-25T13:22:04Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6253/


fix task locality issue

0adf03b

dependency

QiangCai changed the title ~~[WIP][CarbonStore] Fix task locality issue~~ [CARBONDATA-2767][CarbonStore] Fix task locality issue Jul 23, 2018

QiangCai force-pushed the locality branch from 6059676 to 0adf03b Compare July 23, 2018 06:21

jackylk reviewed Jul 24, 2018


fix comments

261b364

jackylk reviewed Jul 24, 2018


fix comment

871632c

xuchuanyin reviewed Jul 25, 2018


jackylk reviewed Jul 25, 2018


fix comments

5bbc057

asfgit pushed a commit that referenced this pull request Jul 25, 2018

[CARBONDATA-2767][CarbonStore] Fix task locality issue

2d46288

If the Spark cluster and the Hadoop cluster are two different machine clusters, the Spark tasks will run in RACK_LOCAL mode. This closes #2528

QiangCai closed this Jul 26, 2018
