[CARBONDATA-3132] Correct the task distribution in case of compaction when the actual block nodes and active nodes are different #2953

akashrn5 · 2018-11-27T06:50:10Z

Why This PR?

Tasks are sometimes distributed unequally during compaction.
Example: data is loaded with replication factor 2 while all nodes are active, and one node goes down before compaction, so that node no longer hosts an active executor. Task distribution should then spread the tasks equally across all active executors instead of giving more tasks to one executor and fewer to another. Currently the distribution is sometimes unequal, and compaction becomes slow.

Solution

Currently we do not fetch the active executors before the node block mapping and pass the list of active executors as null, which sometimes leads to the problem above. The fix is to get the active executors and pass them to the node block mapping logic, which then distributes the tasks equally.
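To illustrate the idea, here is a minimal, self-contained sketch of a mapping that is aware of active nodes. It is not the actual `CarbonLoaderUtil.nodeBlockMapping` implementation; the class `ActiveNodeAssignmentSketch`, the `Block` type, and the `assign` method are hypothetical names used only for illustration. The assumption is that each block lists the nodes holding its replicas, and that no active node should receive more than ceil(blocks / activeNodes) blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not CarbonData's actual nodeBlockMapping implementation.
final class ActiveNodeAssignmentSketch {

  /** Hypothetical stand-in for a block: an id plus the nodes holding its replicas. */
  static final class Block {
    final String id;
    final List<String> replicaNodes;

    Block(String id, List<String> replicaNodes) {
      this.id = id;
      this.replicaNodes = replicaNodes;
    }
  }

  /** Maps each active node to the ids of the blocks assigned to it. */
  static Map<String, List<String>> assign(List<Block> blocks, List<String> activeNodes) {
    if (activeNodes.isEmpty()) {
      throw new IllegalArgumentException("at least one active node is required");
    }
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String node : activeNodes) {
      result.put(node, new ArrayList<String>());
    }
    // Upper bound per node: ceil(blocks / activeNodes).
    int cap = (blocks.size() + activeNodes.size() - 1) / activeNodes.size();

    // Pass 1: keep a block local when one of its replica nodes is active and below the cap.
    List<Block> nonLocal = new ArrayList<>();
    for (Block block : blocks) {
      String target = leastLoaded(block.replicaNodes, result, cap);
      if (target != null) {
        result.get(target).add(block.id);
      } else {
        nonLocal.add(block);
      }
    }
    // Pass 2: blocks without a usable local node go to the least-loaded active node.
    for (Block block : nonLocal) {
      result.get(leastLoaded(activeNodes, result, Integer.MAX_VALUE)).add(block.id);
    }
    return result;
  }

  /** Returns the least-loaded candidate that is active and holds fewer than limit blocks, or null. */
  private static String leastLoaded(List<String> candidates,
      Map<String, List<String>> result, int limit) {
    String best = null;
    for (String node : candidates) {
      List<String> assigned = result.get(node); // null means the node is not active
      if (assigned == null || assigned.size() >= limit) {
        continue;
      }
      if (best == null || assigned.size() < result.get(best).size()) {
        best = node;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Six blocks loaded with replication factor 2 on three nodes; node3 is down at compaction time.
    List<Block> blocks = Arrays.asList(
        new Block("b1", Arrays.asList("node1", "node2")),
        new Block("b2", Arrays.asList("node2", "node3")),
        new Block("b3", Arrays.asList("node1", "node3")),
        new Block("b4", Arrays.asList("node1", "node2")),
        new Block("b5", Arrays.asList("node2", "node3")),
        new Block("b6", Arrays.asList("node1", "node3")));
    System.out.println(assign(blocks, Arrays.asList("node1", "node2")));
  }
}
```

In this sketch the inactive node never appears in the result, and the per-node cap keeps any single executor from collecting most of the blocks. When the active list is not known (the null case described above), the mapping has only the block locations to work with, which is where the skew can come from.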

Any interfaces changed?
NA
Any backward compatibility impacted?
NA
Document update required?
NA
Testing done
Tested on a 3-node cluster and a 6-node cluster.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA

akashrn5 · 2018-11-27T06:53:14Z

tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java

@@ -205,7 +206,7 @@ public void testSummaryOutputAll() {
    expectedOutput = buildLines(
        "## version Details",
        "written_by  Version         ",
-        "TestUtil    1.6.0-SNAPSHOT  ");
+        "TestUtil    "+ CarbonVersionConstants.CARBONDATA_VERSION+"  ");


Removed the hardcoded value here.

CarbonDataQA · 2018-11-27T07:04:30Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1544/

qiuchenjian · 2018-11-27T07:19:35Z

processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java

@@ -536,7 +537,8 @@ public static Dictionary getDictionary(AbsoluteTableIdentifier absoluteTableIden
   */
  public static Map<String, List<Distributable>> nodeBlockMapping(List<Distributable> blockInfos) {
    // -1 if number of nodes has to be decided based on block location information
-    return nodeBlockMapping(blockInfos, -1);
+    return nodeBlockMapping(blockInfos, -1, null,


Should the choice of BlockAssignmentStrategy be decided by CarbonProperties.getInstance().isLoadSkewedDataOptimizationEnabled()?

Yes. In the compaction case BlockAssignmentStrategy.BLOCK_NUM_FIRST is the default, same as before.

Here I only did the refactoring.

CarbonDataQA · 2018-11-27T08:03:11Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1756/

CarbonDataQA · 2018-11-27T08:14:25Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1546/

CarbonDataQA · 2018-11-27T09:27:50Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9806/

CarbonDataQA · 2018-11-27T09:39:31Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1758/

ravipesala · 2018-11-27T12:26:50Z

integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonMergerRDD.scala

+    val activeNodes = DistributionUtil
+      .ensureExecutorsAndGetNodeList(taskInfoList.asScala, sparkContext)
+
+    val nodeBlockMap = CarbonLoaderUtil.nodeBlockMapping(taskInfoList, -1, activeNodes.asJava)


Below code is redundant, please remove it

CarbonDataQA · 2018-11-27T12:48:39Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1554/

CarbonDataQA · 2018-11-27T12:52:54Z

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9813/

CarbonDataQA · 2018-11-27T12:53:17Z

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1765/

CarbonDataQA · 2018-11-27T13:09:24Z

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1555/

CarbonDataQA · 2018-11-27T14:15:50Z

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9814/

CarbonDataQA · 2018-11-27T14:35:24Z

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1766/

ravipesala · 2018-11-28T10:00:31Z

LGTM

akashrn5 force-pushed the distribute branch from caaa044 to 9a247bb Compare November 27, 2018 06:52

Commit 4a5107d: Correct the task distribution in case of compaction when the actual block nodes and active nodes are different

akashrn5 force-pushed the distribute branch from 9a247bb to 4a5107d Compare November 27, 2018 08:02

Commit 6c35e04: remove unwanted code and remove multiple calls to function to ensure executors

akashrn5 force-pushed the distribute branch from 5c928bb to 6c35e04 Compare November 27, 2018 12:57

asfgit closed this in eeeaf50 Nov 28, 2018
