
[CARBONDATA-2484][LUCENE]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) #2310

Closed
wants to merge 1 commit

Conversation

@akashrn5 (Contributor) commented May 15, 2018

Problem:
During a query, BlockletDataMapFactory maintains a segmentMap holding a mapping of
segmentId -> list of index files, which is used while getting the extended blocklet,
to check whether the blocklet is present in the index or not.
In the case of Lucene, the datamap job is launched and, during pruning, the segmentMap is populated
in the executor. This map is cleared in the driver when drop table is called, but it is not cleared
in the executor, so a Lucene query fired after the table or datamap is dropped fails.

Solution:
When drop table or drop datamap is called, a job is launched that clears the datamaps from the
segmentMap and the cache on the executors, and then clears them in the driver.

This PR also refactors the datamap job classes and other common classes.
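
At a high level, the drop flow looks like the sketch below (a rough sketch only: the DataMapJob and DistributableDataMapFormat names come from the diff excerpts reviewed further down, while clearDataMaps and the method shape are assumptions, not the final API; carbondata imports are elided because this PR moves these classes between packages):

```java
import java.io.IOException;

public final class ClearDataMapFlowSketch {

  /**
   * Invoked on drop table / drop datamap: clear executor-side state first
   * via a distributed job, then clear the driver-side state.
   */
  public static void onDrop(CarbonTable carbonTable, DataMapJob dataMapJob,
      DistributableDataMapFormat dataMapFormat) throws IOException {
    // 1. Every executor drops its segmentMap entries for this table and
    //    removes the table's datamaps from its cache.
    dataMapJob.execute(dataMapFormat, null);
    // 2. Only then clear the driver, so a racing query cannot observe
    //    executors that still hold stale datamap state.
    DataMapStoreManager.getInstance()
        .clearDataMaps(carbonTable.getAbsoluteTableIdentifier());
  }
}
```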

Be sure to complete all items in the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    YES

  • Any backward compatibility impacted?
    NA

  • Document update required?
    NA

  • Testing done
Tested in cluster.
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@akashrn5 akashrn5 changed the title [WIP]refactor distributable code and launch job to clear the segmentmap and cache from executor [WIP]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) May 15, 2018
@akashrn5 akashrn5 changed the title [WIP]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) [CARBONDATA-2484][LUCENE]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) May 15, 2018
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5898/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4744/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4932/

@akashrn5 (Author)

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5907/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4754/

public DataMapExprWrapper getAllDataMapsForClear(CarbonTable carbonTable)
    throws IOException {
  List<TableDataMap> allDataMapFG =
      DataMapStoreManager.getInstance().getAllVisibleDataMap(carbonTable);
Contributor:

Not just visible ones; you should get all datamaps.
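
A minimal sketch of the suggested change, assuming DataMapStoreManager exposes getAllDataMap(CarbonTable) that also returns invisible datamaps (the wrapping helper is hypothetical):

```java
public DataMapExprWrapper getAllDataMapsForClear(CarbonTable carbonTable)
    throws IOException {
  // Fetch every datamap on the table, not only the visible ones, so that
  // invisible datamaps are also cleared from the executor caches.
  List<TableDataMap> allDataMaps =
      DataMapStoreManager.getInstance().getAllDataMap(carbonTable);
  return wrapForClearJob(allDataMaps);  // hypothetical helper
}
```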

public CarbonTable getCarbonTable(AbsoluteTableIdentifier identifier) {
  CarbonTable carbonTable = null;
  try {
    carbonTable = CarbonTable
Contributor:

First try getting the table from the cache using CarbonMetadata.getInstance().getCarbonTable(dbName, tableName); if it cannot be found there, then read from disk.
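
A sketch of the cache-first lookup (method shape assumed; readTableFromDisk is a hypothetical stand-in for the existing schema-file read path):

```java
public CarbonTable getCarbonTable(AbsoluteTableIdentifier identifier) {
  // 1. Try the in-memory metadata cache first.
  CarbonTable carbonTable = CarbonMetadata.getInstance().getCarbonTable(
      identifier.getCarbonTableIdentifier().getDatabaseName(),
      identifier.getCarbonTableIdentifier().getTableName());
  if (carbonTable != null) {
    return carbonTable;
  }
  // 2. Cache miss: fall back to reading the table schema from disk.
  return readTableFromDisk(identifier);  // hypothetical helper
}
```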

}
DistributableDataMapFormat dataMapFormat =
    createDataMapJob(carbonTable, dataMapExprWrapper, validSegments, null, className, true);
dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
Contributor:

No need to typecast
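
Since dataMapFormat is already declared as DistributableDataMapFormat, the redundant cast can simply be dropped:

```java
dataMapJob.execute(dataMapFormat, null);
```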

} else {
  return;
}
DistributableDataMapFormat dataMapFormat =
Contributor:

What if dataMapExprWrapper is null?

Contributor Author:

dataMapExprWrapper will be null if the table does not have datamaps; that check is already there.

Contributor:

So won't it fail at line 100 if you don't do a null check?

Contributor Author:

No, if there are no datamaps present, it returns early.
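
The early return being referred to, sketched from the excerpts above (shape assumed):

```java
DataMapExprWrapper dataMapExprWrapper = getAllDataMapsForClear(carbonTable);
if (dataMapExprWrapper == null) {
  // Table has no datamaps: nothing to clear, so the later
  // dereference of the wrapper is never reached.
  return;
}
```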

}
}

public static void setDataMapJob(Configuration configuration, Object dataMapJob)
Contributor:

Add comments to all public methods
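
For instance, a doc comment on the method in the excerpt could look like this (wording illustrative; body elided):

```java
/**
 * Sets the datamap job on the given hadoop configuration so that it can be
 * retrieved later to launch distributed datamap operations such as pruning,
 * or clearing datamaps on drop table / drop datamap.
 *
 * @param configuration hadoop configuration to carry the job
 * @param dataMapJob    the datamap job instance to set
 */
public static void setDataMapJob(Configuration configuration, Object dataMapJob) {
  // ... body as in the PR diff ...
}
```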

dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
}

public static DistributableDataMapFormat createDataMapJob(CarbonTable carbonTable,
Contributor:

Make it private

* @param className
* @return
*/
public static Object createDataMapJob(String className) {
Contributor:

Make it private

Contributor Author:

This will be called from CarbonInputFormatUtil in setDataMapJobIfConfigured, so we can keep it public.

DistributableDataMapFormat(CarbonTable table,
    DataMapExprWrapper dataMapExprWrapper, List<Segment> validSegments,
    List<PartitionSpec> partitions, String className) {
Contributor:

Remove the className parameter; it seems unused.

TableDataMap dataMap = DataMapStoreManager.getInstance()
distributable = (DataMapDistributableWrapper) inputSplit;
// clear the segmentMap and the cache in the executor when there are invalid segments
SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
Contributor:

Don't read the table status file in the executor; pass the invalid segments from the driver.
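
A sketch of the suggestion: resolve the invalid segments once on the driver and ship them inside the serialized input format, so executors never touch the table status file (the constructor shape is assumed, not the final signature):

```java
// Driver side: resolve invalid segments from the table status file once.
SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
List<Segment> invalidSegments =
    ssm.getValidAndInvalidSegments().getInvalidSegments();
// Carry them to the executors inside the input format instead of re-reading there.
DistributableDataMapFormat dataMapFormat = new DistributableDataMapFormat(
    table, dataMapExprWrapper, validSegments, invalidSegments, partitions);
```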

@@ -162,14 +162,16 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
  * Get all distributable objects of a segmentid
  */
 @Override
- public List<DataMapDistributable> toDistributable(Segment segment) {
+ public List<DataMapDistributable> toDistributable(Segment segment, boolean isJobToClearDataMaps) {
Contributor:

Don't change the datamap interface; if segment.getFilteredIndexShardNames() is null, then get all.
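
A sketch of the reviewer's alternative, keeping the single-argument signature and deriving "distribute everything" from the segment itself (the two helpers are hypothetical stand-ins for the existing shard iteration):

```java
@Override
public List<DataMapDistributable> toDistributable(Segment segment) {
  Set<String> shardNames = segment.getFilteredIndexShardNames();
  if (shardNames == null || shardNames.isEmpty()) {
    // No shard filter recorded for this segment: distribute all index
    // shards, which also covers the clear-datamaps job.
    return getAllShardDistributables(segment);            // hypothetical helper
  }
  return getShardDistributables(segment, shardNames);     // hypothetical helper
}
```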

@@ -123,7 +123,8 @@ public DataMapBuilder createBuilder(Segment segment, String shardName)
  * @param segment
  * @return
  */
- @Override public List<DataMapDistributable> toDistributable(Segment segment) {
+ @Override public List<DataMapDistributable> toDistributable(Segment segment,
Contributor:

Don't change interface

* @param conf
* @throws IOException
*/
public static void setDataMapJobIfConfigured(Configuration conf) throws IOException {
Contributor:

I don't think this method is required here; let it stay in the input format.

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5932/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4777/

@akashrn5 akashrn5 force-pushed the refactor_clear_datamaps branch 2 times, most recently from 460b503 to dcd61f3 on May 17, 2018 09:31
@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4962/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5935/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4780/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5937/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4782/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4965/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5941/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4967/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4786/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4968/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4789/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5945/

@ravipesala (Contributor)

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4972/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4974/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4984/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4798/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5954/

@akashrn5 (Author)

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5955/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4799/

@ravipesala (Contributor)

LGTM

@asfgit closed this in 2018048 on May 18, 2018
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018
… clear the datamap from executor(clears segmentMap and remove datamap from cache)

This closes apache#2310