[CARBONDATA-1526] [PreAgg] Added support to compact segments in pre-agg table #1605
Conversation
@jackylk @ravipesala Can you please start first-level review.
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2051/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1661/
Force-pushed from cc58e9d to 0558db5
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2054/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1667/
Force-pushed from 0558db5 to 5ef3e18
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/417/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2075/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1689/
@kunal642 Please add the description of the work done in this PR.
@ravipesala description added
@@ -133,25 +133,25 @@ case class AlterTableRenameAbortEvent(carbonTable: CarbonTable,
/**
 *
 * @param carbonTable
 * @param carbonLoadModel
 * @param carbonMergerMapping
Add description of all these parameters
ok
 * @param mergedLoadName
 * @param sQLContext
 */
case class AlterTableCompactionPreEvent(sparkSession: SparkSession, carbonTable: CarbonTable,
    carbonLoadModel: CarbonLoadModel,
case class AlterTableCompactionPreEvent(carbonTable: CarbonTable,
Move carbonTable to the next line, same for all events
ok
    mergedLoadName: String,
    sQLContext: SQLContext) extends Event with AlterTableCompactionEventInfo
/**
 *
 * @param carbonTable
 * @param carbonLoadModel
 * @param carbonMergerMapping
Add description of all these parameters, same for all related events
 * Class for handling operations after data load completion and before final commit of load
 * operation. Example usage: For loading pre-aggregate tables
 */
case class LoadTablePreStatusUpdateEvent(sparkSession: SparkSession,
The usage of this is not clear, why is it needed?
This event will be used to perform some task just before updating the carbon table status. For example: when loading data into a parent table, we need to start the load for all child datamaps first, so that if any of those loads fails the parent table status file is not written.
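A minimal sketch of the pattern described in this reply, with entirely illustrative names (this is not CarbonData's actual API): listeners run before the status commit, and any listener failure aborts the commit.

```scala
// Sketch of a pre-status-update event: all names here are hypothetical.
trait Event
case class LoadPreStatusUpdate(tableName: String) extends Event

object EventBus {
  private var listeners = List.empty[Event => Unit]
  def register(listener: Event => Unit): Unit = listeners ::= listener
  // Fire all listeners; an exception from any listener propagates to the caller.
  def fire(event: Event): Unit = listeners.foreach(_(event))
}

def commitLoad(tableName: String, writeStatus: () => Unit): Unit = {
  // Child datamap loads would run inside listeners; if one throws,
  // the parent table's status file is never written.
  EventBus.fire(LoadPreStatusUpdate(tableName))
  writeStatus()
}
```

The design choice is that the parent's commit becomes conditional on every child load succeeding, without the load path knowing about pre-aggregate tables directly.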
    carbonLoadModel,
    mergedLoadName,
    sc)
AlterTableCompactionPreEvent(carbonTable, carbonMergerMapping, mergedLoadName, sc)
Please modify line 55: it is not storePath, it is tablePath. Change the CarbonMergerMapping definition also
ok
@@ -492,6 +492,10 @@ object CarbonDataRDDFactory {
throw new Exception("No Data to load")
}
writeDictionary(carbonLoadModel, result, writeAll = false)
val loadTablePreStatusUpdateEvent = LoadTablePreStatusUpdateEvent(sqlContext.sparkSession,
Why is it required?
Used to do some operations before committing the table status of the parent
@@ -159,7 +161,14 @@ case class CarbonAlterTableCompactionCommand(
// if system level compaction is enabled then only one compaction can run in the system
// if any other request comes at this time then it will create a compaction request file.
// so that this will be taken up by the compaction process which is executing.
if (!isConcurrentCompactionAllowed) {
if (carbonTable.isChildDataMap) {
Please move the comment starting from line 161 to the correct place, and add a comment for this newly added if condition
done
@@ -159,7 +161,14 @@ case class CarbonAlterTableCompactionCommand(
// if system level compaction is enabled then only one compaction can run in the system
// if any other request comes at this time then it will create a compaction request file.
// so that this will be taken up by the compaction process which is executing.
if (!isConcurrentCompactionAllowed) {
if (carbonTable.isChildDataMap) {
carbonLoadModel.setCompactionType(alterTableModel.compactionType.toUpperCase match {
Change setCompactionType to accept an enum instead of a String
done
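The suggested change can be sketched as follows; the setter name matches the one under review, but the surrounding classes are simplified stand-ins, not CarbonData's actual definitions.

```scala
// Hypothetical sketch: an enum-typed setter instead of string matching.
object CompactionType extends Enumeration {
  val MINOR, MAJOR = Value
}

class LoadModel {
  private var compactionType: CompactionType.Value = _
  // Taking the enum directly turns invalid compaction types into
  // compile-time errors instead of runtime string-match failures.
  def setCompactionType(compaction: CompactionType.Value): Unit =
    compactionType = compaction
  def getCompactionType: CompactionType.Value = compactionType
}

val model = new LoadModel
model.setCompactionType(CompactionType.MINOR) // no .toUpperCase match needed
```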
val segments = loadMetaDataDetails.asScala.map(_.getLoadName)
if (segments.nonEmpty) {
CarbonSession
  .threadSet(CarbonCommonConstants.CARBON_INPUT_SEGMENTS +
Do not pass parameters in this way; it is not good for maintenance. Pass the parameter to the function that needs it
This is used to set the segments to access when loading incremental data into the child table. There is no other way to set segments in CarbonScanRDD other than this
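The thread-local scoping this reply relies on can be sketched roughly as below; `SessionProps` and the property key are illustrative stand-ins for `CarbonSession.threadSet` and `CARBON_INPUT_SEGMENTS`, not the real implementation.

```scala
// Hypothetical sketch of thread-local session properties.
object SessionProps {
  private val props = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def threadSet(key: String, value: String): Unit =
    props.set(props.get + (key -> value))
  def threadUnset(key: String): Unit = props.set(props.get - key)
  def get(key: String): Option[String] = props.get.get(key)
}

// Restrict a child-table load to newly added segments, then clean up,
// so concurrent loads on other threads are unaffected.
SessionProps.threadSet("carbon.input.segments.db.maintable", "0,1,2")
try {
  // ... run the scan/load that should see only segments 0,1,2 ...
} finally {
  SessionProps.threadUnset("carbon.input.segments.db.maintable")
}
```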
private def startCompactionForDataMap(carbonLoadModel: CarbonLoadModel,
    sparkSession: SparkSession): Unit = {
  val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
  val loadMetaDataDetails = CarbonDataMergerUtil
Correct the coding style in this function; please follow this coding style in future:
val a = foo(
  paramA,
  paramB)
ok
    carbonLoadModel.getTableName, "false")
val headers = carbonTable.getTableInfo.getFactTable.getListOfColumns.asScala
  .map(_.getColumnName).mkString(",")
val childDataFrame = sparkSession.sql(new CarbonSpark2SqlParser()
What does this do? Can you make it more readable?
ok
  .addPreAggLoadFunction(PreAggregateUtil
    .createChildSelectQuery(carbonTable.getTableInfo.getFactTable))).drop("preAggLoad")
try {
  CarbonLoadDataCommand(Some(carbonTable.getDatabaseName),
When invoking a command, follow this code style:
CarbonXXXCommand(
  paramA = valueA,
  paramB = valueB,
  ...
).run(sparkSession)
Do add the parameter name for each line
ok
// allows it.
if (!carbonLoadModel.getCompactionType.equals(CompactionType.MAJOR)) {
  CommonUtil.readLoadMetadataDetails(carbonLoadModel)
  startCompactionForDataMap(carbonLoadModel, sparkSession)
We should avoid the recursive call; you can invoke it in the caller of this function.
@@ -130,6 +131,9 @@ case class CarbonLoadDataCommand(
carbonLoadModel.setFactFilePath(factPath)
carbonLoadModel.setAggLoadRequest(internalOptions
  .getOrElse(CarbonCommonConstants.IS_INTERNAL_LOAD_CALL, "false").toBoolean)
carbonLoadModel
Follow the coding style:
a.foo(
  paramA,
  paramB)
ok
 * @param event
 * @param operationContext
 */
override def onEvent(event: Event, operationContext: OperationContext): Unit = {
What does the listener do? Please add a comment
added comment
}

object AlterPreAggregateTableCompactionPostEvent extends OperationEventListener {
It is a listener, not an event
changed the name
val compactionType = compactionEvent.carbonMergerMapping.campactionType
val sparkSession = compactionEvent.sQLContext.sparkSession
if (carbonTable.hasDataMapSchema) {
  for (dataMapSchema: DataMapSchema <- carbonTable.getTableInfo.getDataMapSchemaList
Use carbonTable.getTableInfo.getDataMapSchemaList.foreach instead
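The suggested rewrite looks roughly like this: iterate the Java list through `asScala` with `foreach` instead of a for-comprehension. The `DataMapSchema` stand-in and the body are illustrative; only the iteration style reflects the review comment.

```scala
// Sketch of the foreach-based iteration the reviewer suggests.
import scala.collection.JavaConverters._

case class DataMapSchema(name: String) // stand-in for the real schema class

val dataMapSchemaList: java.util.List[DataMapSchema] =
  java.util.Arrays.asList(DataMapSchema("agg0"), DataMapSchema("agg1"))

// Java list -> Scala view, then idiomatic foreach over each child datamap.
dataMapSchemaList.asScala.foreach { schema =>
  // ... trigger compaction for this child table ...
  println(schema.name)
}
```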
Force-pushed from 21ae844 to be9c72b
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/531/
Force-pushed from be9c72b to f5c9d09
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/533/
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2158/
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2161/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2162/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1791/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2164/
import org.apache.carbondata.processing.merger.CarbonDataMergerUtil
import org.apache.carbondata.processing.util.CarbonLoaderUtil

abstract class Compactable(carbonLoadModel: CarbonLoadModel,
Name this class Compactor, and the other Compactor class can be moved to CarbonTableCompactor
Changed the class name to Compactor
/**
 * This class is used to perform compaction on carbon table.
 */
class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel,
Move all the compactor interfaces and impls to spark-common
AggregateDataMapCompactor requires CarbonSession, which is not available in spark-common
Force-pushed from f5c9d09 to 9a6ace9
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/553/
@@ -541,4 +541,20 @@ object PreAggregateUtil {
}
}

def createChildSelectQuery(tableSchema: TableSchema): String = {
  val aggregateColumns = scala.collection.mutable.ArrayBuffer.empty[String]
  val groupingExpressions = scala.collection.mutable.ArrayBuffer.empty[String]
Please use ArrayBuffer.empty[String]
  }
}
s"select ${ groupingExpressions.mkString(",") },${ aggregateColumns.mkString(",")
} from ${ tableSchema.getTableName } group by ${ groupingExpressions.mkString(",") }"
Why not require the database name?
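A sketch of the query shape this method builds, extended with the database qualifier the reviewer asks about. The function name and parameters here are illustrative, not the signature under review.

```scala
// Hypothetical helper showing the generated child-select query,
// with the database name qualifying the table as the reviewer suggests.
def childSelectQuery(
    dbName: String,
    tableName: String,
    groupingExpressions: Seq[String],
    aggregateColumns: Seq[String]): String =
  s"select ${groupingExpressions.mkString(",")},${aggregateColumns.mkString(",")}" +
    s" from $dbName.$tableName group by ${groupingExpressions.mkString(",")}"

// e.g. produces: select name,sum(salary) from db.maintable_agg group by name
val query = childSelectQuery("db", "maintable_agg", Seq("name"), Seq("sum(salary)"))
```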
// adding the aggregation load UDF
.addPreAggLoadFunction(PreAggregateUtil
// creating the select query on the bases on table schema
.createChildSelectQuery(carbonTable.getTableInfo.getFactTable))).drop("preAggLoad")
Indentation is incorrect
val testData = s"$resourcesPath/sample.csv"

override def beforeEach(): Unit = {
  sql("DROP TABLE IF EXISTS maintable")
Suggest using a new database, not the default database
Force-pushed from 9a6ace9 to f32cebf
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/560/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1809/
Retest this please
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/571/
Force-pushed from f32cebf to 205ac43
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/583/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1817/
retest sdv please
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2189/
LGTM
…gg table This PR will add to compact the pre-aggregate tables. A pre-aggregate table can be compacted using the alter command i.e alter table table_name compact 'minor/major'. If a table with some pre-aggregate table is compacted, then all the pre-aggregate tables are also compacted with the parent table This closes apache#1605
This PR adds support to compact the pre-aggregate tables. When a table with pre-aggregate tables is compacted, all of its pre-aggregate tables are compacted along with the parent table.
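Per the commit message, compaction is triggered with the existing alter command; the table name below is illustrative:

```sql
-- Compacting the parent table also compacts all of its pre-aggregate tables
ALTER TABLE maintable COMPACT 'minor';
ALTER TABLE maintable COMPACT 'major';
```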
Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.