[CARBONDATA-2163][CARBONDATA-2164] Remove spark dependency in core and processing module #1973

jackylk · 2018-02-12T03:39:56Z

The assembly JAR of the store-sdk module should be small, but it currently includes the Spark JAR because the carbon-core, carbon-processing, and carbon-hadoop modules depend on Spark.
This PR removes the Spark dependency from the carbon-core and carbon-processing modules.

Any interfaces changed?
No
Any backward compatibility impacted?
No
Document update required?
No
Testing done
No logic is modified, rerun all test suites
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA

Unified concepts in the scan process flow:
1. QueryModel contains all parameters for a scan; it is created by an API in CarbonTable. (In the future, CarbonTable will be the entry point for various table operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating that it is for projection.
This closes apache#1874

… static method
Refactor CarbonTablePath:
1. Remove CarbonStorePath and use CarbonTablePath only.
2. Make CarbonTablePath a utility that needs no object creation; callers no longer have to create an object before using it, so the code is cleaner and there is less GC pressure.
This closes apache#1768

…er to executor for s3 implementation in cluster mode.
Problem: The Hadoop configuration (hadoopconf) was not propagated from the driver to the executors, so loading failed in a distributed environment.
Solution: Set the Hadoop configuration in the base class CarbonRDD.
How to verify this PR: Execute a load in cluster mode with an S3 location; it should succeed.
This closes apache#1860

Implemented interfaces for the FG (fine-grained) datamap and integrated them into the filter scanner so that it uses the pruned bitset from the FG datamap. The FG query flow is as follows:
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query that hits a table with a datamap calls the prune method of the FG datamap.
3. The prune method of the FG datamap returns a list of FineGrainBlocklet; these blocklets contain block, blocklet, page, and row id information.
4. The pruned blocklets are written to a file internally, and only the block, blocklet, and file path information is returned as part of the splits.
5. Based on the splits, the scan RDD schedules the tasks.
6. The filter scanner checks the datamap writer path from the split, reads the bitset if it exists, and passes it as input to the scan.
This closes apache#1471
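
The FG query flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CarbonData code: the function names `prune`, `write_splits`, and `scan_split`, the dictionary-based index, and the JSON file layout are all hypothetical, and plain row id lists stand in for the bitset.

```python
import json
import os
from collections import defaultdict


def prune(index, value):
    """Steps 2-3: return fine-grained hits for a filter value.

    `index` is a hypothetical structure mapping a value to
    (block, blocklet, page, row_id) tuples.
    Result shape: {(block, blocklet): {page: [row ids]}}.
    """
    hits = defaultdict(lambda: defaultdict(list))
    for block, blocklet, page, row_id in index.get(value, []):
        hits[(block, blocklet)][page].append(row_id)
    return hits


def write_splits(hits, out_dir):
    """Step 4: write the page/row details to a file per blocklet and
    return splits that carry only block, blocklet, and file path."""
    splits = []
    for (block, blocklet), pages in hits.items():
        path = os.path.join(out_dir, f"{block}_{blocklet}.json")
        with open(path, "w") as f:
            json.dump(pages, f)  # integer page ids become string keys
        splits.append({"block": block, "blocklet": blocklet, "path": path})
    return splits


def scan_split(split, read_row, predicate):
    """Step 6: read the pruned row ids back and evaluate the filter
    only on those rows. `read_row` is a caller-supplied accessor."""
    with open(split["path"]) as f:
        pages = json.load(f)
    matched = []
    for page, row_ids in pages.items():
        for row_id in row_ids:
            record = read_row(split["block"], split["blocklet"], int(page), row_id)
            if predicate(record):
                matched.append(record)
    return matched
```

The point of the design is that the split stays small (three fields) while the per-row detail travels through a side file that the scanner reads only when it exists.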

…ading CarbonData assigns blocks to nodes at the beginning of data loading. The previous block allocation strategy was based on the number of blocks, so it suffers from data skew when the sizes of the input files differ a lot. We introduced a size-based block allocation strategy to optimize data loading performance in skewed-data scenarios. This closes apache#1808
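
To make the difference between the two strategies concrete, here is a minimal Python sketch. It is not the CarbonData implementation: the commit message only says the new strategy is size based, so the greedy "largest block to the least-loaded node" rule, the function names, and the sample sizes are assumptions chosen for illustration.

```python
import heapq
from collections import defaultdict


def assign_blocks_by_count(block_sizes, nodes):
    """Count-based allocation: round-robin, ignoring block sizes."""
    assignment = defaultdict(list)
    for i, block in enumerate(block_sizes):
        assignment[nodes[i % len(nodes)]].append(block)
    return dict(assignment)


def assign_blocks_by_size(block_sizes, nodes):
    """Size-based allocation (assumed greedy rule): take blocks from
    largest to smallest and give each to the least-loaded node."""
    heap = [(0, node) for node in nodes]  # (assigned bytes, node)
    heapq.heapify(heap)
    assignment = defaultdict(list)
    by_size = sorted(block_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for block, size in by_size:
        load, node = heapq.heappop(heap)
        assignment[node].append(block)
        heapq.heappush(heap, (load + size, node))
    return dict(assignment)


def node_loads(assignment, block_sizes):
    """Total assigned size per node."""
    return {node: sum(block_sizes[b] for b in blocks)
            for node, blocks in assignment.items()}


sizes = {"b1": 1000, "b2": 10, "b3": 900, "b4": 20, "b5": 30, "b6": 40}
nodes = ["n1", "n2"]
count_loads = node_loads(assign_blocks_by_count(sizes, nodes), sizes)
size_loads = node_loads(assign_blocks_by_size(sizes, nodes), sizes)
```

With these sample sizes, both nodes receive three blocks under the count-based rule but very different amounts of data, while the size-based rule evens out the total size per node.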

CarbonDataQA · 2018-02-12T03:42:39Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3707/

CarbonDataQA · 2018-02-12T03:44:42Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2467/

ravipesala · 2018-02-12T03:45:04Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3510/

CarbonDataQA · 2018-02-12T06:15:51Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3709/

CarbonDataQA · 2018-02-12T06:17:36Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2469/

ravipesala · 2018-02-12T06:20:01Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3512/

CarbonDataQA · 2018-02-12T07:13:39Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3711/

ravipesala · 2018-02-12T07:40:32Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3513/

CarbonDataQA · 2018-02-12T07:46:43Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2471/

jackylk · 2018-02-12T07:52:47Z

retest this please

ravipesala · 2018-02-12T08:19:53Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3514/

CarbonDataQA · 2018-02-12T08:32:10Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2473/

CarbonDataQA · 2018-02-12T08:32:59Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3714/

CarbonDataQA · 2018-02-13T16:37:51Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3740/

ravipesala · 2018-02-13T16:39:03Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3539/

CarbonDataQA · 2018-02-13T16:41:24Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2500/

CarbonDataQA · 2018-03-04T16:29:30Z

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4068/

CarbonDataQA · 2018-03-04T16:31:32Z

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2822/

ravipesala · 2018-03-04T17:00:58Z

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3762/

jackylk and others added 16 commits February 1, 2018 00:15

[CARBONDATA-1992] Remove partitionId in CarbonTablePath

952665a

In CarbonTablePath, there is a deprecated partition id that is always 0; it should be removed to avoid confusion. This closes apache#1765

[CARBONDATA-1968] Add external table support

111c382

This PR adds support for creating an external table from existing carbondata files, using Hive syntax:
CREATE EXTERNAL TABLE tableName STORED BY 'carbondata' LOCATION 'path'
This closes apache#1749

[CARBONDATA-1827] S3 Carbon Implementation

80b42ac

1. Provide support for S3 in CarbonData.
2. Added S3Example to create a carbon table on S3.
3. Added S3CSVExample to load a carbon table using CSV from S3.
This closes apache#1805

[REBASE] Solve conflict after rebasing master

71c2d8c

[CARBONDATA-1480]Min Max Index Example for DataMap

cae74a8

DataMap example: implementation of a min-max index through DataMap, and use of the index while pruning. This closes apache#1359
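
As a rough illustration of how a min-max index prunes, the following Python sketch keeps the minimum and maximum value per blocklet and skips blocklets whose range cannot contain the filter value. The class and function names are hypothetical and do not reflect the DataMap interfaces in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockletMinMax:
    blocklet_id: int
    min_value: int
    max_value: int


def build_min_max_index(blocklets):
    """Build one (min, max) entry per non-empty blocklet.

    `blocklets` is a list of value lists, one list per blocklet.
    """
    return [BlockletMinMax(i, min(values), max(values))
            for i, values in enumerate(blocklets) if values]


def prune_equals(index, value):
    """Keep only blocklets whose [min, max] range can contain `value`."""
    return [e.blocklet_id for e in index
            if e.min_value <= value <= e.max_value]


def prune_range(index, low, high):
    """Keep only blocklets whose range overlaps [low, high]."""
    return [e.blocklet_id for e in index
            if e.max_value >= low and e.min_value <= high]


index = build_min_max_index([[1, 5, 9], [10, 12, 20], [21, 30, 25]])
candidates = prune_equals(index, 12)  # only blocklet 1 needs scanning
```

Pruning is conservative: a blocklet that survives may still contain no matching row, so the filter is still evaluated on the surviving blocklets.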

[HotFix][CheckStyle] Fix import related checkstyle

cd7eed6

This closes apache#1952

[CARBONDATA-2018][DataLoad] Optimization in reading/writing for sort …

de92ea9

…temp row Pick up the no-sort fields in the row, pack them as a byte array, and skip parsing them during merge sort to reduce CPU consumption. This closes apache#1792
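
The idea can be sketched in Python as follows. This is an illustration only, not the CarbonData code: the row layout, the `NO_SORT_FORMAT` encoding (one 32-bit integer plus one 64-bit float), and the function names are assumptions. The merge step compares only the sort fields and carries the packed bytes along without decoding them.

```python
import heapq
import struct

# Hypothetical layout of the no-sort fields: int32 followed by float64.
NO_SORT_FORMAT = "<id"


def to_sort_temp_row(row, num_sort_fields):
    """Split a row into (sort fields, packed no-sort fields)."""
    sort_part = tuple(row[:num_sort_fields])
    packed = struct.pack(NO_SORT_FORMAT, *row[num_sort_fields:])
    return sort_part, packed


def merge_sorted_runs(runs):
    """Merge pre-sorted runs; only the sort fields are compared, and
    the packed bytes are never parsed during the merge."""
    return list(heapq.merge(*runs, key=lambda temp_row: temp_row[0]))


def from_sort_temp_row(temp_row):
    """Unpack the no-sort fields once, after merging is done."""
    sort_part, packed = temp_row
    return list(sort_part) + list(struct.unpack(NO_SORT_FORMAT, packed))


run_a = [to_sort_temp_row(("a", 1, 1.0), 1), to_sort_temp_row(("c", 3, 3.0), 1)]
run_b = [to_sort_temp_row(("b", 2, 2.0), 1)]
merged = merge_sorted_runs([run_a, run_b])
rows = [from_sort_temp_row(r) for r in merged]
```

The saving comes from doing the unpack once per row at the end instead of on every read and write of a sort temp file during the merge passes.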

Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in…

e5c32ac

… data loading" This reverts commit 6dd8b03.

Revert "[CARBONDATA-2018][DataLoad] Optimization in reading/writing f…

e1c6448

…or sort temp row" This reverts commit de92ea9.

[CARBONDATA-2156] Add interface annotation

7f5751a

InterfaceAudience and InterfaceStability annotations should be added for users and developers.
1. InterfaceAudience can be User or Developer.
2. InterfaceStability can be Stable, Evolving, or Unstable.
This closes apache#1968

[CARBONDATA-1997] Add CarbonWriter SDK API

a848ccf

Added a new module called store-sdk with a CarbonWriter API that can be used to write CarbonData files to a specified folder without Spark and Hadoop dependencies. Users can use this API in any environment. This closes apache#1967

jackylk changed the title ~~[CARBONDATA-2161][CARBONDATA-2162] Remove spark dependency in core and processing module~~ [CARBONDATA-2163][CARBONDATA-2164] Remove spark dependency in core and processing module Feb 12, 2018

jackylk and others added 2 commits February 12, 2018 16:06

[CARBONDATA-2159] Remove carbon-spark dependency in store-sdk module

0d50f65

To allow building the assembly JAR of the store-sdk module, the module should not depend on the carbon-spark module. This closes apache#1970

[CARBONDATA-2018][DataLoad] Optimization in reading/writing for sort …

937bdb8

…temp row Pick up the no-sort fields in the row, pack them as a byte array, and skip parsing them during merge sort to reduce CPU consumption. This closes apache#1792

xuchuanyin and others added 3 commits February 12, 2018 19:22

Support generating assembling JAR for store-sdk module

0b15217

Support generating the assembly JAR for the store-sdk module and remove the junit dependency. This closes apache#1976

remove spark dependency in carbon-core and carbon-processing module

7547cf8

fix style fix style fix style

jackylk force-pushed the sdk-remove-spark-dependency-in-core2 branch from c62c71d to 7547cf8 Compare February 13, 2018 16:00

asfgit force-pushed the carbonstore branch from c738afb to 8104735 Compare March 4, 2018 15:49

asfgit closed this Mar 12, 2018

jackylk mentioned this pull request Mar 18, 2018

[CARBONDATA-2165]Remove spark in carbon-hadoop module #2074

Closed

5 tasks
