Mirror of Apache CarbonData


Apache CarbonData is an indexed columnar data store solution for fast analytics on big data platforms such as Apache Hadoop and Apache Spark.

You can find the latest CarbonData documentation and learn more at: http://carbondata.apache.org

CarbonData cwiki



The CarbonData file format is a columnar store in HDFS. It has many of the features of a modern columnar format, such as splittability, compression schemes, and complex data types. In addition, CarbonData has the following unique features:

  • Stores data along with index: this can significantly accelerate query performance and reduce I/O scans and CPU usage when the query contains filters. The CarbonData index consists of multiple levels of indices. A processing framework can leverage this index to reduce the number of tasks it needs to schedule and process, and on the task side it can skip-scan at a finer-grained unit (called a blocklet) instead of scanning the whole file.
  • Operable encoded data: by supporting efficient compression and global encoding schemes, CarbonData can query compressed/encoded data directly. The data is converted only just before the results are returned to the user, which is known as late materialization.
  • Supports various use cases with a single data format: interactive OLAP-style queries, sequential access (big scan), and random access (narrow scan).
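
The first two features above can be illustrated with a small conceptual sketch. This is not CarbonData code and does not use any CarbonData API: the `Blocklet` class, the `build_blocklets` and `query` functions, and the sample data are all hypothetical, and the "index" is assumed to be nothing more than per-blocklet min/max statistics on one column. It only shows the general idea of skipping blocklets by index and decoding dictionary-encoded values late.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Blocklet:
    """Hypothetical scan unit: encoded rows plus min/max statistics."""
    rows: List[Tuple[int, int]]  # (city_code, amount); city is dictionary-encoded
    min_amount: int
    max_amount: int


def build_blocklets(rows: List[Tuple[int, int]], size: int) -> List[Blocklet]:
    """Split rows into fixed-size blocklets and record min/max of `amount`."""
    blocklets = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        amounts = [amount for _, amount in chunk]
        blocklets.append(Blocklet(chunk, min(amounts), max(amounts)))
    return blocklets


def query(blocklets: List[Blocklet], dictionary: Dict[str, int],
          city: str, min_amount: int):
    """Return (decoded matching rows, number of blocklets actually scanned)."""
    # Translate the filter value to its code once, so that rows are
    # compared in encoded form.
    code = dictionary.get(city)
    if code is None:
        return [], 0
    reverse = {c: name for name, c in dictionary.items()}

    hits, scanned = [], 0
    for blocklet in blocklets:
        # Index check: skip the blocklet if no row can satisfy the filter.
        if blocklet.max_amount < min_amount:
            continue
        scanned += 1
        for city_code, amount in blocklet.rows:
            if city_code == code and amount >= min_amount:
                # Late materialization: decode only the rows being returned.
                hits.append((reverse[city_code], amount))
    return hits, scanned


# Assumed sample data: a two-entry dictionary and eight encoded rows.
dictionary = {"Bangalore": 0, "Shenzhen": 1}
rows = [(0, 10), (1, 20), (0, 30), (1, 40), (0, 50), (1, 60), (0, 70), (1, 80)]
blocklets = build_blocklets(rows, size=2)

hits, scanned = query(blocklets, dictionary, city="Shenzhen", min_amount=55)
print(hits, scanned)  # [('Shenzhen', 60), ('Shenzhen', 80)] 2
```

In this toy run, two of the four blocklets are skipped using only their statistics, and the city name is decoded for the two returned rows rather than for every row scanned.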

Building CarbonData

CarbonData is built using Apache Maven.

Online Documentation

Other Technical Material

Fork and Contribute

This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it. The contribution guide describes how to contribute to CarbonData.

Contact us

To get involved in CarbonData:


Apache CarbonData is an open source project of The Apache Software Foundation (ASF).