
Commit

Merge 6ec1d47 into d392717
sgururajshetty committed Oct 5, 2018
2 parents d392717 + 6ec1d47 commit 0053dbf
Showing 21 changed files with 229 additions and 208 deletions.
33 changes: 18 additions & 15 deletions README.md
@@ -45,23 +45,26 @@ CarbonData file format is a columnar store in HDFS, it has many features that a
CarbonData is built using Apache Maven; to build CarbonData, see [build CarbonData](https://github.com/apache/carbondata/blob/master/build)

## Online Documentation
* [What is CarbonData](https://github.com/apache/carbondata/blob/master/docs/introduction.md)
* [Quick Start](https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md)
* [CarbonData File Structure](https://github.com/apache/carbondata/blob/master/docs/file-structure-of-carbondata.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
* [Data Management on CarbonData](https://github.com/apache/carbondata/blob/master/docs/language-manual.md)
* [Configuring Carbondata](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [Streaming Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
* [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
* [S3 Guide](https://github.com/apache/carbondata/blob/master/docs/s3-guide.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
* [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
* [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
* [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
* [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)
* [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md)
* [FAQ](https://github.com/apache/carbondata/blob/master/docs/faq.md)
* [Use Cases](https://github.com/apache/carbondata/blob/master/docs/usecases.md)
* [Language Reference](https://github.com/apache/carbondata/blob/master/docs/language-manual.md)
* [CarbonData Data Definition Language](https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md)
* [CarbonData Data Manipulation Language](https://github.com/apache/carbondata/blob/master/docs/dml-of-carbondata.md)
* [CarbonData Streaming Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
* [Configuring CarbonData](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap-management.md)
* [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/bloomfilter-datamap-guide.md)
* [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/lucene-datamap-guide.md)
* [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/preaggregate-datamap-guide.md)
* [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/timeseries-datamap-guide.md)
* [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
* [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md)
* [S3 Storage](https://github.com/apache/carbondata/blob/master/docs/s3-guide.md)
* [Carbon as Spark's Datasource](https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md)
* [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)

## Other Technical Material
* [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
29 changes: 14 additions & 15 deletions docs/carbon-as-spark-datasource-guide.md
@@ -15,19 +15,20 @@
limitations under the License.
-->

# Carbon as Spark's datasource guide
# CarbonData as Spark's Datasource

Carbon fileformat can be integrated to Spark using datasource to read and write data without using CarbonSession.
The CarbonData file format is now integrated as a Spark datasource for read and write operations without using CarbonSession. This is useful for users who want to use CarbonData as Spark's data source.

**Note:** You can only apply the functions/features supported by the Spark datasource APIs; the supported functionality is similar to Parquet. CarbonSession features are not supported.

# Create Table with DDL

Carbon table can be created with spark's datasource DDL syntax as follows.
Now you can create a Carbon table using Spark's datasource DDL syntax.

```
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name1 col_type1 [COMMENT col_comment1], ...)]
USING carbon
USING CARBON
[OPTIONS (key1=val1, key2=val2, ...)]
[PARTITIONED BY (col_name1, col_name2, ...)]
[CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
```

@@ -41,25 +41,23 @@ Carbon table can be created with spark's datasource DDL syntax as follows.

| Property | Default Value | Description |
|-----------|--------------|------------|
| table_blocksize | 1024 | Size of blocks to write onto hdfs |
| table_blocklet_size | 64 | Size of blocklet to write |
| local_dictionary_threshold | 10000 | Cardinality upto which the local dictionary can be generated |
| local_dictionary_enable | false | Enable local dictionary generation |
| sort_columns | all dimensions are sorted | comma separated string columns which to include in sort and its order of sort |
| sort_scope | local_sort | Sort scope of the load.Options include no sort, local sort ,batch sort and global sort |
| long_string_columns | null | comma separated string columns which are more than 32k length |
| table_blocksize | 1024 | Size of blocks to write onto hdfs. For more details, see [Table Block Size Configuration](./ddl-of-carbondata.md#table-block-size-configuration). |
| table_blocklet_size | 64 | Size of blocklet to write. |
| local_dictionary_threshold | 10000 | Cardinality up to which the local dictionary can be generated. For more details, see [Local Dictionary Configuration](./ddl-of-carbondata.md#local-dictionary-configuration). |
| local_dictionary_enable | false | Enable local dictionary generation. For more details, see [Local Dictionary Configuration](./ddl-of-carbondata.md#local-dictionary-configuration). |
| sort_columns | all dimensions are sorted | Columns to include in sort and its order of sort. For more details, see [Sort Columns Configuration](./ddl-of-carbondata.md#sort-columns-configuration). |
| sort_scope | local_sort | Sort scope of the load. Options include no sort, local sort, batch sort, and global sort. For more details, see [Sort Scope Configuration](./ddl-of-carbondata.md#sort-scope-configuration). |
| long_string_columns | null | Comma separated string/char/varchar columns which are more than 32k length. For more details, see [String longer than 32000 characters](./ddl-of-carbondata.md#string-longer-than-32000-characters). |

## Example

```
CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS(table_block_size’=’256)
CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_block_size'='256')
```
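As a further illustration of how several of the OPTIONS listed above can be combined, here is a minimal sketch issued through `spark.sql`; the SparkSession setup, the table name `sales_carbon`, and the column names are assumptions made for this example, not part of the original guide.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the CarbonData jars are on the classpath so that the
// "carbon" datasource (USING CARBON) can be resolved.
val spark = SparkSession.builder()
  .appName("CarbonDatasourceDDLExample")
  .master("local[*]")          // local master only for a quick trial
  .getOrCreate()

// Hypothetical table and column names; the OPTIONS values come from the table above.
spark.sql(
  """
    |CREATE TABLE IF NOT EXISTS sales_carbon (
    |  id INT,
    |  name STRING,
    |  remarks STRING
    |)
    |USING CARBON
    |OPTIONS (
    |  'sort_columns'='id,name',
    |  'local_dictionary_enable'='true',
    |  'long_string_columns'='remarks'
    |)
  """.stripMargin)
```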

Note: User can only apply the features of what spark datasource like parquet supports. It cannot support the features of carbon session like IUD, compaction etc.

# Using DataFrame

Carbon format can be used in dataframe also using the following way.
Carbon format can also be used with DataFrames. The following are the ways to use Carbon format in a DataFrame.

Write carbon using dataframe
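A minimal sketch, assuming an existing SparkSession named `spark`, hypothetical sample data, and a hypothetical output path; the `"carbon"` format name mirrors the `USING CARBON` clause shown earlier.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("CarbonDataFrameExample")
  .master("local[*]")          // local master only for a quick trial
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data and output path.
val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Write the DataFrame in carbon format.
df.write
  .format("carbon")
  .mode(SaveMode.Overwrite)
  .save("/tmp/carbon_df_example")

// Read it back through the same datasource.
val readBack = spark.read
  .format("carbon")
  .load("/tmp/carbon_df_example")
readBack.show()
```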
