
Commit

Merge 6ec1d47 into d392717
sgururajshetty committed Oct 5, 2018
2 parents d392717 + 6ec1d47 commit 0053dbf
Showing 21 changed files with 229 additions and 208 deletions.
33 changes: 18 additions & 15 deletions README.md
@@ -45,23 +45,26 @@ CarbonData file format is a columnar store in HDFS, it has many features that a
CarbonData is built using Apache Maven; to build CarbonData, see [build CarbonData](https://github.com/apache/carbondata/blob/master/build)

## Online Documentation
* [What is CarbonData](https://github.com/apache/carbondata/blob/master/docs/introduction.md)
* [Quick Start](https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md)
* [CarbonData File Structure](https://github.com/apache/carbondata/blob/master/docs/file-structure-of-carbondata.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
* [Data Management on CarbonData](https://github.com/apache/carbondata/blob/master/docs/language-manual.md)
* [Configuring Carbondata](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [Streaming Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
* [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
* [S3 Guide](https://github.com/apache/carbondata/blob/master/docs/s3-guide.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
* [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
* [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
* [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
* [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)
* [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md)
* [FAQ](https://github.com/apache/carbondata/blob/master/docs/faq.md)
* [Use Cases](https://github.com/apache/carbondata/blob/master/docs/usecases.md)
* [Language Reference](https://github.com/apache/carbondata/blob/master/docs/language-manual.md)
* [CarbonData Data Definition Language](https://github.com/apache/carbondata/blob/master/docs/ddl-of-carbondata.md)
* [CarbonData Data Manipulation Language](https://github.com/apache/carbondata/blob/master/docs/dml-of-carbondata.md)
* [CarbonData Streaming Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
* [Configuring CarbonData](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
* [DataMap Developer Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
* [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
* [CarbonData DataMap Management](https://github.com/apache/carbondata/blob/master/docs/datamap-management.md)
* [CarbonData BloomFilter DataMap](https://github.com/apache/carbondata/blob/master/docs/bloomfilter-datamap-guide.md)
* [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/lucene-datamap-guide.md)
* [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/preaggregate-datamap-guide.md)
* [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/timeseries-datamap-guide.md)
* [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
* [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md)
* [S3 Storage](https://github.com/apache/carbondata/blob/master/docs/s3-guide.md)
* [Carbon as Spark's Datasource](https://github.com/apache/carbondata/blob/master/docs/carbon-as-spark-datasource-guide.md)
* [FAQs](https://github.com/apache/carbondata/blob/master/docs/faq.md)

## Other Technical Material
* [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
29 changes: 14 additions & 15 deletions docs/carbon-as-spark-datasource-guide.md
@@ -15,19 +15,20 @@
limitations under the License.
-->

# Carbon as Spark's datasource guide
# CarbonData as Spark's Datasource

Carbon fileformat can be integrated to Spark using datasource to read and write data without using CarbonSession.
The CarbonData file format is now integrated as a Spark datasource for read and write operations without using CarbonSession. This is useful for users who want to use CarbonData as Spark's data source.

**Note:** You can only apply the functions/features supported by the Spark datasource APIs; the supported functionality is similar to Parquet. CarbonSession features are not supported.

# Create Table with DDL

Carbon table can be created with spark's datasource DDL syntax as follows.
Now you can create a Carbon table using Spark's datasource DDL syntax.

```
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name1 col_type1 [COMMENT col_comment1], ...)]
USING carbon
USING CARBON
[OPTIONS (key1=val1, key2=val2, ...)]
[PARTITIONED BY (col_name1, col_name2, ...)]
[CLUSTERED BY (col_name3, col_name4, ...) INTO num_buckets BUCKETS]
```

@@ -41,25 +41,23 @@ Carbon table can be created with spark's datasource DDL syntax as follows.

| Property | Default Value | Description |
|-----------|--------------|------------|
| table_blocksize | 1024 | Size of blocks to write onto hdfs |
| table_blocklet_size | 64 | Size of blocklet to write |
| local_dictionary_threshold | 10000 | Cardinality upto which the local dictionary can be generated |
| local_dictionary_enable | false | Enable local dictionary generation |
| sort_columns | all dimensions are sorted | comma separated string columns which to include in sort and its order of sort |
| sort_scope | local_sort | Sort scope of the load.Options include no sort, local sort ,batch sort and global sort |
| long_string_columns | null | comma separated string columns which are more than 32k length |
| table_blocksize | 1024 | Size of blocks to write onto hdfs. For more details, see [Table Block Size Configuration](./ddl-of-carbondata.md#table-block-size-configuration). |
| table_blocklet_size | 64 | Size of blocklet to write. |
| local_dictionary_threshold | 10000 | Cardinality up to which the local dictionary can be generated. For more details, see [Local Dictionary Configuration](./ddl-of-carbondata.md#local-dictionary-configuration). |
| local_dictionary_enable | false | Enable local dictionary generation. For more details, see [Local Dictionary Configuration](./ddl-of-carbondata.md#local-dictionary-configuration). |
| sort_columns | all dimensions are sorted | Columns to include in sort and its order of sort. For more details, see [Sort Columns Configuration](./ddl-of-carbondata.md#sort-columns-configuration). |
| sort_scope | local_sort | Sort scope of the load. Options include no sort, local sort, batch sort, and global sort. For more details, see [Sort Scope Configuration](./ddl-of-carbondata.md#sort-scope-configuration). |
| long_string_columns | null | Comma separated string/char/varchar columns which are more than 32k length. For more details, see [String longer than 32000 characters](./ddl-of-carbondata.md#string-longer-than-32000-characters). |

## Example

```
CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS(table_block_size’=’256)
CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_block_size'='256')
```
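As a further illustration of how several of the OPTIONS listed above can be combined, here is a minimal sketch issued through `spark.sql`; the SparkSession setup, the table name `sales_carbon`, and the column names are assumptions made for this example, not part of the original guide.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: the CarbonData jars are on the classpath so that the
// "carbon" datasource (USING CARBON) can be resolved.
val spark = SparkSession.builder()
  .appName("CarbonDatasourceDDLExample")
  .master("local[*]")          // local master only for a quick trial
  .getOrCreate()

// Hypothetical table and column names; the OPTIONS values come from the table above.
spark.sql(
  """
    |CREATE TABLE IF NOT EXISTS sales_carbon (
    |  id INT,
    |  name STRING,
    |  remarks STRING
    |)
    |USING CARBON
    |OPTIONS (
    |  'sort_columns'='id,name',
    |  'local_dictionary_enable'='true',
    |  'long_string_columns'='remarks'
    |)
  """.stripMargin)
```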

Note: User can only apply the features of what spark datasource like parquet supports. It cannot support the features of carbon session like IUD, compaction etc.

# Using DataFrame

Carbon format can be used in dataframe also using the following way.
Carbon format can also be used with DataFrames. The following are the ways to use Carbon format in a DataFrame.

Write carbon using dataframe
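A minimal sketch, assuming an existing SparkSession named `spark`, hypothetical sample data, and a hypothetical output path; the `"carbon"` format name mirrors the `USING CARBON` clause shown earlier.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("CarbonDataFrameExample")
  .master("local[*]")          // local master only for a quick trial
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data and output path.
val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

// Write the DataFrame in carbon format.
df.write
  .format("carbon")
  .mode(SaveMode.Overwrite)
  .save("/tmp/carbon_df_example")

// Read it back through the same datasource.
val readBack = spark.read
  .format("carbon")
  .load("/tmp/carbon_df_example")
readBack.show()
```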
