Some of the minor issues (formatting, syntax, missing info) corrected
sgururajshetty committed Aug 2, 2018
1 parent 2bf7242 commit b5f3308
Showing 6 changed files with 32 additions and 29 deletions.
2 changes: 1 addition & 1 deletion docs/configuration-parameters.md
@@ -138,7 +138,7 @@ This section provides the details of all the configurations required for CarbonD
| carbon.enableMinMax | true | Min max is a feature added to enhance query performance. To disable this feature, set it to false. |
| carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (in seconds) the scheduler can wait for an executor to become active. The minimum value is 5 sec and the maximum value is 15 sec. |
| carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates that 80% of the requested resource is allocated for starting block distribution. The minimum value is 0.1 and the maximum value is 1.0. |
-| carbon.search.enabled | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
+| carbon.search.enabled (Alpha Feature) | false | If set to true, it will use CarbonReader to do distributed scan directly instead of using compute framework like spark, thus avoiding limitation of compute framework like SQL optimizer and task scheduling overhead. |
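These keys are normally set in the carbon.properties file; as a minimal sketch only, using the default values listed in the table above (not part of this change):

```
# carbon.properties — illustrative values matching the documented defaults
carbon.enableMinMax=true
carbon.dynamicallocation.schedulertimeout=5
carbon.scheduler.minregisteredresourcesratio=0.8
carbon.search.enabled=false
```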

* **Global Dictionary Configurations**

39 changes: 21 additions & 18 deletions docs/data-management-on-carbondata.md
@@ -87,6 +87,25 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
* BATCH_SORT: It increases the load performance but decreases the query performance when the number of identified blocks exceeds the parallelism.
* GLOBAL_SORT: It increases the query performance, especially for highly concurrent point queries.
If you care about strict isolation of loading resources, note that because the system uses Spark's GroupBy to sort the data, the resources can be controlled by Spark.

### Example:

```
CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
productNumber INT,
productName STRING,
storeCity STRING,
storeProvince STRING,
productCategory STRING,
productBatch STRING,
saleQuantity INT,
revenue INT)
STORED BY 'carbondata'
TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
'SORT_SCOPE'='NO_SORT')
```

**NOTE:** CarbonData also supports "using carbondata". Find example code at [SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala) in the CarbonData repo.

- **Table Block Size Configuration**

@@ -170,23 +189,6 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true','LOCAL_DICTIONARY_THRESHOLD'='1000',
'LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
```
-### Example:
-
-```
-CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
-productNumber INT,
-productName STRING,
-storeCity STRING,
-storeProvince STRING,
-productCategory STRING,
-productBatch STRING,
-saleQuantity INT,
-revenue INT)
-STORED BY 'carbondata'
-TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
-'SORT_SCOPE'='NO_SORT')
-```
-**NOTE:** CarbonData also supports "using carbondata". Find example code at [SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala) in the CarbonData repo.

- **Caching Min/Max Value for Required Columns**
By default, CarbonData caches min and max values of all the columns in schema. As the load increases, the memory required to hold the min and max values increases considerably. This feature enables you to configure min and max values only for the required columns, resulting in optimized memory usage.
@@ -210,7 +212,7 @@ This tutorial is going to introduce all commands and data operations on CarbonDa
COLUMN_META_CACHE='col1,col2,col3,…'
```
-Columns to be cached can be specifies either while creating tale or after creation of the table.
+Columns to be cached can be specified either while creating table or after creation of the table.
During the create table operation, specify the columns to be cached in table properties.
Syntax:
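The syntax block itself is collapsed in this diff view. As a hedged sketch only, assuming the usual TBLPROPERTIES / SET TBLPROPERTIES forms (table and column names here are illustrative):

```
CREATE TABLE sales (order_id STRING, country STRING, quantity INT)
STORED BY 'carbondata'
TBLPROPERTIES ('COLUMN_META_CACHE'='order_id,country')

ALTER TABLE sales SET TBLPROPERTIES ('COLUMN_META_CACHE'='order_id,country')
```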
@@ -574,6 +576,7 @@ Users can specify which columns to include and exclude for local dictionary gene
```
REFRESH TABLE dbcarbon.productSalesTable
```

**NOTE:**
* The new database name and the old database name should be the same.
* Before executing this command, the old table schema and data should be copied into the new database location.
10 changes: 5 additions & 5 deletions docs/datamap/bloomfilter-datamap-guide.md
@@ -1,4 +1,4 @@
-# CarbonData BloomFilter DataMap (Alpha feature in 1.4.0)
+# CarbonData BloomFilter DataMap (Alpha Feature)

* [DataMap Management](#datamap-management)
* [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
@@ -44,7 +44,7 @@ A Bloom filter is a space-efficient probabilistic data structure that is used to
CarbonData introduces BloomFilter as an index datamap to enhance the performance of queries with precise values.
It is well suited for queries that do a precise match on high-cardinality columns (such as Name/ID).
Internally, CarbonData maintains a BloomFilter per blocklet for each index column to indicate whether a value of the column is in this blocklet.
-Just like the other datamaps, BloomFilter datamap is managed ablong with main tables by CarbonData.
+Just like the other datamaps, BloomFilter datamap is managed along with main tables by CarbonData.
Users can create a BloomFilter datamap on specified columns with specified BloomFilter configurations such as size and probability.

For instance, consider a main table called **datamap_test** defined as follows:
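The actual definition of **datamap_test** is collapsed in this diff; a sketch of what such a main table could look like (the columns are illustrative, not taken from the guide):

```
CREATE TABLE datamap_test (
  id STRING,
  name STRING,
  age INT,
  city STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('SORT_COLUMNS'='id')
```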
@@ -83,9 +83,9 @@ User can create BloomFilter datamap using the Create DataMap DDL:

| Property | Is Required | Default Value | Description |
|-------------|----------|--------|---------|
-| INDEX_COLUMNS | YES | | Carbondata will generate BloomFilter index on these columns. Queries on there columns are usually like 'COL = VAL'. |
-| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions, it will affects the size of BloomFilter index. Since each blocklet has a BloomFilter here, so the default value is the approximate distinct index values in a blocklet assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
-| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability, it will affects the size of bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive samples. |
+| INDEX_COLUMNS | YES | | Carbondata will generate BloomFilter index on these columns. Queries on these columns are usually like 'COL = VAL'. |
+| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions, it will affect the size of BloomFilter index. Since each blocklet has a BloomFilter here, so the default value is the approximate distinct index values in a blocklet assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
+| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability, it will affect the size of bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in the range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive samples. |
| BLOOM_COMPRESS | NO | true | Whether to compress the BloomFilter index files. |
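Putting these properties together, a BloomFilter datamap on the sketched table above might be declared as follows (the property values are illustrative, not prescriptive):

```
CREATE DATAMAP dm_datamap_test ON TABLE datamap_test
USING 'bloomfilter'
DMPROPERTIES (
  'INDEX_COLUMNS'='id,name',
  'BLOOM_SIZE'='640000',
  'BLOOM_FPP'='0.00001',
  'BLOOM_COMPRESS'='true')
```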


2 changes: 1 addition & 1 deletion docs/datamap/lucene-datamap-guide.md
@@ -1,4 +1,4 @@
-# CarbonData Lucene DataMap (Alpha feature in 1.4.0)
+# CarbonData Lucene DataMap (Alpha Feature)

* [DataMap Management](#datamap-management)
* [Lucene Datamap](#lucene-datamap-introduction)
2 changes: 1 addition & 1 deletion docs/datamap/timeseries-datamap-guide.md
@@ -4,7 +4,7 @@
* [Compaction](#compacting-pre-aggregate-tables)
* [Data Management](#data-management-with-pre-aggregate-tables)

-## Timeseries DataMap Introduction (Alpha feature in 1.3.0)
+## Timeseries DataMap Introduction (Alpha Feature)
Timeseries DataMap is a pre-aggregate table implementation based on the 'pre-aggregate' DataMap.
The difference is that Timeseries DataMap has a built-in understanding of the time hierarchy and its
levels: year, month, day, hour, minute, so that it supports automatic roll-up in time dimension
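The rest of this guide is collapsed in the diff. As a hedged sketch only, a timeseries datamap is typically declared with an event-time column and a granularity; the table, column, and aggregate below are assumptions:

```
CREATE DATAMAP agg_sales_hour ON TABLE sales
USING 'timeseries'
DMPROPERTIES (
  'EVENT_TIME'='order_time',
  'HOUR_GRANULARITY'='1')
AS SELECT order_time, country, SUM(quantity)
FROM sales
GROUP BY order_time, country
```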
6 changes: 3 additions & 3 deletions docs/sdk-guide.md
@@ -297,7 +297,7 @@ public CarbonWriterBuilder persistSchemaFile(boolean persist);
* by default it is system time in nanoseconds.
* @return updated CarbonWriterBuilder
*/
-public CarbonWriterBuilder taskNo(String taskNo);
+public CarbonWriterBuilder taskNo(long taskNo);
```

```
@@ -340,7 +340,7 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options);
* @throws IOException
* @throws InvalidLoadOptionException
*/
-public CarbonWriter buildWriterForCSVInput() throws IOException, InvalidLoadOptionException;
+public CarbonWriter buildWriterForCSVInput(org.apache.carbondata.sdk.file.Schema schema) throws IOException, InvalidLoadOptionException;
```

```
@@ -351,7 +351,7 @@ public CarbonWriter buildWriterForCSVInput() throws IOException, InvalidLoadOpti
* @throws IOException
* @throws InvalidLoadOptionException
*/
-public CarbonWriter buildWriterForAvroInput() throws IOException, InvalidLoadOptionException;
+public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema schema) throws IOException, InvalidLoadOptionException;
```
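Taken together, these signature changes mean the schema is now passed to the build call and the task number is a long. A hedged sketch of a caller updated for the new API (class name, output path, and fields are illustrative, not from this commit):

```
import org.apache.carbondata.core.metadata.datatype.DataTypes;
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class SdkWriterExample {
  public static void main(String[] args) throws Exception {
    // Describe the rows we intend to write
    Field[] fields = new Field[2];
    fields[0] = new Field("name", DataTypes.STRING);
    fields[1] = new Field("age", DataTypes.INT);

    CarbonWriter writer = CarbonWriter.builder()
        .outputPath("/tmp/carbon_sdk_out")            // illustrative output path
        .taskNo(1L)                                   // long task number (previously a String)
        .buildWriterForCSVInput(new Schema(fields));  // schema is now supplied to the build call

    // Rows are passed as String[] for CSV input
    writer.write(new String[]{"robot", "1"});
    writer.close();
  }
}
```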

