[Documentation] Editorial review comment fixed #2603
@@ -140,7 +140,7 @@ This section provides the details of all the configurations required for CarbonData
 | carbon.enableMinMax | true | Min max is a feature added to enhance query performance. To disable this feature, set it to false. |
 | carbon.dynamicallocation.schedulertimeout | 5 | Specifies the maximum time (in seconds) the scheduler can wait for an executor to become active. The minimum value is 5 sec and the maximum value is 15 sec. |
 | carbon.scheduler.minregisteredresourcesratio | 0.8 | Specifies the minimum resource (executor) ratio needed for starting the block distribution. The default value is 0.8, which indicates that 80% of the requested resources are allocated for starting block distribution. The minimum value is 0.1 and the maximum value is 1.0. |
-| carbon.search.enabled | false | If set to true, CarbonReader is used to do a distributed scan directly instead of going through a compute framework like Spark, thus avoiding compute-framework limitations such as the SQL optimizer and task scheduling overhead. |
+| carbon.search.enabled (Alpha Feature) | false | If set to true, CarbonReader is used to do a distributed scan directly instead of going through a compute framework like Spark, thus avoiding compute-framework limitations such as the SQL optimizer and task scheduling overhead. |
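These properties are normally set in `carbon.properties`. As a quick reference, a sketch of the corresponding entries with the defaults documented in the table above (values copied from the table; this is illustrative, not a complete configuration file):

```properties
# Defaults as documented above; carbon.search.enabled is an Alpha Feature
carbon.enableMinMax=true
carbon.dynamicallocation.schedulertimeout=5
carbon.scheduler.minregisteredresourcesratio=0.8
carbon.search.enabled=false
```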
 * **Global Dictionary Configurations**

> **Review comment:** In the Local Dictionary section, the following updates need to be done.

> **Review comment:** The minimum value need not be mentioned now.
@@ -1,4 +1,4 @@
-# CarbonData BloomFilter DataMap (Alpha feature in 1.4.0)
+# CarbonData BloomFilter DataMap (Alpha Feature)
 * [DataMap Management](#datamap-management)
 * [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
@@ -44,7 +44,7 @@ A Bloom filter is a space-efficient probabilistic data structure that is used to
 Carbondata introduce BloomFilter as an index datamap to enhance the performance of querying with precise value.

> **Review comment:** Change "introduce" to "introduced".

 It is well suited for queries that do a precise match on high-cardinality columns (such as Name/ID).
 Internally, CarbonData maintains a BloomFilter per blocklet for each index column to indicate whether a value of the column is in this blocklet.
-Just like the other datamaps, BloomFilter datamap is managed ablong with main tables by CarbonData.
+Just like the other datamaps, BloomFilter datamap is managed along with main tables by CarbonData.
 Users can create a BloomFilter datamap on specified columns with specified BloomFilter configurations such as size and probability.
 For instance, a main table called **datamap_test** is defined as:
@@ -83,9 +83,9 @@ User can create BloomFilter datamap using the Create DataMap DDL:
 | Property | Is Required | Default Value | Description |
 |-------------|----------|--------|---------|
-| INDEX_COLUMNS | YES | | Carbondata will generate BloomFilter index on these columns. Queries on there columns are usually like 'COL = VAL'. |
-| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions, it will affects the size of BloomFilter index. Since each blocklet has a BloomFilter here, so the default value is the approximate distinct index values in a blocklet assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
-| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability, it will affects the size of bloomfilter index as well as the number of hash functions for the BloomFilter. The value should be in range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive samples. |
+| INDEX_COLUMNS | YES | | Carbondata will generate BloomFilter index on these columns. Queries on these columns are usually like 'COL = VAL'. |
+| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as the number of expected insertions; it affects the size of the BloomFilter index. Since each blocklet has its own BloomFilter, the default value is the approximate number of distinct index values in a blocklet, assuming that each blocklet contains 20 pages and each page contains 32000 records. The value should be an integer. |
+| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as the False-Positive Probability; it affects the size of the BloomFilter index as well as the number of hash functions for the BloomFilter. The value should be in the range (0, 1). In one test scenario, a 96GB TPCH customer table with bloom_size=320000 and bloom_fpp=0.00001 resulted in 18 false positive samples. |
 | BLOOM_COMPRESS | NO | true | Whether to compress the BloomFilter index files. |
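These properties are supplied via `DMPROPERTIES` in the Create DataMap DDL. A hedged sketch of such a statement (the datamap name, table name, and column list here are examples, not taken from this PR):

```sql
CREATE DATAMAP dm_bloom
ON TABLE datamap_test
USING 'bloomfilter'
DMPROPERTIES ('INDEX_COLUMNS' = 'name,id',
              'BLOOM_SIZE' = '640000',
              'BLOOM_FPP' = '0.00001',
              'BLOOM_COMPRESS' = 'true')
```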
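To see how BLOOM_SIZE (expected insertions, n) and BLOOM_FPP (false-positive probability, p) drive the index size, the textbook Bloom filter sizing formulas can be sketched as follows. This is generic Bloom filter math, not CarbonData's internal code, and the class and method names are hypothetical:

```java
// Generic Bloom filter sizing: m = -n*ln(p)/(ln 2)^2 bits, k = (m/n)*ln 2 hashes.
// Illustrates the BLOOM_SIZE / BLOOM_FPP trade-off; NOT CarbonData internals.
public class BloomSizing {

    // Optimal number of bits for n expected insertions at false-positive rate p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Optimal number of hash functions for n insertions and m bits.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 640_000;    // BLOOM_SIZE default (expected insertions per blocklet)
        double p = 0.00001;  // BLOOM_FPP default
        long m = optimalBits(n, p);
        System.out.println("bits=" + m + " (~" + m / 8 / 1024 + " KiB), hashes="
                + optimalHashes(n, m));
    }
}
```

With the documented defaults this comes to roughly 1.8 MiB of filter bits per blocklet before compression, which is why a tighter BLOOM_FPP directly grows the index.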
@@ -297,7 +297,7 @@ public CarbonWriterBuilder persistSchemaFile(boolean persist);
 * by default it is system time in nano seconds.
 * @return updated CarbonWriterBuilder
 */
-public CarbonWriterBuilder taskNo(String taskNo);
+public CarbonWriterBuilder taskNo(long taskNo);
@@ -340,7 +340,7 @@ public CarbonWriterBuilder withLoadOptions(Map<String, String> options);
 * @throws IOException
 * @throws InvalidLoadOptionException
 */
-public CarbonWriter buildWriterForCSVInput() throws IOException, InvalidLoadOptionException;
+public CarbonWriter buildWriterForCSVInput(org.apache.carbondata.sdk.file.Schema schema) throws IOException, InvalidLoadOptionException;
|
||
``` | ||
|
@@ -351,7 +351,7 @@ public CarbonWriter buildWriterForCSVInput() throws IOException, InvalidLoadOptionException;
 * @throws IOException
 * @throws InvalidLoadOptionException
 */
-public CarbonWriter buildWriterForAvroInput() throws IOException, InvalidLoadOptionException;
+public CarbonWriter buildWriterForAvroInput(org.apache.avro.Schema schema) throws IOException, InvalidLoadOptionException;
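Putting the updated signatures together, a hedged usage sketch of the CSV write path (the output path, schema fields, and row values are invented for illustration, and the snippet needs the CarbonData SDK on the classpath, so it is illustrative only):

```java
// Illustrative sketch of the changed SDK API; field names and the output
// path are made-up examples, not taken from this PR.
import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.Field;
import org.apache.carbondata.sdk.file.Schema;

public class SdkWriteSketch {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema(new Field[]{
            new Field("name", "string"),
            new Field("age", "int")
        });
        CarbonWriter writer = CarbonWriter.builder()
            .outputPath("/tmp/carbon_out")   // hypothetical path
            .taskNo(1L)                      // now takes a long, not a String
            .buildWriterForCSVInput(schema); // schema is now passed at build time
        writer.write(new String[]{"alice", "30"});
        writer.close();
    }
}
```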
> **Review comment:** The TestSdkJson example code needs to be corrected: testJsonSdkWriter should be static, and the IOException should be handled.
> **Review comment:** In S3 section.

> **Review comment:** This issue is handled in a different PR #2576.