[HOTFIX] Fix documentation errors. Add examples for pre-aggregate usage #1945

Closed · wants to merge 1 commit
72 changes: 27 additions & 45 deletions docs/data-management-on-carbondata.md
@@ -627,21 +627,21 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

```
LOAD DATA [LOCAL] INPATH 'folder_path'
  INTO TABLE [db_name.]table_name PARTITION (partition_spec)
  OPTIONS(property_name=property_value, ...)

INSERT INTO TABLE [db_name.]table_name PARTITION (partition_spec) <SELECT STATEMENT>
```

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable
  PARTITION (country = 'US', state = 'CA')

INSERT INTO TABLE locationTable
  PARTITION (country = 'US', state = 'AL')
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Load Data Using Dynamic Partition
@@ -650,12 +650,11 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

Example:
```
LOAD DATA LOCAL INPATH '${env:HOME}/staticinput.csv'
  INTO TABLE locationTable

INSERT INTO TABLE locationTable
  SELECT <columns list excluding partition columns> FROM another_user
```

#### Show Partitions
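A minimal usage sketch (standard Spark SQL syntax, reusing the locationTable from the examples above):

```
SHOW PARTITIONS locationTable
```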
@@ -679,19 +678,19 @@ This tutorial is going to introduce all commands and data operations on CarbonDa

```
INSERT OVERWRITE TABLE table_name
  PARTITION (column = 'partition_name')
  select_statement
```

Example:
```
INSERT OVERWRITE TABLE partitioned_user
  PARTITION (country = 'US')
  SELECT * FROM another_user au
  WHERE au.country = 'US';
```

### CARBONDATA PARTITION(HASH,RANGE,LIST) -- Alpha feature; this partition feature does not support update and delete operations.

Three partition types are supported: Hash, Range, and List. Similar to the partition features of other systems, CarbonData partitioning can be used to improve query performance by filtering on the partition column.
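As an illustration, a hash-partitioned table names the partition column in PARTITIONED BY and sets the partition properties. A minimal sketch, assuming the TBLPROPERTIES keys PARTITION_TYPE and NUM_PARTITIONS:

```
CREATE TABLE IF NOT EXISTS hash_partition_table(
    col_A STRING,
    col_B INT
)
PARTITIONED BY (col_C LONG)
STORED BY 'carbondata'
TBLPROPERTIES('PARTITION_TYPE'='HASH', 'NUM_PARTITIONS'='9')
```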

@@ -886,11 +885,11 @@ will be transformed by Query Planner to fetch data from pre-aggregate table **ag

But queries of the following kind
```
SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country, sex

SELECT sex, avg(quantity) from sales GROUP BY sex

SELECT country, max(price) from sales GROUP BY country
```

will fetch the data from the main table **sales**
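To check which table a plan actually reads, the query can be prefixed with Spark's EXPLAIN; whether the plan names the pre-aggregate table or the main table depends on the matching rules described above (the plan output format varies by Spark and CarbonData version):

```
EXPLAIN SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY country, sex
```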
@@ -910,18 +909,13 @@ pre-aggregate tables satisfy the query condition, the plan is transformed automa
pre-aggregate table to fetch the data

##### Compacting pre-aggregate tables
Compaction is an optional operation for pre-aggregate tables. If compaction is performed on the
main table but not on a pre-aggregate table, all queries can still benefit from the pre-aggregate
tables. To further improve performance on pre-aggregate tables, compaction can be triggered on
them directly; it will merge the segments inside the pre-aggregate table.
The compaction command (ALTER TABLE COMPACT) needs to be run separately on each pre-aggregate
table: running the compaction command on the main table will **not automatically** compact the
pre-aggregate tables.
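A minimal sketch, assuming a pre-aggregate datamap **agg_sales** on main table **sales** whose backing table is addressable as sales_agg_sales (the exact table name is an assumption and may vary by version):

```
-- compacting the main table does NOT compact its pre-aggregate tables
ALTER TABLE sales COMPACT 'minor'

-- each pre-aggregate table is compacted separately (hypothetical table name)
ALTER TABLE sales_agg_sales COMPACT 'minor'
```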

NOTE:
* If the aggregate functions used to create the pre-aggregate table include distinct-count, the
  pre-aggregate table values are recomputed during compaction. This is a costlier operation
  compared to compacting pre-aggregate tables that contain only other aggregate functions.

##### Update/Delete Operations on pre-aggregate tables
This functionality is not supported.

Expand Down Expand Up @@ -996,16 +990,6 @@ roll-up for the queries on these hierarchies.
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'minute_granularity'='1'
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
```

It is **not necessary** to create pre-aggregate tables for each granularity unless required for
the query. CarbonData can roll up the data and fetch it.
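For instance, assuming only the minute-level pre-aggregate table exists, an hour-level query can still be answered by rolling up minute-level data. A sketch, assuming a query-side timeseries() function that takes the event time column and a granularity name:

```
SELECT timeseries(order_time, 'hour'), sum(quantity)
FROM sales
GROUP BY timeseries(order_time, 'hour')
```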

For example, for main table **sales**, if pre-aggregate tables were created as

@@ -135,6 +135,30 @@ object PreAggregateTableExample {
println("time for query on table without pre-aggregate table:" + time_without_aggTable.toString)
// scalastyle:on

    // 3. If avg is defined for a column in the pre-aggregate table, sum can also be used on
    // that column, but not the other way round
val time_without_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTableWithoutAgg group by id
""".stripMargin).count()
}

val time_with_aggTable_sum = time {
spark.sql(
s"""
| SELECT id, sum(age)
| FROM personTable group by id
""".stripMargin).count()
}
// scalastyle:off
println("time for query with function sum on table with pre-aggregate table:" +
time_with_aggTable_sum.toString)
println("time for query with function sum on table without pre-aggregate table:" +
time_without_aggTable_sum.toString)
// scalastyle:on

spark.sql("DROP TABLE IF EXISTS mainTable")
spark.sql("DROP TABLE IF EXISTS personTable")
spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")
Expand Down