HIVE-27163: Column stats are not getting published after an insert qu… #4228
Changes from all commits
b270602
02aad53
4eefd9d
d3a4f7c
331cfef
0d8027c
733eab3
5ff48a5
ec63eb5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -244,14 +244,14 @@ Stage-0 | |
| Stage-1 | ||
| Reducer 2 vectorized | ||
| File Output Operator [FS_8] | ||
| Select Operator [SEL_7] (rows=9 width=95) | ||
| Select Operator [SEL_7] (rows=9 width=192) | ||
| Output:["_col0","_col1","_col2"] | ||
| <-Map 1 [SIMPLE_EDGE] vectorized | ||
| SHUFFLE [RS_6] | ||
| Select Operator [SEL_5] (rows=9 width=95) | ||
| Select Operator [SEL_5] (rows=9 width=192) | ||
| Output:["_col0","_col1","_col2"] | ||
| TableScan [TS_0] (rows=9 width=95) | ||
| default@tbl_ice_puffin,tbl_ice_puffin,Tbl:COMPLETE,Col:COMPLETE,Output:["a","b","c"] | ||
| TableScan [TS_0] (rows=9 width=192) | ||
| default@tbl_ice_puffin,tbl_ice_puffin,Tbl:COMPLETE,Col:NONE,Output:["a","b","c"] | ||
|
|
||
| PREHOOK: query: drop table if exists tbl_ice_puffin | ||
| PREHOOK: type: DROPTABLE | ||
|
|
@@ -339,17 +339,16 @@ POSTHOOK: type: DESCTABLE | |
| POSTHOOK: Input: default@tbl_ice_puffin | ||
| col_name a | ||
| data_type int | ||
| min 1 | ||
| max 333 | ||
| num_nulls 0 | ||
| distinct_count 7 | ||
| min | ||
| max | ||
| num_nulls | ||
| distinct_count | ||
|
Comment on lines -343 to +345
Member
Did this remove the Iceberg column stats? If so, we should revert and make sure the change doesn't affect Iceberg.
Member
Author
The The There is an
Member
This part of the output corresponds to the following code snippet. In this case, the output of I think we should either:
Member
Author
This is perhaps true for Iceberg tables, since they track their own metadata. If, every time we create a new Iceberg table in HMS, the legacy files under the table directory are not read (e.g., the row count is 0 regardless of the legacy files), then we can put `"COLUMN_STATS":{"a":"true","b":"true","c":"true"}` into the table parameters.
Member
When the Iceberg table is recreated, a new snapshot is generated, so any leftover files would be ignored.
Member
Author
Thanks, will create another PR to address this |
||
| avg_col_len | ||
| max_col_len | ||
| num_trues | ||
| num_falses | ||
| bit_vector HL | ||
| bit_vector | ||
| comment | ||
| COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}} | ||
|
Member
Basic stats are removed as well?
Member
Author
Same as |
||
| PREHOOK: query: drop table if exists tbl_ice | ||
| PREHOOK: type: DROPTABLE | ||
| POSTHOOK: query: drop table if exists tbl_ice | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -225,7 +225,7 @@ Retention: 0 | |
| #### A masked pattern was here #### | ||
| Table Type: EXTERNAL_TABLE | ||
| Table Parameters: | ||
| COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"id\":\"true\",\"value\":\"true\"}} | ||
|
|
||
| COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} | ||
| EXTERNAL TRUE | ||
| bucketing_version 2 | ||
| current-schema {\"type\":\"struct\",\"schema-id\":0,\"fields\":[{\"id\":1,\"name\":\"id\",\"required\":false,\"type\":\"int\"},{\"id\":2,\"name\":\"value\",\"required\":false,\"type\":\"string\"}]} | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| set hive.stats.column.autogather=true; | ||
| set hive.stats.autogather=true; | ||
| dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test1; | ||
|
|
||
| create external table test_custom(age int, name string) stored as orc location '/tmp/test1'; | ||
| insert into test_custom select 1, 'test'; | ||
| desc formatted test_custom age; | ||
|
|
||
| drop table test_custom; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| #### A masked pattern was here #### | ||
| PREHOOK: type: CREATETABLE | ||
| #### A masked pattern was here #### | ||
| PREHOOK: Output: database:default | ||
| PREHOOK: Output: default@test_custom | ||
| #### A masked pattern was here #### | ||
| POSTHOOK: type: CREATETABLE | ||
| #### A masked pattern was here #### | ||
| POSTHOOK: Output: database:default | ||
| POSTHOOK: Output: default@test_custom | ||
| PREHOOK: query: insert into test_custom select 1, 'test' | ||
| PREHOOK: type: QUERY | ||
| PREHOOK: Input: _dummy_database@_dummy_table | ||
| PREHOOK: Output: default@test_custom | ||
| POSTHOOK: query: insert into test_custom select 1, 'test' | ||
| POSTHOOK: type: QUERY | ||
| POSTHOOK: Input: _dummy_database@_dummy_table | ||
| POSTHOOK: Output: default@test_custom | ||
| POSTHOOK: Lineage: test_custom.age SIMPLE [] | ||
| POSTHOOK: Lineage: test_custom.name SIMPLE [] | ||
| PREHOOK: query: desc formatted test_custom age | ||
| PREHOOK: type: DESCTABLE | ||
| PREHOOK: Input: default@test_custom | ||
| POSTHOOK: query: desc formatted test_custom age | ||
| POSTHOOK: type: DESCTABLE | ||
| POSTHOOK: Input: default@test_custom | ||
| col_name age | ||
| data_type int | ||
| min 1 | ||
| max 1 | ||
| num_nulls 0 | ||
| distinct_count 1 | ||
| avg_col_len | ||
| max_col_len | ||
| num_trues | ||
| num_falses | ||
| bit_vector HL | ||
| comment from deserializer | ||
| COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"age\":\"true\",\"name\":\"true\"}} | ||
| PREHOOK: query: drop table test_custom | ||
| PREHOOK: type: DROPTABLE | ||
| PREHOOK: Input: default@test_custom | ||
| PREHOOK: Output: default@test_custom | ||
| POSTHOOK: query: drop table test_custom | ||
| POSTHOOK: type: DROPTABLE | ||
| POSTHOOK: Input: default@test_custom | ||
| POSTHOOK: Output: default@test_custom |
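
The review above turns on what the `COLUMN_STATS_ACCURATE` table parameter contains before and after the change. As a minimal sketch (not Hive's own code), the Python below decodes the value as it is printed in the `.q.out` files and lists the columns flagged `"true"`. The helper name `parse_stats_accurate` is hypothetical, and the sketch assumes the value is JSON with backslash-escaped quotes, as the outputs above show.

```python
import json


def parse_stats_accurate(raw):
    """Return (basic_stats_flag, set of columns flagged "true").

    `raw` is the COLUMN_STATS_ACCURATE value as printed in a .q.out file,
    i.e. JSON whose double quotes are backslash-escaped. (Assumption: this
    mirrors only the printed format, not Hive's internal representation.)
    """
    # Undo the backslash escaping so the value becomes plain JSON.
    state = json.loads(raw.replace('\\"', '"'))
    basic = str(state.get("BASIC_STATS", "false")).lower() == "true"
    columns = {
        name
        for name, flag in state.get("COLUMN_STATS", {}).items()
        if str(flag).lower() == "true"
    }
    return basic, columns


# Values copied from the outputs above: one with per-column flags, one without.
with_col_stats = r'{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"age\":\"true\",\"name\":\"true\"}}'
basic_only = r'{\"BASIC_STATS\":\"true\"}'

print(parse_stats_accurate(with_col_stats))
print(parse_stats_accurate(basic_only))
```

Comparing the two results makes the difference in the diff easier to see: the second value carries no `COLUMN_STATS` entry, so the returned column set is empty.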
Hi @simhadri-g, it seems that `desc formatted tbl_ice_puffin a` gets stats from the metastore even though `hive.iceberg.stats.source=iceberg`. Could you please check?
@dengzhhu653, yes, we recently merged a PR to store Hive column stats in Puffin files for Iceberg tables (a8a0ae7).
Hive will use stats from this file whenever `hive.iceberg.stats.source=iceberg`. There are a few more follow-up tasks that I am currently working on as part of the epic.