[improvement](statistics)Support identical column name in different index. #32792
Thank you for your contribution to Apache Doris. Since 2024-03-18, the documentation has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 37917 ms
TPC-DS: Total hot run time: 182206 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Force-pushed from b5b40f3 to 9e9e54b.
run buildall
TPC-H: Total hot run time: 37614 ms
TPC-DS: Total hot run time: 181833 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisManager.java (outdated, resolved)
fe/fe-core/src/main/java/org/apache/doris/statistics/TableStatsMeta.java (outdated, resolved)
fe/fe-core/src/main/java/org/apache/doris/statistics/TableStatsMeta.java (resolved)
```java
public void convertDeprecatedColStatsToNewVersion() {
    deprecatedColNameToColStatsMeta = null;
}
```
Not implemented?
I don't think it's worth implementing this. The older version only keeps the table id in memory, so it's not easy to find the table while replaying metadata: we would need to go through all catalogs and their databases to find it.
The downside of not implementing this is that auto analyze needs to collect stats again for all tables. But the stats already collected by the old version remain available.
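A minimal sketch of the lookup cost described in the reply, assuming a simplified, hypothetical catalog model: the `Catalog`, `Database` and `Table` records and `findTableById` are illustrative names, not Doris classes. With only a table id available during replay, finding the table means walking every catalog and every database.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified catalog model for illustration only;
// the real Doris catalog classes are different.
public class ReplayLookupSketch {
    record Table(long id, String name) {}
    record Database(String name, List<Table> tables) {}
    record Catalog(String name, List<Database> databases) {}

    // With only a table id, the lookup has to scan every catalog and every
    // database; in the worst case it touches every table in the system.
    static Optional<Table> findTableById(List<Catalog> catalogs, long tableId) {
        for (Catalog catalog : catalogs) {
            for (Database db : catalog.databases()) {
                for (Table table : db.tables()) {
                    if (table.id() == tableId) {
                        return Optional.of(table);
                    }
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Catalog> catalogs = List.of(
                new Catalog("internal", List.of(new Database("db1", List.of(new Table(10L, "t1"))))),
                new Catalog("hive", List.of(new Database("db2", List.of(new Table(20L, "t2"))))));
        // Prints "t2"
        System.out.println(findTableById(catalogs, 20L).map(Table::name).orElse("not found"));
    }
}
```

Keeping the deprecated map unconverted avoids paying this full scan for every table on replay, at the cost of auto analyze recollecting stats once.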
fe/fe-core/src/main/java/org/apache/doris/statistics/AnalysisInfoBuilder.java (outdated, resolved)
fe/fe-core/src/test/java/org/apache/doris/statistics/StatisticsAutoCollectorTest.java (outdated, resolved)
fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java (outdated, resolved)
Force-pushed from a7e57dd to 01763fa.
run buildall
TPC-H: Total hot run time: 38103 ms
TPC-DS: Total hot run time: 183089 ms
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
run external
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
* [fix](merge cloud) Fix cloud be set be tag map (#32864)
* [chore] Add gavinchou to collaborators (#32881)
* [chore](show) support statement to show views from table (#32358)
  - `show views;` returns `t1_view` and `t2_view` (2 rows in set, 0.00 sec)
  - `show views like '%t1%';` returns `t1_view` (1 row in set, 0.01 sec)
  - `show views where create_time > '2024-03-18';` returns `t2_view` (1 row in set, 0.02 sec)
* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)
  Disable some permission operations when Ranger or LDAP are enabled.
* [chore](ci) exclude unstable trino_connector case (#32892)
  Co-authored-by: stephen <hello-stephen@qq.com>
* [fix](Nereids) NPE when create table with implicit index type (#32893)
* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)
  This rewriting pattern is supported for multi-table joins. The supported join types are: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, LEFT SEMI JOIN, RIGHT SEMI JOIN, LEFT ANTI JOIN, RIGHT ANTI JOIN.
* [Serde](Variant) support arrow serialization for variant type (#32780)
* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)
  Currently, when reading a Hive table on COSN, Doris returns an empty result even though the table has data; Iceberg on COSN works. The cause is misuse of COSN's file system: according to COSN's documentation, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.
* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)
* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899
* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)
  1. Fix iceberg catalog bug. PR #30198 changed the logic of `IcebergHMSExternalCatalog.java` to get locationUrl by calling the Hive metastore's `getCatalog()` method. But this method only exists in Hive 3+, so it fails with Hive 2.x. I temporarily removed this logic, because it is only used for Iceberg table writing, which is still under development. We will rethink this logic later.
  2. Fix test cases. Some of the P2 test cases were missing `order_qt`. And because the output format of the floating point type changed, some results in `out` files needed to be regenerated.
* [revert](jni) revert part of #32455 (#32904)
* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)
* [chore](log) print query id before logging profile in be.INFO (#32922)
* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929
* [improvement](decommission be) decommission check replica num (#32748)
* [fix](arrow-flight) Fix reach limit of connections error (#32911)
  - Fix the "Reach limit of connections" error: in fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and the connection is usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.
  - Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in frequently killed connections after query timeout.
  - Fix the bearer token eviction log and exception.
  - TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
* [bugfix](cloud) few variable not initialized (#32868)
  ../../cloud/src/recycler/meta_checker.cpp can cause an uninitialised memory read.
* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)
  - --add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
  - When Groovy uses a Flight SQL connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error "AGGREGATE clause must not contain analytic expressions", but Java executes it without problems through jdbc:arrow-flight-sql.
  - Groovy does not support printing the Arrow array type; it throws IndexOutOfBoundsException.
  - "arrow_flight_sql" does not support two-phase read.
  - ./run-regression-test.sh --run --clean -g arrow_flight_sql
* [fix](spill) SpillStream's writer may not have been finalized (#32931)
* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)
* [Improve](inverted_index) update clucene and improve array inverted index writer (#32436)
* [Performance](exec) replace SipHash in function by XXHash (#32919)
* [feature](agg) add aggregate function sum0 (#32541)
* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)
  Support getting the tables in a materialized view when collecting tables in a plan. The table schema is as follows:
  `create materialized view mv1 BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL DISTRIBUTED BY RANDOM BUCKETS 1 PROPERTIES ('replication_num' = '1') as select t1.c1, t3.c2 from table1 t1 inner join table3 t3 on t1.c1 = t3.c2`
  If we get the tables from the following plan, we get [table1, table3, table2]; mv1 is expanded to its base tables:
  `SELECT mv1.*, uuid() FROM mv1 LEFT SEMI JOIN table2 ON mv1.c1 = table2.c1 WHERE mv1.c1 IN ( SELECT c1 FROM table2 ) OR mv1.c1 < 10`
* [enhance](mtmv)support olap table partition column is null (#32698)
* [enhancement](cloud) add table version to cloud (#32738)
  Add table version to cloud.
  In FE:
  - Get: if FE is in cloud mode, get the table version from the meta service.
  - Update: on drop/replace temp partition and commit transaction.
  In the meta service:
  - Add: on create index; the initial value is 1.
  - Remove: by the recycler.
  - Update: commit/drop partition rpc, commit txn rpc. Atomic++.
* [fix](cloud) schema change from not null to null (#32913)
  1. Use equals instead of == for type comparison.
  2. The null bitmap is resized by the size of the ref column.
* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)
* [case](rowpolicy)fix row policy has been exist (#32880)
* [fix](pipeline) fix use error row desc when origin block clear (#32803)
* [fix](Nereids) support variant column with index when create table (#32948)
* [opt](Nereids) support create table with variant type (#32953)
* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)
* [fix](compile) fe cannot compile in idea (#32955)
* [enhancement](plsql) Support select * from routines (#32866)
  Support showing PL/SQL procedures using select * from routines.
* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)
  Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a NoClassDefFoundError occurs. We need to write a separate Utils class.
* [exec](column) change some complex column move to noexcept (#32954)
* [Enhancement](data skew) extends show data skew (#32732)
* [chore](test) let suite compatible with Nereids (#32964)
* Support identical column name in different index. (#32792)
* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)
* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)
* [improvement](executor)Add tag property for workload group #32874
* [fix](auth)unified workload and resource permission logic (#32907)
  - `Grant resource` can no longer grant global `usage_priv`
  - `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
UserIdentity: 'f'@'%'
Comment:
Password: No
Roles:
GlobalPrivs: Usage_priv
CatalogPrivs: NULL
DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv
TablePrivs: NULL
ColPrivs: NULL
ResourcePrivs: NULL
CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
UserIdentity: 'f'@'%'
Comment:
Password: No
Roles:
GlobalPrivs: NULL
CatalogPrivs: NULL
DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv
TablePrivs: NULL
ColPrivs: NULL
ResourcePrivs: %: Usage_priv
CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv
```
---------
Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: yongjinhou <109586248+yongjinhou@users.noreply.github.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
Co-authored-by: lihangyu <15605149486@163.com>
Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Vallish Pai <vallishpai@gmail.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Jensen <czjourney@163.com>
Co-authored-by: zhangdong <493738387@qq.com>
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: zclllyybb <zhaochangle@selectdb.com>
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
For each column stats meta in tableStatsMeta, add the index name to the key alongside the column name, so that columns with identical names in different indexes can be told apart, and auto analyze is triggered after a new MV is created.
Previously, we kept this map in memory for each table:
`ColumnName -> ColumnStatistics`
However, different indexes in a table may have identical column names, so we changed the map to:
`Pair<ColumnName, IndexName> -> ColumnStatistics`
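A minimal sketch of the keying change, assuming hypothetical names throughout (`ColStatsKey`, `ColStatsMeta`, `recordAnalyzed`, `getColStatsMeta`, `unanalyzedColumns` are illustrative, not the actual `TableStatsMeta` API). A Java record stands in for the `Pair<ColumnName, IndexName>` key because it has value-based `equals`/`hashCode`. The "missing entry means not yet analyzed" check is one plausible way a new MV's columns could be picked up by auto analyze, not necessarily how Doris does it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the actual TableStatsMeta implementation.
public class ColStatsKeySketch {
    // Plays the role of Pair<ColumnName, IndexName>: a record provides
    // value-based equals/hashCode, so it works as a HashMap key.
    record ColStatsKey(String columnName, String indexName) {}

    // Hypothetical, trimmed-down per-column metadata.
    record ColStatsMeta(long updatedTime) {}

    private final Map<ColStatsKey, ColStatsMeta> colStatsMetaMap = new HashMap<>();

    void recordAnalyzed(String indexName, String columnName, long updatedTime) {
        colStatsMetaMap.put(new ColStatsKey(columnName, indexName), new ColStatsMeta(updatedTime));
    }

    ColStatsMeta getColStatsMeta(String indexName, String columnName) {
        return colStatsMetaMap.get(new ColStatsKey(columnName, indexName));
    }

    // Columns of an index that have no entry yet, e.g. every column of a
    // newly created materialized view, even if the base table has a column
    // with the same name.
    List<String> unanalyzedColumns(String indexName, List<String> columnNames) {
        return columnNames.stream()
                .filter(c -> !colStatsMetaMap.containsKey(new ColStatsKey(c, indexName)))
                .toList();
    }

    public static void main(String[] args) {
        ColStatsKeySketch meta = new ColStatsKeySketch();
        meta.recordAnalyzed("t1", "k1", 100L);
        // "k1" in mv1 is a different key from "k1" in t1: prints "null".
        System.out.println(meta.getColStatsMeta("mv1", "k1"));
        // Prints "[k1, k2]": both mv1 columns still need to be analyzed.
        System.out.println(meta.unanalyzedColumns("mv1", List.of("k1", "k2")));
    }
}
```

With the old single-key map (`ColumnName -> ...`), the lookup for mv1's `k1` would have returned the base table's entry, so the new view's column would look already analyzed.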