[feature](statistics)Support high priority column stats auto collection. #33703

Merged
1 commit merged on Apr 26, 2024

Jibing-Li
Contributor

This PR adds support for high-priority column stats collection. High-priority columns are the columns that users reference in their queries, for example filter columns and join columns. The auto analyze thread collects these columns first. When there are no high-priority columns, it falls back to collecting all OLAP tables one by one, as before.
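A minimal sketch of the scheduling order described above, under a simple in-memory model. Every name here (`AutoAnalyzeScheduler`, `recordQueriedColumn`, `nextJob`) is hypothetical and does not reflect the actual Doris FE classes; the first-recorded-first-served order among high-priority tables is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these names are illustrative, not Doris FE classes.
final class AutoAnalyzeScheduler {
    // Columns seen in filter or join predicates, grouped by table, in first-seen order.
    private final Map<String, Set<String>> highPriorityColumns = new LinkedHashMap<>();
    // All OLAP tables, rotated round-robin; only used when no high-priority work is pending.
    private final Deque<String> allTables = new ArrayDeque<>();

    AutoAnalyzeScheduler(List<String> olapTables) {
        allTables.addAll(olapTables);
    }

    // Assumed hook: called when a query uses `column` of `table` in a filter or join.
    void recordQueriedColumn(String table, String column) {
        highPriorityColumns.computeIfAbsent(table, t -> new LinkedHashSet<>()).add(column);
    }

    // Returns the next job as table -> columns (an empty list means "all columns"),
    // or null when there is nothing to analyze.
    Map.Entry<String, List<String>> nextJob() {
        if (!highPriorityColumns.isEmpty()) {
            // High-priority columns first: take the table that was recorded earliest.
            String table = highPriorityColumns.keySet().iterator().next();
            List<String> columns = new ArrayList<>(highPriorityColumns.remove(table));
            return Map.entry(table, columns);
        }
        if (allTables.isEmpty()) {
            return null;
        }
        // Fallback: analyze all tables one by one, moving each to the back of the queue.
        String table = allTables.pollFirst();
        allTables.addLast(table);
        return Map.entry(table, List.of());
    }
}
```

For example, with tables `t1, t2` and a recorded filter on `t2.k1`, the first job is `t2 -> [k1]` even though `t1` is listed first; once the high-priority map is drained, `nextJob()` rotates through the tables with an empty column list, meaning all columns.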

This PR includes the following PRs:

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See Doris Document.

Jibing-Li marked this pull request as ready for review April 16, 2024 06:52
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Jibing-Li
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40012 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4b781e542255d07076af8b8bd3d85bbaab8d4a28, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17612	4947	4261	4261
q2	2011	191	191	191
q3	10448	1160	1271	1160
q4	10194	821	811	811
q5	7544	2746	2718	2718
q6	215	137	134	134
q7	1059	624	626	624
q8	9225	2137	2133	2133
q9	7648	6779	6859	6779
q10	8780	3769	3719	3719
q11	452	233	225	225
q12	466	224	217	217
q13	17177	3238	3144	3144
q14	277	240	237	237
q15	537	472	476	472
q16	506	394	401	394
q17	995	649	691	649
q18	7685	7235	7067	7067
q19	4907	1577	1551	1551
q20	991	309	330	309
q21	3642	2937	2910	2910
q22	378	323	307	307
Total cold run time: 112749 ms
Total hot run time: 40012 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4596	4414	4437	4414
q2	512	268	257	257
q3	3119	2980	2912	2912
q4	1963	1694	1668	1668
q5	5571	5608	5334	5334
q6	205	119	117	117
q7	2359	2006	1940	1940
q8	3295	3409	3447	3409
q9	8950	8918	8846	8846
q10	4081	3834	3873	3834
q11	589	494	487	487
q12	816	625	614	614
q13	15840	2967	3064	2967
q14	312	291	275	275
q15	533	469	488	469
q16	498	445	444	444
q17	1808	1512	1524	1512
q18	8003	7880	7297	7297
q19	5345	1554	1537	1537
q20	1994	1762	1732	1732
q21	7117	4777	4613	4613
q22	538	466	464	464
Total cold run time: 78044 ms
Total hot run time: 55142 ms
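The bot does not label its columns. The numbers are consistent with each row being query, cold run, hot run 1, hot run 2 and best hot run, where the best hot run is the smaller of the two hot runs and each total is a column sum. The sketch below checks that reading against Round 1; the interpretation is inferred from the numbers rather than documented by the bot, and `BenchTotals` is a hypothetical name.

```java
// Assumption (inferred from the numbers, not stated by the bot): columns are
// cold run, hot run 1, hot run 2; "best" is the smaller hot run; totals are sums.
final class BenchTotals {
    // TPC-H sf100 Round 1 from the comment above, in ms: {cold, hot run 1, hot run 2}.
    static final long[][] ROUND1 = {
        {17612, 4947, 4261}, {2011, 191, 191}, {10448, 1160, 1271}, {10194, 821, 811},
        {7544, 2746, 2718}, {215, 137, 134}, {1059, 624, 626}, {9225, 2137, 2133},
        {7648, 6779, 6859}, {8780, 3769, 3719}, {452, 233, 225}, {466, 224, 217},
        {17177, 3238, 3144}, {277, 240, 237}, {537, 472, 476}, {506, 394, 401},
        {995, 649, 691}, {7685, 7235, 7067}, {4907, 1577, 1551}, {991, 309, 330},
        {3642, 2937, 2910}, {378, 323, 307},
    };

    // Total cold run time: sum of the first column.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += row[0];
        }
        return sum;
    }

    // Total hot run time: sum of the per-query best (minimum) hot run.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) {
            sum += Math.min(row[1], row[2]);
        }
        return sum;
    }
}
```

`totalCold(ROUND1)` reproduces the reported 112749 ms and `totalHot(ROUND1)` the reported 40012 ms. The same rule (sum of the first column, sum of the per-query minimum of the two hot runs) also matches the ClickBench totals of 110.43 s and 30.56 s.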

@doris-robot

TPC-DS: Total hot run time: 185483 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4b781e542255d07076af8b8bd3d85bbaab8d4a28, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	910	373	369	369
query2	6470	2439	2409	2409
query3	6640	202	203	202
query4	24117	21339	21395	21339
query5	4108	413	417	413
query6	267	200	179	179
query7	4584	288	294	288
query8	235	177	171	171
query9	8502	2383	2359	2359
query10	422	248	252	248
query11	14727	14148	14248	14148
query12	134	88	84	84
query13	1634	350	359	350
query14	9895	7868	7906	7868
query15	251	173	188	173
query16	8104	258	264	258
query17	1821	561	540	540
query18	2124	281	268	268
query19	336	153	148	148
query20	88	88	86	86
query21	194	125	125	125
query22	5168	4929	4891	4891
query23	33864	33259	33007	33007
query24	6604	2946	2975	2946
query25	562	373	392	373
query26	691	154	145	145
query27	1900	304	319	304
query28	3893	2049	2014	2014
query29	828	624	585	585
query30	255	171	186	171
query31	930	745	718	718
query32	88	56	53	53
query33	473	250	241	241
query34	829	472	485	472
query35	780	708	676	676
query36	1012	896	909	896
query37	105	71	68	68
query38	3347	3222	3235	3222
query39	1568	1569	1519	1519
query40	206	125	128	125
query41	46	39	39	39
query42	103	94	98	94
query43	575	543	532	532
query44	1061	727	740	727
query45	292	274	269	269
query46	1061	742	708	708
query47	1937	1874	1857	1857
query48	361	302	292	292
query49	773	408	369	369
query50	750	390	385	385
query51	6803	6624	6630	6624
query52	103	88	97	88
query53	351	276	282	276
query54	264	245	238	238
query55	80	73	72	72
query56	237	220	219	219
query57	1208	1127	1185	1127
query58	220	212	201	201
query59	3310	3190	3183	3183
query60	254	260	253	253
query61	92	88	88	88
query62	565	442	450	442
query63	312	282	282	282
query64	4769	4023	3999	3999
query65	3056	3028	3026	3026
query66	793	331	341	331
query67	15440	15151	15072	15072
query68	5203	544	548	544
query69	469	302	305	302
query70	1225	1183	1154	1154
query71	1364	1271	1266	1266
query72	6690	2610	2422	2422
query73	698	327	325	325
query74	6740	6518	6402	6402
query75	3337	2620	2655	2620
query76	2718	940	992	940
query77	409	272	278	272
query78	11051	10294	10284	10284
query79	2910	522	519	519
query80	1858	434	452	434
query81	543	241	244	241
query82	766	101	96	96
query83	267	173	170	170
query84	268	92	84	84
query85	1788	278	258	258
query86	527	265	300	265
query87	3527	3251	3249	3249
query88	4542	2430	2429	2429
query89	472	377	377	377
query90	2076	178	179	178
query91	124	128	99	99
query92	56	48	49	48
query93	4295	517	507	507
query94	1282	185	179	179
query95	378	292	289	289
query96	589	269	267	267
query97	3134	2983	2943	2943
query98	253	224	223	223
query99	1222	853	858	853
Total cold run time: 274263 ms
Total hot run time: 185483 ms

@doris-robot

ClickBench: Total hot run time: 30.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 4b781e542255d07076af8b8bd3d85bbaab8d4a28, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.04	0.03
query3	0.23	0.05	0.04
query4	1.69	0.06	0.07
query5	0.48	0.47	0.50
query6	1.48	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.53	0.50	0.48
query10	0.54	0.55	0.55
query11	0.16	0.12	0.12
query12	0.15	0.12	0.11
query13	0.60	0.58	0.58
query14	0.75	0.79	0.77
query15	0.81	0.80	0.82
query16	0.37	0.37	0.37
query17	0.97	1.00	0.91
query18	0.21	0.22	0.25
query19	1.85	1.66	1.67
query20	0.02	0.01	0.02
query21	15.72	0.65	0.64
query22	4.10	7.35	2.14
query23	18.27	1.44	1.32
query24	1.87	0.23	0.24
query25	0.13	0.08	0.08
query26	0.28	0.17	0.16
query27	0.08	0.08	0.08
query28	13.30	1.00	0.96
query29	13.27	3.32	3.29
query30	0.24	0.07	0.06
query31	2.87	0.38	0.38
query32	3.28	0.45	0.46
query33	2.81	2.81	2.80
query34	17.24	4.45	4.45
query35	4.49	4.49	4.67
query36	0.71	0.49	0.47
query37	0.17	0.15	0.14
query38	0.16	0.14	0.15
query39	0.04	0.04	0.03
query40	0.18	0.14	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.43 s
Total hot run time: 30.56 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 4b781e542255d07076af8b8bd3d85bbaab8d4a28 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.1 seconds inserted 10000000 Rows, about 763K ops/s
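The reported throughputs are consistent with "MB" meaning MiB (2^20 bytes) and with results being truncated rather than rounded: for example, 2358488459 bytes / 18 s is about 124.96 MiB/s and is shown as 124 MB/s. A small sketch of that calculation; the truncation rule and the `LoadThroughput` name are assumptions inferred from the numbers, not taken from the load-test script.

```java
// Assumption inferred from the reported numbers: "MB" means MiB (2^20 bytes),
// "K" means 1000, and results are truncated toward zero rather than rounded.
final class LoadThroughput {
    // Bytes per second expressed in whole MiB/s.
    static long mibPerSecond(long bytes, double seconds) {
        return (long) (bytes / seconds / (1024.0 * 1024.0));
    }

    // Rows per second expressed in whole thousands ("K ops/s").
    static long thousandRowsPerSecond(long rows, double seconds) {
        return (long) (rows / seconds / 1000.0);
    }
}
```

Under these assumptions the four reported figures (124, 18 and 25 MB/s, 763K ops/s) are all reproduced from the byte counts, row count and durations above.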

@Jibing-Li
Contributor Author

run feut

Contributor

clang-tidy review says "All clean, LGTM! 👍"

* fix visible column (apache#33023)
* Collect high priority columns. (apache#31235)
* High priority queue and map. (apache#31509)
* Support column level health value. (apache#31794)
* Support follower sync query columns to master. (apache#31859)
* Support show auto analyze pending jobs. (apache#31926)
* Check column health value earlier, show job priority. (apache#32064)
* support window (apache#32094)
* Refactor. (apache#32273)
* refactor2 (apache#32278)
* Unit test (apache#32398)
* Support auto analyze mv (apache#32433)
* Fix bug (apache#32454)
* Support identical column name in different index. (apache#32957)
* Fix visible column
* Use future to block auto analyze before job finish. (apache#33083)
* Fix ut. (apache#33147)
* Fix ut (apache#33161)
* fix p0 (apache#33210)
* Improve failover logic. (apache#33382)
* Improve waiting empty table logic. (apache#33472)
* Fix pipeline (apache#33671)
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Jibing-Li
Contributor Author

run buildall

Contributor

morningman left a comment

LGTM

github-actions bot added the approved label ("Indicates a PR has been approved by one committer.") Apr 26, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Jibing-Li merged commit d4cdd28 into apache:master Apr 26, 2024
28 of 32 checks passed
morrySnow pushed a commit that referenced this pull request Apr 30, 2024
…valid or not (#34263)

This is a follow-up PR to #33685.
After #33703 was merged, updated rows need to be checked at the column level instead of the table level.
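Presumably, once high-priority columns are analyzed individually, columns of the same table can be analyzed at different times, so a single table-level count of updated rows no longer tells how stale a given column's statistics are. One hypothetical way to track this per column is to snapshot a table-wide updated-row counter whenever a column is analyzed. The class, its methods and the `maxChangedRatio` threshold below are illustrative assumptions, not the code from #34263.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of column-level staleness; not the Doris implementation.
final class ColumnStaleness {
    // Cumulative number of rows updated in the table.
    private long tableUpdatedRows;
    // Value of tableUpdatedRows at the time each column was last analyzed.
    private final Map<String, Long> updatedRowsAtLastAnalyze = new HashMap<>();

    void onRowsUpdated(long rows) {
        tableUpdatedRows += rows;
    }

    void onColumnAnalyzed(String column) {
        updatedRowsAtLastAnalyze.put(column, tableUpdatedRows);
    }

    // Rows updated since this column (not the whole table) was last analyzed.
    long updatedRowsSince(String column) {
        Long snapshot = updatedRowsAtLastAnalyze.get(column);
        return snapshot == null ? tableUpdatedRows : tableUpdatedRows - snapshot;
    }

    // Stale if never analyzed, or if the changed fraction exceeds the assumed threshold.
    boolean isStale(String column, long rowCount, double maxChangedRatio) {
        if (!updatedRowsAtLastAnalyze.containsKey(column)) {
            return true;
        }
        if (rowCount <= 0) {
            return false;
        }
        return (double) updatedRowsSince(column) / rowCount > maxChangedRatio;
    }
}
```

For example, with 200 rows and a 0.2 threshold, a column analyzed before 60 subsequent row updates is stale, while a column of the same table analyzed before only 10 of them is not.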
yiguolei pushed a commit that referenced this pull request May 6, 2024
…valid or not (#34263)
dataroaring pushed a commit that referenced this pull request May 7, 2024
…valid or not (#34263)
ByteYue pushed a commit to ByteYue/doris that referenced this pull request May 15, 2024
…valid or not (apache#34263)
Jibing-Li deleted the merge branch May 28, 2024 08:39
Jibing-Li added a commit that referenced this pull request Sep 24, 2024
Remove the analyze retry logic for failed tasks. A retry usually fails
again, and each retry adds a long sleep, which makes the analyze job run
too slowly.
Master PR: #33703
Jibing-Li added a commit that referenced this pull request Sep 24, 2024
Jibing-Li added a commit to Jibing-Li/incubator-doris that referenced this pull request Oct 8, 2024
Jibing-Li added a commit that referenced this pull request Oct 8, 2024
4 participants