[fix](filecache) exclude warmup reads from file cache hit ratio metrics #63394

Open
freemandealer wants to merge 1 commit into apache:master from freemandealer:task-master-fix-file-cache-hit-ratio-exclude-war

@freemandealer
Member

What problem does this PR solve?

Issue Number: N/A

Related PR: N/A

Problem Summary: The file cache hit ratio metrics are derived from the global file cache read-byte counters. Warmup reads (manual, periodic, event-driven, and rebalance-triggered) previously updated the same counters as query reads, which polluted the query hit ratio. In addition, a read that mixed cache hits and misses could be attributed to a single source for the whole request.

This change:

  • skips updates to the global file cache read metrics for warmup reads, while preserving per-IOContext profile stats;
  • records local, remote, and peer bytes according to the bytes each source actually returned;
  • does not update metrics for failed reads;
  • fixes direct-read partial continuation and the no-warmup, miss-only hit ratio refresh.
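For reviewers, the accounting rules described in the problem summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Source`, `Segment`, `ReadCounters`, `IOContextSketch`, and `record_read` are hypothetical names. The sketch also assumes that the hit ratio is local bytes divided by all bytes read (so peer bytes count as misses) and that a failed read updates neither the global nor the per-context counters.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not the Doris implementation.
enum class Source { Local, Remote, Peer };

// One piece of a read, served entirely by a single source.
struct Segment {
    Source source;
    uint64_t bytes;  // bytes actually returned for this piece
};

struct ReadCounters {
    uint64_t local_bytes = 0;
    uint64_t remote_bytes = 0;
    uint64_t peer_bytes = 0;

    // Attribute the bytes of one segment to the source that served it.
    void add(const Segment& s) {
        switch (s.source) {
        case Source::Local:  local_bytes += s.bytes;  break;
        case Source::Remote: remote_bytes += s.bytes; break;
        case Source::Peer:   peer_bytes += s.bytes;   break;
        }
    }

    uint64_t total() const { return local_bytes + remote_bytes + peer_bytes; }

    // Assumption: hit ratio = local bytes / all bytes; 0 when nothing was read.
    double hit_ratio() const {
        return total() == 0 ? 0.0 : static_cast<double>(local_bytes) / total();
    }
};

struct IOContextSketch {
    bool is_warmup = false;
    ReadCounters profile;  // per-context stats, kept for warmup reads too
};

// Record one finished read. `segments` must describe only returned bytes.
void record_read(ReadCounters& global, IOContextSketch& ctx,
                 const std::vector<Segment>& segments, bool read_ok) {
    if (!read_ok) {
        return;  // assumption: failed reads update no counters at all
    }
    for (const Segment& s : segments) {
        ctx.profile.add(s);    // per-context profile is always updated
        if (!ctx.is_warmup) {
            global.add(s);     // warmup reads never touch the global metrics
        }
    }
}
```

With this shape, a query read that returns 30 local bytes and 10 remote bytes contributes to both sources separately instead of being counted as a single hit or miss, and a warmup read of any size leaves the global hit ratio unchanged.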

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@freemandealer
Member Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

TPC-H: Total hot run time: 31237 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit e6e8f7829c2fb8810a25f3aacbfe1f93ef1b68b5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17659	3964	3883	3883
q2	q3	10761	1403	816	816
q4	4684	476	350	350
q5	7571	2363	2153	2153
q6	337	177	139	139
q7	974	792	637	637
q8	9341	1840	1700	1700
q9	6873	4925	4920	4920
q10	6458	2115	1837	1837
q11	448	274	244	244
q12	694	430	303	303
q13	18195	3449	2799	2799
q14	260	259	236	236
q15	q16	815	781	714	714
q17	953	914	950	914
q18	6900	5719	5579	5579
q19	1138	1300	1074	1074
q20	501	415	257	257
q21	5749	2528	2377	2377
q22	425	362	305	305
Total cold run time: 100736 ms
Total hot run time: 31237 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4260	4186	4187	4186
q2	q3	4514	4980	4348	4348
q4	2111	2197	1381	1381
q5	4615	4311	4302	4302
q6	353	234	162	162
q7	1937	1854	1636	1636
q8	2501	2070	2178	2070
q9	7988	7830	7643	7643
q10	4531	4483	4109	4109
q11	597	614	379	379
q12	731	731	519	519
q13	3244	3637	3016	3016
q14	312	316	292	292
q15	q16	741	733	626	626
q17	1353	1314	1303	1303
q18	7994	7317	7135	7135
q19	1131	1088	1066	1066
q20	2210	2202	1914	1914
q21	5285	4606	4471	4471
q22	520	478	401	401
Total cold run time: 56928 ms
Total hot run time: 50959 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169227 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e6e8f7829c2fb8810a25f3aacbfe1f93ef1b68b5, data reload: false

query5	4331	650	519	519
query6	335	225	205	205
query7	4300	548	287	287
query8	324	254	224	224
query9	8788	4008	4046	4008
query10	474	340	314	314
query11	5832	2367	2158	2158
query12	185	127	123	123
query13	1295	598	453	453
query14	6019	5324	5069	5069
query14_1	4357	4377	4316	4316
query15	219	205	182	182
query16	997	443	410	410
query17	1059	748	623	623
query18	2485	500	374	374
query19	222	209	166	166
query20	149	138	134	134
query21	222	170	118	118
query22	13690	13567	13318	13318
query23	17192	16314	16031	16031
query23_1	16128	16150	16125	16125
query24	7451	1741	1310	1310
query24_1	1276	1297	1284	1284
query25	566	467	411	411
query26	1300	318	172	172
query27	2774	524	344	344
query28	4452	1958	1926	1926
query29	969	635	489	489
query30	307	242	196	196
query31	1121	1065	939	939
query32	89	77	73	73
query33	543	350	295	295
query34	1178	1107	646	646
query35	761	781	669	669
query36	1366	1330	1184	1184
query37	155	97	91	91
query38	3196	3124	3028	3028
query39	940	930	903	903
query39_1	880	877	866	866
query40	231	146	124	124
query41	65	65	63	63
query42	108	110	113	110
query43	325	334	281	281
query44	
query45	212	201	195	195
query46	1014	1230	718	718
query47	2301	2365	2142	2142
query48	402	396	302	302
query49	630	487	387	387
query50	968	344	257	257
query51	4327	4265	4239	4239
query52	104	104	93	93
query53	249	275	206	206
query54	304	279	256	256
query55	92	91	84	84
query56	297	309	296	296
query57	1440	1437	1317	1317
query58	311	283	284	283
query59	1530	1627	1406	1406
query60	338	331	315	315
query61	184	183	182	182
query62	678	636	576	576
query63	248	203	209	203
query64	2474	842	682	682
query65	
query66	1740	500	375	375
query67	29913	29886	29214	29214
query68	
query69	457	344	308	308
query70	1028	954	989	954
query71	328	279	273	273
query72	3047	2697	2396	2396
query73	825	731	426	426
query74	5076	4948	4697	4697
query75	2657	2584	2295	2295
query76	2272	1143	752	752
query77	393	403	338	338
query78	12236	12296	11592	11592
query79	1436	1011	718	718
query80	1325	543	453	453
query81	537	277	237	237
query82	1027	160	126	126
query83	334	275	248	248
query84	294	141	110	110
query85	922	570	459	459
query86	453	330	328	328
query87	3431	3432	3246	3246
query88	3514	2668	2650	2650
query89	456	386	335	335
query90	1915	179	180	179
query91	184	169	138	138
query92	80	76	76	76
query93	1513	1510	903	903
query94	718	334	319	319
query95	656	478	341	341
query96	981	822	336	336
query97	2713	2750	2580	2580
query98	245	238	227	227
query99	1146	1102	981	981
Total cold run time: 253462 ms
Total hot run time: 169227 ms

@freemandealer
Member Author

run beut

@freemandealer
Member Author

run nonConcurrent
