@hubgeter
Contributor

@hubgeter hubgeter commented Jan 28, 2026

What problem does this PR solve?

Problem Summary:
This PR extends the EXPLAIN VERBOSE output of File Scan nodes with the following metrics:
dataFileNum=xxx, deleteFileNum=xxx, deleteSplitNum=xxx
These metrics are especially useful for Iceberg, Paimon, and Hive ACID tables.

These metrics provide more visibility into the underlying file and split layout, helping users better tune parameters and control query performance.
Details:
dataFileNum: the number of distinct data files to be read.
This is not the same as the number of splits, because a single data file can be divided into multiple splits.

deleteFileNum: the number of distinct delete files to be read.

deleteSplitNum: the number of delete splits to be read, summed over all data splits. It is needed because data files and delete files have a many-to-many relationship:
  • one data file may be associated with multiple delete files;
  • one delete file may apply to multiple data files.
Dividing deleteSplitNum by the number of data splits (dataSplitNum, reported as inputSplitNum) gives the average number of delete splits to be read per data split.
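To make the three definitions concrete, here is a minimal Java sketch (not Doris code) that derives them from a list of splits. The `Split` and `Stats` types and the `compute` method are hypothetical, and the sketch assumes deleteSplitNum counts one delete-split read for every delete file attached to a data split, which is what makes deleteSplitNum / dataSplitNum an average per data split.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScanFileStats {
    // Hypothetical split model: a byte range of one data file plus the
    // delete files that must be applied when reading that range.
    record Split(String dataFile, long start, long length, List<String> deleteFiles) {}

    record Stats(int splitNum, int dataFileNum, int deleteFileNum, long deleteSplitNum) {
        // Average number of delete splits read per data split.
        double deletesPerSplit() {
            return splitNum == 0 ? 0.0 : (double) deleteSplitNum / splitNum;
        }
    }

    static Stats compute(List<Split> splits) {
        Set<String> dataFiles = new HashSet<>();   // distinct data files
        Set<String> deleteFiles = new HashSet<>(); // distinct delete files
        long deleteSplitNum = 0;                   // one count per delete file attached to a split
        for (Split s : splits) {
            dataFiles.add(s.dataFile());
            deleteFiles.addAll(s.deleteFiles());
            deleteSplitNum += s.deleteFiles().size();
        }
        return new Stats(splits.size(), dataFiles.size(), deleteFiles.size(), deleteSplitNum);
    }

    public static void main(String[] args) {
        // a.parquet is cut into two splits; one delete file applies to every split.
        List<Split> splits = List.of(
                new Split("a.parquet", 4, 1000, List.of("dv.puffin")),
                new Split("a.parquet", 1004, 1000, List.of("dv.puffin")),
                new Split("b.parquet", 4, 800, List.of("dv.puffin")));
        Stats st = compute(splits);
        System.out.printf("splitNum=%d dataFileNum=%d deleteFileNum=%d deleteSplitNum=%d%n",
                st.splitNum(), st.dataFileNum(), st.deleteFileNum(), st.deleteSplitNum());
        // prints: splitNum=3 dataFileNum=2 deleteFileNum=1 deleteSplitNum=3
        System.out.println(st.deletesPerSplit()); // prints 1.0
    }
}
```

Under this model, the example below (220 splits, deleteFileNum=1, deleteSplitNum=220) is consistent with every split reading that single delete file, i.e. an average of one delete split per data split.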

Example:

mysql> explain verbose select * from iceberg.format_v3.dv_test_1w;
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                                               |
|   OUTPUT EXPRS:                                                                                                                               |
|     id[#0]                                                                                                                                    |
|     grp[#1]                                                                                                                                   |
|     value[#2]                                                                                                                                 |
|     ts[#3]                                                                                                                                    |
|   PARTITION: RANDOM                                                                                                                           |
|                                                                                                                                               |
|   HAS_COLO_PLAN_NODE: false                                                                                                                   |
|                                                                                                                                               |
|   VRESULT SINK                                                                                                                                |
|      MYSQL_PROTOCOL                                                                                                                           |
|                                                                                                                                               |
|   0:VICEBERG_SCAN_NODE(32)                                                                                                                    |
|      table: iceberg.format_v3.dv_test_1w                                                                                                      |
|      inputSplitNum=220, totalFileSize=720774, scanRanges=220                                                                                  |
|      partition=0/0                                                                                                                            |
|      backends:                                                                                                                                |
|        1769590309070                                                                                                                          |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00004-51-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2672      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00003-50-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2852      |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00000-47-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 4 length: 2894      |
|          ... other 216 files ...                                                                                                              |
|          s3://warehouse/wh/format_v3/dv_test_1w/data/00001-48-fc462f9a-d42a-404d-adfc-c8d2781c8d04-0-00001.parquet start: 58397 length: 13894 |
|          dataFileNum=10, deleteFileNum=1 deleteSplitNum=220                                                                               |
|      cardinality=33334, numNodes=1                                                                                                            |
|      pushdown agg=NONE                                                                                                                        |
|      tuple ids: 0                                                                                                                             |
|                                                                                                                                               |
| Tuples:                                                                                                                                       |
| TupleDescriptor{id=0, tbl=dv_test_1w}                                                                                                         |
|   SlotDescriptor{id=0, col=id, colUniqueId=1, type=bigint, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=1, col=grp, colUniqueId=2, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}           |
|   SlotDescriptor{id=2, col=value, colUniqueId=3, type=int, nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}         |
|   SlotDescriptor{id=3, col=ts, colUniqueId=4, type=datetimev2(6), nullable=true, isAutoIncrement=false, subColPath=null, virtualColumn=null}  |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
|                                                                                                                                               |
| ========== STATISTICS ==========                                                                                                              |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
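When comparing plans, it can be handy to pull these counters out of the EXPLAIN VERBOSE text programmatically. The sketch below is a hypothetical helper, not part of Doris: it collects every numeric key=value pair with a regular expression, so it tolerates the missing comma between deleteFileNum and deleteSplitNum in the output above. It assumes a single scan node in the plan and uses inputSplitNum as the data split count in the denominator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplainScanMetrics {
    // Matches "key=123"; whatever separates pairs (", " or a bare space) is ignored.
    private static final Pattern KV = Pattern.compile("(\\w+)=(\\d+)");

    // Collects numeric key=value pairs from EXPLAIN VERBOSE text.
    // Assumes a single scan node: a later duplicate key overwrites an earlier one.
    static Map<String, Long> parse(String explainText) {
        Map<String, Long> metrics = new HashMap<>();
        Matcher m = KV.matcher(explainText);
        while (m.find()) {
            metrics.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return metrics;
    }

    // Average delete splits per data split, with inputSplitNum as the split count.
    static double deletesPerSplit(Map<String, Long> metrics) {
        long splits = metrics.getOrDefault("inputSplitNum", 0L);
        long deleteSplits = metrics.getOrDefault("deleteSplitNum", 0L);
        return splits == 0 ? 0.0 : (double) deleteSplits / splits;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "inputSplitNum=220, totalFileSize=720774, scanRanges=220",
                "dataFileNum=10, deleteFileNum=1 deleteSplitNum=220");
        Map<String, Long> metrics = parse(text);
        System.out.println(metrics.get("dataFileNum") + " " + metrics.get("deleteFileNum")); // prints 10 1
        System.out.println(deletesPerSplit(metrics)); // prints 1.0
    }
}
```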

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 28, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact the change might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and what the difference is before and after the optimization.

@hubgeter
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31670 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 44287822f92867ea9a75f43236e6419c748de82e, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17807	5282	5109	5109
q2	2023	307	186	186
q3	10231	1323	733	733
q4	10201	829	315	315
q5	7534	2099	1921	1921
q6	192	183	157	157
q7	859	712	623	623
q8	9287	1357	1137	1137
q9	5234	4911	4810	4810
q10	6766	1944	1562	1562
q11	511	279	275	275
q12	334	376	225	225
q13	17770	4045	3178	3178
q14	229	246	216	216
q15	907	818	814	814
q16	668	675	627	627
q17	637	771	499	499
q18	6732	6430	6290	6290
q19	1364	993	604	604
q20	387	347	226	226
q21	2610	1979	1895	1895
q22	350	310	268	268
Total cold run time: 102633 ms
Total hot run time: 31670 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5310	5284	5300	5284
q2	261	340	247	247
q3	2138	2629	2253	2253
q4	1358	1719	1272	1272
q5	4303	4170	4225	4170
q6	222	180	141	141
q7	1921	2180	1841	1841
q8	2687	2478	2458	2458
q9	7434	7384	7434	7384
q10	2871	3096	2628	2628
q11	534	471	446	446
q12	690	791	613	613
q13	3929	4447	3753	3753
q14	282	316	273	273
q15	870	836	838	836
q16	687	753	671	671
q17	1132	1299	1322	1299
q18	8112	7864	7980	7864
q19	868	843	878	843
q20	2074	2150	1984	1984
q21	4801	4470	4115	4115
q22	591	551	498	498
Total cold run time: 53075 ms
Total hot run time: 50873 ms
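The two totals in each round can be reproduced from the per-query rows. Reading the columns as cold run, hot run 1, hot run 2, and best hot run (an interpretation consistent with every row above), the total cold run time is the sum of the first column and the total hot run time is the sum of min(hot run 1, hot run 2). A small Java check over the Round 1 rows; `TpchTotals` is a hypothetical name, not part of the Doris tooling:

```java
public class TpchTotals {
    // Each row: {cold run, hot run 1, hot run 2} in milliseconds.
    static long[] totals(long[][] rows) {
        long cold = 0;
        long hot = 0;
        for (long[] r : rows) {
            cold += r[0];
            hot += Math.min(r[1], r[2]); // best of the two hot runs
        }
        return new long[] {cold, hot};
    }

    public static void main(String[] args) {
        long[][] round1 = {
            {17807, 5282, 5109}, {2023, 307, 186}, {10231, 1323, 733}, {10201, 829, 315},
            {7534, 2099, 1921}, {192, 183, 157}, {859, 712, 623}, {9287, 1357, 1137},
            {5234, 4911, 4810}, {6766, 1944, 1562}, {511, 279, 275}, {334, 376, 225},
            {17770, 4045, 3178}, {229, 246, 216}, {907, 818, 814}, {668, 675, 627},
            {637, 771, 499}, {6732, 6430, 6290}, {1364, 993, 604}, {387, 347, 226},
            {2610, 1979, 1895}, {350, 310, 268}};
        long[] t = totals(round1);
        System.out.println("cold=" + t[0] + " ms, hot=" + t[1] + " ms"); // prints cold=102633 ms, hot=31670 ms
    }
}
```

Applying the same aggregation to the ClickBench rows, which have no best-hot column, likewise reproduces the reported 97.15 s cold and 28.27 s hot totals.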

@doris-robot

ClickBench: Total hot run time: 28.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 44287822f92867ea9a75f43236e6419c748de82e, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.06	0.05	0.05
query2	0.10	0.04	0.04
query3	0.25	0.09	0.08
query4	1.60	0.12	0.11
query5	0.27	0.25	0.26
query6	1.16	0.67	0.67
query7	0.03	0.02	0.03
query8	0.05	0.04	0.03
query9	0.56	0.50	0.48
query10	0.54	0.53	0.55
query11	0.14	0.09	0.10
query12	0.15	0.10	0.10
query13	0.63	0.62	0.60
query14	1.08	1.06	1.05
query15	0.88	0.88	0.86
query16	0.39	0.41	0.39
query17	1.12	1.10	1.16
query18	0.22	0.22	0.21
query19	1.97	1.96	2.02
query20	0.02	0.02	0.02
query21	15.39	0.28	0.14
query22	5.02	0.05	0.05
query23	15.90	0.28	0.10
query24	1.47	0.34	0.31
query25	0.07	0.05	0.09
query26	0.14	0.13	0.13
query27	0.09	0.06	0.06
query28	3.43	1.17	0.96
query29	12.54	4.00	3.18
query30	0.28	0.14	0.11
query31	2.81	0.68	0.40
query32	3.23	0.60	0.50
query33	3.35	3.29	3.22
query34	16.20	5.50	4.73
query35	4.76	4.83	4.81
query36	0.65	0.50	0.49
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.19	0.17	0.15
query41	0.09	0.04	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 97.15 s
Total hot run time: 28.27 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/70) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 20.00% (14/70) 🎉
Increment coverage report
Complete coverage report
