
[enhance](hive) support skip.header.line.count #49929

Merged on Apr 10, 2025 (2 commits)

@suxiaogang223 (Contributor) commented Apr 9, 2025

What problem does this PR solve?

Support the skip.header.line.count table property when reading Hive text tables.
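
For readers unfamiliar with the property, here is a minimal sketch of its semantics: the first N lines of a text file are treated as header and excluded from the rows returned. This is illustrative only and is not the code in this PR. The function names, the fallback to 0 for missing or invalid values, and the single-file view (no split handling) are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Mapping

# Property name as it appears in the PR title.
HEADER_KEY = "skip.header.line.count"


def header_lines_to_skip(table_properties: Mapping[str, str]) -> int:
    """Return how many leading lines to skip (hypothetical helper).

    Assumption: a missing, non-numeric, or negative value means "skip nothing".
    """
    raw = table_properties.get(HEADER_KEY, "0")
    try:
        count = int(raw)
    except ValueError:
        return 0
    return max(count, 0)


def read_data_lines(
    lines: Iterable[str], table_properties: Mapping[str, str]
) -> Iterator[str]:
    """Yield the lines of one text file, minus the configured header lines."""
    skip = header_lines_to_skip(table_properties)
    for index, line in enumerate(lines):
        if index >= skip:
            yield line


if __name__ == "__main__":
    props = {HEADER_KEY: "1"}
    for row in read_data_lines(["id,name", "1,alice", "2,bob"], props):
        print(row)
```

A real reader also has to decide what happens when one file is divided into several scan ranges. This sketch ignores that case.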

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 changed the title from "[enhance](hive) support skip.header.line.count and skip.footer.line.count for hive table" to "[enhance](hive) support skip.header.line.count" on Apr 10, 2025
@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34127 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2f5cf08e6286526daebf44fe3b28fad46c6fa346, data reload: false

------ Round 1 ----------------------------------
q1	25979	5085	5028	5028
q2	2077	280	190	190
q3	10371	1265	689	689
q4	10225	1011	551	551
q5	7514	2398	2293	2293
q6	184	161	131	131
q7	902	744	613	613
q8	9308	1225	1056	1056
q9	6815	5174	5121	5121
q10	6807	2299	1921	1921
q11	483	296	304	296
q12	352	357	214	214
q13	17757	3719	3126	3126
q14	224	227	210	210
q15	538	475	483	475
q16	611	622	593	593
q17	629	856	390	390
q18	7423	7161	7042	7042
q19	2518	988	548	548
q20	322	332	224	224
q21	4098	2660	2459	2459
q22	1057	1068	957	957
Total cold run time: 116194 ms
Total hot run time: 34127 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5225	5080	5118	5080
q2	244	329	236	236
q3	2140	2686	2278	2278
q4	1404	1813	1408	1408
q5	4442	4452	4368	4368
q6	216	172	129	129
q7	1963	1903	1788	1788
q8	2618	2591	2542	2542
q9	7312	7169	7142	7142
q10	3013	3177	2760	2760
q11	573	514	498	498
q12	720	806	604	604
q13	3486	3911	3341	3341
q14	271	302	266	266
q15	521	477	475	475
q16	644	694	650	650
q17	1137	1548	1409	1409
q18	7826	7560	7288	7288
q19	859	811	845	811
q20	1886	1981	1850	1850
q21	5494	4911	5097	4911
q22	1097	1053	1024	1024
Total cold run time: 53091 ms
Total hot run time: 50858 ms

@doris-robot

TPC-DS: Total hot run time: 194891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2f5cf08e6286526daebf44fe3b28fad46c6fa346, data reload: false

query1	1433	1056	1049	1049
query2	6107	1953	1965	1953
query3	11145	4694	4776	4694
query4	25449	23812	23654	23654
query5	5043	622	495	495
query6	299	190	186	186
query7	3980	501	288	288
query8	307	260	246	246
query9	8549	2607	2591	2591
query10	516	315	282	282
query11	15506	15107	14755	14755
query12	163	121	107	107
query13	1585	529	411	411
query14	9162	6186	6247	6186
query15	217	195	174	174
query16	7619	641	474	474
query17	1154	810	610	610
query18	2031	451	320	320
query19	203	200	171	171
query20	127	121	122	121
query21	210	132	114	114
query22	4605	4715	4321	4321
query23	34550	33989	33639	33639
query24	8797	2517	2451	2451
query25	547	460	407	407
query26	1235	277	161	161
query27	2780	509	350	350
query28	4530	2471	2426	2426
query29	702	592	463	463
query30	287	228	195	195
query31	894	878	776	776
query32	81	65	61	61
query33	548	385	322	322
query34	805	895	559	559
query35	841	834	785	785
query36	1002	1015	893	893
query37	115	99	72	72
query38	4292	4321	4169	4169
query39	1511	1456	1454	1454
query40	222	117	110	110
query41	53	54	52	52
query42	127	105	110	105
query43	521	541	505	505
query44	1347	825	851	825
query45	187	182	172	172
query46	859	1033	674	674
query47	1887	1911	1827	1827
query48	403	444	302	302
query49	755	555	469	469
query50	673	692	433	433
query51	4279	4266	4266	4266
query52	115	112	100	100
query53	233	258	189	189
query54	584	601	530	530
query55	83	85	86	85
query56	309	317	299	299
query57	1180	1209	1174	1174
query58	287	307	266	266
query59	2768	2897	2820	2820
query60	354	353	319	319
query61	129	128	132	128
query62	761	753	666	666
query63	220	185	182	182
query64	4129	1099	702	702
query65	4493	4370	4353	4353
query66	1004	427	318	318
query67	15953	15529	15507	15507
query68	8498	885	525	525
query69	510	293	261	261
query70	1206	1142	1142	1142
query71	474	329	304	304
query72	5816	4839	4858	4839
query73	764	689	357	357
query74	9228	8833	8898	8833
query75	3860	3237	2724	2724
query76	3762	1192	776	776
query77	786	382	285	285
query78	9923	10396	9350	9350
query79	1655	818	557	557
query80	676	508	430	430
query81	488	257	227	227
query82	459	126	97	97
query83	273	256	247	247
query84	293	105	85	85
query85	758	416	309	309
query86	343	305	278	278
query87	4464	4525	4504	4504
query88	2875	2209	2222	2209
query89	400	312	287	287
query90	1930	210	217	210
query91	149	139	120	120
query92	80	58	57	57
query93	1045	968	588	588
query94	666	403	314	314
query95	373	299	289	289
query96	484	556	275	275
query97	3151	3273	3137	3137
query98	229	205	199	199
query99	1440	1413	1313	1313
Total cold run time: 279900 ms
Total hot run time: 194891 ms

@doris-robot

ClickBench: Total hot run time: 30.94 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2f5cf08e6286526daebf44fe3b28fad46c6fa346, data reload: false

query1	0.04	0.04	0.03
query2	0.13	0.10	0.11
query3	0.25	0.19	0.20
query4	1.59	0.19	0.19
query5	0.59	0.58	0.57
query6	1.18	0.73	0.73
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.58	0.53	0.52
query10	0.59	0.60	0.57
query11	0.16	0.10	0.11
query12	0.14	0.11	0.11
query13	0.62	0.61	0.60
query14	2.70	2.70	2.79
query15	0.93	0.86	0.85
query16	0.38	0.38	0.39
query17	1.04	1.03	1.04
query18	0.22	0.20	0.20
query19	1.88	1.93	1.91
query20	0.01	0.01	0.01
query21	15.35	0.92	0.54
query22	0.74	1.20	0.65
query23	14.96	1.36	0.67
query24	7.32	1.42	0.44
query25	0.48	0.26	0.09
query26	0.66	0.17	0.14
query27	0.06	0.05	0.05
query28	9.32	0.87	0.43
query29	12.57	3.96	3.31
query30	0.25	0.09	0.07
query31	2.83	0.59	0.38
query32	3.25	0.54	0.47
query33	3.05	3.05	3.04
query34	15.78	5.17	4.50
query35	4.56	4.54	4.53
query36	0.67	0.51	0.49
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.14	0.13
query41	0.09	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.43 s
Total hot run time: 30.94 s

@morningman (Contributor) left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Apr 10, 2025
PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@kaka11chen (Contributor) left a comment

LGTM

@morningman merged commit 105b658 into apache:master on Apr 10, 2025
31 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 10, 2025
### What problem does this PR solve?
Support skip.header.line.count from hive text table
@morningman added the "usercase" label (Important user case type label) on Apr 24, 2025
dataroaring pushed a commit that referenced this pull request Apr 25, 2025
…9975)

Cherry-picked from #49929

Co-authored-by: Socrates <suyiteng@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
### What problem does this PR solve?
Support skip.header.line.count from hive text table
Labels: approved (Indicates a PR has been approved by one committer.), dev/3.0.6-merged, reviewed, usercase (Important user case type label)

7 participants