[Fix](Outfile) Fix the column type mapping in the orc/parquet file format #32281

Merged: 7 commits merged into apache:master on Mar 21, 2024

@BePPPower BePPPower commented Mar 15, 2024

Proposed changes

Issue Number: close #xxx

| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (Logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
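
For readers unfamiliar with the Parquet encodings named in the table, here is a minimal C++ sketch of what an int32 date value and an int96 timestamp value look like. It is not taken from this PR. The function names `days_from_civil` and `encode_int96` are hypothetical, the int96 layout is assumed to be the conventional one (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian), and time zone handling is ignored entirely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: Julian day number of 1970-01-01, used by the conventional
// int96 timestamp layout.
constexpr int32_t kJulianDayOfUnixEpoch = 2440588;
constexpr int64_t kNanosPerDay = 86400LL * 1000000000LL;

// Hypothetical helper: days from 1970-01-01 to a proleptic Gregorian date.
// This is the value stored in an int32 column with the DATE logical type.
inline int32_t days_from_civil(int y, unsigned m, unsigned d) {
    y -= (m <= 2) ? 1 : 0;  // treat Jan/Feb as months 11/12 of the previous year
    const int era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<int>(doe) - 719468;
}

// Hypothetical helper: pack a timestamp into the 12-byte int96 layout.
// Bytes 0..7 hold nanoseconds within the day, bytes 8..11 hold the Julian
// day number; both are written little-endian regardless of host byte order.
inline std::array<uint8_t, 12> encode_int96(int32_t days_since_epoch,
                                            int64_t nanos_of_day) {
    assert(nanos_of_day >= 0 && nanos_of_day < kNanosPerDay);
    std::array<uint8_t, 12> out{};
    const uint64_t nanos = static_cast<uint64_t>(nanos_of_day);
    const uint32_t julian =
        static_cast<uint32_t>(days_since_epoch + kJulianDayOfUnixEpoch);
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint8_t>(nanos >> (8 * i));
    }
    for (int i = 0; i < 4; ++i) {
        out[8 + i] = static_cast<uint8_t>(julian >> (8 * i));
    }
    return out;
}
```

For example, `days_from_civil(2024, 3, 15)` yields 19797, and `encode_int96(0, 0)` ends with the bytes `8C 3D 25 00`, which is the Julian day number 2440588 of 1970-01-01.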

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38252 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3694999762d0186fc0c1cd20343445f115ae5afd, data reload: false

------ Round 1 ----------------------------------
q1	17654	4314	4167	4167
q2	2021	149	141	141
q3	10608	1086	902	902
q4	7773	745	718	718
q5	7468	2558	2599	2558
q6	182	119	120	119
q7	1221	825	800	800
q8	9325	2032	2037	2032
q9	7288	6434	6439	6434
q10	8551	3527	3606	3527
q11	433	223	217	217
q12	806	298	303	298
q13	18017	2885	2860	2860
q14	283	252	258	252
q15	501	449	443	443
q16	507	391	382	382
q17	949	493	536	493
q18	7173	6398	6526	6398
q19	4505	1475	1405	1405
q20	540	296	271	271
q21	6057	3531	3632	3531
q22	363	305	304	304
Total cold run time: 112225 ms
Total hot run time: 38252 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4168	4110	4128	4110
q2	315	222	220	220
q3	2948	2840	2790	2790
q4	1809	1528	1530	1528
q5	5187	5291	5249	5249
q6	194	114	114	114
q7	2201	1846	1863	1846
q8	3179	3295	3302	3295
q9	8554	8581	8564	8564
q10	3739	3699	3701	3699
q11	540	446	440	440
q12	699	544	561	544
q13	16918	2872	2814	2814
q14	276	257	259	257
q15	484	454	440	440
q16	442	426	417	417
q17	1739	1473	1446	1446
q18	7421	7200	7109	7109
q19	1607	1519	1545	1519
q20	1903	1716	1730	1716
q21	4787	4815	4761	4761
q22	560	467	446	446
Total cold run time: 69670 ms
Total hot run time: 53324 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 34.96% (8576/24534)
Line Coverage: 26.66% (69510/260726)
Region Coverage: 25.96% (36107/139102)
Branch Coverage: 22.90% (18431/80476)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3694999762d0186fc0c1cd20343445f115ae5afd_3694999762d0186fc0c1cd20343445f115ae5afd/report/index.html

@BePPPower
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -17,6 +17,8 @@

#pragma once

#include <cctz/time_zone.h>

warning: 'cctz/time_zone.h' file not found [clang-diagnostic-error]

#include <cctz/time_zone.h>
         ^

@@ -18,6 +18,7 @@
#pragma once

#include <arrow/type.h>

warning: 'arrow/type.h' file not found [clang-diagnostic-error]

#include <arrow/type.h>
         ^

@@ -17,6 +17,8 @@

#pragma once

#include <cctz/time_zone.h>

warning: 'cctz/time_zone.h' file not found [clang-diagnostic-error]

#include <cctz/time_zone.h>
         ^

@doris-robot

TPC-H: Total hot run time: 38306 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c95b67fdbaf50e2ec3452629e20da995cbb9c110, data reload: false

------ Round 1 ----------------------------------
q1	17641	4397	4104	4104
q2	2025	153	147	147
q3	10584	1112	889	889
q4	7080	749	697	697
q5	7469	2695	2659	2659
q6	186	123	122	122
q7	1236	817	808	808
q8	9331	2028	2011	2011
q9	7115	6464	6488	6464
q10	8521	3482	3653	3482
q11	429	230	217	217
q12	652	299	294	294
q13	17830	2845	2841	2841
q14	266	251	252	251
q15	506	456	449	449
q16	496	392	389	389
q17	973	533	532	532
q18	7256	6570	6436	6436
q19	2923	1417	1501	1417
q20	545	296	279	279
q21	6352	3532	3540	3532
q22	354	286	289	286
Total cold run time: 109770 ms
Total hot run time: 38306 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4135	4094	4056	4056
q2	322	221	220	220
q3	2966	2869	2881	2869
q4	1843	1556	1544	1544
q5	5243	5259	5228	5228
q6	206	122	117	117
q7	2226	1871	1855	1855
q8	3156	3290	3288	3288
q9	8535	8548	8565	8548
q10	3696	3681	3709	3681
q11	550	446	443	443
q12	710	570	557	557
q13	16907	2831	2851	2831
q14	297	247	255	247
q15	485	445	448	445
q16	461	409	415	409
q17	1736	1519	1469	1469
q18	7505	7227	7121	7121
q19	1622	1502	1545	1502
q20	1899	1700	1680	1680
q21	4770	4700	4658	4658
q22	523	458	457	457
Total cold run time: 69793 ms
Total hot run time: 53225 ms

@morningman morningman self-assigned this Mar 15, 2024
morningman previously approved these changes Mar 15, 2024
Contributor

@morningman morningman left a comment

LGTM

@morningman
Contributor

The documentation at https://doris.apache.org/docs/data-operate/export/export-manual/ needs to be updated to match.

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Mar 15, 2024
Contributor

PR approved by anyone and no changes requested.

@BePPPower
Contributor Author

run feut

@BePPPower
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Mar 15, 2024
@doris-robot

TPC-H: Total hot run time: 38368 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 42c1a4314df86389346382df48f6183557d0bae8, data reload: false

------ Round 1 ----------------------------------
q1	17622	4222	4107	4107
q2	2027	147	141	141
q3	10608	1069	898	898
q4	7768	778	742	742
q5	7458	2712	2618	2618
q6	186	123	122	122
q7	1185	832	804	804
q8	9383	1980	2047	1980
q9	7071	6484	6496	6484
q10	8490	3500	3640	3500
q11	421	231	214	214
q12	626	300	305	300
q13	17813	2865	2830	2830
q14	278	256	243	243
q15	493	450	455	450
q16	488	392	393	392
q17	961	577	558	558
q18	7301	6556	6410	6410
q19	2622	1477	1516	1477
q20	570	286	285	285
q21	6291	3499	3579	3499
q22	369	314	314	314
Total cold run time: 110031 ms
Total hot run time: 38368 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4111	4089	4063	4063
q2	322	223	223	223
q3	2975	2879	2821	2821
q4	1866	1573	1551	1551
q5	5209	5234	5271	5234
q6	200	124	126	124
q7	2270	1845	1858	1845
q8	3186	3313	3282	3282
q9	8560	8586	8520	8520
q10	3687	3680	3677	3677
q11	547	439	427	427
q12	723	537	535	535
q13	16922	2850	2841	2841
q14	282	249	246	246
q15	487	442	441	441
q16	454	403	406	403
q17	1752	1496	1475	1475
q18	7535	7190	7088	7088
q19	1636	1554	1593	1554
q20	1933	1709	1690	1690
q21	4846	4616	4724	4616
q22	509	477	446	446
Total cold run time: 70012 ms
Total hot run time: 53102 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 34.95% (8575/24534)
Line Coverage: 26.66% (69516/260773)
Region Coverage: 25.94% (36101/139177)
Branch Coverage: 22.88% (18438/80574)
Coverage Report: http://coverage.selectdb-in.cc/coverage/42c1a4314df86389346382df48f6183557d0bae8_42c1a4314df86389346382df48f6183557d0bae8/report/index.html

@BePPPower
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8701/24683)
Line Coverage: 27.08% (71198/262882)
Region Coverage: 26.33% (36929/140244)
Branch Coverage: 23.24% (18887/81254)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e8cd841e2cf13184122eec7a72309c4208c78621_e8cd841e2cf13184122eec7a72309c4208c78621/report/index.html

@morningman
Contributor

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8702/24687)
Line Coverage: 27.08% (71195/262904)
Region Coverage: 26.32% (36923/140259)
Branch Coverage: 23.25% (18890/81262)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fd6dedc748f5eb4ce679a1dcab12e064ecdce923_fd6dedc748f5eb4ce679a1dcab12e064ecdce923/report/index.html

@BePPPower
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.33% (8729/24710)
Line Coverage: 27.14% (71418/263135)
Region Coverage: 26.39% (37052/140405)
Branch Coverage: 23.29% (18945/81358)
Coverage Report: http://coverage.selectdb-in.cc/coverage/95a33b6075ace354737592f73171a1f7da62cfe6_95a33b6075ace354737592f73171a1f7da62cfe6/report/index.html

@BePPPower
Contributor Author

run external

@BePPPower
Contributor Author

run p0

@BePPPower
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8728/24750)
Line Coverage: 27.08% (71457/263840)
Region Coverage: 26.32% (37075/140862)
Branch Coverage: 23.23% (18961/81612)
Coverage Report: http://coverage.selectdb-in.cc/coverage/95a33b6075ace354737592f73171a1f7da62cfe6_95a33b6075ace354737592f73171a1f7da62cfe6/report/index.html

@github-actions github-actions bot added the approved label Mar 21, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@kaka11chen kaka11chen left a comment

LGTM

@morningman morningman merged commit 09e5845 into apache:master Mar 21, 2024
24 of 30 checks passed
yiguolei pushed a commit that referenced this pull request Mar 22, 2024
…rmat (#32281)

| Doris Type | Orc Type                       | Parquet Type          |
|------------|--------------------------------|-----------------------|
| Date       | Long (logical: DATE)           | int32 (Logical: Date) |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                 |
seawinde pushed a commit to seawinde/doris that referenced this pull request Mar 22, 2024
…rmat (apache#32281)

Labels: approved, kind/behavior-changed, reviewed

5 participants