[Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names and Use table_sink_partition_write_max_partition_nums_per_writer. #33245

Merged
merged 1 commit into from
Apr 7, 2024

@kaka11chen (Contributor) commented Apr 3, 2024

Proposed changes

Issue Number: #31442

  • Add hive-writer runtime profiles.
  • Change output file names to ${query_id}_${uuid}-${index}.${compression}.${format}, e.g. "d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet". Within a single partition writer, once the current file exceeds hive_sink_max_file_size, that file is closed and a new file is opened; the new file name has ${index} incremented and is otherwise unchanged.
  • Use table_sink_partition_write_max_partition_nums_per_writer to limit the maximum number of open partitions in the Hive sink.
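A minimal sketch of the naming and rolling rule described in the list, assuming a hypothetical RollingFileNamer helper (this is not the actual VHiveTableWriter code). It builds ${query_id}_${uuid}-${index}.${compression}.${format} and advances ${index} when the written length is strictly greater than the limit, following the written_len() > config::hive_sink_max_file_size check visible in the write() path (default 1073741824 bytes, i.e. 1 GB).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not the real VHiveTableWriter code) that builds
// ${query_id}_${uuid}-${index}.${compression}.${format} for one partition writer.
struct RollingFileNamer {
    std::string query_id;     // e.g. "d8735c6fa444a6d-acd392981e510c2b"
    std::string uuid;         // generated once per partition writer
    std::string compression;  // e.g. "snappy"
    std::string format;       // e.g. "parquet"
    int index = 0;            // bumped every time the writer rolls to a new file

    std::string current_name() const {
        return query_id + "_" + uuid + "-" + std::to_string(index) + "." + compression + "." +
               format;
    }

    // Mirrors the `written_len() > hive_sink_max_file_size` check: once the current
    // file is strictly larger than the limit, switch to the next index.
    // Returns true if a roll happened; the caller is expected to close the old file.
    bool maybe_roll(int64_t written_len, int64_t max_file_size) {
        assert(max_file_size > 0);
        if (written_len > max_file_size) {
            ++index;
            return true;
        }
        return false;
    }
};
```

Only ${index} changes between files of the same partition writer, so all files produced by one writer share the query id and UUID prefix.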

Further comments

If this is a relatively large or complex change, kick off a discussion at dev@doris.apache.org explaining why you chose this solution and what alternatives you considered.

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the documentation has been moved to doris-website.
See the Doris documentation.

@kaka11chen
Contributor Author

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -68,114 +81,154 @@ Status VHiveTableWriter::open(RuntimeState* state, RuntimeProfile* profile) {
}

Status VHiveTableWriter::write(vectorized::Block& block) {

warning: function 'write' has cognitive complexity of 79 (threshold 50) [readability-function-cognitive-complexity]

Status VHiveTableWriter::write(vectorized::Block& block) {
                         ^
Additional context

be/src/vec/sink/writer/vhive_table_writer.cpp:88: +1, including nesting penalty of 0, nesting level increased to 1

    if (_partition_columns_input_index.empty()) {
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:93: +2, including nesting penalty of 1, nesting level increased to 2

            if (writer_iter == _partitions_to_writers.end()) {
            ^

be/src/vec/sink/writer/vhive_table_writer.cpp:96: +3, including nesting penalty of 2, nesting level increased to 3

                } catch (doris::Exception& e) {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:100: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(writer->open(_state, _profile));
                ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:100: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(writer->open(_state, _profile));
                ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:102: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:103: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_iter->second->written_len() > config::hive_sink_max_file_size) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:114: +4, including nesting penalty of 3, nesting level increased to 4

                    } catch (doris::Exception& e) {
                      ^

be/src/vec/sink/writer/vhive_table_writer.cpp:118: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:118: +5, including nesting penalty of 4, nesting level increased to 5

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:119: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:125: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(writer->write(block));
        ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:125: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(writer->write(block));
        ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:131: +1, including nesting penalty of 0, nesting level increased to 1

        for (int i = 0; i < block.rows(); ++i) {
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:135: +2, including nesting penalty of 1, nesting level increased to 2

            } catch (doris::Exception& e) {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:142: nesting level increased to 2

                    [&](const std::string& partition_name, int position,
                    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:148: +3, including nesting penalty of 2, nesting level increased to 3

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:148: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:154: +3, including nesting penalty of 2, nesting level increased to 3

                } catch (doris::Exception& e) {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:161: +2, including nesting penalty of 1, nesting level increased to 2

            if (writer_iter == _partitions_to_writers.end()) {
            ^

be/src/vec/sink/writer/vhive_table_writer.cpp:163: +3, including nesting penalty of 2, nesting level increased to 3

                if (_partitions_to_writers.size() + 1 > config::hive_sink_max_open_partitions) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:167: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(create_and_open_writer(partition_name, i, nullptr, 0, writer));
                ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:167: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(create_and_open_writer(partition_name, i, nullptr, 0, writer));
                ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:168: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:170: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_iter->second->written_len() > config::hive_sink_max_file_size) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:179: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(create_and_open_writer(partition_name, i, &file_name,
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:179: +5, including nesting penalty of 4, nesting level increased to 5

                    RETURN_IF_ERROR(create_and_open_writer(partition_name, i, &file_name,
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:181: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:185: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_pos_iter == writer_positions.end()) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:189: +1, nesting level increased to 3

                } else {
                  ^
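Most of the score comes from the repeated "look up the writer, open it if missing, roll it if it is too large" branches, plus the do/if pair that every RETURN_IF_ERROR expansion adds. One possible way to lower it is to move that decision into a single helper used by both the partitioned and non-partitioned paths. This is a hypothetical sketch: Status, SKETCH_RETURN_IF_ERROR, PartitionWriter and WriterRegistry are simplified stand-ins, not Doris classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a status type; an empty message means OK.
struct Status {
    std::string msg;
    bool ok() const { return msg.empty(); }
    static Status OK() { return {}; }
};

// Early-return macro in the spirit of RETURN_IF_ERROR: each use expands to a
// `do` loop plus an `if`, which clang-tidy counts as extra nesting.
#define SKETCH_RETURN_IF_ERROR(expr) \
    do {                             \
        Status _st = (expr);         \
        if (!_st.ok()) return _st;   \
    } while (false)

struct PartitionWriter {
    std::string partition;
    int file_index = 0;
    int64_t written = 0;
    int64_t written_len() const { return written; }
    Status close() { return Status::OK(); }
};

class WriterRegistry {
public:
    explicit WriterRegistry(int64_t max_file_size) : _max_file_size(max_file_size) {}

    // Single owner of the "create, or roll when too large" decision, so write()
    // no longer repeats it for every branch.
    Status get_or_roll_writer(const std::string& partition, PartitionWriter** out) {
        assert(out != nullptr);
        auto it = _writers.find(partition);
        if (it == _writers.end()) {
            return _open(partition, 0, out);
        }
        if (it->second->written_len() > _max_file_size) {
            int next_index = it->second->file_index + 1;
            SKETCH_RETURN_IF_ERROR(it->second->close());
            _writers.erase(it);
            return _open(partition, next_index, out);
        }
        *out = it->second.get();
        return Status::OK();
    }

private:
    Status _open(const std::string& partition, int index, PartitionWriter** out) {
        auto writer = std::make_unique<PartitionWriter>();
        writer->partition = partition;
        writer->file_index = index;
        *out = writer.get();
        _writers[partition] = std::move(writer);
        return Status::OK();
    }

    int64_t _max_file_size;
    std::map<std::string, std::unique_ptr<PartitionWriter>> _writers;
};
```

With this shape, each branch of write() reduces to one helper call followed by a single write, which removes most of the nested if/else and catch blocks that clang-tidy reports.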

@@ -19,6 +19,7 @@

#include <gen_cpp/DataSinks_types.h>

warning: 'gen_cpp/DataSinks_types.h' file not found [clang-diagnostic-error]

#include <gen_cpp/DataSinks_types.h>
         ^

@doris-robot

TPC-H: Total hot run time: 39338 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c8508dcc447ff941fbb827d1d66708525e2241ce, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	18067	4232	4266	4232
q2	2393	199	189	189
q3	11183	1240	1350	1240
q4	10207	877	1005	877
q5	7482	3030	2971	2971
q6	223	133	132	132
q7	1115	638	617	617
q8	10132	2045	2060	2045
q9	6706	6234	6244	6234
q10	8444	3530	3509	3509
q11	413	236	232	232
q12	380	211	200	200
q13	17762	2925	2923	2923
q14	274	244	247	244
q15	526	491	493	491
q16	479	393	384	384
q17	975	948	936	936
q18	7286	6572	6571	6571
q19	1617	1557	1569	1557
q20	577	320	312	312
q21	3578	3162	3151	3151
q22	368	291	310	291
Total cold run time: 110187 ms
Total hot run time: 39338 ms
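The Round 1 totals can be reproduced from the rows: each row lists the cold run followed by two hot runs, the last column equals the faster of the two hot runs, "Total cold run time" is the sum of the cold column, and "Total hot run time" is the sum of the last column. A small sketch of that arithmetic, with hypothetical names QueryTiming and summarize:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One report row: cold run, then two hot runs, all in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
    // The last column of each row matches the faster of the two hot runs.
    int64_t best_hot_ms() const { return std::min(hot1_ms, hot2_ms); }
};

struct Totals {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Sums the cold column and the per-query best hot run.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t;
    for (const auto& r : rows) {
        assert(r.cold_ms >= 0 && r.hot1_ms >= 0 && r.hot2_ms >= 0);
        t.cold_ms += r.cold_ms;
        t.hot_ms += r.best_hot_ms();
    }
    return t;
}
```

Feeding it the 22 Round 1 rows gives 110187 ms cold and 39338 ms hot, matching the reported totals.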

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4097	4091	4060	4060
q2	328	217	223	217
q3	2974	2979	2972	2972
q4	1868	1879	1854	1854
q5	5337	5266	5256	5256
q6	209	125	124	124
q7	2236	1789	1815	1789
q8	3253	3322	3320	3320
q9	8581	8546	8511	8511
q10	3804	3830	3836	3830
q11	560	454	443	443
q12	717	538	603	538
q13	13688	2886	2932	2886
q14	294	255	269	255
q15	515	475	466	466
q16	453	406	401	401
q17	1763	1696	1682	1682
q18	7687	7444	7348	7348
q19	1656	1669	1657	1657
q20	1964	1724	1737	1724
q21	4941	4874	4789	4789
q22	523	419	454	419
Total cold run time: 67448 ms
Total hot run time: 54541 ms

@doris-robot

TPC-DS: Total hot run time: 181257 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c8508dcc447ff941fbb827d1d66708525e2241ce, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1221	1123	1115	1115
query2	6488	1791	1728	1728
query3	6668	214	220	214
query4	25430	21407	21532	21407
query5	4225	401	404	401
query6	273	180	187	180
query7	4628	307	301	301
query8	236	187	176	176
query9	8730	2209	2199	2199
query10	588	239	268	239
query11	14921	14525	14530	14525
query12	136	89	87	87
query13	1628	383	376	376
query14	8705	6761	6847	6761
query15	208	179	177	177
query16	7144	269	267	267
query17	1014	602	574	574
query18	1898	287	284	284
query19	207	155	156	155
query20	95	88	88	88
query21	209	126	131	126
query22	5007	4818	4834	4818
query23	33826	33165	32793	32793
query24	11136	3103	3057	3057
query25	689	387	392	387
query26	1888	158	157	157
query27	3081	325	332	325
query28	6795	1831	1810	1810
query29	1362	596	588	588
query30	296	175	166	166
query31	957	720	730	720
query32	99	55	57	55
query33	654	254	250	250
query34	957	479	484	479
query35	845	699	701	699
query36	1005	855	855	855
query37	288	74	72	72
query38	3519	3398	3403	3398
query39	1617	1552	1527	1527
query40	292	133	129	129
query41	49	46	48	46
query42	110	107	100	100
query43	444	402	411	402
query44	1162	720	719	719
query45	281	261	269	261
query46	1071	807	763	763
query47	1889	1780	1789	1780
query48	373	320	304	304
query49	1148	370	375	370
query50	803	387	393	387
query51	6834	6761	6826	6761
query52	115	94	94	94
query53	358	291	285	285
query54	287	229	235	229
query55	82	77	72	72
query56	246	227	229	227
query57	1245	1120	1131	1120
query58	248	207	209	207
query59	2596	2468	2246	2246
query60	264	244	252	244
query61	113	110	109	109
query62	719	439	451	439
query63	308	282	280	280
query64	6471	3125	3256	3125
query65	3074	3016	3046	3016
query66	1463	328	312	312
query67	15530	15225	14962	14962
query68	8939	576	581	576
query69	563	303	304	303
query70	1401	1113	1113	1113
query71	495	277	271	271
query72	6285	2568	2439	2439
query73	1424	329	328	328
query74	6658	6270	6460	6270
query75	3676	2277	2284	2277
query76	5683	1198	1238	1198
query77	671	247	249	247
query78	10855	10253	10241	10241
query79	11416	524	531	524
query80	1586	426	427	426
query81	481	239	230	230
query82	395	96	95	95
query83	226	168	164	164
query84	272	90	86	86
query85	1057	281	267	267
query86	378	273	287	273
query87	3680	3497	3496	3496
query88	4435	2372	2372	2372
query89	555	364	362	362
query90	1995	186	182	182
query91	133	107	101	101
query92	64	48	52	48
query93	6463	534	525	525
query94	1306	188	182	182
query95	443	319	331	319
query96	608	271	278	271
query97	2656	2518	2503	2503
query98	226	223	210	210
query99	1223	833	838	833
Total cold run time: 302496 ms
Total hot run time: 181257 ms

@doris-robot

ClickBench: Total hot run time: 29.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c8508dcc447ff941fbb827d1d66708525e2241ce, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.24	0.05	0.04
query4	1.66	0.08	0.07
query5	0.48	0.47	0.49
query6	1.16	0.65	0.64
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.53	0.52
query10	0.56	0.57	0.58
query11	0.16	0.11	0.11
query12	0.15	0.11	0.12
query13	0.61	0.59	0.59
query14	0.78	0.78	0.81
query15	0.86	0.83	0.85
query16	0.35	0.35	0.36
query17	0.99	1.00	0.97
query18	0.25	0.26	0.27
query19	1.84	1.73	1.70
query20	0.02	0.01	0.01
query21	15.43	0.66	0.65
query22	3.92	8.10	1.32
query23	17.76	1.37	1.28
query24	1.52	0.20	0.19
query25	0.15	0.07	0.08
query26	0.28	0.17	0.17
query27	0.08	0.08	0.08
query28	13.89	0.96	0.95
query29	12.52	3.32	3.31
query30	0.26	0.06	0.05
query31	2.87	0.40	0.39
query32	3.26	0.50	0.48
query33	2.90	2.88	2.90
query34	15.51	4.35	4.35
query35	4.40	4.38	4.37
query36	0.68	0.48	0.48
query37	0.19	0.16	0.16
query38	0.15	0.15	0.14
query39	0.04	0.03	0.04
query40	0.19	0.14	0.16
query41	0.10	0.04	0.04
query42	0.05	0.06	0.05
query43	0.04	0.04	0.04
Total cold run time: 107.03 s
Total hot run time: 29.76 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.63% (8881/24924)
Line Coverage: 27.37% (72917/266460)
Region Coverage: 26.54% (37699/142072)
Branch Coverage: 23.35% (19212/82284)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c8508dcc447ff941fbb827d1d66708525e2241ce_c8508dcc447ff941fbb827d1d66708525e2241ce/report/index.html

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit c8508dcc447ff941fbb827d1d66708525e2241ce with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       15.7 seconds inserted 10000000 Rows, about 636K ops/s
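The throughput figures are consistent with binary megabytes (MiB, 2^20 bytes) truncated to an integer: 2358488459 bytes in 19 s is about 118.4 MiB/s (about 124 MB/s in decimal units), and the Parquet figure of 25 only matches truncation, not rounding. This is inferred from the numbers rather than from the load-test script; mib_per_sec and kilo_rows_per_sec are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes per second expressed in MiB (2^20 bytes), truncated toward zero.
int64_t mib_per_sec(int64_t bytes, double seconds) {
    assert(seconds > 0);
    return static_cast<int64_t>(
            std::floor(static_cast<double>(bytes) / seconds / (1024.0 * 1024.0)));
}

// Rows per second in thousands (the "K ops/s" figure), truncated toward zero.
int64_t kilo_rows_per_sec(int64_t rows, double seconds) {
    assert(seconds > 0);
    return static_cast<int64_t>(std::floor(static_cast<double>(rows) / seconds / 1000.0));
}
```

Applied to the four lines of the report, these give 118, 18, 25 and 636, matching the printed values.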

@kaka11chen
Contributor Author

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -68,114 +81,153 @@ Status VHiveTableWriter::open(RuntimeState* state, RuntimeProfile* profile) {
}

Status VHiveTableWriter::write(vectorized::Block& block) {

warning: function 'write' has cognitive complexity of 79 (threshold 50) [readability-function-cognitive-complexity]

Status VHiveTableWriter::write(vectorized::Block& block) {
                         ^

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.63% (8880/24924)
Line Coverage: 27.36% (72893/266470)
Region Coverage: 26.53% (37695/142080)
Branch Coverage: 23.35% (19214/82292)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d46dd700f0515891b91333f094c3c16ba8e8180d_d46dd700f0515891b91333f094c3c16ba8e8180d/report/index.html

@@ -1194,6 +1194,7 @@ DEFINE_mInt32(table_sink_partition_write_max_partition_nums_per_writer, "128");

/** Hive sink configurations **/
DEFINE_mInt64(hive_sink_max_file_size, "1073741824"); // 1GB
DEFINE_mInt32(hive_sink_max_open_partitions, "10000");
Contributor

I think 10000 is too many? How about 2000?

Contributor Author

will use table_sink_partition_write_max_partition_nums_per_writer
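A minimal sketch of a per-writer partition cap, assuming the check keeps the size() + 1 > limit shape seen in the clang-tidy context and that exceeding it is reported to the caller as an error. PartitionLimiter is a hypothetical stand-in; in the PR the limit comes from table_sink_partition_write_max_partition_nums_per_writer, whose default in the config hunk is "128".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-in for the partition -> writer map kept by one sink writer.
class PartitionLimiter {
public:
    explicit PartitionLimiter(int32_t max_partitions_per_writer)
            : _max_partitions(max_partitions_per_writer) {
        assert(_max_partitions > 0);
    }

    // Returns true if rows for `partition` can be written. A partition that is
    // already open is always accepted; a new one is refused when
    // size() + 1 > limit (the caller would then return an error).
    bool try_open(const std::string& partition) {
        if (_open_partitions.count(partition) > 0) {
            return true;
        }
        if (_open_partitions.size() + 1 > static_cast<std::size_t>(_max_partitions)) {
            return false;
        }
        _open_partitions.insert(partition);
        return true;
    }

    std::size_t open_count() const { return _open_partitions.size(); }

private:
    int32_t _max_partitions;
    std::set<std::string> _open_partitions;
};
```

Reusing the existing table-sink setting keeps one knob for all partitioned table sinks instead of introducing a Hive-specific one.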

@kaka11chen kaka11chen marked this pull request as ready for review April 7, 2024 03:04

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -68,114 +81,155 @@ Status VHiveTableWriter::open(RuntimeState* state, RuntimeProfile* profile) {
}

Status VHiveTableWriter::write(vectorized::Block& block) {

warning: function 'write' has cognitive complexity of 79 (threshold 50) [readability-function-cognitive-complexity]

Status VHiveTableWriter::write(vectorized::Block& block) {
                         ^
Additional context

be/src/vec/sink/writer/vhive_table_writer.cpp:88: +1, including nesting penalty of 0, nesting level increased to 1

    if (_partition_columns_input_index.empty()) {
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:93: +2, including nesting penalty of 1, nesting level increased to 2

            if (writer_iter == _partitions_to_writers.end()) {
            ^

be/src/vec/sink/writer/vhive_table_writer.cpp:96: +3, including nesting penalty of 2, nesting level increased to 3

                } catch (doris::Exception& e) {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:100: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(writer->open(_state, _profile));
                ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:100: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(writer->open(_state, _profile));
                ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:101: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:102: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_iter->second->written_len() > config::hive_sink_max_file_size) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:113: +4, including nesting penalty of 3, nesting level increased to 4

                    } catch (doris::Exception& e) {
                      ^

be/src/vec/sink/writer/vhive_table_writer.cpp:117: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:117: +5, including nesting penalty of 4, nesting level increased to 5

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:118: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:124: +2, including nesting penalty of 1, nesting level increased to 2

        RETURN_IF_ERROR(writer->write(block));
        ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:124: +3, including nesting penalty of 2, nesting level increased to 3

        RETURN_IF_ERROR(writer->write(block));
        ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:130: +1, including nesting penalty of 0, nesting level increased to 1

        for (int i = 0; i < block.rows(); ++i) {
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:134: +2, including nesting penalty of 1, nesting level increased to 2

            } catch (doris::Exception& e) {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:141: nesting level increased to 2

                    [&](const std::string& partition_name, int position,
                    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:147: +3, including nesting penalty of 2, nesting level increased to 3

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:147: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(writer->open(_state, _profile));
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:153: +3, including nesting penalty of 2, nesting level increased to 3

                } catch (doris::Exception& e) {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:160: +2, including nesting penalty of 1, nesting level increased to 2

            if (writer_iter == _partitions_to_writers.end()) {
            ^

be/src/vec/sink/writer/vhive_table_writer.cpp:162: +3, including nesting penalty of 2, nesting level increased to 3

                if (_partitions_to_writers.size() + 1 >
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:168: +3, including nesting penalty of 2, nesting level increased to 3

                RETURN_IF_ERROR(create_and_open_writer(partition_name, i, nullptr, 0, writer));
                ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:168: +4, including nesting penalty of 3, nesting level increased to 4

                RETURN_IF_ERROR(create_and_open_writer(partition_name, i, nullptr, 0, writer));
                ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:169: +1, nesting level increased to 2

            } else {
              ^

be/src/vec/sink/writer/vhive_table_writer.cpp:171: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_iter->second->written_len() > config::hive_sink_max_file_size) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:180: +4, including nesting penalty of 3, nesting level increased to 4

                    RETURN_IF_ERROR(create_and_open_writer(partition_name, i, &file_name,
                    ^

be/src/common/status.h:541: expanded from macro 'RETURN_IF_ERROR'

    do {                                \
    ^

be/src/vec/sink/writer/vhive_table_writer.cpp:180: +5, including nesting penalty of 4, nesting level increased to 5

                    RETURN_IF_ERROR(create_and_open_writer(partition_name, i, &file_name,
                    ^

be/src/common/status.h:543: expanded from macro 'RETURN_IF_ERROR'

        if (UNLIKELY(!_status_.ok())) { \
        ^

be/src/vec/sink/writer/vhive_table_writer.cpp:182: +1, nesting level increased to 3

                } else {
                  ^

be/src/vec/sink/writer/vhive_table_writer.cpp:186: +3, including nesting penalty of 2, nesting level increased to 3

                if (writer_pos_iter == writer_positions.end()) {
                ^

be/src/vec/sink/writer/vhive_table_writer.cpp:190: +1, nesting level increased to 3

                } else {
                  ^
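The branches flagged above implement per-partition writer dispatch. For each row's partition, the writer is looked up. If there is none, a new one is created, subject to `table_sink_partition_write_max_partition_nums_per_writer`. If the current file's `written_len()` exceeds `hive_sink_max_file_size`, the writer rolls over to a new file. Below is a minimal, self-contained sketch of that control flow. It is not the actual `VHiveTableWriter` code, and it assumes simplified hypothetical types (`PartitionWriter`, `PartitionDispatcher`). The real code propagates `Status` through `RETURN_IF_ERROR`, while this sketch throws for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a per-partition file writer (not a Doris class).
struct PartitionWriter {
    std::string partition;
    int file_index = 0;        // the ${index} part of the output file name
    std::int64_t written = 0;  // bytes written to the current file
    std::int64_t written_len() const { return written; }
    void write(std::int64_t bytes) { written += bytes; }
};

// Hypothetical dispatcher; the two limits play the roles of
// table_sink_partition_write_max_partition_nums_per_writer and hive_sink_max_file_size.
class PartitionDispatcher {
public:
    PartitionDispatcher(std::size_t max_open_partitions, std::int64_t max_file_size)
            : _max_open_partitions(max_open_partitions), _max_file_size(max_file_size) {}

    // Returns the writer for `partition`, creating it or rolling it over as needed.
    PartitionWriter& writer_for(const std::string& partition) {
        auto it = _writers.find(partition);
        if (it == _writers.end()) {
            // New partition: refuse to exceed the open-partition limit.
            if (_writers.size() + 1 > _max_open_partitions) {
                throw std::runtime_error("too many open partitions");
            }
            auto inserted = _writers.emplace(partition, make_writer(partition, 0));
            return *inserted.first->second;
        }
        if (it->second->written_len() > _max_file_size) {
            // Closing the old file is omitted; the replacement writer gets index + 1.
            it->second = make_writer(partition, it->second->file_index + 1);
        }
        return *it->second;
    }

    std::size_t open_partitions() const { return _writers.size(); }

private:
    static std::unique_ptr<PartitionWriter> make_writer(const std::string& p, int index) {
        auto w = std::make_unique<PartitionWriter>();
        w->partition = p;
        w->file_index = index;
        return w;
    }

    std::size_t _max_open_partitions;
    std::int64_t _max_file_size;
    std::map<std::string, std::unique_ptr<PartitionWriter>> _writers;
};
```

Early returns keep the branches one level shallower than nested `if`/`else`, which is what the nesting penalties in the cognitive-complexity report above measure.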

@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen changed the title [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names and add hive_sink_max_open_partitions . [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names and Use table_sink_partition_write_max_partition_nums_per_writer. . Apr 7, 2024
@kaka11chen kaka11chen changed the title [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names and Use table_sink_partition_write_max_partition_nums_per_writer. . [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names and Use table_sink_partition_write_max_partition_nums_per_writer. Apr 7, 2024
Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 7, 2024
Contributor

github-actions bot commented Apr 7, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Apr 7, 2024

PR approved by anyone and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.64% (8886/24934)
Line Coverage: 27.38% (72973/266546)
Region Coverage: 26.55% (37719/142092)
Branch Coverage: 23.36% (19228/82312)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0237ca7e7156f0f84fcdad2e5d2c07cae48f8cd5_0237ca7e7156f0f84fcdad2e5d2c07cae48f8cd5/report/index.html

@morningman morningman merged commit 77e28b5 into apache:master Apr 7, 2024
29 of 34 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Apr 10, 2024
… change output file names (apache#33245)

Issue Number: apache#31442 

- Add hive-writer runtime profiles.
- Change output file names to `${query_id}_${uuid}-${index}.${compression}.${format}`, e.g. `"d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet"`. For the same partition writer, when the file size exceeds `hive_sink_max_file_size`, the file currently being written is closed and a new file is created; `${index}` in the new file name is incremented while the rest of the name stays the same.
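The naming scheme above can be sketched as follows. This is a hypothetical helper (`HiveFileNamer` is not a Doris class). It assumes the query id is already rendered as the `hi-lo` hex string seen in the example and that the caller supplies the UUID. Going by the example name, the query id and UUID are joined by `_`. One namer per partition writer means only `${index}` changes when a file rolls over.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: produces "<query_id>_<uuid>-<index>.<compression>.<format>".
// Everything except <index> is fixed for the lifetime of one partition writer.
class HiveFileNamer {
public:
    HiveFileNamer(std::string query_id, std::string uuid, std::string compression,
                  std::string format)
            : _prefix(std::move(query_id) + "_" + std::move(uuid)),
              _suffix("." + std::move(compression) + "." + std::move(format)) {}

    // Returns the name for the next file and advances the index (0, 1, 2, ...).
    std::string next() { return _prefix + "-" + std::to_string(_index++) + _suffix; }

private:
    std::string _prefix;  // "<query_id>_<uuid>"
    std::string _suffix;  // ".<compression>.<format>"
    int _index = 0;
};
```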
morningman pushed a commit that referenced this pull request Apr 12, 2024
… change output file names (#33245)

morningman added a commit that referenced this pull request Apr 13, 2024
yiguolei pushed a commit that referenced this pull request Apr 17, 2024
Labels: approved, dev/2.1.3-merged, reviewed

5 participants