@mrhhsg
Member

@mrhhsg mrhhsg commented Mar 13, 2024

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@mrhhsg
Member Author

mrhhsg commented Mar 13, 2024

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.01% (8576/24499)
Line Coverage: 26.76% (69483/259689)
Region Coverage: 26.03% (36086/138653)
Branch Coverage: 22.99% (18428/80174)
Coverage Report: http://coverage.selectdb-in.cc/coverage/43c12d96ec343a08abc6b723bf1ca3eb2718989f_43c12d96ec343a08abc6b723bf1ca3eb2718989f/report/index.html

@doris-robot

TPC-H: Total hot run time: 38412 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43c12d96ec343a08abc6b723bf1ca3eb2718989f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17672	4457	4090	4090
q2	2027	156	143	143
q3	10600	1127	907	907
q4	7769	739	740	739
q5	7490	2802	2743	2743
q6	186	125	123	123
q7	1181	824	808	808
q8	9328	2007	2020	2007
q9	7200	6428	6460	6428
q10	8494	3490	3604	3490
q11	431	223	216	216
q12	694	307	299	299
q13	17879	2866	2861	2861
q14	276	239	258	239
q15	502	465	448	448
q16	500	389	393	389
q17	946	539	564	539
q18	7140	6459	6403	6403
q19	1528	1429	1479	1429
q20	539	280	288	280
q21	6166	3534	3575	3534
q22	369	306	297	297
Total cold run time: 108917 ms
Total hot run time: 38412 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4088	4066	4039	4039
q2	322	222	215	215
q3	2962	2890	2829	2829
q4	1834	1581	1556	1556
q5	5212	5206	5237	5206
q6	193	117	123	117
q7	2213	1873	1864	1864
q8	3147	3305	3286	3286
q9	8578	8540	8520	8520
q10	3682	3705	3672	3672
q11	534	430	427	427
q12	742	548	537	537
q13	16910	2849	2834	2834
q14	284	256	240	240
q15	478	439	459	439
q16	454	404	426	404
q17	1717	1487	1483	1483
q18	7567	7194	7045	7045
q19	1597	1477	1497	1477
q20	1917	1722	1714	1714
q21	4759	4658	4718	4658
q22	518	443	449	443
Total cold run time: 69708 ms
Total hot run time: 53005 ms

@mrhhsg
Member Author

mrhhsg commented Mar 14, 2024

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.00% (8575/24499)
Line Coverage: 26.75% (69475/259689)
Region Coverage: 26.02% (36079/138653)
Branch Coverage: 22.98% (18421/80174)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9bd5fd161d9115fc8a83c692b3b70786af8da8ba_9bd5fd161d9115fc8a83c692b3b70786af8da8ba/report/index.html

@doris-robot

TPC-H: Total hot run time: 38238 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9bd5fd161d9115fc8a83c692b3b70786af8da8ba, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17669	4187	4129	4129
q2	2018	148	141	141
q3	10604	1108	877	877
q4	7782	745	673	673
q5	7474	2597	2648	2597
q6	183	122	120	120
q7	1203	828	781	781
q8	9329	1974	1978	1974
q9	7216	6392	6415	6392
q10	8507	3521	3615	3521
q11	433	221	212	212
q12	779	306	292	292
q13	18118	2863	2880	2863
q14	268	244	257	244
q15	507	452	450	450
q16	503	393	388	388
q17	948	576	558	558
q18	7091	6499	6465	6465
q19	2187	1395	1390	1390
q20	546	283	278	278
q21	6037	3597	3610	3597
q22	358	318	296	296
Total cold run time: 109760 ms
Total hot run time: 38238 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4112	4162	4086	4086
q2	321	223	217	217
q3	2918	2776	2813	2776
q4	1861	1519	1531	1519
q5	5224	5236	5226	5226
q6	191	116	123	116
q7	2208	1807	1845	1807
q8	3131	3269	3270	3269
q9	8537	8528	8507	8507
q10	3660	3653	3690	3653
q11	536	440	439	439
q12	721	558	564	558
q13	16917	2813	2831	2813
q14	268	267	246	246
q15	487	437	452	437
q16	455	418	421	418
q17	1734	1489	1450	1450
q18	7584	7093	7002	7002
q19	1609	1537	1576	1537
q20	1903	1696	1721	1696
q21	4791	4678	4643	4643
q22	541	437	466	437
Total cold run time: 69709 ms
Total hot run time: 52852 ms

@mrhhsg mrhhsg changed the title from "[improement](spill) optimize the spilling logic of hash join operator" to "[improvement](spill) optimize the spilling logic of hash join operator" on Mar 14, 2024
struct PartitionedHashJoinSharedState : public HashJoinSharedState {
    std::vector<std::unique_ptr<vectorized::MutableBlock>> partitioned_build_blocks;
    std::vector<vectorized::SpillStreamSPtr> spilled_streams;
    bool need_to_spill {false};
Contributor

Write this as = false; we should use one initializer style consistently.
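
For context on the style point: C++ accepts both the brace form and the equals form for a default member initializer, and for a bool member they yield the same value. A minimal sketch, assuming a hypothetical SharedStateSketch struct (it is not the real PartitionedHashJoinSharedState, and both member names are made up for illustration):

```cpp
#include <cassert>

// Hypothetical stand-in for the shared state; not the Doris type.
struct SharedStateSketch {
    bool need_to_spill_brace {false};   // brace form, as written in the patch
    bool need_to_spill_equals = false;  // equals form, as the reviewer suggests
};

// Default-constructs the struct and checks that both forms start out false.
void run_initializer_example() {
    SharedStateSketch state;
    assert(!state.need_to_spill_brace);
    assert(!state.need_to_spill_equals);
}
```

Since the two forms behave the same here, the choice is purely about keeping one convention across the codebase.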

auto& block = partitioned_blocks[i];
if (block) {
if (block && block->rows() >= state->batch_size()) {
mem_size += block->allocated_bytes();
Contributor

This method is not correct for bitmap columns; we should consider replacing this logic.
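
To make the condition under review concrete, here is a rough sketch of the accumulation the snippet appears to perform: only partition blocks holding at least batch_size rows contribute to the total. BlockSketch and revocable_bytes are hypothetical names, and allocated_bytes() is modelled as a plain stored number, so the reviewer's concern about bitmap columns is not addressed by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for vectorized::MutableBlock; only the two
// accessors used in the reviewed snippet are modelled.
struct BlockSketch {
    std::size_t row_count = 0;
    std::size_t byte_count = 0;
    std::size_t rows() const { return row_count; }
    std::size_t allocated_bytes() const { return byte_count; }
};

// Sums the bytes of blocks that hold at least batch_size rows.
// Empty slots and smaller blocks are skipped, as in the reviewed condition.
std::size_t revocable_bytes(const std::vector<std::unique_ptr<BlockSketch>>& blocks,
                            std::size_t batch_size) {
    std::size_t mem_size = 0;
    for (const auto& block : blocks) {
        if (block && block->rows() >= batch_size) {
            mem_size += block->allocated_bytes();
        }
    }
    return mem_size;
}

// Builds three slots (empty, small, large) and checks which ones are counted.
void run_revocable_example() {
    std::vector<std::unique_ptr<BlockSketch>> blocks;
    blocks.push_back(nullptr);  // partition with no block yet
    blocks.push_back(std::make_unique<BlockSketch>());
    blocks.back()->row_count = 10;
    blocks.back()->byte_count = 100;
    blocks.push_back(std::make_unique<BlockSketch>());
    blocks.back()->row_count = 5000;
    blocks.back()->byte_count = 4096;
    assert(revocable_bytes(blocks, 4096) == 4096);  // only the large block counts
}
```

If the byte estimate is wrong for some column type, the threshold logic stays the same and only the per-block size function would need to change.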

bool PartitionedHashJoinProbeOperatorX::_should_revoke_memory(RuntimeState* state) const {
    auto& local_state = get_local_state(state);

    if (local_state._shared_state->need_to_spill) {
Contributor

build: 1 2 3 4
partition 1 --> memory

probe --> block ---> partition 4 ---> probe partition 1, left probe partition 2,3,4

the memtable spill stream's buffer

build --> sort
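
One possible reading of the notes above, sketched in C++: the build side is split into four partitions, one of them stays in memory, and each probe block is split the same way so that rows for the resident partition are probed immediately while rows for the other partitions are set aside. This is an interpretation of terse notes, not the logic of this PR. partition_of, ProbeOutcome, and probe_block are hypothetical names, keys are plain integers, and partitions are numbered from 0 rather than 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

constexpr std::size_t kPartitionCount = 4;  // "build: 1 2 3 4" in the notes

// Hypothetical partition function; the real operator's hashing is not shown in this thread.
std::size_t partition_of(std::int64_t key) {
    return static_cast<std::size_t>(static_cast<std::uint64_t>(key) % kPartitionCount);
}

struct ProbeOutcome {
    std::vector<std::int64_t> matched;                // probe keys matched right away
    std::vector<std::vector<std::int64_t>> deferred;  // per-partition keys waiting for a spilled partition
};

// Probes one block of keys against the single build partition kept in memory.
ProbeOutcome probe_block(const std::vector<std::int64_t>& build_keys,
                         const std::vector<std::int64_t>& probe_keys,
                         std::size_t resident_partition) {
    // Only the resident partition gets a hash table; other build rows are assumed spilled.
    std::unordered_set<std::int64_t> resident;
    for (std::int64_t key : build_keys) {
        if (partition_of(key) == resident_partition) {
            resident.insert(key);
        }
    }

    ProbeOutcome out;
    out.deferred.resize(kPartitionCount);
    for (std::int64_t key : probe_keys) {
        const std::size_t p = partition_of(key);
        if (p != resident_partition) {
            out.deferred[p].push_back(key);  // probe later, once that partition is loaded
        } else if (resident.count(key) > 0) {
            out.matched.push_back(key);
        }
    }
    return out;
}

// Keys 0 and 4 are resident; 4 matches, 8 misses, and 5, 9, 2 are deferred.
void run_probe_example() {
    const ProbeOutcome out = probe_block({0, 1, 2, 3, 4, 5}, {4, 5, 8, 9, 2}, 0);
    assert(out.matched == std::vector<std::int64_t>{4});
    assert((out.deferred[1] == std::vector<std::int64_t>{5, 9}));
    assert(out.deferred[2] == std::vector<std::int64_t>{2});
}
```

The deferred buffers are themselves a memory consumer, which may be what the note about the spill stream's buffer refers to.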

@mrhhsg
Member Author

mrhhsg commented Mar 15, 2024

run buidall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions github-actions bot added the approved label Mar 19, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yiguolei
Contributor

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.31% (8728/24716)
Line Coverage: 27.12% (71387/263243)
Region Coverage: 26.38% (37049/140462)
Branch Coverage: 23.27% (18935/81378)
Coverage Report: http://coverage.selectdb-in.cc/coverage/120df9a7086b16441f2d7cf09529867ff48358f4_120df9a7086b16441f2d7cf09529867ff48358f4/report/index.html

Labels

approved Indicates a PR has been approved by one committer. reviewed

4 participants