[Enhancement] [Vectorized]Some little optimization in SegmentIterator vectorization #7771

@wangbo

Search before asking

  • I have searched the issues and found no similar issues.

Description

After the SegmentIterator vectorization PR was merged, some TODO items remain for it.
This issue tries to address some of the remaining performance problems.

Solution

Test SQL

SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE LO_ORDERDATE >= 19930101 AND LO_ORDERDATE <= 19931231
  AND LO_DISCOUNT BETWEEN 1 AND 3
  AND LO_QUANTITY < 25;

Initial performance test:

code version: SegmentIterator row version
- BlockLoadTime: 3s687ms
- VectorPredEvalTime: 778.640ms
- BlockSeekCount: 5.36M


code version: SegmentIterator vectorization
- BlockLoadTime: 4s140ms
- VectorPredEvalTime: 256.926ms
- BlockSeekCount: 5.36M


Analysis
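1. After SegmentIterator is vectorized, overall performance drops: BlockLoadTime rises from 3s687ms to 4s140ms (about 12% slower).
2. Predicate evaluation is indeed faster (VectorPredEvalTime falls from 778.640ms to 256.926ms, about 3x), but its effect on the overall time is small.
3. BlockSeekCount (5.36M) is too large and can be optimized.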

Optimization 1: remove the BlockSeekTime timer

  • BlockLoadTime: 3s512ms
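
A plausible reason the timer matters is that it sits on a path executed once per seek, and the profile above reports 5.36M seeks. The sketch below is not Doris code: `Counter`, `ScopedTimer`, `cheap_seek`, and both loop functions are hypothetical names used only to show how a scoped timer adds two clock reads to every iteration of an otherwise cheap operation.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profile counter such as BlockSeekTime.
struct Counter {
    int64_t nanos = 0;
};

// RAII timer: reads the clock on construction and destruction and adds the
// elapsed time to the counter. That is two clock reads per timed scope.
class ScopedTimer {
public:
    explicit ScopedTimer(Counter* counter)
            : _counter(counter), _start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        _counter->nanos +=
                std::chrono::duration_cast<std::chrono::nanoseconds>(end - _start).count();
    }

private:
    Counter* _counter;
    std::chrono::steady_clock::time_point _start;
};

// Hypothetical cheap seek: just a lookup in an offset table.
inline uint64_t cheap_seek(const std::vector<uint64_t>& offsets, size_t i) {
    return offsets[i % offsets.size()];
}

// Timed variant: the timer cost is paid on every single seek.
uint64_t seek_all_timed(const std::vector<uint64_t>& offsets, size_t n, Counter* counter) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        ScopedTimer timer(counter);
        sum += cheap_seek(offsets, i);
    }
    return sum;
}

// Untimed variant: same work, no clock reads in the loop.
uint64_t seek_all_untimed(const std::vector<uint64_t>& offsets, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += cheap_seek(offsets, i);
    }
    return sum;
}
```

Both functions return the same result; running each with `n` in the millions and comparing wall-clock time gives a rough feel for the per-call overhead on a given machine.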

Optimization 2 (based on opt1): batch-insert into the column vector in BitShufflePageDecoder.next_batch

  • BlockLoadTime: 3s105ms
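
The idea behind batch insertion can be sketched as follows, assuming the decoded page is a contiguous buffer of fixed-width values. This is an illustration with `std::vector` as a stand-in for the column vector, not the actual `BitShufflePageDecoder` code; `append_one_by_one` and `append_batch` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Value-at-a-time: one call and one capacity check per decoded value.
void append_one_by_one(const int32_t* decoded, size_t n, std::vector<int32_t>* column) {
    for (size_t i = 0; i < n; ++i) {
        column->push_back(decoded[i]);
    }
}

// Batched: a single range insert, so the destination grows at most once and
// the values can be copied as one contiguous block.
void append_batch(const int32_t* decoded, size_t n, std::vector<int32_t>* column) {
    column->insert(column->end(), decoded, decoded + n);
}
```

Both functions leave the column with identical contents; the batched form simply replaces `n` small appends with one bulk copy.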

Optimization 3 (based on opt1, opt2): eliminate lazy materialization

  • BlockLoadTime: 2s641ms
  • BlockSeekCount: 175.02K
    BlockSeekCount drops from 5.36M to 175.02K, roughly 30x fewer seeks.
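
One way to see why dropping lazy materialization can cut seeks is a toy cost model, given here as an assumption rather than a description of Doris internals: with lazy materialization, non-predicate columns are fetched only for rows that passed the predicate, which costs one seek per run of consecutive row ids; reading the covering range sequentially costs a single seek but decodes rejected rows too. `seeks_when_lazy` and `rows_when_eager` are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of lazy materialization: fetching only the selected rows needs
// one seek for every run of consecutive row ids. Input must be sorted.
size_t seeks_when_lazy(const std::vector<uint32_t>& sorted_rowids) {
    size_t seeks = 0;
    for (size_t i = 0; i < sorted_rowids.size(); ++i) {
        if (i == 0 || sorted_rowids[i] != sorted_rowids[i - 1] + 1) {
            ++seeks;
        }
    }
    return seeks;
}

// Toy model of eager reading: the whole covering range is read with a single
// seek, so this returns how many rows get decoded, including rejected ones.
size_t rows_when_eager(const std::vector<uint32_t>& sorted_rowids) {
    if (sorted_rowids.empty()) {
        return 0;
    }
    return sorted_rowids.back() - sorted_rowids.front() + 1;
}
```

For selected rows 0, 2, 4, 6, 8 the lazy path needs 5 seeks to fetch 5 rows, while the eager path needs 1 seek and decodes 9 rows. When the predicate selects many scattered rows, the extra decoding can be cheaper than the extra seeks.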

Optimization 4 (based on opt1, opt2, opt3): set doris_scanner_thread_pool_thread_num = 1

  • BlockLoadTime: 1s665ms
    BlockLoadTime improves further, but the query as a whole may take longer.
    This raises the question of whether the original version has the same problem.

Original version test: doris_scanner_thread_pool_thread_num = 1 vs. default value

set doris_scanner_thread_pool_thread_num = default value
- BlockLoadTime: 3s571ms

set doris_scanner_thread_pool_thread_num = 1
- BlockLoadTime: 2s232ms

The original version shows the same behavior. This may be related to memory allocation under multithreading and needs further investigation.

I will submit a PR for opt1, opt2, opt3

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
