[enhancement] optimize 2 cases in seg_iter: all/none rows passed predicate #10259

Merged
yiguolei merged 2 commits into apache:master from englefly:opt_vec_pred on Jun 20, 2022
@englefly commented Jun 20, 2022

Proposed changes

After vectorized predicate evaluation (vec_predicate), Doris builds the array sel_rowid_idx, which holds the indices of the selected rows, from the 0/1 array ret_flags.
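For reference, the baseline step (before any shortcut) can be sketched as a plain scan. This is a minimal illustration, not the Doris source: it assumes ret_flags holds exactly one 0/1 byte per row and that sel_rowid_idx has room for every row; build_sel_rowid_idx is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical baseline (not the Doris code): record the index of every row
// whose flag is 1 and return how many rows were selected.
uint16_t build_sel_rowid_idx(const uint8_t* ret_flags, uint16_t num_rows,
                             uint16_t* sel_rowid_idx) {
    uint16_t selected = 0;
    for (uint16_t i = 0; i < num_rows; ++i) {
        assert(ret_flags[i] <= 1);  // flags are expected to be strictly 0 or 1
        if (ret_flags[i]) {
            sel_rowid_idx[selected++] = i;
        }
    }
    return selected;
}
```

Every row is inspected individually here, which is the per-row cost the shortcuts below try to avoid.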
This PR adds a shortcut for two cases:

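  • all rows pass the predicate
  • no row passes the predicate

Furthermore, the same idea is applied per 32-byte (32-row) chunk of ret_flags: when a chunk, viewed as a 32-bit mask, is 0 (no row in it passes) or 0xffffffff (every row in it passes), it is handled without checking its rows one by one.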
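The per-chunk shortcut can be sketched as follows. This is an illustration under stated assumptions, not the Doris implementation: flags_to_mask32 packs a chunk into a bit mask with a plain loop (a real version would use SIMD), __builtin_ctz is the GCC/Clang builtin, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kChunk = 32;  // rows per chunk: one bit per row in a uint32_t

// Hypothetical helper: pack 32 flag bytes (each 0 or 1) into a mask, bit i = row i passed.
// Scalar loop for clarity only; a production version would do this with SIMD.
uint32_t flags_to_mask32(const uint8_t* flags) {
    uint32_t mask = 0;
    for (uint32_t i = 0; i < kChunk; ++i) {
        assert(flags[i] <= 1);
        mask |= static_cast<uint32_t>(flags[i] != 0) << i;
    }
    return mask;
}

// Hypothetical chunked builder: shortcut for all-0 and all-1 chunks, bit walk otherwise.
uint16_t build_sel_rowid_idx_chunked(const uint8_t* ret_flags, uint16_t num_rows,
                                     uint16_t* sel_rowid_idx) {
    uint16_t selected = 0;
    uint16_t pos = 0;
    const uint16_t chunk_end = static_cast<uint16_t>(num_rows / kChunk * kChunk);
    for (; pos < chunk_end; pos += kChunk) {
        uint32_t mask = flags_to_mask32(ret_flags + pos);
        if (mask == 0) {
            continue;  // no row in this chunk passed: nothing to record
        }
        if (mask == 0xffffffffu) {
            // every row in this chunk passed: append all 32 indices without inspecting bits
            for (uint16_t i = 0; i < kChunk; ++i) {
                sel_rowid_idx[selected++] = static_cast<uint16_t>(pos + i);
            }
            continue;
        }
        // mixed chunk: visit only the set bits, lowest first (mask is non-zero here)
        while (mask != 0) {
            sel_rowid_idx[selected++] = static_cast<uint16_t>(pos + __builtin_ctz(mask));
            mask &= mask - 1;  // clear the lowest set bit
        }
    }
    // tail rows that do not fill a whole chunk are checked one by one
    for (; pos < num_rows; ++pos) {
        if (ret_flags[pos]) {
            sel_rowid_idx[selected++] = pos;
        }
    }
    return selected;
}
```

Mixed chunks still cost one step per selected row, so the gain comes mainly from chunks that are uniformly 0 or uniformly 1.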

Test results on SSB 100G:
query: select count(LO_ORDERDATE) from lineorder where LO_ORDERDATE >= 19920101;
before: 3.74 sec
after: 3.37 sec
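For scale, the two timings above correspond to roughly a 9.9% reduction in query time for this query, or about a 1.11x speedup:

```latex
\frac{3.74 - 3.37}{3.74} = \frac{0.37}{3.74} \approx 9.9\%
\qquad
\frac{3.74}{3.37} \approx 1.11
```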




@englefly marked this pull request as ready for review on June 20, 2022, 07:07
```cpp
void SegmentIterator::_evaluate_vectorization_predicate(uint16_t* sel_rowid_idx,
                                                        uint16_t& selected_size) {
    SCOPED_RAW_TIMER(&_opts.stats->vec_cond_ns);
    if (!_is_need_vec_eval) {
```
@yiguolei (Contributor) commented Jun 20, 2022:

If _pre_eval_block_predicate is empty, the original code is faster, because it does not need to set up ret_flags, initialize it with memset, and then run the SIMD zero count (simd::count_zero_num) over it.
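The reviewer's point can be illustrated with a sketch that keeps an early return when there is nothing to evaluate, so the flag buffer is never allocated, filled, or scanned. VecPredicate and evaluate_predicates are hypothetical stand-ins for the Doris internals (such as _pre_eval_block_predicate and _is_need_vec_eval), not their real signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical vectorized predicate: ANDs its 0/1 result for each row into flags.
using VecPredicate = std::function<void(uint16_t num_rows, uint8_t* flags)>;

uint16_t evaluate_predicates(const std::vector<VecPredicate>& preds, uint16_t num_rows,
                             uint16_t* sel_rowid_idx) {
    assert(sel_rowid_idx != nullptr);
    if (preds.empty()) {
        // Fast path: no flags buffer, no fill, no zero counting; selection is the identity.
        for (uint16_t i = 0; i < num_rows; ++i) {
            sel_rowid_idx[i] = i;
        }
        return num_rows;
    }
    // Every row starts as selected (this plays the role of the memset).
    std::vector<uint8_t> ret_flags(num_rows, 1);
    for (const auto& pred : preds) {
        pred(num_rows, ret_flags.data());
    }
    uint16_t selected = 0;
    for (uint16_t i = 0; i < num_rows; ++i) {
        if (ret_flags[i]) {
            sel_rowid_idx[selected++] = i;
        }
    }
    return selected;
}
```

The design choice is simply to test for the empty case before doing any per-row work, so adding the all/none shortcuts does not slow down blocks that have no vectorized predicates.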

```cpp
size_t num_zeros = simd::count_zero_num(reinterpret_cast<int8_t*>(ret_flags), original_size);
if (0 == num_zeros) {
    for (uint16_t i = 0; i < original_size; i++) {
        sel_rowid_idx[i] = i;
```
@englefly (Contributor, author) replied:

"0 == num_zeros" means all rows passed the predicate, so the index of every row must be written into the sel_rowid_idx array.
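A sketch of the whole-array shortcut discussed in this thread, assuming a scalar std::count in place of simd::count_zero_num; count_zero_num and build_sel_rowid_idx_shortcut here are hypothetical names, not the Doris functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar stand-in for a SIMD zero count (assumption: the PR uses simd::count_zero_num).
size_t count_zero_num(const uint8_t* flags, size_t n) {
    return static_cast<size_t>(std::count(flags, flags + n, uint8_t{0}));
}

uint16_t build_sel_rowid_idx_shortcut(const uint8_t* ret_flags, uint16_t original_size,
                                      uint16_t* sel_rowid_idx) {
    const size_t num_zeros = count_zero_num(ret_flags, original_size);
    if (num_zeros == 0) {
        // every row passed: sel_rowid_idx is 0, 1, ..., original_size - 1
        for (uint16_t i = 0; i < original_size; ++i) {
            sel_rowid_idx[i] = i;
        }
        return original_size;
    }
    if (num_zeros == original_size) {
        return 0;  // no row passed: nothing to write
    }
    // mixed case: fall back to a per-row scan
    uint16_t selected = 0;
    for (uint16_t i = 0; i < original_size; ++i) {
        if (ret_flags[i]) {
            sel_rowid_idx[selected++] = i;
        }
    }
    assert(static_cast<size_t>(selected) == original_size - num_zeros);
    return selected;
}
```

The all-pass branch still writes every index, as the reply notes, but it skips reading the flags a second time and has no data-dependent branch inside the loop.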

@yiguolei previously approved these changes Jun 20, 2022

@yiguolei left a comment:

LGTM

@yiguolei left a comment:

LGTM

@yiguolei merged commit c3743ec into apache:master on Jun 20, 2022
freesinger pushed a commit to freesinger/incubator-doris that referenced this pull request Jun 21, 2022
…cate (apache#10259)

* [enhancement] optmize 2 cases: all/none rows passed predicate in seg_iter.

* format
@englefly deleted the opt_vec_pred branch on August 5, 2022, 16:49