[Enhancement] BitmapValue support copy on write #34047

Merged: 4 commits merged into StarRocks:main on Nov 7, 2023

Conversation

Contributor

@trueeyu trueeyu commented Oct 31, 2023

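This PR makes BitmapValue support copy-on-write: copies share the underlying bitmap, and a real copy is made only when one of them is modified.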
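The idea can be illustrated with a minimal sketch. It is not the code from this PR: `CowBitmap` is a hypothetical class, `std::set<uint64_t>` stands in for `detail::Roaring64Map`, and the `_bitmap` / `_shared` member names are borrowed from the hunk quoted in the review only to show how a shared pointer plus a "shared" flag could defer the copy until the first write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical stand-in for BitmapValue (assumption, not the StarRocks class).
class CowBitmap {
public:
    using Payload = std::set<std::uint64_t>; // stand-in for detail::Roaring64Map

    CowBitmap() : _bitmap(std::make_shared<Payload>()) {}

    // Copying only copies the pointer and marks both sides as shared.
    CowBitmap(const CowBitmap& other) : _bitmap(other._bitmap), _shared(true) {
        other._shared = true; // allowed on a const object because _shared is mutable
    }

    CowBitmap& operator=(const CowBitmap& other) {
        if (this != &other) {
            _bitmap = other._bitmap;
            _shared = true;
            other._shared = true;
        }
        return *this;
    }

    // Every mutating method detaches first, so other holders never see the write.
    void add(std::uint64_t v) {
        detach_if_shared();
        _bitmap->insert(v);
    }

    bool contains(std::uint64_t v) const { return _bitmap->count(v) > 0; }
    std::size_t cardinality() const { return _bitmap->size(); }
    const void* payload() const { return _bitmap.get(); } // exposed for the demo only

private:
    void detach_if_shared() {
        if (_shared) {
            _bitmap = std::make_shared<Payload>(*_bitmap); // the deferred deep copy
            _shared = false;
        }
    }

    std::shared_ptr<Payload> _bitmap;
    mutable bool _shared = false;
};
```

With this sketch, `CowBitmap b = a;` costs one pointer copy, and only a later `b.add(...)` pays for duplicating the payload. One limitation of a plain flag is that it is never cleared on the other side: after `b` detaches, `a` still believes it is shared and makes one unnecessary copy on its next write.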

CREATE TABLE `t1` (
  `c1` int(11) NULL COMMENT "",
  `c2` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP 
AGGREGATE KEY(`c1`)
DISTRIBUTED BY HASH(`c1`) BUCKETS 1 
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"light_schema_change" = "true",
"compression" = "LZ4"
); 

CREATE TABLE `lineorder` (
  `lo_orderkey` int(11) NOT NULL COMMENT "",
  `lo_linenumber` int(11) NOT NULL COMMENT "",
  `lo_custkey` int(11) NOT NULL COMMENT "",
  `lo_partkey` int(11) NOT NULL COMMENT "",
  `lo_suppkey` int(11) NOT NULL COMMENT "",
  `lo_orderdate` int(11) NOT NULL COMMENT "",
  `lo_orderpriority` varchar(16) NOT NULL COMMENT "",
  `lo_shippriority` int(11) NOT NULL COMMENT "",
  `lo_quantity` int(11) NOT NULL COMMENT "",
  `lo_extendedprice` int(11) NOT NULL COMMENT "",
  `lo_ordtotalprice` int(11) NOT NULL COMMENT "",
  `lo_discount` int(11) NOT NULL COMMENT "",
  `lo_revenue` int(11) NOT NULL COMMENT "",
  `lo_supplycost` int(11) NOT NULL COMMENT "",
  `lo_tax` int(11) NOT NULL COMMENT "",
  `lo_commitdate` int(11) NOT NULL COMMENT "",
  `lo_shipmode` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP 
DUPLICATE KEY(`lo_orderkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 192 
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupa1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
); 

CREATE TABLE `t2` (
  `c1` int(11) NULL COMMENT "",
  `c2` bitmap BITMAP_UNION NULL COMMENT "",
  `c3` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP 
AGGREGATE KEY(`c1`)
DISTRIBUTED BY HASH(`c1`) BUCKETS 1 
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"light_schema_change" = "true",
"compression" = "LZ4"
); 

mysql> select c1, bitmap_count(c2) from t1;
+------+------------------+
| c1   | bitmap_count(c2) |
+------+------------------+
|    1 |         30000000 |
+------+------------------+
1 row in set (0.02 sec)

mysql> select count(*) from lineorder;
+-----------+
| count(*)  |
+-----------+
| 143999468 |
+-----------+
1 row in set (0.19 sec)

mysql> select count(*) from t2;
+----------+
| count(*) |
+----------+
|       64 |
+----------+
1 row in set (0.01 sec)

Time: 5.020s -> 0.043s
Mem: 7.372G -> 0.015G

select count(*) from lineorder join [broadcast] t1 on bitmap_contains(c2, lo_orderkey) where lo_orderkey<100;

Time: 10.684s -> 0.048s
Mem: 14.73G -> 0.014G

select count(*) from t1 join [broadcast] lineorder on bitmap_contains(c2, lo_orderkey) where lo_orderkey<100;

Time: 4.251s -> 3.679s
Mem: 29M -> 19M

select count(*) from lineorder join [broadcast] t1 on lo_linenumber=c1 and bitmap_contains(c2, lo_orderkey);

Time: 7.618s -> 5.045s
Mem: 6.792G -> 3.640G

select count(*) from t1 join [broadcast] lineorder on lo_partkey=c1 and bitmap_contains(c2, lo_orderkey);

Time: 2.365s -> 2.316s
Mem: 29.358M -> 17.758M

select bitmap_count(bitmap_agg(lo_orderkey)) from lineorder;

Time: 4.5s -> 2.1s
Mem: 2.7G -> 2.7G

select count(bitmap_or(c2, c3)) from t2;

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

Signed-off-by: trueeyu <lxhhust350@qq.com>
@trueeyu trueeyu changed the title from "[WIP] Bitmap opt 4" to "[Enhancement] BitmapValue support copy on write" Nov 6, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
Signed-off-by: trueeyu <lxhhust350@qq.com>
Signed-off-by: trueeyu <lxhhust350@qq.com>

github-actions bot commented Nov 6, 2023

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Nov 6, 2023

[BE Incremental Coverage Report]

fail : 65 / 99 (65.66%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 src/column/map_column.cpp | 0 | 1 | 00.00% | [136] |
| 🔵 src/column/binary_column.cpp | 0 | 1 | 00.00% | [93] |
| 🔵 src/exec/cross_join_node.cpp | 0 | 4 | 00.00% | [189, 200, 261, 268] |
| 🔵 src/column/fixed_length_column_base.cpp | 0 | 1 | 00.00% | [57] |
| 🔵 src/exec/pipeline/nljoin/nljoin_probe_operator.cpp | 0 | 2 | 00.00% | [423, 426] |
| 🔵 src/column/struct_column.cpp | 0 | 2 | 00.00% | [172, 177] |
| 🔵 src/column/const_column.cpp | 0 | 1 | 00.00% | [51] |
| 🔵 src/column/column.h | 0 | 1 | 00.00% | [174] |
| 🔵 src/exec/pipeline/nljoin/spillable_nljoin_probe_operator.cpp | 0 | 2 | 00.00% | [106, 109] |
| 🔵 src/column/nullable_column.cpp | 0 | 4 | 00.00% | [112, 122, 123, 127] |
| 🔵 src/column/adaptive_nullable_column.cpp | 0 | 2 | 00.00% | [105, 107] |
| 🔵 src/exprs/struct_functions.cpp | 0 | 1 | 00.00% | [36] |
| 🔵 src/exec/join_hash_map.tpp | 2 | 3 | 66.67% | [618] |
| 🔵 src/types/bitmap_value.cpp | 59 | 70 | 84.29% | [80, 83, 120, 121, 122, 477, 478, 479, 1144, 1145, 1146] |
| 🔵 src/column/array_column.cpp | 1 | 1 | 100.00% | [] |
| 🔵 src/column/object_column.cpp | 3 | 3 | 100.00% | [] |


// Use shared_ptr, not unique_ptr, because we want to avoid unnecessary copy
std::shared_ptr<detail::Roaring64Map> _bitmap = nullptr;
std::unique_ptr<phmap::flat_hash_set<uint64_t>> _set;
uint64_t _sv = 0; // store the single value when _type == SINGLE
BitmapDataType _type{EMPTY};
mutable bool _shared = false;
Contributor

Why not use shared_counter?

Contributor Author

Good idea, let me test it
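
For readers following this thread, the alternative being suggested might look like the sketch below. It assumes that "shared_counter" refers to the reference count that `std::shared_ptr` already maintains (`use_count()`), which the thread does not confirm. `CountedCowBitmap` is a hypothetical class and `std::set<uint64_t>` again stands in for `detail::Roaring64Map`; none of this is taken from the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical variant: no separate flag, sharing is detected from the
// shared_ptr reference count (assumption about what the reviewer meant).
class CountedCowBitmap {
public:
    using Payload = std::set<std::uint64_t>; // stand-in for detail::Roaring64Map

    CountedCowBitmap() : _bitmap(std::make_shared<Payload>()) {}

    // The compiler-generated copy operations copy the shared_ptr,
    // so a copy shares the payload and bumps the reference count.

    void add(std::uint64_t v) {
        if (_bitmap.use_count() > 1) {
            // Someone else still holds the payload: detach before writing.
            _bitmap = std::make_shared<Payload>(*_bitmap);
        }
        _bitmap->insert(v);
    }

    std::size_t cardinality() const { return _bitmap->size(); }
    long owners() const { return _bitmap.use_count(); }
    const void* payload() const { return _bitmap.get(); } // exposed for the demo only

private:
    std::shared_ptr<Payload> _bitmap;
};
```

Compared with a boolean flag, the count drops back to 1 once the other holders detach or are destroyed, so the last remaining owner writes in place without an extra copy. The trade-off is that `use_count()` is only an approximate value when other threads copy or release the pointer concurrently, so this check is safe only if each bitmap is mutated from one thread at a time.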

@stdpain stdpain merged commit 44ae317 into StarRocks:main Nov 7, 2023
55 of 56 checks passed

github-actions bot commented Nov 7, 2023

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Nov 7, 2023

github-actions bot commented Nov 7, 2023

@Mergifyio backport branch-3.1

@github-actions github-actions bot removed the 3.1 label Nov 7, 2023

github-actions bot commented Nov 7, 2023

@Mergifyio backport branch-3.0

mergify bot pushed a commit that referenced this pull request Nov 7, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit 44ae317)

# Conflicts:
#	be/src/column/nullable_column.cpp
#	be/src/types/bitmap_value.cpp
#	be/src/types/bitmap_value.h
#	be/test/types/bitmap_value_test.cpp
mergify bot pushed a commit that referenced this pull request Nov 7, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit 44ae317)

# Conflicts:
#	be/src/column/nullable_column.cpp
#	be/src/exprs/bitmap_functions.cpp
#	be/src/exprs/struct_functions.cpp
#	be/src/types/bitmap_value.cpp
#	be/src/types/bitmap_value.h
#	be/test/types/bitmap_value_test.cpp
mergify bot pushed a commit that referenced this pull request Nov 7, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit 44ae317)

# Conflicts:
#	be/src/column/nullable_column.cpp
#	be/src/column/object_column.cpp
#	be/src/exec/pipeline/nljoin/spillable_nljoin_probe_operator.cpp
#	be/src/exprs/struct_functions.cpp
#	be/src/exprs/vectorized/bitmap_functions.cpp
#	be/src/types/bitmap_value.cpp
#	be/src/types/bitmap_value.h
#	be/test/types/bitmap_value_test.cpp
trueeyu added a commit to trueeyu/starrocks that referenced this pull request Nov 14, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
Contributor Author

trueeyu commented Nov 15, 2023

@Mergifyio backport branch-3.2

Contributor

mergify bot commented Nov 15, 2023

backport branch-3.2

✅ Backports have been created

trueeyu added a commit to trueeyu/starrocks that referenced this pull request Nov 15, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>
trueeyu added a commit that referenced this pull request Nov 15, 2023
Signed-off-by: trueeyu <lxhhust350@qq.com>