
feat: support specify bloom index columns #12048

Merged
dantengsky merged 3 commits into datafuselabs:main on Jul 14, 2023

Conversation

zhyass
Member

@zhyass zhyass commented Jul 10, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Add a bloom_index_columns option to the fuse engine, so that users can specify which columns get a bloom index.

To create a table with a bloom index on specific columns:

CREATE TABLE table_name (
  column_name1 column_type1,
  column_name2 column_type2,
  ...
) ... bloom_index_columns='columnName1[, ...]'

To create or modify the bloom index for an existing table:
The new options replace the existing bloom index options. This does not create bloom filters for existing data.

ALTER TABLE <db.table_name> SET OPTIONS(bloom_index_columns='columnName1[, ...]');
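The note that ALTER does not create bloom filters for existing data can be pictured with a toy model. This is only a sketch: the Table and Block classes are hypothetical and are not Databend types, and treating an unset option as "index every column" is an assumption inferred from the discussion in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    rows: int
    # Columns that received a bloom filter when this block was written.
    bloom_columns: Tuple[str, ...]


@dataclass
class Table:
    columns: List[str]
    options: Dict[str, str] = field(default_factory=dict)
    blocks: List[Block] = field(default_factory=list)

    def _bloom_columns(self) -> Tuple[str, ...]:
        value = self.options.get("bloom_index_columns")
        if value is None:
            # Assumption: with the option unset, every column is indexed.
            return tuple(self.columns)
        return tuple(p.strip() for p in value.split(",") if p.strip())

    def append(self, rows: int) -> None:
        # New data is indexed according to the options in effect right now.
        self.blocks.append(Block(rows, self._bloom_columns()))

    def set_options(self, **opts: str) -> None:
        # Replaces the stored option; blocks already written are left untouched.
        self.options.update(opts)
```

In this model, a block appended before set_options(bloom_index_columns='a') keeps whatever filters it was written with, and only blocks appended afterwards are limited to column a.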

To disable the bloom index:

ALTER TABLE <db.table_name> SET OPTIONS(bloom_index_columns='');
  • Closes #issue
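As a rough illustration of the three option states described above (unset, a column list, and the empty string), here is a minimal sketch. The function name resolve_bloom_columns is hypothetical, the "unset means all columns" default is inferred from the discussion in this thread rather than stated in the summary, and rejecting unknown column names is an assumption made only for this example.

```python
from typing import List, Optional, Sequence


def resolve_bloom_columns(option: Optional[str],
                          table_columns: Sequence[str]) -> List[str]:
    """Return the columns that would get a bloom index.

    Assumed semantics for this sketch:
      - option is None -> option never set, every column is indexed
      - option == ''   -> bloom index disabled
      - otherwise      -> comma-separated list of column names
    """
    if option is None:
        return list(table_columns)
    names = [part.strip() for part in option.split(",") if part.strip()]
    unknown = [name for name in names if name not in table_columns]
    if unknown:
        # Assumption: names that are not table columns are rejected.
        raise ValueError(f"unknown bloom index columns: {unknown}")
    return names
```

For example, resolve_bloom_columns('', ['a', 'b']) yields an empty list, which corresponds to the "disable the bloom index" statement above.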


@zhyass zhyass marked this pull request as draft July 10, 2023 17:59
@github-actions github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Jul 10, 2023
@sundy-li
Member

What's the real usage of this pr? I did not get it.

@zhyass zhyass marked this pull request as ready for review July 12, 2023 17:14
@zhyass zhyass changed the title from "feat: support alter bloom index columns" to "feat: support specify bloom index columns" Jul 12, 2023

@dantengsky
Member

dantengsky commented Jul 13, 2023

> What's the real usage of this pr? I did not get it.

bloom filter creation takes a significant amount of CPU resources.


For some "wide" tables with lots of columns, where

  • only a few of the columns are involved in point queries
  • and data ingestion is heavy (and somewhat time-critical)

creating bloom indexes for all the columns is wasteful.
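To make the cost argument above concrete, here is a small sketch of a textbook Bloom filter that counts how many hash computations are spent building filters. It is not Databend's filter implementation; the BloomFilter class, build_filters, total_hash_calls, and the sizes used are all assumptions chosen for illustration. The point is only that the build work grows with rows × indexed columns × hashes per value.

```python
import hashlib
from typing import Dict, List, Sequence


class BloomFilter:
    """Textbook Bloom filter with a counter for hash computations."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.hash_calls = 0  # hash computations spent while building

    def _positions(self, value: object) -> List[int]:
        positions = []
        for seed in range(self.num_hashes):
            data = f"{seed}:{value!r}".encode("utf-8")
            digest = hashlib.sha256(data).digest()
            positions.append(int.from_bytes(digest[:8], "little") % self.num_bits)
        return positions

    def add(self, value: object) -> None:
        self.hash_calls += self.num_hashes
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: object) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def build_filters(rows: List[Dict[str, object]],
                  indexed_columns: Sequence[str]) -> Dict[str, BloomFilter]:
    """Build one filter per indexed column; other columns cost nothing."""
    filters = {col: BloomFilter() for col in indexed_columns}
    for row in rows:
        for col, bloom in filters.items():
            bloom.add(row[col])
    return filters


def total_hash_calls(filters: Dict[str, BloomFilter]) -> int:
    return sum(bloom.hash_calls for bloom in filters.values())
```

With 100 rows and 20 columns, indexing every column costs 100 × 20 × 4 = 8000 hash computations in this model, while indexing only two columns costs 100 × 2 × 4 = 800.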

@sundy-li
Member

> Creating bloom indexes for all the columns is wasteful.

I agree with that, so let's disable this creation by default?

@dantengsky
Member

> > Creating bloom indexes for all the columns is wasteful.
>
> I agree with that, so let's disable this creation by default?

I am not sure about this.

Maybe for tables that are not heavily appended to (where ingestion speed is not that critical), and where users are not sure which columns might be involved in point queries, enabling bloom filters by default is not a bad idea. I do not know if "those tables" are the common case, though.

@BohuTANG
Member

I think this PR is only for some special cases.
Another question: if we rename a column that is in the options, what will happen?

@zhyass
Member Author

zhyass commented Jul 13, 2023

> I think this PR is only for some special cases. Another question: if we rename a column that is in the options, what will happen?

Fixed, @BohuTANG. Dropping a column removes the dropped column from the bloom index columns. Renaming a column replaces the old name in the bloom index columns with the new column name.

mysql> create table t(a int, b int, c int) bloom_index_columns='a,b,c';
Query OK, 0 rows affected (0.03 sec)

mysql> alter table t drop column c;
Query OK, 0 rows affected (0.02 sec)

mysql> set hide_options_in_show_create_table=0;
Query OK, 0 rows affected (0.00 sec)

mysql> show create table t;
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` INT,
  `b` INT
) ENGINE=FUSE BLOOM_INDEX_COLUMNS='a,b' COMPRESSION='zstd' STORAGE_FORMAT='parquet' |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
Read 0 rows, 0.00 B in 0.012 sec., 0 rows/sec., 0.00 B/sec.

mysql> alter table t rename column b to c;
Query OK, 0 rows affected (0.11 sec)

mysql> show create table t;
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (
  `a` INT,
  `c` INT
) ENGINE=FUSE BLOOM_INDEX_COLUMNS='a,c' COMPRESSION='zstd' STORAGE_FORMAT='parquet' |
+-------+-----------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
Read 0 rows, 0.00 B in 0.049 sec., 0 rows/sec., 0.00 B/sec.
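The behavior shown in the session above can be modeled as simple edits to the comma-separated option string. This is only a sketch of the observable result, not the code in this PR: the helper names are hypothetical, and the sketch ignores identifier quoting and case sensitivity.

```python
from typing import List


def _split(option: str) -> List[str]:
    """Split a bloom_index_columns value into trimmed column names."""
    return [part.strip() for part in option.split(",") if part.strip()]


def after_drop_column(option: str, dropped: str) -> str:
    """Option value after dropping a column: the name is removed."""
    return ",".join(name for name in _split(option) if name != dropped)


def after_rename_column(option: str, old: str, new: str) -> str:
    """Option value after renaming a column: old name becomes new name."""
    return ",".join(new if name == old else name for name in _split(option))
```

Applied to the session, after_drop_column('a,b,c', 'c') gives 'a,b', and after_rename_column('a,b', 'b', 'c') gives 'a,c', matching the two SHOW CREATE TABLE outputs.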

@BohuTANG BohuTANG added the ci-cloud label (Build docker image for cloud test) Jul 13, 2023
@github-actions
Contributor

Docker Image for PR

  • tag: pr-12048-4295be3

note: this image tag is only available for internal use,
please check the internal doc for more details.

@BohuTANG
Member

Conflicting files:
src/query/service/src/interpreters/interpreter_table_create.rs

Member

@dantengsky dantengsky left a comment


LGTM.

Added some comments to the test scripts, hope it helps

zhyass and others added 2 commits July 14, 2023 12:24
…r.test

Co-authored-by: dantengsky <dantengsky@gmail.com>
…r.test

Co-authored-by: dantengsky <dantengsky@gmail.com>
@dantengsky dantengsky merged commit c5e012e into datafuselabs:main Jul 14, 2023
56 checks passed
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* feat: support specify bloom index columns

* Update tests/sqllogictests/suites/mode/standalone/explain/bloom_filter.test

Co-authored-by: dantengsky <dantengsky@gmail.com>

* Update tests/sqllogictests/suites/mode/standalone/explain/bloom_filter.test

Co-authored-by: dantengsky <dantengsky@gmail.com>

---------

Co-authored-by: dantengsky <dantengsky@gmail.com>

4 participants