Tracking: bitmap functions #11219

Closed
25 tasks done
ariesdevil opened this issue Apr 24, 2023 · 14 comments
Labels
good first issue Category: good first issue

Comments

@ariesdevil
Collaborator

ariesdevil commented Apr 24, 2023

Summary
Scalar Functions:

Aggregate Functions:

  • bitmap_union
  • bitmap_and_count
  • bitmap_and_not_count
  • bitmap_xor_count
  • bitmap_or_count
  • intersect_count
  • bitmap_intersect
@ariesdevil ariesdevil mentioned this issue Apr 24, 2023
4 tasks
@ariesdevil ariesdevil added the good first issue Category: good first issue label Apr 25, 2023
@byrantwithyou

Hi, I'm new to Databend and I want to contribute. I have a working knowledge of database kernel and Rust. Could you assign some of the tasks in this list to me?

@byrantwithyou

byrantwithyou commented May 1, 2023

Hi, I'm new to Databend and I want to contribute. I have a working knowledge of database kernel and Rust. Could you assign some of the tasks in this list to me?

If nobody is doing the same thing, I want to do bitmap_or, bitmap_and, bitmap_xor, bitmap_not, bitmap_and_not together first, because they are kinda similar.

@ariesdevil
Collaborator Author

@byrantwithyou Thanks for your interest in these functions. When the work is done, I'll put the PR link after the corresponding function.

@silver-ymz
Contributor

Hello, I'm new to Databend and would like to give this a try. I'd like to start with bitmap_contains, bitmap_has_all, bitmap_has_any, bitmap_max, and bitmap_min together. Could you assign them to me?

@byrantwithyou

I'm so sorry that I couldn't do this task because I'm a little bit busy these days. Please assign it to anyone else.

@gitccl
Contributor

gitccl commented May 16, 2023

Hi, I'd like to try bitmap_or, bitmap_and, bitmap_xor, bitmap_not, bitmap_and_not.

@gitccl
Contributor

gitccl commented May 19, 2023

Hi @sundy-li, can bitmap_hash use ahash as the hash function?

@sundy-li
Member

Can bitmap_hash use ahash as the hash function?

Yes. I don't think we should add an extra bitmap_hash function; instead, we can support the bitmap type in the other hash functions.

@gitccl
Contributor

gitccl commented May 20, 2023

It seems that bitmap_hash doesn't compute the hash of a bitmap value. Instead, it hashes an arbitrary value and then constructs a bitmap containing that hash value.
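
If bitmap_hash behaves as described above, its semantics can be sketched roughly as follows. This is only an illustrative model, not Databend's implementation: the bitmap is modeled as a Python set, and CRC32 is a stand-in hash chosen for the sketch (the hash function actually used is not stated in this thread).

```python
import zlib


def bitmap_hash(value):
    """Illustrative model: hash any value, then wrap the hash in a bitmap."""
    # Assumption: CRC32 stands in for whatever hash the real function uses.
    hashed = zlib.crc32(str(value).encode("utf-8"))
    # The "bitmap" is modeled as a set holding a single unsigned 32-bit integer.
    return {hashed}


# The result always has cardinality 1, and equal inputs give equal bitmaps.
bm = bitmap_hash("databend")
print(len(bm))
```

The point of the sketch is that the input is an ordinary value and the output is a bitmap, which is the opposite of hashing a bitmap-typed column.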

@liangjiawei1110

Hello, I am new to Databend and I'd like to try bitmap_subset_limit, bitmap_subset_in_range, sub_bitmap.

@ariesdevil
Collaborator Author

ariesdevil commented May 24, 2023

Hello, I am new to Databend and I'd like to try bitmap_subset_limit, bitmap_subset_in_range, sub_bitmap.

Hi @liangjiawei1110, these three have already been implemented by @Mehrbod2002. You may take others :)

@akoshchiy
Contributor

@ariesdevil Hello! I want to complete the list and have some questions:

  1. As I understand, intersect_count is the alias of bitmap_and_count, isn't it?
  2. Are there any explanations or examples of the API and behavior of the orthogonal_* functions?

@ariesdevil
Collaborator Author

ariesdevil commented Jul 3, 2023

Hi @akoshchiy , thanks for your contribution.

  1. As I understand, intersect_count is the alias of bitmap_and_count, isn't it?

intersect_count is different from bitmap_and_count. Its syntax is INTERSECT_COUNT(bitmap_column, column_to_filter, filter_values).

  1. bitmap_column: The bitmap column. This is the bitmap data you want to intersect.
  2. column_to_filter: The dimension column used for filtering. Its values decide which bitmap rows take part in the calculation.
  3. filter_values: The values of the filter dimension column. This variable-length parameter lists the values of the dimension column to filter on.

The INTERSECT_COUNT function first filters the bitmap data using column_to_filter and filter_values, then calculates the number of elements in the intersection of the filtered bitmaps. In the example below, it counts the user IDs that appear under both tag 3 and tag 4.

bendsql :) select tag, bitmap_to_string(user_id) from pv_bitmap where tag in (3, 4);
+------+-----------------------------+
| tag  | bitmap_to_string(`user_id`) |
+------+-----------------------------+
|    4 | 1,2,3                       |
|    3 | 1,2,3,4,5                   |
+------+-----------------------------+
2 rows in set (0.012 sec)

bendsql :) select intersect_count(user_id, tag, 3, 4) from pv_bitmap;
+----------------------------------------+
| intersect_count(`user_id`, `tag`, 3, 4)|
+----------------------------------------+
|                                      3 |
+----------------------------------------+
1 row in set (0.014 sec)
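
The filter-then-intersect behavior shown in this session can be modeled with a short Python sketch. It is not Databend's implementation: bitmaps are modeled as Python sets, the tag 5 row is hypothetical data added only to show the filtering step, and two details are assumptions not stated in this thread (bitmaps sharing the same filter value are unioned first, and a filter value with no matching rows contributes an empty bitmap).

```python
from functools import reduce


def intersect_count(rows, *filter_values):
    """Model of INTERSECT_COUNT(bitmap_column, column_to_filter, filter_values).

    rows is an iterable of (bitmap, filter_column_value) pairs.
    """
    # Step 1: keep only rows whose filter column matches a requested value.
    # Assumption: bitmaps that share the same value are unioned together.
    per_value = {}
    for bitmap, value in rows:
        if value in filter_values:
            per_value.setdefault(value, set()).update(bitmap)

    # Step 2: intersect one bitmap per requested value.
    # Assumption: a value with no rows contributes an empty bitmap.
    bitmaps = [per_value.get(value, set()) for value in filter_values]
    if not bitmaps:
        return 0
    return len(reduce(set.intersection, bitmaps))


# (user_id bitmap, tag) rows mirroring pv_bitmap; the tag 5 row is hypothetical.
rows = [
    ({1, 2, 3}, 4),
    ({1, 2, 3, 4, 5}, 3),
    ({7, 8}, 5),
]

print(intersect_count(rows, 3, 4))  # 3
```

Under these assumptions, the model returns 3 for tags 3 and 4, matching the query result above, because only user IDs 1, 2 and 3 appear in both bitmaps.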

  2. Are there any explanations or examples of the API and behavior of the orthogonal_* functions?

The orthogonal_*-prefixed functions behave the same as the corresponding functions without the prefix; they only use a different calculation approach. We don't need them now, so I deleted them.

@ariesdevil
Collaborator Author

We made it, thank you all!
