
[Enhancement] Add Hive UDFs for handling bitmap types (backport #36949) #40165

Merged: 2 commits merged into branch-2.5 from mergify/bp/branch-2.5/pr-36949 on Jan 28, 2024

mergify[bot] (Contributor) commented Jan 27, 2024

This is an automatic backport of pull request #36949 done by Mergify.
Cherry-pick of 1c06cdd has failed:

On branch mergify/bp/branch-2.5/pr-36949
Your branch is up to date with 'origin/branch-2.5'.

You are currently cherry-picking commit 1c06cddcfa.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	new file:   fe/hive-udf/pom.xml
	new file:   fe/hive-udf/src/main/java/com/starrocks/hive/udf/UDAFBitmapAgg.java
	new file:   fe/hive-udf/src/main/java/com/starrocks/hive/udf/UDFBitmapCount.java
	new file:   fe/hive-udf/src/main/java/com/starrocks/hive/udf/UDFBitmapFromString.java
	new file:   fe/hive-udf/src/main/java/com/starrocks/hive/udf/UDFBitmapToString.java
	modified:   fe/plugin-common/src/test/java/com/starrocks/types/BitmapValueTest.java
	new file:   fe/plugin-common/src/test/java/com/starrocks/types/Roaring64MapTest.java
	modified:   fe/pom.xml

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   build.sh

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/github/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


Mergify commands and options

More conditions and actions can be found in the documentation.

You can also trigger Mergify actions by commenting on this pull request:

  • @Mergifyio refresh will re-evaluate the rules
  • @Mergifyio rebase will rebase this PR on its base branch
  • @Mergifyio update will merge the base branch into this PR
  • @Mergifyio backport <destination> will backport this PR on <destination> branch

Additionally, on Mergify dashboard you can:

  • look at your merge queues
  • generate the Mergify configuration with the config editor.

Finally, you can contact us on https://mergify.com

Why I'm doing:

Users often have two needs:

  1. Build bitmaps in Hive/Spark and then import them into StarRocks, which offloads bitmap construction from StarRocks.
  2. Build bitmaps in StarRocks and then export them to Parquet files for use by Hive/Spark.

With these UDFs, Hive can import, export, and process StarRocks bitmaps.

How to use:

# Create a table in StarRocks
CREATE TABLE `t1` (
  `c1` int(11) NULL COMMENT "",
  `c2` bitmap BITMAP_UNION NULL COMMENT ""
) ENGINE=OLAP 
AGGREGATE KEY(`c1`)
DISTRIBUTED BY HASH(`c1`)
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"fast_schema_evolution" = "true",
"compression" = "LZ4"
);

# Check the prepared data
mysql> select c1, bitmap_to_string(c2) from t1;
+------+-------------------------------------------------------------------------------------------------------------+
| c1   | bitmap_to_string(c2)                                                                                        |
+------+-------------------------------------------------------------------------------------------------------------+
|    1 | 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 |
+------+-------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)


# Write to HDFS as Parquet, serializing the bitmap column with bitmap_to_binary
mysql> insert into files("path"="hdfs://xxx:9000/user/hive/warehouse/lxh.db/tmp/", "format"="parquet", "compression" = "uncompressed") select c1, bitmap_to_binary(c2) as c2 from t1;

# Create a table in Hive
hive> create table t1(c1 int, c2 binary) stored as parquet;
OK

# Load the Parquet file into the Hive table
hive> load data inpath 'hdfs://xxxx:9000/user/hive/warehouse/lxh.db/tmp/data_1c3aff0b-9991-11ee-80ba-2ea9721a9a2d_0_1.parquet' into table lxh.t1;

# Add the UDF jar to Hive
hive> add jar hdfs://xxxx:9000/user/hive/warehouse/lxh.db/hive-udf-1.0.0.jar;

# Register the Hive UDFs
hive> create temporary function bitmap_to_string as 'com.starrocks.hive.udf.UDFBitmapToString';
OK
Time taken: 0.091 seconds
hive> create temporary function bitmap_count as 'com.starrocks.hive.udf.UDFBitmapCount';
OK
Time taken: 0.075 seconds
hive> create temporary function bitmap_agg as 'com.starrocks.hive.udf.UDAFBitmapAgg';
OK
Time taken: 0.074 seconds
hive> create temporary function bitmap_from_string as 'com.starrocks.hive.udf.UDFBitmapFromString';
OK
Time taken: 0.074 seconds

# Use the UDFs in Hive
hive> select c1, bitmap_count(c2) from t1;
1       39

hive> select c1, bitmap_to_string(c2) from t1;
1       1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39

# Use bitmap_agg to build a bitmap from the rows of t2
hive> select * from t2;
1       1
2       2
3       3

hive> select bitmap_to_string(bitmap_agg(c2)) from t2;
1,2,3
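To make the observable behavior of the four UDFs in the walkthrough concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not the code in UDAFBitmapAgg, UDFBitmapCount, UDFBitmapToString, or UDFBitmapFromString: the hypothetical class BitmapSemanticsSketch models a bitmap as a sorted set of 64-bit values (java.util.TreeSet), assumes NULL inputs are skipped, and does not reproduce the binary format written by bitmap_to_binary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: models a StarRocks bitmap as a sorted set of 64-bit values.
// It mirrors the results shown in the Hive walkthrough, NOT the real UDF
// implementation or the binary serialization produced by bitmap_to_binary.
public class BitmapSemanticsSketch {

    // Like bitmap_agg: fold one column's values into a single bitmap.
    static TreeSet<Long> bitmapAgg(List<Long> values) {
        TreeSet<Long> bitmap = new TreeSet<>();
        for (Long v : values) {
            if (v != null) { // assumption: NULL inputs are skipped
                bitmap.add(v);
            }
        }
        return bitmap;
    }

    // Like bitmap_count: number of distinct values in the bitmap.
    static long bitmapCount(TreeSet<Long> bitmap) {
        return bitmap.size();
    }

    // Like bitmap_to_string: ascending values joined by commas.
    static String bitmapToString(TreeSet<Long> bitmap) {
        return bitmap.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    // Like bitmap_from_string: parse a comma-separated list of integers.
    // Invalid input throws NumberFormatException here; the real UDF may differ.
    static TreeSet<Long> bitmapFromString(String s) {
        TreeSet<Long> bitmap = new TreeSet<>();
        if (s.isEmpty()) {
            return bitmap;
        }
        for (String part : s.split(",")) {
            bitmap.add(Long.parseLong(part.trim()));
        }
        return bitmap;
    }

    public static void main(String[] args) {
        // Rows of t2 from the example: c2 = 1, 2, 3
        TreeSet<Long> agg = bitmapAgg(Arrays.asList(1L, 2L, 3L));
        System.out.println(bitmapToString(agg)); // prints "1,2,3"
        // Duplicates collapse, so the count is the number of distinct values.
        System.out.println(bitmapCount(bitmapFromString("3,1,2,2"))); // prints "3"
    }
}
```

The sorted set explains why bitmap_to_string output is always ascending and duplicate-free, regardless of input row order.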

What I'm doing:

Add Hive UDFs for handling the StarRocks bitmap type.

TODO 1: The test framework does not currently support Hive UDFs; I will add test cases to the StarRocksTest framework later.
TODO 2: Add more bitmap functions (BitmapAnd, BitmapOr, BitmapXor, ...).
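The planned functions in TODO 2 follow ordinary set algebra. As a hedged illustration only, the sketch below expresses AND, OR, and XOR with java.util.BitSet; the class BitmapSetOpsSketch and its helpers are hypothetical and are not the API the future UDFs will expose. It also assumes small non-negative values, because BitSet is int-indexed while StarRocks bitmaps hold 64-bit values.

```java
import java.util.BitSet;

// Hypothetical illustration of the set semantics behind the planned
// BitmapAnd / BitmapOr / BitmapXor functions (TODO 2). Not StarRocks code.
// Assumption: values are small non-negative ints, since BitSet is int-indexed.
public class BitmapSetOpsSketch {

    static BitSet of(int... values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v);
        }
        return bits;
    }

    // Intersection: values present in both bitmaps.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // clone so the inputs are not mutated
        result.and(b);
        return result;
    }

    // Union: values present in either bitmap.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Symmetric difference: values present in exactly one bitmap.
    static BitSet xor(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.xor(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet a = of(1, 2, 3);
        BitSet b = of(2, 3, 4);
        System.out.println(and(a, b)); // prints "{2, 3}"
        System.out.println(or(a, b));  // prints "{1, 2, 3, 4}"
        System.out.println(xor(a, b)); // prints "{1, 4}"
    }
}
```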

Signed-off-by: trueeyu <lxhhust350@qq.com>
(cherry picked from commit 1c06cdd)

# Conflicts:
#	build.sh
mergify bot added the conflicts label Jan 27, 2024
wanpengfei-git enabled auto-merge (squash) January 27, 2024 15:55
mergify bot closed this Jan 27, 2024
Auto-merge was automatically disabled January 27, 2024 15:55

Pull request was closed

mergify[bot] (Contributor, Author) commented Jan 27, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

mergify bot deleted the mergify/bp/branch-2.5/pr-36949 branch January 27, 2024 15:56
trueeyu restored the mergify/bp/branch-2.5/pr-36949 branch January 27, 2024 15:56
Signed-off-by: trueeyu <lxhhust350@qq.com>
trueeyu reopened this Jan 27, 2024
wanpengfei-git enabled auto-merge (squash) January 27, 2024 16:05

sonarcloud bot commented Jan 27, 2024

Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

12 New issues
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

wanpengfei-git merged commit eaadece into branch-2.5 Jan 28, 2024
38 of 40 checks passed
wanpengfei-git deleted the mergify/bp/branch-2.5/pr-36949 branch January 28, 2024 09:49