[Enhancement](udf) Reject bitmap, hll, and quantile_state in udf create #63849

Merged
zclllyybb merged 1 commit into apache:master from linrrzqqq:pyudf-reject-obj-type
May 29, 2026
Conversation

@linrrzqqq
Collaborator

Problem Summary:

UDF creation currently allows BITMAP, HLL, and QUANTILE_STATE in function signatures, but these object types are not exposed to Java/Python UDF runtimes as first-class values. They are effectively bridged as opaque bytes, and marked unsupported in doc

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@linrrzqqq
Collaborator Author

run buildall

@linrrzqqq
Collaborator Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review completed for PR 63849. I did not find a blocking correctness issue in the actual GitHub PR patch.

Critical checkpoint conclusions:

  • Goal/test: The PR rejects bitmap, HLL, and quantile_state for Java/Python UDF/UDAF/UDTF create paths. The new FE unit test and Python regression suites cover direct and nested object-type cases.
  • Scope/minimality: The implementation is small and localized to CreateFunctionCommand, with tests added for the new validation.
  • Concurrency/lifecycle: No new concurrency or lifecycle-sensitive behavior is introduced.
  • Config/compatibility/protocol: No new config, persisted format, or FE-BE protocol change is introduced.
  • Parallel paths: Java and Python paths both call the new validation; RPC behavior is unchanged.
  • Error handling: Validation raises AnalysisException and keeps failures at analysis time.
  • Test coverage: Coverage exists for Java/Python scalar and aggregate FE paths plus Python UDF/UDAF/UDTF regression cases. I did not run tests in this review runner.
  • Observability/transactions/data writes/memory: Not applicable to this PR.
  • Performance: The recursive type validation is bounded by type nesting and not on a query hot path.

User focus: No additional user-provided review focus was specified.
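
The recursive type validation mentioned in the checkpoints can be illustrated with a minimal sketch. This is not the code from `CreateFunctionCommand` (which lives in the Java FE); `SqlType`, `AnalysisError`, `find_rejected_type`, and `validate_udf_signature` are hypothetical names invented for this example, and the type model is deliberately simplified to a name plus a list of nested child types.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Object types the PR rejects in UDF signatures.
REJECTED_OBJECT_TYPES = {"BITMAP", "HLL", "QUANTILE_STATE"}


class AnalysisError(Exception):
    """Hypothetical stand-in for the FE's AnalysisException."""


@dataclass
class SqlType:
    """Hypothetical, simplified type node: a name plus nested child types."""
    name: str
    children: List["SqlType"] = field(default_factory=list)


def find_rejected_type(t: SqlType) -> Optional[str]:
    """Return the first rejected object type found in t (depth-first), or None."""
    if t.name.upper() in REJECTED_OBJECT_TYPES:
        return t.name.upper()
    for child in t.children:
        found = find_rejected_type(child)
        if found is not None:
            return found
    return None


def validate_udf_signature(arg_types: List[SqlType], return_type: SqlType) -> None:
    """Raise AnalysisError if any argument or the return type contains a rejected type."""
    for t in [*arg_types, return_type]:
        found = find_rejected_type(t)
        if found is not None:
            raise AnalysisError(f"UDF signature must not use object type {found}")
```

Under these assumptions, a signature such as `ARRAY<BITMAP>` is rejected just like a bare `BITMAP`, because the walk descends into nested children. The recursion depth equals the type's nesting depth, which matches the review's remark that the cost is bounded by type nesting.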

@hello-stephen
Contributor

TPC-H: Total hot run time: 31966 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f1db522bbd4c81e627eb1dc4efec4330d18396bf, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17700	4097	4082	4082
q2	q3	10772	1428	848	848
q4	4689	472	338	338
q5	7575	2251	2107	2107
q6	248	180	138	138
q7	950	784	656	656
q8	9344	1694	1628	1628
q9	5541	4941	4997	4941
q10	6425	2308	1910	1910
q11	444	278	249	249
q12	702	420	296	296
q13	18209	3387	2787	2787
q14	271	261	253	253
q15	q16	818	776	711	711
q17	1012	973	966	966
q18	7066	5678	5611	5611
q19	1291	1344	1250	1250
q20	571	463	293	293
q21	6357	2895	2594	2594
q22	456	397	308	308
Total cold run time: 100441 ms
Total hot run time: 31966 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4915	4781	4753	4753
q2	q3	4937	5222	4620	4620
q4	2151	2214	1397	1397
q5	5029	4641	4749	4641
q6	250	194	132	132
q7	1921	1773	1582	1582
q8	2437	2138	2093	2093
q9	7976	7373	7448	7373
q10	4725	4641	4199	4199
q11	532	381	355	355
q12	722	738	522	522
q13	2998	3351	2818	2818
q14	269	288	248	248
q15	q16	674	697	612	612
q17	1289	1263	1258	1258
q18	7183	6808	6712	6712
q19	1163	1096	1095	1095
q20	2213	2207	1957	1957
q21	5259	4591	4434	4434
q22	524	455	398	398
Total cold run time: 57167 ms
Total hot run time: 51199 ms
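
The reported totals are consistent with one reading of the unlabeled columns: first a cold run, then two hot runs, then the faster of the two hot runs. That column interpretation is an assumption inferred from the numbers, not something the report states. A small sketch of the arithmetic, with `summarize_runs` as a hypothetical helper:

```python
from typing import Iterable, Tuple


def summarize_runs(rows: Iterable[str]) -> Tuple[int, int]:
    """Return (total_cold_ms, total_hot_ms) for tab-separated result rows."""
    total_cold = 0
    total_hot = 0
    for row in rows:
        # Keep only numeric fields; this skips query labels such as "q2" or "q3".
        times = [int(f) for f in row.split("\t") if f.isdigit()]
        if len(times) < 3:
            continue  # row has no usable timings
        # Assumed layout: cold run first, then two hot runs.
        cold, hot_runs = times[0], times[1:3]
        total_cold += cold
        total_hot += min(hot_runs)
    return total_cold, total_hot


# Three rows copied from Round 1 above.
sample = ["q4\t4689\t472\t338\t338", "q6\t248\t180\t138\t138", "q11\t444\t278\t249\t249"]
print(summarize_runs(sample))  # (5381, 725)
```

Applied to all rows of a round, the same sums reproduce the reported totals, for example 100441 ms cold and 31966 ms hot for Round 1.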

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171440 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f1db522bbd4c81e627eb1dc4efec4330d18396bf, data reload: false

query5	4324	664	527	527
query6	340	243	206	206
query7	4222	567	319	319
query8	338	242	218	218
query9	8784	4024	4018	4018
query10	445	349	294	294
query11	5788	2545	2255	2255
query12	186	127	123	123
query13	1323	598	436	436
query14	6153	5526	5140	5140
query14_1	4430	4434	4453	4434
query15	213	208	184	184
query16	1056	456	384	384
query17	1119	741	581	581
query18	2497	475	351	351
query19	228	197	165	165
query20	138	134	127	127
query21	221	136	115	115
query22	13608	13567	13362	13362
query23	17354	16599	16203	16203
query23_1	16275	16370	16322	16322
query24	7496	1789	1344	1344
query24_1	1336	1322	1327	1322
query25	574	499	454	454
query26	1308	340	186	186
query27	2674	578	348	348
query28	4447	2023	2019	2019
query29	1022	661	528	528
query30	309	243	206	206
query31	1144	1086	981	981
query32	93	80	75	75
query33	553	363	310	310
query34	1198	1142	653	653
query35	785	797	696	696
query36	1441	1408	1243	1243
query37	161	116	96	96
query38	3211	3144	3122	3122
query39	946	921	920	920
query39_1	901	890	872	872
query40	230	153	130	130
query41	73	69	69	69
query42	114	112	109	109
query43	347	335	304	304
query44	
query45	216	207	202	202
query46	1112	1230	763	763
query47	2427	2335	2285	2285
query48	410	431	319	319
query49	664	510	407	407
query50	1036	363	260	260
query51	4387	4361	4303	4303
query52	107	109	96	96
query53	255	290	214	214
query54	327	296	266	266
query55	96	97	87	87
query56	323	362	316	316
query57	1456	1428	1362	1362
query58	310	283	284	283
query59	1571	1670	1424	1424
query60	341	367	308	308
query61	165	155	154	154
query62	710	648	582	582
query63	241	203	218	203
query64	2439	816	636	636
query65	
query66	1719	483	352	352
query67	29756	29697	29514	29514
query68	
query69	470	345	313	313
query70	985	1010	983	983
query71	309	276	264	264
query72	3001	2733	2532	2532
query73	893	783	443	443
query74	5115	4957	4795	4795
query75	2698	2606	2259	2259
query76	2299	1154	781	781
query77	413	420	340	340
query78	12501	12507	11747	11747
query79	1465	1036	720	720
query80	1315	547	466	466
query81	504	290	249	249
query82	1371	159	124	124
query83	373	286	248	248
query84	256	138	112	112
query85	931	544	455	455
query86	453	348	309	309
query87	3446	3376	3235	3235
query88	3642	2744	2760	2744
query89	457	410	347	347
query90	1907	186	186	186
query91	182	182	141	141
query92	78	77	77	77
query93	1501	1405	881	881
query94	702	354	337	337
query95	706	382	359	359
query96	1014	795	366	366
query97	2735	2726	2588	2588
query98	232	225	236	225
query99	1323	1132	1014	1014
Total cold run time: 255487 ms
Total hot run time: 171440 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 95.45% (21/22) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 54.55% (12/22) 🎉
Increment coverage report
Complete coverage report

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved and reviewed labels May 29, 2026
@github-actions
Contributor

PR approved by anyone and no changes requested.

@zclllyybb zclllyybb merged commit aa68e4b into apache:master May 29, 2026
33 checks passed
@linrrzqqq linrrzqqq deleted the pyudf-reject-obj-type branch May 29, 2026 03:35
Labels

approved Indicates a PR has been approved by one committer. kind/behavior-changed reviewed

4 participants