[SPARK-56717][PYTHON][TESTS] Add ASV microbenchmark for SQL_ARROW_TABLE_UDF by Yicong-Huang · Pull Request #55673 · apache/spark

Yicong-Huang · 2026-05-04T20:17:42Z

What changes were proposed in this pull request?

Add an ASV microbenchmark for the SQL_ARROW_TABLE_UDF eval type (Python UDTF with useArrow=True) to bench_eval_type.py.

The new benchmark drives the worker through the UDTF wire protocol, which differs from the SQL_*_UDF protocol: it carries no num_udfs, num_chained, or result_id fields, and instead sends num_partition_child_indexes, an optional pickled AnalyzeResult, the handler class, the return type as JSON, and the UDTF name. It also threads input_type through EvalConf so that the non-legacy Arrow code path is exercised.

Supporting changes in MockProtocolWriter:

- write_worker_input accepts an optional eval_conf dict alongside runner_conf.
- New write_udtf_payload writes the UDTF-specific command frame.
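
To make the shape of a UDTF command frame more concrete, here is a minimal sketch of how such a payload could be framed. It is not the code from this PR: the function name `sketch_udtf_payload`, its parameters, the field order, and the integer flag for the optional AnalyzeResult are all assumptions for illustration. The only conventions borrowed from the description are the listed fields themselves.

```python
import io
import json
import struct
from typing import Optional, Sequence


def write_int(value: int, out: io.BytesIO) -> None:
    """Write a 4-byte big-endian signed integer."""
    out.write(struct.pack("!i", value))


def write_bytes(data: bytes, out: io.BytesIO) -> None:
    """Write a byte string prefixed with its length."""
    write_int(len(data), out)
    out.write(data)


def sketch_udtf_payload(
    out: io.BytesIO,
    handler_bytes: bytes,
    return_type_json: str,
    udtf_name: str,
    partition_child_indexes: Sequence[int] = (),
    analyze_result_bytes: Optional[bytes] = None,
) -> None:
    """Hypothetical framing; the real layout is defined by the worker code."""
    # Count of partition child indexes, followed by the indexes themselves.
    write_int(len(partition_child_indexes), out)
    for index in partition_child_indexes:
        write_int(index, out)
    # Assumed flag: 1 if a pickled AnalyzeResult follows, else 0.
    if analyze_result_bytes is None:
        write_int(0, out)
    else:
        write_int(1, out)
        write_bytes(analyze_result_bytes, out)
    # Serialized handler class, return-type JSON, then the UDTF name.
    write_bytes(handler_bytes, out)
    write_bytes(return_type_json.encode("utf-8"), out)
    write_bytes(udtf_name.encode("utf-8"), out)


buf = io.BytesIO()
return_type = json.dumps({"type": "struct", "fields": []})
# The handler bytes are a placeholder, not a real pickled class.
sketch_udtf_payload(buf, b"<pickled handler>", return_type, "identity_udtf")
payload = buf.getvalue()
```

Passing the handler as pre-serialized bytes keeps the sketch independent of any particular pickling library.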

UDTFs covered, with output rows per input row in parentheses: identity_udtf (1->1), explode_udtf (1->3), filter_udtf (1->0/1), stringify_udtf (1->1, type change). Row counts are scaled down relative to the SQL_ARROW_BATCHED_UDF scenarios because, in this path, the worker calls LocalDataToArrowConversion.convert once per input row.
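
For readers unfamiliar with the four shapes, the following is a rough sketch of handler classes with the same fan-out behaviour, written as plain Python classes with an `eval` method that yields output tuples. The class names, the `run` helper, and the even-number predicate in the filter are assumptions; the actual definitions in bench_eval_type.py may differ.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Row = Tuple[Any, ...]


class IdentityUDTF:
    """1 -> 1: emit each input row unchanged."""

    def eval(self, *cols: Any) -> Iterator[Row]:
        yield cols


class ExplodeUDTF:
    """1 -> 3: emit three copies of each input row."""

    def eval(self, *cols: Any) -> Iterator[Row]:
        for _ in range(3):
            yield cols


class FilterUDTF:
    """1 -> 0/1: keep a row only if its first column is even (assumed predicate)."""

    def eval(self, *cols: Any) -> Iterator[Row]:
        if cols[0] % 2 == 0:
            yield cols


class StringifyUDTF:
    """1 -> 1 with a type change: every column becomes a string."""

    def eval(self, *cols: Any) -> Iterator[Row]:
        yield tuple(str(c) for c in cols)


def run(udtf_cls: type, rows: Sequence[Row]) -> List[Row]:
    """Hypothetical driver: call eval once per input row and collect the output."""
    handler = udtf_cls()
    return [out for row in rows for out in handler.eval(*row)]
```

Because `eval` runs once per input row, the per-row cost dominates, which is consistent with the scaled-down row counts described above.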

Why are the changes needed?

Part of SPARK-55724. Establishes a performance baseline before refactoring SQL_ARROW_TABLE_UDF.

Does this PR introduce any user-facing change?

No

How was this patch tested?

COLUMNS=120 ./python/asv run --python=same --bench "ArrowTableUDF" -a "repeat=(3,5,5.0)" (one of two stable runs):

ArrowTableUDFTimeBench:

=================== =============== ============== ============= ================
--                                              udtf
------------------- -------------------------------------------------------------
     scenario        identity_udtf   explode_udtf   filter_udtf   stringify_udtf
=================== =============== ============== ============= ================
 sm_batch_few_col       158+/-1ms      159+/-1ms      121+/-2ms     156+/-0.7ms
 sm_batch_many_col     52.6+/-0.2ms    54.3+/-2ms     44.4+/-2ms    53.1+/-0.7ms
 lg_batch_few_col       393+/-2ms      400+/-7ms      303+/-1ms      390+/-1ms
 lg_batch_many_col      208+/-1ms      215+/-6ms     172+/-0.9ms     208+/-3ms
 pure_ints              395+/-2ms      404+/-3ms      309+/-2ms      395+/-3ms
 pure_strings           412+/-2ms      420+/-7ms      328+/-2ms      411+/-3ms
=================== =============== ============== ============= ================

ArrowTableUDFPeakmemBench:

=================== =============== ============== ============= ================
--                                              udtf
------------------- -------------------------------------------------------------
     scenario        identity_udtf   explode_udtf   filter_udtf   stringify_udtf
=================== =============== ============== ============= ================
 sm_batch_few_col        464M           464M           464M           464M
 sm_batch_many_col       464M           464M           464M           464M
 lg_batch_few_col        466M           466M           465M           466M
 lg_batch_many_col       468M           468M           468M           468M
 pure_ints               466M           466M           466M           466M
 pure_strings            466M           467M           466M           466M
=================== =============== ============== ============= ================
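
The scenario-by-udtf grids above come from ASV's parameterized benchmarks. The sketch below shows the general pattern under simplifying assumptions: the class name, row counts, and fan-out workload are hypothetical stand-ins, and it puts the timing and peak-memory methods in one class, whereas this PR uses separate ArrowTableUDFTimeBench and ArrowTableUDFPeakmemBench classes.

```python
class ExampleTableUDFBench:
    # ASV runs the benchmark once per combination of these parameter lists.
    params = [
        ["sm_batch_few_col", "lg_batch_few_col"],
        ["identity_udtf", "explode_udtf"],
    ]
    param_names = ["scenario", "udtf"]
    # As a tuple, repeat means (min_repeat, max_repeat, max_time in seconds).
    repeat = (3, 5, 5.0)

    # Hypothetical sizes and fan-outs, chosen only for illustration.
    _ROWS = {"sm_batch_few_col": 100, "lg_batch_few_col": 1000}
    _FANOUT = {"identity_udtf": 1, "explode_udtf": 3}

    def setup(self, scenario, udtf):
        # setup receives the same parameters as the benchmark methods.
        self.rows = [(i, i + 1) for i in range(self._ROWS[scenario])]
        self.fanout = self._FANOUT[udtf]

    def _work(self):
        return [row for row in self.rows for _ in range(self.fanout)]

    def time_eval(self, scenario, udtf):
        # Methods prefixed with time_ are reported as wall-clock timings.
        self._work()

    def peakmem_eval(self, scenario, udtf):
        # Methods prefixed with peakmem_ are reported as peak process memory.
        self._work()
```

The `-a "repeat=(3,5,5.0)"` flag in the command above overrides the `repeat` attribute from the command line, so the class-level value here only serves as a default.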

Was this patch authored or co-authored using generative AI tooling?

No

devin-petersohn

LGTM

Yicong-Huang added 2 commits May 4, 2026 18:21

test: add ASV microbenchmark for SQL_ARROW_TABLE_UDF

ff21b21

test: scale down SQL_ARROW_TABLE_UDF bench scenarios for stable repeat=3

85c1b89

devin-petersohn approved these changes May 5, 2026

Yicong-Huang wants to merge 2 commits into apache:master from Yicong-Huang:SPARK-56717
