[SPARK-56717][PYTHON][TESTS] Add ASV microbenchmark for SQL_ARROW_TABLE_UDF #55673

Open
Yicong-Huang wants to merge 2 commits into apache:master from Yicong-Huang:SPARK-56717

@Yicong-Huang Yicong-Huang commented May 4, 2026

What changes were proposed in this pull request?

Add an ASV microbenchmark for the SQL_ARROW_TABLE_UDF eval type (Python UDTF with useArrow=True) to bench_eval_type.py.

The new benchmark drives the worker through the UDTF wire protocol, which differs from the SQL_*_UDF protocol: there is no num_udfs, num_chained, or result_id. Instead, the payload carries num_partition_child_indexes, an optional pickled AnalyzeResult, the handler class, the return-type JSON, and the UDTF name. The benchmark also threads input_type through EvalConf so that the non-legacy Arrow code path is exercised.

Supporting changes in MockProtocolWriter:

  • write_worker_input accepts an optional eval_conf dict alongside runner_conf.
  • New write_udtf_payload for the UDTF-specific command frame.
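To make the shape of the UDTF command frame concrete, here is a minimal sketch of a writer that emits the fields listed above in that order. It is not the code from this PR: the function name `write_udtf_payload_sketch`, the 4-byte big-endian length prefixes, the integer flag for the optional AnalyzeResult, and the exact field order are all assumptions made for illustration. The handler and AnalyzeResult are passed as already-serialized bytes to keep the example self-contained.

```python
import io
import struct


def write_int(value, out):
    # Assumed framing: 4-byte big-endian signed integer.
    out.write(struct.pack("!i", value))


def write_bytes(data, out):
    # Assumed framing: length prefix followed by the raw bytes.
    write_int(len(data), out)
    out.write(data)


def write_udtf_payload_sketch(
    out,
    partition_child_indexes,
    analyze_result_bytes,  # None when no AnalyzeResult is sent
    handler_bytes,         # handler class, already serialized by the caller
    return_type_json,
    udtf_name,
):
    # Hypothetical layout; field order follows the PR description only.
    write_int(len(partition_child_indexes), out)
    for index in partition_child_indexes:
        write_int(index, out)

    # Optional pickled AnalyzeResult, guarded by an assumed presence flag.
    if analyze_result_bytes is None:
        write_int(0, out)
    else:
        write_int(1, out)
        write_bytes(analyze_result_bytes, out)

    write_bytes(handler_bytes, out)
    write_bytes(return_type_json.encode("utf-8"), out)
    write_bytes(udtf_name.encode("utf-8"), out)


buf = io.BytesIO()
write_udtf_payload_sketch(
    buf,
    partition_child_indexes=[],
    analyze_result_bytes=None,
    handler_bytes=b"<serialized handler>",
    return_type_json='{"type": "struct", "fields": []}',
    udtf_name="identity_udtf",
)
frame = buf.getvalue()
```

Note that nothing in this frame corresponds to num_udfs, num_chained, or result_id, which is the difference from the SQL_*_UDF protocol that the description calls out.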

UDTFs covered (input rows -> output rows): identity_udtf (1->1), explode_udtf (1->3), filter_udtf (1->0/1), stringify_udtf (1->1, type change). Row counts are scaled down relative to the SQL_ARROW_BATCHED_UDF benchmark because, on this path, the worker calls LocalDataToArrowConversion.convert once per input row.
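For readers unfamiliar with the four shapes, the following sketch shows what handler classes with these row fan-outs could look like. These are plain Python classes with a generator `eval` method, not the definitions from bench_eval_type.py. The class names are hypothetical, and the filter predicate (keep rows whose first column is an even integer) is an assumption, since the description does not say what filter_udtf tests.

```python
class IdentityUDTF:
    def eval(self, *cols):
        # 1 -> 1: emit the input row unchanged.
        yield cols


class ExplodeUDTF:
    def eval(self, *cols):
        # 1 -> 3: emit the same row three times.
        for _ in range(3):
            yield cols


class FilterUDTF:
    def eval(self, *cols):
        # 1 -> 0/1: assumed predicate, keeps rows with an even first column.
        if cols[0] % 2 == 0:
            yield cols


class StringifyUDTF:
    def eval(self, *cols):
        # 1 -> 1 with a type change: every value becomes a string.
        yield tuple(str(c) for c in cols)


row = (4, 7)
identity_rows = list(IdentityUDTF().eval(*row))
explode_rows = list(ExplodeUDTF().eval(*row))
kept_rows = list(FilterUDTF().eval(*row))
dropped_rows = list(FilterUDTF().eval(3, 7))
string_rows = list(StringifyUDTF().eval(*row))
```

In a real benchmark these classes would presumably be registered as UDTFs with a declared return type; that step is omitted here.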

Why are the changes needed?

Part of SPARK-55724. Establishes a performance baseline before refactoring SQL_ARROW_TABLE_UDF.

Does this PR introduce any user-facing change?

No

How was this patch tested?

COLUMNS=120 ./python/asv run --python=same --bench "ArrowTableUDF" -a "repeat=(3,5,5.0)" (one of two stable runs):

ArrowTableUDFTimeBench:

=================== =============== ============== ============= ================
--                                              udtf
------------------- -------------------------------------------------------------
     scenario        identity_udtf   explode_udtf   filter_udtf   stringify_udtf
=================== =============== ============== ============= ================
 sm_batch_few_col       158+/-1ms      159+/-1ms      121+/-2ms     156+/-0.7ms
 sm_batch_many_col     52.6+/-0.2ms    54.3+/-2ms     44.4+/-2ms    53.1+/-0.7ms
 lg_batch_few_col       393+/-2ms      400+/-7ms      303+/-1ms      390+/-1ms
 lg_batch_many_col      208+/-1ms      215+/-6ms     172+/-0.9ms     208+/-3ms
 pure_ints              395+/-2ms      404+/-3ms      309+/-2ms      395+/-3ms
 pure_strings           412+/-2ms      420+/-7ms      328+/-2ms      411+/-3ms
=================== =============== ============== ============= ================

ArrowTableUDFPeakmemBench:

=================== =============== ============== ============= ================
--                                              udtf
------------------- -------------------------------------------------------------
     scenario        identity_udtf   explode_udtf   filter_udtf   stringify_udtf
=================== =============== ============== ============= ================
 sm_batch_few_col        464M           464M           464M           464M
 sm_batch_many_col       464M           464M           464M           464M
 lg_batch_few_col        466M           466M           465M           466M
 lg_batch_many_col       468M           468M           468M           468M
 pure_ints               466M           466M           466M           466M
 pure_strings            466M           467M           466M           466M
=================== =============== ============== ============= ================
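The two tables above are what ASV prints for a benchmark parameterized over two axes. As a rough sketch of how such a grid is declared, the classes below use ASV's `params`, `param_names`, `setup`, and the `time_` / `peakmem_` method prefixes. The class names, the shared base class, and the placeholder workload are hypothetical. The real classes in bench_eval_type.py drive the Python worker and are not shown in this description.

```python
import itertools

SCENARIOS = [
    "sm_batch_few_col",
    "sm_batch_many_col",
    "lg_batch_few_col",
    "lg_batch_many_col",
    "pure_ints",
    "pure_strings",
]
UDTFS = ["identity_udtf", "explode_udtf", "filter_udtf", "stringify_udtf"]


class _TableUDFBenchBase:
    # ASV runs each benchmark method once per combination of these params.
    params = [SCENARIOS, UDTFS]
    param_names = ["scenario", "udtf"]

    def setup(self, scenario, udtf):
        # Placeholder input; the real setup would build the worker payload.
        self.rows = [(i, str(i)) for i in range(100)]

    def _run(self):
        # Stand-in for feeding the payload to the worker.
        return [tuple(str(c) for c in row) for row in self.rows]


class TableUDFTimeBenchSketch(_TableUDFBenchBase):
    def time_worker(self, scenario, udtf):
        # The time_ prefix tells ASV to report wall-clock time.
        self._run()


class TableUDFPeakmemBenchSketch(_TableUDFBenchBase):
    def peakmem_worker(self, scenario, udtf):
        # The peakmem_ prefix tells ASV to report peak resident memory.
        self._run()


# Each table cell corresponds to one (scenario, udtf) combination.
grid = list(itertools.product(*_TableUDFBenchBase.params))
```

With 6 scenarios and 4 UDTFs, the grid has 24 cells per table, matching the 6 x 4 layout of the reported results.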

Was this patch authored or co-authored using generative AI tooling?

No

@devin-petersohn devin-petersohn left a comment

LGTM
