Add end-to-end Arrow Flight benchmark with CI dashboard integration #5556

@Yicong-Huang

Description

Task Summary

Add a real end-to-end micro-benchmark of the Arrow Flight data path. The benchmark spawns a live PythonWorkflowWorker actor (a real Pekko actor, a real Python subprocess started via texera_run_python_worker.py, and real Arrow Flight gRPC transport) wired to an identity Python UDF. It sweeps a 36-config grid (batch_size × schema_width × string_len) and reports per-batch send→echo latency percentiles plus throughput.

Wire the results into the github-action-benchmark dashboard through a bench-agnostic Benchmarks workflow and a bin/run-benchmarks.sh entry point, so that future bench suites (e.g. JMH for ArrowUtils.fromTexeraSchema / appendTexeraTuple) can plug in without touching CI.
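
To make the reporting side concrete, here is a minimal Python sketch of how the grid sweep and the per-config summary could be shaped. It is an illustration only, not the proposed implementation: the axis values (4 × 3 × 3 = 36), the function names, the entry naming scheme, and the nearest-rank percentile method are all assumptions, and the measured latencies would come from the real worker round trip rather than from this code. The output shape (a list of `name` / `unit` / `value` objects) follows the custom JSON input that github-action-benchmark accepts for its `customSmallerIsBetter` tool.

```python
import itertools
import json
import math

# Hypothetical axis values. The issue only names the three axes and the
# total of 36 configs; these particular numbers are placeholders.
BATCH_SIZES = [100, 1_000, 10_000, 100_000]
SCHEMA_WIDTHS = [1, 10, 50]
STRING_LENS = [8, 64, 512]


def config_grid():
    """Cartesian product of the three axes, one dict per config."""
    return [
        {"batch_size": b, "schema_width": w, "string_len": s}
        for b, w, s in itertools.product(BATCH_SIZES, SCHEMA_WIDTHS, STRING_LENS)
    ]


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list (p in 1..100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(config, latencies_ms, wall_seconds):
    """Summarize one config: latency percentiles plus tuples per second.

    latencies_ms holds one send-to-echo latency per batch; wall_seconds is
    the elapsed time for the whole run of that config.
    """
    ordered = sorted(latencies_ms)
    total_tuples = config["batch_size"] * len(latencies_ms)
    return {
        "p50_ms": percentile(ordered, 50),
        "p99_ms": percentile(ordered, 99),
        "tuples_per_sec": total_tuples / wall_seconds,
    }


def to_dashboard_entries(config, summary):
    """Latency entries in the name/unit/value shape (smaller is better)."""
    label = "flight_e2e/batch={batch_size},width={schema_width},strlen={string_len}".format(**config)
    return [
        {"name": f"{label}/p50", "unit": "ms", "value": summary["p50_ms"]},
        {"name": f"{label}/p99", "unit": "ms", "value": summary["p99_ms"]},
    ]


if __name__ == "__main__":
    cfg = config_grid()[0]
    fake_latencies = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]  # made-up samples
    result = summarize(cfg, fake_latencies, wall_seconds=2.0)
    print(json.dumps(to_dashboard_entries(cfg, result), indent=2))
```

Throughput is a bigger-is-better metric, so in this sketch it is kept in the summary but left out of the smaller-is-better entries; it would likely need its own output file or a separate dashboard series.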

Task Type

  • DevOps / Deployment / CI
  • Performance
