feat: improve tpch benchmark CLI #1391

andygrove · 2026-01-17T21:17:13Z

Summary

Make the --query parameter optional; all 22 TPC-H queries run when it is not specified
Only print SQL queries when --debug flag is enabled
Write a single JSON output file for the entire benchmark run (instead of one per query)
Fix parquet file path resolution for datafusion benchmarks
Simplify output when --iterations 1 (no iteration number, no average)
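
For reviewers who want a picture of the last item, here is a minimal sketch of how the per-query output could branch on the iteration count. It is not the code in this PR: `format_timing` and `format_average` are hypothetical helper names, and the wording of the multi-iteration and average lines is an assumption. Only the single-iteration line matches the logs posted in this thread.

```rust
/// Hypothetical helper (not from the PR): formats the timing line for one run.
/// With a single iteration, the iteration index is left out.
fn format_timing(
    query: usize,
    iteration: usize,
    iterations: usize,
    elapsed_ms: f64,
    rows: usize,
) -> String {
    if iterations == 1 {
        format!("Query {query} took {elapsed_ms:.1} ms and returned {rows} rows")
    } else {
        // Assumed wording for the multi-iteration case.
        format!(
            "Query {query} iteration {iteration} took {elapsed_ms:.1} ms and returned {rows} rows"
        )
    }
}

/// Hypothetical helper (not from the PR): returns an average line only when
/// more than one iteration ran, since averaging a single sample adds nothing.
fn format_average(query: usize, timings_ms: &[f64]) -> Option<String> {
    if timings_ms.len() <= 1 {
        return None;
    }
    let avg = timings_ms.iter().sum::<f64>() / timings_ms.len() as f64;
    // Assumed wording for the average line.
    Some(format!("Query {query} avg time: {avg:.2} ms"))
}
```

Returning `Option<String>` from `format_average` keeps the "no average" rule in one place, so the caller only prints when a value is present.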

Example usage

Run a single query:

cargo run --release --bin tpch -- benchmark datafusion --path ./data --format parquet --query 1

Run all 22 queries:

cargo run --release --bin tpch -- benchmark datafusion --path ./data --format parquet
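
The two invocations above differ only in whether --query is present. A rough sketch of how an optional query number might be turned into the list of queries to run is below. `queries_to_run` and the two constants are hypothetical names used for illustration, and the range check is an assumption; the PR may validate the argument differently or not at all.

```rust
// Hypothetical constants: TPC-H defines queries 1 through 22.
const FIRST_QUERY: usize = 1;
const LAST_QUERY: usize = 22;

/// Hypothetical helper (not from the PR): resolves the optional --query value
/// into the ordered list of query numbers to execute.
fn queries_to_run(query: Option<usize>) -> Result<Vec<usize>, String> {
    match query {
        // A valid single query was requested.
        Some(q) if (FIRST_QUERY..=LAST_QUERY).contains(&q) => Ok(vec![q]),
        // Out-of-range values are rejected (assumed behavior).
        Some(q) => Err(format!(
            "invalid query number {q}: expected {FIRST_QUERY}..={LAST_QUERY}"
        )),
        // No --query given: run every query in order.
        None => Ok((FIRST_QUERY..=LAST_QUERY).collect()),
    }
}
```

With this shape, `queries_to_run(Some(1))` yields a one-element list and `queries_to_run(None)` yields 1 through 22, so the benchmark loop can iterate over the result without caring which form was used.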

Test plan

Code compiles with cargo check -p ballista-benchmarks
cargo fmt and cargo clippy pass
Manual testing with TPC-H data (both parquet and tbl formats)

🤖 Generated with Claude Code

andygrove · 2026-01-17T21:32:47Z

Ballista run:

$ cargo run --release --bin tpch benchmark ballista --host localhost --port 50050 --path $(pwd)/data --format parquet --iterations 1 --output .
    Finished `release` profile [optimized] target(s) in 0.16s
     Running `/home/andy/git/apache/datafusion-ballista/target/release/tpch benchmark ballista --host localhost --port 50050 --path /home/andy/git/apache/datafusion-ballista/benchmarks/data --format parquet --iterations 1 --output .`
Running benchmarks with the following options: BallistaBenchmarkOpt { query: None, debug: false, expected_results: None, iterations: 1, batch_size: 8192, path: "/home/andy/git/apache/datafusion-ballista/benchmarks/data", file_format: "parquet", partitions: 2, host: Some("localhost"), port: Some(50050), output_path: Some(".") }
Query 1 took 816.9 ms and returned 4 rows
Query 2 took 1425.0 ms and returned 100 rows
Query 3 took 1018.5 ms and returned 10 rows
Query 4 took 815.6 ms and returned 5 rows
Query 5 took 1625.8 ms and returned 5 rows
Query 6 took 408.3 ms and returned 1 rows
Query 7 took 1830.3 ms and returned 4 rows
Query 8 took 2236.7 ms and returned 2 rows
Query 9 took 1829.2 ms and returned 175 rows
Query 10 took 1220.9 ms and returned 20 rows
Query 11 took 1018.1 ms and returned 1048 rows
Query 12 took 812.4 ms and returned 2 rows
Query 13 took 916.4 ms and returned 42 rows
Query 14 took 611.3 ms and returned 1 rows
Query 15 took 1019.0 ms and returned 0 rows
Query 16 took 1220.8 ms and returned 18314 rows
Query 17 took 715.1 ms and returned 1 rows
Query 18 took 1325.1 ms and returned 57 rows
Query 19 took 716.4 ms and returned 1 rows
Query 20 took 1019.5 ms and returned 186 rows
Query 21 took 1628.9 ms and returned 100 rows
Query 22 took 816.5 ms and returned 7 rows
Writing summary file to ./tpch-1768685522.json

DataFusion run:

$ cargo run --release --bin tpch benchmark datafusion --path $(pwd)/data --format parquet --iterations 1 --output .
   Compiling ballista-benchmarks v51.0.0 (/home/andy/git/apache/datafusion-ballista/benchmarks)
    Finished `release` profile [optimized] target(s) in 9.28s
     Running `/home/andy/git/apache/datafusion-ballista/target/release/tpch benchmark datafusion --path /home/andy/git/apache/datafusion-ballista/benchmarks/data --format parquet --iterations 1 --output .`
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: None, debug: false, iterations: 1, partitions: 2, batch_size: 8192, path: "/home/andy/git/apache/datafusion-ballista/benchmarks/data", file_format: "parquet", mem_table: false, output_path: Some(".") }
Query 1 took 222.8 ms and returned 4 rows
Query 2 took 45.5 ms and returned 100 rows
Query 3 took 76.7 ms and returned 10 rows
Query 4 took 101.8 ms and returned 5 rows
Query 5 took 111.8 ms and returned 5 rows
Query 6 took 56.4 ms and returned 1 rows
Query 7 took 162.9 ms and returned 4 rows
Query 8 took 98.8 ms and returned 2 rows
Query 9 took 151.0 ms and returned 175 rows
Query 10 took 116.8 ms and returned 20 rows
Query 11 took 24.9 ms and returned 1048 rows
Query 12 took 90.8 ms and returned 2 rows
Query 13 took 148.1 ms and returned 42 rows
Query 14 took 58.3 ms and returned 1 rows
Query 15 took 86.6 ms and returned 0 rows
Query 16 took 34.6 ms and returned 18314 rows
Query 17 took 160.8 ms and returned 1 rows
Query 18 took 286.1 ms and returned 57 rows
Query 19 took 117.3 ms and returned 1 rows
Query 20 took 78.6 ms and returned 186 rows
Query 21 took 163.9 ms and returned 100 rows
Query 22 took 39.3 ms and returned 7 rows
Writing summary file to ./tpch-1768685732.json
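
Both runs end by writing one summary file named tpch-<number>.json under the --output directory. The numbers in the logs look like Unix timestamps in seconds, so a sketch of how such a path could be built follows. This is an inference from the log lines, not a description of the PR's code; `summary_path` and `now_unix_secs` are hypothetical names.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper (not from the PR): builds the summary file path for a
/// whole benchmark run from the output directory and a Unix timestamp.
fn summary_path(output_dir: &Path, unix_secs: u64) -> PathBuf {
    output_dir.join(format!("tpch-{unix_secs}.json"))
}

/// Hypothetical helper: seconds since the Unix epoch, falling back to 0 if the
/// system clock reports a time before the epoch.
fn now_unix_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

Taking the timestamp as a parameter rather than reading the clock inside `summary_path` keeps the path logic easy to test; a caller would pass `now_unix_secs()` once per benchmark run.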

When running the tpch benchmark, the --query parameter is now optional. If not specified, all 22 TPC-H queries will be run sequentially.

Changes:
- Make --query optional for both datafusion and ballista benchmarks
- Run all 22 queries when --query is not specified
- Only print SQL queries when --debug flag is enabled
- Write a single JSON output file for the entire benchmark run
- Fix parquet file path resolution for datafusion benchmarks
- Simplify output when iterations=1 (no iteration number, no average)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

andygrove force-pushed the optional-query-param branch 5 times, most recently from a5e4d34 to f408775 January 17, 2026 21:28

andygrove changed the title ~~feat: make query parameter optional in tpch benchmark~~ feat: improve tpch benchmark CLI Jan 17, 2026

andygrove force-pushed the optional-query-param branch from f408775 to c3b9c01 January 17, 2026 21:35

andygrove requested a review from milenkovicm January 17, 2026 21:36

milenkovicm approved these changes Jan 17, 2026

milenkovicm merged commit 34f7513 into apache:main Jan 17, 2026
15 checks passed

milenkovicm mentioned this pull request Jan 18, 2026

test tpch queries in distributed setup #1384

Draft

andygrove mentioned this pull request Jan 20, 2026

feat: Add batch coalescing ability to shuffle reader exec #1380

Merged
