[VL] Example workload for benchmarking Gluten + Delta on TPC-DS datasets #10614

Merged
zhztheplayer merged 10 commits into apache:main from zhztheplayer:wip-delta-ds-workload
Sep 15, 2025

@zhztheplayer zhztheplayer commented Sep 3, 2025

This PR adds an example workload under tools/workload/tpcds-delta.

The table generator uses gluten-it.

The workload is a simple TPC-DS query benchmark on Delta tables; it does not cover DML commands (delete, merge, insert, etc.).

Following the practice adopted for the other workload examples in tools/workload, the scripts must be edited manually before they can be run.

Example modifications:

...
GLUTEN_JAR=/opt/code/incubator-gluten/package/target/gluten-velox-bundle-spark3.4_2.12-linux_aarch64-1.6.0-SNAPSHOT.jar
DELTA_JARS=/root/.m2/repository/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar:/root/.m2/repository/io/delta/delta-storage/2.4.0/delta-storage-2.4.0.jar
SPARK_HOME=/opt/programs/spark-3.4.4-bin-hadoop3/
...
...
var delta_table_path = "/tmp/my-data/tpcds-generated-20.0-delta-partitioned"
var gluten_root = "/opt/code/incubator-gluten"

// File root path: file://, hdfs:// , s3 , ...
// e.g. hdfs://hostname:8020
var delta_file_root = "file://"
...
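
Because the paths above are edited by hand, a quick pre-flight check can catch a mistyped jar location before Spark starts. The following is a minimal sketch and is not part of this PR; the function name `check_jars` is hypothetical, and it assumes the jar list is colon-separated as in `DELTA_JARS` above.

```bash
# Hypothetical helper (not part of the PR): verify that every jar in a
# colon-separated list exists as a regular file.
# Prints each missing path to stderr and returns 1 if any is missing.
check_jars() {
  local missing=0 jar
  local IFS=':'            # split the argument on ':' like a classpath
  for jar in $1; do
    if [ ! -f "$jar" ]; then
      echo "missing jar: $jar" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the variables from the example set, it could be called as `check_jars "${GLUTEN_JAR}:${DELTA_JARS}"` before launching the workload script.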

@zhztheplayer zhztheplayer changed the title [VL] Example workload for benchmarking Gluten + Delta on TPC datasets [VL] Example workload for benchmarking Gluten + Delta on TPC-DS datasets Sep 3, 2025
@zhztheplayer zhztheplayer marked this pull request as draft September 3, 2025 09:40
@zhztheplayer zhztheplayer marked this pull request as ready for review September 4, 2025 08:30
Comment thread tools/workload/tpcds-delta/README.md Outdated

```bash
cd ${GLUTEN_HOME}/tools/gluten-it/
mvn clean install -D spark-3.4,delta
```

Contributor

Maybe -P

Member Author

Should be -P. Thanks for catching.
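
For context on the fix agreed above: in Maven, `-P` activates build profiles, while `-D` defines a system property, so the profiles `spark-3.4` and `delta` would not be activated by the original flag. A small sketch that prints the corrected command rather than running it; the function name `gluten_it_build_cmd` is hypothetical and only for illustration.

```bash
# Hypothetical helper: print (do not run) the gluten-it build command
# with the profile flag corrected from -D to -P.
gluten_it_build_cmd() {
  local profiles="$1"   # comma-separated Maven profile ids
  echo "mvn clean install -P ${profiles}"
}

gluten_it_build_cmd "spark-3.4,delta"
```

The printed command is meant to be run from `${GLUTEN_HOME}/tools/gluten-it/`, as in the README snippet under review.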

@zhztheplayer zhztheplayer merged commit 7a7b93c into apache:main Sep 15, 2025
3 checks passed

2 participants