From 8e45c179dad36bff1f195c3a287f7751ad14b0f4 Mon Sep 17 00:00:00 2001
From: Nihal Rajak
Date: Thu, 25 Sep 2025 19:38:46 +0530
Subject: [PATCH 1/3] docs: add Ballista link to landing page (#17746)

This adds a link and description for DataFusion Ballista to the landing page, as suggested in issue #17746. Ballista is a distributed compute platform built on top of DataFusion.

Closes: #17746
---
 docs/source/index.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index edc6d311ceee..4815f5bd1c49 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -51,12 +51,15 @@ The following related subprojects target end users and have separate documentati
   queries.
 - `DataFusion Comet `_ is an accelerator for Apache Spark based on DataFusion.
+- `DataFusion Ballista `_ is a distributed compute platform built on
+  top of DataFusion for scalable execution.

 "Out of the box," DataFusion offers `SQL `_ and `Dataframe `_ APIs, excellent `performance `_, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. `Python Bindings `_ are also available.
+A distributed compute platform is also available via `Ballista `_.

 DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can

From 002eaac3529747736e72866634341790b24984f1 Mon Sep 17 00:00:00 2001
From: Nihal Rajak
Date: Thu, 25 Sep 2025 22:50:16 +0530
Subject: [PATCH 2/3] fix(docs): update Ballista link

---
 docs/source/index.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 4815f5bd1c49..a6456f5a5e14 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -51,7 +51,7 @@ The following related subprojects target end users and have separate documentati
   queries.
 - `DataFusion Comet `_ is an accelerator for Apache Spark based on DataFusion.
-- `DataFusion Ballista `_ is a distributed compute platform built on
+- `DataFusion Ballista `_ is a distributed compute platform built on
   top of DataFusion for scalable execution.

 "Out of the box," DataFusion offers `SQL `_
@@ -59,7 +59,7 @@ and `Dataframe `_ APIs, excellent
 `performance `_, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. `Python Bindings `_ are also available.
-A distributed compute platform is also available via `Ballista `_.
+A distributed compute platform is also available via `Ballista `_.

 DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can
@@ -166,6 +166,6 @@ To get started, see
    :maxdepth: 1
    :caption: DataFusion Subprojects

-   DataFusion Ballista
+   DataFusion Ballista
    DataFusion Comet
    DataFusion Python

From 485f935c77f2196160f7661e53cbca2484da9a83 Mon Sep 17 00:00:00 2001
From: Nihal Rajak
Date: Fri, 26 Sep 2025 07:06:09 +0530
Subject: [PATCH 3/3] updated theory part

---
 docs/source/index.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index a6456f5a5e14..574c285b0e65 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -51,15 +51,14 @@ The following related subprojects target end users and have separate documentati
   queries.
 - `DataFusion Comet `_ is an accelerator for Apache Spark based on DataFusion.
-- `DataFusion Ballista `_ is a distributed compute platform built on
-  top of DataFusion for scalable execution.
+- `DataFusion Ballista `_ is a distributed processing extension for DataFusion.

 "Out of the box," DataFusion offers `SQL `_ and `Dataframe `_ APIs, excellent `performance `_, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community. `Python Bindings `_ are also available.
-A distributed compute platform is also available via `Ballista `_.
+`Ballista `_ is an Apache DataFusion extension enabling the parallelized execution of workloads across multiple nodes in a distributed environment.

 DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can