User Guide improvements #274

Merged: 8 commits, Sep 29, 2022
38 changes: 21 additions & 17 deletions docs/README.md
@@ -17,16 +17,20 @@
under the License.
-->

# Developer Documentation
# Ballista Documentation

Developer documentation can be found [here](developer/README.md).
User documentation can be found [here](source/user-guide/introduction.md).
## User Documentation

Documentation for the current published release can be found at https://arrow.apache.org/ballista and the source
content is located [here](source/user-guide/introduction.md).

# User Documentation
## Developer Documentation

Developer documentation can be found [here](developer/README.md).

_These instructions were forked from the `arrow-datafusion` repository and are outdated_
## Building the User Guide

## Dependencies
### Dependencies

It's recommended to install build dependencies and build the documentation
inside a Python virtualenv.
@@ -38,21 +42,21 @@ inside a Python virtualenv.
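
A rough sketch of that setup, assuming the `docs` folder provides a `requirements.txt` (the file name is an assumption and is not shown in this diff):

```bash
# Hypothetical setup sketch -- the requirements file name is an assumption.
cd docs
python3 -m venv venv               # create an isolated environment
source venv/bin/activate           # activate it
pip install -r requirements.txt    # install the documentation build dependencies
```
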
## Build

```bash
make html
./build.sh
```

## Release

The documentation is served through the
[arrow-site](https://github.com/apache/arrow-site/) repo. To release a new
version of the docs, follow these steps:
The documentation is served through the [arrow-site](https://github.com/apache/arrow-site/) repository. To release
a new version of the documentation, follow these steps (a shell sketch follows the list):

1. Run `make html` inside `docs` folder to generate the docs website inside the `build/html` folder.
2. Clone the arrow-site repo
3. Checkout to the `asf-site` branch (NOT `master`)
4. Copy build artifacts into `arrow-site` repo's `datafusion` folder with a command such as
1. Download the release source tarball (we can only publish documentation from official releases)
2. Run `./build.sh` inside `docs` folder to generate the docs website inside the `build/html` folder.
3. Clone the arrow-site repo
4. Checkout to the `asf-site` branch (NOT `master`)
5. Copy build artifacts into `arrow-site` repo's `ballista` folder with a command such as

- `cp -rT ./build/html/ ../../arrow-site/datafusion/` (doesn't work on mac)
- `rsync -avzr ./build/html/ ../../arrow-site/datafusion/`
- `cp -rT ./build/html/ ../../arrow-site/ballista/` (doesn't work on mac)
- `rsync -avzr ./build/html/ ../../arrow-site/ballista/`

5. Commit changes in `arrow-site` and send a PR.
6. Commit changes in `arrow-site` and send a PR.
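
As a rough shell sketch of steps 2 through 6 (the relative paths mirror the bullets above and assume `arrow-site` is checked out two levels up from `docs`; the commit message is illustrative):

```bash
# Sketch only -- repository layout and commit message are assumptions.
cd docs
./build.sh                                                            # step 2: generates build/html
git clone https://github.com/apache/arrow-site.git ../../arrow-site   # step 3
git -C ../../arrow-site checkout asf-site                             # step 4: NOT master
rsync -avzr ./build/html/ ../../arrow-site/ballista/                  # step 5
git -C ../../arrow-site add ballista                                  # step 6: commit, then open a PR
git -C ../../arrow-site commit -m "Update Ballista documentation"
```
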
21 changes: 21 additions & 0 deletions docs/build.sh
@@ -0,0 +1,21 @@
#!/bin/bash

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

rm -rf build
make html
27 changes: 22 additions & 5 deletions docs/source/index.rst
@@ -29,11 +29,28 @@ Table of content
:maxdepth: 1
:caption: User Guide

user-guide/introduction
user-guide/deployment/index
user-guide/python
user-guide/rust
user-guide/cli
Introduction <user-guide/introduction>

.. toctree::
:maxdepth: 1
:caption: Cluster Deployment

Deployment <user-guide/deployment/index>
Scheduler <user-guide/scheduler>

.. toctree::
:maxdepth: 1
:caption: Clients

Python <user-guide/python>
Rust <user-guide/rust>
Flight SQL JDBC <user-guide/flightsql>
SQL CLI <user-guide/cli>

.. toctree::
:maxdepth: 1
:caption: Reference

user-guide/configs
user-guide/tuning-guide
user-guide/faq
69 changes: 30 additions & 39 deletions docs/source/user-guide/cli.md
@@ -17,27 +17,35 @@
under the License.
-->

# DataFusion Command-line Interface
# Ballista Command-line Interface

The DataFusion CLI allows SQL queries to be executed by an in-process DataFusion context, or by a distributed
Ballista context.
The Ballista CLI allows SQL queries to be executed against a Ballista cluster, or in standalone mode in a single
process.

Use Cargo to install:

```bash
cargo install ballista-cli
```
USAGE:
datafusion-cli [FLAGS] [OPTIONS]

FLAGS:
-h, --help Prints help information
-q, --quiet Reduce printing other than the results and work quietly
-V, --version Prints version information
## Usage

```
USAGE:
ballista-cli [OPTIONS]

OPTIONS:
-c, --batch-size <batch-size> The batch size of each query, or use DataFusion default
-p, --data-path <data-path> Path to your data, default to current directory
-f, --file <file>... Execute commands from file(s), then exit
--format <format> Output format [default: table] [possible values: csv, tsv, table, json, ndjson]
--host <host> Ballista scheduler host
--port <port> Ballista scheduler port
-c, --batch-size <BATCH_SIZE> The batch size of each query, or use DataFusion default
-f, --file <FILE>... Execute commands from file(s), then exit
--format <FORMAT> [default: table] [possible values: csv, tsv, table, json,
nd-json]
-h, --help Print help information
--host <HOST> Ballista scheduler host
-p, --data-path <DATA_PATH> Path to your data, default to current directory
--port <PORT> Ballista scheduler port
-q, --quiet Reduce printing other than the results and work quietly
-r, --rc <RC>... Run the provided files on startup instead of ~/.datafusionrc
-V, --version Print version information
```
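
For example, an illustrative invocation that combines several of the options above (the query file name is a placeholder):

```bash
# Run queries from a file against a remote scheduler and print JSON output -- illustrative only.
ballista-cli --host localhost --port 50050 --format json -f queries.sql
```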

## Example
@@ -48,10 +56,12 @@ Create a CSV file to query.
$ echo "1,2" > data.csv
```

## Run Ballista CLI in Standalone Mode

```bash
$ datafusion-cli
$ ballista-cli

DataFusion CLI v8.0.0
Ballista CLI v8.0.0

> CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
0 rows in set. Query took 0.001 seconds.
@@ -65,36 +75,17 @@ DataFusion CLI v8.0.0
1 row in set. Query took 0.017 seconds.
```

## DataFusion-Cli

Build the `datafusion-cli` without the feature of ballista.

```bash
cd arrow-datafusion/datafusion-cli
cargo build
```

## Ballista

The DataFusion CLI can also connect to a Ballista scheduler for query execution.

Before you use the `datafusion-cli` to connect the Ballista scheduler, you should build/compile
the `datafusion-cli` with feature of "ballista" first.

```bash
cd arrow-datafusion/datafusion-cli
cargo build --features ballista
```
## Run Ballista CLI in Distributed Mode

Then, you can connect the Ballista by below command.
The CLI can also connect to a Ballista scheduler for query execution.

```bash
ballista-cli --host localhost --port 50050
```

## Cli commands

Available commands inside DataFusion CLI are:
Available commands inside Ballista CLI are:

- Quit

4 changes: 2 additions & 2 deletions docs/source/user-guide/deployment/docker-compose.md
@@ -28,8 +28,8 @@ There is no officially published Docker image so it is currently necessary to bu
Run the following commands to clone the source repository and build the Docker image.

```bash
git clone git@github.com:apache/arrow-datafusion.git -b 8.0.0
cd arrow-datafusion
git clone git@github.com:apache/arrow-ballista.git -b 8.0.0
cd arrow-ballista
./dev/build-ballista-docker.sh
```

4 changes: 2 additions & 2 deletions docs/source/user-guide/deployment/docker.md
@@ -26,8 +26,8 @@ There is no officially published Docker image so it is currently necessary to bu
Run the following commands to clone the source repository and build the Docker image.

```bash
git clone git@github.com:apache/arrow-datafusion.git -b 8.0.0
cd arrow-datafusion
git clone git@github.com:apache/arrow-ballista.git -b 8.0.0
cd arrow-ballista
./dev/build-ballista-docker.sh
```

9 changes: 4 additions & 5 deletions docs/source/user-guide/deployment/index.rst
@@ -21,8 +21,7 @@ Start a Ballista Cluster
.. toctree::
:maxdepth: 2

cargo-install
docker
docker-compose
kubernetes
configuration
Cargo <cargo-install>
Docker <docker>
Docker Compose <docker-compose>
Kubernetes <kubernetes>
2 changes: 1 addition & 1 deletion docs/source/user-guide/flightsql.md
@@ -109,7 +109,7 @@ To register a table, find a `.csv`, `.json`, or `.parquet` file for testing, and

```sql
create external table customer stored as CSV with header row
location '/home/username/arrow-datafusion/datafusion/core/tests/tpch-csv/customer.csv';
location '/path/to/customer.csv';
```

Once the table has been registered, all the normal SQL queries can be performed:
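
For example, an illustrative query (the column names assume the TPC-H `customer` schema referenced by the original test file; adjust them to your data):

```sql
select c_custkey, c_name, c_acctbal from customer limit 10;
```
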
29 changes: 14 additions & 15 deletions docs/source/user-guide/introduction.md
@@ -19,16 +19,19 @@

# Overview

Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is
built on an architecture that allows other programming languages to be supported as first-class citizens without paying
a penalty for serialization costs.
Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow.

The foundational technologies in Ballista are:
Ballista has a scheduler and an executor process that are standard Rust executables and can be executed directly, but
Dockerfiles are provided to build images for use in containerized environments, such as Docker, Docker Compose, and
Kubernetes. See the [deployment guide](deployment.md) for more information.

- [Apache Arrow](https://arrow.apache.org/) memory model and compute kernels for efficient processing of data.
- [Apache Arrow Flight Protocol](https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/) for efficient data transfer between processes.
- [Google Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing query plans.
- [DataFusion](https://github.com/apache/arrow-datafusion/) for query execution.
SQL and DataFrame queries can be submitted from Python and Rust, and SQL queries can be submitted via the Arrow
Flight SQL JDBC driver, supporting your favorite JDBC-compliant tools such as [DataGrip][datagrip]
or [tableau][tableau]. For setup instructions, please see the [FlightSQL guide](flightsql.md).

The scheduler has a web user interface for monitoring query status as well as a REST API.

![Ballista Scheduler Web UI](./images/ballista-web-ui.png)

## How does this compare to Apache Spark?

@@ -45,10 +48,6 @@ Although Ballista is largely inspired by Apache Spark, there are some key differ
- The use of Apache Arrow as the memory model and network protocol means that data can be exchanged between executors
in any programming language with minimal serialization overhead.

## Status

Ballista is still in the early stages of development but is capable of executing complex analytical queries at scale.

## Usage

Ballista can be used from your favorite JDBC compliant tools such as [DataGrip](https://www.jetbrains.com/datagrip/) or [tableau](https://help.tableau.com/current/pro/desktop/en-us/examples_otherdatabases_jdbc.htm). For setup instructions, please see the [FlightSQL guide](flightsql.md).
[deployment]: ./deployment
[datagrip]: https://www.jetbrains.com/datagrip/
[tableau]: https://help.tableau.com/current/pro/desktop/en-us/examples_otherdatabases_jdbc.htm
47 changes: 45 additions & 2 deletions docs/source/user-guide/python.md
@@ -21,6 +21,9 @@

Ballista provides Python bindings, allowing SQL and DataFrame queries to be executed from the Python shell.

Like PySpark, it allows you to build a plan through SQL or a DataFrame API against Parquet, CSV, JSON, and other
popular file formats, run it in a distributed environment, and obtain the result back in Python.

## Connecting to a Cluster

The following code demonstrates how to create a Ballista context and connect to a scheduler.
@@ -30,7 +33,13 @@ The following code demonstrates how to create a Ballista context and connect to
>>> ctx = ballista.BallistaContext("localhost", 50050)
```

## Registering Tables
## SQL

The Python bindings support executing SQL queries as well.

### Registering Tables

Before SQL queries can be executed, tables need to be registered with the context.

Tables can be registered against the context by calling one of the `register` methods, or by executing SQL.

@@ -42,7 +51,7 @@ Tables can be registered against the context by calling one of the `register` me
>>> ctx.sql("CREATE EXTERNAL TABLE trips STORED AS PARQUET LOCATION '/mnt/bigdata/nyctaxi'")
```

## Executing Queries
### Executing Queries

The `sql` method creates a `DataFrame`. The query is executed when an action such as `show` or `collect` is executed.
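
For example, a minimal sketch assuming the `trips` table registered above:

```python
>>> # illustrative only -- assumes the `trips` table registered earlier in this guide
>>> df = ctx.sql("SELECT count(*) FROM trips")
>>> df.show()
```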

@@ -88,3 +97,37 @@ The `explain` method can be used to show the logical and physical query plans fo
| | |
+---------------+-------------------------------------------------------------+
```

## DataFrame

The following example demonstrates creating arrays with PyArrow and then creating a Ballista DataFrame.

```python
import ballista
import pyarrow

# an alias
f = ballista.functions

# create a context
ctx = ballista.BallistaContext("localhost", 50050)

# create a RecordBatch and a new DataFrame from it
batch = pyarrow.RecordBatch.from_arrays(
[pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])

# create a new statement
df = df.select(
f.col("a") + f.col("b"),
f.col("a") - f.col("b"),
)

# execute and collect the first (and only) batch
result = df.collect()[0]

assert result.column(0) == pyarrow.array([5, 7, 9])
assert result.column(1) == pyarrow.array([-3, -3, -3])
```
5 changes: 1 addition & 4 deletions docs/source/user-guide/rust.md
@@ -19,10 +19,7 @@

# Ballista Rust Client

Ballista usage is very similar to DataFusion. The main difference is that the starting point is a `BallistaContext`
instead of the DataFusion `SessionContext`. Ballista uses the same DataFrame API as DataFusion.

The following code sample demonstrates how to create a `BallistaContext` to connect to a Ballista scheduler process.
To connect to a Ballista cluster from Rust, start by creating a `BallistaContext`.

```rust
let config = BallistaConfig::builder()
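    // The rest of the original example is collapsed in this diff view. What follows is a
    // hedged sketch, not necessarily the author's code: it assumes the async
    // `BallistaContext::remote` constructor, the `ballista.shuffle.partitions` setting,
    // and that this snippet runs inside an async function (e.g. `#[tokio::main] async fn main`).
    .set("ballista.shuffle.partitions", "4")
    .build()?;

// Connect to a scheduler on localhost and run a trivial query to verify the connection.
let ctx = BallistaContext::remote("localhost", 50050, &config).await?;
let df = ctx.sql("SELECT 1").await?;
df.show().await?;
```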