From 53911f4230b12034e7ec75c9f8e893b78da1cbfc Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Wed, 13 May 2026 16:03:53 -0600
Subject: [PATCH] docs: remove project-status checklist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The checklist was already stale within a day of being added — the
SessionContextBuilder (#28), writeParquet (#27), and additional
DataFrame transformations (#30) shipped after it was written. Future
work will be tracked as GitHub issues instead.

Also refresh the two sentences that referenced it so they don't carry
stale claims forward: dataframe.md now lists the real current
transformations, and sessioncontext.md documents
SessionContext.builder() (which had no user-guide coverage).
---
 README.md                                |  2 +-
 docs/source/user-guide/dataframe.md      |  4 +-
 docs/source/user-guide/index.md          |  1 -
 docs/source/user-guide/project-status.md | 49 ------------------------
 docs/source/user-guide/sessioncontext.md | 24 +++++++++---
 5 files changed, 22 insertions(+), 58 deletions(-)
 delete mode 100644 docs/source/user-guide/project-status.md

diff --git a/README.md b/README.md
index 8085e8d..7c0caeb 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ and is built with Sphinx (see [`docs/README.md`](docs/README.md) for the
 build steps):
 
 - [User guide](docs/source/user-guide/index.md) — installation, the
-  DataFrame and SQL APIs, Parquet ingestion, project status.
+  DataFrame and SQL APIs, Parquet ingestion.
 - [Contributor guide](docs/source/contributor-guide/index.md) — build,
   test, code style, and how to bump the DataFusion version.
 
diff --git a/docs/source/user-guide/dataframe.md b/docs/source/user-guide/dataframe.md
index e91eab7..b019e96 100644
--- a/docs/source/user-guide/dataframe.md
+++ b/docs/source/user-guide/dataframe.md
@@ -35,8 +35,8 @@ not start until you pull results.
 
 ## DataFrame transformations
 
-The DataFrame API exposes `select` and `filter` today. Other
-transformations are TBD — see [Project status](project-status.md).
+The DataFrame API exposes `select`, `filter`, `limit`, `distinct`,
+`dropColumns`, and `withColumnRenamed`.
 
 ```java
 try (DataFrame df = ctx.readParquet("/path/to/orders.parquet")) {
diff --git a/docs/source/user-guide/index.md b/docs/source/user-guide/index.md
index 50e41d7..2f32499 100644
--- a/docs/source/user-guide/index.md
+++ b/docs/source/user-guide/index.md
@@ -37,7 +37,6 @@ sessioncontext
 dataframe
 parquet
 proto-plans
-project-status
 api-reference
 ```
 
diff --git a/docs/source/user-guide/project-status.md b/docs/source/user-guide/project-status.md
deleted file mode 100644
index 82154ea..0000000
--- a/docs/source/user-guide/project-status.md
+++ /dev/null
@@ -1,49 +0,0 @@
-
-
-# Project status
-
-A snapshot of what works today. The library is in early development; the
-API will change before the first release.
-
-## Query interfaces
-
-- [x] SQL: `SessionContext.sql(String)`
-- [x] DataFrame: `select`, `filter` (other transformations TBD)
-- [x] DataFusion-Proto `LogicalPlanNode`: `SessionContext.fromProto(byte[])`.
-  The `datafusion-proto` Java classes are generated by the build.
-
-## Data sources
-
-- [x] Parquet via `registerParquet` / `readParquet`, with `ParquetReadOptions`
-- [x] CSV via `registerCsv` / `readCsv`, with `CsvReadOptions`
-- [ ] JSON, Avro
-- [ ] Custom catalog and table providers
-
-## Results
-
-- [x] `DataFrame.collect(allocator)` — Arrow C Data Interface stream
-- [x] `DataFrame.count()`, `show()`, `show(int)`
-- [x] `SessionContext.tableSchema(String)`
-
-## Not yet
-
-- [ ] `SessionConfig` / `RuntimeEnv` knobs
-- [ ] Java UDFs
-- [ ] `write_*` outputs
diff --git a/docs/source/user-guide/sessioncontext.md b/docs/source/user-guide/sessioncontext.md
index 14111b8..818ee0e 100644
--- a/docs/source/user-guide/sessioncontext.md
+++ b/docs/source/user-guide/sessioncontext.md
@@ -40,9 +40,23 @@ A `SessionContext` is **not thread-safe**. Do not share one across
 threads without external synchronization. The simplest pattern is one
 context per thread.
 
-## What's configurable today
+## Configuration
+
+`SessionContext.builder()` exposes a fluent builder for overriding
+DataFusion defaults — batch size, target partitions, statistics
+collection, information schema, memory pool size, and the spill
+directory. See the
+
+SessionContextBuilder
+Javadoc for the full list.
 
-Today, `SessionContext` exposes only data-source registration and query
-construction. Tuning knobs that DataFusion offers natively
-(`SessionConfig`, `RuntimeEnv`) are not yet wired through the Java API.
-See [Project status](project-status.md) for the current shape of the API.
+```java
+try (SessionContext ctx = SessionContext.builder()
+        .batchSize(4096)
+        .targetPartitions(8)
+        .build()) {
+    // ...
+}
+```