[SPARK-57072][PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference #56116

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:spark-doc-methods-dev2

@zhengruifeng zhengruifeng commented May 26, 2026

What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

python/docs/source/reference/pyspark.sql/dataframe.rst:

  • DataFrame.zipWithIndex

python/docs/source/reference/pyspark.sql/datasource.rst:

  • DataSourceStreamReader.getDefaultReadLimit
  • DataSourceStreamReader.reportLatestOffset

python/docs/source/reference/pyspark.sql/io.rst:

  • DataFrameReader.changes

python/docs/source/reference/pyspark.ss/io.rst:

  • DataStreamReader.changes
  • DataStreamReader.name

Why are the changes needed?

All of the above are public, marked .. versionadded:: 4.2.0, and reachable through their respective public modules, but the autosummary entries were never added so they do not appear in the rendered API reference.

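Original JIRAs:

  • DataFrame.zipWithIndex: SPARK-55229 / SPARK-55231
  • DataSourceStreamReader.getDefaultReadLimit / reportLatestOffset: SPARK-55304
  • DataFrameReader.changes / DataStreamReader.changes: SPARK-55950
  • DataStreamReader.name: SPARK-55121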

Does this PR introduce any user-facing change?

Documentation-only change; the methods themselves are unchanged.

How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (DataFrame.zipWithIndex is appended after the existing trailing DataFrame.pandas_api since it is alphabetically last).

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

@zhengruifeng zhengruifeng changed the title [PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference [PYTHON][DOCS] Add missing 4.2 entries to PySpark API reference May 26, 2026
@zhengruifeng zhengruifeng changed the title [PYTHON][DOCS] Add missing 4.2 entries to PySpark API reference [SPARK-57072[PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference May 26, 2026
@zhengruifeng zhengruifeng changed the title [SPARK-57072[PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference [SPARK-57072][PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference May 26, 2026
@zhengruifeng zhengruifeng marked this pull request as ready for review May 26, 2026 10:52

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @zhengruifeng. Are these all of the missing instances? How do we verify that?

How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (DataFrame.zipWithIndex is appended after the existing trailing DataFrame.pandas_api since it is alphabetically last).

@zhengruifeng

I am asking AI to generate the candidates and then checking them manually.
I will send new PRs if I find more issues.
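
The candidate-generation step could also be scripted, which would make the verification question above repeatable. The following is only a rough sketch and is not part of this PR: `documented_names`, `missing_entries`, `FakeReader`, and the `RST` excerpt are hypothetical names invented for illustration, and the stand-in class replaces the real PySpark classes so the snippet runs without Spark installed. It assumes that a method counts as "added in 4.2" when its docstring contains `.. versionadded:: 4.2.0`, and that an entry is "documented" when it appears in the body of an `.. autosummary::` directive.

```python
import inspect
import re


def documented_names(rst_text):
    """Collect the entries listed under every ``.. autosummary::`` directive."""
    names = set()
    in_block = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. autosummary::"):
            in_block = True
            continue
        if not in_block:
            continue
        if not stripped or stripped.startswith(":"):
            # Skip blank lines and directive options such as ":toctree: api/".
            continue
        if not line.startswith((" ", "\t")):
            # An unindented line ends the directive body.
            in_block = False
            continue
        names.add(stripped)
    return names


def missing_entries(cls, version, rst_text):
    """Public callables of ``cls`` tagged with ``version`` but not listed in the RST."""
    listed = documented_names(rst_text)
    marker = re.compile(r"\.\. versionadded::\s*" + re.escape(version))
    missing = []
    for name, member in inspect.getmembers(cls, callable):
        if name.startswith("_"):
            continue  # private and dunder members are not part of the reference
        doc = inspect.getdoc(member) or ""
        qualified = f"{cls.__name__}.{name}"
        if marker.search(doc) and qualified not in listed:
            missing.append(qualified)
    return sorted(missing)


class FakeReader:  # hypothetical stand-in, NOT the real PySpark class
    def load(self):
        """Pretend loader.

        .. versionadded:: 2.0.0
        """

    def name(self):
        """Pretend source name.

        .. versionadded:: 4.2.0
        """

    def changes(self):
        """Pretend change feed.

        .. versionadded:: 4.2.0
        """


# Hypothetical excerpt of a reference page; the real files list more entries.
RST = """\
.. currentmodule:: pyspark.sql.streaming

.. autosummary::
    :toctree: api/

    FakeReader.load
    FakeReader.name
"""

print(missing_entries(FakeReader, "4.2.0", RST))  # ['FakeReader.changes']
```

Properties are not callables, so a check along these lines would not cover them, and a manual pass over the candidates would still be needed.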

Add three methods added in Spark 4.2 that were missing from the Python
API reference:

- DataFrame.zipWithIndex (SPARK-55229/SPARK-55231)
- DataSourceStreamReader.getDefaultReadLimit (SPARK-55304)
- DataSourceStreamReader.reportLatestOffset (SPARK-55304)

The methods themselves were shipped with .. versionadded:: 4.2.0 and are
exported from their respective public modules; only the autosummary
entries in reference/pyspark.sql/{dataframe,datasource}.rst were absent.
…ference

Add the changes() reader method (SPARK-55950, .. versionadded:: 4.2.0)
to the Python API reference for both the batch and streaming sides:

- DataFrameReader.changes -> reference/pyspark.sql/io.rst
- DataStreamReader.changes -> reference/pyspark.ss/io.rst
DataStreamReader.name (SPARK-55121, .. versionadded:: 4.2.0) is public
on pyspark.sql.streaming.DataStreamReader but was missing from
reference/pyspark.ss/io.rst. Insert alphabetically between load and
option.
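
As an illustration of the alphabetical placement described in these commit messages, here is a minimal sketch using Python's standard `bisect` module. The `insert_entry` helper and the two-item `existing` list are hypothetical; the real autosummary block contains more entries, and the sketch assumes the list passed in is already sorted.

```python
import bisect


def insert_entry(entries, new_entry):
    """Return a copy of ``entries`` with ``new_entry`` placed in sorted position.

    Assumes ``entries`` is already sorted with plain, case-sensitive string
    comparison; entries that are already present are not duplicated.
    """
    updated = list(entries)
    if new_entry not in updated:
        bisect.insort(updated, new_entry)
    return updated


# Hypothetical two-entry excerpt of the streaming reader block.
existing = [
    "DataStreamReader.load",
    "DataStreamReader.option",
]

print(insert_entry(existing, "DataStreamReader.name"))
# ['DataStreamReader.load', 'DataStreamReader.name', 'DataStreamReader.option']
```

Not every block is strictly sorted: per the PR description, `DataFrame.zipWithIndex` was appended after an existing trailing `DataFrame.pandas_api` entry, so a helper like this would only be a starting point.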
@zhengruifeng zhengruifeng force-pushed the spark-doc-methods-dev2 branch from c95af5b to 5a1dedf Compare May 27, 2026 03:33
zhengruifeng added a commit that referenced this pull request May 27, 2026
…erence

### What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

`python/docs/source/reference/pyspark.sql/dataframe.rst`:
- `DataFrame.zipWithIndex`

`python/docs/source/reference/pyspark.sql/datasource.rst`:
- `DataSourceStreamReader.getDefaultReadLimit`
- `DataSourceStreamReader.reportLatestOffset`

`python/docs/source/reference/pyspark.sql/io.rst`:
- `DataFrameReader.changes`

`python/docs/source/reference/pyspark.ss/io.rst`:
- `DataStreamReader.changes`
- `DataStreamReader.name`

### Why are the changes needed?

All of the above are public, marked `.. versionadded:: 4.2.0`, and reachable through their respective public modules, but the autosummary entries were never added so they do not appear in the rendered API reference.

Original JIRAs:
- `DataFrame.zipWithIndex` — SPARK-55229 / SPARK-55231
- `DataSourceStreamReader.getDefaultReadLimit` / `reportLatestOffset` — SPARK-55304
- `DataFrameReader.changes` / `DataStreamReader.changes` — SPARK-55950
- `DataStreamReader.name` — SPARK-55121

### Does this PR introduce _any_ user-facing change?

Documentation-only change; the methods themselves are unchanged.

### How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (`DataFrame.zipWithIndex` is appended after the existing trailing `DataFrame.pandas_api` since it is alphabetically last).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

Closes #56116 from zhengruifeng/spark-doc-methods-dev2.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 64a8b51)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
@zhengruifeng zhengruifeng deleted the spark-doc-methods-dev2 branch May 27, 2026 05:34
@zhengruifeng

Thanks all, merged to master/4.x/4.2.
