[SPARK-46776][PYTHON] Support pa.ChunkedArray columns in createDataFrame from pandas by Yicong-Huang · Pull Request #56157 · apache/spark

Yicong-Huang · 2026-05-27T21:34:12Z

What changes were proposed in this pull request?

pa.Array.from_pandas may return a pa.ChunkedArray (e.g. for string[pyarrow] dtype, or string data over 2 GB), which pa.RecordBatch.from_arrays rejects. Route the conversion through pa.Table.from_arrays(...).to_batches() instead, which accepts both Array and ChunkedArray and emits zero-copy RecordBatches aligned on a common chunk boundary.

Why are the changes needed?

Fix createDataFrame from pandas raising TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array. See SPARK-46776.

Does this PR introduce any user-facing change?

Yes. createDataFrame(pandas_df) no longer raises when a column is backed by a multi-chunk pyarrow array.

How was this patch tested?

New test ArrowTestsMixin.test_createDataFrame_pandas_chunked_array_backed, exercised on both classic and Spark Connect paths via the parity suite.

Was this patch authored or co-authored using generative AI tooling?

No

Yicong-Huang · 2026-05-28T17:27:53Z

cc @HyukjinKwon @zhengruifeng

…ame from pandas

### What changes were proposed in this pull request?

`pa.Array.from_pandas` may return a `pa.ChunkedArray` (e.g. for `string[pyarrow]` dtype, or string data over 2 GB), which `pa.RecordBatch.from_arrays` rejects. Route the conversion through `pa.Table.from_arrays(...).to_batches()` instead, which accepts both `Array` and `ChunkedArray` and emits zero-copy `RecordBatch`es aligned on a common chunk boundary.

### Why are the changes needed?

Fix `createDataFrame` from pandas raising `TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array`. See [SPARK-46776](https://issues.apache.org/jira/browse/SPARK-46776).

### Does this PR introduce _any_ user-facing change?

Yes. `createDataFrame(pandas_df)` no longer raises when a column is backed by a multi-chunk pyarrow array.

### How was this patch tested?

New test `ArrowTestsMixin.test_createDataFrame_pandas_chunked_array_backed`, exercised on both classic and Spark Connect paths via the parity suite.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #56157 from Yicong-Huang/SPARK-46776.

Authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
(cherry picked from commit 2f4ed64)
Signed-off-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>

Yicong-Huang · 2026-05-28T23:56:57Z

Merged to master and branch-4.x.
Thanks @HyukjinKwon!

zhengruifeng · 2026-05-29T01:23:34Z

late lgtm

fix: support pa.ChunkedArray columns in createDataFrame from pandas

4536152

Yicong-Huang force-pushed the SPARK-46776 branch from 3aac97e to 4536152 on May 27, 2026 22:22

HyukjinKwon approved these changes May 28, 2026

Yicong-Huang closed this in 2f4ed64 May 28, 2026

Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-46776

3 participants
