[SPARK-42639][CONNECT] Add createDataFrame/createDataset methods #40242

hvanhovell · 2023-03-02T00:26:02Z

What changes were proposed in this pull request?

This PR adds all the SparkSession.createDataFrame(..) and SparkSession.createDataset(..) methods that we can support in Connect. It also adds the implicit conversion that uses them.

I moved the ArrowWriter class from sql/core to sql/catalyst so that it can be used for the Arrow writing.

Why are the changes needed?

API parity with the existing SQL APIs.

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

I have added a number of tests to ClientE2ETestSuite.

dongjoon-hyun

+1, LGTM.

### What changes were proposed in this pull request?

This PR adds all the `SparkSession.createDataFrame(..)` and `SparkSession.createDataset(..)` methods we can support in connect. The implicit conversion that uses this is also added.

I moved the `ArrowWriter` class from sql/core to sql/catalyst for the arrow writing.

### Why are the changes needed?

API parity with the existing SQL APIs

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

I have added a number of tests to `ClientE2ETestSuite`.

Closes #40242 from hvanhovell/SPARK-42639.

Authored-by: Herman van Hovell <herman@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit a9626d5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

dongjoon-hyun · 2023-03-02T07:14:07Z

Thank you, @hvanhovell and @HyukjinKwon. Merged to master/3.4.

hvanhovell added 6 commits February 27, 2023 23:35

wip

9952aa7

create local set functionality.

3663d36

Merge remote-tracking branch 'apache/master' into to_local_relation

6494873

Fix paths in integration tests

60b9463

Hook in implicits

1bfa5b9

tests, fixes, and formatting

315a990

github-actions bot added CONNECT SQL labels Mar 2, 2023

hvanhovell added 3 commits March 1, 2023 20:26

Revert "Fix paths in integration tests"

da8e8a8

This reverts commit 60b9463.

update test

8987338

limit vis

7c6608c

HyukjinKwon approved these changes Mar 2, 2023

dongjoon-hyun approved these changes Mar 2, 2023

hvanhovell added 2 commits March 1, 2023 23:26

fix 2.13

63f1b8d

Merge remote-tracking branch 'apache/master' into SPARK-42639

7631773

dongjoon-hyun closed this in a9626d5 Mar 2, 2023
