Replies: 2 comments 1 reply
- Those sound like great additions 👍
- @Tishj I have now made a pull request for this feature here.
-
Why do you want this feature?
The DuckDB PySpark API is a brilliant idea, but it doesn't yet cover some core parts of PySpark's API.
Some useful attributes to add to the experimental PySpark DataFrame class are:
- .write.parquet
- .write.csv
- .toPandas
I have implemented them in my own fork. This was very simple because the necessary functions already exist in the Spark relation attribute (duckdb.DuckDBPyRelation). I will make a pull request if the core team thinks this is a good idea!
This would be my first issue & pull request.
I think there is a real use case for the DuckDB PySpark API, and I am also working on a short writeup showing how to switch the IO between the DuckDB Spark API and PySpark when using Dagster.
Having these features in would make the code a little simpler for the writeup (writing the right Dagster IO class currently isn't too hard; you just need to use the .relation attribute, which isn't part of the Spark API).
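For the IO-switching case, the backend difference can be hidden behind a tiny helper. This is a hedged sketch under stated assumptions: load_as_pandas is a hypothetical name, no Dagster APIs are used, and the fallback relies only on the .relation attribute mentioned above.

```python
def load_as_pandas(df):
    """Return a pandas DataFrame from either backend.

    Hypothetical helper, not part of Dagster, PySpark, or DuckDB.
    """
    if hasattr(df, "toPandas"):
        # Works for PySpark today, and for the DuckDB Spark API
        # once a toPandas method is added.
        return df.toPandas()
    # Current DuckDB-only fallback: reach through the non-Spark
    # `relation` attribute to materialise the result.
    return df.relation.df()
```

With toPandas available on both DataFrame types, the fallback branch (and with it the non-Spark .relation dependency) disappears from the IO code entirely.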