Replies: 2 comments 1 reply
- Those sound like great additions 👍
- @Tishj I have now made a pull request for this feature here.
-
Why do you want this feature?
The DuckDB PySpark API is a brilliant idea, but it doesn't yet cover some core parts of PySpark's API.
Some useful attributes to add to the experimental PySpark DataFrame class are:
- .write.parquet
- .write.csv
- .toPandas
I have implemented them in my own fork. This was very simple because the necessary functions already exist in the Spark relation attribute (duckdb.DuckDBPyRelation). I will make a pull request if the core team thinks this is a good idea!
This would be my first issue & pull request.
I think there is a real use case for the DuckDB PySpark API, and I am also working on a short writeup showing how to switch the IO between the DuckDB Spark API and PySpark when using Dagster.
Having these features in would make the code a little simpler for the writeup (writing the right Dagster IO class currently isn't too hard; you just need to use the .relation attribute, which isn't part of the Spark API).
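For the IO-switching case, the backend difference can be hidden behind a tiny helper. This is a hedged sketch under stated assumptions: load_as_pandas is a hypothetical name, no Dagster APIs are used, and the fallback relies only on the .relation attribute mentioned above.

```python
def load_as_pandas(df):
    """Return a pandas DataFrame from either backend.

    Hypothetical helper, not part of Dagster, PySpark, or DuckDB.
    """
    if hasattr(df, "toPandas"):
        # Works for PySpark today, and for the DuckDB Spark API
        # once a toPandas method is added.
        return df.toPandas()
    # Current DuckDB-only fallback: reach through the non-Spark
    # `relation` attribute to materialise the result.
    return df.relation.df()
```

With toPandas available on both DataFrame types, the fallback branch (and with it the non-Spark .relation dependency) disappears from the IO code entirely.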