Possible to write spark dataframes to glue tables in similar fashion as awswrangler.s3.to_parquet #1743

Open
aabid0193 opened this issue Nov 3, 2022 · 4 comments
Labels
backlog enhancement New feature or request

Comments

@aabid0193

If it isn't possible already, it would be nice if we could use Spark DataFrames to write to Glue tables using something similar to wrangler's to_parquet method. It works great for pandas and can set the mode to overwrite partitions, and I was wondering if we can do this with Spark DataFrames.

@aabid0193 aabid0193 added the enhancement New feature or request label Nov 3, 2022
@emerson131

wrangler's to_parquet method. It works great for pandas and can set the mode to overwrite partitions, and I was wondering if we can do this with Spark DataFrames.

If you are using Spark, I would imagine that simply converting your Spark DataFrame to a pandas one would get you there if you want to use the wrangler.

sparkDF.toPandas()
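A minimal sketch of that suggestion, assuming awswrangler is installed and the whole dataset fits in driver memory. The function name and its arguments are hypothetical; `wr.s3.to_parquet` with `dataset=True` and `mode="overwrite_partitions"` is the pandas path the original post refers to.

```python
def write_via_pandas(spark_df, path, database, table, partition_cols):
    """Collect a Spark DataFrame to the driver and write it with awswrangler."""
    # Imported lazily so this sketch can be loaded without awswrangler installed.
    import awswrangler as wr

    # toPandas() pulls ALL rows into the driver's memory.
    pandas_df = spark_df.toPandas()

    # dataset=True enables partitioning and Glue catalog registration;
    # "overwrite_partitions" replaces only the partitions present in pandas_df.
    return wr.s3.to_parquet(
        df=pandas_df,
        path=path,
        dataset=True,
        mode="overwrite_partitions",
        partition_cols=partition_cols,
        database=database,
        table=table,
    )
```

Usage would look something like `write_via_pandas(sparkDF, "s3://my-bucket/my-prefix/", "my_db", "my_table", ["dt"])`, where the bucket, database, table, and partition column are placeholders.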

@aabid0193
Author

Yeah, that is possible right now. However, for large datasets that require Spark, this wouldn't be ideal.
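For context, Spark can overwrite only the affected partitions on its own, without going through pandas. The sketch below is not part of awswrangler and assumes a working SparkSession with S3 access. The function name is hypothetical, and the path scheme (`s3://` versus `s3a://`) depends on the environment.

```python
def overwrite_partitions_with_spark(spark, spark_df, path, partition_cols):
    """Write a partitioned Parquet dataset, replacing only the partitions in spark_df."""
    # With "dynamic", an overwrite replaces only the partitions that receive
    # new data instead of deleting everything under `path` first.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    writer = spark_df.write.mode("overwrite").partitionBy(*partition_cols)
    writer.parquet(path)
```

This writes the files but does not create or update a Glue table by itself, so the catalog side still has to be handled separately.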

@aabid0193
Author

aabid0193 commented Nov 4, 2022

Essentially, what I'm wishing for is the ability to register Athena tables based on the PySpark DataFrame metadata. I see that this was implemented here: #29.
However, it seems to me that this method is no longer supported in the newer versions of wrangler. Additionally, I would like to be able to overwrite partitions.
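One way this could be approximated today is to read the column types from the Spark DataFrame and pass them to `wr.catalog.create_parquet_table`. This is a rough sketch, not the method from #29. Both function names are hypothetical, and it assumes the type strings in `spark_df.dtypes` (for example `bigint`, `string`, `date`) are accepted as-is by Athena, which holds for common types but should be checked for unusual ones.

```python
def split_spark_dtypes(dtypes, partition_cols):
    """Split Spark (name, type) pairs into data columns and partition columns."""
    partition_set = set(partition_cols)
    columns_types = {name: typ for name, typ in dtypes if name not in partition_set}
    partitions_types = {name: typ for name, typ in dtypes if name in partition_set}
    return columns_types, partitions_types


def register_table_from_spark_schema(spark_df, path, database, table, partition_cols):
    """Create a Glue/Athena table definition from a Spark DataFrame's schema."""
    # Imported lazily so the helper above can be used without awswrangler.
    import awswrangler as wr

    # spark_df.dtypes is a list of (column_name, type_string) tuples.
    columns_types, partitions_types = split_spark_dtypes(spark_df.dtypes, partition_cols)

    # Registers table metadata only; the Parquet files under `path` are
    # expected to have been written already (for example, by Spark).
    wr.catalog.create_parquet_table(
        database=database,
        table=table,
        path=path,
        columns_types=columns_types,
        partitions_types=partitions_types,
    )
    return columns_types, partitions_types
```

Creating the table this way does not register individual partitions. Those would still need to be added afterwards, for example with `wr.catalog.add_parquet_partitions` or by repairing the table in Athena.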

@github-actions

github-actions bot commented Jan 3, 2023

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

3 participants