Appending parquet file from Python to S3 #327
Thanks. That works fine; however, while writing to S3, it also creates a copy of the folder structure on my machine. Is that expected?
I'm afraid I don't follow you - can you please describe exactly what you did and what happened?
OK, let me explain. I have this folder structure in S3 -
I am running this in a Jupyter notebook. When I run it, everything works fine and the S3 path looks like this,
However, on my local machine, I have this folder structure created automatically -
OK, understood. No, the files are not first created locally and then copied. As documented, you should supply not only the function to open files, but also the function to make directories. In the case of S3 there is no such concept as directories, so the function you need to provide should not actually do anything, but you still must provide it to avoid using the default, which makes local directories.
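To illustrate that advice, here is a minimal sketch assuming s3fs is used for S3 access; the bucket/Data path, the date column, and the DataFrame df are placeholders, not values from this thread:

import s3fs
from fastparquet import write

s3 = s3fs.S3FileSystem()
myopen = s3.open

def noop_mkdirs(path):
    # S3 has no real directories, so there is nothing to create.
    # Supplying this no-op stops fastparquet from using its default
    # mkdirs, which creates the directory tree on the local machine.
    pass

write('bucket/Data', df, file_scheme='hive', partition_on=['date'],
      open_with=myopen, mkdirs=noop_mkdirs)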
Here is my snippet in spark-shell:
jdbcDF.write.mode("append").partitionBy("date").parquet("s3://bucket/Data/")
Problem description
Now, I am trying to do the same thing in Python with fastparquet.
First, I tried to save with Snappy compression:
from fastparquet import write
write('****/20180101.snappy.parquet', data, compression='SNAPPY', open_with=myopen)
but got an error:
Then I tried GZIP, which worked, but I am not sure how to append or create partitions here. Here is an issue I created in pandas: https://github.com/pandas-dev/pandas/issues/20638
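For reference, fastparquet's write does accept append and partition_on arguments, so a rough equivalent of the Spark snippet above might look like the following sketch (again assuming s3fs, with bucket/Data, the date column, and df as placeholders):

import s3fs
from fastparquet import write

s3 = s3fs.S3FileSystem()

# file_scheme='hive' writes one directory per partition value, like
# Spark's partitionBy; append=True adds new data to an existing dataset.
write('bucket/Data', df,
      file_scheme='hive',
      partition_on=['date'],
      append=True,
      compression='GZIP',
      open_with=s3.open,
      mkdirs=lambda path: None)

Note that append=True expects the dataset to already exist, so the very first write should omit it (or pass append=False).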
Thanks.