Conversation
import pandas as pd
import fastparquet
import dask.dataframe as dd

fn = 'tmp.parquet'  # any writable path
df = pd.DataFrame({'x': [1, 2, 3]})
fastparquet.write(fn, df)
ddf = dd.io.parquet.read_parquet(fn)
import pdb; pdb.set_trace()
|
I guess this will then be a PR in fastparquet. To be sure: I don't think that to_parquet should support single-file mode on write, as that would only be possible for a one-division dataframe. I was, however, thinking of the possibility of a simple function that could collect several isolated parquet files into a logical collection, provided they had compatible schemas, of course. |
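As a rough illustration of what that collecting function would have to check, here is a minimal sketch of the schema-compatibility test. It is not fastparquet's API: schemas are modelled as plain column-to-dtype dicts, and the `schemas_compatible`/`merge_datasets` names are hypothetical; real code would read the schemas from each file's footer (e.g. via `fastparquet.ParquetFile`).

```python
def schemas_compatible(schemas):
    """Return True if every schema has the same columns and dtypes.

    Schemas are hypothetical {column_name: dtype_string} dicts standing
    in for what would be read from each parquet file's footer.
    """
    if not schemas:
        return True
    first = schemas[0]
    return all(s == first for s in schemas[1:])


def merge_datasets(schemas, paths):
    """Group several single-file parquet paths into one logical dataset,
    refusing to combine files whose schemas do not match."""
    if not schemas_compatible(schemas):
        raise ValueError("incompatible schemas; cannot merge")
    return {"pieces": list(paths), "schema": schemas[0] if schemas else {}}


# Two parts with identical schemas merge; a mismatched part would raise.
parts = [{"x": "int64"}, {"x": "int64"}]
dataset = merge_datasets(parts, ["part0.parquet", "part1.parquet"])
```

The real version would also need to write a combined `_metadata` footer so downstream readers see the pieces as one dataset, but the compatibility check above is the gatekeeping step.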
|
Were you planning on merging this here, and the changes above into fastparquet? |
|
Honestly I had forgotten about it. I've been swamped with a few other things. I'll try to get back to it in a couple of days. Feel free to steal it from me if you have time. |
|
NB: the failure is fixed in dask/fastparquet#34 - is this test skipped on AppVeyor? |
|
This now passes following the merge in fastparquet. |
|
Should we merge? |
|
I believe yes - adding a test that passes has to be a good thing! |
Took a quick stab at supporting single-file parquet datasets from dask.dataframe and added a test here. Also tried making the changes to fastparquet shown above, though I now have to step out for the day. cc @martindurant