Split input into multiple partitions on request? #3
Comments
Yes, and support efficient buffering and appends, like bcolz. But I would say that is a feature for the future. It also depends on what our 'clients' want.
Wouldn't this be better suited for dask? Something like df.repartition(by='H').
Currently repartition is a bit more generic, but I could imagine an index-type-sensitive version of it.
This would still be important for data access. The idea here is that one might want data that is accessible in hourly chunks (or some other fixed period). While dask could repartition this data, it would still have to read possibly larger chunks at a time.
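To make the hourly-chunk idea concrete, here is a minimal sketch in plain Python of splitting input into one partition per hour at write time, so that a later read for a single hour only touches that partition. This is not castra's or dask's API: `hour_key` and `partition_by_hour` are hypothetical names, and the records are made-up sample data.

```python
from collections import defaultdict
from datetime import datetime


def hour_key(ts):
    """Truncate a timestamp to the start of its hour (hypothetical helper)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def partition_by_hour(records):
    """Group (timestamp, value) records into one list per hour.

    Hypothetical helper for illustration; the partitions are keyed by the
    start of each hour, regardless of how much data is passed in.
    """
    partitions = defaultdict(list)
    for ts, value in records:
        partitions[hour_key(ts)].append((ts, value))
    # Sort records within each partition and return partitions in time order.
    return {key: sorted(rows) for key, rows in sorted(partitions.items())}


# Made-up sample input, deliberately out of order.
records = [
    (datetime(2015, 1, 1, 0, 5), 1.0),
    (datetime(2015, 1, 1, 1, 30), 3.0),
    (datetime(2015, 1, 1, 0, 45), 2.0),
    (datetime(2015, 1, 1, 2, 0), 4.0),
]
partitions = partition_by_hour(records)

# A reader interested in 01:00-02:00 touches only that one partition.
one_hour = partitions[datetime(2015, 1, 1, 1)]
```

With the split done on input, the read path never has to load a larger chunk and slice it afterwards, which is the cost a later repartition would still pay.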
Hm, thinking a bit more, I can see the value of having the splitting done as soon as you call
Fixed by #40. Closing.
There is the use case of "please partition my data by hour, regardless of how much data I give to you". Is this something that we want to support on top of castra?