Split input into multiple partitions on request? #3

mrocklin · 2015-06-07T21:50:36Z

There is the use case of "please partition my data by hour, regardless of how much data I give to you". Is this something that we want to support on top of castra?

esc · 2015-06-09T11:05:58Z

Yes, and it should also support efficient buffering and appends, like bcolz. But I would say that is a feature for the future. It also depends on what our 'clients' want.

cpcloud · 2015-07-15T14:26:36Z

wouldn't this be better suited for dask?

something like

df.repartition(by='H')

cpcloud · 2015-07-15T14:27:03Z

currently repartition is a bit more generic, but i could imagine an index-type-sensitive version of it

mrocklin · 2015-07-15T14:28:21Z

This would still be important for data access. The idea here is that one might want data that is accessible in hourly chunks (or some other fixed period). While dask could repartition this data, it would still have to read possibly larger chunks at a time.
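
To make the access argument concrete, here is a minimal sketch of which partitions a time-range query would need to touch when the data is stored in fixed-width chunks. It uses only the standard library; `partitions_for_range` and its parameters are hypothetical names for illustration and are not part of castra or dask.

```python
from datetime import datetime, timedelta


def partitions_for_range(partition_starts, width, lo, hi):
    """Return the start times of partitions [start, start + width)
    that overlap the half-open query interval [lo, hi)."""
    return [s for s in partition_starts if s < hi and s + width > lo]


# Hypothetical layout: one day of data stored as 24 hourly partitions.
day = datetime(2015, 6, 7)
hourly_starts = [day + timedelta(hours=h) for h in range(24)]

# A 45-minute query only overlaps two of the hourly partitions.
needed = partitions_for_range(
    hourly_starts,
    timedelta(hours=1),
    datetime(2015, 6, 7, 9, 30),
    datetime(2015, 6, 7, 10, 15),
)

# The same query against a single day-sized partition must read all of it.
needed_daily = partitions_for_range(
    [day],
    timedelta(days=1),
    datetime(2015, 6, 7, 9, 30),
    datetime(2015, 6, 7, 10, 15),
)
```

With hourly partitions the reader loads two small chunks; with one large partition it loads the whole day and filters afterwards, which is the extra read cost described above.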

cpcloud · 2015-07-15T14:29:08Z

hm thinking a bit more i can see the value of having the splitting done as soon as you call extend
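
A rough sketch of what splitting at write time could look like, assuming the input is a pandas DataFrame with a sorted `DatetimeIndex`. The helper `split_by_period` is a hypothetical name and is not castra's API; a real `extend` would write each chunk to its own partition instead of collecting them in a dict.

```python
import pandas as pd


def split_by_period(df, freq="h"):
    """Yield (period_start, chunk) pairs, one per fixed period
    that actually contains rows in df's DatetimeIndex."""
    # Round every timestamp down to the start of its period.
    keys = df.index.floor(freq)
    for start, chunk in df.groupby(keys, sort=True):
        yield start, chunk


# Six rows, 20 minutes apart, starting at 21:50.
index = pd.date_range("2015-06-07 21:50", periods=6, freq="20min")
df = pd.DataFrame({"x": range(6)}, index=index)

# One entry per hour touched by the input, regardless of input size.
partitions = dict(split_by_period(df))
```

Here a single input frame that straddles three hours ends up as three separate chunks, so the caller does not need to pre-split the data before handing it over.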

jcrist · 2015-08-27T20:31:54Z

Fixed by #40. Closing.

cpcloud added the enhancement label Jul 15, 2015

This was referenced Aug 26, 2015

Support for on-disk appends, partitioning #36

Open

Add extend_sequence #40

Merged

jcrist closed this as completed Aug 27, 2015

slavi mentioned this issue Sep 14, 2015

Partitioning in extend_sequence does not work if len(seq) == 1 #50

Closed
