[WIP] A first proposal for a Table object #199

FrancescAlted · 2014-03-22T08:48:24Z

This is a cleaner PR for the proposal. I will close the previous #198 and copy its comments here.

FrancescAlted · 2014-03-22T08:49:03Z

Andy commented:

We also need to stick somewhat close to the Pandas DataFrame syntax, so it would be good to have an index for the different arrays. It might be good to add an index object to an array first...

FrancescAlted · 2014-03-22T08:58:12Z

@aterrel Sounds good. If we want to replicate the pandas DataFrame structure, then the columns probably cannot be pure Blaze Array objects, but something more like a pandas Series.

mwiebe · 2014-03-24T00:31:24Z

blaze/objects/table.py

+        The list of column data to build the Table object.  This list would
+        typically be made of Blaze Array objects, but could also understand
+        DyND or NumPy arrays.  A list of lists or tuples is valid too, as
+        long as they can be converted into barray objects.


I think this would be better described as "objects convertible to Blaze Arrays".

aterrel · 2014-03-25T18:33:31Z

At the call today, @mwiebe and I thought it would be best to add a Series object, just like the pandas Series object, to hold an index.
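A minimal sketch of what such an index-carrying column could look like, assuming plain Python lists stand in for Blaze Arrays. The `Series` class and its `loc` method here are hypothetical illustrations, not code from this PR:

```python
class Series:
    """Hypothetical sketch: a column of values paired with an index of labels."""

    def __init__(self, values, index=None):
        self.values = list(values)
        # Default to positional labels 0..n-1 when no index is given.
        if index is None:
            index = range(len(self.values))
        self.index = list(index)
        if len(self.index) != len(self.values):
            raise ValueError("index and values must have the same length")
        # Map each label to its position so label lookup is a dict access.
        self._pos = {label: i for i, label in enumerate(self.index)}

    def __len__(self):
        return len(self.values)

    def loc(self, label):
        """Return the value stored under the given index label."""
        return self.values[self._pos[label]]


s = Series([10.0, 20.0, 30.0], index=["a", "b", "c"])
```

A Table could then hold a list of such objects sharing one index, rather than bare arrays.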

mrocklin · 2014-04-07T14:37:23Z

This abstraction seems to implement a table as a set of named/typed columns. While this layout matches systems like Pandas and other column stores, I suspect that it will be difficult to match row/tuple-based systems like SQL. If we intend to target non-column-store systems, then this abstraction might get in the way. Perhaps the full Table abstraction should be even more abstract. Perhaps the work in this PR would be better named ColumnStore, implementing Table, or something along those lines?
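Roughly, the split could be sketched as below. Every name here (`Table`, `ColumnStore`, `RowStore`, `column`, `row`) is hypothetical and only illustrates the idea of one abstract interface with layout-specific implementations; none of it is taken from this PR:

```python
from abc import ABC, abstractmethod


class Table(ABC):
    """Hypothetical abstract interface: layout-agnostic access to tabular data."""

    @abstractmethod
    def column(self, name):
        """Return all values of one named column as a list."""

    @abstractmethod
    def row(self, i):
        """Return the i-th row as a tuple."""


class ColumnStore(Table):
    """Stores one list per named column (the layout this PR uses)."""

    def __init__(self, names, columns):
        self._cols = dict(zip(names, (list(c) for c in columns)))

    def column(self, name):
        return self._cols[name]

    def row(self, i):
        # Rows have to be assembled from every column.
        return tuple(col[i] for col in self._cols.values())


class RowStore(Table):
    """Stores a list of tuples, as a row-based system would."""

    def __init__(self, names, rows):
        self._names = list(names)
        self._rows = [tuple(r) for r in rows]

    def column(self, name):
        # Columns have to be assembled from every row.
        j = self._names.index(name)
        return [r[j] for r in self._rows]

    def row(self, i):
        return self._rows[i]


cs = ColumnStore(["x", "y"], [[1, 2], [3, 4]])
rs = RowStore(["x", "y"], [(1, 3), (2, 4)])
```

Both stores answer the same queries; only the cost of row access versus column access differs.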

Disclaimer: I haven't been in the conversation until now, so this might be a dumb idea.

aterrel · 2014-04-07T15:12:53Z

@mrocklin I don't think it is a dumb idea, but my intention is to get a specific Table API up and going that is compatible with the Pandas API first, then perhaps branch out to the different variants.

The orthogonality of data sources should be provided by the data descriptor layer. Thus, mapping expressions on the Table to SQL versus other stores happens at the io/compute interaction.

So while there is a Table and we are targeting column oriented stores, we can take those operations and interpret them differently on other backends. At least that's the current plan.
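As a rough illustration of that plan, the sketch below evaluates one table-level operation against two backends: an in-memory column and a SQLite table. `ColumnSum`, `compute_in_memory`, and `compute_sql` are hypothetical names invented for this example and are not Blaze APIs; only the `sqlite3` calls are real:

```python
import sqlite3


class ColumnSum:
    """Hypothetical expression node: the sum of one named column of a table."""

    def __init__(self, table, column):
        self.table = table
        self.column = column


def compute_in_memory(expr, columns):
    # columns: dict mapping column name -> list of values (column-oriented).
    return sum(columns[expr.column])


def compute_sql(expr, conn):
    # Identifiers are interpolated directly into the query; this is
    # acceptable only in a sketch where the names are trusted.
    query = 'SELECT SUM("{}") FROM "{}"'.format(expr.column, expr.table)
    return conn.execute(query).fetchone()[0]


expr = ColumnSum("points", "x")

# Backend 1: in-memory columns.
columns = {"x": [1, 2, 3]}

# Backend 2: a row-oriented SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER)")
conn.executemany("INSERT INTO points VALUES (?)", [(1,), (2,), (3,)])
```

The same `expr` is passed to either function, so the choice of backend stays out of the Table-level code.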

Adding a first proposal for the table object

bb99224

FrancescAlted mentioned this pull request Mar 22, 2014

[WIP] A first proposal for a Table object #198

Closed

mwiebe reviewed Mar 24, 2014

mrocklin closed this May 2, 2014
