[WIP] A first proposal for a Table object #199

Closed
wants to merge 1 commit into from

FrancescAlted
Contributor

This is a cleaner PR for the proposal. I will close the previous #198 and copy its comments here.

@FrancescAlted
Contributor Author

Andy commented:

We also need to stick somewhat close to the pandas DataFrame syntax, so it would be good to have an index for the different arrays. It might be good to add an index object to an array first...

@FrancescAlted
Contributor Author

@aterrel Sounds good. If we want to replicate the pandas DataFrame structure, then the columns probably cannot be pure Blaze Array objects, but something more like a pandas Series.

The list of column data to build the Table object. This list would
typically be made of Blaze Array objects, but could also understand
DyND or NumPy arrays. A list of lists or tuples is valid too, as
long as they can be converted into barray objects.

I think this would be better described as "objects convertible to Blaze Arrays"
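
To make the "convertible" wording concrete, here is a minimal sketch of a constructor that normalizes each input column before storing it. It is not the code in this PR: `to_column` and this `Table` class are hypothetical stand-ins, and the conversion simply produces a tuple rather than a real Blaze Array. The stand-in only understands plain Python sequences; DyND or NumPy inputs would need their own conversion path.

```python
from collections.abc import Sequence


def to_column(data):
    """Hypothetical stand-in for 'convert to a Blaze Array'."""
    # Strings are sequences too, but treating them as columns is rarely intended.
    if isinstance(data, (str, bytes)) or not isinstance(data, Sequence):
        raise TypeError(f"cannot convert {type(data).__name__} to a column")
    return tuple(data)


class Table:
    """Hypothetical column-oriented table: one converted column per name."""

    def __init__(self, columns, names):
        if len(columns) != len(names):
            raise ValueError("need exactly one name per column")
        converted = [to_column(c) for c in columns]
        # Every column must describe the same number of rows.
        if len({len(c) for c in converted}) > 1:
            raise ValueError("all columns must have the same length")
        self.names = list(names)
        self._columns = dict(zip(self.names, converted))

    def __getitem__(self, name):
        return self._columns[name]

    def __len__(self):
        # Number of rows; zero for a table without columns.
        return len(next(iter(self._columns.values()), ()))


# A list and a tuple are both accepted because both are convertible.
t = Table([[1, 2, 3], ("a", "b", "c")], names=["id", "label"])
```

With this shape the docstring only has to promise "anything `to_column` accepts", which is the point of the suggested rewording.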

@aterrel
Contributor

aterrel commented Mar 25, 2014

At the call today, @mwiebe and I thought it would be best to add a Series object, just like the pandas Series object, to hold an index.
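
As a rough illustration of what such an object might hold, here is a minimal sketch in plain Python. The `Series` class and its `loc` method are hypothetical names for this example, not an existing Blaze API, and the sketch assumes index labels are unique and hashable.

```python
class Series:
    """Hypothetical 1-D labeled column: values plus an index of labels."""

    def __init__(self, values, index=None):
        self.values = list(values)
        # Default to positional labels 0..n-1 when no index is given.
        self.index = list(range(len(self.values))) if index is None else list(index)
        if len(self.index) != len(self.values):
            raise ValueError("index and values must have the same length")
        # Map each label to its position for constant-time lookup.
        self._positions = {label: i for i, label in enumerate(self.index)}
        if len(self._positions) != len(self.index):
            raise ValueError("index labels must be unique in this sketch")

    def loc(self, label):
        """Return the value stored under the given index label."""
        return self.values[self._positions[label]]


s = Series([10.5, 20.0, 30.25], index=["a", "b", "c"])
value_at_b = s.loc("b")
```

A Table could then be a collection of such objects sharing one index, which is closer to the pandas layout than a bare list of arrays.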

@mrocklin
Member

mrocklin commented Apr 7, 2014

This abstraction seems to implement a table as a set of named/typed columns. While this layout matches systems like Pandas and other column stores I suspect that it will be difficult to match row/tuple-based systems like SQL. If we intend to target non-column-store systems then this abstraction might get in the way. Perhaps the full Table abstraction should be even more abstract. Perhaps the work in this PR would be better named as ColumnStore implementing Table or something along those lines?
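
To sketch what I mean (all names here are hypothetical, and this is plain Python rather than anything in Blaze): an abstract `Table` exposes only the operations, and `ColumnStore` and `RowStore` are two layouts that implement it.

```python
from abc import ABC, abstractmethod


class Table(ABC):
    """Hypothetical layout-agnostic interface."""

    @abstractmethod
    def column(self, name):
        """Return all values of one named column as a list."""

    @abstractmethod
    def rows(self):
        """Return all records as a list of tuples."""


class ColumnStore(Table):
    """Stores one list per column, like pandas and other column stores."""

    def __init__(self, columns):
        self._columns = {name: list(vals) for name, vals in columns.items()}

    def column(self, name):
        return list(self._columns[name])

    def rows(self):
        # Transpose the columns into records.
        return list(zip(*self._columns.values()))


class RowStore(Table):
    """Stores one tuple per record, like a row/tuple-based system."""

    def __init__(self, names, records):
        self._names = list(names)
        self._records = [tuple(r) for r in records]

    def column(self, name):
        i = self._names.index(name)
        return [record[i] for record in self._records]

    def rows(self):
        return list(self._records)


by_column = ColumnStore({"id": [1, 2], "label": ["a", "b"]})
by_row = RowStore(["id", "label"], [(1, "a"), (2, "b")])
```

Both objects answer `column("id")` and `rows()` identically, so code written against `Table` would not need to know which layout sits underneath.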

Disclaimer: I haven't been in the conversation until now, so this might be a dumb idea.

@aterrel
Contributor

aterrel commented Apr 7, 2014

@mrocklin I don't think it is a dumb idea, but my intention is to first get a specific Table API up and running that is compatible with the pandas API, then perhaps branch out to the different variants.

The orthogonality of data sources should be provided by the data descriptor layer. Thus, mapping expressions on the Table to SQL versus other stores happens at the io/compute interaction.

So while there is a Table and we are targeting column oriented stores, we can take those operations and interpret them differently on other backends. At least that's the current plan.
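
A minimal sketch of that idea, under heavy assumptions: `Gt`, `compute_columns`, and `compute_sql` are hypothetical names invented for this example, not the data descriptor or compute API. The same expression object is evaluated directly against in-memory columns in one case and translated to a SQL query (SQLite from the standard library) in the other.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Gt:
    """Hypothetical expression node: keep rows where `column` > `value`."""
    column: str
    value: float


def compute_columns(expr, columns):
    """Interpret the expression on an in-memory column store (dict of lists)."""
    keep = [i for i, v in enumerate(columns[expr.column]) if v > expr.value]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}


def compute_sql(expr, conn, table):
    """Interpret the same expression by translating it to a SQL query."""
    # Identifiers cannot be bound as parameters; this sketch assumes they are trusted.
    query = f'SELECT * FROM "{table}" WHERE "{expr.column}" > ?'
    return conn.execute(query, (expr.value,)).fetchall()


expr = Gt("amount", 100)

# Column-oriented backend.
columns = {"name": ["a", "b", "c"], "amount": [50, 150, 300]}
in_memory = compute_columns(expr, columns)

# Row-oriented SQL backend holding the same data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    zip(columns["name"], columns["amount"]),
)
from_sql = compute_sql(expr, conn, "accounts")
```

The expression itself never mentions a storage layout; only the two interpreter functions do, which is where the backend-specific mapping would live.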

@mrocklin mrocklin closed this May 2, 2014

4 participants