
Make common convenience methods for aggregating data for Group.members and Bundle. #8

Closed
dotsdl opened this issue Jan 29, 2015 · 2 comments

dotsdl (Member) commented Jan 29, 2015

Group.members and Bundle are intended to make it easy to manipulate many Containers at once, but currently they only give access to the objects themselves. It would be useful to include methods that yield aggregate information from these collections. Both objects would share these methods.

For example, we could have:

Bundle.data, which gives access to concatenations of stored pandas datasets. It grabs any datasets it can that match the given handle and tries to concatenate them. This would be useful for quickly aggregating and manipulating ensemble data.

Bundle.tags, which gives all tags present in the collection. It could take keywords selecting "any" or "all" as the criterion for which tags to return.
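A minimal sketch of what the two proposed properties could do, assuming a hypothetical `Member` class that holds a set of tags and a dict of pandas objects keyed by handle. The names `aggregate_data`, `aggregate_tags`, and the `mode` keyword are illustrative only, not part of the project's API.

```python
import pandas as pd


class Member:
    """Hypothetical stand-in for a Container: a name, a set of tags, and
    a dict of stored datasets keyed by handle."""

    def __init__(self, name, tags=(), data=None):
        self.name = name
        self.tags = set(tags)
        self.data = dict(data or {})


def aggregate_data(members, handle):
    """Concatenate every stored pandas object matching `handle`.

    Members without that handle are skipped; returns None if none have it.
    """
    found = [m.data[handle] for m in members if handle in m.data]
    return pd.concat(found) if found else None


def aggregate_tags(members, mode="any"):
    """Tags present in any member (union) or in all members (intersection)."""
    tag_sets = [m.tags for m in members]
    if not tag_sets:
        return set()
    if mode == "any":
        return set().union(*tag_sets)
    if mode == "all":
        return set.intersection(*tag_sets)
    raise ValueError(f"mode must be 'any' or 'all', got {mode!r}")


members = [
    Member("run1", {"water", "300K"}, {"energy": pd.DataFrame({"E": [1.0, 2.0]})}),
    Member("run2", {"water", "350K"}, {"energy": pd.DataFrame({"E": [3.0]})}),
    Member("run3", {"water"}),  # no "energy" dataset, so it is skipped
]
print(aggregate_data(members, "energy"))
print(sorted(aggregate_tags(members, "any")))  # ['300K', '350K', 'water']
print(sorted(aggregate_tags(members, "all")))  # ['water']
```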

dotsdl (Member, Author) commented Feb 18, 2015

This may require a thin abstraction layer between Bundle and its underlying members so that Bundle and Members can share the same methods.
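One possible shape for such a layer, as a sketch only: an abstract base class that writes each aggregate method once against a single hook, `_list_members()`, which Bundle and Members each implement in their own way. `AggregateBase`, `_list_members`, `_stored`, and the minimal `Container` and `Group` classes here are all hypothetical.

```python
from abc import ABC, abstractmethod


class Container:
    """Hypothetical minimal Container."""

    def __init__(self, name, tags=()):
        self.name = name
        self.tags = set(tags)


class AggregateBase(ABC):
    """Thin abstraction layer: aggregate methods are written once here,
    against _list_members(), and shared by Bundle and Members."""

    @abstractmethod
    def _list_members(self):
        """Return this collection's Containers, in order."""

    def __len__(self):
        return len(self._list_members())

    @property
    def names(self):
        return [m.name for m in self._list_members()]

    @property
    def tags(self):
        # Union of every member's tags.
        return set().union(*(m.tags for m in self._list_members()))


class Bundle(AggregateBase):
    """Holds its Containers directly."""

    def __init__(self, *containers):
        self._containers = list(containers)

    def _list_members(self):
        return list(self._containers)


class Members(AggregateBase):
    """View over the membership stored by a Group."""

    def __init__(self, group):
        self._group = group

    def _list_members(self):
        return list(self._group._stored)


class Group:
    """Hypothetical Group exposing its members through Members."""

    def __init__(self, *containers):
        self._stored = list(containers)
        self.members = Members(self)
```

The point of the design is that adding a new aggregate property to `AggregateBase` makes it available on both `Bundle` and `Group.members` without duplicating code.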

dotsdl self-assigned this Apr 12, 2015
dotsdl (Member, Author) commented Apr 12, 2015

Working on this now. The basic idea is that Bundle and Members will have an interface similar to a Container's, with tags, categories, and data properties that allow manipulations on all members at once.
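As an illustration of the categories piece, here is a sketch of a collection-level interface that both reads categories across members and writes to all of them at once. It assumes categories are plain key/value dicts on each Container; `AggCategories`, `to_dict`, and `add` are hypothetical names.

```python
class Container:
    """Hypothetical Container holding key/value categories."""

    def __init__(self, name, categories=None):
        self.name = name
        self.categories = dict(categories or {})


class AggCategories:
    """Collection-level `categories` interface over many Containers."""

    def __init__(self, containers):
        self._containers = list(containers)

    def to_dict(self):
        """Map each key found in any member to per-member values (None if absent)."""
        keys = set().union(*(c.categories for c in self._containers))
        return {k: [c.categories.get(k) for c in self._containers] for k in sorted(keys)}

    def add(self, **categories):
        """Set the given categories on every member at once."""
        for c in self._containers:
            c.categories.update(categories)


c1 = Container("run1", {"temperature": 300})
c2 = Container("run2", {"temperature": 350, "pressure": 1.0})
cats = AggCategories([c1, c2])
cats.add(solvent="water")
print(cats.to_dict())
# {'pressure': [None, 1.0], 'solvent': ['water', 'water'], 'temperature': [300, 350]}
```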

One point of interest: I currently have a Group.members.data.retrieve() method that works for DataFrame data by building a MultiIndex concatenation of each member's DataFrame for the given handle. The first index level contains the member's name, which is actually a minor problem since names need not be unique (though neither must indexes); the second level gives the original index. This works fine for DataFrames and Series, but will simply not work for Panel and Panel4D structures.

I'm not sure what a reasonable aggregation scheme for these structures would be. Further, what's a reasonable structure for data that are numpy arrays or pure Python (pickled) structures? Should we just throw the structures in a list, essentially doing:

out = [ member.data.retrieve(handle) for member in group.members ]

Thoughts?
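One way both cases could sit behind a single retrieve-style call, offered as a sketch under assumptions rather than the project's implementation: pandas objects get the multi-index concatenation described above via `pd.concat(..., keys=...)`, and everything else falls back to the plain list. `retrieve_all` and its `(member_name, obj)` input are hypothetical.

```python
import numpy as np
import pandas as pd


def retrieve_all(named_items):
    """Aggregate one dataset per member, given (member_name, obj) pairs.

    - If every object is a pandas Series or DataFrame, concatenate them
      into one object whose first index level is the member name and
      whose second level is each object's original index.
    - Otherwise (numpy arrays, pickled Python objects, or a mix), fall
      back to a plain list in member order.
    """
    names = [name for name, _ in named_items]
    objs = [obj for _, obj in named_items]
    if objs and all(isinstance(o, (pd.Series, pd.DataFrame)) for o in objs):
        # Duplicate member names produce duplicate first-level labels,
        # so rows from different members would share the same key.
        return pd.concat(objs, keys=names, names=["member", None])
    return objs


frames = [("run1", pd.DataFrame({"x": [1, 2]})), ("run2", pd.DataFrame({"x": [3]}))]
print(retrieve_all(frames))

arrays = [("run1", np.zeros(2)), ("run2", np.ones(3))]
print(retrieve_all(arrays))  # list of two arrays with different shapes
```

If all arrays happen to share a shape, `np.stack` could produce a single array with a leading member axis instead of a list; whether that is preferable is part of the same open question.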

dotsdl added this to the 0.5.0 milestone Apr 12, 2015
dotsdl modified the milestones: 0.5.0, 0.5.1 Jun 22, 2015
dotsdl closed this as completed in 690560c Jul 7, 2015