UX Discussion: Returning datasets like numpy arrays #14

Open
rwegener2 opened this issue Aug 29, 2023 · 0 comments
Comments

@rwegener2
Collaborator

This is a very early-stage idea, but I'm putting a ticket up to document and share ideas. I think it would be really interesting for an H5Dataset to be something like a subclassed numpy array: an object that acts like a numpy array but has additional h5coro-specific attributes and methods.

I still need to figure out exactly what this means, probably by playing around with it. I oscillate between being really excited about it and feeling like we should leave super user-oriented data structures up to higher-level libraries like pandas or xarray.

I dug into the feasibility of this, and it turns out that subclassing numpy arrays is tricky and, overall, not encouraged (reference). There are, however, a few pages that discuss creating numpy-compatible containers: see interoperability or custom array containers. Of particular interest was the example I read that uses pandas dataframes to show numpy interoperability achieved with a separate array-like data structure. Dask arrays were another interesting example, particularly because dask also has to deal with an async/lazy loading paradigm.
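
To make the container idea concrete, here is a minimal sketch of a numpy-compatible object that is not an ndarray subclass. It relies only on numpy's `__array__` protocol. Everything else is an assumption for illustration: the class name `LazyDataset`, the `loader` callable, and the `name` attribute are hypothetical and are not part of h5coro. The loading here is also synchronous, so it only gestures at the lazy side of the problem, not the async side.

```python
import numpy as np


class LazyDataset:
    """Hypothetical stand-in for an H5Dataset (not the h5coro API)."""

    def __init__(self, loader, name):
        self._loader = loader   # assumed: a callable that returns the data
        self._values = None     # populated on first access
        self.name = name        # example of a dataset-specific attribute

    @property
    def values(self):
        # Load lazily, the first time the data is actually needed.
        if self._values is None:
            self._values = np.asarray(self._loader())
        return self._values

    def __array__(self, dtype=None, copy=None):
        # numpy calls this when it needs a real ndarray,
        # e.g. in np.asarray(ds) or np.sum(ds).
        arr = self.values
        if dtype is not None:
            arr = arr.astype(dtype)
        return arr

    def __getitem__(self, key):
        return self.values[key]

    def __len__(self):
        return len(self.values)


# Hypothetical usage: the loader would be the actual read in practice.
ds = LazyDataset(lambda: [1.0, 2.0, 3.0], name="example/dataset")
total = np.sum(ds)             # numpy converts ds through __array__
doubled = np.multiply(ds, 2)   # the result is a plain ndarray
```

One consequence of this approach is that numpy functions return plain ndarrays, so the extra attributes are lost after any operation. Keeping them would mean also implementing `__array_ufunc__` or `__array_function__`, which is where most of the design work would probably be.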
