This is a very early-stage idea, but I'm opening a ticket to document it and share thoughts. I think it would be really interesting for an H5Dataset to be something like a subclassed numpy array: an object that acts like a numpy array but has additional h5coro-specific attributes and methods.
I still need to figure out exactly what this means, probably by playing around with it. I oscillate between being really excited about it and feeling like we should leave highly user-oriented data structures to higher-level libraries like pandas or xarray.
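To make the idea a bit more concrete, here is a minimal sketch of what an array subclass carrying an extra attribute could look like. It follows the general `__new__` / `__array_finalize__` pattern for `np.ndarray` subclasses. The class name `AnnotatedArray` and the `path` attribute are made up for illustration and are not part of h5coro.

```python
import numpy as np


class AnnotatedArray(np.ndarray):
    """Hypothetical ndarray subclass that carries one extra attribute."""

    def __new__(cls, input_array, path=None):
        # View the input data as this subclass, then attach the extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.path = path
        return obj

    def __array_finalize__(self, obj):
        # Called for explicit construction, view casting, and slicing.
        # obj is None only when the array is created from scratch.
        if obj is None:
            return
        self.path = getattr(obj, "path", None)


# "/hypothetical/dataset" is a placeholder, not a real h5coro path.
arr = AnnotatedArray([1.0, 2.0, 3.0], path="/hypothetical/dataset")
sliced = arr[1:]    # still an AnnotatedArray, attribute carried over
total = arr.sum()   # ordinary numpy methods keep working
```

Even in this tiny case the extra attribute has to be propagated by hand through `__array_finalize__`, which hints at where the complexity comes from.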
I dug into the feasibility of this, and it turns out that subclassing numpy arrays is tricky and, overall, not encouraged (reference). There are, however, a few pages that discuss creating numpy-compatible containers: see interoperability or custom array containers. Of particular interest was an example that uses pandas dataframes to show how a separate array-like data structure can achieve numpy interoperability. Dask arrays are another interesting example, in particular because dask also has to deal with an async/lazy loading paradigm.
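As a rough sketch of the container route, the class below is not a numpy subclass; it only implements `__array__`, so numpy functions can convert it on demand. Everything here is an assumption for illustration: `LazyDataset`, its `loader` callable, and the `path` attribute are hypothetical and do not reflect how h5coro's H5Dataset works. The loader stands in for whatever would actually read the data, and it is only called the first time the values are needed.

```python
import numpy as np


class LazyDataset:
    """Hypothetical array-like container that loads its values on first use."""

    def __init__(self, loader, path):
        self._loader = loader   # callable returning the data (assumption)
        self.path = path        # extra, non-numpy attribute
        self._values = None

    def _load(self):
        # Materialize the data once and cache it.
        if self._values is None:
            self._values = np.asarray(self._loader())
        return self._values

    def __array__(self, dtype=None, copy=None):
        # numpy calls this when it needs a real ndarray.
        values = self._load()
        if dtype is not None:
            return values.astype(dtype)  # astype returns a new array
        if copy:
            return values.copy()
        return values

    def __getitem__(self, key):
        return self._load()[key]


calls = []


def fake_loader():
    # Stand-in for a real read; records each call so laziness is visible.
    calls.append(1)
    return [1, 2, 3]


ds = LazyDataset(fake_loader, path="/hypothetical/dataset")
loads_before = len(calls)   # nothing has been read yet
mean = np.mean(ds)          # numpy converts ds through __array__
shifted = np.add(ds, 1)     # ufuncs also accept the container
loads_after = len(calls)    # the loader ran exactly once
```

The results of numpy operations here are plain ndarrays, so the extra attributes stay on the container. Keeping them on results would need further protocols such as `__array_ufunc__`, which this sketch leaves out.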