Provide examples #9
Comments
I'm planning to work on the datasets reading/writing in February, at which point it will become fully usable. I've put quite a lot of effort into making the bindings thread-safe (which other wrapping libraries don't really care about); this includes locking operations with reentrant mutexes and providing some helper macros. I guess for now you can look at the tests for each module (file/group) to see how it works. There's one major stumbling point (or a problem of choice, rather) which prevented me from implementing datasets earlier: how exactly to deal with reading/writing structured datasets. I don't have a good idea of what the API should look like (given there's no proper runtime struct introspection in Rust). |
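For readers wondering what "locking operations with reentrant mutexes" plus a helper macro could look like, here is a minimal sketch using only the standard library. It is not the crate's actual implementation: the names `LIBRARY_LOCK`, `with_library_lock`, `locked!` and `nested_sum` are made up for illustration, and reentrancy is emulated with a per-thread depth counter because stable `std` has no reentrant mutex.

```rust
use std::cell::Cell;
use std::sync::Mutex;

// One process-wide lock standing in for "the library lock" (hypothetical name).
static LIBRARY_LOCK: Mutex<()> = Mutex::new(());

thread_local! {
    // How many nested `with_library_lock` calls the current thread is inside.
    static DEPTH: Cell<usize> = Cell::new(0);
}

/// Runs `f` while holding the global lock. A nested call on the same thread
/// reuses the lock that thread already holds instead of deadlocking.
/// Note: this sketch does not reset the depth counter if `f` panics.
pub fn with_library_lock<R>(f: impl FnOnce() -> R) -> R {
    let depth = DEPTH.with(|d| d.get());
    if depth > 0 {
        // Already inside the lock on this thread: just track the nesting.
        DEPTH.with(|d| d.set(depth + 1));
        let result = f();
        DEPTH.with(|d| d.set(depth));
        return result;
    }
    // Outermost call on this thread: actually acquire the mutex.
    let _guard = LIBRARY_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    DEPTH.with(|d| d.set(1));
    let result = f();
    DEPTH.with(|d| d.set(0));
    result
}

// Hypothetical helper macro that wraps an expression in the lock.
macro_rules! locked {
    ($body:expr) => {
        with_library_lock(|| $body)
    };
}

/// Takes the lock twice on the same thread to show that nesting is safe.
pub fn nested_sum() -> i32 {
    let total = locked!({
        let inner = locked!(40);
        inner + 2
    });
    total
}
```

With a plain `Mutex` and no depth counter, the inner `locked!` call in `nested_sum` would block forever, which is why wrappers around a non-thread-safe C library tend to want reentrancy.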
I was also thinking of using |
@aldanor Maybe I don't understand correctly what you are getting at. But why do you need runtime introspection? Wouldn't rustc-serialize or serde help you do the job at compile time? |
@mokasin Because that's how HDF5 works:
As a matter of fact, I've written a little utility package which should help with all this: https://github.com/aldanor/typeinfo |
I was particularly interested in reading datasets from HDF5 files, for which so far there is no high-level API for Rust. Is there any progress on this particular feature? |
@Enet4 As a matter of fact, I'm sort of working on that right now because I need it too :) There's a few stumbling blocks to which I have no particular solutions yet, for instance how to represent the variable-length types (strings in particular) -- and whether to support them at all. |
It's more of a matter of figuring things out than anything. If anyone in this thread has time/desire to discuss possible ways of handling compound / vlen datatypes in Rust -- this could speed things up 😉 |
I'm a beginner on Rust, unfortunately. Still, I am interested in this and would like to keep in touch with any discussion that will hopefully take place, perhaps even managing to contribute with my thoughts. |
The main question is what to do with types that are not For element of a variable-length datatype, I think
struct hvl_t {
    len: size_t,
    p: *mut c_void,
}
However it then has to be somehow translated to Rust structs with proper drop semantics etc. |
Wrapping structs like this in Rust is possible, but if there's a As of today, the only way to avoid that is to add I wonder if that'd be an acceptable solution. |
Proof of concept: https://gist.github.com/aldanor/8ca3bb807c79836455a87c699d844efe |
IMHO: If this is the only feasible solution, I'd say go for it. A crate that only works on nightly is better than a non-working crate. Either this stabilises or someone finds a different solution later. |
Couldn't there be an implicit conversion between a type with normal Rust semantics and an internal, "unsafe" implementation whenever that is needed? Or am I just missing the point? |
@Enet4 Would you care to elaborate on what exactly you mean by "implicit conversion"? Could you provide an example? Imagine a case where you read a dataset whose datatype is compound: one field is an int, another a variable-length string.
struct A {
    int x;          // normal field
    struct {        // vlen field
        void *p;
        size_t len;
    } y;
};
Now assume you have 1 billion of these (so calling 1 billion constructors is out of the question) taking almost all of your RAM (so copying all of the data is out of the question). You have a C pointer to this data which you received from HDF5; what would you do next to construct a zero-copy Rust view on this which behaves like a bunch of normal Rust objects and ensures the proper cleanup? For each of the variable-length elements, HDF5 will The only way to have both The user-code would look like this, I think:
h5def! {  // macro that enables extracting offset/type information at runtime
    pub struct A {
        pub x: u32,
        pub y: VLString,  // variable-length zero-copy string view on a malloc'd buffer
    }
} |
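To make "a macro that enables extracting offset/type information at runtime" more concrete, here is a rough sketch of the idea on current stable Rust. It is not the `h5def!` macro from this thread: `describe_struct!`, `FieldInfo` and `Sample` are hypothetical names, and it leans on `std::mem::offset_of!`, which did not exist when this discussion took place.

```rust
use std::mem::{offset_of, size_of};

/// Runtime description of one struct field (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct FieldInfo {
    pub name: &'static str,
    pub offset: usize,
    pub size: usize,
}

// Declares a `#[repr(C)]` struct and generates a `fields()` function that
// reports each field's name, byte offset and size at runtime.
macro_rules! describe_struct {
    (pub struct $name:ident { $(pub $field:ident : $ty:ty),* $(,)? }) => {
        #[repr(C)]
        #[derive(Debug, Clone, Copy)]
        pub struct $name { $(pub $field: $ty),* }

        impl $name {
            pub fn fields() -> Vec<FieldInfo> {
                vec![$(FieldInfo {
                    name: stringify!($field),
                    offset: offset_of!($name, $field),
                    size: size_of::<$ty>(),
                }),*]
            }
        }
    };
}

describe_struct! {
    pub struct Sample {
        pub flag: u8,
        pub value: u32,
    }
}
```

Calling `Sample::fields()` yields one `FieldInfo` per field, including the padding-induced offset of `value`. That is roughly the information a binding would need in order to describe a compound datatype to a C library.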
Does this also mean that a user of this crate would have to drop the data themselves? |
No, quite the opposite -- the whole point here is to be able to cast C-allocated buffers (potentially nested in weird ways) into Rust-managed structs that would automatically free C memory in accordance with normal drop semantics. Hypothetical example:
{
    let data = dataset.read::<A>().unwrap();
}  // nested heap-allocated elements owned by `data` dropped here, e.g. var-len strings
The I'll still have to double-check that |
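As an illustration of "Rust-managed structs that automatically free memory in accordance with normal drop semantics", here is a small sketch of an owning wrapper with the same shape as the `hvl_t` shown earlier (a length plus a raw pointer). The name `VarLenBytes` is made up. The buffer is allocated with Rust's own allocator as a stand-in; a real binding would receive the pointer from HDF5 and release it with the matching C deallocation routine.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::{ptr, slice};

/// Length plus raw pointer, mirroring the layout of `hvl_t` (hypothetical type).
#[repr(C)]
pub struct VarLenBytes {
    len: usize,
    ptr: *mut u8,
}

impl VarLenBytes {
    /// Stand-in for "the library handed us a heap buffer": copies `data`
    /// into a freshly allocated block that this value then owns.
    pub fn from_slice(data: &[u8]) -> Self {
        if data.is_empty() {
            return VarLenBytes { len: 0, ptr: ptr::null_mut() };
        }
        let layout = Layout::array::<u8>(data.len()).expect("layout overflow");
        // SAFETY: the layout has non-zero size because `data` is non-empty.
        let buf = unsafe { alloc(layout) };
        assert!(!buf.is_null(), "allocation failed");
        // SAFETY: `buf` is valid for `data.len()` bytes and cannot overlap `data`.
        unsafe { ptr::copy_nonoverlapping(data.as_ptr(), buf, data.len()) };
        VarLenBytes { len: data.len(), ptr: buf }
    }

    /// Zero-copy view of the owned buffer.
    pub fn as_bytes(&self) -> &[u8] {
        if self.ptr.is_null() {
            return &[];
        }
        // SAFETY: `ptr` points to `len` initialized bytes owned by `self`.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for VarLenBytes {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            let layout = Layout::array::<u8>(self.len).expect("layout overflow");
            // SAFETY: `ptr` was allocated in `from_slice` with this same layout.
            unsafe { dealloc(self.ptr, layout) };
        }
    }
}
```

Because the struct is `#[repr(C)]` and holds nothing but the length and the pointer, a slice of such values can in principle be laid over a C buffer of `hvl_t` records without copying, while each element still cleans up after itself when dropped.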
I pondered a bit more on this and prototyped the possible implementation for fixed size strings / var len strings / var len arrays, and I think the compromise is to only support fixed-length strings and fixed-size arrays on stable, and also support variable-length strings/arrays on nightly. This way the library would still work on the stable / beta channels, and once the drop flag business is solved upstream, everything including var-len datatypes would work on stable as well. |
A little update in case anyone's still interested :) Good news, I've been able to make the type system work (the Here's an example:
import h5py
import numpy as np

# 'w' creates (or truncates) foo.h5
f = h5py.File('foo.h5', 'w')
# byte strings, so that field b is stored as a fixed-length 3-byte string
arr = np.rec.fromarrays([[1, 2, 3, 4], [b'foo', b'bar', b'x', b''],
                         [True, True, False, True]], names='a,b,c')
f['test'] = arr
f.close()
This now works (local dev version, this hasn't been pushed yet):
#[macro_use]
extern crate hdf5_rs;

use hdf5_rs::new_datatype;
use hdf5_rs::Container;
use hdf5_rs::FixedString;

fn main() {
    let f = hdf5_rs::File::open("foo.h5", "r").unwrap();
    let ds = f.dataset("/test").unwrap();
    // Rust-side description of the compound datatype written by the Python script
    h5def!(
        #[derive(Debug)]
        struct T {
            a: i64,
            b: FixedString<[u8; 3]>,
            c: bool,
        }
    );
    let arr = ds.read::<T>().unwrap();
    println!("{:?}", arr);
}
which prints (with whitespace formatted)
[T { a: 1, b: "foo", c: true },
 T { a: 2, b: "bar", c: true },
 T { a: 3, b: "x", c: false },
 T { a: 4, b: "", c: true }] shape=[4], strides=[1] |
^ This looks good. Any plans to push this? |
@aldanor May I ask what is the read/write functionality status now? I am working on a project heavily involved with hdf5, and really wish to have it coded all in Rust. Thanks! |
A tiny re-bump here. How would one read and write simple datasets of a scalar type at this time? Would I need to be aware of chunks? In my use case, I'm only using chunks in order to make the dataset resizable, and I would prefer using an abstraction over them. As in, treating the dataset as a contiguous, elastic n-dimensional array of data. |
@Enet4 @andysureway-10x sorry for the delay folks, I fell off the face of the earth for a bit with PyCon and other stuff; I haven't had time to finish the types/read/write branch recently but hope to do it (even if partially) reasonably soon :) @Enet4 No, generally you won't need to be aware of the chunks for reading, HDF5 takes care of that (unless you're trying to do something very smart). Resizable datasets are generally not a great idea in HDF5 from my experience; plus, they've only just added the functionality to reclaim space in 1.10, and in previous versions you end up having to repack. |
I'm actually just stacking data to a 4-dimensional dataset on a specific axis. I end up relying on a chunked dataset because I do not know in advance how many volumes I'll be stacking nor how large they are, and I may wish to increase it at a later time without creating copies (think GB scale). So I don't need to remove or modify existing data. Nevertheless, I'm looking forward to this. Keep up the good work. :) |
By the way, in regard to strings, it looks like there will have to be four different string types unfortunately and not just two, something like So you would basically have to use one of these four as struct fields in order to deal with strings. As for attributes / dataset names, could probably just use ASCII.. |
I suppose that would be fine, as long as those elements and attributes can be trivially converted (even if explicitly) to string slices. |
@Enet4 Yes, most traits you'd expect from a string type are already there: https://github.com/aldanor/hdf5-rs/blob/feature/types/src/types/fixed_string.rs#L106 |
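For anyone curious how a fixed-length string element can be "trivially converted to a string slice", here is a minimal sketch. It is not the `FixedString` type from the linked branch: `FixedAscii` is a hypothetical name, it uses const generics (which were not available at the time) instead of an array type parameter, and it assumes NUL padding and ASCII-only content.

```rust
use std::ops::Deref;
use std::str;

/// Hypothetical fixed-capacity, NUL-padded ASCII string of at most `N` bytes.
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct FixedAscii<const N: usize> {
    buf: [u8; N],
}

impl<const N: usize> FixedAscii<N> {
    /// Returns `None` if `s` is not ASCII, contains a NUL, or exceeds `N` bytes.
    pub fn new(s: &str) -> Option<Self> {
        if !s.is_ascii() || s.contains('\0') || s.len() > N {
            return None;
        }
        let mut buf = [0u8; N];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(FixedAscii { buf })
    }

    /// Borrows the contents up to the first NUL byte as a string slice.
    pub fn as_str(&self) -> &str {
        let end = self.buf.iter().position(|&b| b == 0).unwrap_or(N);
        // The constructor only admits ASCII, which is always valid UTF-8.
        str::from_utf8(&self.buf[..end]).expect("ASCII is valid UTF-8")
    }
}

// `Deref` to `str` makes `&FixedAscii<N>` usable wherever `&str` is expected.
impl<const N: usize> Deref for FixedAscii<N> {
    type Target = str;

    fn deref(&self) -> &str {
        self.as_str()
    }
}
```

The struct occupies exactly `N` bytes, so it can sit inside a `#[repr(C)]` record next to numeric fields, and `Deref` gives callers the usual `str` methods without an explicit conversion.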
@aldanor - Sir, is there any chance of an updated example? From what I can see, the keywords and some datatypes don't seem to be in the documentation. I attempted to use your above example; however, neither "FixedString" nor the "h5def!" macro seems to work... Appreciated! :) |
An example has been added to the README. |
What's the current state of this? It seems like out of all the hdf5 bindings out there, yours seems to be the most actively developed.
Are you planning to provide some example on how to interact with hdf5-rs, or do you think it is not ready for consumption yet?