Mapping Rust tuples to HDF5 compound types #19
I'm not quite sure how the native library handles this, but I hope the spec does not require fields to be packed (as in, no padding between fields). Otherwise, that would leave a lot of room for UB, and all hopes for no-ops would go down the drain.
I would prefer not to rely on this, since in theory it's something that can change between compiler versions, and we're doing persistent data objects meant to be portable across multiple systems through space and time (I know, this last part was exaggeratedly poetic).
I'm probably overlooking something, but we can ensure this layout in user code with …
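One way to pin down a layout in user code is a `#[repr(C)]` struct. A minimal sketch (the `Compound` type is hypothetical; the offset comments assume `align_of::<u64>() == 8`, as on typical 64-bit targets):

```rust
use std::mem;

// Hypothetical sketch: a #[repr(C)] struct mirroring the tuple (i8, u64, f32).
// With repr(C), fields are laid out in declaration order with C alignment
// rules, so the layout is stable across compiler versions.
#[repr(C)]
struct Compound {
    a: i8,  // offset 0
    b: u64, // offset 8 (padded up to align_of::<u64>())
    c: f32, // offset 16
}

fn main() {
    let align = mem::align_of::<u64>(); // 8 on most targets
    let v = Compound { a: 0, b: 0, c: 0.0 };
    let base = &v as *const Compound as usize;
    // repr(C) guarantees these offsets as a function of the field alignments:
    assert_eq!(&v.a as *const i8 as usize - base, 0);
    assert_eq!(&v.b as *const u64 as usize - base, align);
    assert_eq!(&v.c as *const f32 as usize - base, align + 8);
    // the total size is rounded up to the struct's own alignment
    assert_eq!(mem::size_of::<Compound>(), (align + 8 + 4 + align - 1) / align * align);
}
```

Note that the in-memory struct size (24 with 8-byte alignment, due to trailing padding) differs from a tightly packed HDF5 compound size of 20.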
There aren't many requirements, mostly just common-sense restrictions; HDF5 doesn't require structs to be packed, and it doesn't require any kind of fixed layout either. The offsets of consecutive fields must not decrease, and fields must not overlap.
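Those two restrictions can be sketched as a single check (a hypothetical helper, not the crate's API; `fields` is a list of `(offset, size)` pairs in declaration order):

```rust
// Hypothetical sketch of the restrictions described above: field offsets must
// not decrease, and consecutive fields must not overlap. For nonzero-sized
// fields, non-overlap (offset + size <= next offset) implies both.
fn layout_is_valid(fields: &[(usize, usize)]) -> bool {
    fields.windows(2).all(|w| w[0].0 + w[0].1 <= w[1].0)
}

fn main() {
    // (i8 @ 0, u64 @ 8, f32 @ 16): fine
    assert!(layout_is_valid(&[(0, 1), (8, 8), (16, 4)]));
    // decreasing offsets (u64 @ 0 after i8 @ 8): rejected
    assert!(!layout_is_valid(&[(8, 1), (0, 8), (12, 4)]));
    // overlapping fields (u64 @ 8 overlaps f32 @ 12): rejected
    assert!(!layout_is_valid(&[(0, 1), (8, 8), (12, 4)]));
}
```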
As a matter of fact, we're already requiring … I wonder if that's overly restrictive, since there's also …
A little correction, not "we", but rather the users... as for "us", we can't really ensure it :)
Of course the users can use structs instead of tuples, and use … Another compromise, I guess, would be to support all tuples where the fields don't get reordered (this would have to be checked at runtime, something like …)
Right, that's what I meant. Although we can, in a way, check whether the type's layout is compatible at run-time (totally not ideal, …)
We'd have to explain that tuples have a very specific behaviour in Rust, and that we cannot enforce how they are laid out in memory. I can't think of anything better without delving into obscure procedural macros. The way I currently see things, if there is to be a way to work with packed structs and tuples, conversions have to take place when reading and writing. Attempts at a no-op may seem promising, but any code relying on it would be standing on thin ice.
Packed where? In memory or in the file? If it's packed in the file, HDF5 soft conversions would take care of it. As for packed in memory, I don't really wish to support that out of the box, so that we just strictly require repr(C) for memory types. If the users really want/need to, …
Yea. So with some (most?) tuples it will "just work", and the conversion will be a no-op. With other tuples it would require a soft conversion. One important takeaway here is that perhaps the trait should be `unsafe`:

```rust
pub unsafe trait H5Type {
    // This is the "file" descriptor; this is how the type will be stored in HDF5 if we
    // create a new dataset with it. It may not be a direct memory mapping of the
    // underlying type and may require conversion.
    fn type_descriptor() -> TypeDescriptor;

    // This is the "memory" descriptor; it is a description for HDF5 internal routines of
    // how our types are laid out in memory -- this can depend e.g. on the Rust compiler
    // version. Datatypes based on this descriptor never get stored in files.
    // Defaults to being the same as the file descriptor (true for most types), but can
    // be overridden.
    fn memory_descriptor() -> TypeDescriptor {
        Self::type_descriptor()
    }
}
```

Hmm... I guess we could even add

```rust
fn requires_conversion() -> Option<Conversion> {
    None
}
```

This way, we can at least query at runtime whether a tuple type is "irregular" (some other types, like varlen arrays/strings, also always require soft conversion because they require allocations).

Detailed example:

```rust
// this would create a dataset with "file type":
// file type = [("0", i8, 0), ("1", u64, 8), ("2", f32, 16)], sizeof=20
let ds = file.new_dataset::<(i8, u64, f32)>().create_anon(2)?;
// this would attempt to write from a memory buffer of "memory type":
// memory type = [("1", u64, 0), ("0", i8, 8), ("2", f32, 12)], sizeof=16
ds.write(&[(1, 2, 3.), (4, 5, 6.)])?; // <-- this fails ("soft conversion required")
ds.as_writer().soft().write(&[(1, 2, 3.), (4, 5, 6.)])?; // <-- this works
```

On a side note, I really don't want to make soft conversions the default (like they are in h5py/HDF5); this being Rust, I'd prefer to be warned of any non-zero-cost actions, especially when they may be very obscure and unobvious (I honestly didn't know that clipping was the default policy when shrinking integer byte size until I started digging into this...)
Yes, in file. We agree here that there must be a soft conversion in this case. 👍
For the case of tuples, however, it's hard for me to endorse the idea, even with the suggested work-around:

```rust
// This is the "memory" descriptor; it is a description for HDF5 internal routines of
// how our types are laid out in memory -- this can depend e.g. on the Rust compiler
// version. Datatypes based on this descriptor never get stored in files.
// Defaults to being the same as the file descriptor (true for most types), but can
// be overridden.
fn memory_descriptor() -> TypeDescriptor {
    Self::type_descriptor()
}
```

How would this one be implemented for the example tuple?
You mean the other descriptor? Because the memory descriptor must literally describe what’s in memory. In this case it has to be option 1 from my first message. As for the “in-file” descriptor, if you’re creating new datasets with this tuple, I see at least two ways: …
By “converting to repr(C)”, I mean one of two things: …
(Related chapter in the Unsafe Code Guidelines: https://github.com/rust-rfcs/unsafe-code-guidelines/blob/master/reference/src/representation/structs-and-tuples.md)
That’s my whole point: we don’t need to ensure consistency between memory descriptors. They map to the internal memory representation (whatever’s hardcoded in the current compiler version) and never get published anywhere, whereas “file descriptors” describe the type of new datasets that get stored in the file.
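To make the two-descriptor idea concrete, here is a minimal standalone sketch for `(i8, u64, f32)`, using made-up `TypeDescriptor`/`Field` types (not the crate's actual API), with the offsets quoted in the detailed example above:

```rust
// Hypothetical sketch of the file-vs-memory descriptor split for (i8, u64, f32).
#[derive(Debug, Clone, PartialEq)]
struct Field { name: String, offset: usize }

#[derive(Debug, Clone, PartialEq)]
struct TypeDescriptor { fields: Vec<Field>, size: usize }

fn field(name: &str, offset: usize) -> Field {
    Field { name: name.to_string(), offset }
}

// "File" descriptor: declaration order with C-style offsets; this is what new
// datasets get created with and is stable across compiler versions.
fn type_descriptor() -> TypeDescriptor {
    TypeDescriptor { fields: vec![field("0", 0), field("1", 8), field("2", 16)], size: 20 }
}

// "Memory" descriptor: whatever the current compiler actually does (today,
// largest field first). Never stored in files.
fn memory_descriptor() -> TypeDescriptor {
    TypeDescriptor { fields: vec![field("1", 0), field("0", 8), field("2", 12)], size: 16 }
}

// A read/write can be a no-op only when both descriptors agree; otherwise a
// soft conversion is required.
fn requires_conversion() -> bool {
    type_descriptor() != memory_descriptor()
}

fn main() {
    assert!(requires_conversion());
}
```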
@Enet4 so I'm almost done with it, it seems; I think it will work. I had to implement the C alignment and layout logic manually due to current stupid limitations ("can't use outer type parameters"), but it wasn't too bad... Another thing I noticed: currently, if you provide a one-element tuple, like …

```rust
// obviously, empty/unit tuples are banned as well since they make no sense
```
Ok, done (changes pushed to the 2018 branch); summary: …
Will probably require some further tests to be added at a later point, but so far everything looks good.
While working on a test suite for dataset reading/writing, I've encountered a curious problem.
Facts:

- … (`H5Type` …)

Here's an example (playground): the field offsets for the tuple `(i8, u64, f32)` are `(8, 0, 12)` – Rust reorders the fields so the largest one comes first, and is rightfully free to do so.

Now, we want to bind this to an HDF5 compound datatype with fields named "0", "1" and "2". In HDF5, however, the offsets must be strictly increasing; providing a decreasing offset will be treated as datatype shrinking and will most likely yield an error.
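The reordering can be observed directly at run time. A sketch (the exact offsets are compiler-dependent and not guaranteed; only the alignment/bounds properties are asserted):

```rust
// Sketch: observing the field offsets of (i8, u64, f32) at run time.
// The default (Rust) repr may reorder fields, so the exact values are NOT
// guaranteed -- on current compilers this typically prints (8, 0, 12).
fn main() {
    let t: (i8, u64, f32) = (1, 2, 3.0);
    let base = &t as *const (i8, u64, f32) as usize;
    let o0 = &t.0 as *const i8 as usize - base;
    let o1 = &t.1 as *const u64 as usize - base;
    let o2 = &t.2 as *const f32 as usize - base;
    println!("offsets: ({o0}, {o1}, {o2})");
    // What IS guaranteed: each field is properly aligned and lies within the tuple.
    let size = std::mem::size_of_val(&t);
    assert_eq!(o1 % std::mem::align_of::<u64>(), 0);
    assert!(o0 + 1 <= size && o1 + 8 <= size && o2 + 4 <= size);
}
```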
So, we have a few options:

1. Compound datatype with fields `["1", "0", "2"]` and offsets `[0, 8, 12]`.
   This can be mapped directly to the Rust tuple and won't require any conversion (no-op). However, the fields are ordered in a weird way, and the ordering depends on the internals of the Rust compiler (although this part isn't likely to change).
2. Compound datatype with fields `["0", "1", "2"]` and offsets `[0, 8, 16]`.
   The fields now have the same order as in Rust, but the memory layout is different (and the data element has a bigger size). Due to the incompatible memory layout, this will require a soft conversion each time the dataset is read or written (an extra step and an extra memory allocation). This is pretty weird and confusing as well: e.g. you want to create a new dataset with this tuple type, and suddenly you're being asked to enable soft conversion. It would also be hard for the crate user to predict whether a given tuple type requires enabling soft conversion for reading/writing – that would require knowledge of compiler internals.
3. Compound datatype with fields `["0", "1", "2"]` and offsets `[0, 8, 12]`.
   This doesn't require soft conversion, and the fields in HDF5 are named 0-1-2, but "0" is now field 1, and "1" is now field 0, which is confusing.
4. ? (any other options I've missed?)
For reference: tuple type binding implementation if it's of any help (it's pretty hairy macro stuff).