Request for feedback: 0.8 to 1.0 #67
I'm still interested in #26, but haven't had any time to play more with rkyv yet. At work we mostly use cbor (with zstd compression) and a little bit of json/protobuf, but I think that rkyv combined with zstd compression would be something we might consider in the future. |
#26 is still on my list, but since it's going to be part of a larger schema system it'll probably be a bit before I get something workable in people's hands. I'll consider splitting out just the functionality needed for that issue and releasing it sooner. |
Along the lines of "What needs improvement or is holding you back": I was really excited to try out rkyv after seeing your benchmark post, but unfortunately it ended up being larger than serde_cbor for my use case (serializing a dictionary + inverted index for searching). The data looks like:

```rust
#[derive(Archive, Serialize)]
struct CedictIndex {
    entries: Vec<Entry>,
    inverted_index: HashMap<String, Vec<(u32, u8)>>,
    version: String,
}

#[derive(Archive, Debug, Serialize)]
struct Entry {
    trad: String,
    simp: String,
    pinyin: String,
    definitions: Vec<String>,
    frequency: u32,
}
```

Rkyv is about 32 MiB and serde_cbor about 24 MiB. Totally fine if that's just how it is, but I was wondering if you had any idea why CBOR is smaller for this particular use case, given that rkyv was always smaller in the bench. Thanks!!
In this particular case I wonder if the hashmap has something to do with it. Have you tried comparing the outputs after compression? |
The hashmap implementation rkyv uses is compress-hash-displace, which has the tradeoff of taking longer to serialize but (usually) using less memory on-disk while still being a usable hashmap. I believe that serde_cbor uses a list of pairs style for serialization, which means that it doesn't have any overhead on-disk. One option for solving this problem is making a wrapper type (e.g. |
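The list-of-pairs layout can be illustrated without rkyv at all. This is a hypothetical std-only sketch (function names like `to_pairs` are made up for illustration): the map is flattened to a sorted list of pairs, which stores no hashing metadata on disk and trades O(1) lookups for O(log n) binary search.

```rust
use std::collections::HashMap;

// Flatten the map to a Vec of (key, value) pairs, sorted by key so that
// lookups can binary-search instead of hashing.
fn to_pairs(map: &HashMap<String, Vec<(u32, u8)>>) -> Vec<(String, Vec<(u32, u8)>)> {
    let mut pairs: Vec<_> = map.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
    pairs.sort_by(|a, b| a.0.cmp(&b.0));
    pairs
}

// Look up a key in the sorted pair list with binary search.
fn lookup<'a>(pairs: &'a [(String, Vec<(u32, u8)>)], key: &str) -> Option<&'a [(u32, u8)]> {
    pairs
        .binary_search_by(|(k, _)| k.as_str().cmp(key))
        .ok()
        .map(|i| pairs[i].1.as_slice())
}
```

A serializer could then archive the `Vec` directly, paying no per-entry hashmap overhead on disk.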
Linking #68 |
Thanks for pointing out the compression bit - When compressed with gzip, serde_cbor is still smaller by 2MiB.
Thanks, I'll give this a shot and see how it goes! Also, I tried commenting the HashMap out of the struct -- without that field, CBOR is actually still smaller, but only by about ~0.5 MiB (1 MiB after compression.) So maybe there's something else at play too... |
We're using The microkernel is based around shoveling memory pages back and forth. A memory page is allocated on a multiple of 4 kB, which we fill with data. On the other side, we use An example of something like that is the This then gets bundled into a We're still working to get libstd working in our operating system, so we're still working on how to work with Also, with |
Thanks for trying out rkyv! It's exciting to see it getting used in xous, especially since I think it will bring focus to a lot of aspects that aren't exercised by most uses. I definitely agree that writing archive implementations for containers is not ergonomic, and that's something I'd be interested in looking at improving over time. And even if improving the ergonomics isn't possible, reaching out to other crates like
|
Is streaming something to look into later? Maybe we should do a feature comparison between rkyv, FlatBuffers, and Cap'n Proto, to know if rkyv supports something users of the others wanted?
I think streaming and async would be cool, but I hesitate to add them without any use cases in mind. Making all the serializers async would add a lot of complexity and could impact performance. I'm not totally convinced of what benefit it would have either.

That said, Cap'n Proto does support reading data before the whole message has arrived. It can do that because it writes outer objects before inner objects, which is the opposite of how rkyv (and FlatBuffers?) works. Right now rkyv always writes inner objects before outer ones. We might be able to switch that by changing the serializer to write backwards (i.e. position starts at zero and decreases with every write), but there's more thinking to do here. To move forward on streaming, we need some use cases and some ideas for placing outer objects before inner ones.

I think doing feature comparisons is a great idea. You can't beat your competitors if you can't compete! Schema evolution is a big one that rkyv doesn't have, but there are definitely more. I've been skimming through the Cap'n Proto site as a place to start. If you spot any major feature gaps, that would be a great discussion starter.

Another one that I noticed Cap'n Proto has is on-demand validation, which only validates the parts of a message that you actually read. It gives a pretty sizeable advantage in some of the benchmarks, and could be a common use case. I think bytecheck would be the right place to add it, and it could probably support on-demand validation with a little elbow grease and macro magic.
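As a rough illustration of the write-backwards idea (a sketch, not rkyv's serializer API; here the position starts at the end of a fixed buffer and decreases with every write, so outer objects end up physically before inner ones):

```rust
// Hypothetical backwards writer: each write lands immediately before
// everything written so far.
struct BackwardsWriter {
    buf: Vec<u8>,
    pos: usize, // offset where the most recent write begins
}

impl BackwardsWriter {
    fn new(capacity: usize) -> Self {
        BackwardsWriter { buf: vec![0; capacity], pos: capacity }
    }

    // Write `bytes` just before the previous write; return their offset.
    fn write(&mut self, bytes: &[u8]) -> usize {
        let start = self.pos - bytes.len();
        self.buf[start..self.pos].copy_from_slice(bytes);
        self.pos = start;
        start
    }

    // Return the buffer and the offset where the written data begins.
    fn finish(self) -> (Vec<u8>, usize) {
        (self.buf, self.pos)
    }
}
```

Writing the "inner" object first and the "outer" object second leaves the outer object at the lower offset, which is the ordering a streaming reader would want.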
Following up on @xobs comments, we were just trying to integrate rkyv 0.4.1 into our latest version of the OS and got blocked by the fact that |
You shouldn't need So there might be a bit of confusion here around whether you need to implement If you have some code available that demonstrates your problem I can also take a look at it and see if I can help. 🙂 |
Sure...I could completely believe we are doing this the wrong way, but basically, what is the right way to do the equivalent of what was Obviously, this doesn't work because we don't specify a deserializer, but when I looked here: https://github.com/djkoloski/rkyv/blob/master/rkyv/src/de/deserializers.rs I noticed that every type of deserializer provided required std, so I figured that was the end of the story and reverted the refactor. FYI this is how we do it in 0.3.1, and it works: All of our types are non-allocating types, as we don't have alloc and it's all stack-based right now. I actually don't follow what you mean by the requirements of the Deserializer requiring Fallible. If you could please provide a really simple example of how to implement one, I can use that as a template to make our own; but more or less I'm just trying to copy what's put in the examples given at https://docs.rs/rkyv/0.4.1/rkyv/trait.Archive.html. rkyv does magical things that I don't understand at a deep level; I understand just enough to do copypasta and be dangerous to myself and possibly others. |
Ah, that makes sense! This is something that the book covers under the probably poorly-named Extensions page. Some background on the way the deserializer bounds work:

```rust
// This type can be deserialized with any deserializer
impl<D: Fallible + ?Sized> Deserialize<MyType, D> for ArchivedMyType {
    // ...
}

// This type needs deserializers that implement Deserializer because it
// needs to allocate memory
impl<D: Deserializer + ?Sized> Deserialize<MyAllocType, D> for ArchivedMyAllocType {
    // ...
}
```

All deserializers (and serializers) must declare what errors they can produce upfront by implementing `Fallible`. With that out of the way, as for actually solving your issue: you just have to make your own deserializer:

```rust
pub struct XousDeserializer;

// Unreachable enum pattern; swap out for the never type (!) whenever that gets stabilized
pub enum XousUnreachable {}

impl rkyv::Fallible for XousDeserializer {
    type Error = XousUnreachable;
}
```

Usage:

```rust
result.deserialize(&XousDeserializer);
```

This might be worth providing a
This is actually what I want; don't be afraid of zero-copy deserialization! 🙂 |
Ah! ahah. Yes, it was not clear to me that something as simple as that was necessary. Fortunately I put this all in a branch, and I'll try picking up the refactor once again. Thanks again for the help explaining that. And yes, a BasicDeserializer would go a long way toward helping out a newcomer like me. It was not obvious (to me) from pattern matching that the null element was in the set of valid deserializers. I even stumbled on the Extensions page, but it didn't register what you were saying, because I had not yet encountered Fallible as a type; I've only used vanilla Result Ok/Error patterns in my code to date. Whenever I encounter macros in Rust my eyes glaze over, so anything that involves a Derive is basically a black box to me, even if it's quite trivial. I'll get used to Rust macro syntax someday; it's very powerful and I see why they had to do it, but it's almost a different language from non-macro Rust unto itself.
I've been experimenting with rkyv in a fairly narrow use case: an immutable, on-disk key-value store. At my day job, we use discodb extensively. What caught my eye about your project was that you use the same perfect hashing algorithm as disco. Querying a single key from a serialized hashmap is basically instantaneous. Super impressive! Hyperfine has a tough time scoring it, it's so fast.

Disco uses a Huffman encoding for compression and my Rust skills can't beat it in speed. I did implement zstd block compression on each value (resulting in smaller size on disk), but I need to find a faster way of iteratively decoding the blocks. (I saw your answer in #48, but compressing the whole object negates the benefit of memory-mapped reads afaict.)

Disco also supports multiple values per key. I implemented this with an indexset to assign an id per unique value and a hashmap of keys pointing to a vector of ids. I collect the indexset into a vector for serialization.

In short, rkyv let a novice Rust coder build a mostly faster kv store! My feature requests are: first-class functions for reading and writing to disk, and compression.
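The multiple-values-per-key scheme described above can be sketched with just the standard library (the type and method names here are illustrative, not from indexmap or rkyv): each unique value is interned once and assigned an id, and the key map stores vectors of ids.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct MultiMap {
    values: Vec<String>,             // id -> unique value (serialized as a Vec)
    ids: HashMap<String, u32>,       // unique value -> id
    keys: HashMap<String, Vec<u32>>, // key -> ids of its values
}

impl MultiMap {
    fn insert(&mut self, key: &str, value: &str) {
        // Intern the value: reuse its id if we've seen it before.
        let id = match self.ids.get(value) {
            Some(&id) => id,
            None => {
                let id = self.values.len() as u32;
                self.values.push(value.to_string());
                self.ids.insert(value.to_string(), id);
                id
            }
        };
        self.keys.entry(key.to_string()).or_default().push(id);
    }

    // Resolve a key's ids back to the interned values.
    fn get<'a>(&'a self, key: &str) -> Vec<&'a str> {
        self.keys
            .get(key)
            .map(|ids| ids.iter().map(|&i| self.values[i as usize].as_str()).collect())
            .unwrap_or_default()
    }
}
```

Duplicate values are stored once no matter how many keys point at them, which is where the on-disk savings come from.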
That's awesome! These are the sorts of use cases that I'm looking for inspiration from. When you're talking about performing compression on the values, did you compress the values and store them in the hashmap? I think there are some interesting possibilities for compression like object interning and compression contexts. These are things I'd be keen on implementing in some accessory packages (e.g. For reading and writing to disk, I think some helpers in a new crate ( |
Yup! I was rather pleased with the compression. I sample the vector and build a zstd training dict before block compressing each vec entry in an iter_mut. The struct to rkyv (as a verb?) has the hashmap, vec, and zstd training dict. For keys with small numbers of values, this approach is faster and smaller on disk. I think I just need to think smarter about my query/decompression loop when there are hundreds of values. I like the |
Hello! Thanks for your work on Rkyv! I'm from the team developing ICU4X, a library for internationalization written in Rust to target multiple platforms and lower-resource, client-side devices. We are looking into Rkyv as an option for delivery of locale data to power our APIs. These data are highly structured, consisting of mostly strings, numbers, and vectors of strings and numbers. Our high-level needs include:
I made a small writeup comparing Rkyv with Postcard and Bincode, which you can find at unicode-org/icu4x#78 (comment). Rkyv Archive is extremely fast and small code, validating the design goals. Postcard wins on data size by a fair margin, consistent with your findings in https://davidkoloski.me/blog/rkyv-is-faster-than/. I'd say the main puzzle I'm trying to solve right now is Serde compatibility. Right now, it seems that Rkyv does not play very well with data structs that have lifetime parameters. For example, to achieve zero-copy string deserialization in JSON, Postcard, or Bincode, one writes a data struct such as

```rust
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'s> {
    message: &'s str,
}
```

However, the above struct does not compile with Rkyv.

```rust
#[derive(rkyv::Serialize, rkyv::Archive, rkyv::Deserialize)]
pub struct DataStruct<'s> {
    message: &'s str,
}
```

Example errors:

```
error[E0277]: the trait bound `&str: Serialize<_>` is not satisfied
  --> utils/litemap/benches/bin/litemap_rkyv_deserialize.rs:52:16
   |
52 | serializer.serialize_value(&d).expect("failed to archive test");
   |            ^^^^^^^^^^^^^^^ the trait `Serialize<_>` is not implemented for `&str`
   |
   = note: required because of the requirements on the impl of `Serialize<_>` for `DataStruct<'_>`
```

Similarly, the Rkyv Archive data structs being special types with

Note that serialization speed is unimportant for us, because the data are serialized during the build process. We are happy to take hits in serialization performance in order to reduce data size or achieve other goals. |
Thanks for looking at rkyv! I think it could be a good fit for your use case. First to clarify, rkyv is not serde-compatible because serde only supports partial zero-copy deserialization. This means that while you can borrow large parts of your data from the source buffer, you will still need to create a new object to hold references to the borrowed memory. On the other hand, rkyv supports total zero-copy deserialization. This means that no extra memory is required aside from the data buffer. More concretely:
You can read more about how rkyv compares with serde in the book. rkyv doesn't provide serialization capabilities for references out of the box for a couple of reasons:
For these reasons, lifetimes are not supported in archived data. You can still get some of these features using shared pointers, but in general it helps to think of the archived type always having an implicit lifetime parameter in its reference:
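The partial-zero-copy model can be illustrated with a hand-rolled, hypothetical parser (nothing here is serde or rkyv API; the length-prefixed format is made up): the struct must still be constructed, but its string field borrows straight from the source buffer. Total zero-copy, by contrast, would skip constructing the struct at all and read the archived bytes in place.

```rust
// The borrowed struct: its lifetime is tied to the source buffer.
struct DataStruct<'s> {
    message: &'s str,
}

// Hypothetical "deserializer" for a length-prefixed UTF-8 string.
// No allocation happens; `message` points into `buf`.
fn parse(buf: &[u8]) -> Option<DataStruct<'_>> {
    let len = *buf.first()? as usize;
    let bytes = buf.get(1..1 + len)?;
    Some(DataStruct {
        message: std::str::from_utf8(bytes).ok()?, // borrows from buf
    })
}
```

Even though no bytes are copied, a new `DataStruct` object still has to be built to hold the reference, which is the extra step that total zero-copy deserialization avoids.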
Not 100% sure this answers the questions you have, so please let me know if I missed anything. |
I do know rkyv does not have serde compatibility, but does that mean we need third-party support for rkyv (a similar case to pyo3)? For example, popular crates like chrono only work with serde, so switching to rkyv means they won't work unless only primitive types are used. Or maybe in the future rkyv is aiming to be a serde replacement, so we don't have to fracture the ecosystem like serde did to rustc-serialize back then?
I think of rkyv as a complement to serde, not really as a replacement. For third-party crates, I'm willing to reach out and provide implementations for their types (see #88), but hopefully in the future it's something that crates will provide out of the box. I don't think there's a really great solution unfortunately. If rkyv support doesn't gain traction among popular crates, I'll have to loop back and rethink how to approach this problem. |
I'll just chime in and say that schema evolution would be outstanding. |
I think that's reasonable, I'll update the thread name one more time. I think "0.8 to 1.0" gives the right framing: new features and breaking changes are welcome, and so are suggestions for stabilization work. |
I'm just wondering if rkyv should rush to 1.0, because from what I see quite a few important features are not available. I think it's better to have a few more releases and get more feedback until it's stable enough to release 1.0.
Is cross-language support a goal (#144)? I'm just wondering if it's possible, as a consideration for the helix plugin system helix-editor/helix#122
Cross-language compatibility is possible, but:
C++ is likely possible, but will probably require concepts to be usable (C++20) because rkyv makes heavy use of generics. Could you suggest what other languages you'd want supported? |
I'm not very familiar with this project nor the issue it tackles, but would C bindings be possible? |
C bindings can absolutely be offered, and the library actually comes with a
That would let you access data in a zero-copy fashion. If you wanted to serialize and deserialize data, you may need more advanced capabilities. This would probably require writing a version of rkyv for your language or using rust and using bindings. I'm not extremely well-versed in the capabilities of other languages, but as I understand it C++ would be capable of this with concepts (C++20). Getting a version of rkyv up and running in C++ would take a while. |
I'd love to see cyclic object graphs supported (see #214). It would make taking snapshots of running processes MUCH simpler! |
Just looping back to this thread -- we're contemplating upgrading |
Thanks for coordinating. 🙂 I'm planning to release 0.8 at some point in the next year, but the timeline is not solid yet. It will likely be a large change, but probably the final revision before 1.0. So 1.0 is on the horizon but not yet in sight. 0.8 is slated to bring some highly requested new features and address some long-standing functionality shortfalls. I think that the gap between 0.8 and 1.0 is likely to be small. |
sounds solid to me. I'll keep an eye out for 0.8 then. thanks! |
I haven't had a chance to use this crate yet, but was lightly investigating it for some potential embedded usage. However, one thing that turned me off was these features:

```toml
size_16 = []
size_32 = []
size_64 = []
```

This means that I couldn't use one program to communicate with both an Arduino (where I'd want to use

Data format settings shouldn't be incompatible at this broad of a level.
Thanks for the feedback, as part of the work for 0.8 the pointer widths are moving into a generic parameter for most structures. This will allow you to mix and match data formats of different pointer widths, as well as use different pointer widths throughout a single archived object. rkyv currently supports being imported multiple times under different names. This allows you to set up different rkyv dependencies for different feature choices. You can use the The reasoning behind having features for relative pointer widths (as well as little/big/native endianness) is explored a bit more in #169. The gist of it is that intermediate dependencies (say, if we depend on |
I had noticed that attribute, but didn't think it could be used here as Cargo prevents it:

```toml
[package]
name = "double-use"
version = "0.1.0"
edition = "2021"

[dependencies]
rkyv-but-16 = { package = "rkyv", version = "0.7.38", default-features = false, features = ["size_16"] }
rkyv-but-32 = { package = "rkyv", version = "0.7.38", default-features = false, features = ["size_32"] }
```

Do you have an example of importing it twice under different names with different features?
Sorry, fell off the wagon on this one. Would this work with an intermediate crate that re-exports different rkyv features in different versions? (i.e. a The other crates I've seen have only ever had different versions of rkyv under different names, so I understand the point about the |
I don't believe so:
|
Hmm, that is unfortunate. Sorry about that, and this will change for 0.8. If your use case is blocked on this feature, then we can dig in in a dedicated issue and see what approaches could help. Most of the basic structures (i.e. |
Don't worry about me — I don't have an actual use case at the moment. A friend was (successfully!) using this crate and mentioned it to me, so I glanced at the docs and noticed the incompatibility with some pie-in-the-sky future ideas I had. My comments here are just to provide the feedback for future versions. |
Is it possible to get a feature similar to |
We currently have a |
Hey @djkoloski, I'm looking to default a field if it's missing when deserializing. My use case: I have serialized a struct to save as binary in a database. I've since added two new fields, so I can't deserialize the old records using the modified struct. Instead of creating versioned copies of the struct, it would be much easier for me if I could deserialize the old records with a default value for the new fields which are missing. Some examples:
Or defaulting numbers to 0. |
If you add a new field and add |
Yup, making a second version of the struct and converting between the two is the direction I'm headed in. But if there was a |
Unfortunately it's not possible with the tools we have right now. In the future, we may have a protobuf-style solution for this problem in protoss. |
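The versioned-struct workaround mentioned above might look like this (the type names are illustrative, not from any crate): keep the old layout around as a V1 type, add the new fields in V2, and supply the defaults in a From impl.

```rust
// The layout that was originally serialized.
struct RecordV1 {
    name: String,
}

// The current layout with two new fields.
struct RecordV2 {
    name: String,
    count: u32,        // new field, defaults to 0
    tags: Vec<String>, // new field, defaults to empty
}

impl From<RecordV1> for RecordV2 {
    fn from(old: RecordV1) -> Self {
        RecordV2 {
            name: old.name,
            count: 0,          // default for records written before the field existed
            tags: Vec::new(),
        }
    }
}
```

Old records are deserialized as `RecordV1` and upgraded with `RecordV2::from`, so the defaulting logic lives in exactly one place.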
Just a coordination check from my side -- is there a gut feel on when a 0.8 might be on the horizon? We're starting to get interest from other devs in Xous and I'd like to upgrade our rkyv pin if a 0.8 is imminent. If not, no worries -- you're doing great work here!
Nothing is imminent, thanks for checking in though! |
Some sort of mechanism to help with DRY with respect to attributes on types would be great, especially in light of rkyv's versatility that's unfortunately resulting in an avalanche of attributes, code repetition and potential for errors (e.g. forgetting a repr(C), an alignment modifier or the like). I think it should be possible to use attribute macros to that end. |
This issue is pretty stale, and since 0.8 has been released I'm going to close this in favor of opening separate issues. Thanks to everyone who provided feedback! |
I'm trying to get a better idea of what the roadmap for rkyv should look like for v0.7 and beyond. As just one person with a very particular use case in mind, I'd like some feedback so I can get a better idea of:
If possible, new features will be aimed toward separate crates that extend and supplement rkyv rather than adding to the core library. The first thing that I'd like to determine is whether rkyv should continue iterating and expanding its capabilities, or start hardening and work toward a stable 1.0.
Thanks to everyone who provided feedback so far, I'm very grateful as it's been tremendously helpful in getting to this point.