Request for feedback: 0.8 to 1.0 #67

Closed
djkoloski opened this issue Mar 2, 2021 · 62 comments

@djkoloski
Collaborator

djkoloski commented Mar 2, 2021

I'm trying to get a better idea of what the roadmap for rkyv should look like for v0.7 and beyond. As just one person with a very particular use case in mind, I'd like some feedback so I can get a better idea of:

  • Who's using (or interested in using) rkyv
  • What you're using it for
  • What needs improvement or is holding you back

If possible, new features will be aimed toward separate crates that extend and supplement rkyv rather than adding to the core library. The first thing that I'd like to determine is whether rkyv should continue iterating and expanding its capabilities, or start hardening and work toward a stable 1.0.

Thanks to everyone who provided feedback so far, I'm very grateful as it's been tremendously helpful in getting to this point.

@djkoloski added the "help wanted" label Mar 2, 2021
@djkoloski self-assigned this Mar 2, 2021
@dvc94ch

dvc94ch commented Mar 9, 2021

I'm still interested in #26, but haven't had any time to play more with rkyv yet. At work we mostly use cbor (with zstd compression) and a little bit of json/protobuf, but I think that rkyv combined with zstd compression would be something we might consider in the future.

@djkoloski
Collaborator Author

#26 is still on my list, but since it's going to be part of a larger schema system it'll probably be a bit before I get something workable in people's hands. I'll consider splitting out just the functionality needed for that issue and releasing it sooner.

@DenialAdams

DenialAdams commented Mar 11, 2021

Along the lines of "What needs improvement or is holding you back", I was really excited to try out rkyv after seeing your benchmark post, but unfortunately its output ended up being larger than serde_cbor's for my use case (serializing a dictionary + inverted index for searching).

The data looks like:

#[derive(Archive, Serialize)]
struct CedictIndex {
   entries: Vec<Entry>,
   inverted_index: HashMap<String, Vec<(u32, u8)>>,
   version: String,
}

#[derive(Archive, Debug, Serialize)]
struct Entry {
   trad: String,
   simp: String,
   pinyin: String,
   definitions: Vec<String>,
   frequency: u32,
}

The rkyv output is about 32 MiB and the serde_cbor output about 24 MiB.

Totally fine if that's just how it is, but I was wondering if you had any idea why cbor is smaller for this particular use case, given that rkyv was always smaller in the bench.

Thanks!!

@dvc94ch

dvc94ch commented Mar 11, 2021

In this particular case I wonder if the hashmap has something to do with it. Have you tried comparing the outputs after compression?

@djkoloski
Collaborator Author

The hashmap implementation rkyv uses is compress-hash-displace, which has the tradeoff of taking longer to serialize but (usually) using less memory on-disk while still being a usable hashmap. I believe that serde_cbor uses a list of pairs style for serialization, which means that it doesn't have any overhead on-disk.

One option for solving this problem is making a wrapper type (e.g. struct SerializeAsMap<K, V>(HashMap<K, V>)) that acts like a hashmap but serializes as a sorted list. This would make it slower to perform lookups in the archived format but would likely save a bit of memory.
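
To make the tradeoff concrete, here is a minimal sketch of the sorted-list idea in plain Rust. It is not an rkyv type and does not implement Archive; `SortedPairs` and its methods are hypothetical names used only for illustration. Lookups fall back to binary search, which is where the slower archived lookups would come from.

```rust
use std::collections::HashMap;

/// Hypothetical flat map: key/value pairs stored in one vector sorted by key.
/// Plain Rust for illustration only, not an rkyv type.
pub struct SortedPairs<K, V> {
    pairs: Vec<(K, V)>,
}

impl<K: Ord, V> SortedPairs<K, V> {
    /// Build the flat layout from a hash map by sorting its entries once.
    pub fn from_map(map: HashMap<K, V>) -> Self {
        let mut pairs: Vec<(K, V)> = map.into_iter().collect();
        pairs.sort_by(|a, b| a.0.cmp(&b.0));
        SortedPairs { pairs }
    }

    /// Look a key up with binary search: O(log n) instead of hashing.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.pairs
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|i| &self.pairs[i].1)
    }

    /// Number of stored pairs.
    pub fn len(&self) -> usize {
        self.pairs.len()
    }
}
```

The flat layout stores nothing beyond the pairs themselves, so any size savings would come from dropping the hash table's bookkeeping.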

@djkoloski
Collaborator Author

Linking #68

@DenialAdams

DenialAdams commented Mar 12, 2021

In this particular case I wonder if the hashmap has something to do with it. Have you tried comparing the outputs after compression?

Thanks for pointing out the compression bit - When compressed with gzip, serde_cbor is still smaller by 2MiB.

The hashmap implementation rkyv uses is compress-hash-displace, which has the tradeoff of taking longer to serialize but (usually) using less memory on-disk while still being a usable hashmap. I believe that serde_cbor uses a list of pairs style for serialization, which means that it doesn't have any overhead on-disk.

One option for solving this problem is making a wrapper type (e.g. struct SerializeAsMap<K, V>(HashMap<K, V>)) that acts like a hashmap but serializes as a sorted list. This would make it slower to perform lookups in the archived format but would likely save a bit of memory.

Thanks, I'll give this a shot and see how it goes!

Also, I tried commenting the HashMap out of the struct. Without that field, CBOR is actually still smaller, but only by about 0.5 MiB (1 MiB after compression). So maybe there's something else at play too...

@xobs

xobs commented Mar 14, 2021

We're using rkyv for IPC in our operating system.

The microkernel is based around shoveling memory pages back and forth. A memory page is allocated on a multiple of 4 kB, which we fill with data. On the other side, we use rkyv to support turning it back into something that is usable.

An example of something like that is the Opcode pattern, which you can see here: https://github.com/betrusted-io/xous-core/blob/master/services/gam/src/api.rs#L33-L70

This then gets bundled into a Buffer, which can be lent, sent, or mutably borrowed across processes.

We're still working to get libstd working in our operating system, so we're still working on how to work with String and Vec. For now we're using heapless. I have a terrible example of how to do this for heapless::Vec<u8> up at https://github.com/xobs/rkyv-example/blob/main/src/vecu8.rs, but I do wish it were more ergonomic.

Also, with min_const_generics becoming part of stable in 1.51 (in two weeks), it would be nice to have support for that enabled without the need for nightly, if possible.

@djkoloski
Collaborator Author

Thanks for trying out rkyv! It's exciting to see it getting used in xous, especially since I think it will bring focus to a lot of aspects that aren't exercised by most uses.

I definitely agree that writing archive implementations for containers is not ergonomic, and that's something I'd be interested in looking at improving over time. And even if improving the ergonomics isn't possible, reaching out to other crates like heapless and adding implementations there would save downstream users the headache of wrapping types and implementing Archive like you've had to.

min_const_generics is supported right now through the const_generics feature, and it should be updated to be ready for the 1.51 stabilization as of c5daf35. I think it would be good to have that feature on by default, so 0.5 will probably also include a breaking change that makes it a default feature.

@pickfire
Contributor

Is streaming something to look into later? Maybe we should do a feature comparison between rkyv, FlatBuffers, and Cap'n Proto, to see whether rkyv is missing features that the others offer?

@djkoloski
Collaborator Author

I think streaming and async would be cool, but I hesitate to add them without any use cases in mind. Making all the serializers async would add a lot of complexity and could impact performance. I'm not totally convinced of what benefit it would have either.

That said, Cap'n Proto does support reading data before the whole message has arrived. It can do that because it writes outer objects before inner objects, which is the opposite of how rkyv (and FlatBuffers?) works. Right now rkyv always writes inner objects before outer ones. We might be able to switch that by changing the serializer to write backwards (i.e. position starts at zero and decreases with every write), but there's more thinking to do here. To move forward on streaming, we need some use cases and some ideas for placing outer objects before inner ones.
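
A rough sketch of the "write backwards" idea, under stated assumptions: `BackwardWriter` is a hypothetical name, not part of rkyv, and unlike the description above it uses a fixed-capacity buffer and starts at the end rather than at a negative position. The point it illustrates is that objects are still written inner-first, yet the outer object ends up at the front of the finished message.

```rust
/// Hypothetical writer that fills a fixed buffer from the end toward the start.
pub struct BackwardWriter {
    buf: Vec<u8>,
    /// Index of the first used byte; starts at the end of the buffer.
    pos: usize,
}

impl BackwardWriter {
    pub fn with_capacity(cap: usize) -> Self {
        BackwardWriter { buf: vec![0; cap], pos: cap }
    }

    /// Place `bytes` directly before everything written so far and return
    /// the position they start at, or None when the buffer is full.
    pub fn write(&mut self, bytes: &[u8]) -> Option<usize> {
        let start = self.pos.checked_sub(bytes.len())?;
        self.buf[start..self.pos].copy_from_slice(bytes);
        self.pos = start;
        Some(start)
    }

    /// The finished message: the most recently written object comes first.
    pub fn finish(&self) -> &[u8] {
        &self.buf[self.pos..]
    }
}
```

Writing an inner object and then its outer object leaves the outer one first in `finish()`, which is the ordering a streaming reader would want.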

I think doing feature comparisons is a great idea. You can't beat your competitors if you can't compete!

Schema evolution is a big one that rkyv doesn't have, but there are definitely more. I've been skimming through the Cap'n Proto site as a place to start. If you spot any major feature gaps, that would be a great discussion starter.

Another one that I noticed Cap'n Proto has is on-demand validation, which only validates the parts of a message that you actually read. It gives a pretty sizeable advantage in some of the benchmarks, and could be a common use case. I think bytecheck would be the right place to add it, and it could probably support on-demand validation with a little elbow grease and macro magic.
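
As a very small illustration of what on-demand validation means (not how bytecheck or Cap'n Proto implement it), the sketch below defers UTF-8 checking until a field is actually read. `LazyMessage` and its fields are hypothetical.

```rust
use std::str::Utf8Error;

/// Hypothetical view over an unvalidated message made of two byte ranges.
pub struct LazyMessage<'a> {
    title: &'a [u8],
    body: &'a [u8],
}

impl<'a> LazyMessage<'a> {
    /// Wrapping the bytes does no validation work at all.
    pub fn new(title: &'a [u8], body: &'a [u8]) -> Self {
        LazyMessage { title, body }
    }

    /// Validate and return only the title.
    pub fn title(&self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.title)
    }

    /// Validate and return only the body.
    pub fn body(&self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.body)
    }
}
```

A reader that only touches `title()` never pays for checking the body, and never notices if the body is invalid.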

@bunnie

bunnie commented Mar 16, 2021

We're using rkyv for IPC in our operating system.

The microkernel is based around shoveling memory pages back and forth. A memory page is allocated on a multiple of 4 kB, which we fill with data. On the other side, we use rkyv to support turning it back into something that is usable.

An example of something like that is the Opcode pattern, which you can see here: https://github.com/betrusted-io/xous-core/blob/master/services/gam/src/api.rs#L33-L70

This then gets bundled into a Buffer, which can be lent, sent, or mutably borrowed across processes.

We're still working to get libstd working in our operating system, so we're still working on how to work with String and Vec. For now we're using heapless. I have a terrible example of how to do this for heapless::Vec<u8> up at https://github.com/xobs/rkyv-example/blob/main/src/vecu8.rs, but I do wish it were more ergonomic.

Also, with min_const_generics becoming part of stable in 1.51 (in two weeks), it would be nice to have support for that enabled without the need for nightly, if possible.

Following up on @xobs comments, we were just trying to integrate rkyv 0.4.1 into our latest version of the OS and got blocked by the fact that Deserialize requires std, but we operate in a nostd environment. I think a lot of the new features in 0.4.1 are awesome, but without the ability to deserialize we're going to have to stick with 0.3.1 for now...

@djkoloski
Collaborator Author

Following up on @xobs comments, we were just trying to integrate rkyv 0.4.1 into our latest version of the OS and got blocked by the fact that Deserialize requires std, but we operate in a nostd environment. I think a lot of the new features in 0.4.1 are awesome, but without the ability to deserialize we're going to have to stick with 0.3.1 for now...

You shouldn't need std to deserialize your types, could you be more specific about what blocked you? This is going to get a bit in the weeds: the Deserialize trait has the deserializer as a type argument and all sized types only require that the deserializer is Fallible (which doesn't require std). Even unsized types can be deserialized with no_std if you can provide a deserializer that implements Deserializer. You might need to make a custom deserializer and implement only Fallible for it, not Deserializer.

So there might be a bit of confusion here around whether you need to implement Deserializer to deserialize non-allocating types. That should not be the case.

If you have some code available that demonstrates your problem I can also take a look at it and see if I can help. 🙂

@bunnie

bunnie commented Mar 16, 2021

Following up on @xobs comments, we were just trying to integrate rkyv 0.4.1 into our latest version of the OS and got blocked by the fact that Deserialize requires std, but we operate in a nostd environment. I think a lot of the new features in 0.4.1 are awesome, but without the ability to deserialize we're going to have to stick with 0.3.1 for now...

You shouldn't need std to deserialize your types, could you be more specific about what blocked you? This is going to get a bit in the weeds: the Deserialize trait has the deserializer as a type argument and all sized types only require that the deserializer is Fallible (which doesn't require std). Even unsized types can be deserialized with no_std if you can provide a deserializer that implements Deserializer. You might need to make a custom deserializer and implement only Fallible for it, not Deserializer.

So there might be a bit of confusion here around whether you need to implement Deserializer to deserialize non-allocating types. That should not be the case.

If you have some code available that demonstrates your problem I can also take a look at it and see if I can help. 🙂

Sure...I could completely believe we are doing this the wrong way, but basically, what is the right way to do the equivalent of what was .unarchive() in 0.3.1? This is the code for 0.4.1 that doesn't work right now:

https://github.com/betrusted-io/xous-core/blob/f9439b48d66818f0e12f24cf8738faa376c97bf5/services/gam/src/lib.rs#L46

Obviously, this doesn't work because we don't specify a deserializer, but when I looked here:

https://github.com/djkoloski/rkyv/blob/master/rkyv/src/de/deserializers.rs

I noticed that every type of deserializer provided required std, so I figured that was the end of the story and reverted the refactor.

FYI this is how we do it in 0.3.1, and it works:

https://github.com/betrusted-io/xous-core/blob/1dda24e24e5b557b5c6707b71863d331339e9abe/services/gam/src/lib.rs#L46

All of our types are non-allocating types, as we don't have alloc and it's all stack-based right now. I actually don't follow what you mean by the requirements of the Deserializer requiring Fallible. If you could please provide a really simple example of how to implement one, I can use that as a template to make our own; but more or less I'm just trying to copy what's put in the examples given at https://docs.rs/rkyv/0.4.1/rkyv/trait.Archive.html.

rkyv does magical things that I don't understand at a deep level; I understand just enough to do copypasta and be dangerous to myself and possibly others.

@djkoloski
Collaborator Author

Ah, that makes sense! This is something that the book covers under the probably poorly-named Extensions page.

Some background as to the Unarchive/Deserialize change: the 0.4 release added support for shared pointers. In an archive, any number of shared pointers could point to the same archived object. When they deserialize into their proper in-memory counterparts, they should all point to the same deserialized object. That requires some coordination, which is why Deserialize got a context (the deserializer) like Serialize already had (formerly the writer, now the serializer).
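
To illustrate the coordination involved, here is a simplified sketch in plain Rust. `SharedContext` and `deserialize_shared` are hypothetical names and this is not rkyv's actual deserializer API; the sketch only shows why some state has to be threaded through deserialization so that two shared pointers to one archived position yield one in-memory object.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical deserialization context: remembers which archived positions
/// have already been turned into in-memory objects.
pub struct SharedContext {
    seen: HashMap<usize, Rc<String>>,
}

impl SharedContext {
    pub fn new() -> Self {
        SharedContext { seen: HashMap::new() }
    }

    /// Deserialize the string archived at `pos`, or hand back the existing
    /// `Rc` if another shared pointer to `pos` was deserialized earlier.
    pub fn deserialize_shared(&mut self, pos: usize, archived: &str) -> Rc<String> {
        let rc = self
            .seen
            .entry(pos)
            .or_insert_with(|| Rc::new(archived.to_owned()));
        Rc::clone(rc)
    }
}
```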

The way that Deserialize (and Serialize/CheckBytes) is set up, the archived type says what kinds of deserializers it needs to deserialize by using impl bounds on a per-type basis:

// This type can be deserialized with any deserializer
impl<D: Fallible + ?Sized> Deserialize<MyType, D> for ArchivedMyType {
    // ...
}

// This type needs deserializers that implement Deserializer because it needs to allocate memory
impl<D: Deserializer + ?Sized> Deserialize<MyAllocType, D> for ArchivedMyAllocType {
    // ...
}

All deserializers (and serializers) must declare what errors they can produce upfront by implementing Fallible so we can use it in the type signatures of serialize and deserialize. All the non-allocating types have Deserialize implementations that only require Fallible.

With that out of the way, as for actually solving your issue: you just have to make your own Fallible deserializer and pass that in.

Deserializer:

pub struct XousDeserializer;

// Unreachable enum pattern, swap out for the never type (!) whenever that gets stabilized
pub enum XousUnreachable {}

impl rkyv::Fallible for XousDeserializer {
    type Error = XousUnreachable;
}

Usage:

result.deserialize(&mut XousDeserializer);

This might be worth providing a BasicDeserializer for to prevent any confusion, so I'll do that for 0.5 (#79).

rkyv does magical things that I don't understand at a deep level; I understand just enough to do copypasta and be dangerous to myself and possibly others.

This is actually what I want; don't be afraid of zero-copy deserialization! 🙂

@bunnie

bunnie commented Mar 17, 2021

Ah! ahah. Yes, it was not clear to me that something as simple as that was necessary. Fortunately I put this all in a branch and I'll try picking up the refactor once again. Thanks again for the help explaining that. And yes, a BasicDeserializer would go a long way toward helping out a newcomer like me. It was not obvious (to me) from pattern matching that the null element was in the set of valid deserializers. I did even stumble on the Extensions page but it didn't register to me what you were saying, because also I had not encountered Fallible as a type yet, I've only used vanilla Result Ok/Error patterns in my code to date.

Whenever I encounter macros in Rust my eyes glaze over, so anything that involves a Derive is basically a black box to me even if it's quite trivial. I'll get used to Rust macro syntax someday; it's very powerful and I see why they had to do it, but it's almost a different language from non-macro Rust unto itself.

@erichutchins

I've been experimenting with rkyv in a fairly narrow use case: an immutable, on-disk key value store. At my dayjob, we use discodb extensively. What caught my eye about your project was that you use the same perfect hashing algorithm as disco.

Querying a single key from a serialized hashmap is basically instantaneous. Super impressive! Hyperfine has a tough time measuring it because it's so fast.

Disco uses Huffman encoding for compression, and my Rust skills can't beat it in speed. I did implement zstd block compression on each value (resulting in smaller size on disk), but I need to find a faster way of iteratively decoding the blocks. (I saw your answer in #48, but compressing the whole object negates the benefit of memory-mapped reads, afaict.)

Disco also supports multiple values per key. I implemented this with an indexset to assign an id per unique value and a hashmap of keys pointing to a vector of ids. I collect the indexset into a vector for serialization.
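
Roughly, the layout looks like the sketch below. This is a simplified reconstruction, not my actual code: it uses only the standard library (a `HashMap` plus a `Vec` stand in for the indexset), and `MultiMap` and its method names are made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical multi-value store: each distinct value is kept once in
/// `values`, and every key maps to a list of indices into that vector.
#[derive(Default)]
pub struct MultiMap {
    values: Vec<String>,
    /// Build-time lookup from value to its id.
    ids: HashMap<String, u32>,
    /// Key to the ids of its values.
    keys: HashMap<String, Vec<u32>>,
}

impl MultiMap {
    /// Add `value` under `key`, reusing the id if the value was seen before.
    pub fn insert(&mut self, key: &str, value: &str) {
        let existing = self.ids.get(value).copied();
        let id = match existing {
            Some(id) => id,
            None => {
                let id = self.values.len() as u32;
                self.values.push(value.to_owned());
                self.ids.insert(value.to_owned(), id);
                id
            }
        };
        self.keys.entry(key.to_owned()).or_default().push(id);
    }

    /// All values stored under `key`, in insertion order.
    pub fn get(&self, key: &str) -> Vec<&str> {
        self.keys
            .get(key)
            .map(|ids| ids.iter().map(|&i| self.values[i as usize].as_str()).collect())
            .unwrap_or_default()
    }

    /// Number of distinct values stored.
    pub fn unique_values(&self) -> usize {
        self.values.len()
    }
}
```

Only `values` and `keys` would need to be serialized; the `ids` table is a build-time helper.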

In short, rkyv let a novice Rust coder build a mostly faster kv store!

My feature requests are first-class functions for reading and writing to disk, and compression.

@djkoloski
Collaborator Author

That's awesome! These are the sorts of use cases that I'm looking for inspiration from.

When you're talking about performing compression on the values, did you compress the values and store them in the hashmap? I think there are some interesting possibilities for compression like object interning and compression contexts. These are things I'd be keen on implementing in some accessory packages (e.g. rkyv_intern, rkyv_compress) but having the ideas in mind ahead of time will make it easier to plan support for them.

For reading and writing to disk, I think some helpers in a new crate (rkyv_mmap?) would be nice to get memory mapped files working out of the box. Were there any sorts of things that you were looking for related to reading/writing to disk in particular?

@erichutchins

Yup! I was rather pleased with the compression. I sample the vector and build a zstd training dict before block compressing each vec entry in an iter_mut. The struct to rkyv (as a verb?) has the hashmap, vec, and zstd training dict.

For keys with small numbers of values, this approach is faster and smaller on disk. I think I just need to think smarter about my query/decompression loop when there are hundreds of values.

I like the rkyv_mmap idea. Just functions to handle saving all the necessary components to a path and reading from a path. E.g., my read/load reads the first two bytes for the pos and then mmaps from offset 2 for the rest.
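
A sketch of that file layout, with assumptions called out: the helper names are hypothetical, the little-endian encoding of the two-byte prefix is a guess for illustration, and a plain byte slice stands in for the memory-mapped file.

```rust
/// Split a file image laid out as [2-byte root position][archive bytes].
/// The little-endian prefix is an assumption made for this sketch.
pub fn split_prefixed(file: &[u8]) -> Option<(usize, &[u8])> {
    if file.len() < 2 {
        return None;
    }
    let (prefix, archive) = file.split_at(2);
    let pos = u16::from_le_bytes([prefix[0], prefix[1]]) as usize;
    // Reject positions that point past the end of the archive.
    if pos > archive.len() {
        return None;
    }
    Some((pos, archive))
}

/// Produce the same layout when writing.
pub fn join_prefixed(pos: u16, archive: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + archive.len());
    out.extend_from_slice(&pos.to_le_bytes());
    out.extend_from_slice(archive);
    out
}
```

A two-byte prefix caps the root position at 65535, so larger archives would need a wider prefix.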

@djkoloski changed the title from "Request for feedback: v0.5 and beyond" to "Request for feedback: v0.6 and beyond" Apr 5, 2021
@sffc

sffc commented Apr 6, 2021

Hello!

Thanks for your work on Rkyv!

I'm from the team developing ICU4X, a library for internationalization written in Rust to target multiple platforms and lower-resource, client-side devices.

We are looking into Rkyv as an option for delivery of locale data to power our APIs. These data are highly structured, consisting of mostly strings, numbers, and vectors of strings and numbers. Our high-level needs include:

  1. Small code size and data size
  2. Friendly when used across FFI and on different platforms (endianness)
  3. Efficient unpacking of large vectors, preferably with zero-copy
  4. Ability to plug in human-readable data (JSON) as well as machine-readable
  5. Respect users' rights to privacy and security

I made a small writeup comparing Rkyv with Postcard and Bincode, which you can find at unicode-org/icu4x#78 (comment). Rkyv Archive is extremely fast and small code, validating the design goals. Postcard wins on data size by a fair margin, consistent with your findings in https://davidkoloski.me/blog/rkyv-is-faster-than/.

I'd say the main puzzle I'm trying to solve right now is Serde compatibility. Right now, it seems that Rkyv does not play very well with data structs that have lifetime parameters. For example, to achieve zero-copy string deserialization in JSON, Postcard, or Bincode, one writes a data struct such as

#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'s> {
    message: &'s str,
}

However, the above struct does not compile with Rkyv.

#[derive(rkyv::Serialize, rkyv::Archive, rkyv::Deserialize)]
pub struct DataStruct<'s> {
    message: &'s str,
}
Example errors
error[E0277]: the trait bound `&str: Serialize>` is not satisfied
  --> utils/litemap/benches/bin/litemap_rkyv_deserialize.rs:52:16
   |
52 |     serializer.serialize_value(&d).expect("failed to archive test");
   |                ^^^^^^^^^^^^^^^ the trait `Serialize>` is not implemented for `&str`
   |
   = note: required because of the requirements on the impl of `Serialize>` for `DataStruct<'_>`

error[E0277]: the trait bound `&str: Archive` is not satisfied
  --> utils/litemap/benches/bin/litemap_rkyv_deserialize.rs:55:29
   |
55 |     let archived = unsafe { archived_root::<DataStruct>(&buf) };
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Archive` is not implemented for `&str`
   |
   = note: required because of the requirements on the impl of `Archive` for `DataStruct<'_>`

error[E0277]: the trait bound `&str: Archive` is not satisfied
  --> utils/litemap/benches/bin/litemap_rkyv_deserialize.rs:55:29
   |
55 |     let archived = unsafe { archived_root::<DataStruct>(&buf) };
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Archive` is not implemented for `&str`
   |
  ::: /home/sffc/.cargo/registry/src/github.com-1ecc6299db9ec823/rkyv-0.5.0/src/util/mod.rs:39:32
   |
39 | pub unsafe fn archived_root<T: Archive + ?Sized>(bytes: &[u8]) -> &T::Archived {
   |                                ------- required by this bound in `archived_root`
   |
   = note: required because of the requirements on the impl of `Archive` for `DataStruct<'_>`

error[E0599]: the method `deserialize` exists for reference `&ArchivedDataStruct<'_>`, but its trait bounds were not satisfied
  --> utils/litemap/benches/bin/litemap_rkyv_deserialize.rs:56:33
   |
40 | #[derive(rkyv::Serialize, rkyv::Archive, rkyv::Deserialize)]
   |                           ------------- doesn't satisfy `_: Deserialize<DataStruct<'_>, _>`
...
56 |     let deserialized = archived.deserialize(&mut AllocDeserializer).unwrap();
   |                                 ^^^^^^^^^^^ method cannot be called on `&ArchivedDataStruct<'_>` due to unsatisfied trait bounds
   |
   = note: the following trait bounds were not satisfied:
           `&str: Archive`
           which is required by `ArchivedDataStruct<'_>: Deserialize<DataStruct<'_>, _>`

Similarly, the Rkyv Archive data structs being special types with RelPtr is very cool, but it makes it harder to pass it in the same places that the normal struct would have been used. For example, if we want to take a &DataStruct argument to a function, Rkyv Archive structs cannot be used. Read more in #101.

Note that serialization speed is unimportant for us, because the data are serialized during the build process. We are happy to take hits in serialization performance in order to reduce data size or achieve other goals.

@djkoloski
Collaborator Author

Thanks for looking at rkyv! I think it could be a good fit for your use case.

First to clarify, rkyv is not serde-compatible because serde only supports partial zero-copy deserialization. This means that while you can borrow large parts of your data from the source buffer, you will still need to create a new object to hold references to the borrowed memory. On the other hand, rkyv supports total zero-copy deserialization. This means that no extra memory is required aside from the data buffer.

More concretely:

  • Deserializing a DataStruct<'s> with serde will make a new DataStruct with a reference to a str from the serialized data
  • Accessing an ArchivedDataStruct with rkyv will find the position of the root object in the byte buffer, then just cast a pointer to that position to an &ArchivedDataStruct. The ArchivedDataStruct was already in the byte buffer, we just needed to get a reference to it.
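
The pointer cast in the second bullet can be sketched in plain Rust. This is a simplified illustration rather than rkyv's implementation: `ArchivedPoint` and `access_point` are hypothetical, the type has no relative pointers, and the caller supplies the root position directly.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical archived type with a fixed, C-compatible layout.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct ArchivedPoint {
    pub x: u32,
    pub y: u32,
}

/// Reinterpret the bytes at `pos` as an `ArchivedPoint` without copying.
/// Returns None if the range is out of bounds or misaligned.
pub fn access_point(bytes: &[u8], pos: usize) -> Option<&ArchivedPoint> {
    let end = pos.checked_add(size_of::<ArchivedPoint>())?;
    if end > bytes.len() {
        return None;
    }
    let ptr = bytes[pos..].as_ptr();
    if (ptr as usize) % align_of::<ArchivedPoint>() != 0 {
        return None;
    }
    // SAFETY: the range is in bounds and aligned, and every bit pattern is a
    // valid pair of u32 values. The returned reference borrows from `bytes`.
    Some(unsafe { &*(ptr as *const ArchivedPoint) })
}
```

The returned reference carries the lifetime of the buffer, so nothing is allocated and the buffer cannot be dropped while the reference is in use.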

You can read more about how rkyv compares with serde in the book.

rkyv doesn't provide serialization capabilities for references out of the box for a couple of reasons:

  • The referenced data isn't owned by the object being serialized
  • It's not clear whether referenced data should be serialized like a value or like a shared pointer
  • Whatever got archived wouldn't be deserializable since it would create dangling data (since the struct would reference but not own the data)

For these reasons, lifetimes are not supported in archived data. You can still get some of these features using shared pointers, but in general it helps to think of the archived type always having an implicit lifetime parameter in its reference:

&'a ArchivedDataStruct
 ^^ this is the lifetime you'd normally put as a generic parameter for serde

Not 100% sure this answers the questions you have, so please let me know if I missed anything.

@pickfire
Contributor

pickfire commented Apr 6, 2021

I know rkyv does not have serde compatibility, but does that mean third-party crates need to add rkyv support themselves (similar to the situation with pyo3)? For example, popular crates like chrono only work with serde, so using rkyv means their types won't work and only primitive types can be used. Or is rkyv aiming to become a serde replacement in the future, so that we don't have to fracture the ecosystem the way serde did to rustc-serialize back then?

@djkoloski
Collaborator Author

I think of rkyv as a complement to serde, not really as a replacement. For third-party crates, I'm willing to reach out and provide implementations for their types (see #88), but hopefully in the future it's something that crates will provide out of the box. I don't think there's a really great solution unfortunately.

If rkyv support doesn't gain traction among popular crates, I'll have to loop back and rethink how to approach this problem.

@thedodd

thedodd commented Apr 13, 2021

I'll just chime in and say that schema evolution would be outstanding.

@djkoloski
Collaborator Author

@thedodd Right now, I'm aiming for protoss to fill this gap. Development is slow, but I'd appreciate any feedback you have on the initial design.

@djkoloski
Collaborator Author

I think that's reasonable, I'll update the thread name one more time. I think "0.8 to 1.0" gives the right framing: new features and breaking changes are welcome, and so are suggestions for stabilization work.

@djkoloski changed the title from "Request for feedback: pre-1.0 core library and features" to "Request for feedback: 0.8 to 1.0" Jul 9, 2021
@pickfire
Contributor

pickfire commented Jul 9, 2021

I'm just wondering whether rkyv should rush to 1.0, because from what I see quite a few important features are not available yet. I think it's better to have a few more releases and get more feedback until it's stable enough to release 1.0.

@pickfire
Contributor

Is cross-language support a goal (#144)? I am just wondering if it's possible, as a consideration for the helix plugin system (helix-editor/helix#122).

@djkoloski
Collaborator Author

Cross-language compatibility is possible, but:

  • It will take a while to write
  • It will essentially be a separate codebase to maintain
  • Some languages may only have partial support (e.g. read-only)
  • The languages that could be fully supported may be small

C++ is likely possible, but will probably require concepts to be usable (C++20) because rkyv makes heavy use of generics.

Could you suggest what other languages you'd want supported?

@kirawi

kirawi commented Jul 21, 2021

I'm not very familiar with this project nor the issue it tackles, but would C bindings be possible?

@djkoloski
Collaborator Author

C bindings can absolutely be offered, and the library actually comes with a strict feature to guarantee C type compatibility. If all you need to do is access data, then rkyv could be used with other languages with just a type library and some re-defined structs. To give a concrete example, to access rkyv data from C++ you'd need:

  • Bindings to whatever rkyv standard library functions you use (rkyv has separate versions of most containers and types that are archive-safe).
  • Redefinitions of whatever types you'd like to use (cbindgen should be able to do most of this)

That would let you access data in a zero-copy fashion. If you wanted to serialize and deserialize data, you may need more advanced capabilities. This would probably require either writing a version of rkyv for your language or using Rust through bindings. I'm not extremely well-versed in the capabilities of other languages, but as I understand it C++ would be capable of this with concepts (C++20). Getting a version of rkyv up and running in C++ would take a while.

@ckaran

ckaran commented Dec 17, 2021

I'd love to see cyclic object graphs supported (see #214). It would make taking snapshots of running processes MUCH simpler!

@bunnie

bunnie commented Jun 7, 2022

Just looping back to this thread -- we're contemplating upgrading rkyv in Xous to use a later version. We don't have a solid timeline for that yet, but wondering what the latest thoughts were about when a 1.0 version of rkyv might be coming about? That might be a good milestone for us to try a major refactor and sync up with the rkyv release train.

@djkoloski
Collaborator Author

Thanks for coordinating. 🙂 I'm planning to release 0.8 at some point in the next year, but the timeline is not solid yet. It will likely be a large change, but probably the final revision before 1.0. So 1.0 is on the horizon but not yet in sight.

0.8 is slated to bring some highly requested new features and address some long-standing functionality shortfalls. I think that the gap between 0.8 and 1.0 is likely to be small.

@bunnie

bunnie commented Jun 7, 2022

sounds solid to me. I'll keep an eye out for 0.8 then. thanks!

@shepmaster

I haven't had a chance to use this crate yet, but was lightly investigating it for some potential embedded usage. However, one thing that turned me off was these features:

size_16 = []
size_32 = []
size_64 = []

This means that I couldn't use one program to communicate with both an Arduino (where I'd want to use size_16) and another desktop-class machine (where I might want to use something bigger). In a broader sense, it also means that I likely couldn't have two usages of rkyv in my entire crate graph.

Data format settings shouldn't be incompatible at this broad of a level.

@djkoloski
Collaborator Author

Thanks for the feedback. As part of the work for 0.8, the pointer widths are moving into a generic parameter for most structures. This will allow you to mix and match data formats of different pointer widths, as well as use different pointer widths throughout a single archived object.

rkyv currently supports being imported multiple times under different names. This allows you to set up different rkyv dependencies for different feature choices. You can use the crate argument to the archive attribute to configure the derive macro; see the docs for Archive for more. Example: #[archive(crate = "rkyv_16")].

The reasoning behind having features for relative pointer widths (as well as little/big/native endianness) is explored a bit more in #169. The gist of it is that intermediate dependencies (say, if we depend on indexmap which supports rkyv) need to coordinate their choices of pointer width and endianness with the end user. This is a limitation of the current API, which only associates a single archived type with each "unarchived" type and so doesn't allow for different representations of the same archived type.
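
To make that limitation concrete, here is a simplified sketch. These are not rkyv's actual traits: `ArchiveOne`, `ArchiveAs`, `W16`, and `W32` are hypothetical names invented for illustration. With only an associated type, a type can name a single archived form; adding a generic format parameter lets the same type have one archived form per format.

```rust
use std::convert::TryFrom;

/// Hypothetical trait: one archived form per type.
/// `usize` can implement this only once, so it must pick a single width.
trait ArchiveOne {
    type Archived;
    fn archive(&self) -> Option<Self::Archived>;
}

/// Hypothetical marker types standing in for pointer-width choices.
struct W16;
struct W32;

/// Hypothetical trait: one archived form per (type, format) pair.
trait ArchiveAs<Format> {
    type Archived;
    fn archive_as(&self) -> Option<Self::Archived>;
}

impl ArchiveOne for usize {
    type Archived = u32;
    fn archive(&self) -> Option<u32> {
        u32::try_from(*self).ok()
    }
}

impl ArchiveAs<W16> for usize {
    type Archived = u16;
    fn archive_as(&self) -> Option<u16> {
        u16::try_from(*self).ok()
    }
}

impl ArchiveAs<W32> for usize {
    type Archived = u32;
    fn archive_as(&self) -> Option<u32> {
        u32::try_from(*self).ok()
    }
}

fn main() {
    let len: usize = 70_000;
    // The same value archives differently depending on the chosen format.
    let narrow = <usize as ArchiveAs<W16>>::archive_as(&len);
    let wide = <usize as ArchiveAs<W32>>::archive_as(&len);
    println!("16-bit: {:?}, 32-bit: {:?}, single: {:?}", narrow, wide, len.archive());
}
```

In this sketch, an intermediate crate written against `ArchiveAs<Format>` would not need to agree on a width ahead of time, which is the coordination problem described above.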

@shepmaster

> rkyv currently supports being imported multiple times under different names. This allows you to set up different rkyv dependencies for different feature choices. You can use the crate argument to the archive attribute

I had noticed that attribute [1], but didn't think it could be used here as Cargo prevents it [2]:

[package]
name = "double-use"
version = "0.1.0"
edition = "2021"

[dependencies]
rkyv-but-16 = { package = "rkyv", version = "0.7.38", default-features = false, features = ["size_16"] }
rkyv-but-32 = { package = "rkyv", version = "0.7.38", default-features = false, features = ["size_32"] }

% cargo build -q
error: the crate `double-use v0.1.0 (/private/tmp/double-use)` depends on crate `rkyv v0.7.38` multiple times with different names

Do you have an example of importing it twice under different names with different features?

Footnotes

  1. The SNAFU crate has a similar feature. It's too bad there's not a universally-accepted name for this, or that Rust itself doesn't help.

  2. I really wanted that when adding similar capabilities to the playground and I had to do some annoying workarounds.

@djkoloski
Collaborator Author

Sorry, fell off the wagon on this one. Would this work with an intermediate crate that re-exports different rkyv features in different versions? (i.e. a rkyv_version crate would re-export with size_16 on version 0.0.16 and size_32 on 0.0.32) That's a hacky solution but it seems like it could work. I won't really endorse doing so but if you're really in a pinch it looks possible.

The other crates I've seen have only ever had different versions of rkyv under different names, so I understand the point about the crate argument not helping.

@shepmaster

> Would this work with an intermediate crate that re-exports different rkyv features in different versions?

I don't believe so:

% cargo tree --no-dedupe --depth 2
double-use v0.1.0 (/private/tmp/double-use)
├── shim v0.0.16 (file:///tmp/double-use/shim?tag=v16#bb3392ef)
│   └── rkyv v0.7.38
└── shim v0.0.32 (file:///tmp/double-use/shim?tag=v32#96e58b50)
    └── rkyv v0.7.38

% cargo build -q
error: "size_16" and "size_32" are mutually-exclusive features. You may need to set `default-features = false` or compile with `--no-default-features`.
   --> /Users/shep/.cargo/registry/src/github.com-1ecc6299db9ec823/rkyv-0.7.38/src/macros.rs:104:1
    |
104 | / core::compile_error!(
105 | |     "\"size_16\" and \"size_32\" are mutually-exclusive features. You may need to set \
106 | |     `default-features = false` or compile with `--no-default-features`."
107 | | );
    | |_^

@djkoloski
Collaborator Author

Hmm, that is unfortunate. Sorry about that, and this will change for 0.8. If your use case is blocked on this feature, then we can dig in in a dedicated issue and see what approaches could help. Most of the basic structures (i.e. RawRelPtr and RelPtr) are generic over an offset type, but it's a lot of work to build up complex structures from such a low level.
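
As a rough illustration of what "generic over an offset type" means, here is a minimal, self-contained sketch. It is not rkyv's implementation: the `Offset` trait, the `RelOffset` struct, and its `new` and `resolve` methods are hypothetical names made up for this example, and it works on plain buffer indices rather than real pointers.

```rust
use std::convert::TryFrom;

/// Hypothetical trait for integer types usable as a relative offset.
trait Offset: Copy {
    fn from_isize(value: isize) -> Option<Self>;
    fn to_isize(self) -> isize;
}

impl Offset for i16 {
    fn from_isize(value: isize) -> Option<Self> {
        i16::try_from(value).ok()
    }
    fn to_isize(self) -> isize {
        self as isize
    }
}

impl Offset for i32 {
    fn from_isize(value: isize) -> Option<Self> {
        i32::try_from(value).ok()
    }
    fn to_isize(self) -> isize {
        self as isize
    }
}

/// A position-independent reference: the signed distance from the
/// position of the pointer itself to its target within one buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RelOffset<O: Offset> {
    offset: O,
}

impl<O: Offset> RelOffset<O> {
    /// Returns `None` if the distance does not fit in the offset type.
    fn new(from: usize, to: usize) -> Option<Self> {
        let distance = (to as isize).checked_sub(from as isize)?;
        O::from_isize(distance).map(|offset| RelOffset { offset })
    }

    /// Recovers the target index given where the pointer itself is stored.
    fn resolve(self, from: usize) -> Option<usize> {
        let target = (from as isize).checked_add(self.offset.to_isize())?;
        usize::try_from(target).ok()
    }
}

fn main() {
    // A 16-bit offset is compact but cannot span large buffers.
    let narrow = RelOffset::<i16>::new(100, 40).unwrap();
    println!("narrow resolves to {:?}", narrow.resolve(100));
    println!("too far for i16: {}", RelOffset::<i16>::new(0, 100_000).is_none());

    // The same code works with a wider offset type.
    let wide = RelOffset::<i32>::new(0, 100_000).unwrap();
    println!("wide resolves to {:?}", wide.resolve(0));
}
```

The low-level piece is small; the hard part mentioned above is that every container built on top (strings, vectors, maps) has to carry the same parameter through.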

@shepmaster

> If your use case is blocked on this feature

Don't worry about me — I don't have an actual use case at the moment. A friend was (successfully!) using this crate and mentioned it to me, so I glanced at the docs and noticed the incompatibility with some pie-in-the-sky future ideas I had. My comments here are just to provide the feedback for future versions.

@praveenperera
Contributor

Is it possible to get a feature similar to #[serde(default)]?

@djkoloski
Collaborator Author

We currently have a Skip wrapper that may do something similar. Were you looking for an attribute to default a field when serializing or when deserializing?

@praveenperera
Contributor

praveenperera commented Jul 24, 2022

Hey @djkoloski, I'm looking to default a field if it's missing when deserializing.

My use case is that I have serialized a struct to save as binary into a database. I've since added two new fields, so I can't deserialize the old records using the modified struct. Instead of creating versioned copies of the struct, it would be much easier for me if I could deserialize the old records with a default value for the new fields which are missing.

Some examples:

Option<T> default to None
Vec<T> default to vec![]

Or defaulting numbers to 0.

@djkoloski
Collaborator Author

If you add a new field and add #[with(Skip)] to it then the archived type should remain the same but default the new value on deserialize. "Upgrading" the struct is more difficult since you only have one version of the struct. In that situation I'd recommend you have a new version of the struct and deserialize the old version then convert it to the new version.
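
A minimal sketch of that "deserialize the old version, then convert" approach, using plain Rust structs with no rkyv derives so that it stays self-contained. `RecordV1`, `RecordV2`, and their fields are hypothetical; in practice both versions would presumably derive the rkyv traits, and `RecordV1` would be the type old records are deserialized into.

```rust
/// Hypothetical original layout, matching records already in the database.
#[derive(Debug, Clone, PartialEq)]
struct RecordV1 {
    id: u64,
    name: String,
}

/// Hypothetical new layout with two added fields.
#[derive(Debug, Clone, PartialEq)]
struct RecordV2 {
    id: u64,
    name: String,
    nickname: Option<String>,
    tags: Vec<String>,
}

impl From<RecordV1> for RecordV2 {
    fn from(old: RecordV1) -> Self {
        RecordV2 {
            id: old.id,
            name: old.name,
            // Fields that old records never stored get their defaults here.
            nickname: None,
            tags: Vec::new(),
        }
    }
}

fn main() {
    // Stand-in for a value deserialized from an old record.
    let old = RecordV1 { id: 7, name: "example".to_string() };
    let new = RecordV2::from(old);
    println!("{:?}", new);
}
```

The defaults live in one place (the `From` impl), at the cost of keeping the old struct definition around for as long as old records exist.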

@praveenperera
Contributor

Yup, making a second, new version of the struct and converting between the two is the direction I'm headed in. But if there was a default option I could avoid that. Is having a default option possible with rkyv?

@djkoloski
Collaborator Author

Unfortunately it's not possible with the tools we have right now. In the future, we may have a protobuf-style solution for this problem in protoss.

@djkoloski djkoloski unpinned this issue Aug 9, 2022
@bunnie

bunnie commented Oct 7, 2022

Just a coordination check from my side -- is there a belly feel on when a 0.8 might be on the horizon? We're starting to get interest from other devs in Xous and I'd like to upgrade our rkyv pin if a 0.8 is imminent. If not, no worries -- you're doing great work here!

@djkoloski
Collaborator Author

Nothing is imminent, thanks for checking in though!

@ousado

ousado commented Jul 10, 2024

Some sort of mechanism to help with DRY with respect to attributes on types would be great, especially in light of rkyv's versatility that's unfortunately resulting in an avalanche of attributes, code repetition and potential for errors (e.g. forgetting a repr(C), an alignment modifier or the like). I think it should be possible to use attribute macros to that end.
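
One low-tech way to sketch the idea is a declarative macro that stamps the same attribute set onto every struct it wraps. Note that this uses `macro_rules!` rather than an attribute macro, and `archived_struct!` is a hypothetical name. The attributes shown are standard ones so the example compiles without rkyv; the assumption is that rkyv's derives and attributes could be listed in the same place.

```rust
/// Hypothetical helper: applies one shared set of attributes to each struct,
/// so `repr(C)` and the alignment modifier cannot be forgotten.
macro_rules! archived_struct {
    (
        $(#[$extra:meta])*
        $vis:vis struct $name:ident {
            $($fvis:vis $field:ident : $ty:ty),* $(,)?
        }
    ) => {
        $(#[$extra])*
        #[derive(Debug, Clone, Copy, PartialEq)]
        #[repr(C, align(8))]
        $vis struct $name {
            $($fvis $field : $ty),*
        }
    };
}

archived_struct! {
    /// Doc comments and other per-type attributes still pass through.
    pub struct Header {
        pub version: u16,
        pub length: u32,
    }
}

archived_struct! {
    struct Point {
        x: f32,
        y: f32,
    }
}

fn main() {
    let header = Header { version: 1, length: 16 };
    let point = Point { x: 1.0, y: 2.0 };
    println!("{:?} {:?}", header, point);
    // Both types picked up the shared alignment without repeating it.
    println!("{} {}", std::mem::align_of::<Header>(), std::mem::align_of::<Point>());
}
```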

@djkoloski
Collaborator Author

This issue is pretty stale, and since 0.8 has been released I'm going to close this in favor of opening separate issues. Thanks to everyone who provided feedback!
