Write our own serde format #340
Comments
@dtolnay hopefully it's OK to ping you. Before I or anyone else starts working on it, do you think it makes sense, or am I overestimating the potential perf gains? Tera uses the |
I would guess roughly 3x performance improvement from serializing directly rather than passing everything through serde_json::Value. |
Sounds like a nice win then! |
And thanks for the comment! |
Took the code from serde_json that seems needed to serialize to Value for Tera: https://github.com/Keats/serde-tera (minus dates) I believe |
If you don't use I haven't looked at how the implementation currently works but in the case of:
{% for user in users %}
<li><a href="{{ user.url }}">{{ user.username }}</a></li>
{% endfor %}
what I would expect for serializing directly with no
struct LoopSerializer<W> {
out: W,
body: /* some representation of the loop body */,
}
impl<W: io::Write> serde::Serializer for LoopSerializer<W> {
/* ... */
fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
Ok(self)
}
}
impl<W: io::Write> serde::ser::SerializeSeq for LoopSerializer<W> {
/* ... */
fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: ?Sized + Serialize,
{
/* render `value` according to `self.body` and write to `self.out` */
}
} |
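To make the streaming idea above a bit more concrete, here is a minimal sketch that uses only the standard library rather than serde. `User`, `LoopRenderer`, `render_element` and `render_users` are hypothetical names, the loop body is hard-coded instead of being parsed from a template, and no HTML escaping is done. The point is only that each element is written to the output as it is visited, with no intermediate Value tree.

```rust
use std::io::{self, Write};

/// Hypothetical user type; the fields mirror the template in the comment above.
struct User {
    url: String,
    username: String,
}

/// Hypothetical stand-in for `LoopSerializer`: owns the output and renders
/// one loop iteration per element it is handed.
struct LoopRenderer<W> {
    out: W,
}

impl<W: Write> LoopRenderer<W> {
    /// Plays the role of `serialize_element`: render one element and write
    /// it straight to `self.out`. (Assumption: the body is fixed here.)
    fn render_element(&mut self, user: &User) -> io::Result<()> {
        writeln!(
            self.out,
            "<li><a href=\"{}\">{}</a></li>",
            user.url, user.username
        )
    }
}

/// Render every user directly into an in-memory buffer.
fn render_users(users: &[User]) -> String {
    let mut renderer = LoopRenderer { out: Vec::new() };
    for user in users {
        renderer
            .render_element(user)
            .expect("writing to a Vec<u8> does not fail");
    }
    String::from_utf8(renderer.out).expect("rendered output is valid UTF-8")
}
```

With serde, the per-element call would arrive through `SerializeSeq::serialize_element` instead of a plain loop, but the data flow would be the same.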
I was mostly curious to see how creating a format looks in practice as I will probably write one for https://github.com/Keats/scl and wanted to see if
Whoa that's interesting, I didn't think of it that way. I'm not sure how that would work in practice since we currently use json pointers and a few other Value things but that's a problem for future me or (hopefully) someone smarter than me. |
Have you guys considered just writing schemas for Cap'n Proto or FlatBuffers? No copying, no parsing, just dumping a binary blob and applying an initial bounds check on the contents. You do already know the data structures you wish to exchange, after all.
Those are the two fastest and still very robust serialisation formats I know of. Last time I checked, the Cap'n Proto crates are more mature and FlatBuffers lacks the bounds checking code. On the other hand, if you have the patience for the bounds checking code to land, then note that FlatBuffers has the simpler and more compact wire format. Both of them don't support
As a streamable format, something like MessagePack could do the job. It is much more compact and very similar to JSON, has extension type support for low level custom data types, but it is much more complicated to parse than the two above-mentioned formats.
Refs:
Direct comparison against the pros/cons:
In Cap'n Proto and FlatBuffers you'd just define your run-of-the-mill time struct with seconds and nanoseconds. MessagePack fairly recently standardised timestamp extensions of varying precision.
All three formats do this. Also integer and floating-point types of different sizes/precision where needed.
All three formats are binary. MessagePack interleaves the data with type metadata, i.e. it is a self-describing format. Cap'n Proto and FlatBuffers are statically typed, zero copy, and therefore require offset bounds checking to be safe. Can't get faster than that.
Cap'n Proto and FlatBuffers are both insanely fast and still very robust. MessagePack is still much faster than text formats, but it is noticeably slower than the other two formats.
All three require runtime crates. However, the runtime code size for both Cap'n Proto and FlatBuffers is very small. And if you want to be really strict about foreign code: MessagePack and FlatBuffers are very easy to implement.
Again, FlatBuffers has a very small runtime code footprint. I think Cap'n Proto as well.
Depends very much on the amount of data exchanged.
Dunno about that, never heard JSON and pointers in the same sentence. Are they like YAML pointers?
Better early than late. =) |
I don't know YAML pointers x) But JSON pointers are a way to access data. For example, if we have this JSON:
{
  "foo": 1,
  "bar": { "baz": 2 },
  "qux": [3, 4, 5]
}
We can get the first value of the
Part of the issue is that I don't want people to create .protobuf or something related, as sometimes the context is very dynamic. For example in https://github.com/Keats/kickstart the context is defined in a .toml file and the user might not have Rust installed at all. Same for https://github.com/getzola/zola
MessagePack does look interesting though, is there some benchmark between its serde implementation and serde-json? From what I remember, MessagePack is not much more compact than JSON |
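As a rough illustration of the JSON pointer lookup described in this comment, here is a std-only sketch. The `Value` enum, `example` function and the pointer strings `/qux/0` and `/bar/baz` are my own assumptions for illustration; this is not `serde_json::Value`, although serde_json does offer a `Value::pointer` method with the same shape. The `~0` / `~1` escapes from RFC 6901 are left out.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, not serde_json's type).
#[derive(Debug, PartialEq)]
enum Value {
    Number(i64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

impl Value {
    /// Walk a pointer such as "/qux/0" one token at a time.
    /// An empty pointer refers to the whole document.
    fn pointer(&self, pointer: &str) -> Option<&Value> {
        if pointer.is_empty() {
            return Some(self);
        }
        if !pointer.starts_with('/') {
            return None;
        }
        let mut current = self;
        for token in pointer[1..].split('/') {
            current = match current {
                // Object: the token is a key.
                Value::Object(map) => map.get(token)?,
                // Array: the token must parse as an index.
                Value::Array(items) => items.get(token.parse::<usize>().ok()?)?,
                // Scalars have no children.
                Value::Number(_) => return None,
            };
        }
        Some(current)
    }
}

/// Builds the document from the comment above.
fn example() -> Value {
    let mut bar = BTreeMap::new();
    bar.insert("baz".to_string(), Value::Number(2));
    let mut root = BTreeMap::new();
    root.insert("foo".to_string(), Value::Number(1));
    root.insert("bar".to_string(), Value::Object(bar));
    root.insert(
        "qux".to_string(),
        Value::Array(vec![Value::Number(3), Value::Number(4), Value::Number(5)]),
    );
    Value::Object(root)
}
```

With this sketch, `example().pointer("/qux/0")` yields the number 3 and `example().pointer("/bar/baz")` yields 2; a missing key or an out-of-range index gives `None`.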
Ah, so basically just some path syntax to walk a JSON data structure.
In that case definitely prefer MessagePack, UBJSON, BSON, and what else they are called. That's where self-describing formats shine.
Not that I know. But
That depends a lot on the kind of data moved around. If a lot of it consists of string identifiers, then the lower memory usage becomes unnoticeable. One could fix that by storing and sending 32-bit FNV1a hashes of identifiers instead. Edit: Still, even if MessagePack would not be much smaller compared to JSON when used in |
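For reference, the 32-bit FNV-1a hash mentioned above is only a few lines. This is a sketch of the hash itself; the function name is my own, and how the hashes would be mapped back to identifiers (and how collisions would be handled) is not covered here and would need a separate lookup table.

```rust
/// 32-bit FNV-1a: for each byte, XOR it into the hash, then multiply by the
/// FNV prime, wrapping on overflow.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    const OFFSET_BASIS: u32 = 0x811c_9dc5;
    const PRIME: u32 = 0x0100_0193;

    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}
```

An identifier such as `username` would then travel as the four bytes of `fnv1a_32(b"username")` instead of the eight-byte string, which only pays off when identifiers are long or repeated often.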
Does Tera really need a format? It basically consumes data. Why not introduce a trait and let users implement it?
pub trait TeraValue {
fn as_str(&self) -> Option<&str>;
fn as_int(&self) -> Option<i64>;
fn as_uint(&self) -> Option<u64>;
fn as_float(&self) -> Option<f64>;
fn get(&self, index: usize) -> Option<&dyn TeraValue>;
fn get_prop(&self, prop: &str) -> Option<&dyn TeraValue>;
// ...
} What am I missing? |
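To see what implementing such a trait by hand could look like, here is a small sketch. The trait is restated with default method bodies returning `None` (my addition, so each type only overrides what applies to it), and `User` with its fields is a hypothetical type. It also hints at the UX cost raised in the replies: every struct needs a hand-written `get_prop`.

```rust
pub trait TeraValue {
    fn as_str(&self) -> Option<&str> { None }
    fn as_int(&self) -> Option<i64> { None }
    fn as_uint(&self) -> Option<u64> { None }
    fn as_float(&self) -> Option<f64> { None }
    fn get(&self, _index: usize) -> Option<&dyn TeraValue> { None }
    fn get_prop(&self, _prop: &str) -> Option<&dyn TeraValue> { None }
}

impl TeraValue for String {
    fn as_str(&self) -> Option<&str> {
        Some(&self[..])
    }
}

impl TeraValue for i64 {
    fn as_int(&self) -> Option<i64> {
        Some(*self)
    }
}

impl<T: TeraValue> TeraValue for Vec<T> {
    fn get(&self, index: usize) -> Option<&dyn TeraValue> {
        // Go through the slice explicitly so this does not call itself.
        self.as_slice().get(index).map(|item| item as &dyn TeraValue)
    }
}

/// Hypothetical user-defined type exposed to templates.
pub struct User {
    pub url: String,
    pub username: String,
    pub age: i64,
}

impl TeraValue for User {
    fn get_prop(&self, prop: &str) -> Option<&dyn TeraValue> {
        // Each field is handed out by reference; nothing is copied or converted.
        match prop {
            "url" => Some(&self.url as &dyn TeraValue),
            "username" => Some(&self.username as &dyn TeraValue),
            "age" => Some(&self.age as &dyn TeraValue),
            _ => None,
        }
    }
}
```

A template engine built on this would resolve `{{ user.username }}` by calling `get_prop("username")` and then `as_str()` on the result.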
How does it work when the person defining the schema doesn't have Rust on the machine? Like in Zola or kickstart? The automatic Serde serialization makes it very good from a UX point of view for users, compare with the example in https://github.com/cobalt-org/liquid-rust#usage which would be really really tiring when you have dozens of fields |
(Keep in mind I might be wrong, if you think this can be done without degrading UX please try!) |
Got curious to see your issues :)
Yes, if you have data already in a serde struct, then that is easiest. Some things I have done to improve usability
Taking the data by reference means clients can control how the data is created and avoid a conversion cost during render. In addition, I've been modifying liquid to accept a trait instead. I still need to iterate on this design, but for now it helps when the user is composing data from multiple sources (e.g. in cobalt, data that is the same for every page vs. per-page data): you no longer need to put them all in the same liquid Value (a struct of liquid Values works instead). I'd like to go a step further with the trait and have a completely custom trait for walking the entire data structure, so no conversion is needed except when non-leaf nodes are accessed. This would allow the user to better optimize things. |
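The "composing data from multiple sources" case could look roughly like the sketch below. This is not liquid's actual API: `ValueSource`, `Layered`, and the choice of plain string values are all assumptions made to keep the example short. Both layers are borrowed, so site-wide data is built once and never merged into per-page data.

```rust
use std::collections::HashMap;

/// Hypothetical lookup trait: anything that can resolve a key to a string.
trait ValueSource {
    fn lookup(&self, key: &str) -> Option<&str>;
}

impl ValueSource for HashMap<String, String> {
    fn lookup(&self, key: &str) -> Option<&str> {
        self.get(key).map(|value| value.as_str())
    }
}

/// Two borrowed layers: per-page data shadows site-wide data.
struct Layered<'a> {
    page: &'a dyn ValueSource,
    site: &'a dyn ValueSource,
}

impl<'a> ValueSource for Layered<'a> {
    fn lookup(&self, key: &str) -> Option<&str> {
        // Try the page first, then fall back to the shared site data.
        self.page.lookup(key).or_else(|| self.site.lookup(key))
    }
}
```

Rendering many pages would then reuse one `site` map and build a cheap `Layered` per page.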
If SCL has built-in support for dates, integers, and floats, and serde-scl could go beyond whatever limits serde-json had in its approach to leveraging serde, then what about using that? |
I'm closing this issue: it would be a very welcome feature, but it might not use serde; it could be some custom traits instead. |
serde-json was chosen as the simplest format I could think of but it would be interesting to see what writing our own format would do. We only use serde-json to have easy serialization of user data in the context in Value nodes, not really for anything else.
Advantages:
serde-json, all you need would be in Tera
Cons:
If anyone is interested in picking this up, please do! I won't have time to touch this for quite some time.