Write our own serde format #340
Keats added the help wanted and For next major version labels on Sep 20, 2018
@dtolnay hopefully it's OK to ping you. Before I or anyone else starts working on it, do you think it makes sense, or am I overestimating the potential perf gains? Tera uses the
dtolnay commented Sep 20, 2018
I would guess roughly 3x performance improvement from serializing directly rather than passing everything through serde_json::Value.
Sounds like a nice win then!
And thanks for the comment!
Took the code from serde_json that seems to be needed to serialize to Value for Tera: https://github.com/Keats/serde-tera (minus dates). I believe
dtolnay commented Sep 21, 2018
If you don't use
I haven't looked at how the implementation currently works but in the case of:
{% for user in users %}
<li><a href="{{ user.url }}">{{ user.username }}</a></li>
{% endfor %}
what I would expect for serializing directly with no
struct LoopSerializer<W> {
    out: W,
    body: /* some representation of the loop body */,
}
impl<W: io::Write> serde::Serializer for LoopSerializer<W> {
    /* ... */
    fn serialize_seq(self, _len: Option<usize>) -> Result<Self::SerializeSeq, Self::Error> {
        Ok(self)
    }
}
impl<W: io::Write> serde::ser::SerializeSeq for LoopSerializer<W> {
    /* ... */
    fn serialize_element<T>(&mut self, value: &T) -> Result<(), Self::Error>
    where
        T: ?Sized + Serialize,
    {
        /* render `value` according to `self.body` and write to `self.out` */
    }
}
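As a rough, dependency-free illustration of the streaming idea sketched above (this is neither serde nor Tera's implementation): each element is rendered and written out as soon as it is visited, so no intermediate Value tree is built. `Piece`, `Fields`, `LoopRenderer`, and `render_users` are hypothetical names, and the `Fields` trait only stands in for `T: Serialize`.

```rust
use std::io::{self, Write};

/// One piece of a loop body: literal text or a field lookup such as `user.url`.
/// (Hypothetical representation; the comment above leaves `body` unspecified.)
enum Piece {
    Text(&'static str),
    Field(&'static str),
}

/// Stand-in for `T: Serialize`: anything that can hand out a field by name.
trait Fields {
    fn field(&self, name: &str) -> Option<&str>;
}

struct User {
    url: String,
    username: String,
}

impl Fields for User {
    fn field(&self, name: &str) -> Option<&str> {
        match name {
            "url" => Some(self.url.as_str()),
            "username" => Some(self.username.as_str()),
            _ => None,
        }
    }
}

struct LoopRenderer<W> {
    out: W,
    body: Vec<Piece>,
}

impl<W: Write> LoopRenderer<W> {
    /// Plays the role of `serialize_element`: render one item and write it immediately.
    fn render_element<T: Fields + ?Sized>(&mut self, value: &T) -> io::Result<()> {
        for piece in &self.body {
            match piece {
                Piece::Text(text) => self.out.write_all(text.as_bytes())?,
                Piece::Field(name) => {
                    // Missing fields render as empty text in this sketch.
                    let text = value.field(name).unwrap_or("");
                    self.out.write_all(text.as_bytes())?;
                }
            }
        }
        Ok(())
    }
}

/// Renders the `{% for user in users %}` body once per user, straight into a buffer.
fn render_users(users: &[User]) -> io::Result<String> {
    let mut renderer = LoopRenderer {
        out: Vec::new(),
        body: vec![
            Piece::Text("<li><a href=\""),
            Piece::Field("url"),
            Piece::Text("\">"),
            Piece::Field("username"),
            Piece::Text("</a></li>\n"),
        ],
    };
    for user in users {
        renderer.render_element(user)?;
    }
    Ok(String::from_utf8(renderer.out).expect("only UTF-8 was written"))
}
```

A real serde version would implement `serde::Serializer` and `serde::ser::SerializeSeq` as in the comment above; the point here is only that the output is produced per element, with no tree in between.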
I was mostly curious to see how creating a format looks in practice as I will probably write one for https://github.com/Keats/scl and wanted to see if
Whoa, that's interesting, I didn't think of it that way. I'm not sure how that would work in practice since we currently use JSON pointers and a few other Value things, but that's a problem for future me or (hopefully) someone smarter than me.
Evrey commented Nov 19, 2018
Have you guys considered just writing schemas for Cap'n Proto or FlatBuffers? No copying, no parsing, just dumping a binary blob and applying an initial bounds check on the contents. You do already know the data structures you wish to exchange, after all. Those are the two fastest and still very robust serialisation formats I know of.
Last time I checked, the Cap'n Proto crates are more mature and FlatBuffers lacks the bounds checking code. On the other hand, if you have the patience for the bounds checking code to land, then note that FlatBuffers has the simpler and more compact wire format. Both of them don't support
As a streamable format, something like MessagePack could do the job. It is much more compact and very similar to JSON, has extension type support for low level custom data types, but it is much more complicated to parse than the two above-mentioned formats.
Refs:
Direct comparison against the pros/cons:
In Cap'n Proto and FlatBuffers you'd just define your 0815 time struct with seconds and nanoseconds. MessagePack recently-ish standardised timestamp extensions of varying precision.
All three formats do this. Also integer and floating-point types of different sizes/precision where needed.
All three formats are binary. MessagePack interleaves the data with type metadata, i.e. it is a self-describing format. Cap'n Proto and FlatBuffers are statically typed, zero copy, and therefore require offset bounds checking to be safe. Can't get faster than that.
Cap'n Proto and FlatBuffers are both insanely fast and still very robust. MessagePack is still much faster than text formats, but it is noticeably slower than the other two formats.
All three require runtime crates. However, the runtime code size for both Cap'n Proto and FlatBuffers is very small. And if you want to be really strict about foreign code: MessagePack and FlatBuffers are very easy to implement.
Again, FlatBuffers has a very small runtime code footprint. I think Cap'n Proto as well.
Depends very much on the amount of data exchanged.
Dunno about that, never heard JSON and pointers in the same sentence. Are they like YAML pointers?
Better early than late. =)
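To make "self-describing" a bit more concrete for MessagePack: every value is prefixed with a type tag, so a reader needs no schema. Below is a minimal sketch of an encoder for a tiny subset of the format (fixmap, fixstr, positive fixint, true/false), following the byte layout in the MessagePack specification. `Msg`, `encode`, `encode_str`, and `example_bytes` are hypothetical names; real code would use an existing MessagePack crate rather than this.

```rust
/// A tiny subset of MessagePack values, enough to show the type tags.
enum Msg<'a> {
    Bool(bool),
    /// Restricted to 0..=127 (positive fixint) in this sketch.
    UInt(u8),
    /// Restricted to at most 31 bytes (fixstr) in this sketch.
    Str(&'a str),
    /// Restricted to at most 15 entries (fixmap) in this sketch.
    Map(Vec<(&'a str, Msg<'a>)>),
}

fn encode_str(s: &str, out: &mut Vec<u8>) {
    assert!(s.len() <= 31, "sketch only handles fixstr");
    // fixstr: 0xa0 | length, followed by the raw UTF-8 bytes.
    out.push(0xa0 | s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

fn encode(value: &Msg<'_>, out: &mut Vec<u8>) {
    match value {
        Msg::Bool(false) => out.push(0xc2),
        Msg::Bool(true) => out.push(0xc3),
        Msg::UInt(n) => {
            assert!(*n <= 0x7f, "sketch only handles positive fixint");
            // positive fixint: the value itself is the tag byte.
            out.push(*n);
        }
        Msg::Str(s) => encode_str(s, out),
        Msg::Map(entries) => {
            assert!(entries.len() <= 15, "sketch only handles fixmap");
            // fixmap: 0x80 | entry count, followed by key/value pairs.
            out.push(0x80 | entries.len() as u8);
            for (key, val) in entries {
                encode_str(key, out);
                encode(val, out);
            }
        }
    }
}

/// Encodes the map {"compact": true, "schema": 0}.
fn example_bytes() -> Vec<u8> {
    let doc = Msg::Map(vec![
        ("compact", Msg::Bool(true)),
        ("schema", Msg::UInt(0)),
    ]);
    let mut out = Vec::new();
    encode(&doc, &mut out);
    out
}
```

For this map the encoder produces 18 bytes, against 27 bytes for the minified JSON text `{"compact":true,"schema":0}`. Most of the remaining bytes are the string keys, which is why the savings shrink when the data is dominated by identifiers.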
I don't know YAML pointers x) But JSON pointers are something to access data. For example if we have this JSON:
{
  foo: 1,
  bar: { baz: 2},
  qux: [3, 4, 5]
}
We can get the first value of the
Part of the issue is that I don't want people to create .protobuf or something related as sometimes the context is very dynamic. For example in https://github.com/Keats/kickstart the context is defined in a .toml file and the user might not have Rust installed at all. Same for https://github.com/getzola/zola
MessagePack does look interesting though, is there some benchmark between its serde implementation and serde-json? From what I remember, MessagePack is not much more compact than JSON
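Going back to the JSON pointer part of this comment, here is a minimal sketch of how such a lookup walks a value tree, using the JSON document shown above. It follows the RFC 6901 path syntax but skips the `~0`/`~1` escapes and strict index validation. The `Value` enum, `pointer`, and `sample` are hypothetical stand-ins written for this illustration, not serde_json's types.

```rust
use std::collections::BTreeMap;

/// Cut-down value tree, just enough for the example document.
#[derive(Debug, PartialEq)]
enum Value {
    Number(i64),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// Resolves a pointer such as "/qux/0" against `root`.
fn pointer<'a>(root: &'a Value, path: &str) -> Option<&'a Value> {
    // The empty pointer refers to the whole document.
    if path.is_empty() {
        return Some(root);
    }
    // Any other pointer must start with '/'.
    let rest = path.strip_prefix('/')?;
    let mut current = root;
    for token in rest.split('/') {
        current = match current {
            // Object: the token is a key.
            Value::Object(map) => map.get(token)?,
            // Array: the token is a zero-based index.
            Value::Array(items) => items.get(token.parse::<usize>().ok()?)?,
            // Scalars have no children.
            Value::Number(_) => return None,
        };
    }
    Some(current)
}

/// Builds the document from the comment: foo = 1, bar.baz = 2, qux = [3, 4, 5].
fn sample() -> Value {
    let mut bar = BTreeMap::new();
    bar.insert("baz".to_string(), Value::Number(2));
    let mut root = BTreeMap::new();
    root.insert("foo".to_string(), Value::Number(1));
    root.insert("bar".to_string(), Value::Object(bar));
    root.insert(
        "qux".to_string(),
        Value::Array(vec![Value::Number(3), Value::Number(4), Value::Number(5)]),
    );
    Value::Object(root)
}
```

With this sketch, `pointer(&sample(), "/qux/0")` resolves to the number 3 and `pointer(&sample(), "/bar/baz")` to the number 2. A format that replaces the Value tree would need some equivalent way to address nested data.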
Evrey commented Nov 19, 2018
Ah, so basically just some path syntax to walk a JSON data structure.
In that case definitely prefer MessagePack, UBJSON, BSON, and whatever else they are called. That's where self-describing formats shine.
Not that I know. But
That depends a lot on the kind of data moved around. If a lot of it consists of string identifiers, then the lower memory usage becomes unnoticeable. One could fix that by storing and sending 32-bit FNV1a hashes of identifiers instead.
Edit: Still, even if MessagePack would not be much smaller compared to JSON when used in
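For reference, a minimal sketch of the 32-bit FNV-1a hash mentioned above, using the standard offset basis and prime. The function name `fnv1a_32` is just illustrative. Keep in mind that 32-bit hashes can collide, and the receiving side needs its own table to map a hash back to an identifier.

```rust
/// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
fn fnv1a_32(bytes: &[u8]) -> u32 {
    const OFFSET_BASIS: u32 = 0x811c_9dc5;
    const PRIME: u32 = 0x0100_0193;

    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u32::from(byte);
        // Wrapping multiplication keeps the result modulo 2^32.
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}
```

An identifier such as `username` would then be sent as the four bytes of `fnv1a_32(b"username")` instead of the eight bytes of the string itself.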
Does Tera really need a format? It basically consumes data. Why not introduce a trait and allow users to implement it?
pub trait TeraValue {
    fn as_str(&self) -> Option<&str>;
    fn as_int(&self) -> Option<i64>;
    fn as_uint(&self) -> Option<u64>;
    fn as_float(&self) -> Option<f64>;
    fn get(&self, index: usize) -> Option<&dyn TeraValue>;
    fn get_prop(&self, prop: &str) -> Option<&dyn TeraValue>;
    // ...
}
What am I missing?
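To show what implementing such a trait might look like for a user type, here is a minimal sketch. It repeats the trait from the comment above with one change that is an assumption of this example: every method gets a default body returning `None`, so each type only overrides what applies to it. The `User` struct and the impls for `String`, `u64`, and `Vec<T>` are hypothetical.

```rust
pub trait TeraValue {
    // Default bodies are an addition of this sketch, not part of the proposal.
    fn as_str(&self) -> Option<&str> { None }
    fn as_int(&self) -> Option<i64> { None }
    fn as_uint(&self) -> Option<u64> { None }
    fn as_float(&self) -> Option<f64> { None }
    fn get(&self, _index: usize) -> Option<&dyn TeraValue> { None }
    fn get_prop(&self, _prop: &str) -> Option<&dyn TeraValue> { None }
}

impl TeraValue for String {
    fn as_str(&self) -> Option<&str> {
        Some(self.as_str())
    }
}

impl TeraValue for u64 {
    fn as_uint(&self) -> Option<u64> {
        Some(*self)
    }
}

impl<T: TeraValue> TeraValue for Vec<T> {
    fn get(&self, index: usize) -> Option<&dyn TeraValue> {
        // Go through the slice so this calls `<[T]>::get`, not this method again.
        self.as_slice().get(index).map(|item| item as &dyn TeraValue)
    }
}

pub struct User {
    pub username: String,
    pub age: u64,
    pub tags: Vec<String>,
}

impl TeraValue for User {
    fn get_prop(&self, prop: &str) -> Option<&dyn TeraValue> {
        // Every field has to be listed by hand; this is the boilerplate
        // that a derive or serde normally takes care of.
        match prop {
            "username" => Some(&self.username as &dyn TeraValue),
            "age" => Some(&self.age as &dyn TeraValue),
            "tags" => Some(&self.tags as &dyn TeraValue),
            _ => None,
        }
    }
}
```

A lookup like `{{ user.tags.0 }}` would then become `user.get_prop("tags")`, followed by `get(0)` and `as_str()` on the results, with no copy of the data. The hand-written `match` is the cost: without a derive macro it grows with every field.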
How does it work when the person defining the schema doesn't have Rust on the machine, like in Zola or kickstart? The automatic Serde serialization makes it very good from a UX point of view for users. Compare with the example in https://github.com/cobalt-org/liquid-rust#usage, which would be really, really tiring when you have dozens of fields.
(Keep in mind I might be wrong, if you think this can be done without degrading UX please try!)
epage commented Dec 7, 2018
Got curious to see your issues :)
Yes, if you have data already in a serde struct, then that is easiest. Some things I have done to improve usability
Taking the data by-reference means clients can control how the data is created and avoid a conversion cost during render.
In addition, I've been modifying liquid to instead accept a trait. I still need to iterate on this design more but for now it helps in the case where the user is composing data from multiple sources (e.g. in cobalt, data that is the same for every page vs per-page), you no longer need to put them all in the same liquid Value (instead a struct of liquid Values).
I'd like to go a step further with the trait and have a completely custom trait for walking the entire data structure so no conversion is needed except when non-leaf nodes are accessed. This would allow the user to better optimize things.
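A minimal sketch of the "compose by reference instead of merging" idea described above. This is not liquid's actual API: `Context`, `Layered`, and `lookup` are hypothetical names, and values are plain strings to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical lookup trait that a renderer could accept instead of one owned value.
trait Context {
    fn lookup(&self, key: &str) -> Option<&str>;
}

impl Context for HashMap<String, String> {
    fn lookup(&self, key: &str) -> Option<&str> {
        self.get(key).map(String::as_str)
    }
}

/// Borrows two sources instead of copying them into a single merged map.
struct Layered<'a> {
    page: &'a dyn Context,
    site: &'a dyn Context,
}

impl Context for Layered<'_> {
    fn lookup(&self, key: &str) -> Option<&str> {
        // Per-page data wins; site-wide data is the fallback.
        self.page.lookup(key).or_else(|| self.site.lookup(key))
    }
}
```

The site-wide map is built once and shared across all pages. Only the small per-page map changes between renders, and nothing is cloned to combine the two.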
mitchtbaum commented Jan 14, 2019
If SCL has built-in support for dates, integers, and floats, and serde-scl could go beyond whatever limits serde-json had in its approach to leveraging serde, then what about using that?
Keats commented Sep 20, 2018 (edited)
serde-json was chosen as the simplest format I could think of, but it would be interesting to see what writing our own format would do. We only use serde-json to have easy serialization of user data in the context in Value nodes, not really for anything else.
Advantages:
serde-json, all you need would be in Tera
Cons:
If anyone is interested in picking this up, please do! I won't have time to touch that for quite some time.