-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for serializing sequences of things #36
Conversation
Yea this is where the old |
From memory it breaks down for sequences of any primitive type (integers, strings, etc). We talked about it in the first serializer PR (#8 (comment)), but I don't think we reached any conclusions. What do other languages like Java or Python do in this situation? |
src/ser/mod.rs
Outdated
@@ -288,6 +285,26 @@ where | |||
} | |||
} | |||
|
|||
pub struct Seq<'a, W: 'a + Write> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you like to move this into a new file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a good idea. Does it belong in src/ser/var.rs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just create a new src/ser/seq.rs
7157c4c
to
023698a
Compare
@oli-obk I found a "fix" to make sure you don't try and serialize a sequence of primitives. It's really horrible though, so hopefully we can think of something which would be better in the long run. One possibility is to create a helper |
That sounds much better, yes. And it should be possible to optimize it away with good enough const evaluation. |
What are your thoughts on the last commit? It adds quite a few lines, but most of those are just from the boilerplate required to implement |
|
||
#[allow(unused_variables)] | ||
impl<'a> Serializer for WrapSafeDetector<'a> { | ||
type Ok = (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just make this bool
instead of ()
and pass the value that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I initially thought to do.
The problem is not all methods return a Result<Self::Ok, _>
, so you'd end up needing to create your own SerializeSeq
, SerializeMap
, and all the other various helper types. Adding lots of code for not much gain.
I'm not a fan of the mutable state, but considering all this is hidden behind the is_wrapped()
interface, using a mutable reference is probably the easiest option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just remove the state and use is_ok()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't fix the issue though. To return an Ok
the type signatures say your Ok
must be of type Ok(Self::SerializeSeq)
(precise type depends on the method). So you still need to create some struct with the 7 impls for SerializeSeq
, SerializeTuple
, SerializeMap
, etc...
What benefit do you think removing the state would have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What benefit do you think removing the state would have?
It looks ugly and is probably not optimized out.
That doesn't fix the issue though. To return an Ok the type signatures say your Ok must
You misunderstand me. Just return Ok(())
where you set is_wrapped = false
and Err(())
where you set is_wrapped = true
or the other way around. Then your is_wrapped
function becomes thing.serialize(WrapSafeDetector).is_ok()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, that sounds promising! I'll give it a shot.
@oli-obk, I've pushed up a version of the I was able to skip most of the |
looks much better
I don't quite understand why you can't return |
I'm pretty sure if you treat a sequence as "wrapped", when serializing a All the other types are okay though because you've typically got a struct/enum name and the contents would be wrapped in Put differently, should it be possible to serialize something like |
Thanks! That made total sense. |
|
||
fn serialize_element<T>(&mut self, elem: &T) -> Result<Self::Ok, Self::Error> | ||
where T: Serialize + ?Sized { | ||
if is_wrapped(elem) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with the new knowledge I have gained by your explanation, I am back with more questions:
Why would we even care about whether the elements are wrapped? Aren't all sequences not wrapped, irrelevant of their elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say I were to serialize vec![1, 2, 3, 4]
, you'd end up calling serialize_element()
on each of the elements. Usually this would be a no-op elem.serialize(&mut *self.parent)
to serialize the element, however if you do this on a sequence of "primitives" (i.e. anything which isn't "wrapped" in some sort of named tag), you end up getting 1234
out. Which isn't what we want.
I could wrap any primitives in an arbitrarily named tag and get something like <value>1</value>...<value>4</value>
, but then that "value" tag is completely arbitrary, not really changeable, and probably wouldn't interop well with other XML libraries from other languages.
My is_wrapped()
check there is aiming to prevent this issue by saying sequences of bare primitives aren't valid XML, so if you want to serialize them you need to explicitly wrap it in a named newtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh I remember. Yea I took the opposite way and made sure the sequence deserializer looked for repetitions of e.g. the field name. So multiple <foo></foo>
would be what you'd get for a struct Bar { foo: Vec<String> }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have time in the next weeks to consider sequences all that much... We'll leave this open. Maybe @RReverser has some ideas.
@oli-obk, if it helps you I'll try to come up with tests which cover the more common edge cases I can think of. Working tests are probably going to make the code easier to understand and reason about than anything I can say anyway. Also, I noticed the |
I think we need to start even higher level than that. We need to figure out what kind of sequences we want to support in general and how they are supposed to be represented. |
What did you have in mind? What about composing a list of things which we think are valid and invalid, then try to generalise that into a couple simple rules? Here are some of the things I think we should support and how I'd represent them:
|
I had a bit of a think and after some experimentation came up with a fairly small set of rules... How does this sound?
Being "wrapped" means you are of the form `<$name>...</$name>". There's probably an official term for this but hopefully that'll suffice for now. |
79560f1
to
19e9e92
Compare
19e9e92
to
3a6de9d
Compare
I worry that our |
I'm a bit concerned about that as well. I'm using Because XML is a little different to JSON and the other formats, is the way the |
Is there any progress on this? I would have expected It also appears, that serialization makes things doubly wrapped, i.e for I think at this point it should be settled if tags represent
One doesn't even have to choose only one behavior, since it could be different Serializers or an option to a single Serializer. I can't help but feel that there's some overthinking going on in an attempt to choose "the one true output format", that stalls the entire development. I see no need for complicated checks, if they end with runtime errors and/or bulky At least it doesn't lose separation of elements, so But they also deserve better than being told "Unsupported Operation: serialize_seq" in a runtime panic, after they invested time in using If they want to change the tag, then using a wrapping struct or, barring that, using a trivial If that still is not deemed sufficient, a sequence customization callback/visitor provided by user to De/Serializer instance is a distinct possibility too. E.g something that takes an analogue of And then it decides if the sequence is Nested tag and empty separator support would be most useful, I think, and a simple de/serializer constructor should default to something sane, like nested |
Hi! I’m also interested in this! Can we help? |
We need to revive discussions about this topic, and serialization in general. I will start by reviewing the unit tests. I too would like to strive for as much parity between ser and de, but we should probably focus on enabling the different formats that most users want to serialize to. |
Maybe it becomes conceptually more simple if we plan with serde-like field attributes from the beginning. I have a structure that’s sufficiently XML-like: // Attribute types
#[derive(Serialize)]
struct NameToken(String);
//...
// Attribute containers
#[derive(Serialize, Default)]
struct CommonAttributes {
#[serde(skip_serializing_if = "Option::is_none")]
#[serde_xml_rs(attribute)]
id: Option<NameToken>,
#[serde(skip_serializing_if = "Vec::is_empty")]
#[serde_xml_rs(attribute)]
names: Vec<NameToken>,
//...,
}
//...
// Pseudo elements
#[derive(Serialize)]
struct Document {
children: Vec<Box<SubStructure>>,
}
#[derive(Serialize)]
struct TextNode { text: String }
// Element types
#[derive(Serialize)]
struct Section {
#[serde(flatten)]
common_attributes: CommonAttributes,
children: Vec<Box<SubStructure>>,
}
#[derive(Serialize)]
struct Title {
#[serde(flatten)]
common_attributes: CommonAttributes,
children: Vec<Box<Inline>>,
}
//...
// Element categories
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum SubStructure {
Section(Section),
Title(Title),
//...
}
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum Inline {
#[serde_xml_rs(text_node)]
TextNode(TextNode),
//...
}
//... And I would like to be able to serialize it as it should be: // Wrap element types `into` their element categories
let doc = Document::with_children(vec![
Section::with_children(vec![
// Creates title with ID and children
Title::new("first_chapter", vec![
"First Chapter".to_owned().into(),
]).into(),
]).into(),
]);
assert_equal!(
serde_xml_rs::serialize(tree),
"\
<document>
<section>
<title id="first_chapter">First Chapter<title>
</section>
</document>\
") link to playground where I serialize this to JSON |
I presume this PR can be safely closed since #173 has been merged? |
This is the beginning of sequence serializing. I'm still not sure how we deal with the case when someone wants to serialize
vec![1, 2, 3, 4]
though...