Added support for serializing sequences of things #36

Closed
wants to merge 12 commits into from

Michael-F-Bryan
Contributor

This is the beginning of sequence serializing. I'm still not sure how we deal with the case when someone wants to serialize vec![1, 2, 3, 4] though...

@oli-obk
Collaborator

oli-obk commented Aug 22, 2017

Yea this is where the old serde-xml started breaking down... @RReverser what were your plans around sequences?

@Michael-F-Bryan
Contributor Author

From memory it breaks down for sequences of any primitive type (integers, strings, etc). We talked about it in the first serializer PR (#8 (comment)), but I don't think we reached any conclusions.

What do other languages like Java or Python do in this situation?

src/ser/mod.rs Outdated
@@ -288,6 +285,26 @@ where
}
}

pub struct Seq<'a, W: 'a + Write> {

@farodin91
Contributor

farodin91 commented Aug 22, 2017

Would you like to move this into a new file?

Contributor Author

Sounds like a good idea. Does it belong in src/ser/var.rs?

Collaborator

I think you can just create a new src/ser/seq.rs

@Michael-F-Bryan
Contributor Author

@oli-obk I found a "fix" to make sure you don't try and serialize a sequence of primitives. It's really horrible though, so hopefully we can think of something which would be better in the long run.

One possibility is to create a helper Serializer which will return true when serializing a primitive and false when serializing anything complex (short-circuiting so we don't end up traversing the entire object and its children).

@oli-obk
Collaborator

oli-obk commented Aug 22, 2017

> One possibility is to create a helper Serializer which will return true when serializing a primitive and false when serializing anything complex (short-circuiting so we don't end up traversing the entire object and its children).

That sounds much better, yes. And it should be possible to optimize it away with good enough const evaluation.

@Michael-F-Bryan
Contributor Author

What are your thoughts on the last commit? It adds quite a few lines, but most of those are just from the boilerplate required to implement Serializer and a bunch of tests to go through the various types you could possibly encounter.


#[allow(unused_variables)]
impl<'a> Serializer for WrapSafeDetector<'a> {
    type Ok = ();

Collaborator

Can't you just make this bool instead of () and pass the value that way?

Contributor Author

That's what I initially thought to do.

The problem is that not all methods return a Result<Self::Ok, _>, so you'd end up needing to create your own SerializeSeq, SerializeMap, and all the other various helper types, adding lots of code for not much gain.

I'm not a fan of the mutable state, but considering all this is hidden behind the is_wrapped() interface, using a mutable reference is probably the easiest option.

Collaborator

Can you just remove the state and use is_ok()?

Contributor Author

That doesn't fix the issue though. To return an Ok, the type signatures say the value must be of the form Ok(Self::SerializeSeq) (the precise type depends on the method). So you still need to create some struct with the 7 impls for SerializeSeq, SerializeTuple, SerializeMap, etc...

What benefit do you think removing the state would have?

Collaborator

> What benefit do you think removing the state would have?

It looks ugly and is probably not optimized out.

> That doesn't fix the issue though. To return an Ok the type signatures say your Ok must

You misunderstand me. Just return Ok(()) where you set is_wrapped = false and Err(()) where you set is_wrapped = true or the other way around. Then your is_wrapped function becomes thing.serialize(WrapSafeDetector).is_ok()
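A minimal sketch of the Ok/Err idea above, using a cut-down stand-in for serde's Serializer rather than the real trait. MiniSerializer, MiniSerialize, and Foo are hypothetical names introduced only for illustration; the WrapSafeDetector in this PR implements the full serde trait and may differ. Here Ok means "wrapped" and Err means "bare primitive", though as noted it could be the other way around.

```rust
// Simplified stand-in for serde's `Serializer` (an assumption, NOT the real
// trait): just enough surface to show the Ok/Err trick.
trait MiniSerializer {
    fn serialize_u32(self, v: u32) -> Result<(), ()>;
    fn serialize_str(self, v: &str) -> Result<(), ()>;
    fn serialize_named(self, name: &'static str) -> Result<(), ()>;
}

// Simplified stand-in for serde's `Serialize` (also an assumption).
trait MiniSerialize {
    fn serialize<S: MiniSerializer>(&self, s: S) -> Result<(), ()>;
}

// Stateless detector: the answer travels in the return value,
// so no mutable flag is needed.
struct WrapSafeDetector;

impl MiniSerializer for WrapSafeDetector {
    // Bare primitives have no tag of their own: report "not wrapped".
    fn serialize_u32(self, _v: u32) -> Result<(), ()> {
        Err(())
    }
    fn serialize_str(self, _v: &str) -> Result<(), ()> {
        Err(())
    }
    // Named types get a <Name>...</Name> tag: report "wrapped".
    fn serialize_named(self, _name: &'static str) -> Result<(), ()> {
        Ok(())
    }
}

fn is_wrapped<T: MiniSerialize + ?Sized>(thing: &T) -> bool {
    thing.serialize(WrapSafeDetector).is_ok()
}

impl MiniSerialize for u32 {
    fn serialize<S: MiniSerializer>(&self, s: S) -> Result<(), ()> {
        s.serialize_u32(*self)
    }
}

impl MiniSerialize for str {
    fn serialize<S: MiniSerializer>(&self, s: S) -> Result<(), ()> {
        s.serialize_str(self)
    }
}

// Hypothetical unit struct standing in for any named type.
struct Foo;

impl MiniSerialize for Foo {
    fn serialize<S: MiniSerializer>(&self, s: S) -> Result<(), ()> {
        s.serialize_named("Foo")
    }
}
```

With these definitions, is_wrapped(&Foo) is true while is_wrapped(&5u32) and is_wrapped("foo") are false, and the detector returns as soon as the first serialize_* call is made.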

Contributor Author

Ooh, that sounds promising! I'll give it a shot.

@Michael-F-Bryan
Contributor Author

@oli-obk, I've pushed up a version of the is_wrapped() helper which doesn't rely on mutable state.

I was able to skip most of the Serialize* impls, but still had to write out one or two dummy types to make the type signature happy and make sure all my tests keep passing.

@oli-obk
Collaborator

oli-obk commented Aug 22, 2017

> I've pushed up a version of the is_wrapped() helper which doesn't rely on mutable state.

looks much better

> but still had to write out one or two dummy types to make the type signature happy and make sure all my tests keep passing.

I don't quite understand why you can't return Err in the sequence cases. Can you elaborate?

@Michael-F-Bryan
Contributor Author

Michael-F-Bryan commented Aug 22, 2017

> I don't quite understand why you can't return Err in the sequence cases. Can you elaborate?

I'm pretty sure if you treat a sequence as "wrapped", when serializing a Vec<Vec<u32>> you'd end up with the original problem of all the elements being mushed together. The argument for tuples would also be the same.

All the other types are okay though because you've typically got a struct/enum name and the contents would be wrapped in <$name></$name>.

Put differently, should it be possible to serialize something like vec![vec![1, 2, 3], vec![4, 5, 6]] or (5, true)? I'm leaning towards saying no because I can't think of a way to serialize the elements seeing as they don't have any useful "name" I can use to wrap them in. So in that sense, sequences and tuples aren't "wrapped".

@oli-obk
Collaborator

oli-obk commented Aug 22, 2017

> So in that sense, sequences and tuples aren't "wrapped".

Thanks! That made total sense.


fn serialize_element<T>(&mut self, elem: &T) -> Result<Self::Ok, Self::Error>
    where T: Serialize + ?Sized {
    if is_wrapped(elem) {

Collaborator

So with the new knowledge I have gained by your explanation, I am back with more questions:

Why would we even care about whether the elements are wrapped? Aren't all sequences not wrapped, irrespective of their elements?

Contributor Author

Say I were to serialize vec![1, 2, 3, 4]: you'd end up calling serialize_element() on each of the elements. Usually this would just forward to elem.serialize(&mut *self.parent) to serialize the element. However, if you do this on a sequence of "primitives" (i.e. anything which isn't "wrapped" in some sort of named tag), you end up getting 1234 out, which isn't what we want.

I could wrap any primitives in an arbitrarily named tag and get something like <value>1</value>...<value>4</value>, but then that "value" tag is completely arbitrary, not really changeable, and probably wouldn't interop well with other XML libraries from other languages.

My is_wrapped() check there aims to prevent this issue by rejecting sequences of bare primitives, so if you want to serialize them you need to explicitly wrap each element in a named newtype.
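To make the failure mode concrete, here is a small std-only sketch, assuming a toy element model rather than the PR's actual Seq type. Elem, write_elem, serialize_seq_naive, and serialize_seq_checked are hypothetical names used only for this illustration.

```rust
/// Hypothetical element model: either a bare primitive (already rendered as
/// text) or a value that carries its own named tag.
enum Elem {
    Bare(String),
    Tagged { name: &'static str, body: String },
}

fn write_elem(out: &mut String, elem: &Elem) {
    match elem {
        // A bare primitive contributes only its text, with no delimiter.
        Elem::Bare(text) => out.push_str(text),
        // A tagged value is delimited by <name>...</name>.
        Elem::Tagged { name, body } => {
            out.push_str(&format!("<{0}>{1}</{0}>", name, body));
        }
    }
}

/// Naive version: forwards every element unchanged, so bare primitives run
/// together in the output.
fn serialize_seq_naive(elems: &[Elem]) -> String {
    let mut out = String::new();
    for elem in elems {
        write_elem(&mut out, elem);
    }
    out
}

/// Checked version: refuses any sequence containing a bare primitive, which
/// is the behaviour the is_wrapped() guard is described as enforcing.
fn serialize_seq_checked(elems: &[Elem]) -> Result<String, String> {
    let mut out = String::new();
    for elem in elems {
        if let Elem::Bare(text) = elem {
            return Err(format!(
                "cannot serialize bare primitive `{}` inside a sequence",
                text
            ));
        }
        write_elem(&mut out, elem);
    }
    Ok(out)
}
```

Feeding four Bare elements "1" through "4" to the naive function yields the string 1234, while the checked function returns an error for the same input and accepts a sequence of Tagged elements.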

Collaborator

Ugh I remember. Yea I took the opposite way and made sure the sequence deserializer looked for repetitions of e.g. the field name. So multiple <foo></foo> would be what you'd get for a struct Bar { foo: Vec<String> }

Collaborator

I don't have time in the next few weeks to consider sequences all that much... We'll leave this open. Maybe @RReverser has some ideas.

@Michael-F-Bryan
Contributor Author

> I don't have time in the next weeks to consider sequences all that much.

@oli-obk, if it helps you I'll try to come up with tests which cover the more common edge cases I can think of. Working tests are probably going to make the code easier to understand and reason about than anything I can say anyway.


Also, I noticed the is_wrapped() helper could be used to resolve this TODO. After we're happy with sequences I'll make a PR to fix that too.

@oli-obk
Collaborator

oli-obk commented Aug 22, 2017

> if it helps you I'll try to come up with tests which cover the more common edge cases I can think of. Working tests are probably going to make the code easier to understand and reason about than anything I can say anyway.

I think we need to start even higher level than that. We need to figure out what kind of sequences we want to support in general and how they are supposed to be represented.

@Michael-F-Bryan
Contributor Author

> We need to figure out what kind of sequences we want to support in general and how they are supposed to be represented.

What did you have in mind? What about composing a list of things which we think are valid and invalid, then try to generalise that into a couple simple rules?

Here are some of the things I think we should support and how I'd represent them:

| Element Declaration           | Example Input        | Expected Output             |
|-------------------------------|----------------------|-----------------------------|
| struct Foo;                   | vec![Foo, Foo]       | <Foo></Foo><Foo></Foo>      |
| struct Foo{x: u32}            | vec![Foo{x: 5}]      | <Foo><x>5</x></Foo>         |
| struct Foo(u32);              | vec![Foo(1), Foo(2)] | <Foo>1</Foo><Foo>2</Foo>    |
| u32                           | vec![1, 2]           | Error                       |
| (u32, bool)                   | (5, true)            | Error                       |
| struct Foo; struct Bar(bool); | (Foo, Bar(false))    | <Foo></Foo><Bar>false</Bar> |
| &str                          | vec!["foo", "bar"]   | Error                       |

@Michael-F-Bryan
Contributor Author

Michael-F-Bryan commented Aug 22, 2017

I had a bit of a think and after some experimentation came up with a fairly small set of rules... How does this sound?

  • If something is serializable, it must be wrapped.
  • If it contains a sequence (tuples count as sequences for all intents and purposes), then each element in that sequence must be wrapped (preventing the vec![1, 2, 3] => 123 issue).
  • If you're serializing a struct (or struct variant), then you have a tag with the struct's name, containing its fields as a sequence of <$attr>$value</$attr>.
  • Tuple structs or variants can be thought of as tuples wrapped with a name. As such, the tuple rule about all elements being wrapped applies.
  • Primitives (e.g. numbers, bools, and strings) serialize roughly to their Display representation.

Being "wrapped" means you are of the form <$name>...</$name>. There's probably an official term for this but hopefully that'll suffice for now.
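One way to read the rules above is as a recursive writer over a small value tree. The sketch below is only an interpretation under stated assumptions: Value, is_wrapped, write_value, and to_xml are hypothetical names, newtype structs are modelled separately so that Foo(1) can print as <Foo>1</Foo>, and the "everything must be wrapped" rule is not enforced at the top level because the examples serialize a bare vec![Foo, Foo].

```rust
/// Hypothetical value tree used only for this sketch.
enum Value {
    /// Numbers, bools, strings: already rendered via their Display text.
    Prim(String),
    /// Named struct; a unit struct is a struct with no fields.
    Struct { name: &'static str, fields: Vec<(&'static str, Value)> },
    /// Named wrapper around a single inner value, e.g. `struct Foo(u32)`.
    Newtype { name: &'static str, inner: Box<Value> },
    /// Sequences and tuples are treated alike.
    Seq(Vec<Value>),
}

/// "Wrapped" means the value produces its own <$name>...</$name> tag.
fn is_wrapped(v: &Value) -> bool {
    matches!(v, Value::Struct { .. } | Value::Newtype { .. })
}

fn write_value(v: &Value, out: &mut String) -> Result<(), String> {
    match v {
        Value::Prim(text) => out.push_str(text),
        Value::Struct { name, fields } => {
            out.push_str(&format!("<{}>", name));
            for (field, value) in fields {
                // Each field becomes <$attr>$value</$attr>.
                out.push_str(&format!("<{}>", field));
                write_value(value, out)?;
                out.push_str(&format!("</{}>", field));
            }
            out.push_str(&format!("</{}>", name));
        }
        Value::Newtype { name, inner } => {
            out.push_str(&format!("<{}>", name));
            write_value(inner, out)?;
            out.push_str(&format!("</{}>", name));
        }
        Value::Seq(elems) => {
            for elem in elems {
                // Every element must be wrapped, otherwise adjacent
                // primitives would run together.
                if !is_wrapped(elem) {
                    return Err("sequence elements must be wrapped".to_string());
                }
                write_value(elem, out)?;
            }
        }
    }
    Ok(())
}

fn to_xml(v: &Value) -> Result<String, String> {
    let mut out = String::new();
    write_value(v, &mut out)?;
    Ok(out)
}
```

Under this reading, a sequence holding Foo{x: 5} produces <Foo><x>5</x></Foo>, and a sequence of bare integers is rejected with an error.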

@oli-obk
Collaborator

oli-obk commented Aug 24, 2017

I worry that our Serializer and Deserializer diverge. Can you try to make as many tests as possible roundtrip tests?

@Michael-F-Bryan
Contributor Author

I'm a bit concerned about that as well. I'm using serde-xml-rs for a project and was finding that the serializing and deserializing aren't symmetric.

Because XML is a little different to JSON and the other formats, is the way the Deserializer works documented anywhere? From what I can see there are loads of tests to verify the output the Serializer generates, but I don't really see any for the Deserializer side to help explain what it's doing so I can make sure the two are similar.

@albel727

Is there any progress on this? I would have expected serialize() to be actually able to handle what deserialize() produced, like Vec<Struct>.

It also appears that serialization makes things doubly wrapped, i.e. for field: Struct it will output <field><Struct>...</Struct></field>. That is not the same thing that Deserialize parses, namely just <field>...</field>. How is one supposed to write data wrapped in a single tag, or serialize the same thing as they deserialized?

I think at this point it should be settled if tags represent

  1. field names (In which case the name of the root tag should probably be accepted as a parameter to serializer, but can be defaulted to that of the top struct name of course), or
  2. type/struct names (less flexible, weird handling around even singular primitive fields), or
  3. both (deserializing most pre-existing documents becomes impossible).
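The three options can be compared side by side with a tiny formatter. This is only a sketch: TagSource and wrap_field are hypothetical names and not part of serde-xml-rs.

```rust
/// Which name ends up as the XML tag around a struct-typed field.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TagSource {
    /// Option 1: tags are field names.
    FieldName,
    /// Option 2: tags are type/struct names.
    TypeName,
    /// Option 3: both, which yields the doubly wrapped output.
    Both,
}

fn wrap_field(mode: TagSource, field: &str, type_name: &str, body: &str) -> String {
    match mode {
        TagSource::FieldName => format!("<{0}>{1}</{0}>", field, body),
        TagSource::TypeName => format!("<{0}>{1}</{0}>", type_name, body),
        TagSource::Both => format!("<{0}><{1}>{2}</{1}></{0}>", field, type_name, body),
    }
}
```

For a field named field of type Struct, FieldName gives <field>...</field> (what the deserializer is said to accept), while Both gives <field><Struct>...</Struct></field> (what the serializer is said to emit).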

One doesn't even have to choose only one behavior, since it could be different Serializers or an option to a single Serializer. I can't help but feel that there's some overthinking going on in an attempt to choose "the one true output format", that stalls the entire development.

I see no need for complicated checks, if they end with runtime errors and/or bulky <field><Struct></Struct></field> nesting and don't provide any additional flexibility. I think the solution with wrapping Vec<primitive> elements with <value> or <field_name> or <type_name> tags is perfectly valid and good enough to choose under the circumstances.

At least it doesn't lose separation of elements, so serialize(deserialize(x)) round-trip is identity by default, and everyone who tries to serialize a sequence of xml-irrepresentable values deserves no better than that.

But they also deserve better than being told "Unsupported Operation: serialize_seq" in a runtime panic, after they invested time in using serde-xml-rs for serialization and thought they had everything in order when things successfully compiled.

If they want to change the tag, then using a wrapping struct or, barring that, using a trivial [serde(deserialize_with)] function that internally uses a wrapping struct should be simple enough.

If that still is not deemed sufficient, a sequence customization callback/visitor provided by user to De/Serializer instance is a distinct possibility too.

E.g. something that takes an analogue of serde_ignored::Path to the currently de/serialized sequence, or is a cut-down visitor that gets {enter/exit}_{field/tuple/sequence/struct} events (so that it doesn't build a Path when it doesn't need it). Plus maybe element_type string and/or needs_wrapping boolean.

And then it decides if the sequence is <nested_tag>element</nested_tag>, or element<separator_tag/>element, or even elementSEPARATOR_TEXT_OR_ENTITY_POSSIBLY_EMPTYelement. The library needn't guarantee (lossless) de/serialization of the latter two; the user had it coming if they chose that (but they probably should be able to choose that).

Nested tag and empty separator support would be most useful, I think, and a simple de/serializer constructor should default to something sane, like nested <value> tag for primitive sequence values.
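The three sequence layouts described above could be modelled as a per-sequence option. The following is a sketch under assumptions: SeqStyle and write_seq are hypothetical names, elements are taken as already rendered text, and no escaping is performed.

```rust
/// Hypothetical per-sequence layout choice; not an existing serde-xml-rs API.
enum SeqStyle<'a> {
    /// <nested_tag>element</nested_tag> around every element.
    NestedTag(&'a str),
    /// element<separator_tag/>element
    SeparatorTag(&'a str),
    /// elementSEPARATORelement, where the separator may be empty.
    SeparatorText(&'a str),
}

fn write_seq(style: &SeqStyle, elems: &[&str]) -> String {
    match style {
        SeqStyle::NestedTag(tag) => elems
            .iter()
            .map(|e| format!("<{0}>{1}</{0}>", tag, e))
            .collect::<String>(),
        SeqStyle::SeparatorTag(tag) => {
            let sep = format!("<{}/>", tag);
            elems.join(sep.as_str())
        }
        // An empty separator reproduces the concatenated output, which
        // cannot be split back into elements when deserializing.
        SeqStyle::SeparatorText(sep) => elems.join(*sep),
    }
}
```

A default along the lines suggested here would be SeqStyle::NestedTag("value") for sequences of primitives, since only the nested-tag layout keeps element boundaries regardless of the element text.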

@flying-sheep

Hi! I’m also interested in this! Can we help?

@punkstarman
Collaborator

We need to revive discussions about this topic, and serialization in general.

I will start by reviewing the unit tests.

I too would like to strive for as much parity between ser and de, but we should probably focus on enabling the different formats that most users want to serialize to.

@flying-sheep

flying-sheep commented Jan 10, 2019

Maybe it becomes conceptually simpler if we plan with serde-like field attributes from the beginning.

I have a structure that’s sufficiently XML-like:

// Attribute types
#[derive(Serialize)]
struct NameToken(String);
//...


// Attribute containers
#[derive(Serialize, Default)]
struct CommonAttributes {
    #[serde(skip_serializing_if = "Option::is_none")]
    #[serde_xml_rs(attribute)]
    id: Option<NameToken>,
    #[serde(skip_serializing_if = "Vec::is_empty")]
    #[serde_xml_rs(attribute)]
    names: Vec<NameToken>,
    //...,
}
//...


// Pseudo elements
#[derive(Serialize)]
struct Document {
    children: Vec<Box<SubStructure>>,
}
#[derive(Serialize)]
struct TextNode { text: String }


// Element types
#[derive(Serialize)]
struct Section {
    #[serde(flatten)]
    common_attributes: CommonAttributes,
    children: Vec<Box<SubStructure>>,
}
#[derive(Serialize)]
struct Title {
    #[serde(flatten)]
    common_attributes: CommonAttributes,
    children: Vec<Box<Inline>>,
}
//...


// Element categories
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum SubStructure {
    Section(Section),
    Title(Title),
    //...
}
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum Inline {
    #[serde_xml_rs(text_node)]
    TextNode(TextNode),
    //...
}
//...

And I would like to be able to serialize it as it should be:

// Wrap element types `into` their element categories
let doc = Document::with_children(vec![
    Section::with_children(vec![
        // Creates title with ID and children
        Title::new("first_chapter", vec![
            "First Chapter".to_owned().into(),
        ]).into(),
    ]).into(),
]);

assert_eq!(
    serde_xml_rs::serialize(doc),
    "\
<document>
  <section>
    <title id=\"first_chapter\">First Chapter</title>
  </section>
</document>\
"
);

link to playground where I serialize this to JSON

@ebkalderon

I presume this PR can be safely closed since #173 has been merged?


7 participants