-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Help deserialize mixed tags and string in body $value (html text formatting) #147
Comments
Has this been addressed? And if so, can it be documented somewhere? Being able to quickly process something like say XHTML would be very helpful. |
I think I might have encountered a similar, but different bug. I'm struggling to create a concise test case replicating the error.
It seems to be a bad interaction between At a high level: #[derive(Debug, Serialize, Deserialize, PartialEq)]
enum Requests {
RequestA(RequestA),
RequestB(RequestB),
/* ... */
RequestN(RequestN),
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct RequestA {
ID: u64,
DataAContent: DataAContent,
}
/* elided other struct RequestXX definitions */
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct DataAContent {
Field1: String,
Field2: u64,
Field3: Email,
Line: Line, // <--- This is the line that may occur multiple times in the XML
}
#[derive(Debug, Serialize, Deserialize, PartialEq)]
Struct Line {
Name: String,
Data: String,
} and my change: Field1: String,
Field2: u64,
Field3: Email,
- Line: Line,
+ #[serde(rename = "$value")]
+ lines: Vec<Line>,
}
Struct Line { Without the change, my unit tests pass successfully. With the change, I get the error above. Am I doing something stupid? Note that I also update the Here's some XML that should match the above: <RequestA ID=1>
<DataAContent>
<Field1>Some text</Field1>
<Field2>1000000</Field2>
<Field3>me@example.com</Field3>
<Line>
<Name>Adjust</Name>
<Data>This ia a quite long string of text</Data>
</Line>
</DataAContent>
</RequestA>
<RequestB ID=2>
<DataBContent>
<!------- elided ------>
</DataBContent>
</RequestB>
<!------ elided -------->
<RequestN ID=72> If I had to guess, It seems like the cursor isn't getting set properly when it encounters only a single item Another Note: I've added test cases within serde-xml-rs to exercise |
Ok, I was able to reproduce the error with a unit test inside of https://github.com/jac-cbi/serde-xml-rs/tree/fix/enum_vec_single_element @punkstarman pls take a look |
If you put the |
Ok, making progress on understanding how this works. My current theory is that data structures containing It seems that the child node inherits state from the parent node src/de/mod.rs:147 impl<'de, R: Read, B: BufferedXmlReader<R>> Deserializer<R, B> {
fn child<'a>(&'a mut self) -> Deserializer<R, ChildXmlBuffer<'a, R>> {
let Deserializer {
buffered_reader,
depth,
is_map_value,
non_contiguous_seq_elements,
..
} = self;
Deserializer {
buffered_reader: buffered_reader.child_buffer(),
depth: *depth,
is_map_value: *is_map_value, // <------ here
non_contiguous_seq_elements: *non_contiguous_seq_elements,
marker: PhantomData,
}
} But then descending into the next child node consumes the state src/de/mod:194 /// If `self.is_map_value`: Performs the read operations specified by `f` on the inner content of an XML element.
/// `f` is expected to consume the entire inner contents of the element. The cursor will be moved to the end of the
/// element.
/// If `!self.is_map_value`: `f` will be performed without additional checks/advances for an outer XML element.
fn read_inner_value<V: de::Visitor<'de>, T, F: FnOnce(&mut Self) -> Result<T>>(
&mut self,
f: F,
) -> Result<T> {
if self.unset_map_value() { // <-------------- Here
debug_expect!(self.next(), Ok(XmlEvent::StartElement { name, .. }) => {
let result = f(self)?;
self.expect_end_element(name)?;
Ok(result)
})
} else {
f(self)
}
} Here's pub fn unset_map_value(&mut self) -> bool {
::std::mem::replace(&mut self.is_map_value, false)
} @punkstarman , am I on the right track? Shouldn't the state have a |
I've added some more unit tests attempting to isolate the bug. It seems that |
I reworked my code to not use The core problem is that nested For now, I'm just going to work around it. If I encounter it again, I'll look into a more proper fix. |
Thanks @jac-cbi for spending so much time and sharing your thoughts and learning. I've been busy with prep for a conference. I'll try to look into this more this weekend. |
So I'm not good at GitHub. But it appears this has diverged significantly from the original issue. I'd like to keep this convo intact, but it should really be a separate issue. If you know how to fix that, pls do so and add me to the new issue. As for the bug itself, I'm def still interested in a fix, as my outer |
I'm trying to deserialize some dictionary defitnitions and came across this one which contains mixed multiple tags with normal string (html text formatting).
I looked around in serde-xml-rs tests and tried this solution which seems to be close but it doesn't quite work
The error I'm getting is:
I can make it work for now by not using
MyEnum
and just usedefinition: Vec<String>
, but then I wouldn't know which text is bold and which is italic.How can I properly deserialize this?
The text was updated successfully, but these errors were encountered: