Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails to deserialize adjacent enum unit and struct variants #250

Open
chipsenkbeil opened this issue Jun 6, 2020 · 7 comments
Open

Fails to deserialize adjacent enum unit and struct variants #250

chipsenkbeil opened this issue Jun 6, 2020 · 7 comments

Comments

@chipsenkbeil
Copy link

chipsenkbeil commented Jun 6, 2020

Description

I believe that I have encountered some bugs when attempting to convert from msgpack back into a Rust enumeration when using adjacent tagging. This is not a problem for msgpack if I only do inner tagging.

I've attached a fully working, minimal example where the tests can highlight working and failing scenarios: msgpack-rust-bug.zip

I've also highlighted most relevant parts below including the version of serde and rmp-serde used as well as highlighting working adjacent serialization and deserialization with JSON.

Cargo.toml:

[dependencies]
serde = { version = "1.0.111", features = ["derive"] }
serde_json = { version = "1.0.48" }
rmp-serde = "0.14.3"

Example Setup

For code like this:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
#[serde(tag = "type", content = "payload")]
pub enum Example {
    Unit1,
    Unit2,
    HasValue { x: u32 },
    TupleWithValue(u32, u32),
    InnerValue(SomeInnerValue),
}

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
pub struct SomeInnerValue {
    pub a: u32,
    pub b: String,
}

Struct Variant Failure

let v = rmp_serde::to_vec(&Example::HasValue { x: 3 }).unwrap();

// Fails unwrap with Syntax("invalid type: sequence,
// expected struct variant Example::HasValue")
let ex: Example = rmp_serde::from_slice(&v).unwrap();
assert_eq!(Example::HasValue { x: 3 }, ex);

Unit Variant Failure

let v = rmp_serde::to_vec(&Example::Unit1).unwrap();

// Fails unwrap with Syntax("invalid length 1,
// expected adjacently tagged enum Example")
let ex: Example = rmp_serde::from_slice(&v).unwrap();
assert_eq!(Example::Unit1, ex);

Success with serde json

let json_text = "{\"type\":\"Unit1\"}";
let ex: Example = serde_json::from_str(json_text).unwrap();
let output = serde_json::to_string(&ex).unwrap();
assert_eq!(json_text, output);

let v = serde_json::to_vec(&Example::HasValue { x: 3 }).unwrap();
let ex: Example = serde_json::from_slice(&v).unwrap();
assert_eq!(Example::HasValue { x: 3 }, ex);

let v = serde_json::to_vec(&Example::TupleWithValue(1, 2)).unwrap();
let ex: Example = serde_json::from_slice(&v).unwrap();
assert_eq!(Example::TupleWithValue(1, 2), ex);
@kornelski
Copy link
Collaborator

Yes, this crate's handling of enums is not as robust as serde-json's. It would be great if you could have a look at the code and find where it's going wrong.

@chipsenkbeil
Copy link
Author

chipsenkbeil commented Jun 7, 2020

@kornelski, cloned the repo to add in the tests and poke around a little bit. So far, I've just been observing from the outside. It seems that the tagging switches communicating unit variants as single element fixmaps with a positive fixint key and nil value to a single-element fixarray with a fixstr representing the variant selected.

My guess is that with the standard version of enums, the type is included externally and thereby with msgpack not used at all. With inner tagging or adjacent tagging, the type is included inside and is being passed as this string version in an array. Given that there's nothing to indicate which form of tagging is being used, I'm assuming this would cause problems when trying to deserialize? Don't have an understanding of how the serde implementation works here yet.

[UPDATE] There is specific variations in serde based on the type of tagging used, defaulting to externally tagged configuration. It seems that a lot of the serialization references don't apply if not using external tagging. As seen here, adjacently tagged for a unit variant uses a struct serialization followed by a field serialization.

In the example below, it appears to invoke:

  • serialize_struct("Enum", 1)
  • serialize_str("V1")
  • serialize_struct("Enum", 1)
  • serialize_str("V2")

serde_cbor is also able to handle adjacent tagging correctly, although I believe this may only be for the unpacked case. You can serialize packed (where it converts variants and fields to integers) and unpacked (where fields/variants are kept as strings). Given that msgpack normally serializes variants as integer indexes, maybe this alternative encoding for adjacent (to try to include the type and content) is causing confusion in what is expected?


Without the tagging:

    #[derive(Serialize)]
    enum Enum {
        V1,
        V2,
    }

    let mut buf = Vec::new();
    Enum::V1.serialize(&mut Serializer::new(&mut buf)).unwrap();
    Enum::V2.serialize(&mut Serializer::new(&mut buf)).unwrap();

    // Expect: {0 => nil} {1 => nil}.
    assert_eq!(vec![0x81, 0x00, 0xC0, 0x81, 0x01, 0xC0], buf);

With the tagging:

    #[derive(Serialize)]
    #[serde(tag = "t", content = "c")]
    enum Enum {
        V1,
        V2,
    }

    let mut buf = Vec::new();
    Enum::V1.serialize(&mut Serializer::new(&mut buf)).unwrap();
    Enum::V2.serialize(&mut Serializer::new(&mut buf)).unwrap();

    // Expect: []
    // 0x91 == single element array
    // 0xA2 == fixed string of 2 characters
    //
    // 0x56 == V
    // 0x31 == 1
    // 0x32 == 2
    assert_eq!(vec![0x91, 0xA2, 0x56, 0x31, 0x91, 0xA2, 0x56, 0x32], buf);

@Newbytee
Copy link

Newbytee commented Jul 24, 2020

@chipsenkbeil Are you currently working on attempting to solve this?

@chipsenkbeil
Copy link
Author

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

@Newbytee
Copy link

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

Thanks for the swift reply. I might look into this in that case. Did you switch to a different serialisation format or a different serialiser? If the latter, which?

@chipsenkbeil
Copy link
Author

@Newbytee, no, after sharing my debugging, I switched to a different serialization due to time constraints.

Thanks for the swift reply. I might look into this in that case. Did you switch to a different serialisation format or a different serialiser? If the latter, which?

I switched to serde_cbor while still using the adjacent tagging as I was able to drop it in quickly and keep moving with my project. :)

@Kobzol
Copy link

Kobzol commented Jun 20, 2021

I looked into this and the problem seems to be rather in the interaction of this library with serde. The problem is that serde has several assumptions about deserializing adjacently tagged enums, and this library breaks them. I created an issue to see if this can be resolved in serde.

In the meantime, as a workaround you can try to use different serialization configs that encode structs as maps and not sequences. The easiest way how to achieve that is to use rmp_serde::to_vec_named instead of rmp_serde::to_vec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants