Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complex Model Serilization Key Is Duplicated Issue With Key Attirubute #1683

Closed
mcaninci opened this issue Sep 22, 2023 · 7 comments
Closed
Labels

Comments

@mcaninci
Copy link

mcaninci commented Sep 22, 2023

Bug description

I have the complex model and I added key attributes for all model properties. But the Baseclass has ID property and all models got inheritance from it. Also, TestMainModel has other sub-models inherited from BaseClass.

As far as I understand all models got an inheritance from the base class that same Key attribute for the Id property. That's why when I tried to serialize the model, I got the Key Is Duplicated Issue.

image

key is duplicated, all members key must be unique. type: ModelBase member:Id

I tried to solve the issue with CustomResolver.But it didn't work for it.

Repro steps

[Serializable]
    [MessagePackObject]
    public class BaseModel : BaseModel<int>
    {

    }

    [Serializable]
    [MessagePackObject]
    public class BaseModel<T> : BaseModelCore
    {
        [Key(99)]
        private T _Id { get; set; }
        [Key(1)]
        public T Id { get { return _Id; } set { this._Id = value; base.PrimaryKey = value; } }
    }

    [Serializable]
    [MessagePackObject]
    public class BaseModelCore
    {

        [Key(98)]
        public object PrimaryKey { get; internal set; }
        [Key(0)]

        public int? InsertUserId { get; set; }

    }

    [MessagePackObject]
    public class TestMainModel : BaseModel
    {
        [Key(0)]
        public string Name { get; set; }
        [Key(1)]
        public int Age { get; set; }
        [Key(2)]
        public TestSub1Model TestSub1Model { get; set; }
        [Key(3)]
        public TestSub2Model TestSub2Model { get; set; }
        
    }

    [MessagePackObject]
    public class TestSub1Model : BaseModel
    {
        [Key(0)]
        public string Name { get; set; }
        [Key(1)]
        public int Age { get; set; }

    }

    [MessagePackObject]
    public class TestSub2Model : BaseModel
    {
        [Key(0)]
        public string Name { get; set; }
        [Key(1)]
        public int Age { get; set; }

    }

    public static void Main(string[] args)
    {
    
        TestMainModel testMainModel = new TestMainModel { Name = "John", Age = 30 };
        testMainModel.Id = 2;
        testMainModel.InsertUserId = 5;
        testMainModel.TestSub1Model=new TestSub1Model { Name = "John", Age = 30,Id=3,InsertUserId=6 };
        testMainModel.TestSub2Model = new TestSub2Model { Name = "John", Age = 30, Id = 8, InsertUserId = 9 };

        byte[] serializedData = null;
        try
        {
          

            serializedData = MessagePackSerializer.Serialize(testMainModel);
           
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
   }

Expected behavior

When I set the Key attribute for base class properties, The classes inherited from BaseModel can be used on the same model and don't get any duplicated Id issues.

So If I can define the Key attributes like that below, In my opinion, the messagepack doesn't throw exceptions.

    [MessagePackObject]
    public class BaseModel<T> : BaseModelCore
    {
        [Key(typeof(T).toString()+randomNum)]
        private T _Id { get; set; }
     [Key(typeof(T).toString()+randomNum)]
        public T Id { get { return _Id; } set { this._Id = value; base.PrimaryKey = value; } }
    }
  • Version used:2.1.80
  • Runtime: .Net5 and netstandard2.1
@AArnott
Copy link
Collaborator

AArnott commented Sep 22, 2023

Use of the [Key] attribute with integer arguments mean that that particular field or property will be serialized into that exact slot in an array. You cannot serialize two values into the same array index.
You must therefore not use the same indexes in the derived classes that the base classes use.

Also consider:

  • Don't serialize both a field and its property accessor, as that wastes space and time by serializing the same value twice.
  • Don't skip ahead to index 99 in your base class to avoid conflicts with members defined in derived classes, because that wil guarantee that the written msgpack will have an array that is 99 elements long, even if nearly all of them are empty.

@mcaninci
Copy link
Author

I appreciate for fast reply and good tricks. I got it now.
I am a new guy for the message pack and want to migrate all models to the message pack from JSON and binary serialize.

I have almost 4000 models in my project and I don't want to add the key attribute for all complex models. I think it's a risk for key maching.

So, are there alternative solutions to serialize and deserialize complex models with the base model or without the Key property in the message pack ?

@AArnott
Copy link
Collaborator

AArnott commented Sep 22, 2023

You can use [MessagePackObject(true)] on the class itself and I think that will avoid the requirement for [Key] attributes on every field/property. It will serialize maps of key=value pairs instead of just an array of values, so it'll be a bit slower and the serialized form larger. But it's far easier to maintain and evolve over time.

You could also try the "contractless" approach (search README for contractless), which can serialize classes with no attributes at all. But in my experience it's not particularly good. If it happens to do exactly what you need, then that's great.
If you don't want to attribute every class, and the built-in contractless resolver doesn't do the job for you, then writing your own custom formatter for each model, or create your own dynamic resolver as the final fallback. But a dynamic resolver is intense work, especially if you want it to have high performance.

@mcaninci
Copy link
Author

You could also try the "contractless" approach (search README for contractless), which can serialize classes with no attributes at all.

I tried KeyAttirubute (with the use of duplicated key 99 like example code),[MessagePackObject(true)], contractless. Also, I disabled the double attribute and deleted the serializable attribute at your suggestion.

Considering all these changes, I have benchmarks with the same data for different situations.I want to share my observation with you.

  1. Double attributes don't affect much the messagepack serialization. such as serializable and messagepackobject

2.I agree with you. I can use [MessagePackObject(true)] that to ensure a manageable model structure. Also, it didn't affect operation time more than binaryformatter.

3.I noticed that for the 10 megabyte sample data generated recursively, the messagepack spends more operation time than the binary formatter for both [Key] and [MessagePackObject(true)].

According to the benchmark data, what do you think about that? Has the messagepack chunk feature or other options for big data ?

What would be your suggestion? I appreciate your support and experience

KeyAttirubute Benchmark

Model Data
################# Data Size 10 megabyte#################
BinaryFormatter ObjectToByte Süresi: 1078 ms
BinaryFormatter ByteToObject Süresi: 1091 ms
MessagePackSerializer ObjectToByte Süresi: 1402 ms
MessagePackSerializer ByteToObject Süresi: 2262 ms
Dictionary Data
################# Data Size 10 megabyte#################
BinaryFormatter ObjectToByte Süresi: 726 ms
BinaryFormatter ByteToObject Süresi: 688 ms
MessagePackSerializer ObjectToByte Süresi: 1033 ms
MessagePackSerializer ByteToObject Süresi: 4354 ms

Model Data
################# Data Size 4 megabyte#################
BinaryFormatter ObjectToByte Süresi: 505 ms
BinaryFormatter ByteToObject Süresi: 244 ms
MessagePackSerializer ObjectToByte Süresi: 59 ms
MessagePackSerializer ByteToObject Süresi: 63 ms
Dictionary Data
################# Data Size 4 megabyte#################
BinaryFormatter ObjectToByte Süresi: 624 ms
BinaryFormatter ByteToObject Süresi: 292 ms
MessagePackSerializer ObjectToByte Süresi: 23 ms
MessagePackSerializer ByteToObject Süresi: 68 ms

[MessagePackObject(true)] Benchmark

Model Data
################# Data Size 10 megabyte#################
BinaryFormatter ObjectToByte Süresi: 1133 ms
BinaryFormatter ByteToObject Süresi: 927 ms
MessagePackSerializer ObjectToByte Süresi: 3251 ms
MessagePackSerializer ByteToObject Süresi: 3364 ms
Dictionary Data
################# Data Size 10 megabyte#################
BinaryFormatter ObjectToByte Süresi: 723 ms
BinaryFormatter ByteToObject Süresi: 1087 ms
MessagePackSerializer ObjectToByte Süresi: 2816 ms
MessagePackSerializer ByteToObject Süresi: 10717 ms

Model Data
################# Data Size 4 megabyte#################
BinaryFormatter ObjectToByte Süresi: 509 ms
BinaryFormatter ByteToObject Süresi: 310 ms
MessagePackSerializer ObjectToByte Süresi: 161 ms
MessagePackSerializer ByteToObject Süresi: 108 ms
Dictionary Data
################# Data Size 4 megabyte#################
BinaryFormatter ObjectToByte Süresi: 923 ms
BinaryFormatter ByteToObject Süresi: 283 ms
MessagePackSerializer ObjectToByte Süresi: 300 ms
MessagePackSerializer ByteToObject Süresi: 209 ms

@AArnott
Copy link
Collaborator

AArnott commented Sep 25, 2023

The binary formatter mostly just copies memory, whereas msgpack prescribes a variety of formats that are interoperable, so it's understandable that msgpack may be slower than binary formatter. If you want raw speed (something more akin to BinaryFormatter), you may want to check out https://github.com/Cysharp/MemoryPack.

@mcaninci
Copy link
Author

mcaninci commented Oct 4, 2023

Thanks for your all support. I checked and tested the memory pack and it's faster than the binary formatter. But as far as I understood it doesn't support some types such as dynamic objects etc and I couldn't run memorypack with .net 5. Also, The memory pack implementation is harder than the message pack for me. Therefore I going to continue with the memory pack.

I have last one question about the message pack. I noticed that the message pack's first running time was longer than the other running times. I have working time with the same data and shared it below.

4 MB
###First Time###
MessagePack Serialize operation time:565.2917 ms
MessagePack Deserialize operation time:423.4199 ms
###Second Time###
MessagePack Serialize operation time:152.7772 ms
MessagePack Deserialize operation time:141.5665 ms
###Third Time###
MessagePack Serialize operation time:52.6492 ms
MessagePack Deserialize operation time:56.6608 ms
###Fourth Time###
MessagePack Serialize operation time:50.3348 ms
MessagePack Deserialize operation time:71.215 ms

What is the reason for the differences in the operation times in the messagepack?
Is the memory package caching some configurations? model mapping or model array structure, etc. like. If so, is there a way to preload them and make the initial and other runtimes stable?

I cache some data and these data cannot be serialized again after the first run, which causes me to lose the advantage of messagepack.

@AArnott
Copy link
Collaborator

AArnott commented Oct 4, 2023

Two things make it slower the first time:

  1. Dynamically generating formatters based on attributes.
  2. JIT compiling the custom formatters.

Most of the time typically goes to the first of these. You can avoid much of the first time hit by generating formatters ahead of time. Check out our README's AOT sections for how to accomplish this.

@AArnott AArnott closed this as not planned Won't fix, can't repro, duplicate, stale Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants