Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New persistence strategy for better cross-version compatibility #1502

Open
9 tasks done
Elesbaan70 opened this issue Mar 30, 2022 · 58 comments · May be fixed by #1559
Open
9 tasks done

New persistence strategy for better cross-version compatibility #1502

Elesbaan70 opened this issue Mar 30, 2022 · 58 comments · May be fixed by #1559
Assignees
Labels
feature A new distinct feature lifecycle related to the loading process like enabling/disabling/loading/reloading/load order/hot reload serialization load/store data in memory or on disk triage Awaiting issue categorisation

Comments

@Elesbaan70
Copy link
Contributor

Elesbaan70 commented Mar 30, 2022

Note - This issue originally called for the use of Json, which was determined be unfeasible due to the lack of a good full-featured Json serializer that is compatible with Unity.

Describe your idea

In TMPE's current persistence strategy, the introduction of a new type into the persistent data will break backward compatibility. Although old versions are not supported, players reverting from alpha to stable should not lose savegame data needlessly.

The same issue also affects forward compatibility in ways that restrict future development choices. Ideally, the save data would not be bound to specific types. A move away from strongly typed serialization would give everyone more flexibility moving forward.

Requirements

  1. This is an opt-in change. All existing serialization will remain untouched until there is a reason to do it the new way, as determined feature by feature.
  2. A manager-based persistence mechanism in which SerializableDataExtension enumerates a collection of managers and calls LoadData() and SaveData() methods basically like the ones we have today.
  3. SerializableDataExtension will save the data from each manager separately, to isolate them and minimize the risks associated with changes to persistent data.
  4. A manager will have a means of identifying other managers upon which its data depends. This will determine the enumeration order during load and save.
  5. Linq to XML will be used for persistence.

TrafficLightSimulationManager will be the first converted to the new persistence strategy, since displaced lane support was the breaking change that prompted this issue. Its conversion to the new persistence strategy will be delivered separately from and prior to displaced lane support.

Tasks

  • Persistence framework
  • Add a DOM to the serialized data and make a mechanism for opt-in transition away from the old serialization strategy (SerializableDataExtension)
  • Update timed traffic lights to be saved into the DOM (half-baked persistable model)
  • Update junction restrictions to be saved into the DOM (fully persistable model)
  • Compress XML data, with fallback to uncompressed if it fails (especially on some Linux distros)
  • Test versioning in a relevant feature branch such as https://github.com/Elesbaan70/TMPE/tree/lane-grouping
  • Final code cleanup
  • Wiki articles for future development
  • Mark New persistence strategy #1559 ready for review
@Elesbaan70 Elesbaan70 added feature A new distinct feature triage Awaiting issue categorisation labels Mar 30, 2022
@originalfoo originalfoo added the Settings Road config, mod options, config xml label Mar 30, 2022
@kianzarrin
Copy link
Collaborator

is json better than xml?

@kianzarrin
Copy link
Collaborator

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

@Elesbaan70
Copy link
Contributor Author

Elesbaan70 commented Apr 2, 2022

is json better than xml?

Json's simplicity tends to mean less setup and tweaking to get the results you need, which I think makes it better for small applications like TMPE. For large and complex data models I tend to lean toward XML, but in smaller applications like this, XML done right could end up being one of the most complex parts of the code - perhaps an overstatement, but it could take on complexity that seems out of proportion to the amount of data we're working with.

I also prefer XML for files intended to be manually edited. Json can be manually edited, but in my opinion it is a slightly less friendly format for that.

(If you're wondering, I think XML is a better fit for AN, both because your long-term plans involve much more complex data, and because it's intended to be manually editable.)

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

There's actually a binary version of Json, unsurprisingly called Bson, and the version of mono is a non-issue as far as that goes. But text serialization is easier to troubleshoot, and I don't think we're talking about enough data for file size to be a concern.

@Elesbaan70
Copy link
Contributor Author

Elesbaan70 commented Apr 2, 2022

is json better than xml?

Also, System.Xml.Serialization has a lot of weird limitations, like you can't use it to serialize dictionaries. So we'd have to either live with the limitations, or else find a third-party XML serializer. Newtonsoft.Json is technically third-party, but it's a very pervasive standard, especially on older .NET/Mono versions.

@kianzarrin
Copy link
Collaborator

Json sounds great.
We already use XML in TMPE to store global config. i hope using several different standards does not cause confusion.

@kianzarrin
Copy link
Collaborator

in AN ppl manually modify pops so I think XML is good

@krzychu124
Copy link
Member

XML allows saving more complex data structures which will be way more readable for the user if he wants to edit something.
The main problem with the current implementation.
I'll remind that we are talking about serialization of all TM:PE data in the savegame not just simple configs. The problem now is we store everything into one single binary which cannot be deserialized if you don't have fully compatible classes. Just like @Elesbaan70 mentioned, literally any change in the data model makes it unloadable when you try to use previous version of the mod.
Example: when you add boolean field to one of serialized classes (you don't need to even use it anywhere) the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

Having a JSON or even XML will make it at least editable -> you can dump the binary data into (JSON/XML) string and modify it at worst case or prepare small migration function which could automatically fix/transform things in case we make a mistake somewhere.
With binary you can't even see what's inside. Also one bonus of JSON/XML or any string based serialization -> you can easily write data migration code and most importantly unit/integration tests for that (some time ago I tried loading TMPE binary data and deserialize in unit test - impossible).

We are limited with size of binary array which can be saved by SerializableDataExtension (~16MB), so splitting data by managers will solve that problem.

Anyways I asked @Elesbaan70 about other ways for serialization since I also want to introduce changes to TMPE savegame data.
I need to store "airplane size restrictions" somehow, so just an addition to what we have without changing anything else. When I push that change to my draft PR it will break compatibility because non of previous version will be able to load data once saved with new version. Since it would be additional thing I don't see why it couldn't be completely skipped when loading while running previous version of the mod (data itself has no use if there is no code which could interpret it and do something later).

I hope migration out of binary serialization will solve that problem at least partially. Obviously the current (de)serialization code needs to stay where it is to allow loading data from older savegames.

@Elesbaan70
Copy link
Contributor Author

We already use XML in TMPE to store global config.

Completely different because it's a standalone file and theoretically editable. Although in practice it's not a good implementation for manual editing--you can't even tell which switches are which, except by counting from the top.

XML should be considered for any future standalone TMPE files.

@Elesbaan70
Copy link
Contributor Author

the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

System.Runtime.Serialization's basic serialization is strongly typed and wants to deserialize and figure out what to do with every element, and so any change breaks backward compatibility. This can probably be overcome with custom serialization, but there's no good reason to make it so complicated when alternatives are available.

@Elesbaan70
Copy link
Contributor Author

@aubergine10 The "Settings" label is incorrect for this. It should be labeled "serialization" and "lifecycle". I would fix it, but I don't have any permissions here. :-)

@originalfoo originalfoo added serialization load/store data in memory or on disk lifecycle related to the loading process like enabling/disabling/loading/reloading/load order/hot reload and removed Settings Road config, mod options, config xml labels Apr 2, 2022
@krzychu124
Copy link
Member

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

Yes I know, but I meant BInaryFormatter we currently use for (de)serialization. I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

@kianzarrin
Copy link
Collaborator

you can't even tell which switches are which, except by counting from the top.

tell me about it! I do that all the time! I was thinking about overriding the Serializer but I think its easier to just add something in the maintenance tab.

@Elesbaan70
Copy link
Contributor Author

I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

As I'm currently implementing it (a bit different from the demonstration code I showed you before), manager data is saved in a Dictionary<string, string> where the key is the full name of the saved data type (not of the manager type), and the value is the Json-serialized data. (The key is not strictly type-bound, it's only a naming convention.) I chose text serialization for easier troubleshooting.

The dictionary itself is also serialized as Json, which is converted to a byte array using StreamWriter. It could be serialized as Bson instead, but there wouldn't be much savings since the data is text anyway, and staying with Json saves us from having to bring in a second external dependency for the Bson library.

@Elesbaan70
Copy link
Contributor Author

After all this discussion, it turns out that we will have to go with XML anyway. The Unity port of Newtonsoft.Json doesn't really work, and it hasn't been maintained in several years. *sigh*

@kianzarrin
Copy link
Collaborator

kianzarrin commented Apr 3, 2022

Lets use a port of XML that is less problematic. I suspect this old version of mono does not have very good XML. I see macsurgey's mods have provided their own System.Xml.Linq.dll

@Elesbaan70
Copy link
Contributor Author

Linq to XML is a DOM, not a serializer. I haven't look at it in a while, so I'll take a fresh look, but working with a DOM basically means you're assembling and interpreting the XML elements manually, so it ends up being a lot of work.

Maybe I'm just being naïve, but I'm not very worried about Mono's System.Xml.Serialization. It's just that it's a pain to work with, but still probably better than doing it all ourselves.

@kianzarrin
Copy link
Collaborator

It's just that it's a pain to work with

If I recall properly lists are a pain to work with.

but still probably better than doing it all ourselves.

doing it ourselves? why not to just use 3rd party library?

@Elesbaan70
Copy link
Contributor Author

If I recall properly lists are a pain to work with.

I guess I don't have context for that, because I'm not sure what you're referring to.

doing it ourselves? why not to just use 3rd party library?

I meant "doing it ourselves" in the sense of processing all the elements manually, like MacSergey is apparently doing since he uses Linq to XML.

If you know of third-party serialization that will work under Unity and has a solid track record and active development, I'm all for it. But the Unity port of Newtonsoft.Json being nonfunctional and abandoned is an example of where this can eventually lead if we're not careful.

@kianzarrin
Copy link
Collaborator

I don't know I have to do some tests. i think I had to convert list to array for it to work.

@kianzarrin
Copy link
Collaborator

I was struggling with xml here: https://github.com/Quboid/CS-MoveIt/blob/da6d67d770ff013162f931f01b3e88949ee7b3cc/MoveIt/Moveable/Instance.cs#L120

I don't know if there was an issue with dictionary or list or both but I had to use array.

@Elesbaan70
Copy link
Contributor Author

Yes, that's one of the annoying limitations I mentioned in System.Xml.Serialization is that dictionaries are not supported. Another is that it's hard to make enums backward-compatible.

After sleeping on it, I'm starting to warm up to the idea of using Linq to XML like MacSergey appears to be doing (I'm assuming this based on the presence of System.Xml.Linq.dll.) I'm going to try going down that path for a little bit, and see how it plays out. (Since TMPE already converts to/from separate objects for the serializable model, and I was continuing that practice, I'm not sure that processing the elements directly is really that much more work.)

@Elesbaan70 Elesbaan70 changed the title Untyped serialization for better cross-version compatibility New persistence strategy for better cross-version compatibility Apr 8, 2022
@Elesbaan70
Copy link
Contributor Author

If anyone wants to follow the progress in code, it is here: https://github.com/Elesbaan70/TMPE/tree/xml-persistence

@Elesbaan70
Copy link
Contributor Author

Yup, but do we want to require a manual install of Mono? I think Linux users would generally be more comfortable with this kind of requirement than on other platforms, but it still seems less than ideal.

@kianzarrin
Copy link
Collaborator

whats the point of using xml if we are going to compress it? its no longer human readable.

@Elesbaan70
Copy link
Contributor Author

Elesbaan70 commented May 8, 2022

  1. The primary purposes of this project are a persistence strategy less prone to breaking when enhanced, and smoother versioning when a breaking change is necessary. To the extent that there is human readability, it is an added bonus.
  2. It is human-readable in logging
  3. The compression can be disabled for debugging
  4. Human readability in the final savegame is only theoretical anyway, since it's a binary format. I mean, you can see the XML in there, but it gets weird reading it that way.
  5. XML and other text-based formats are easier to develop and maintain because they represent data as a hierarchy of key-value pairs.* It's more than just human readability.

* XML does more than that, but that remains the essence of what XML serialization is doing.

@kianzarrin
Copy link
Collaborator

@Elesbaan70 we don't need to compress the global config though.

@Elesbaan70
Copy link
Contributor Author

I haven't even touched the global config yet.

@Elesbaan70
Copy link
Contributor Author

I mean, we're talking about global config, so it's not part of this anyway, right?

@kianzarrin
Copy link
Collaborator

yeah

@krzychu124
Copy link
Member

@Elesbaan70 we don't need to compress the global config though.

I think you didn't notice why compression might be required. The reason is simple. We still save data as byte[] in the savegame and there is a limit (~16MB IIRC) for a value.
Global config is stored on disk and has known size, there is no dynamic data which can vary between versions unlike TM:PE data where almost all data is stored as collections (mostly limited to things we changed skipping default values to save memory)

@Elesbaan70
Copy link
Contributor Author

JunctionRestrictionsManager has been updated to use the new persistence framework. @krzychu124, I think you will find this to be a much simpler and more straightforward case than traffic lights! 😅

https://github.com/Elesbaan70/TMPE/blob/xml-persistence/TLM/TLM/Manager/Impl/JunctionRestrictionsManager.Persistence.cs

It uses a super-lightweight serializer I built on top of Linq to XML, that lets you serialize stable types but doesn't assume you've built a serializable model of your entire dataset. It's not intended to do that kind of heavy lifting, so it's stupid-simple and fits well in how Linq to XML does things.

@Elesbaan70
Copy link
Contributor Author

Keeping the compression is definitely the right move. Looks like for 2 timed traffic lights and junction restrictions on 4 nodes, compression saves us about 8k. That's enough to make one's head spin.

image

@krzychu124
Copy link
Member

@Elesbaan70 in case you don't know, the problem with compression on Linux won't just fail or throw an exception...
It will crash the game if you try to use it and there is no mono (required for some distros to work properly). 😬

With regards to size, I think we will/could follow the idea of splitting "managers" into separate containers, so theoretically the limit of 16MB would be per manager, not whole data set.

Anyways code for Junction Restriction is much more readable and understandable for me, compared to ugly from the Traffic Lights 😅

@Elesbaan70
Copy link
Contributor Author

compression on Linux won't just fail or throw an exception... It will crash the game if you try to use it and there is no mono (required for some distros to work properly). 😬

Yikes. So....

  1. How do I detect whether mono is installed?
  2. How do I detect whether it is a distro for which mono is required?

I think we will/could follow the idea of splitting "managers" into separate containers

Okay, that's an easy enough change.

Anyways code for Junction Restriction is much more readable and understandable for me, compared to ugly from the Traffic Lights 😅

Yes! And yet, I suspect you will find it easiest to use the half-baked model approach I used in TTL. A full model like I created in the junction restrictions might require too much change to the manager's implementation. I chose Junction Restrictions for the full model example because I identified it as the easiest place to make a change of that kind.

When I say "half-baked model," I mean that TTL implements a set of interfaces for its model, and the save code writes XML from those interfaces. But from the TTL implementation's point of view, that is a one-way trip. You can't hand a model to the TTL to set up traffic lights; the load code has to convert the incoming XML data into method calls that TTL understands for setting up the traffic lights.

@Elesbaan70
Copy link
Contributor Author

I think we will/could follow the idea of splitting "managers" into separate containers

This change is now complete.

@kianzarrin
Copy link
Collaborator

@Elesbaan70 what does that mean? did you move lane arrow flags inside lane arrow manager?

@kianzarrin
Copy link
Collaborator

to detect if you are in linux you can check UnityEngine.Application.platform == LinuxPlayer

@Elesbaan70
Copy link
Contributor Author

@Elesbaan70 what does that mean? did you move lane arrow flags inside lane arrow manager?

Huh? This issue is about how we store things in a savegame.

to detect if you are in linux you can check UnityEngine.Application.platform == LinuxPlayer

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

I will not knowingly merge code that causes someone's game to crash. A workaround that can be implemented in code is absolutely necessary.

@kianzarrin
Copy link
Collaborator

@Elesbaan70

I think we will/could follow the idea of splitting "managers" into separate containers
This change is now complete.

what change is complete? what does it mean to separate containers?

@kianzarrin
Copy link
Collaborator

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

you can look at original harmonys code. it has checks for mono version and stuff. which version of mono crashes? which version of mono works? are we talking bout dnSpy mono?

@Elesbaan70
Copy link
Contributor Author

Elesbaan70 commented May 21, 2022

@Elesbaan70

I think we will/could follow the idea of splitting "managers" into separate containers
This change is now complete.

what change is complete? what does it mean to separate containers?

The change history is on the draft PR associated with this case.

The code for this issue isolates the logic for each manager in such a way that failing to load one manager's data does not prevent the others from loading. (That's one of the problems with the current code.) From my point of view, this was just a logic problem, but @krzychu124 understood it to mean that I was actually putting each manager's data in a separate container under SerializableData. But that was not initially the case. They were just separate elements in a single XML document.

But then @krzychu124 pointed out that it may be useful to use separate containers, since the container has a size limit. So now I've changed it to work the way he expected.

@Elesbaan70
Copy link
Contributor Author

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

you can look at original harmonys code. it has checks for mono version and stuff. which version of mono crashes? which version of mono works? are we talking bout dnSpy mono?

On some Linux distros, the Mono bundled with the game does a hard crash if you try to use compression. Installing Mono independently of the game resolves this.

@krzychu124
Copy link
Member

Exactly, more info about crashing mono and solution you can find here: MacSergey/NodeMarkup#96 (already mentioned in this thread)
Honestly I have no idea how to detect if it's installed or not. It's not C# library but external one (let's name it: "native").

Since we have limited size to be stored, @Elesbaan70 have you measure worst case scenario, how much data we really need to store - assuming we use container per manager? We usually use arrays of known size so we could calculate if generated xml will fit without compression or not

@kianzarrin
Copy link
Collaborator

we can ask in the harmony discord about this. they probably know something.

@Elesbaan70
Copy link
Contributor Author

Since we have limited size to be stored, @Elesbaan70 have you measure worst case scenario, how much data we really need to store - assuming we use container per manager?

Two-step light on a 4-way node without complex lane groupings: 2309 bytes, or about 7,000 total stoplights of this configuration.

The more I think about this, the more I realize it's all or nothing. Otherwise we end up with savegames that some people can't read.

Here's a possible ugly way of addressing the issue:

  1. Immediately before doing a compression operation, write something into the global options that says we're doing compression stuff.
  2. Clear it in a finally block.
  3. Check this flag at startup, and if it is set, explain to the user what happened and provide a link to the installer they need.

@Elesbaan70
Copy link
Contributor Author

This code is basically ready to go. But the Linux compression issue needs to be worked out.

@Elesbaan70
Copy link
Contributor Author

I am reluctant to rely on an external library for a core feature. Also, a third-party compression library will be managed code, and so it will be slower, and compression is the kind of thing where that matters.

So, what I'm looking at for the compression issue is:

  1. Run some tests to verify zlib's compatibility with .net's built-in compression.
  2. Add an option that let's you pick native (.net) or managed (zlib) compression, with managed being the default on Linux, and native the default on all other platforms.
  3. Test that both work and are fully compatible with each other.
  4. Have someone verify that zilb works on Linux.

@kianzarrin
Copy link
Collaborator

its not like we have a better option.

@Elesbaan70
Copy link
Contributor Author

Elesbaan70 commented May 31, 2022

@krzychu124 finally got around to mentioning that there's a compression library distributed with CS. Wish I'd known that earlier! LOL

@Elesbaan70
Copy link
Contributor Author

Here is the wiki page. I'll improve this based on feedback.

Loading and Saving Data

@krzychu124
Copy link
Member

@krzychu124 finally got around to mentioning that there's a compression library distributed with CS. Wish I'd known that earlier! LOL

I thought you've tried "obvious" solution and it hadn't worked😂 I haven't tried using it for other things than collecting files from disk to create a zip file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new distinct feature lifecycle related to the loading process like enabling/disabling/loading/reloading/load order/hot reload serialization load/store data in memory or on disk triage Awaiting issue categorisation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants