-
-
Notifications
You must be signed in to change notification settings - Fork 93
Save and read from new serialization where possible #3566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Maybe it'll be clearer to me after more reading, but does this prepare us for a case where the shape of or maybe we don't anticipate those types to change? if we're relying on that somehow, let's add a note at least. |
Yeah, I should add some notes and some tests around this. Currently I have no idea what happens if we change the Op or Toplevel or Expr types. In the OCaml serializers, we had to be very careful, and were basically only able to add extra variants to existing sumtypes. And so we had a few checks:
I think I can get a similar level of confidence with the following:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, apart from the comment I made earlier:
Let's say we want to add some property to ProgramTypes.Op.CreateDB
in a few months. I think right now with JSON, we wouldn't have any issue deserializing old data into the new type. Does that hold true with MessagePack? I did some quick searches of binary serializers and msgpack specifically, (just a minute) and didn't see an obvious conclusion.
Edit, reading, just saw your response
That sounds like a good plan |
Right now we don't use JSON for serialization - we use the bin_io serializers. We also can't use JSON as it's much too slow. The existing bin_io serializers that we use also have this problem. The solution we use is to never add properties to I don't know if that holds true for MessagePack, but I'd be very surprised if it doesn't. Certainly I will be testing. This makes me realize I should also add exception handling during the new binary serialization, and add a fallback to use the old serializers. The current version of the C# serializer supports a string key serializer, which is a bit more resilient to versioning changes. It's the default right now, but I think that we would be better with very explicit changes and tests rather than relying on this behaviour. It's also 2x slower with this and generates data 2x bigger, so I'd prefer to change away from this now. |
Remaining TODO list:
|
|
27a6431
to
efb2c24
Compare
Changes since last review (starts at 6bebf89):
|
@StachuDotNet Ready for another review. |
let serializeOplistToJson (oplist : PT.Oplist) : string = | ||
MessagePack.MessagePackSerializer.SerializeToJson(oplist, optionsWithZip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mark this fn as test-only in some way? i.e. Do we anticipate using this outside of tests at all in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make a "Test" module within the file for this (and we can use that pattern elsewhere).
/// The test values below are used to check the exact output of test file. So we need | ||
/// the test inputs to be consistent, which is why we never use `gid ()` below, or | ||
/// FSharpToExpr functions. | ||
let testExpr = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How'd you get to this expression - extract it from a live canvas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I aimed to write one that hit every type of the variant.
[ hop handler ], | ||
PT.Toplevel.TLHandler handler, | ||
Canvas.NotDeleted) | ||
(f.tlid, [ PT.SetFunction f ], PT.Toplevel.TLFunction f, Canvas.NotDeleted) ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find myself asking "... what's 'f' here?". Really easy to lose context in a PR review. Could we rename?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree, but I just hit this from a module rename and didn't touch it otherwise. Would rather not increase scope on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great! Code is reading much nicer now, generally.
I like the way stability of the binary serialization is ensured. I wonder if there's a nice way to ensure that the test in BinarySerialization.test.fs
contains all available Ops (i.e. walk the tree and error if any ops aren't included) or if that's worth exploring.
enable compression for oplist serialization. Since there's a lot of duplicate here (eg, if you press two keys you'll get two nearly identical SetHandler ops), this reduces stored size by about 15x.
awesome.
Just a few pending comments/questions, but this looks good to me!
Going to do a roundtrip test of all existing canvases to be sure I didn't fuck this up, then merge. |
These are not used by anything, and are not created by the frontend in practice. There are definitely some of these values in the DB -- during the migration they'll be removed, and will not throw errors.
Co-authored-by: Stachu Korick <hello@stachu.net>
Co-authored-by: Stachu Korick <hello@stachu.net>
Co-authored-by: Stachu Korick <hello@stachu.net>
1c05d1a
to
6d46a1f
Compare
module Test = | ||
let serializeOplistToJson (oplist : PT.Oplist) : string = | ||
MessagePack.MessagePackSerializer.SerializeToJson(oplist, optionsWithZip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this pattern!
The migration will be run by the ocaml machines on startup, so it has to be idempotent. As it was, it would fail if the columns had already been added (and of course we are adding the columns by hand before deployment, so they will be there). It is now idempotent. Tested by hand by running the migration, seeing it fail, then fixing it and running it again and seeing it go through without incident.
Does most of #3540.
Since serialization with the legacy server is so slow, both BwdServer and ApiServer give bad experiences. Using the editor is a very bad experience for even medium-sized canvases, as the API call to
initial_load
makes one call to the legacy_server for each toplevel (including deleted toplevels). Calling the BwdServer is also bad, as it needs to load every function and database regardless of whether the handler uses them.The slow part is not actually calling the server or deserializing the binary, but rather decoding the JSON that the server returns. Encoding the JSON isn't super great either but decoding it is extremely bad.
Instead, this PR lets us save handlers using MessagePack.CSharp binary serialization format. This is not as fast as the OCaml binary serializer, but it's much much much much better than using the legacy_server.
This works by adding new columns on the toplevel_oplists table (which is where the data is currently stored). In addition to the two rows where the ocaml-serialized data is stored (data and rendered_oplist_cache), we add two more for the fsharp-serialized data (oplist and oplist_cache).
When saving data in F#, we save all 4 fields. When saving data in OCaml, we save the 2 ocaml-serialized fields, and set the other two to NULL.
When reading data in F#, we read all 4 fields and pick appropriately: if the new ones are NULL we'll deserialize via the legacy server). When reading in OCaml, we only use the two old columns.
In this way, F# writes are still readable from OCaml. This is important because the bwdservers and queueservers are still ocaml.
OCaml writes are also still readable from F#, because they'll delete (overwrite with null) old fsharp-serialized values, and F# will read the new values (deserializing using the legacy server).
I also want to automatically upgrade values on read. However, I'm going to leave that to a future PR as that is a little more sketch and I'd like to get this merged without increasing scope.