Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialization and Marshaling (again) #590

Closed
Affie opened this issue Aug 11, 2020 · 26 comments · Fixed by #962
Closed

Serialization and Marshaling (again) #590

Affie opened this issue Aug 11, 2020 · 26 comments · Fixed by #962

Comments

@Affie
Copy link
Member

Affie commented Aug 11, 2020

Just capturing ideas for discussion.

History

Another thing, we actively decided after a 2 year process not to define the "schema" for serialization. We originally had a protobuf packing approach which was hard schema type dependent and which supported forward-backward multilanguage compatibility. After much pain, we decided that top level packing MUST be JSON, and that the JSON string must at top level include the format in which the data is serialized. This is to mimic how the Internet has standardized it's data exchange format. So StructTypes.jl might be a real problem given the history on serialization.
So to stress, its probably not a good idea going down a path to unilaterally "change everything" on serialization.
In this case, it will be better to understand what the problems are with JSON.jl JSON2.jl JSON3.jl, because clearly they haven't figured it all out either. But we likely won't go down writing JSON4.jl either.
Also remember that the whole point of JSON is that serialization should be marginally or completely available outside Julia. The ZMQ interface for example is about multilanguage support and it uses the same serialization. The DB storage is definitely not a Julia implementation, later we will add a FuseDFG driver, etc, etc. A little while back we came up with the requirement that ZMQ messages should be exactly the same as FileDFG content. The schema discussion becomes a whole issue there. There are 100 different schema's available... So serialization only has 2nd best answers for everything. Whenever you pick the "best" answer for one requirement, then all others get real upset. So we have to accept a 2nd best solution for everybody and JSON is the best bet at this time.

So JSON

Options

JSON.jl + unmarshal.jl

...

JSON2.jl

Deprecated in favor of JSON3

JSON3.jl

See this for the 2 trait options: Struct() or Mutable()
In short Struct() is fast, but the order is important (So has to be preserved if created/modfied from outside).

Example:

julia> using JSON3
julia> using DistributedFactorGraphs
julia> using TimeZones
julia> using UUIDs
julia> bse = BlobStoreEntry(:a, uuid4(), :abc, "", "", "", "", ZonedDateTime("2020-08-11T14:32:27.000+02:00"))
BlobStoreEntry(:a, UUID("a533d151-764a-425d-9e57-a8f194577866"), :abc, "", "", "", "", ZonedDateTime(2020, 8, 11, 14, 32, 27, tz"UTC+02:00"))
julia> JSON3.StructTypes.StructType(::Type{BlobStoreEntry}) = JSON3.StructTypes.Struct()
julia> json3_str = JSON3.write(bse)
"{\"label\":\"a\",\"id\":\"a533d151-764a-425d-9e57-a8f194577866\",\"blobstore\":\"abc\",\"hash\":\"\",\"origin\":\"\",\"description\":\"\",\"mimeType\":\"\",\"createdTimestamp\":\"2020-08-11T14:32:27.000+02:00\"}"
julia> JSON3.read(json3_str, BlobStoreEntry)
BlobStoreEntry(:a, UUID("a533d151-764a-425d-9e57-a8f194577866"), :abc, "", "", "", "", ZonedDateTime(2020, 8, 11, 14, 32, 27, tz"UTC+02:00"))
@dehann
Copy link
Member

dehann commented Aug 11, 2020

To answer your earlier question, here is the first issue:

julia> bse = BlobStoreEntry(:a, uuid4(), :abc, "", "", "", "", ZonedDateTime("2020-08-11T14:32:27.12+02:00"))
ERROR: ArgumentError: Unable to parse string "2020-08-11T14:32:27.12+02:00" using format dateformat"yyyy-mm-ddTHH:MM:SS.ssszzz". Unable to parse date time. Expected directive DatePart(sss) at char 21
Stacktrace:
 [1] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:104 [inlined]
 [2] tryparsenext_core(::String, ::Int64, ::Int64, ::Dates.DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:38
 [3] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:150 [inlined]
 [4] tryparsenext_internal at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:125 [inlined]
 [5] parse at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Dates/src/parse.jl:282 [inlined]
 [6] ZonedDateTime(::String, ::Dates.DateFormat{Symbol("yyyy-mm-ddTHH:MM:SS.ssszzz"),Tuple{Dates.DatePart{'y'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'H'},Dates.Delim{Char,1},Dates.DatePart{'M'},Dates.Delim{Char,1},Dates.DatePart{'S'},Dates.Delim{Char,1},Dates.DatePart{'s'},Dates.DatePart{'z'}}}) at /home/dehann/.julia/packages/TimeZones/v0mfN/src/parse.jl:88
 [7] ZonedDateTime(::String) at /home/dehann/.julia/packages/TimeZones/v0mfN/src/parse.jl:87
 [8] top-level scope at REPL[8]:1

EDIT: we can hodge podge this together using something similar to function in #589

@dehann
Copy link
Member

dehann commented Aug 11, 2020

From SLACK

  • Just for clarification - marshaling is converting Dicts to structs and back. Serialization is technically converting the dict to a string, at least the way we talk about it
  • we need to preserve the Dict <-> String behavior
  • the Dict must store as a field inside itself stating what the type of that "object" is. That was the conclusion previously

@Affie
Copy link
Member Author

Affie commented Aug 11, 2020

here is the first issue

That is a TimeZones.jl bug: JuliaTime/TimeZones.jl#83

This is how I temporary patched it in variable timestamp:
Dates.format(v.timestamp, "yyyy-mm-ddTHH:MM:SS.ssszzz")

But we need a more permanent solution...
EDIT: BTW, JSON3.jl uses "yyyy-mm-ddTHH:MM:SS.ssszzz" as format from what I can tell. (thats why it gets my vote)

@Affie Affie changed the title Serialization Serialization and Marshaling Aug 11, 2020
@dehann
Copy link
Member

dehann commented Aug 21, 2020

  • Top level must be JSON (like the Internet)
    • Must be usable outside Julia (multilang)
  • Order is important in dict (problem with JSON2 JSON3),
  • nothings and nulls must be allowed
  • save the type into the structure
    • specific field (TBD)
  • separate marshaling and serialization
    • struct -> dict -> string/binary
  • users should be able to pass packets along
    • not all users need to be able to unpack every packet
  • ROS and (Python + ZMQ) similarly GraffSDK are strong usage examples
  • Keep future FuseDFG driver in mind too (libfuse)

@dehann
Copy link
Member

dehann commented Aug 21, 2020

  • JSON2 and JSON3 has out of order methods, but slower
  • there are issues on checking order

@GearsAD
Copy link
Collaborator

GearsAD commented Aug 21, 2020

Technologies to consider:

These are two things we need to support in the future:

  • We want to serialize UTF8/Unicode, then we can serialize Distributions.jl directly (which we want to do)
  • We want to be able to override serialization types, so that we can natively serialize things like matrices. This will make factor serialization much simpler.
  • We need to support null (this is an omitted field)

@dehann
Copy link
Member

dehann commented Aug 21, 2020

JSON standard bearers

  • NewtonSoft C# opensource (Microsoft)
  • JavaScript is hard
  • Rust likely good

@dehann
Copy link
Member

dehann commented Aug 21, 2020

  • Build JSON object properly
    • Don't overdo nesting of JSON in JSON in JSON...
      • In FileDFG PPE packing as JSON2 string (approx)...
    • Missing a test to ensure Dict of Dict is clean (escapes properly).

@GearsAD
Copy link
Collaborator

GearsAD commented Aug 21, 2020

Javascript is the OG JSON - JSON stands for Java Script Object Notation. So they're arguably the gold standard.

@GearsAD
Copy link
Collaborator

GearsAD commented Aug 21, 2020

The three use-cases we need to address when we refactor serialization packing/unpacking are:

  • Persist to file and back
  • Decorate and persist to Neo4j and back (decorating is important)
  • Stream serialization and deserialization (we need to be able to handle a use-case like building a ROS driver)

@GearsAD
Copy link
Collaborator

GearsAD commented Aug 21, 2020

So next steps:

  • Build simple test cases for these situations (independent of DFG)

@GearsAD
Copy link
Collaborator

GearsAD commented Aug 21, 2020

On @Affie's point - we should updated FileDFG so that it writes independent elements to subfolders. It will then resemble the Neo4J structure. This minimizes the stringification of subelements, but it also allows us to easily make a Fuse driver.

@Affie
Copy link
Member Author

Affie commented Aug 21, 2020

EDIT DF, just to be clear we are avoiding Julia specific serialization at this time. see future suggestion here: #590 (comment)

just for completeness, here is how to save and load with julia's built in serializer.

julia> using Serialization
julia> serialize("dfgname", dfgtosave)
julia> loadeddfg = deserialize("dfgname")

@dehann
Copy link
Member

dehann commented Aug 22, 2020

DF EDIT, just to be clear the top level was decided to be JSON, but there is scope to have different internal content according to standardized _type / mimeType field that tells the user what is going on inside the "serialized object".


Hi, just to follow-up on earlier comments. I'm adding these references for context, we have been at this for a while: Existing multilang interfaces motivated JSON over:

g2o/proto/proto3/flat-buffers/LCM/ZCM/MsgPack/BSON/HDF5/JLD/JLD2/JuliaBinary/PySynchrony (ROS is similar to JSON)

in the first place. GraffSDK became the stake in the ground, so I'd like us to stick nearby this decision for a while.

Punchline is we are not going back to a world of more than one serialization standard. If something about JSON is slow in Julia here at 3Q20, then that is the way it is. Later, and with more resources, we can worry about ultra-uber-performance on the serialization, but for now we have a hard fought consensus with plain "Julia high-performance" JSON and that should be good enough. Versatility and commonality is much, much, much, more important at this stage; and we need to focus some on good sanitation/robustness of the serialized objects. Minor quality improvements is okay for the short term. What we need now is miles on the most recent code, so we can catalog failures, problems, issues, shortcomings, and unpleasantness: please comment them here in 590 and we can revisit in the medium term.

Also see:

cc @Affie , @jim-hill-r , @GearsAD

@dehann
Copy link
Member

dehann commented Sep 29, 2020

Also keep #668 in mind for future developments

@dehann dehann added this to the vX.0.0 milestone Sep 29, 2020
@dehann
Copy link
Member

dehann commented Nov 9, 2020

Given some new code for serializing distributions, I'm thinking we should do away with the DFG assumption that "Packed*Type" exists on the packed type name. Instead, each variable and factor packing and unpacking converter should deal with that, and defiitely store the packed and unpacked as mimeType information inside the serialized object. I.e. the a serialized object MUST have the

  • mimeType = application/json/PackedPose2Pose2, or perhaps
  • mimeType = application/jld2/PackedPose3 etc.

That makes the provider of PackedPose2Pose2 also responsible for having the pack and unpack convert functions. The unpack convert(<:InferenceType, ::PackedPose2Pose2)::Pose2Pose2 function is then responsible for "stripping away the Packed" portion of the name. Similarly when packing. This way DFG does not force the "Packed*Type" assumption down on users, but serialization will still look quite similar to how it does today -- just better.

Maybe worth doing this for both Variables and Factors, which means something about the abstraction should be perhaps FunctorInferencedType <: InferenceType and also InferenceVariable <: InferenceType (whatever the precise abstract names end up being).

@dehann
Copy link
Member

dehann commented Dec 1, 2020

Proposal for mimeType field in serialized factors or variables (improvement over assuming **Packed**MyFactor... ):
https://github.com/JuliaRobotics/Caesar.jl/blob/e6a7fe1340a3235509275427439a38bfabcfcbc9/src/images/Pose2AprilTag4Corners.jl#L180

In other words, mimeType = /application/JuliaLang/RoME.PackedPose2Pose2, or /application/json/Main.JSONPose2Pose2, or /application/JuliaLang/JLD2/RoME.Pose2Pose2, or whatever works. Also suggest making mimeType the first field in the packed type:
https://github.com/JuliaRobotics/Caesar.jl/blob/e6a7fe1340a3235509275427439a38bfabcfcbc9/src/images/Pose2AprilTag4Corners.jl#L153

cc @GearsAD , @Affie , @jim-hill-r

@dehann
Copy link
Member

dehann commented Jan 22, 2021

Can use https://github.com/JuliaIO/Tar.jl for multiplatform

@dehann dehann changed the title Serialization and Marshaling Serialization and Marshaling (again) Jan 22, 2021
@dehann
Copy link
Member

dehann commented Feb 10, 2021

Following fix to JuliaRobotics/IncrementalInference.jl#1155, there is a new type stability issue when using unpack functions. The trouble starts on this line:
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/c6a845763448af40983755449e245d2a7e8b41dc/src/DispatchPackedConversions.jl#L44

With the non-concrete ::Vector{DFGVariable} constructor.

A difference in type-stability when building IIF.CCW objects can occur. When building FMd, not all Variable type information is properly included, see:
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/c6a845763448af40983755449e245d2a7e8b41dc/src/FactorGraph.jl#L632

This results in subtle differences of type (and worse an unresolved type-instability any time we use de-serialized DFG objects!). The original dfg object has a hard type while constructing a new ccw object:

::FactorMetadata{ 
    Vector{DFGVariable{Pose3}}, 
    Vector{Symbol}, 
    Vector{Matrix{Float64}}, 
    Nothing
}

But at time of writing, tests in RoME:
https://github.com/JuliaRobotics/RoME.jl/blob/7fcc014d7100a1de7fa19bfd80d257fe9669574d/test/testpackingconverters.jl#L156

Results in:

::FactorMetadata{ 
    Vector{DFGVariable}, 
    Vector{Symbol}, 
    Vector{Matrix{Float64}}, 
    Nothing
}

notice non-concrete type Vector{DFGVariable}

@dehann
Copy link
Member

dehann commented Mar 18, 2021

Just linking Core.eval approach used at time of writing:

@dehann
Copy link
Member

dehann commented Jul 28, 2021

Just making sure it is captured... With DFG v0.15 we changed point data to be vnd.val::Vector{P} where P is something that Manifolds.jl can understand as point representations. How this is serialized should be reviewed and likely updated. This all goes in concert with how serialization of new AMP.ManifoldKernelDensity works, e.g.:
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/dd3923bea2167ca3b5b35a506102fedc389bcc39/src/SerializationMKD.jl#L47

This new serialization is still being fleshed out, so stay tuned.

@dehann
Copy link
Member

dehann commented Aug 3, 2021

When unpacking factors and trying to get the FactorGradientsCache! (FGC) built on the first go, the following empty DFGVariable[] vector design assumption causes several headaches:
https://github.com/JuliaRobotics/IncrementalInference.jl/blob/db7ff84225cc848c325e57b5fb9d0d85cb6c79b8/src/DispatchPackedConversions.jl#L46

In utopia, deserialization would only build the DFGFactor once (since it is immutable). The current workaround will likely just make copies of the factor object (to change the type of FGC inside CCW, and then return the new copy as early as possible in the deserialization process.

@dehann
Copy link
Member

dehann commented Jan 31, 2022

One of the new tricks for unordered marshaling is to use keyword constructors on the packed type:

struct MyType
  a
  b
end

MyType(;a=1, b="hi") = MyType(a,b)

Base already does this in automated way, along with optional values:

Base.@kwdef struct MyType
  a::Int = 1
  b::String
end

mt = MyType(b="bye")

Furthermore, with JSON2's use of NamedTuple there is an even easier way to automate all this:

# NamedTuple gets created somewhere, likely a JSON package reading text
nt = (; a=2, b="again!")

mt = MyType(;nt...)

See #848, and JuliaRobotics/IncrementalInference.jl#1476 for early examples.

@dehann
Copy link
Member

dehann commented Mar 19, 2022

Alright, how's this for Unmarshaling with JSON?


Also:
https://github.com/lwabeke/Unmarshal.jl


EDIT, deleted the copy I have here, and just updating the single comment over by Unmarshal.jl 39. Perhaps the content comes back here if we choose to implement it only here rather than upstream.

xref

@GearsAD
Copy link
Collaborator

GearsAD commented Jan 22, 2023

After working on CaesarInternal, I believe I have a good handle on this. My recommendation:

  • Make a PackedVariable struct (same for Factor)
  • Use JSON3 to encode and decode this using either an ordered struct or a unordered struct

Notes:

  • JSON3 we need to omit empties, which will allow for Union{X, Nothing} optional fields.
  • We need to use kwdef structs.

Advantages:

  • Fast
  • Marshalling looks robust and works well(ish) with GQL
    Disadvantages:
  • No intermediate Dict types, it's Object->String and vice versa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants