Revised durability #219
This is a first pass at the work described in #198.
There are two protocols in
The approach to serialization is documented quite a bit within the
The old functions from this
Looking at this, the current implementation needs to be aware of each type of node in order to serialize it, and to attach additional information to the nodes, like the expressions used to create them. This effectively makes all of our node types part of the serialization contract. That might be necessary, but it has the downside of requiring us to keep those types compatible with whatever previous versions may have serialized.
Just for discussion -- either as part of this change or a future one -- should we consider keeping and serializing the beta graph structure defined by this schema, and using a Fressian (or other) serialization of that as our rule base? It keeps the node IDs and all expressions that need to be compiled, which we could cache at deserialization time as well. It seems like it might be simpler to keep and (de-)serialize that rather than plugging into the runtime network.
Perhaps there are some downsides to this, such as performance costs, that make it not worthwhile?
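For concreteness, serializing a beta-graph-style structure with Fressian might look roughly like the sketch below. The graph shape here is purely illustrative (not Clara's actual schema), and it assumes the `org.clojure/data.fressian` library:

```clojure
(ns example.durability-sketch
  (:require [clojure.data.fressian :as fress]))

;; A hypothetical beta graph as plain Clojure data: node IDs mapped to
;; node descriptions, plus the edges between them. The :constraints are
;; kept as unevaluated forms that could be compiled (eval'd) and cached
;; at deserialization time.
(def beta-graph
  {:id->node      {1 {:node-type :join
                      :condition '{:type        Temperature
                                   :constraints [(< temperature 20)]}}
                   2 {:node-type :production
                      :rhs       '(println "cold!")}}
   :forward-edges {1 #{2}}})

;; Serialize the graph to a ByteBuffer with Fressian...
(def serialized (fress/write beta-graph))

;; ...and read it back as plain data, ready for a compilation pass.
(def restored (fress/read serialized))
```

The appeal of this shape is that the serialized artifact is just data plus expressions, so the compatibility contract is the schema rather than the runtime node types.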
I agree with you that I think this is a better approach in the longer term if we are going to get to a place where serialization from one version of Clara can be deserialized in a different version. The more we can "delay the compilation" process the better off we'd be in that regard as long as the compilation "recipe" (the beta graph here) doesn't need to change in non-passive ways.
This would add cost to rulebase deserialization, however. I think we'd need to
The main drivers behind these changes (from the perspective of how it is implemented currently and our real-world use case) are highly optimized deserialization times, with serialization time a lesser concern. Obviously people's use cases will vary, and we should really strive to make all paths as quick as we can.
I think we are also going to run into issues with maintaining passivity when serializing sessions based on the (local-only, right now) memory representations, which we may also change from one version of Clara to another. Again, any sort of added logic we have to do at deserialization time can hurt performance.
Overall, I would like to just take this approach for the rulebase serialization, with the strong intention of changing it to something like the beta graph as you suggest. I actually went with this on the first pass because it was the least invasive to clara.rules.compiler and clara.rules.engine. The implementation of ISessionSerializer is not something that we'd expect to have many customized implementations of, since it will necessarily have a decent level of coupling to Clara's internal structures. So I thought we could evolve it a bit more before starting to claim that rulebase serialization can be passive across versions of Clara.
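To make the coupling point concrete, a protocol along these lines is what's being discussed. The method names and signatures below are an illustrative sketch, not necessarily Clara's actual definition:

```clojure
;; Hypothetical shape of a session-serializer protocol. Splitting the
;; rulebase from the session working-memory state lets a cached rulebase
;; be reused across many session deserializations, which is the hot path
;; described above.
(defprotocol ISessionSerializer
  (serialize-rulebase [this session out-stream]
    "Write the compiled rulebase for `session` to `out-stream`.")
  (deserialize-rulebase [this in-stream]
    "Read a rulebase back from `in-stream`.")
  (serialize-session-state [this session out-stream]
    "Write the session's working-memory state to `out-stream`.")
  (deserialize-session-state [this in-stream rulebase]
    "Rebuild a session from `in-stream`, reusing an already-loaded
     `rulebase` rather than recompiling it."))
```

Because implementations of this protocol necessarily reach into Clara's internals, it makes sense to treat it as evolving rather than a stable public contract for now.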
Let me know of any further concerns you have over this right now though. The feedback is certainly valuable.
I forgot to add that before I went with the Fressian based ISessionSerializer implementation, I had a mostly working branch that required no foreign dependencies and instead relied on
Also, any sort of manipulations done to
I did keep this stuff in a public gist in case you or anyone else was interested to see it (just for fun or learning). I will not claim that it 100% works with the latest version of
I do not plan to maintain this right now though, since like I said, it just wasn't performant enough for our practical use.
Fair enough. I don't mind moving forward with an experimental approach and refactoring as we go.
As for performance, I'd imagine most use cases would involve (de-)serialization of sessions rather than putting rule bases on the critical path. I'm sure we can optimize our current compiler logic, but in the rulebase scenario you'd have to go through compilation/JIT/JVM hotspot optimization every time you deserialized, no matter what.
I agree that session deserialization is the most likely thing to end up on the critical performance path.
You have good points on the rulebase compilation issues. A lot of the Clara compiler time is really spent in the Clojure compiler, so there is only so far we can go with some of that (trying to share compiled forms a bit more, perhaps, etc.).