
Revised durability #219

Merged
merged 1 commit into from Sep 12, 2016

Conversation

3 participants
@mrrodriguez
Collaborator

mrrodriguez commented Aug 30, 2016

This is a first pass at the work described in #198 .

There are two protocols in clara.rules.durability

  • ISessionSerializer
    • Responsible for serializing and deserializing the session (and rulebase) structure of rule sessions
    • This one will rarely, if ever, need to be customized by a consumer. It has to be tightly coupled to the implementation of the rule session memory, so customizing it is generally advised against
    • Clara will provide a default Fressian-based implementation in the clara.rules.durability.fressian namespace
      • This namespace requires a dependency on org.clojure/data.fressian; see https://github.com/clojure/data.fressian
      • This is an "optional" dependency and is not included in Clara's explicit dependencies (it is only a :dev dependency, for testing)
  • IWorkingMemorySerializer
    • Responsible for serializing and deserializing the "consumer-defined" facts contained within working memory
    • This one is expected to be implemented by consumers in whatever way makes the most sense for serializing their domain model objects
    • We could provide some helpful defaults here in the future (I hope to have some additions here soon)
      • Many of the serialization functions exposed in clara.rules.durability can be useful when implementing IWorkingMemorySerializer. The Fressian handlers defined in clara.rules.durability.fressian have been written to serialize and deserialize most Clojure persistent data structures, to preserve metadata, and to preserve record identity relationships, both among records and where records appear in different aggregates. This is useful to implementations of IWorkingMemorySerializer as well, if Fressian is under consideration.
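As a rough sketch of the consumer side (the helper names and the Temperature record below are invented for illustration; see clara.rules.durability for the actual IWorkingMemorySerializer contract), a naive EDN-based fact serializer might look like:

```clojure
;; Sketch only: facts->bytes/bytes->facts are illustrative helpers, not
;; the actual IWorkingMemorySerializer method signatures.
(require '[clojure.edn :as edn])

;; A toy domain fact; assumes these forms are evaluated in the user
;; namespace, so the record prints with the user.Temperature tag.
(defrecord Temperature [value location])

(defn facts->bytes
  "Naive EDN serialization of domain facts. A real implementation would
  likely prefer Fressian or another binary format for performance."
  [facts]
  (.getBytes (pr-str facts) "UTF-8"))

(defn bytes->facts
  [^bytes bs]
  (edn/read-string {:readers {'user.Temperature map->Temperature}}
                   (String. bs "UTF-8")))

;; Round trip:
;; (bytes->facts (facts->bytes [(->Temperature 80 "MCI")]))
```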

The approach to serialization is documented quite a bit within the clara.rules.durability and clara.rules.durability.fressian namespaces.

The old functions from the clara.rules.durability namespace have been removed, since they were not actively maintained and had fallen out of sync with the evolution of Clara's memory and engine over time. The motivations behind this new approach to durability can be found in #198.

As the clara.rules.durability namespace doc string states, this namespace should still be considered experimental. The API itself is subject to change and, even more importantly, rule sessions or rulebases serialized in one version of Clara are not guaranteed to be deserializable in another version.

@mrrodriguez mrrodriguez referenced this pull request Aug 30, 2016

Closed

Durability Revisited #198

@mrrodriguez mrrodriguez force-pushed the mrrodriguez:dur198pr1 branch from 2d316f9 to 7d24ea5 Aug 31, 2016

@mrrodriguez


Collaborator

mrrodriguez commented Aug 31, 2016

I re-pushed these changes since I didn't realize I broke my tests in a last minute change yesterday. The tests should be passing now.

@rbrush


Collaborator

rbrush commented Aug 31, 2016

Looking at this, the current implementation needs to be aware of each type of node in order to serialize it, and attaches additional information to the nodes, like the expressions used to create them. This effectively makes all of our node types part of the serialization contract. This might be necessary, but it has the downside that we must keep the node types compatible with whatever previous versions may have serialized with them.

Just for discussion -- either as part of this change or a future one -- should we consider keeping and serializing the beta graph structure defined by this schema, and using a Fressian (or other) serialization of that as our rule base? It keeps the node IDs and all expressions that need to be compiled, which we could cache at deserialization time as well. It seems like it might be simpler to keep and (de-)serialize that rather than plugging into the runtime network.

Perhaps there are some downsides to this, such as performance costs, that make it not worthwhile?
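For what it's worth, round-tripping a plain data structure through data.fressian is straightforward; the map below is shaped loosely like a beta graph purely for illustration and is not Clara's actual schema:

```clojure
;; Round-trip a plain Clojure data structure through Fressian using the
;; data.fressian convenience API (requires org.clojure/data.fressian).
(require '[clojure.data.fressian :as fres])

(def beta-graph
  ;; Invented shape for illustration only; not Clara's schema.
  {:id-to-node  {1 {:node-type :join :children [2]}}
   :expressions {1 '(= ?t (:temperature this))}})

;; fres/write returns a ByteBuffer; fres/read accepts it directly.
(def round-tripped (fres/read (fres/write beta-graph)))
;; round-tripped should be equal to beta-graph
```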

children
join-bindings
(:new-bindings beta-node))
{:accum-expr accum-expr})))


@rbrush

rbrush Aug 31, 2016

Collaborator

Should we make these items fields on the nodes themselves rather than using metadata? It seems like metadata just obscures things from someone who might want to inspect or debug the nodes, as opposed to just having it on the record.


@mrrodriguez

mrrodriguez Aug 31, 2016

Collaborator

I think this relates to #219 (comment)

I do think these should be made into first-class attributes of something. I think the metadata is sort of a "hack".

I'll echo the statement I made there

I actually went with this upon first pass because it was least invasive to the clara.rules.compiler and the clara.rules.engine.

The metadata was a minimal addition to get this off the ground. To think about this though, if we did want a non-metadata approach, would putting it on the node really be the correct place? Perhaps that would be useful for debugging (I think I would have had some value from that in the past). However, if we targeted the beta graph for serialization instead, then this wouldn't necessarily be node attributes.

So again, I just went with this as a "getting the implementation off the ground" approach. If you have strong reservations against that though, we can explore our alternatives.


@rbrush

rbrush Aug 31, 2016

Collaborator

Ah, gotcha. Agreed that if we took the path of serializing the beta graph, we could (and should) remove this from the nodes outright. We can revisit this after considering the beta graph approach, if necessary.

@holder))

(def rulebase-data (java.io.File/createTempFile "rulebase" ".dat"))
(def session-data (java.io.File/createTempFile "session" ".dat"))


@rbrush

rbrush Aug 31, 2016

Collaborator

Seems like these could break interactive re-runs of tests if changes are being made in a REPL-connected environment. Perhaps we should use a ByteArrayOutputStream (and coerce the bytes into an input stream) in the test functions instead?


@mrrodriguez

mrrodriguez Aug 31, 2016

Collaborator

Good point. I can remove these entirely and just use local bindings with ByteArrayInputStream and ByteArrayOutputStream. I'll update this PR with those changes shortly.
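A minimal sketch of the in-memory approach (serialize-session/deserialize-session below are hypothetical stand-ins for the real calls):

```clojure
;; Write into a ByteArrayOutputStream, then read the same bytes back
;; through a ByteArrayInputStream - no temp files on disk.
(import '(java.io ByteArrayOutputStream ByteArrayInputStream))

(let [baos (ByteArrayOutputStream.)]
  ;; (serialize-session session baos)   ; hypothetical serialize call
  (.write baos (.getBytes "example" "UTF-8"))
  (with-open [bais (ByteArrayInputStream. (.toByteArray baos))]
    ;; (deserialize-session bais)       ; hypothetical deserialize call
    (slurp bais)))
```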

@mrrodriguez


Collaborator

mrrodriguez commented Aug 31, 2016

Looking at this, the current implementation needs to be aware of each type of node in order to serialize it, and attaches additional information to the nodes, like the expressions used to create them. This effectively makes all of our node types part of the serialization contract. This might be necessary, but it has the downside that we must keep the node types compatible with whatever previous versions may have serialized with them.

Just for discussion -- either as part of this change or a future one -- should we consider keeping and serializing the beta graph structure defined by this schema, and using a Fressian (or other) serialization of that as our rule base? It keeps the node IDs and all expressions that need to be compiled, which we could cache at deserialization time as well. It seems like it might be simpler to keep and (de-)serialize that rather than plugging into the runtime network.

Perhaps there are some downsides to this, such as performance costs, that make it not worthwhile?

I agree with you that this is a better approach in the longer term if we're going to get to a place where a serialization from one version of Clara can be deserialized in a different version. The more we can "delay the compilation" process, the better off we'd be in that regard, as long as the compilation "recipe" (the beta graph here) doesn't need to change in non-passive ways.

This would add cost to the deserialization of the rulebase, however. I think we'd need to work on tuning the clara.rules.compiler more if it became a more critical path. However, in our scenarios at least, we only want to serialize and deserialize the rulebase a single time, and then rapidly deserialize sessions that "attach" to that single rulebase later. In that sort of workflow, the execution time of rulebase deserialization isn't as critical.

The main drivers behind these changes (from the perspective of how it is implemented currently and our real-world use-case) are for highly optimized deserialization times, with the serialization time a lesser concern. Obviously people's use-cases will vary and we should really strive to make all paths as quick as we can.

I think we are also going to run into issues maintaining passivity when serializing sessions based on the (currently local-only) memory representations, which may also change from one version of Clara to another. Again, any added logic we have to do during deserialization can hurt performance.

Overall, I would like to take this approach for the rulebase serialization for now, with the strong intention of changing it to something like the beta graph, as you suggest. I actually went with this approach on the first pass because it was least invasive to clara.rules.compiler and clara.rules.engine. The implementation of ISessionSerializer is not something we'd expect to have many customized implementations of, since it will necessarily have a decent level of coupling to Clara's internal structures. So I thought we could evolve it a bit more before starting to claim that rulebase serialization can be passive across versions of Clara.

Let me know of any further concerns you have over this right now though. The feedback is certainly valuable.

@mrrodriguez


Collaborator

mrrodriguez commented Aug 31, 2016

I forgot to add that, before I went with the Fressian-based ISessionSerializer implementation, I had a mostly working branch that required no external dependencies and instead relied on print-method. I think this was interesting and made it easier to get started, but in reality the performance characteristics, in both time and space, were just really bad, and there was no real fix for them. I think print-method + clojure.lang.LispReader are best kept to sending "code as data" style data, not huge data structures - especially huge structures with many java.lang.Class references and constructor calls.

Also, any sort of manipulations done to print-method cause chaos in a nREPL driven world where things like middleware are also trying to use the same multimethod to add features.

I did keep this stuff in a public gist in case you or anyone else is interested in seeing it (just for fun or learning). I will not claim that it works 100% with the latest version of clara.rules.durability, but it is close.
https://gist.github.com/mrrodriguez/4c2b3a122cec8cb7327dd224198543ec

I do not plan to maintain this right now though, since like I said, it just wasn't performant enough for our practical use.

@mrrodriguez mrrodriguez force-pushed the mrrodriguez:dur198pr1 branch from 7d24ea5 to f7a0773 Aug 31, 2016

session-serializer
mem-serializer)

(let [rulebase-data (.toByteArray rulebase-baos)


@mrrodriguez

mrrodriguez Aug 31, 2016

Collaborator

@rbrush
This is the transition to byte array backed streams instead of temp files as you commented on previously.

@rbrush


Collaborator

rbrush commented Aug 31, 2016

Fair enough. I don't mind moving forward with an experimental approach and refactoring it as we go.

As for performance, I'd imagine most use cases would involve (de-)serialization of sessions rather than putting rule bases in the critical path. I'm sure we can optimize our current compiler logic, but in the rule base scenario you'd have to go through the compilation/JIT/JVM hotspot optimization every time you deserialized no matter what.

@mrrodriguez


Collaborator

mrrodriguez commented Sep 1, 2016

As for performance, I'd imagine most use cases would involve (de-)serialization of sessions rather than putting rule bases in the critical path. I'm sure we can optimize our current compiler logic, but in the rule base scenario you'd have to go through the compilation/JIT/JVM hotspot optimization every time you deserialized no matter what.

I agree that the session deserialization is likely to be the most reasonable case to expect to be put on the critical performance path.

You have good points on the rulebase compilation issues. A lot of the Clara compiler time is really spent in the Clojure compiler, so there is only so far you can go with some of that (perhaps trying to share compiled forms a bit more, etc.).

@rbrush rbrush merged commit b8f1280 into cerner:master Sep 12, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@WilliamParker WilliamParker added this to the 0.13.0-RC1 milestone Feb 10, 2017
