Allow EntityCache to function at any point in the transformer pipeline #32

Closed
pcantrell opened this issue Jan 5, 2016 · 12 comments · Fixed by #64

@pcantrell (Member)

EntityCache currently must store and return only the values at the very end of the transformer pipeline.

This works well when the end result of the pipeline is an easily serializable type such as text or JSON:

[image: siesta pipeline 1]

However, it’s problematic when the transform pipeline produces a model, which can be difficult for an EntityCache to store and retrieve:

[image: siesta pipeline 2]

Attempts to solve this problem by keeping models in a database quickly become problematic:

  • The database needs to be able to look up models by REST URL, which mixes abstraction layers and leads to hackish workarounds.
  • The EntityCache API is not designed to work with databases. (For example, it wants to pass cached entities across threads, which Realm doesn’t like.)
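For context, the cache interface under discussion is keyed by the resource's URL, roughly like the following sketch (names and signatures are illustrative, not Siesta's exact API at the time):

```swift
// Illustrative sketch of a URL-keyed cache interface (not Siesta's exact API).
struct Entity {
    let content: Any  // text, JSON, or a model object, depending on pipeline position
}

protocol EntityCache {
    func readEntity(forKey key: String) -> Entity?        // key is the resource's URL
    func writeEntity(_ entity: Entity, forKey key: String)
}

// A database-backed implementation is forced to index models by REST URL,
// which is the abstraction-mixing problem described above.
final class InMemoryCache: EntityCache {
    private var storage: [String: Entity] = [:]
    func readEntity(forKey key: String) -> Entity? { storage[key] }
    func writeEntity(_ entity: Entity, forKey key: String) { storage[key] = entity }
}
```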

Suppose, however, that an EntityCache could insert itself at any point in the transformer pipeline and not just the end. In other words, this is currently the only supported structure:

→ JSON transformer → Model transformer → (EntityCache) →

…but suppose the pipeline instead supported this:

→ JSON transformer → (EntityCache) → Model transformer →

When there is a cache hit, the pipeline would pick up immediately after the cache's position in the pipeline. An EntityCache could then work only with its preferred data type:

[image: siesta pipeline 3]

…or even:

[image: siesta pipeline 4]

This would require some hard thought in the plumbing, but seems to make sense. @annicaburns, would this solve the problems you were having with EntityCache? Would you still want to use Realm even with a mechanism like this in place?

@annicaburns

Thanks for giving this so much thought, Paul. What you are suggesting above looks like it would work well for us and would solve the one roadblock we hit attempting to use Siesta and Realm in an integrated way.

We would still want to use Realm because we need database functions beyond persistence. We need to query objects and relationships and perform other business logic (calculations, summaries, rollups) on our data while offline (disconnected from the server).

I'm really excited to hear you are considering making a change like this for v1. Many Thanks.

@pcantrell (Member Author)

Thanks for giving it a look, Annica. Since writing this up, I also started wondering if I shouldn’t just make all EntityCache implementations work at the beginning of the pipeline, i.e. NSData only. Your scenario answers that: yes, there is potential value to exposing models to caches too.

In that case, would the change I’m proposing here help you? Maybe. You could do something like this:

[image: siesta pipeline]

…where the Siesta → Realm flow is unidirectional: the Realm cache’s writeEntity updates Realm every time new data arrives, but its readEntity never returns anything. That means you don’t have to stuff URLs in the database. (There’s still the main thread problem, but that’s easily solved.)
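A toy version of that write-only arrangement might look like this (a sketch with illustrative names; `WriteOnlyModelCache` and `firstHit` are not Siesta API):

```swift
// Sketch of the unidirectional Siesta → Realm flow described above.
struct Entity { let content: Any }

protocol EntityCache {
    func readEntity(forKey key: String) -> Entity?
    func writeEntity(_ entity: Entity, forKey key: String)
}

// Persists every new entity, but always misses on reads, so no URLs need to
// live in the database and reads fall through to the next cache in line.
final class WriteOnlyModelCache: EntityCache {
    private(set) var writeCount = 0
    func readEntity(forKey key: String) -> Entity? {
        nil  // always "Not it!"
    }
    func writeEntity(_ entity: Entity, forKey key: String) {
        writeCount += 1  // a real implementation would update Realm here
    }
}

// Read path: ask each cache in turn, later pipeline stages first.
func firstHit(in caches: [EntityCache], key: String) -> Entity? {
    for cache in caches {
        if let entity = cache.readEntity(forKey: key) { return entity }
    }
    return nil
}
```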

In this scheme, Realm is only there for queries and rollups; the app doesn’t rely on Realm to make network requests work offline. When you make a request, Siesta first asks the Realm cache, but that one always says “Not it!” It then checks with the JSON cache, which might actually return something.

Does this make any sense? It seems like it would work if Realm is just a cache too, but it would fall apart if you’re trying to update Realm locally and then push those changes back to the server.


There’s a deeper problem here I’ve been trying to avoid: I don’t want to make Siesta try to handle bidirectional sync. That pushes too deep into API-specific behavior.

Right now, Siesta is strictly a cache, and the server remains the ultimate source of truth. It’s up to the app to manage updates: what they look like, when they happen, what to do with client-side state if they fail. That seems like the right line to draw. However, I think a lot of Siesta client projects (including yours) are going to push up against it, and I don’t have a clear answer. Thus my interest in your case.

@annicaburns

I totally agree that Siesta should NOT try to be responsible for syncing data or state back up to the server. We were not expecting or even hoping for that - we are writing our own code to manage that process. So I support the line you are drawing there.

However, I do have concerns about the evolution you are proposing to your original suggestion. BUT... they may be specific to our requirements and beyond your scope. Because of the security requirements in our industry (Health Care), we cannot save any data to disk that is not encrypted. Any data we persist to disk will need to be inside an encrypted Realm instance, so we wouldn't be able to make use of the file-based EntityCache you propose above (unless we can encrypt it somehow). I was depending on us being able to re-populate the EntityCache from the models in our encrypted Realm instance - which means we would need to save the URLs in the database AND have a way to read the data back into the EntityCache from the data stored in Realm.

And now that I think about it... I guess this is a problem that existed in your original suggestion as well. Any caching we do will need to read/write to and from an encrypted store - which is one of the two reasons I was trying to point your PersistantCache at Realm in the first place. The second reason, of course, is the additional processing we need to do on the data we plan to persist: object and relationship queries and rollups.

Hmmmm....

@pcantrell (Member Author)

I think the need for encrypted caching is widespread. Right now, of course, Siesta doesn’t even provide an implementation of EntityCache for you, so … you can write one that’s encrypted or not. Eventually (1.1?) I do think Siesta should provide default implementation(s) supporting both NSData and JSON input, and storing files both encrypted and unencrypted. For 1.0, though, I just want to stabilize the API.

To that end:

  1. I’m going to go ahead with the API changes necessary to support what I described above, and
  2. I should also let caches specify their preferred GCD queue, which would solve your Realm problem.
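Item 2 might take a shape like the following (hypothetical names; `workQueue` is an assumed property for illustration, not the then-current Siesta API):

```swift
import Dispatch

// Hypothetical: each cache names the serial queue Siesta should call it on,
// so a thread-confined store (e.g. Realm) only ever sees one thread.
protocol QueueAwareEntityCache {
    var workQueue: DispatchQueue { get }
    func readEntity(forKey key: String) -> Any?
    func writeEntity(_ content: Any, forKey key: String)
}

final class SingleQueueCache: QueueAwareEntityCache {
    let workQueue = DispatchQueue(label: "cache.work")  // serial by default
    private var storage: [String: Any] = [:]

    // In this sketch the cache hops onto its own queue itself; in the
    // proposal, Siesta would dispatch cache calls onto workQueue for it.
    func readEntity(forKey key: String) -> Any? {
        workQueue.sync { storage[key] }
    }
    func writeEntity(_ content: Any, forKey key: String) {
        workQueue.sync { storage[key] = content }
    }
}
```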

@pcantrell (Member Author)

And an addendum to that: even though you’re not relying on Siesta to handle the sync problem (yay!), I’m very interested to know about any roadblocks you hit along the way that do fall squarely on Siesta’s side of the line.

@annicaburns

That works for me, Paul. We will plan to write an encrypted implementation of EntityCache for now, follow the roadmap you laid out above... and wait patiently for v1.0 with the solution to our Realm problem. Again... many thanks.

@jyounus

jyounus commented Feb 27, 2016

Hey there, was wondering if there has been any progress with the Realm integration, and if/when to expect a sneak peek? Thanks.

@pcantrell (Member Author)

@jyounus Well, I don’t have a paying client for Siesta improvements at the moment, so it’s (alas) a spare-time project, taking a backseat to teaching and other urgent things. But this feature is definitely high on my list! I will post to this issue when I have something testable.

@jyounus

jyounus commented Feb 29, 2016

No worries, I was just wondering if this got anywhere. Appreciate all the work and efforts that have and are being put into this project! :)

@pcantrell (Member Author)

pcantrell commented Jun 3, 2016

After some head-scratching and light prototyping, I have a design proposal for this.

Background

The challenge here is that the transformer pipeline currently is just an unstructured (and uninspectable) array, and depends on users writing configuration so that transformers get added in the order they should run, with no opportunity to insert or replace specific ones. This design was already stretched a little thin, but this issue clearly breaks it: clients need to be able to say “add cache X at point Y in the pipeline” where Y is not necessarily the end.

We therefore need some way to say “point Y in the pipeline.” Because transformers are usually structs, using instance equality is not an option. Because they are often closures wrapped in a single ResponseContentTransformer type, using dynamic type information is not an option. There’s no natural way to say “the JSON transformer that’s in the pipeline.”

I originally considered letting clients specify an optional ID when adding a transformer. However, this turns out to be fairly awkward at the point of use, and is also brittle.

Proposal

Concept

The big API change is that the pipeline would have a sequence of identifiable named stages:

[image: siesta pipeline]

Clients can customize the set of stages and their order.

Each stage has zero or more transformers:

[image: siesta pipeline 2]

…and a cache:

[image: siesta pipeline 3]

During response processing, each cache receives an entity for writing after all the stage’s transformers have run.

When reinflating a new resource, Siesta checks each cache in turn starting with the end of the pipeline. If there is a cache hit, Siesta takes whatever the cache returned and runs it through the pipeline’s subsequent stages.
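As a toy model of that reinflation rule (not Siesta's implementation), with strings standing in for entities:

```swift
// Toy model of cache-hit reinflation: walk the stages from the end of the
// pipeline; on the first hit, run only the *subsequent* stages' transforms.
struct Stage {
    let transform: (String) -> String
    var cachedValue: String?  // stands in for the stage's optional cache
}

func reinflate(stages: [Stage]) -> String? {
    for hitIndex in stride(from: stages.count - 1, through: 0, by: -1) {
        guard var value = stages[hitIndex].cachedValue else { continue }
        // The cached value is the *output* of stage hitIndex, so resume
        // processing with the stages after it.
        for stage in stages[(hitIndex + 1)...] {
            value = stage.transform(value)
        }
        return value
    }
    return nil  // no cache hit anywhere: fall back to the network
}
```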

API

The opaque resourceTransformers property would go away. Instead, you would configure transformers by attaching them to a pipeline stage:

service.configure {
    $0.config.pipeline[.parsing].add(SwiftyJSONTransformer, contentTypes: ["*/json"])
    $0.config.pipeline[.cleanup].add(GithubErrorMessageExtractor())
}

Identifying transformers by stage also makes it possible to remove or replace specific transformers in the pipeline, a frequently requested feature for model transformers. To this end, Service.configureTransformer(…) will no longer always append the transformer; instead, by default it will replace the transformer at the model stage:

service.configureTransformer("/users/*") {
    User(json: $0.content)
}

// ...is shorthand for:

service.configureTransformer("/users/*", atStage: .model, replaceExisting: true) {
    User(json: $0.content)
}

// ...which is shorthand for:

service.configure("/users/*") {
    $0.config.pipeline[.model].removeTransformers()
    $0.config.pipeline[.model].add(
        ResponseContentTransformer {
            User(json: $0.content)
        })
}

@MPiccinato, I think this would nicely solve #47?

service.configureTransformer("/products/*") { Product(...) }
service.configureTransformer("/products/filters") { Filter(...) }  // overrides line above

Each stage has an optional cache:

service.configure {
    $0.config.pipeline[.rawData].cache = encryptedFileCache
    $0.config.pipeline[.model].cache = realmCache
}

Clients can create custom stages with an arbitrary order:

service.configure {
    $0.config.pipeline.order = [.munging, .twiddling, .blending, .baking]
}

My tentative list of default stages:

  • rawData: unprocessed data (to attach cache; wouldn’t typically have transformers at this stage)
  • decoding: bytes → bytes: decryption, decompression, etc.
  • parsing: bytes → ADT / general type: JSON, UIImage, etc.
  • model: general object → model
  • cleanup: catch-all for stuff at the end (Maybe “postprocessing” instead?)
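One way to keep that stage list open to app-defined additions like .munging (a sketch of the idea; the actual representation may differ) is a struct-based key rather than a closed enum:

```swift
// Sketch: stage identifiers as an extensible key type, so apps can add
// their own stages alongside the defaults. Illustrative names only.
struct PipelineStageKey: Hashable {
    let name: String
}

extension PipelineStageKey {
    static let rawData  = PipelineStageKey(name: "rawData")
    static let decoding = PipelineStageKey(name: "decoding")
    static let parsing  = PipelineStageKey(name: "parsing")
    static let model    = PipelineStageKey(name: "model")
    static let cleanup  = PipelineStageKey(name: "cleanup")
}

// The default order from the list above:
let defaultOrder: [PipelineStageKey] = [.rawData, .decoding, .parsing, .model, .cleanup]

// An app-defined custom stage:
extension PipelineStageKey {
    static let munging = PipelineStageKey(name: "munging")
}
```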

Thanks everyone for your patience on this. I have a precious window of time right now to move fast on implementing this, if we can get to a satisfactory design quickly. Please send your reactions and nitpicks!

@pcantrell (Member Author)

I’ve pushed a draft implementation of this to the structured-pipeline branch. It’s a work in progress, but I think it’s testable if you want to experiment with it.

@Alex293 (Contributor)

Alex293 commented Oct 7, 2016

Has anyone made some kind of working entity cache with Realm? Or a Realm extension?
