Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow uncached context - tag: 'static' is hard coded in resolver #500

Closed
F-Node-Karlsruhe opened this issue Oct 18, 2022 · 15 comments
Closed

Comments

@F-Node-Karlsruhe
Copy link

F-Node-Karlsruhe commented Oct 18, 2022

For a db based document loader it is not possible to do changes to the context after it was first fetched, as it gets cached in the document loader.

The context resolver https://github.com/digitalbazaar/jsonld.js/blob/20e2286d198ce7d376320306f9df3667f48a4544/lib/ContextResolver.js has the tag flag hard coded and set to 'static' what disallows personal configuration (passing a tag has no effect)

this._cacheResolvedContext({key, resolved, tag: 'static'});

Disclaimer: We are aware that a context should not change, at least not frequently. That's why changing the context in the db is only allowed with strict constraints. Nevertheless one should give the user the option to not cache the context within the resolver.
Possible workaround: implement own document loader from scratch -> not preferred

Edit: this._cacheResolvedContext({key: url, resolved, tag: remoteDoc.tag}); exists, but has no effect

@dlongley
Copy link
Member

The line quoted (which I believe can be seen here: https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L70-L76), appears to cache an inline context (not remote) context. The key used in the cache is a JSON stringification of the context object itself, meaning, it is a content-identifier -- and it would change if the context changed.

When a remote context (or really, just a context identified by a URL) is fetched, the code path goes through:

https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L45-L51

and then:

https://github.com/digitalbazaar/jsonld.js/blob/v8.1.0/lib/ContextResolver.js#L127

Where the remote document that represents the context may optionally provide a custom tag:

this._cacheResolvedContext({key: url, resolved, tag: remoteDoc.tag});

In other words, if you use a URL to refer to a context (whether you ultimately load the context from the Web, a database, local memory, whatever source you want), then whatever the document loader specifies in the returned remote document will be used when caching. If a context is specified inline, i.e., no URL has been given by which to refer to it, then all we can do is stringify it and use that as the cache key. But, as noted above, if such a context is different in any way from a previously cached context, the cached value will not be used.

Perhaps it is the case that there is some combination of inline and remote contexts that are creating the problem you're seeing, where an inline context refers to a remote context that doesn't use the static tag -- and the cache code isn't smart enough to separate those to allow for changes to that inner context. But that's just a guess.

Could you provide a simple, runnable test case that would allow someone to help confirm and fix the issue?

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

Thanks for the fast reply :)

I dug a little deeper and found out that the documents do not actually get stored in this.sharedCache but rather in the this.perOpCachewhich is not configurable afaik. and gets called/set every request. So a change during operation has no effect even with a not 'static' tag because the resolved document is cached anyway.

Is there a way to avoid the perOpCache for certain documents or even disable it completely while using the sharedCache only? The trick would be to not provide a url but just the context itself?

No runnable example yet, but i will provide one if the issue turns out more difficult than what i just described.

Edit: the url which is used as a key in the cache does not get changed, just the content.

@dlongley
Copy link
Member

dlongley commented Oct 18, 2022

Is there a way to avoid the perOpCache for certain documents or even disable it completely while using the sharedCache only?

Hmm, I don't think so. The perOpCache ensures that there's a consistent view of every context during an operation. It would be problematic for contexts to be quantum values that could change each time it was accessed during the same isolated operation. It ensures that, for example, when running a JSON-LD compaction operation, the context values used in the initial and underlying expand transformation will be the same as those used in the following compact transformation -- if the same context URLs are referenced of course. Otherwise, you can get indeterminate, confusing, or unexpected results.

This is very similar to how databases work with properties like "read isolation", ensuring that there's local consistency and dependable invariants for a given query / operation / transaction.

@F-Node-Karlsruhe
Copy link
Author

True that. I guess i will need to write a custom workaround because shutting down the whole service in order to get rid of the perOpCache when a context gets altered is not a viable option :D

@dlongley
Copy link
Member

dlongley commented Oct 18, 2022

@F-Node-Karlsruhe,

What kind of problem is actually caused? It seems like you ought to treat an operation that started prior to the context being updated as one that will also finish before that update is "seen". In other words, in any partitioned system there are going to be these sorts of situations -- why is whatever outcome you're seeing being considered a bug / problem instead of expected behavior?

Once the operation completes and another is run after the context update, that operation sees the update, right?

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

When adding a property to a context in order to sign a verifiable credential with that context after having signed one before (context was loaded), the new value of the context is not found, even though it is present in the db and served from the respective context url. The signature library only uses the cached version of the context and naturally throws an error as the new context field is unknown. As far as i understand the perOpCache is kept in memory as long as the application runs (a bit confused what per op means). As an effect the only way to get rid of the perOpCache is to restart the service

Edit: this will only be the case in the beginning, when contexts are created, but we still want allow the users to manage their contexts on their own in a restricted way

@dlongley
Copy link
Member

dlongley commented Oct 18, 2022

As far as i understand the perOpCache is kept in memory as long as the application runs (a bit confused what per op means). As an effect the only way to get rid of the perOpCache is to restart the service.

By default, each time a JSON-LD operation is executed, a new ContextResolver instance is created that has a new perOpCache. After the operation completes, that instance is discarded. So it should be the case that the perOpCache isn't being reused across multiple operations -- if it is, that does sound like a bug that should be addressed.

Could it be some other cache (external to jsonld.js) that is getting in the way? IMO, what you're describing should work with jsonld.js without any changes.

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

That sounds reasonable. Otherwise perOpCache would have been just the same as the sharedCache

I'm using pretty much the example straight out of the box. It's unclear where exactly the ContextResolver gets invoked or if there are other instances/caches

Signing with digitalbazaar suite

// @ts-ignore
import { issue } from '@digitalbazaar/vc';
// @ts-ignore
import { Ed25519VerificationKey2020 } from '@digitalbazaar/ed25519-verification-key-2020';
// @ts-ignore
import { Ed25519Signature2020 } from '@digitalbazaar/ed25519-signature-2020';

async signVC(credential: any) {

        const keyPair = await Ed25519VerificationKey2020.from(this.secrets);

        let suite = new Ed25519Signature2020({
            verificationMethod: `${this.did}#${this.didDocument.verificationMethod[0].id}`,
            key: keyPair
        }
        );

        const signedVC = await issue({ credential, suite, documentLoader });

        return signedVC;
    }

Document Loader

const documentLoader: Promise<any> = jsonldSignatures.extendContextLoader(async (url: string) => {

    if (url.startsWith(BASE_CONTEXT_URL)) {

        const context: any = Contexts[url];

        if (context !== undefined) {
            return {
                contextUrl: null,
                documentUrl: url,
                document: context,
                tag: 'static'
            };
        }

        // Lookup custom client context in db -> this gets called every time, but still it obviously uses the cached version instead!
        const customContext: any = await MongoDbService.getContext(url.split(BASE_CONTEXT_URL)[1]);

        if (customContext !== undefined) {
            return {
                contextUrl: null,
                documentUrl: url,
                document: buildCredentialContextDocument(customContext),
                tag: undefined // funnily it caches anything that is not undefined
            };
        }
    }

    console.warn(`Fetched @context from ${url}. Use with care!`)

    const document = await fetch(url);

    return {
        contextUrl: null,
        documentUrl: url,
        document: await document.json(),
        tag: 'static'
    };
});

Error

JsonLdError [jsonld.ValidationError]: Safe mode validation error.
details: {
    event: {
      type: [ 'JsonLdEvent' ],
      code: 'invalid property',
      level: 'warning',
      message: 'Dropping property that did not expand into an absolute IRI or keyword.',
      details: { property: 'newProp', expandedProperty: 'newProp' }
    }
  }

Restarting the service resolves the issue

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

Good news. it appears not be a caching issue even though it actually has to be (?). I tracked the cache's content in the ContextResolver and both behave just as expected. Maybe there is another location where the context might be cached? Nevertheless, the issue remains and i will keep searching the root cause, why it cannot expand the new property in the context.

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

I found the issue! :) https://github.com/digitalbazaar/jsonld.js/blob/04cdf49b2ed02b4b787a3bdff63dd60118cf2c6c/lib/ContextResolver.js line 76

// context is an object, get/create `ResolvedContext` for it
      const key = JSON.stringify(ctx);
      let resolved = this._get(key);
      if(!resolved) {
        // create a new static `ResolvedContext` and cache it
        resolved = new ResolvedContext({document: ctx});
        this._cacheResolvedContext({key, resolved, tag: 'static'});
      }

When not in cache (_get returns undefined) it creates a new document and caches it with tag: 'static' for some reason. replacing 'static' with undefined solves my issue. But that ain't a fix. is the code faulty or is there a reason it caches static in this case?

Edit: that also explains why i did not find anything when investigating the cache. i only looked into the _get method which comes before the suspicious line

@dlongley
Copy link
Member

So it sounds like we might be back to where we started -- where my guess as to the problem was:

Perhaps it is the case that there is some combination of inline and remote contexts that are creating the problem you're seeing, where an inline context refers to a remote context that doesn't use the static tag -- and the cache code isn't smart enough to separate those to allow for changes to that inner context. But that's just a guess.

Can you confirm that the context that is causing trouble references other contexts in it by URL?

If so, ideally a simple test case can be constructed with an inline context that references another context by URL and a document loader that will load that context but change it the second time it is fetched. Then if two operations are run using the same inline context, the test should ensure that the output is different (in the appropriate way).

A simple fix would be to only treat inline contexts as static per operation (i.e., set tag to undefined), but, of course, that would disable shared cache reuse for inline contexts. A better fix would only mark such contexts static when they only had static dependencies (or no dependencies at all). I'm not sure how much benefit we're getting from treating inline contexts as always static right now -- but it could be that it is significant.

cc: @davidlehn

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 18, 2022

I can confirm that. It both is composed of an array where the first context is referenced by URL and it uses @import within the context, where the imported context is referenced by URL as well.

If so, ideally a simple test case can be constructed with an inline context that references another context by URL and a document loader that will load that context but change it the second time it is fetched. Then if two operations are run using the same inline context, the test should ensure that the output is different (in the appropriate way).

This precisely describes my current implementation in this case. I will check the fixes tomorrow in order to confirm the finding :)

Edit: i just realized that these fixes are rather on the library side. Any short term advice for fixes/improvements on my side?

@F-Node-Karlsruhe
Copy link
Author

I fixed it as you proposed by not giving an url and using the stringified version of the context in cache. Thank you :)

@F-Node-Karlsruhe
Copy link
Author

F-Node-Karlsruhe commented Oct 21, 2022

Bad news. Setting documentUrl: url to null or undefined did not fix the issue. I thought it was fixed because the line tag: 'static' was still set to undefined in my local jsonld library, my initial "fix"

I found the issue! :) https://github.com/digitalbazaar/jsonld.js/blob/04cdf49b2ed02b4b787a3bdff63dd60118cf2c6c/lib/ContextResolver.js line 76

// context is an object, get/create `ResolvedContext` for it
      const key = JSON.stringify(ctx);
      let resolved = this._get(key);
      if(!resolved) {
        // create a new static `ResolvedContext` and cache it
        resolved = new ResolvedContext({document: ctx});
        this._cacheResolvedContext({key, resolved, tag: 'static'});
      }

@F-Node-Karlsruhe
Copy link
Author

More specific issue #537

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants