Handle keys dropped from JSON-LD document during expansion (closes #50) #186

Closed
6 changes: 6 additions & 0 deletions .gitignore
@@ -14,3 +14,9 @@ lib/PyLD.egg-info
 profiler
 tests/test_caching.py
 tests/data/test_caching.json
+
+# Local Python version with `pyenv`
+.python-version
+
+# PyCharm & other JetBrains IDEs
+.idea
11 changes: 11 additions & 0 deletions .gitmodules
@@ -0,0 +1,11 @@
[submodule "specifications/json-ld-api"]
path = specifications/json-ld-api
url = git@github.com:w3c/json-ld-api.git

[submodule "specifications/json-ld-framing"]
path = specifications/json-ld-framing
url = git@github.com:w3c/json-ld-framing.git

[submodule "specifications/rdf-canon"]
path = specifications/rdf-canon
url = git@github.com:w3c/rdf-canon.git
17 changes: 11 additions & 6 deletions README.rst
@@ -190,16 +190,21 @@ Tests
 This library includes a sample testing utility which may be used to verify
 that changes to the processor maintain the correct output.
 
-To run the sample tests you will need to get the test suite files by cloning
-the ``json-ld-api``, ``json-ld-framing``, and ``normalization`` repositories
-hosted on GitHub:
+To run the sample tests you will need the test suite files provided in the ``json-ld-api``,
+``json-ld-framing``, and ``rdf-canon`` repositories hosted on GitHub:
 
 - https://github.com/w3c/json-ld-api
 - https://github.com/w3c/json-ld-framing
-- https://github.com/json-ld/normalization
+- https://github.com/w3c/rdf-canon
 
-If the suites repositories are available as sibling directories of the PyLD
-source directory, then all the tests can be run with the following:
+They are included as Git submodules under the ``specifications`` directory of this repository. By default,
+``git clone`` does not retrieve submodules; to download them, issue the following command:
+
+.. code-block:: bash
+
+    git submodule update --init --recursive
+
+Once the test suite repositories are available, all the tests can be run with the following:
 
 .. code-block:: bash
 
30 changes: 30 additions & 0 deletions docs/decisions/on-key-dropped-argument.md
@@ -0,0 +1,30 @@
---
$id: on-key-dropped-argument
title: Pass on_key_dropped as a named argument to expand()
date: 2023-11-12
author: anatoly-scherbakov
issue: 50
adr:is-blocked-by: on-key-dropped-handler
---

# Pass `on_key_dropped` as a named argument to `expand()`

## Context

We need a way to pass the `on_key_dropped` handler to `jsonld.expand()`.

### :x: Use `options` dictionary

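Passing a handler inside the free-form `options` dictionary contradicts Python conventions: the parameter would not appear in the signature of `expand()`, so IDEs and type checkers could not see it.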

### :heavy_check_mark: Add a named argument

This is what a Python developer would expect in most cases.

## Decision

Add `on_key_dropped` as a named argument to `expand()`.
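
A minimal sketch of why a named argument is easier to discover, using two hypothetical stand-in functions rather than PyLD's real `expand()`: only a parameter declared in the signature is visible to `inspect.signature()`, which is the same information IDEs, `help()` and type checkers work from.

```python
import inspect
from typing import Callable, Optional

# Hypothetical handler type, mirroring the shape used in this PR.
OnKeyDropped = Callable[[Optional[str]], None]


def ignore(key: Optional[str]) -> None:
    """Placeholder handler that does nothing."""


def expand_via_options(input_, options=None):
    """Rejected shape: the handler hides inside a free-form dict."""
    options = options or {}
    handler = options.get('on_key_dropped', ignore)
    # Stub: return the resolved handler so the two shapes can be compared.
    return input_, handler


def expand_via_argument(input_, options=None,
                        on_key_dropped: OnKeyDropped = ignore):
    """Chosen shape: the handler is an explicit, typed parameter."""
    return input_, on_key_dropped


# Only the named-argument version exposes the handler in its signature.
print('on_key_dropped' in inspect.signature(expand_via_options).parameters)   # False
print('on_key_dropped' in inspect.signature(expand_via_argument).parameters)  # True
```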

## Consequences

Improve developer experience, even though it is somewhat inconsistent: the other settings of `expand()` are still passed via the `options` dictionary.
69 changes: 69 additions & 0 deletions docs/decisions/on-key-dropped.md
@@ -0,0 +1,69 @@
---
$id: on-key-dropped-handler
title: Call a customizable handler when a key is ignored during JSON-LD expansion
author: anatoly-scherbakov
date: 2023-11-12
issue: 50
---

# Call a customizable handler when a key is ignored during JSON-LD expansion

## Context

If a key in a JSON-LD document does not map to an absolute IRI and is not a keyword, it is ignored during expansion. Knowing about such keys is valuable for debugging, so ignoring them silently is undesirable.
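
To make the problem concrete, here is a heavily simplified sketch of that rule. `toy_expand()` is hypothetical: it only looks at top-level keys, treats any `@`-prefixed key as a keyword, and approximates "absolute IRI" as "starts with a URI scheme followed by `:`"; it is not PyLD's implementation.

```python
import re
from typing import Dict

# Approximate absolute-IRI test: a URI scheme followed by ':'.
# Illustration only; not PyLD's _is_absolute_iri().
_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*:')


def toy_expand(document: Dict[str, object],
               context: Dict[str, str]) -> Dict[str, object]:
    """Hypothetical expansion of top-level keys only."""
    result = {}
    for key, value in document.items():
        # Keywords stay as they are; terms are looked up in the context.
        expanded = key if key.startswith('@') else context.get(key, key)
        if expanded.startswith('@') or _SCHEME.match(expanded):
            result[expanded] = value
        # Any other key is dropped here without a trace.
    return result


context = {'name': 'http://schema.org/name'}
doc = {'@id': 'http://example.org/alice', 'name': 'Alice', 'nickname': 'Al'}
print(toy_expand(doc, context))
# {'@id': 'http://example.org/alice', 'http://schema.org/name': 'Alice'}
```

The `nickname` key disappears from the output, and nothing tells the author of the document that it happened.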

Essentially, we need to give the developer a means to react to each ignored key. How?

### Use cases

* Simplify debugging of JSON-LD files,
* Alert about ignored keys in IDEs/editors/linters,
* …

### :x: Raise an `Exception`

```mermaid
graph LR
    subgraph any ["JSON-LD documents in the wild"]
        arbitrary("Extra key<br>in a JSON-LD document") --implies--> invalid("The document<br>is now invalid")
    end

    raise{"Raise an <code>Exception</code>"} --> each("On each ignored key") --> any --> impractical("That is<br>impractical") --> failure("Failure")
```

### :x: Just log every key

Does not support the <strong>Alert about ignored keys in IDEs/editors/linters</strong> use case.

### :x: Export the set of ignored keys as part of `expand()` return value

```mermaid
graph LR
when{"When to export?"} --always--> waste("Waste RAM") --> failure("Failure")
when --"only when requested"--> typing("Change return value type<br>based on inputs") --contradicting--> typing-system("Python typing system") --> failure
```

### :x: Export the set of ignored keys in a mutable argument to `expand()`

The author of this document

* believes this approach contradicts Python conventions and practice,
* does not know of any popular Python libraries using such an approach,
* is certain that developers will not praise this API.

### :heavy_check_mark: Call a handler on each ignored key

* It enables the developer to process each ignored key as they see fit.
* Passing a callable is common practice in Python (see the built-in `map` function, for instance).

Let's call the handler `on_key_dropped`.
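
A minimal sketch of the chosen design, with a hypothetical `toy_expand()` standing in for `jsonld.expand()` and its `on_key_dropped` argument. The return type stays the same whatever the handler does, and a caller who does want the set of ignored keys (one of the rejected options above) can still build it by passing `list.append` as the handler. Unlike this toy, which reports the original key, the code in this PR passes the *expanded* property, which may be `None`.

```python
from typing import Callable, Dict, List, Optional

OnKeyDropped = Callable[[Optional[str]], None]


def toy_expand(document: Dict[str, object], context: Dict[str, str],
               on_key_dropped: OnKeyDropped) -> Dict[str, object]:
    """Hypothetical stand-in for expand(): keeps keywords and keys the
    context maps to an IRI, and reports every other key to the handler."""
    result = {}
    for key, value in document.items():
        if key.startswith('@'):
            result[key] = value
        elif key in context:
            result[context[key]] = value
        else:
            on_key_dropped(key)  # one call per dropped key
    return result


# The caller decides what to do; here, collect the keys in a list.
dropped: List[Optional[str]] = []
expanded = toy_expand(
    {'@id': 'http://example.org/alice', 'name': 'Alice',
     'nickname': 'Al', 'age': 42},
    {'name': 'http://schema.org/name'},
    on_key_dropped=dropped.append,
)
print(dropped)  # ['nickname', 'age']
```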


## Decision

Pass a callable named `on_key_dropped` to `jsonld.expand(…)`.

## Consequences

Simplify debugging and permit custom handling of ignored keys in application code.
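
As an example of such custom handling: a project that prefers to fail fast (the behaviour rejected above as a *default*) could opt into it with a strict handler. `DroppedKeyError` and `raise_on_key_dropped` are hypothetical names, not part of PyLD; with the API from this PR the handler would be passed as `jsonld.expand(document, on_key_dropped=raise_on_key_dropped)`, and an exception raised by it would presumably interrupt expansion.

```python
from typing import Optional


class DroppedKeyError(ValueError):
    """Hypothetical exception type for a strict-mode handler."""


def raise_on_key_dropped(key: Optional[str]) -> None:
    """Handler that turns every dropped key into an error.

    `key` is whatever expand() reports; in this PR it is the expanded
    property, which may be None.
    """
    raise DroppedKeyError(
        f'Key {key!r} was not mapped to an absolute IRI and was ignored.'
    )


try:
    raise_on_key_dropped('nickname')
except DroppedKeyError as error:
    print(error)
    # Key 'nickname' was not mapped to an absolute IRI and was ignored.
```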
5 changes: 5 additions & 0 deletions docs/decisions/resolution.yaml
@@ -0,0 +1,5 @@
adr:resolution:
  - decision: on-key-dropped-argument
    by: anatoly-scherbakov
    date: 2023-11-12
    status: accepted
70 changes: 56 additions & 14 deletions lib/pyld/jsonld.py
@@ -14,6 +14,7 @@
 .. moduleauthor:: Gregg Kellogg <gregg@greggkellogg.net>
 """
 
+import logging
 import copy
 import hashlib
 import json
@@ -22,6 +23,8 @@
 import traceback
 import warnings
 import uuid
+from typing import Optional, Callable
+
 from .context_resolver import ContextResolver
 from c14n.Canonicalize import canonicalize
 from cachetools import LRUCache
@@ -32,6 +35,8 @@
 from frozendict import frozendict
 from pyld.__about__ import (__copyright__, __license__, __version__)
 
+logger = logging.getLogger('pyld.jsonld')
+
 def cmp(a, b):
     return (a > b) - (a < b)
 
@@ -117,6 +122,19 @@ def cmp(a, b):
 # Initial contexts, defined on first access
 INITIAL_CONTEXTS = {}
 
+
+# Handler to call if a key was dropped during expansion
+OnKeyDropped = Callable[[Optional[str]], None]
+
+
+def log_on_key_dropped(key: Optional[str]):
+    """Default behavior on ignored JSON-LD keys is to log them."""
+    logger.debug(
+        'Key `%s` was not mapped to an absolute IRI and was ignored.',
+        key,
+    )
+
+
 def compact(input_, ctx, options=None):
     """
     Performs JSON-LD compaction.
@@ -142,7 +160,11 @@ def compact(input_, ctx, options=None):
     return JsonLdProcessor().compact(input_, ctx, options)
 
 
-def expand(input_, options=None):
+def expand(
+    input_,
+    options=None,
+    on_key_dropped: OnKeyDropped = log_on_key_dropped,
+):
     """
     Performs JSON-LD expansion.
 
@@ -157,10 +179,17 @@ def expand(input_, options=None):
         defaults to 'json-ld-1.1'.
       [documentLoader(url, options)] the document loader
         (default: _default_document_loader).
+    :param [on_key_dropped] Callable to invoke for every JSON-LD key that was
+        ignored.
 
     :return: the expanded JSON-LD output.
     """
-    return JsonLdProcessor().expand(input_, options)
+    return JsonLdProcessor(
+        on_key_dropped=on_key_dropped,
+    ).expand(
+        input_=input_,
+        options=options,
+    )
 
 
 def flatten(input_, ctx=None, options=None):
@@ -645,18 +674,23 @@ def unparse_url(parsed):
     return rval
 
 
-class JsonLdProcessor(object):
+class JsonLdProcessor:
     """
     A JSON-LD processor.
     """
 
-    def __init__(self):
+    def __init__(self, on_key_dropped: OnKeyDropped = log_on_key_dropped):
         """
         Initialize the JSON-LD processor.
+
+        :param [on_key_dropped] Callable to invoke for every JSON-LD key that
+            was ignored.
         """
         # processor-specific RDF parsers
         self.rdf_parsers = None
 
+        self.on_key_dropped = on_key_dropped
+
     def compact(self, input_, ctx, options):
         """
         Performs JSON-LD compaction.
@@ -2191,10 +2225,15 @@ def _compact(self, active_ctx, active_property, element, options):
         return element
 
     def _expand(
-            self, active_ctx, active_property, element, options,
-            inside_list=False,
-            inside_index=False,
-            type_scoped_ctx=None):
+        self,
+        active_ctx,
+        active_property,
+        element,
+        options,
+        inside_list=False,
+        inside_index=False,
+        type_scoped_ctx=None,
+    ):
         """
         Recursively expands an element using the given context. Any context in
         the element will be removed. All context URLs must have been retrieved
@@ -2234,7 +2273,8 @@ def _expand(
                     active_ctx, active_property, e, options,
                     inside_list=inside_list,
                     inside_index=inside_index,
-                    type_scoped_ctx=type_scoped_ctx)
+                    type_scoped_ctx=type_scoped_ctx,
+                )
                 if inside_list and _is_array(e):
                     e = {'@list': e}
                 # drop None values
@@ -2460,10 +2500,11 @@ def _expand_object(
                 active_ctx, key, vocab=True)
 
             # drop non-absolute IRI keys that aren't keywords
-            if (expanded_property is None or
-                not (
-                    _is_absolute_iri(expanded_property) or
-                    _is_keyword(expanded_property))):
+            if expanded_property is None or (
+                not _is_absolute_iri(expanded_property)
+                and not _is_keyword(expanded_property)
+            ):
+                self.on_key_dropped(expanded_property)
                 continue
 
             if _is_keyword(expanded_property):
@@ -3411,7 +3452,8 @@ def _expand_index_map(self, active_ctx, active_property, value, index_key, as_gr
                 JsonLdProcessor.arrayify(v),
                 options,
                 inside_list=False,
-                inside_index=True)
+                inside_index=True,
+            )
 
             expanded_key = None
             if property_index:
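
The rewritten drop condition in `_expand_object()` should behave exactly like the original one; only the `self.on_key_dropped(...)` call is new. A quick sanity check, sketched here under the assumption that the results of `_is_absolute_iri()` and `_is_keyword()` can be modelled as plain booleans, enumerates every combination and compares both forms (they are related by De Morgan's law: `not (a or b)` equals `not a and not b`).

```python
from itertools import product


def old_condition(is_none: bool, is_abs: bool, is_kw: bool) -> bool:
    # Original form of the "drop this key" test.
    return is_none or not (is_abs or is_kw)


def new_condition(is_none: bool, is_abs: bool, is_kw: bool) -> bool:
    # Rewritten form from this PR.
    return is_none or (not is_abs and not is_kw)


# Exhaustively compare both forms over all 8 input combinations.
assert all(
    old_condition(*flags) == new_condition(*flags)
    for flags in product([False, True], repeat=3)
)
print('equivalent')
```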
1 change: 1 addition & 0 deletions specifications/json-ld-api
Submodule json-ld-api added at 6bf9ef
1 change: 1 addition & 0 deletions specifications/json-ld-framing
Submodule json-ld-framing added at c01b17
1 change: 1 addition & 0 deletions specifications/rdf-canon
Submodule rdf-canon added at 0503fa