JSON-LD schema.org parsing fails with JSONDecodeError("Expecting value", s, err.value) from None #1781

Closed
danbri opened this issue Mar 30, 2022 · 3 comments
Labels
format: JSON-LD Related to JSON-LD format.

Comments

@danbri

danbri commented Mar 30, 2022

I have been trying different variations to parse a JSON-LD file into a Graph, but they're all failing.

The file seems OK (I tried several) and it parses ok with the JSON-LD playground. I tried a few variations for invoking the parser.

This was after entirely nuking and reinstalling Python/Anaconda, in a fresh Conda environment (python=3.8) with only "pip3 install rdflib", i.e. with no ageing copy of the old plugin version of the parser hanging around.

parsejsonld_A.py

#!/usr/bin/env python3
from rdflib import Graph
if __name__ == '__main__':
    fn = "example1.jsonld"
    g = Graph()
    g.parse(fn, format="json-ld")

parsejsonld_B.py

#!/usr/bin/env python3

from rdflib import Graph
g = Graph().parse("example1.jsonld", format="json-ld")
g.serialize("test-jsonld.nt", format="nt")

parsejsonld_C.py

#!/usr/bin/env python3
from rdflib import Graph
g = Graph()
g.parse(location = "file:feedkgx/example1.jsonld")
print(len(g))

The example file is just taken from Google documentation, see this Gist.

In each case I get this response:

./parsejsonld_A.py

Traceback (most recent call last):
  File "./parsejsonld_A.py", line 8, in <module>
    g.parse(fn, format="json-ld")
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/graph.py", line 1258, in parse
    parser.parse(source, self, **args)  # type: ignore[call-arg]
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 125, in parse
    to_rdf(data, conj_sink, base, context_data, version, generalized_rdf)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 144, in to_rdf
    return parser.parse(data, context, dataset)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/parsers/jsonld.py", line 164, in parse
    context.load(local_context, context.base)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 357, in load
    self._prep_sources(base, source, sources, referenced_contexts)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 381, in _prep_sources
    new_ctx = self._fetch_context(
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/context.py", line 413, in _fetch_context
    source = source_to_json(source_url)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/site-packages/rdflib/plugins/shared/jsonld/util.py", line 43, in source_to_json
    return json.load(use_stream)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/anaconda3/envs/feedkgx/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

As far as I can tell, this has something to do with the Schema.org @context URL and our migration from http://schema.org/ + conneg to https://schema.org/ with a JSON-LD 1.1-style HTTP header as the discovery mechanism for the context. But the error message is pretty uninformative.

If I change the schema.org context in the files to avoid a remote context, it parses.
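A minimal sketch of that workaround, assuming the files reference the context by a plain string such as "https://schema.org". The helper names (`inline_schema_context`, `_replace_context`) and the set of accepted URL spellings are hypothetical, not part of RDFLib. It swaps the remote context for an inline `@vocab` mapping so that no network fetch is needed. This only approximates the real schema.org context, which defines more than a vocabulary prefix.

```python
import json

# Assumption: the string forms under which a document may reference the context.
SCHEMA_ORG_CONTEXTS = {
    "http://schema.org", "http://schema.org/",
    "https://schema.org", "https://schema.org/",
}
# Assumption: an inline stand-in for the remote context (terms resolve via @vocab only).
INLINE_CONTEXT = {"@vocab": "https://schema.org/"}


def _replace_context(ctx):
    """Swap a remote schema.org context reference for the inline stand-in."""
    if isinstance(ctx, str) and ctx in SCHEMA_ORG_CONTEXTS:
        return dict(INLINE_CONTEXT)
    if isinstance(ctx, list):
        return [_replace_context(item) for item in ctx]
    return ctx  # leave inline or unrelated contexts untouched


def inline_schema_context(node):
    """Return a copy of a parsed JSON-LD value with schema.org contexts inlined."""
    if isinstance(node, list):
        return [inline_schema_context(item) for item in node]
    if not isinstance(node, dict):
        return node
    return {
        key: _replace_context(value) if key == "@context" else inline_schema_context(value)
        for key, value in node.items()
    }


doc = json.loads('{"@context": "https://schema.org", "@type": "Person", "name": "Jane"}')
patched = inline_schema_context(doc)
print(json.dumps(patched, indent=2))
```

The patched document can then be handed to RDFLib as a string, for example `Graph().parse(data=json.dumps(patched), format="json-ld")`.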

The context lives here:

curl -s --head https://schema.org/ | grep 'link:'
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
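To illustrate what following that header involves, here is a rough sketch that extracts the rel="alternate" JSON-LD target from a Link header value and resolves it against the requested URL. The function name `find_jsonld_context_link` is hypothetical, the parsing is deliberately simplified (it does not handle every legal Link header), and it is not how RDFLib implements the lookup.

```python
import re
from urllib.parse import urljoin


def find_jsonld_context_link(link_header, base_url):
    """Return the absolute URL of the rel="alternate" JSON-LD link, or None."""
    # Split on commas that are followed by the "<" starting a new link value.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not match:
            continue
        target, raw_params = match.groups()
        # Collect ;name=value parameters, with or without quotes.
        params = {
            name.lower(): value.strip('"')
            for name, value in re.findall(
                r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;,\s]+)', raw_params
            )
        }
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type", "").lower() == "application/ld+json":
            return urljoin(base_url, target)
    return None


# Header value copied from the curl output above.
header = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
print(find_jsonld_context_link(header, "https://schema.org/"))
```

A client would request the returned URL and load that document as the context instead of the HTML page served at the original address.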

Would a PR be welcomed on this?

e.g.

  • clearer error message
  • treat context as if it were written like this:
    "@context": { "@vocab": "https://schema.org/" },
  • or actually fetch the context doc via the link: header mechanism, as JSON-LD playground seems to do.
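As a sketch of the first suggestion, a loader could catch the decode failure and name the context URL it was trying to read. The function `parse_context_json` and the wording of the message are hypothetical; this is not RDFLib's code, only an illustration of the idea using the standard `json` module.

```python
import json


def parse_context_json(text, source_url):
    """Decode a fetched context document, naming its URL if it is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Could not load JSON-LD context from {source_url!r}: "
            f"the response is not valid JSON ({err.msg} at line {err.lineno}, "
            f"column {err.colno}). The server may have returned HTML "
            f"instead of a JSON-LD context document."
        ) from err


# Hypothetical response body: an HTML page where a JSON context was expected.
try:
    parse_context_json("\n<!DOCTYPE html>", "https://schema.org/")
except ValueError as exc:
    print(exc)
```

Chaining with `from err` keeps the original `JSONDecodeError` available in the traceback while the top-level message points at the remote context rather than at the local file.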

Related discussion: schemaorg/schemaorg#2578

@ghost

ghost commented Mar 31, 2022

Looks like RDFLib users have encountered this issue before, according to #1423 (comment) and a fix was committed just a couple of weeks ago in #1436 which should actually fetch the context doc via the link.

I just checked this using the latest master branch and got:

from rdflib import Graph

def test_jsonld_conneg():
    g = Graph().parse(location="https://gist.githubusercontent.com/danbri/0cc3fc147d6d34945d0f61dcc11bc409/raw/0aa0d1a7574495a8fe7f1297121afe921b048a8f/gistfile1.txt", format="json-ld")
    assert len(g) == 35

So, if that's actually testing your issue (not necessarily the case, given the conneg implications) then please check with the current master branch (all tests passing as of 13 hrs ago at time of response).

@RichardWallis
Contributor

Locally checked fix to identified problem in latest master branch and all seems OK.

Presume this will be in 6.2.x when it is released.

aucampia added the "format: JSON-LD" label May 17, 2022
@aucampia
Member

Closing this as 6.2.0 has been released. Please re-open if the issue persists.
