Relative URI $refs returning "unknown url type" #313

Closed
gkholman opened this issue Dec 12, 2016 · 22 comments
Labels
Invalid Not a bug, PEBKAC, or an unsupported setup

Comments

@gkholman

pythonProblem-gkholman-20161212-2130z.zip

Attached is a ZIP with a JSON instance and two JSON schemas. One uses absolute URIs for $refs (you'll have to edit them for your own environment), the other uses relative URIs. An example documenting the use of relative URIs is at https://code.tutsplus.com/tutorials/validating-data-with-json-schema-part-2--cms-25640

Using jsonschema-2.5.1-py2.7 the schema with relative URIs returns "unknown url type". The schema with absolute URIs runs successfully.

I hope this example helps. Thanks for creating this library!

. . . . . Ken

@derekwallace

Hi Ken,
I have the exact same issue.
The docs hint that it may be possible to pass in a custom RefResolver to handle this, but it's beyond me to understand how to do this. I'd need a simple working example.

What I've done is hack the code.
The error comes from this line in validators.py:
urlopen(uri).read()

Because the uri is just "", urlopen raises an exception.
(In Python 3, urlopen is imported with: from urllib.request import urlopen)
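The failure described above can be reproduced without jsonschema at all. The following is a minimal Python 3 sketch; the helper name try_fetch and the path "common/defs.json" are made up for illustration. It shows that urlopen rejects any string without a scheme before it touches the network or the file system.

```python
from urllib.request import urlopen


def try_fetch(uri):
    """Return the error message urlopen raises for `uri`, or None on success."""
    try:
        urlopen(uri).close()
    except ValueError as exc:
        # urlopen refuses strings that have no scheme such as "file:" or "http:".
        return str(exc)
    return None


# "common/defs.json" is a hypothetical relative reference.
message = try_fetch("common/defs.json")
print(message)
```

This is why a relative $ref has to be joined with an absolute base URI before it is handed to urlopen.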

This is my hack to work around the issue. I've only just come up with it, so there's no guarantee it's correct or robust, but I can now validate an instance against a schema that references other schemas.

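        import os
        # Treat the reference as a local file if it exists on disk ...
        if os.path.exists(uri):
            with open(uri, encoding='utf-8') as fh:
                result = json.load(fh)
        # ... otherwise fall back to fetching it as a URL.
        else:
            result = json.loads(urlopen(uri).read().decode("utf-8"))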

Derek

@gkholman
Author

Thank you, Derek!

This patch worked for me after I removed the encoding= argument as I was getting the "'encoding' is an invalid keyword argument for this function" error. Perhaps because I'm running 2.7.
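For readers who hit the same encoding= error: the built-in open() in Python 2.7 does not accept that keyword, but io.open() does on both Python 2.7 and Python 3. A small sketch, assuming a hypothetical helper named load_json and a temporary file created only for the demonstration:

```python
import io
import json
import os
import tempfile


def load_json(path):
    """Read a UTF-8 JSON file; io.open takes encoding= on Python 2.7 and 3."""
    with io.open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Demonstration with a throwaway file (the file name is illustrative only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "schema.json")
with io.open(path, "w", encoding="utf-8") as fh:
    fh.write(u'{"type": "object"}')

loaded = load_json(path)
print(loaded)
```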

I appreciate the input! I hope a fix of some kind is folded into the package so that I don't have to ask my users to hack their implementation.

. . . . . . Ken

@Julian
Member

Julian commented Dec 20, 2016

I'm still not sure when I'll have a second to look at this, and I'd still love some attention from people with deeper URI knowledge (#274 is probably relevant), but this specific case is probably a bug in urlparse.urldefrag, and http://bugs.python.org/issue22852 is probably relevant.

E.g.:

>>> urlparse.urldefrag("file:foo.json#foo")
('file:///foo.json', 'foo')

and

>>> urlparse.urlunparse(urlparse.urlparse("file:foo.json"))
'file:///foo.json'

which most likely has to do with the fact that file:foo.json refs aren't really valid (I'm pretty sure), and plain "foo.json" has no real meaning after you've loaded the file unless you specify a base URI (perhaps a file-based one). But yeah, this needs more investigation.

Comments welcome.

@gkholman
Author

I agree "file:foo.json" may have no meaning ... I've never been able to get something like that to work.

However, I'm quite confident that without a protocol identifier (e.g. "file:", "http:", "mailto:", etc.) at the beginning of the URI, the URI is considered relative and thus is resolved relative to the URI of the resource that is doing the pointing. Certainly in my work in XML (I founded the XML Conformance Committee in 1997) that has been the case with URI resolution.

In my attached example you will find the URIs all have no protocol identifier so I am expecting the resolution to be relative to the resource doing the pointing.
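The resolution rule described here (a reference without a scheme is resolved against the URI of the referring resource) is what urllib.parse.urljoin implements in Python 3. A short sketch, with made-up file names standing in for the real schemas:

```python
from urllib.parse import urljoin

# Hypothetical absolute URI of the schema that contains the $ref values.
base = "file:///schemas/maindoc/Invoice.json"

# A reference to a sibling directory, with a fragment.
external = urljoin(base, "../common/Components.json#/definitions/ID")

# A fragment-only reference stays inside the referring document.
local = urljoin(base, "#/definitions/Invoice")

print(external)
print(local)
```

Both results are absolute, so they can be fetched or looked up in a cache without further context.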

Certainly the modification suggested by Derek is working for me (with my modification for my environment)! Perhaps that could be folded in to the code in the short term.

Thank you, very kindly, for having thought about this issue. While I personally think it very important, as it was for Derek, I'm confident others will as well.

@Julian
Member

Julian commented Dec 20, 2016

Just to clarify: I agree about being relative to the pointing thing. My point was that once you've loaded it out of a file, there is no more scheme; you've just got an in-memory thing with no reference to what scheme it was referred to by, which is why being "relative" to that is slightly ill-defined.
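One way to picture this point: json.load returns a plain dict that carries no record of where it came from, so the caller has to keep the document's URI alongside it. A minimal sketch, assuming a hypothetical helper load_with_uri and a temporary file used only for the demonstration:

```python
import json
import tempfile
from pathlib import Path


def load_with_uri(path):
    """Load a JSON document and return it together with its file: URI."""
    resolved = Path(path).resolve()
    with resolved.open(encoding="utf-8") as fh:
        return json.load(fh), resolved.as_uri()


# Demonstration with a throwaway schema file (names are illustrative only).
tmp_dir = Path(tempfile.mkdtemp())
schema_path = tmp_dir / "schema.json"
schema_path.write_text('{"type": "object"}', encoding="utf-8")

document, base_uri = load_with_uri(schema_path)
print(base_uri)
```

The returned base_uri is the piece of information that would otherwise be lost once the file has been parsed.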

@gkholman
Author

Good point ... thanks for that clarification. Please forgive my oversight.

But isn't the "in memory thing" an amalgam of its parts? Is there utility in maintaining its fragmentation? Does referencing happen after the amalgam has been built?

Have my tests been too simple to reveal a future problem?

I have been testing handcrafted segments that will mimic the referencing of a library of 2574 declarations. This library is referenced from 81 "base" schemas.

So, for me and my users, this referencing fragmentation facility is very important.

The paper that I'm writing is referenced from and described in this post: https://lists.oasis-open.org/archives/ubl-dev/201612/msg00001.html

I will soon be publishing the results and I need to point readers to a JSON validator that supports the fragmentation. For now I can work with Derek's hack during my development.

@derekwallace

I backed out the hack I did and went with the workaround from #274.

I ended up with this code:

import json
import os

import jsonschema

with open(sInstFile, encoding='utf-8') as fh:
    dInst = json.load(fh)

with open(sSchemaFile, encoding='utf-8') as fh:
    dSchema = json.load(fh)

# 2 lines to work around the issue.
sSchemaDir = os.path.dirname(os.path.abspath(sSchemaFile))
oResolver = jsonschema.RefResolver(base_uri='file://' + sSchemaDir + '/', referrer=dSchema)

jsonschema.validate(dInst, dSchema, resolver=oResolver)

@gkholman
Author

Thank you, very kindly! That works for me as well and now I can ship a validator that works with Julian's distribution.

I appreciate all of the input from both of you! Thank you. And happy holidays!

. . . . . . . Ken

@gkholman
Author

In my work today I discovered the workaround breaks local relative references as in:

    "Invoice": {
        "description": "An invoice",
        "type": "array",
        "minItems": 1,
        "maxItems": 1,
        "items": { "$ref": "#/definitions/Invoice" }
    }
},
"definitions": {
 "Invoice":
 {
   "title": "Invoice",

There is an example at https://spacetelescope.github.io/understanding-json-schema/reference/combining.html#allof where "#/definitions/address" is used to reference a local declaration.

I'll try to conceive a patch, but thought I would post this first in case anyone immediately sees what is needed.
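For context, a fragment-only reference such as "#/definitions/Invoice" is a JSON Pointer into the document that contains it. The sketch below is not jsonschema's implementation; it is a small stand-alone illustration (the function name resolve_local_ref is made up) of how such a pointer is walked through an already-loaded schema.

```python
from urllib.parse import unquote


def resolve_local_ref(document, ref):
    """Resolve a fragment-only reference such as "#/definitions/Invoice"."""
    if not ref.startswith("#"):
        raise ValueError("not a local reference: %r" % ref)
    pointer = unquote(ref[1:])
    # An empty pointer ("#") refers to the whole document.
    tokens = pointer.split("/")[1:] if pointer else []
    target = document
    for token in tokens:
        # JSON Pointer escapes: "~1" means "/" and "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(target, list):
            target = target[int(token)]
        else:
            target = target[token]
    return target


schema = {
    "properties": {"Invoice": {"items": {"$ref": "#/definitions/Invoice"}}},
    "definitions": {"Invoice": {"title": "Invoice"}},
}
resolved = resolve_local_ref(schema, "#/definitions/Invoice")
print(resolved)
```

A resolver that is given a base URI but no copy of the referring document has nothing to walk this pointer through, which is consistent with the breakage reported above.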

@gkholman
Author

This appears to support both internal and relative URIs:

import json
import os

import jsonschema

schema = json.loads(schemaHandle.read())
schemaAbs = 'file://' + os.path.abspath(schemaName)


class fixResolver(jsonschema.RefResolver):
    def __init__(self):
        jsonschema.RefResolver.__init__(self,
                                        base_uri=schemaAbs,
                                        referrer=None)
        # Register the already-loaded schema under its absolute URI.
        self.store[schemaAbs] = schema


newResolver = fixResolver()

jsonschema.validate(instance, schema, resolver=newResolver)

I'm not sure if there is a better patch or not because my tests are quite limited so far. I'll continue with my JSON project and if I find the above doesn't work for me in all my personal situations, I will post accordingly.

I hope this input is considered helpful.

@gkholman
Author

gkholman commented Jan 5, 2017

Here is another puzzler that I can't (yet) wrap my head around:

This is the program running under Mac/BSD:

~/z/data/kendata/dev/ubl/json/raw-20170104-2350z/val $ python jsonvalidate.py ../json-schema/maindoc/UBL-TransportationStatus-2.1.json ../json/MyTransportationStatus.json 
Validation successful
~/z/data/kendata/dev/ubl/json/raw-20170104-2350z/val $ 

This is the program running under VirtualBox/Windows10 from the very same shared directory:

z:\data\kendata\dev\ubl\json\raw-20170104-2350z\val>python jsonvalidate.py ..\json-schema\maindoc\UBL-TransportationStatus-2.1.json ..\json\MyTransportationStatus.json
Resolution error: <urlopen error [Error 2] The system cannot find the file specified: u'\\..\\common\\UBL-CommonBasicComponents-2.1.json'>

z:\data\kendata\dev\ubl\json\raw-20170104-2350z\val>

Because it is a shared directory, I know it is the same copy of the program running on the same data.

So I conclude the resolution error is coming from inside the library and not from my code. The leading "\" is suspect.

Can anyone think of why the resolver isn't working on Windows? Again, I'll look at it myself, but I'm hoping someone will have an "aha!" moment before I get around to fixing this on my own.

Thanks!

. . . . . . Ken

@gkholman
Author

gkholman commented Jan 5, 2017

(Edited to expose the resolution_scope value)

Next piece of evidence on this issue is related to this code (my diagnostic print directives added):

def resolve(self, ref):
    print >>sys.stderr,"scope:*"+self.resolution_scope+"*"
    print >>sys.stderr,"ref:*"+ref+"*"
    url = self._urljoin_cache(self.resolution_scope, ref)
    print >>sys.stderr,"url:*"+url+"*"
    return url, self._remote_cache(url)

In Mac/BSD an ancestral relative reference is correctly being resolved:

scope:*file:///Users/admin/z/data/kendata/dev/ubl/json/raw-20170104-2350z/json-schema/maindoc/UBL-TransportationStatus-2.1.json*
ref:*#/definitions/TransportationStatus*
url:*file:///Users/admin/z/data/kendata/dev/ubl/json/raw-20170104-2350z/json-schema/maindoc/UBL-TransportationStatus-2.1.json#/definitions/TransportationStatus*
scope:*file:///Users/admin/z/data/kendata/dev/ubl/json/raw-20170104-2350z/json-schema/maindoc/UBL-TransportationStatus-2.1.json#/definitions/TransportationStatus*
ref:*../common/UBL-CommonBasicComponents-2.1.json#/definitions/CustomizationID*
url:*file:///Users/admin/z/data/kendata/dev/ubl/json/raw-20170104-2350z/json-schema/common/UBL-CommonBasicComponents-2.1.json#/definitions/CustomizationID*

In VirtualBox/Windows10 that same relative reference is not correctly being resolved:

scope:*file://z:\data\kendata\dev\ubl\json\raw-20170104-2350z\json-schema\maindoc\UBL-TransportationStatus-2.1.json*
ref:*#/definitions/TransportationStatus*
url:*file://z:\data\kendata\dev\ubl\json\raw-20170104-2350z\json-schema\maindoc\UBL-TransportationStatus-2.1.json#/definitions/TransportationStatus*
scope:*file://z:\data\kendata\dev\ubl\json\raw-20170104-2350z\json-schema\maindoc\UBL-TransportationStatus-2.1.json#/definitions/TransportationStatus*
ref:*../common/UBL-CommonBasicComponents-2.1.json#/definitions/CustomizationID*
url:*file://z:\data\kendata\dev\ubl\json\raw-20170104-2350z\json-schema\maindoc\UBL-TransportationStatus-2.1.json/../common/UBL-CommonBasicComponents-2.1.json#/definitions/CustomizationID*

Note how the "../" in VirtualBox/Windows10 is not being interpreted as "go up one directory from base directory" the way that it is correctly being interpreted in Mac/BSD.
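A likely explanation, sketched here with Python 3's urllib.parse rather than the Python 2.7 code that produced the output above: in 'file://' + a Windows path, everything after the two slashes is parsed as the host part of the URI, so there is no path component for "../" to climb out of. The paths below are shortened, made-up stand-ins for the real ones.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical, shortened versions of the paths shown in the thread.
windows_path = r"z:\data\json-schema\maindoc\Status.json"
ref = "../common/Components.json"

# Naive concatenation: the whole Windows path ends up in the netloc (host).
naive_parts = urlparse("file://" + windows_path)

# A file URI with an empty host and forward slashes has a real path.
# On Windows itself, pathlib.Path(windows_path).as_uri() yields this form.
proper = "file:///" + windows_path.replace("\\", "/")
joined = urljoin(proper, ref)

print(repr(naive_parts.netloc), repr(naive_parts.path))
print(joined)
```

With the second form, the relative reference resolves against the schema's directory in the same way as on Mac/BSD.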

@lamehost

lamehost commented May 10, 2017

Hello,

With the attached patch, this code works for me (on Linux) for local and remote references:

schema_dir = os.path.dirname(os.path.abspath(schema))
resolver = RefResolver(base_uri = 'file://' + schema_dir + '/', referrer = schema)
validate(obj, schema, resolver = resolver)

patch.txt

xalperte pushed a commit to xalperte/jsonschema that referenced this issue May 21, 2017
@gamesbook

Has the patch from @xalperte been accepted into this project? I also need relative file references.

@Julian
Member

Julian commented Sep 6, 2017

It hasn't, but I'm happy to merge it if someone puts together a PR with tests.

@pradeep-bose

@Julian, is this fix merged into the 3.0.a4 beta release?

@pradeep-bose

Using the schema directory is a workaround. Under which issue would referencing a schema at a network location be fixed?

@Julian
Member

Julian commented Jan 10, 2019

It isn't, no, it still needs a fix.

dhermes added a commit to dhermes/bezier that referenced this issue Jun 15, 2019
In the process:

- Added `validate_functional_test_cases` script that does JSON schema
  validation
- Updated foreign references to be to local files (not URLs)
- Fixed JSON bugs in two of the schema files
- Learned a hack (via [1]) for getting `jsonschema` to work correctly
  with `$ref` to a file

Doing this revealed that several of the 3 curve intersections and 46
surface intersections **do not** adhere to the schema.

[1]: python-jsonschema/jsonschema#313
@arturshark

Hi everyone,
A little bit late to the party, but since the issue is still open:

First of all, thank you for the proposed workaround with a custom resolver; it helped a lot. As mentioned above, though, it doesn't work on Windows. I made it work with one simple change: instead of providing a base_uri that starts with 'file://', I changed it to 'file:'.
Example:
oResolver = jsonschema.RefResolver(base_uri = 'file:' + sSchemaDir + '/', referrer = dSchema)

It works fine for me on Linux, macOS, and Windows.

@omniproc

The mentioned workarounds don't work for me. The custom resolver seems to be ignored, or at least I can't get it to work on Windows, even with the Windows fixes mentioned above.

Currently this is where the issue with relative local references occurs: https://github.com/Julian/jsonschema/blob/master/jsonschema/validators.py#L774

document = self.store[url] raises a KeyError when looking up anything that is not URI-formatted (e.g. a relative local path). And that's it. Next it tries to resolve the relative local path using document = self.resolve_remote(url), which then fails with the "unknown url type" error we see.
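The lookup-then-fetch flow described here can be sketched in a few lines. The class below is a hypothetical, simplified stand-in and not jsonschema's code; the file names are invented for the demonstration. It makes the reference absolute first, so that both the cache lookup and the fallback fetch receive a full file: URI rather than a bare relative path.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class TinyResolver:
    """Hypothetical, simplified $ref resolver (not the library's implementation)."""

    def __init__(self, base_uri, schema):
        self.base_uri = base_uri
        self.store = {base_uri: schema}  # documents keyed by absolute URI

    def resolve_document(self, ref):
        # Join with the base URI and drop the fragment before the lookup.
        url, _fragment = urldefrag(urljoin(self.base_uri, ref))
        try:
            return url, self.store[url]
        except KeyError:
            # Not cached yet: fetch it. urlopen needs an absolute URI here.
            with urlopen(url) as response:
                document = json.loads(response.read().decode("utf-8"))
            self.store[url] = document
            return url, document


# Demonstration with two throwaway schema files in sibling directories.
root = Path(tempfile.mkdtemp()).resolve()
(root / "maindoc").mkdir()
(root / "common").mkdir()
main_path = root / "maindoc" / "main.json"
common_path = root / "common" / "parts.json"
main_path.write_text('{"$ref": "../common/parts.json#/definitions/ID"}', encoding="utf-8")
common_path.write_text('{"definitions": {"ID": {"type": "string"}}}', encoding="utf-8")

main_schema = json.loads(main_path.read_text(encoding="utf-8"))
resolver = TinyResolver(main_path.as_uri(), main_schema)
found_url, found_doc = resolver.resolve_document(main_schema["$ref"])
print(found_url)
```

If the join step is skipped, the relative path reaches the store lookup and then urlopen unchanged, which matches the KeyError followed by "unknown url type" sequence described above.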

@gkholman-setare

Almost three years after reporting this problem I'm back to looking at this again and today I did a "pip install jsonschema" of the latest.

I'm continuing to get an "unknown url type" error with a relative URI:

    raise exceptions.RefResolutionError(exc)
jsonschema.exceptions.RefResolutionError: unknown url type: ../common/UBL-CommonBasicComponents-2.1.json

The project is documented here:
http://docs.oasis-open.org/ubl/UBL-2.1-JSON/v1.0/cnd02/UBL-2.1-JSON-v1.0-cnd02.html
The files are found here piecemeal:
http://docs.oasis-open.org/ubl/UBL-2.1-JSON/v1.0/cnd02/json-schema/maindoc/
The entire project can be downloaded and unzipped from:
http://docs.oasis-open.org/ubl/UBL-2.1-JSON/v1.0/cnd02/UBL-2.1-JSON-v1.0-cnd02.zip

The command I used today (in place of the commands found in the val/ subdirectory) from the json/ subdirectory is:

~/u/cd/artefacts/UBL-2.1-JSON-v1.0-cnd02/json $ jsonschema -i UBL-Invoice-2.1-Example.json ../json-schema/maindoc/UBL-Invoice-2.1.json 

Julian added a commit that referenced this issue Nov 29, 2019
0f344a69 Merge pull request #313 from leadpony/issue309
46c44747 Replace the control escape \\a with \\t
1ffe03e5 Merge pull request #312 from gregsdennis/master
de004798 better descripttions
eea7f249 arrays have characters too
7c02d06d added unevaluatedProperties test file; resolves #310
1899a5aa Merge pull request #308 from aznan2/master
4a5010b3 Update the version list.
37569b13 issue #307 - made test compatible with draft4
e3087307 issue #307 - removed issue reference from description
e13d3275 issue #307 - removed pound sign from description
a3b9f723 issue #307 - test that oneOf handles missing optional property

git-subtree-dir: json
git-subtree-split: 0f344a698f6657441adf4ebf4ceeacd596683422
@Julian Julian added the Bug Something doesn't work the way it should. label Mar 29, 2020
@Julian Julian added the Needs Simplification An issue which is in need of simplifying the example or issue being demonstrated for diagnosis. label Mar 29, 2020
@Julian
Member

Julian commented Jul 31, 2022

Hi, sorry this took ages to look at. Both versions seem to work here now, though the relative version needs you to specify which base URI you mean to resolve against. E.g. you can use:

    jsonschema.validate(
        instance,
        schema,
        resolver=jsonschema.RefResolver(
            base_uri=f"{Path(__file__).parent.as_uri()}/",
            referrer=schema,
            ),
        )

if you mean to have references resolved relative to the parent of the jsonvalidate.py file.
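The trailing "/" in that base_uri matters. Under standard URI joining rules, the last path segment of the base is replaced unless the base ends in a slash, so a directory URI without the slash resolves references one level too high. A short sketch with made-up paths:

```python
from urllib.parse import urljoin

# Hypothetical directory URI, once without and once with the trailing slash.
without_slash = urljoin("file:///project/schemas", "common.json")
with_slash = urljoin("file:///project/schemas/", "common.json")

print(without_slash)
print(with_slash)
```

Only the second form keeps "schemas" in the resolved URI.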

Feel free to follow up (OP or anyone else) in a new issue if there are any further things which look like bugs (or gaps in the documentation).

@Julian Julian closed this as completed Jul 31, 2022
@Julian Julian added Invalid Not a bug, PEBKAC, or an unsupported setup and removed Bug Something doesn't work the way it should. Needs Simplification An issue which is in need of simplifying the example or issue being demonstrated for diagnosis. labels Jul 31, 2022