Relative URIs are resolved incorrectly after redirects #130

Closed
ghost opened this issue Feb 20, 2012 · 2 comments
Labels
bug Something isn't working

Comments

ghost commented Feb 20, 2012

vfaronov, 2010-09-10T23:23:28.000Z

What steps will reproduce the problem?

  1. Prepare a resource http://example.org/foo serving up an RDF description that contains relative URIs, for example <#frag1>.
  2. Prepare a resource http://example.org/bar that redirects (for example, HTTP 301) to http://example.org/foo.
  3. Use RDFLib's Graph.parse() to parse http://example.org/bar.

What is the expected output? What do you see instead?

I expect the "real" URI http://example.org/foo to be used as the base URI, giving absolute URIs of the form http://example.org/foo#frag1. Instead, RDFLib uses the original requested URI http://example.org/bar as the base, giving http://example.org/bar#frag1.
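To make the difference concrete, here is a minimal sketch using only the standard library's `urljoin` (not RDFLib itself). The variable names are mine and purely illustrative; the URIs are the ones from the steps above.

```python
from urllib.parse import urljoin

requested_uri = "http://example.org/bar"  # URI handed to the parser
retrieved_uri = "http://example.org/foo"  # URI the redirect actually leads to
relative_ref = "#frag1"                   # relative URI found in the document

# Expected: resolve against the URI that was actually retrieved.
expected = urljoin(retrieved_uri, relative_ref)

# Observed: resolution against the originally requested URI.
observed = urljoin(requested_uri, relative_ref)

print(expected)
print(observed)
```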

What version of the product are you using? On what operating system?

RDFLib trunk (r1895) on GNU/Linux.

Please provide any additional information below.

RFC 3986 Uniform Resource Identifier (URI): Generic Syntax
http://tools.ietf.org/html/rfc3986#section-5.1.3

"Note that if the retrieval was the result of a redirected request, the last URI used (i.e., the URI that resulted in the actual retrieval of the representation) is the base URI."
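As a rough illustration of that rule, the sketch below starts a throwaway local server whose `/bar` path 301s to `/foo`, then asks `urllib` for the URL of the final response via `geturl()`. The handler class, the helper `base_uri_after_redirects`, and the local addresses are all assumptions made up for this example; none of this is RDFLib code.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urljoin
from urllib.request import ProxyHandler, build_opener


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /bar redirects (301) to /foo."""

    def do_GET(self):
        if self.path == "/bar":
            self.send_response(301)
            self.send_header("Location", "/foo")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def base_uri_after_redirects(url):
    """Return the last URI used in the retrieval (RFC 3986, section 5.1.3)."""
    opener = build_opener(ProxyHandler({}))  # bypass any configured proxy
    with opener.open(url) as response:
        return response.geturl()  # URL of the final response, after redirects


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    requested = "http://127.0.0.1:%d/bar" % server.server_port
    base = base_uri_after_redirects(requested)
    resolved = urljoin(base, "#frag1")
finally:
    server.shutdown()
    server.server_close()
```

Here `base` ends in `/foo` rather than `/bar`, so the relative URI resolves against the redirect target.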

Comment 1 by vfaronov

For a working example, see <http://linked-data.ru/example>, which 301s to <http://linked-data.ru/example/> (RDFa).

Comment 2 by vfaronov

First attempt at a patch.
This changes the base URI resolution logic a bit, and I'm not 100% sure it doesn't break anything.

Index: rdflib/parser.py
===================================================================
--- rdflib/parser.py    (revision 1895)
+++ rdflib/parser.py    (working copy)
@@ -94,9 +94,11 @@
         except HTTPError, e:
             # TODO:
             raise Exception('"%s" while trying to open "%s"' % (e, self.url))
+        self.url = file.geturl()    # in case redirections took place
         self.content_type = file.info().get('content-type')
         self.content_type = self.content_type.split(";", 1)[0]
         self.setByteStream(file)
+        self.setPublicId(self.url)
         # TODO: self.setEncoding(encoding)

     def __repr__(self):
@@ -147,6 +149,8 @@
             else:
                 raise Exception("Unexpected type '%s' for source '%s'" % (type(source), source))

+    absolute_location = None
+    
     if location is not None:
         base = urljoin("file:", "%s/" % pathname2url(os.getcwd()))
         absolute_location = URIRef(location, base=base).defrag()
@@ -155,7 +159,6 @@
             file = __builtin__.file(filename, "rb")
         else:
             input_source = URLInputSource(absolute_location, format)
-        publicID = publicID or absolute_location

     if file is not None:
         input_source = FileInputSource(file)
@@ -168,13 +171,11 @@
     if input_source is None:
         raise Exception("could not create InputSource")
     else:
-        if publicID:
+        if publicID is not None:
             input_source.setPublicId(publicID)
-
-        # TODO: what motivated this bit?
-        id = input_source.getPublicId()
-        if id is None:
-            input_source.setPublicId("")
+        elif input_source.getPublicId() is None:
+            input_source.setPublicId(absolute_location or "")
+        
         return input_source
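
For readers skimming the diff, the public ID precedence it appears to introduce can be sketched as a standalone function. `choose_public_id` is a hypothetical helper written for illustration only; it does not exist in RDFLib, and it only mirrors the `if`/`elif` at the end of the patch.

```python
def choose_public_id(public_id, source_public_id, absolute_location):
    """Mirror the precedence in the patched tail of the function (illustrative).

    public_id         -- value passed explicitly by the caller, or None
    source_public_id  -- what the input source already reports, e.g. the
                         post-redirect URL set by the URL input source, or None
    absolute_location -- absolute form of the requested location, or None
    """
    if public_id is not None:
        # An explicit public ID always wins, even an empty string.
        return public_id
    if source_public_id is not None:
        # Keep whatever the input source set for itself.
        return source_public_id
    # Fall back to the requested location, or "" when there is none.
    return absolute_location or ""


after_redirect = choose_public_id(
    None, "http://example.org/foo", "http://example.org/bar"
)
explicit = choose_public_id(
    "http://example.org/base", "http://example.org/foo", "http://example.org/bar"
)
```

Under this reading, the redirected case yields `http://example.org/foo` as the base unless the caller passes an explicit public ID.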

ghost commented Feb 20, 2012

Currently working:

>>> graph.parse(location="http://linked-data.ru/example")
<Graph identifier=xNAYeDDA0 (<class 'rdflib.graph.Graph'>)>

ghost closed this as completed Feb 20, 2012
ghost reopened this Feb 20, 2012
ghost self-assigned this Feb 20, 2012

ghost commented Feb 20, 2012

Test created, fix applied, test passes.

This issue was closed.