Result.serialize() path handling is broken for windows paths and some other cases #2067

Closed
aucampia opened this issue Jul 30, 2022 · 0 comments · Fixed by #2065
Labels
bug Something isn't working core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}`

I have been trying to figure out what is happening with these xfails:

if sys.platform == "win32":
    xfails[("csv", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("csv", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("json", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("json", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("xml", DestinationType.STR_PATH, "utf-8")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )
    xfails[("xml", DestinationType.STR_PATH, "utf-16")] = pytest.mark.xfail(
        raises=FileNotFoundError,
        reason="string path handling does not work on windows",
    )

The problem is with the approach to path handling:

rdflib/rdflib/query.py

Lines 268 to 279 in 1d5f3e7

location = cast(str, destination)
scheme, netloc, path, params, query, fragment = urlparse(location)
if netloc != "":
    print(
        "WARNING: not saving as location" + "is not a local file reference"
    )
    return None
fd, name = tempfile.mkstemp()
stream = os.fdopen(fd, "wb")
serializer.serialize(stream, encoding=encoding, **args)
stream.close()
shutil.move(name, path)

The problem with this approach is that file URIs and OS paths are quite different. For one, with Windows OS paths, e.g. C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH, the drive letter gets interpreted as the URL scheme:

$ python3 -c 'from urllib.parse import urlparse; print(urlparse(r"C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH"))'
ParseResult(scheme='c', netloc='', path='\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p6\\file-DestinationType.STR_PATH', params='', query='', fragment='')

Furthermore, URIs support percent encoding, while OS paths do not.
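
To make the mismatch concrete, here is a small sketch using only `urllib.parse` from the standard library. The file names are made up for illustration. It shows that `urlparse` leaves percent escapes undecoded, and that a plain OS path containing `#` loses everything after it when parsed as a URI.

```python
from urllib.parse import urlparse, unquote

# urlparse() leaves percent escapes in place, so the path taken from a
# file URI still has to be decoded before it can be used as an OS path.
uri_path = urlparse("file:///tmp/my%20results.csv").path
print(uri_path)           # /tmp/my%20results.csv
print(unquote(uri_path))  # /tmp/my results.csv

# A plain OS path may legitimately contain "#" or "?". urlparse() splits
# these off as fragment and query, so the path component is truncated.
parsed = urlparse("/tmp/report#1.csv")
print(parsed.path)        # /tmp/report
print(parsed.fragment)    # 1.csv
```

With the current code, only the `path` component is passed to `shutil.move`, so both cases would end up writing to a different file than the caller asked for.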

Here is an example of things going wrong (from here)

  ------------------------------ Captured log call ------------------------------
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:317:test_select_result_serialize_parse destination = C:\Users\runneradmin\AppData\Local\Temp\pytest-of-unknown\pytest-0\test_select_result_serialize_p6\file-DestinationType.STR_PATH
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:318:test_select_result_serialize_parse format = csv
  2022-07-30T12:11:21.926 ERROR    root         test_result.py:319:test_select_result_serialize_parse encoding = utf-16
  ___________ test_select_result_serialize_parse[csv-STR_PATH-utf-8] ____________
  Traceback (most recent call last):
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 566, in move
      os.rename(src, real_dst)
  FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\tmpgk0vyq6q' -> '\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p7\\file-DestinationType.STR_PATH'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "D:\a\rdflib\rdflib\test\test_sparql\test_result.py", line 323, in test_select_result_serialize_parse
      encoding=encoding,
    File "D:\a\rdflib\rdflib\rdflib\query.py", line 283, in serialize
      shutil.move(name, path)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 580, in move
      copy_function(src, real_dst)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 266, in copy2
      copyfile(src, dst, follow_symlinks=follow_symlinks)
    File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\shutil.py", line 121, in copyfile
      with open(dst, 'wb') as fdst:
  FileNotFoundError: [Errno 2] No such file or directory: '\\Users\\runneradmin\\AppData\\Local\\Temp\\pytest-of-unknown\\pytest-0\\test_select_result_serialize_p7\\file-DestinationType.STR_PATH'

I think the best we can do to fix the path handling is to do the same as what happens in Graph.serialize:

rdflib/rdflib/graph.py

Lines 1204 to 1218 in 1d5f3e7

if isinstance(destination, pathlib.PurePath):
    location = str(destination)
else:
    location = cast(str, destination)
scheme, netloc, path, params, _query, fragment = urlparse(location)
if netloc != "":
    raise ValueError(
        f"destination {destination} is not a local file reference"
    )
fd, name = tempfile.mkstemp()
stream = os.fdopen(fd, "wb")
serializer.serialize(stream, base=base, encoding=encoding, **args)
stream.close()
dest = url2pathname(path) if scheme == "file" else location
shutil.move(name, dest)

This will also fix relative path handling in some cases; however, it will break relative URI handling.
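
As a rough illustration of that behaviour, the destination-resolution part of the Graph.serialize logic can be pulled out into a standalone function. This is only a sketch: `destination_to_path` is a hypothetical name, not an RDFLib API, and the temp-file writing step is left out.

```python
import pathlib
from typing import Union
from urllib.parse import urlparse
from urllib.request import url2pathname


def destination_to_path(destination: Union[str, pathlib.PurePath]) -> str:
    """Hypothetical helper mirroring the Graph.serialize logic quoted above."""
    location = str(destination)
    scheme, netloc, path, _params, _query, _fragment = urlparse(location)
    if netloc != "":
        raise ValueError(f"destination {destination} is not a local file reference")
    # Only real file URIs are converted; anything else is used verbatim as an OS path.
    return url2pathname(path) if scheme == "file" else location


print(destination_to_path(r"C:\Users\me\out.csv"))          # unchanged, drive letter kept
print(destination_to_path("results/report#1.csv"))          # unchanged, "#" kept
print(destination_to_path("file:///tmp/my%20results.csv"))  # percent escape decoded
```

Because only strings with an explicit `file` scheme go through `url2pathname`, a Windows path keeps its drive letter and a relative OS path is passed through untouched. The flip side is that a scheme-less string is never interpreted as a URI, which is the relative URI breakage mentioned above.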

@aucampia aucampia added the bug Something isn't working label Jul 30, 2022
@aucampia aucampia changed the title Result.serialize() path handling is broken for windows paths Result.serialize() path handling is broken for windows paths and some other cases Jul 30, 2022
@aucampia aucampia self-assigned this Jul 30, 2022
@aucampia aucampia added the core Relates to core functionality of RDFLib, i.e. `rdflib.{graph,store,term}` label Jul 30, 2022