Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: small InputSource related issues #2255

Merged
merged 1 commit into from Mar 10, 2023

Conversation

aucampia
Copy link
Member

@aucampia aucampia commented Mar 8, 2023

Summary of changes

I have added a bunch of tests for InputSource handling, checking every kind of input source with most of the parsers. During this, I detected the following issues that I fixed:

  • rdflib.util._iri2uri() should not URL quote the netloc parameter, the idna encoding already takes care of special characters. I removed the URL quoting of netloc
  • HexTuple parsing was handling the input source in a way that would only work for some input sources, and not raising errors for other input sources. I changed the input source handling to be more generic.
  • rdflib.parser.create_input_source() incorrectly uses file.buffer instead of source.buffer when dealing with IO stream sources.

Other changes with no runtime impact include:

  • Changed the HTTP mocking stuff in test slightly to accommodate serving arbitrary files, as I used this in the InputSource tests.
  • Don't use google in tests as we keep getting
    urllib.error.HTTPError: HTTP Error 429: Too Many Requests
    from it.

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Added tests for any changes that have a runtime impact.
  • Checked that all tests and type checking passes.
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

@aucampia aucampia force-pushed the iwana-20230307T1924-fix_iri2uri branch from 372a8fc to 0fd7567 Compare March 8, 2023 20:42
@coveralls
Copy link

coveralls commented Mar 8, 2023

Coverage Status

Coverage: 90.771%. Remained the same when pulling ec2e5c6 on aucampia:iwana-20230307T1924-fix_iri2uri into a146e0a on RDFLib:main.

@aucampia aucampia force-pushed the iwana-20230307T1924-fix_iri2uri branch from 0fd7567 to dbd7cb0 Compare March 8, 2023 22:08
@aucampia aucampia marked this pull request as ready for review March 8, 2023 22:13
@aucampia
Copy link
Member Author

aucampia commented Mar 8, 2023

I did this while looking at #1844 - but this does not quite address anything there. I plan to make a separate PR with some security documentation and warnings, and maybe an example of how to use python auditing and urllib.request.install_opener to mitigate the issue. I'm going to consider that as closing the matter.

@aucampia aucampia requested a review from a team March 8, 2023 22:16
@aucampia aucampia force-pushed the iwana-20230307T1924-fix_iri2uri branch 2 times, most recently from 5584d45 to 415669e Compare March 10, 2023 19:39
@aucampia
Copy link
Member Author

Reduced the size of changes a bit, will merge with no review as this is pretty well tested code and the non-test changes are very minor.

@aucampia aucampia force-pushed the iwana-20230307T1924-fix_iri2uri branch 2 times, most recently from a9a2173 to 9060b7c Compare March 10, 2023 22:59
I have added a bunch of tests for `InputSource` handling, checking
every kind of input source with every parser. During this, I detected
the following issues that I fixed:

- `rdflib.util._iri2uri()` was URL quoting the `netloc` parameter, but
  this is wrong and the `idna` encoding already takes care of special
  characters. I removed the URL quoting of `netloc`.

- HexTuple parsing was handling the input source in a way that would
  only work for some input sources, and not raising errors for other
  input sources. I changed the input source handling to be more generic.

- `rdflib.parser.create_input_source()` incorrectly used `file.buffer`
  instead of `source.buffer` when dealing with IO stream sources.

Other changes with no runtime impact include:

- Changed the HTTP mocking stuff in test slightly to accommodate
  serving arbitrary files, as I used this in the `InputSource` tests.
- Don't use google in tests as we keep getting
  `urllib.error.HTTPError: HTTP Error 429: Too Many Requests`
  from it.
@aucampia aucampia force-pushed the iwana-20230307T1924-fix_iri2uri branch from 9060b7c to ec2e5c6 Compare March 10, 2023 23:01
@aucampia aucampia merged commit 2b98507 into RDFLib:main Mar 10, 2023
@aucampia aucampia deleted the iwana-20230307T1924-fix_iri2uri branch April 9, 2023 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants