Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: SPARQL XML result parsing #2044

Merged
merged 1 commit into from Jul 26, 2022
Merged

Conversation

aucampia
Copy link
Member

@aucampia aucampia commented Jul 21, 2022

Summary of changes

Fixed the following problems with the SPARQL XML result parsing:

  • Both the parse method of both the lxml and xml modules does not work
    work well with TextIO objects, the xml module works with TextIO
    objects if the XML encoding is utf-8, but not if it is utf-16, and
    with lxml parse fails for both utf-8 and utf-16. To fix this I
    changed the XML result parser to first convert TextIO to bytes and
    then feed the bytes to parse() using BytesIO.
  • The parser was operating on all elements inside results and
    result elements, even if those elements were not result and binding
    elements respectively. This was causing problems with lxml, as lxml
    also returns comments when iterating over elements. To fix this I added
    a check for the element tags so that only the correct elements are
    considered.

Other changes:

  • Added type hints to rdflib.plugins.sparql.results.xmlresults.
  • Run with lxml one some permutations in the test matrix.
  • Removed rdflib.compat.etree, as this was not very helpful for the
    SPARQL XML Result parser and it was not used elsewhere.
  • Added an lxml environment to tox which installs lxml and
    lxml-stubs.
  • Expanded SPARQL result testing by adding some additional
    parameters.

Related issues:

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Added tests for any changes that have a runtime impact.
  • Checked that all tests and type checking passes.
  • For changes that have a potential impact on users of this project:
    • Considered updating our changelog (CHANGELOG.md).
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

Fixed the following problems with the SPARQL XML result parsing:

- Both the parse method of both the lxml and `xml` modules does not work
  work well with `TextIO` objects, the `xml` module works with `TextIO`
  objects if the XML encoding is `utf-8`, but not if it is `utf-16`, and
  with `lxml` parse fails for both `utf-8` and `utf-16`. To fix this I
  changed the XML result parser to first convert `TextIO` to `bytes` and
  then feed the `bytes` to `parse()` using `BytesIO`.
- The parser was operating on all elements inside `results` and
  `result` elements, even if those elements were not `result` and `binding`
  elements respectively. This was causing problems with `lxml`, as `lxml`
  also returns comments when iterating over elements. To fix this I added
  a check for the element tags so that only the correct elements are
  considered.

Other changes:

- Added type hints to `rdflib.plugins.sparql.results.xmlresults`.
- Run with `lxml` one some permutations in the test matrix.
- Removed `rdflib.compat.etree`, as this was not very helpful for the
  SPARQL XML Result parser and it was not used elsewhere.
- Added an `lxml` environment to tox which installs `lxml` and
  `lxml-stubs`.
@aucampia aucampia force-pushed the iwana-20220718T0603-lxml-test branch from c106828 to f011df1 Compare July 21, 2022 01:52
@aucampia aucampia marked this pull request as ready for review July 21, 2022 01:52
@aucampia aucampia requested a review from a team July 21, 2022 01:59
@coveralls
Copy link

Coverage Status

Coverage remained the same at 90.485% when pulling f011df1 on iwana-20220718T0603-lxml-test into 14bb386 on master.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@aucampia
Copy link
Member Author

@mielvds tagging you if you have time for a review.

@aucampia aucampia added the bug Something isn't working label Jul 22, 2022
@aucampia
Copy link
Member Author

Just a heads up, I'm considering this somewhat of a last ditch effort to save lxml integration. Seeing as lxml integration was broken for a long time I don't think there is anyone relying on it, and if it is causing more problems and nobody is interested in maintaining it I will remove it.

@aucampia aucampia merged commit 0856ac8 into master Jul 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6.2.0: pytest is failing SPARQL result parsing does not work with lxml
2 participants