Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests demonstrating forward-slash behaviors in Turtle, JSON-LD, and SPARQL #1872

Merged
merged 14 commits into from May 17, 2022

Conversation

ajnelson-nist
Copy link
Contributor

@ajnelson-nist ajnelson-nist commented Apr 25, 2022

This pull request supports Issue 1871.

A follow-on PR will be required to finish resolving that issue.

Summary of changes

  • Add tests to confirm conformance to SPARQL specification on handling of forward-slash and backslash characters outside of string-literals.
  • Backwards-incompatibility: Current SPARQL engine permits backslash characters where SPARQL specification does not. Revisions that would get these new tests to pass will now raise parsing errors from SPARQL queries that were similarly misaligned with the SPARQL specification. However, this may be preferable, as these queries are likely to be silently failing.

Checklist

  • Checked that there aren't other open pull requests for
    the same change.
  • Added tests for any changes that have a runtime impact.
  • Checked that all tests and type checking passes.
  • For changes that have a potential impact on users of this project:
    • Updated relevant documentation to avoid inaccuracies.
    • Considered adding additional documentation.
    • Considered adding an example in ./examples for new features.
    • Considered updating our changelog (CHANGELOG.md).
  • Considered granting push permissions to the PR branch,
    so maintainers can fix minor issues and keep your PR up to date.

…nd SPARQL

Some correction is needed in the SPARQL engine to get these tests to
pass.

References:
* RDFLib#1871

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@aucampia
Copy link
Member

aucampia commented May 2, 2022

@ajnelson-nist some tests here are failing, should they be marked as xfail maybe?

https://github.com/RDFLib/rdflib/runs/6161549514?check_suite_focus=true

  FAILED test/test_sparql/test_forward_slash_escapes.py::test_query_prepares_prefixed
  FAILED test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_turtle_prefixed
  FAILED test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_jsonld_prefixed

@ajnelson-nist
Copy link
Contributor Author

@aucampia No, I'd already designated xfail tests intentionally. The tests that are currently failing are highlighting behaviors that I think rdflib needs to change towards supporting. For instance, test_query_prepares_prefixed() is supposed to fail - the query written in query_string_prefixed is incorrect syntax according to the SPARQL spec, but rdflib currently accepts it.

@aucampia
Copy link
Member

aucampia commented May 4, 2022

@ajnelson-nist appologies for not getting to this yet, will look at this on friday.

@ajnelson-nist
Copy link
Contributor Author

Thanks, @aucampia . I have a hunch this could be something like a 1-line patch to code, but finding that 1 line could take a while.

@aucampia
Copy link
Member

aucampia commented May 6, 2022

  • Backwards-incompatibility: Current SPARQL engine permits backslash characters where SPARQL specification does not. Revisions that would get these new tests to pass will now raise parsing errors from SPARQL queries that were similarly misaligned with the SPARQL specification. However, this may be preferable, as these queries are likely to be silently failing.

To me, if we are not following a spec that we say we comply with, it is a bug, and if someone is relying on a bug they should fix their code, though I can see how this is rather extreme, and maybe especially extreme in this case.

PS: You can just comment with pre-commit.ci autofix to get pre-commit to fix imports and formatting.

@ajnelson-nist
Copy link
Contributor Author

  • Backwards-incompatibility: Current SPARQL engine permits backslash characters where SPARQL specification does not. Revisions that would get these new tests to pass will now raise parsing errors from SPARQL queries that were similarly misaligned with the SPARQL specification. However, this may be preferable, as these queries are likely to be silently failing.

To me, if we are not following a spec that we say we comply with, it is a bug, and if someone is relying on a bug they should fix their code, though I can see how this is rather extreme, and maybe especially extreme in this case.

I agree it's a harsh resolution. However, not warning the user feels a bit like applying the ostrich algorithm to queries that are very likely not working as they should. I continue to think that parse errors are eventually the right answer.

It's probably better to raise a DeprecationWarning on encountering these slashes in a SPARQL query, with a warning that a future major release will raise parse errors.

@ajnelson-nist
Copy link
Contributor Author

PS: You can just comment with pre-commit.ci autofix to get pre-commit to fix imports and formatting.

Oh? Sorry, I didn't realize I'd done something pre-commit would complain about. Update patch coming in a moment if I can confirm...

@aucampia
Copy link
Member

aucampia commented May 6, 2022

I agree it's a harsh resolution. However, not warning the user feels a bit like applying the ostrich algorithm to queries that are very likely not working as they should. I continue to think that parse errors are eventually the right answer.

The worry here is, if we take it as a breaking change the major version must change, and I somewhat want to stagger interface breaking changes so we don't release a new major version every time.

It's probably better to raise a DeprecationWarning on encountering these slashes in a SPARQL query, with a warning that a future major release will raise parse errors.

I could go for that.

PS: You can just comment with pre-commit.ci autofix to get pre-commit to fix imports and formatting.

Oh? Sorry, I didn't realize I'd done something pre-commit would complain about. Update patch coming in a moment if I can confirm...

You did not, just saw you made a commit to fix isort, I would also do the same but we want the isort and black to not be a nuisance to anyone and that is why we added pre-commit.ci, so I just let people know that it can black and isort their code if they want, just in case you did not know..

@ajnelson-nist
Copy link
Contributor Author

You did not, just saw you made a commit to fix isort, I would also do the same but we want the isort and black to not be a nuisance to anyone and that is why we added pre-commit.ci, so I just let people know that it can black and isort their code if they want, just in case you did not know..

I'd picked up on your recent addition of pre-commit. I think I actually initialized pre-commit after my initial patch. Regardless, I'm using it now, no nuisance induced, no worries. I just ran a catch-up merge and checked afterwards with pre-commit, everything seems fine from a formatting front.

@ajnelson-nist
Copy link
Contributor Author

I agree it's a harsh resolution. However, not warning the user feels a bit like applying the ostrich algorithm to queries that are very likely not working as they should. I continue to think that parse errors are eventually the right answer.

The worry here is, if we take it as a breaking change the major version must change, and I somewhat want to stagger interface breaking changes so we don't release a new major version every time.

It's probably better to raise a DeprecationWarning on encountering these slashes in a SPARQL query, with a warning that a future major release will raise parse errors.

I could go for that.

I'm fine with the error being raised only after the next (or another upcoming) major release. I'll adjust the pytest to check rdflib.__version__. I don't know where to put the DeprecationWarning warning, though, and kind of need your (your-plural) help working that out if we all agree on the general path forward.

ajnelson-nist and others added 3 commits May 6, 2022 14:37
`_test_escapes_and_query` accidentally conflated the query "Running" as
both compiling and returning results.  This patch changes the
`query_ran` variable to `query_compiled`.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
AJN: @aucampia suggested this patch in Pull Request discussion, and I
applied it while fixing a typo.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
Some tests are now delayed for rdflib 7, as a result of discussion of
balancing bug correction versus backwards compatibility.

Suggested-by: Iwan Aucamp <aucampia@gmail.com>
Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Contributor Author

I've pushed a delay-until-rdflib-7 patch. The patch just prior should demonstrate from CI logs that these tests will expect parse failures. I realized just now that there is a bit of a coding-style mix, but the pass/fail results are correct.

Could you comment on whether I did the version-7 detection correct, in terms of this project's version stamping policy? It's currently set to turn the tests on once version 7 releases. Thinking about it a bit more, you probably want these tests to turn on once you enter version 7 beta, rather than letting the new production release fall over immediately after stamping. ...having typed that out, I can't see you responding otherwise. Patch coming.

Signed-off-by: Alex Nelson <alexander.nelson@nist.gov>
@ajnelson-nist
Copy link
Contributor Author

Ok, this PR is once again ready for discussion on fixing the bug.

@coveralls
Copy link

coveralls commented May 6, 2022

Coverage Status

Coverage remained the same at 88.423% when pulling 519febc on ajnelson-nist:issue-1871 into 3c72ce1 on RDFLib:master.

@aucampia
Copy link
Member

aucampia commented May 6, 2022

@ajnelson-nist I will have a look, I just tried with jena though and get this:

$ sparql --query <(echo '
> PREFIX ex:    <http://example.org/ontology/>
> CONSTRUCT   { ex:s ex:p ex:\/o }
> WHERE       { }
> 
> ') --data ./test/data/rdfs.ttl
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix ex:   <http://example.org/ontology/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:s    ex:p    <http://example.org/ontology//o> .

@aucampia
Copy link
Member

aucampia commented May 6, 2022

Also check RDF4J now:

$ curl --silent 'http://127.0.0.1:8080/rdf4j-server/repositories/a' \
> --data-urlencode 'query=
> PREFIX ex:    <http://example.org/ontology/>
> CONSTRUCT   { ex:s ex:p ex:\/o }
> WHERE       { }
> '
<http://example.org/ontology/s> <http://example.org/ontology/p> <http://example.org/ontology//o> .

@aucampia
Copy link
Member

aucampia commented May 6, 2022

Wikidata's query service also allows it, I think maybe the better option is to somehow implement a strict parsing flag.

@ajnelson-nist
Copy link
Contributor Author

Three more implementations that disagree with me suggest I might not have read SPARQL's grammar correctly.

Re-reviewing the grammar, A.2 and/or A.3 hint that I might not have understood whitespace processing possibly allowing for forward slashes. It appears ex:foo/bar and ex:foo / bar are supposed to parse differently; I'd thought instead that / would always be a path operator.

I don't think I'm at a good energy level on a Friday afternoon to resolve this. If anyone's inclined, fresher eyes are welcome before I can return to this next week.

@aucampia
Copy link
Member

aucampia commented May 6, 2022

Three more implementations that disagree with me suggest I might not have read SPARQL's grammar correctly.

I don't think you have, at least I understand it the same, and I can see even where the difference comes in:

(([%(PN_CHARS)s\\.:]|%(PLX)s)*

If this line was instead:

                     (([%(PN_CHARS)s.:]|%(PLX)s)*

Then it would behave as you expect.

But I still think that given other implementations allow it we should maybe be similarly leniant.

@aucampia
Copy link
Member

Just a checkin on this again, I think the most ideal solution here will be if we could somehow have a strict flag on the query parser, and then if it is set treat this as an error. I'm just not sure what the best way of passing this flag is right now, I guess we can go with *kwargs but will think about it a bit still.

@aucampia aucampia added the backwards incompatible will affect backwards compatibility, this includes things like breaking our interface label May 13, 2022
@aucampia
Copy link
Member

I think the best to do with these test is to merge them with xfail, and then once we have some solution to making the parser strict we can use that.

@ajnelson-nist
Copy link
Contributor Author

I'd remarked on Issue 1890 on how well I think a keyword argument could work. I noticed prepareQuery doesn't have a *args positional-vs-keyword separator, so I'm not sure what the interface implications would be of trying to add a keyword argument. I don't think you'd need to resort to a **kwargs where strict is fished out of it with a membership test. That is, this interface for prepareQuery could work:

#Current:
def prepareQuery(queryString, initNs={}, base=None) -> Query: ...
#After:
def prepareQuery(queryString, initNs={}, base=None, *args: typing.Any, strict: bool=False) -> Query: ...

I'll be honest, Python's keyword vs positional arguments with defaults was confusing to me for many years, and still is when I don't see a *args sitting in the argument list. I'd have to test to see if putting *args anywhere in that list would be a breakage.

@ajnelson-nist
Copy link
Contributor Author

@aucampia I've just done a catch-up merge with master and left a TODO to explore the strict flag.

The sonarcloud service is complaining, to the tune of failing a test, about my usage of http://example.org in prefixes. I'm afraid that's going to complain quite a bit for other namespaces that use the http scheme. Is this a blocking issue? If so, would you prefer I use the urn:example: prefix to avoid making the scheme-review problem worse?

The new test passes (PASS and XFAILs work), so I am otherwise happy with this being merged and/or getting other maintainer feedback.

@sonarcloud
Copy link

sonarcloud bot commented May 16, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
1.5% 1.5% Duplication

@aucampia
Copy link
Member

@aucampia I've just done a catch-up merge with master and left a TODO to explore the strict flag.

The sonarcloud service is complaining, to the tune of failing a test, about my usage of http://example.org in prefixes. I'm afraid that's going to complain quite a bit for other namespaces that use the http scheme. Is this a blocking issue? If so, would you prefer I use the urn:example: prefix to avoid making the scheme-review problem worse?

The new test passes (PASS and XFAILs work), so I am otherwise happy with this being merged and/or getting other maintainer feedback.

I will process this later this week, hopefully tomorrow.

@aucampia aucampia removed the backwards incompatible will affect backwards compatibility, this includes things like breaking our interface label May 17, 2022
@aucampia
Copy link
Member

@ajnelson-nist I'm going to make some changes to your branch, happy to adjust them based on your feedback, but in summary:

  • Your JSON-LD is not valid according to riot and jsonld-cli, and I think the spec, so I will adjust it slightly so it does become valid: RDFLib JSON-LD processor accepts object's as values to @type #1950
  • I'm going to move the turtle and json-ld content into test/data/variants so that they get processed by test/test_graph/test_variants.py.
  • I'm going to change the skipifs into XFAILs as I the most reasonable change here is to provide a strict flag for query parsing. The other concern is that we ideally want to be aware of these issues independently of the RDFLib version, even if we were to be targeting a specific version to fix it in, skips should ideally be used only in cases where the test needs something from the environment to succeed.

Regarding SonarCloud bot, it still needs tuning, at best it is a guide for potential issues, not something you should consider as authoritative at the moment, and the URI thing is going to be a problem until we figure out how to disable that permanently on SonarCloud.

- I moved the Turtle and JSON-LD tests to `test/data/variants` as this
  allows for further expansion and uses a shared processor which
  somewhat simplifies testing.
- I changed the JSON-LD to be compatible with Jena's riot and jsonld-cli
  as these do not allow for object valued `@type` directives. I also
  think this is in line with the JSON-LD spec.
- I changed the conditional skips to xfails, as I think this is most
  appropriate in this case, we can provide a strict flag before version
  7.0, and the strict flag's default value can be varied from release to
  release.
@aucampia
Copy link
Member

@ajnelson-nist made some minor changes in c387c78

Please let me know if you have any concerns.

Copy link
Member

@aucampia aucampia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, we definitely should provide a way to make processing strict for everything we do, in part because there are many harmful consequences of the "robustness principle" [ref]:

@aucampia aucampia requested a review from a team May 17, 2022 12:58
@aucampia aucampia added the review wanted This indicates that the PR is ready for review label May 17, 2022
@ajnelson-nist
Copy link
Contributor Author

I'm going to change the skipifs into XFAILs as I the most reasonable change here is to provide a strict flag for query parsing. The other concern is that we ideally want to be aware of these issues independently of the RDFLib version, even if we were to be targeting a specific version to fix it in, skips should ideally be used only in cases where the test needs something from the environment to succeed.

I'm fine with this, and agree with the desired usages of skips. If at least you and I have a consensus on the strict flag, I think we should make sure Issue 1871 does NOT close when merging this PR. 1871 should slightly rescope to add a strict flag to prepareQuery and its related functions (potentially including Graph.query).

@aucampia
Copy link
Member

I'm overall fine with this commit. It looks like there should be at least one follow-on commit to fix typographical issues.

Making it now.

Please let me know what commit I should merge into my branch, if I still need to make that merge. (I continue to not understand if checking the "Allow edits and access to secrets by maintainers" box lets you actually do that.)

You don't have to merge anything into your branch, changes occur directly in your branch, so if you pull you will get the changes.

@aucampia
Copy link
Member

I've never seen this construct before, and I don't see where the IANA style of prefixing "x-" is allowed in the JSON-LD spec. I don't work with named graphs, so I don't know what the best way is to attach a comment to a graph. If this doesn't cause breakage, I'm ok with it staying, but I want to log (just in this comment is fine) that I don't know how valid @x-comment is.

It is allowed, but it is also technically reserved for future use and will be ignored by compliant processors (see this note) - I also checked that it works with riot, rdfpipe and jsonld-cli - however it is maybe best to avoid using a reserved keyword, so I will change this to _comment which also works fine with said processors and does not effect the graph obtained from parsing it.

- Use `_comment` instead of `@x-comment` to add a comment to JSONLD.
- Fixed some typographical errors in tests.
- Remove overpinning of output to simplify things a bit.
@aucampia
Copy link
Member

If at least you and I have a consensus on the strict flag, I think we should make sure Issue 1871 does NOT close when merging this PR. 1871 should slightly rescope to add a strict flag to prepareQuery and its related functions (potentially including Graph.query).

To prevent this you will have to change the description of this PR here #1872 (comment) to not contain resolve [Issue 1871](https://github.com/RDFLib/rdflib/issues/1871) - I tried disassociating this PR manually but I can't do that it seems.

@aucampia
Copy link
Member

It looks like there should be at least one follow-on commit to fix typographical issues.

Done now.

@ajnelson-nist
Copy link
Contributor Author

If at least you and I have a consensus on the strict flag, I think we should make sure Issue 1871 does NOT close when merging this PR. 1871 should slightly rescope to add a strict flag to prepareQuery and its related functions (potentially including Graph.query).

To prevent this you will have to change the description of this PR here #1872 (comment) to not contain resolve [Issue 1871](https://github.com/RDFLib/rdflib/issues/1871) - I tried disassociating this PR manually but I can't do that it seems.

I've revised the original comment, and that seems to have mechanically disassociated the issue.

@ajnelson-nist
Copy link
Contributor Author

Is there any reason test/test_sparql/test_datetime_processing.py picked up a whitespace change?

@ajnelson-nist
Copy link
Contributor Author

Aside from the extra unexplained change in test_datetime_processing, this looks good to me to merge.

@aucampia
Copy link
Member

Is there any reason test/test_sparql/test_datetime_processing.py picked up a whitespace change?

In which commit? I don't see any non intended changes here: 519febc

Maybe you are talking about changes in the value of PN_LOCAL_BACKSLASH_XFAIL_REASON - and that is because I re-wrapped the text, reporting output is now:

$ task test -- test/test_sparql/test_forward_slash_escapes.py -rA
task: [test]  .venv/bin/python -m pytest  test/test_sparql/test_forward_slash_escapes.py -rA
============================================================================ test session starts ============================================================================
platform linux -- Python 3.10.4, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.cleanish, configfile: pyproject.toml
plugins: subtests-0.7.0, cov-3.0.0
collected 6 items                                                                                                                                                           

test/test_sparql/test_forward_slash_escapes.py .x.x.x                                                                                                                 [100%]

================================================================================== PASSES ===================================================================================
========================================================================== short test summary info ==========================================================================
PASSED test/test_sparql/test_forward_slash_escapes.py::test_query_prepares_expanded
PASSED test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_turtle_expanded
PASSED test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_jsonld_expanded
XFAIL test/test_sparql/test_forward_slash_escapes.py::test_query_prepares_prefixed
  
    Contrary to the ratified SPARQL 1.1 grammar, the RDFlib SPARQL propcessor
    accepts backslashes as part of PN_LOCAL which it treats as escape
    characters.

    There should be a way to instruct the SPARQL parser to operate in strict
    mode, and in strict mode backslashes should not be permitted in PN_LOCAL.

    See https://github.com/RDFLib/rdflib/issues/1871

XFAIL test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_turtle_prefixed
  
    Contrary to the ratified SPARQL 1.1 grammar, the RDFlib SPARQL propcessor
    accepts backslashes as part of PN_LOCAL which it treats as escape
    characters.

    There should be a way to instruct the SPARQL parser to operate in strict
    mode, and in strict mode backslashes should not be permitted in PN_LOCAL.

    See https://github.com/RDFLib/rdflib/issues/1871

XFAIL test/test_sparql/test_forward_slash_escapes.py::test_escapes_and_query_jsonld_prefixed
  
    Contrary to the ratified SPARQL 1.1 grammar, the RDFlib SPARQL propcessor
    accepts backslashes as part of PN_LOCAL which it treats as escape
    characters.

    There should be a way to instruct the SPARQL parser to operate in strict
    mode, and in strict mode backslashes should not be permitted in PN_LOCAL.

    See https://github.com/RDFLib/rdflib/issues/1871

======================================================================= 3 passed, 3 xfailed in 0.30s ========================================================================

@ajnelson-nist
Copy link
Contributor Author

ajnelson-nist commented May 17, 2022

Ah, I think I've found it's out of scope of this PR, so no worries.

I reviewed git diff master..issue-1871 and saw this snippet:

diff --git a/test/test_sparql/test_datetime_processing.py b/test/test_sparql/test_datetime_processing.py
index 8cec5cca..41da92be 100644
--- a/test/test_sparql/test_datetime_processing.py
+++ b/test/test_sparql/test_datetime_processing.py
@@ -126,7 +126,7 @@ def test_dateTime_duration_subs():
     WHERE {
         VALUES (?duration ?d) {
             ("P3DT1H15M"^^xsd:dayTimeDuration "2000-10-30T11:12:00"^^xsd:dateTime)
-            ("P3DT1H15M"^^xsd:dayTimeDuration "2000-10-30"^^xsd:date)
+                   ("P3DT1H15M"^^xsd:dayTimeDuration "2000-10-30"^^xsd:date)
         }
     }
     """

But, Github isn't showing it.

When I reviewed the file with a text editor, I found it picked up tabs instead of spaces somewhere in recent work on the master branch.

So, out of scope. I'm fine with this PR going in.

@aucampia aucampia requested a review from a user May 17, 2022 17:39
@aucampia aucampia merged commit ccb9c4a into RDFLib:master May 17, 2022
@ajnelson-nist ajnelson-nist deleted the issue-1871 branch May 19, 2022 16:28
@ghost ghost removed the review wanted This indicates that the PR is ready for review label Jun 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants