Fix turtle serialization of PNames that contain brackets #1678

Merged: 1 commit, Jan 30, 2022

Conversation

@aucampia aucampia commented Jan 18, 2022

Brackets in pnames will now be escaped in output.

This fix is based on a suggestion by Niklas Lindström (@niklasl).

This is somewhat of a stopgap fix: some problems are still being
masked, and various cases could still result in questionable Turtle being
written out. These can be addressed separately.

Fixes #1661

@aucampia aucampia force-pushed the iwana-20220117T0103-ttl_pnames branch 3 times, most recently from 9e9d072 to d610d5f Compare January 20, 2022 23:41
@aucampia aucampia changed the title [WIP] Fix serialization of PNames in Turtle Fix turtle serialization of PNames that contain brackets Jan 20, 2022
@aucampia aucampia force-pushed the iwana-20220117T0103-ttl_pnames branch from d610d5f to 088d447 Compare January 20, 2022 23:44
@aucampia

I used str.replace instead of re.sub because it is much faster (roughly 8x to 40x by mean time in the benchmark below), and in this case I can't think of problems like the ones we had with #1663.

benchmark code is here

--------------------------------------------------------------------------------------------------- benchmark 'data_key=all': 2 tests ----------------------------------------------------------------------------------------------------
Name (time in us)                                       Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_translate_performance[str_replace-all]        311.8995 (1.0)         772.2255 (1.0)         382.0523 (1.0)         93.8419 (1.0)         369.2673 (1.0)         49.7520 (1.0)       336;362  2,617.4429 (1.0)        3698           2
test_translate_performance[re_sub-all]          13,351.7210 (42.81)    26,107.6400 (33.81)    15,059.2793 (39.42)    1,709.6806 (18.22)    14,577.4450 (39.48)    1,898.8140 (38.17)       29;12     66.4042 (0.03)        314           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------- benchmark 'data_key=none': 2 tests -----------------------------------------------------------------------------------------
Name (time in us)                                    Min                 Max               Mean            StdDev             Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_translate_performance[str_replace-none]      1.7242 (1.0)        3.8545 (1.0)       2.1638 (1.0)      0.2845 (1.0)       2.0874 (1.0)      0.1091 (1.0)       455;560      462.1546 (1.0)        2868        1000
test_translate_performance[re_sub-none]          15.5722 (9.03)     119.8676 (31.10)    17.1613 (7.93)     2.6847 (9.44)     16.6494 (7.98)     0.9336 (8.56)       70;222       58.2706 (0.13)       3030         100
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================================================================ 4 passed in 20.20s =============================================================================
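
The comparison above can be approximated with a small stand-alone script. This is only a rough sketch, not the linked benchmark: the helper names `escape_str_replace` and `escape_re_sub` and the sample input are assumptions made for illustration, and absolute timings will differ from machine to machine.

```python
import re
import timeit

# Hypothetical helpers for illustration; this is not rdflib's actual code.
_BRACKETS = re.compile(r"([()])")


def escape_str_replace(local_name: str) -> str:
    """Backslash-escape brackets with two str.replace passes."""
    return local_name.replace("(", "\\(").replace(")", "\\)")


def escape_re_sub(local_name: str) -> str:
    """Backslash-escape brackets with a single precompiled regex."""
    return _BRACKETS.sub(r"\\\1", local_name)


if __name__ == "__main__":
    # Assumed sample input: a local name with many brackets.
    sample = "prop(with)brackets" * 100
    # Both approaches must produce the same escaped text.
    assert escape_str_replace(sample) == escape_re_sub(sample)
    for fn in (escape_str_replace, escape_re_sub):
        seconds = timeit.timeit(lambda: fn(sample), number=10_000)
        print(f"{fn.__name__}: {seconds:.4f}s for 10,000 calls")
```

Both helpers return identical output, so the choice between them comes down to speed and readability.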

@aucampia aucampia marked this pull request as ready for review January 20, 2022 23:48
@aucampia

Also, to clarify: this only fixes the escaping of brackets. The other cases are already handled by other logic:

$ .venv/bin/pytest  test/test_turtle_quoting.py test/test_turtle_quoting.py::test_serialize_roundtrip --log-level DEBUG -rA
============================================================================ test session starts ============================================================================
platform linux -- Python 3.8.12, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib, configfile: tox.ini
plugins: subtests-0.5.0, cov-3.0.0, monkeytype-1.1.0
collected 50 items                                                                                                                                                          

test/test_turtle_quoting.py ss...s............................................                                                                                        [100%]

================================================================================== PASSES ===================================================================================
______________________________________________________________________ test_pname_escaping[turtle-x-x] ______________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:185 format = turtle, char = 'x', escaped = 'x', pattern = re.compile('\\segns:propx\\s'), data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:propx "foo"@en .
_____________________________________________________________________ test_pname_escaping[turtle-(-\\(] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:185 format = turtle, char = '(', escaped = '\\(', pattern = re.compile('\\segns:prop\\\\\\(\\s'), data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop\( "foo"@en .
_____________________________________________________________________ test_pname_escaping[turtle-)-\\)] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:185 format = turtle, char = ')', escaped = '\\)', pattern = re.compile('\\segns:prop\\\\\\)\\s'), data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop\) "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-A] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = A, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:propA "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-2] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = 2, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop2 "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-c] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = c, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:propc "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-_] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = _, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop_ "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-~] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ~, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop~> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-.] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ., data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop.> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle--] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = -, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop- "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-!] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = !, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop!> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-$] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = $, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop$> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-&] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = &, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop&> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-'] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ', data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop'> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-(] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = (, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop\( "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-)] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ), data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe egns:prop\) "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-*] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = *, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop*> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-+] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = +, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop+> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-,] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ,, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop,> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-;] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ;, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop;> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-=] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = =, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop=> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-/] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = /, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop/> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-?] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = ?, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop?> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-#] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = #, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop#> "foo"@en .
____________________________________________________________________ test_serialize_roundtrip[turtle-@] _____________________________________________________________________
----------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------
DEBUG    root:test_turtle_quoting.py:215 format = turtle, char = @, data = @prefix egns: <http://example.com/prefix/> .

egns:John_Doe <http://example.com/prefix/prop@> "foo"@en .
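
The `pattern` values in the `test_pname_escaping` log entries above are consistent with building the regular expression from the escaped character via `re.escape`. A minimal sketch, assuming that construction (the helper name `pname_pattern` is hypothetical, and the actual test may build its pattern differently):

```python
import re


def pname_pattern(escaped: str) -> "re.Pattern[str]":
    """Build a regex that finds egns:prop<escaped> surrounded by whitespace."""
    return re.compile(r"\segns:prop" + re.escape(escaped) + r"\s")


# Serialized Turtle copied from the log output for the "(" case.
data = (
    "@prefix egns: <http://example.com/prefix/> .\n"
    "\n"
    'egns:John_Doe egns:prop\\( "foo"@en .\n'
)

# The escaped form is present in the output...
assert pname_pattern("\\(").search(data) is not None
# ...while the unescaped form is not.
assert pname_pattern("(").search(data) is None
```

Using `re.escape` keeps the backslash and bracket literal, which is why the logged pattern contains so many backslashes.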

@nicholascar nicholascar left a comment

Thanks @aucampia. I'll leave a final review for @niklasl

@aucampia aucampia force-pushed the iwana-20220117T0103-ttl_pnames branch from d845f9a to 1bbe307 Compare January 29, 2022 23:57
@aucampia aucampia force-pushed the iwana-20220117T0103-ttl_pnames branch from 1bbe307 to 9c1b55c Compare January 30, 2022 00:11
@aucampia

planning to merge this tomorrow

@nicholascar

Yes, I think so.

@niklasl niklasl left a comment

Looks good and thoroughly tested! Do we want to keep #1661 open to track the remaining characters (the total being ~!$&'()*+,;=/?#@%), or do you think distinct issues are needed for those?

@aucampia

> Looks good and thoroughly tested! Do we want to keep #1661 open to track the remaining characters (the total being ~!$&'()*+,;=/?#@%), or do you think distinct issues are needed for those?

I think it is best that we open separate issues for them. Round-trip tests for PN_LOCAL_ESC_CHARS = r"_~.-!$&'()*+,;=/?#@" do pass [ref]. They may pass for less-than-ideal reasons, in that the names don't get encoded as PNames to begin with, but I think we can take that up separately. There is still the failure relating to encoding brackets into XML [ref], but for me, having the xfail marker in the tests for that is enough for now.
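
To make the remaining work concrete, here is a rough sketch of what escaping every character in that constant could look like. It is not rdflib's implementation: `escape_local_name` is a hypothetical helper, and `str.translate` is just one possible technique. The sketch also ignores the positions where the Turtle grammar permits some of these characters unescaped (such as `_`, `-`, and a non-trailing `.`), and it does not handle `%`, which appears in the list quoted above but not in this constant.

```python
# Constant as quoted in the comment above.
PN_LOCAL_ESC_CHARS = r"_~.-!$&'()*+,;=/?#@"

# Map each reserved character to its backslash-escaped form.
_ESCAPE_TABLE = str.maketrans({ch: "\\" + ch for ch in PN_LOCAL_ESC_CHARS})


def escape_local_name(local_name: str) -> str:
    """Backslash-escape every reserved character (hypothetical helper)."""
    return local_name.translate(_ESCAPE_TABLE)


if __name__ == "__main__":
    for name in ("prop(x)", "a,b;c", "plain"):
        print(name, "->", escape_local_name(name))
```

A single translation table handles all characters in one pass, which avoids chaining one `str.replace` call per character.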

@aucampia aucampia merged commit b2cae90 into RDFLib:master Jan 30, 2022
@aucampia aucampia deleted the iwana-20220117T0103-ttl_pnames branch April 9, 2022 14:14
Development

Successfully merging this pull request may close these issues.

Turtle serializing creates invalid PNames