Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the handling of escape sequences in the ntriples and nquads parsers #1663

Merged
merged 3 commits into from Jan 18, 2022

Conversation

aucampia
Copy link
Member

@aucampia aucampia commented Jan 12, 2022

These parsers will now correctly handle strings like "\\r".

The time it takes for these parsers to parse strings with escape
sequences will be increased, and the increase will be correlated with
the amount of escape sequences that occur in a string.

For strings with many escape sequences the parsing speed seems to be
almost 4 times slower.

Also:

  • Add graph variant test scaffolding. Multiple files representing the
    same graph can now easily be tested to be isomorphic by just adding
    them in test/variants.
  • Add more things to testutils.GraphHelper, including some methods that does
    asserts with better messages. Also include some tests for GraphHelper.
  • Add some extra files to test_roundtrip, set the default identifier
    when parsing, and change verbose flag to rather be based on debug
    logging.
  • move one test from test/test_issue247.py to variants.
  • Fix problems with .editorconfig which prevents it from working
    properly.
  • Add xfail tests for a couple of issues
    This includes xfails for the following issues:

Fixes #1655

@aucampia aucampia marked this pull request as draft January 12, 2022 22:33
@aucampia
Copy link
Member Author

@nicholascar this is okay to merge, but I think it has some conflicts with #1656 and #1656 also contains some fixes for xfail tests that I added here, so I will rebase this once that is finished.

The fix will remain as is though, changes will only occur in test/ directory.

@aucampia
Copy link
Member Author

I have done some performance testing in here

In summary, the new function is about the same speed in cases with fewer escape sequences, but when it is processing strings with many string escape sequeces (e.g. \n or \r) it is almost 4 times slower.

The string escape sequences are the escape sequences that were not being handled correctly before, so it would make sense that most of the cost comes in here.

Output of the performance tests can be seen below:

$ poetry run pytest ./tests/test_unescape_performance.py --benchmark-group-by param:test_strings_key
============================================================================ test session starts ============================================================================
platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/iwana/sw/d/github.com/aucampia/rdflib-scratchpad, configfile: pyproject.toml
plugins: subtests-0.5.0, cov-3.0.0, benchmark-3.4.1, profiling-1.7.0
collected 19 items                                                                                                                                                          

tests/test_unescape_performance.py ...................                                                                                                                [100%]

---------- coverage: platform linux, python 3.10.1-final-0 -----------
Name                                         Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------
src/aucampia/__init__.py                         2      0   100%
src/aucampia/rdflib/__init__.py                  2      0   100%
src/aucampia/rdflib/scratchpad/__init__.py       0      0   100%
src/aucampia/rdflib/scratchpad/_version.py       1      1     0%   1
src/aucampia/rdflib/scratchpad/cli.py           35     35     0%   5-103
src/aucampia/rdflib/scratchpad/unquote.py       65      5    92%   86, 111, 114-117
--------------------------------------------------------------------------
TOTAL                                          105     41    61%



---------------------------------------------------------------------------------- benchmark 'test_strings_key=few_escapes': 3 tests -----------------------------------------------------------------------------------
Name (time in us)                                      Min                 Max               Mean            StdDev             Median               IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_new-few_escapes]          20.6550 (1.0)       63.6000 (1.0)      23.9169 (1.0)      3.3959 (1.0)      22.9790 (1.0)      2.5099 (2.79)    1505;1214       41.8115 (1.0)       30025           1
test_performance[unquote_old-few_escapes]          31.2369 (1.51)      86.9200 (1.37)     32.7835 (1.37)     4.4484 (1.31)     31.7130 (1.38)     0.9000 (1.0)      613;1130       30.5031 (0.73)      19029           1
test_performance[unquote_validate-few_escapes]     56.5629 (2.74)     136.6610 (2.15)     58.6242 (2.45)     6.3523 (1.87)     57.1000 (2.48)     1.3730 (1.53)      516;791       17.0578 (0.41)      13661           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------- benchmark 'test_strings_key=many_escapes': 3 tests --------------------------------------------------------------------------------------
Name (time in us)                                        Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_old-many_escapes]          157.8090 (1.0)        335.1220 (1.0)      168.2338 (1.0)       12.4257 (1.0)      165.3350 (1.0)       6.4884 (1.0)       360;458        5.9441 (1.0)        5371           1
test_performance[unquote_new-many_escapes]          181.9211 (1.15)       412.7750 (1.23)     203.8222 (1.21)      27.6217 (2.22)     196.4370 (1.19)     31.3795 (4.84)      259;113        4.9062 (0.83)       3663           1
test_performance[unquote_validate-many_escapes]     527.6541 (3.34)     1,207.7210 (3.60)     599.7562 (3.57)     125.5351 (10.10)    556.6350 (3.37)     90.8774 (14.01)     150;140        1.6673 (0.28)       1797           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------- benchmark 'test_strings_key=many_escapes_narrow': 3 tests ---------------------------------------------------------------------------------------
Name (time in us)                                               Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_new-many_escapes_narrow]          186.9510 (1.0)        862.6611 (2.19)     224.9757 (1.13)      61.3487 (3.58)     202.4500 (1.05)     24.0203 (3.75)      567;783        4.4449 (0.89)       4875           1
test_performance[unquote_old-many_escapes_narrow]          190.0180 (1.02)       394.5540 (1.0)      199.5608 (1.0)       17.1156 (1.0)      193.3036 (1.0)       6.4040 (1.0)       461;616        5.0110 (1.0)        3986           1
test_performance[unquote_validate-many_escapes_narrow]     550.5960 (2.95)     2,762.7711 (7.00)     640.1752 (3.21)     118.2362 (6.91)     603.3650 (3.12)     70.2695 (10.97)      100;99        1.5621 (0.31)       1633           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------- benchmark 'test_strings_key=many_escapes_string': 3 tests ---------------------------------------------------------------------------------------
Name (time in us)                                               Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_old-many_escapes_string]           34.5800 (1.0)      2,135.3530 (7.51)      36.3306 (1.0)       15.6064 (1.80)      35.5910 (1.0)       0.9671 (1.44)      170;848       27.5250 (1.0)       18888           1
test_performance[unquote_new-many_escapes_string]          125.9911 (3.64)       284.2019 (1.0)      131.3483 (3.62)       8.6620 (1.0)      129.8441 (3.65)      0.6723 (1.0)      384;2398        7.6133 (0.28)       7209           1
test_performance[unquote_validate-many_escapes_string]     412.5390 (11.93)    1,590.4229 (5.60)     578.3370 (15.92)    175.2262 (20.23)    514.7040 (14.46)    88.4010 (131.50)    314;351        1.7291 (0.06)       2242           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------- benchmark 'test_strings_key=many_escapes_wide': 3 tests ---------------------------------------------------------------------------------------
Name (time in us)                                             Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_new-many_escapes_wide]          199.0810 (1.0)        566.6000 (1.68)     218.3829 (1.05)      35.5424 (3.44)     205.9960 (1.01)     11.8142 (2.01)      269;751        4.5791 (0.95)       4065           1
test_performance[unquote_old-many_escapes_wide]          200.5970 (1.01)       337.2300 (1.0)      207.4374 (1.0)       10.3271 (1.0)      203.2500 (1.0)       5.8776 (1.0)       525;668        4.8207 (1.0)        4545           1
test_performance[unquote_validate-many_escapes_wide]     577.1730 (2.90)     1,395.9260 (4.14)     653.9705 (3.15)     100.4474 (9.73)     629.0320 (3.09)     81.9499 (13.94)       72;60        1.5291 (0.32)       1371           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------- benchmark 'test_strings_key=no_escapes': 3 tests --------------------------------------------------------------------------------------------
Name (time in ns)                                        Min                    Max                  Mean                StdDev                Median                 IQR             Outliers  OPS (Kops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_performance[unquote_new-no_escapes]            308.7509 (1.0)       2,293.8999 (1.0)        368.6624 (1.0)         91.9060 (1.0)        337.7034 (1.0)       56.3450 (1.0)    10851;10725    2,712.5091 (1.0)      141744          20
test_performance[unquote_old-no_escapes]            437.9544 (1.42)     27,126.9819 (11.83)      574.8759 (1.56)       173.0340 (1.88)       554.0205 (1.64)      58.9062 (1.05)     5921;8709    1,739.5060 (0.64)     153422           1
test_performance[unquote_validate-no_escapes]     8,968.0543 (29.05)    41,804.0436 (18.22)    9,557.4152 (25.92)    1,250.9006 (13.61)    9,387.9644 (27.80)    118.9765 (2.11)     1348;8067      104.6308 (0.04)      71587           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
============================================================================ 19 passed in 19.46s ============================================================================

@aucampia
Copy link
Member Author

to provide an impression of the scope of the problems with the current function:

$ git checkout upstream/master -- rdflib
$ .venv/bin/pytest test/test_turtle_quoting.py test/test_variants.py test/test_roundtrip.py --tb=line
...
========================================================================== short test summary info ==========================================================================
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0041-\\u00410] - AssertionError: assert '\\u0041' == 'A'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0041-\\u00411] - AssertionError: assert '\\u0041' == 'A'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000041-\\U000000410] - AssertionError: assert '\\U00000041' == 'A'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000041-\\U000000411] - AssertionError: assert '\\U00000041' == 'A'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0031-\\u00310] - AssertionError: assert '\\u0031' == '1'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0031-\\u00311] - AssertionError: assert '\\u0031' == '1'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000031-\\U000000310] - AssertionError: assert '\\U00000031' == '1'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000031-\\U000000311] - AssertionError: assert '\\U00000031' == '1'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0061-\\u00610] - AssertionError: assert '\\u0061' == 'a'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u0061-\\u00611] - AssertionError: assert '\\u0061' == 'a'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000061-\\U000000610] - AssertionError: assert '\\U00000061' == 'a'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U00000061-\\U000000611] - AssertionError: assert '\\U00000061' == 'a'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u005c-\\u005c] - AssertionError: assert '\\u005c' == '\\'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u005C-\\u005C] - AssertionError: assert '\\u005C' == '\\'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U0000005c-\\U0000005c] - AssertionError: assert '\\U0000005c' == '\\'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U0000005C-\\U0000005C] - AssertionError: assert '\\U0000005C' == '\\'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u000a-\\u000a] - AssertionError: assert '\\u000a' == '\n'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u000A-\\u000A] - AssertionError: assert '\\u000A' == '\n'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U0000000a-\\U0000000a] - AssertionError: assert '\\U0000000a' == '\n'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U0000000A-\\U0000000A] - AssertionError: assert '\\U0000000A' == '\n'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00f8-\\u00f8] - AssertionError: assert '\\u00f8' == 'ø'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00F8-\\u00F8] - AssertionError: assert '\\u00F8' == 'ø'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000f8-\\U000000f8] - AssertionError: assert '\\U000000f8' == 'ø'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000F8-\\U000000F8] - AssertionError: assert '\\U000000F8' == 'ø'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00e6-\\u00e6] - AssertionError: assert '\\u00e6' == 'æ'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00E6-\\u00E6] - AssertionError: assert '\\u00E6' == 'æ'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000e6-\\U000000e6] - AssertionError: assert '\\U000000e6' == 'æ'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000E6-\\U000000E6] - AssertionError: assert '\\U000000E6' == 'æ'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00e5-\\u00e5] - AssertionError: assert '\\u00e5' == 'å'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\u00E5-\\u00E5] - AssertionError: assert '\\u00E5' == 'å'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000e5-\\U000000e5] - AssertionError: assert '\\U000000e5' == 'å'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\U000000E5-\\U000000E5] - AssertionError: assert '\\U000000E5' == 'å'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\t-\\t] - AssertionError: assert '\\t' == '\\\t'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\b-\\b] - AssertionError: assert '\\b' == '\\\x08'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\n-\\n] - AssertionError: assert '\\n' == '\\\n'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\r-\\r] - AssertionError: assert '\\r' == '\\\r'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote-\\\\f-\\f] - AssertionError: assert '\\f' == '\\\x0c'
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\u005c-\\] - rdflib.exceptions.ParserError: Illegal escape at: \u005c...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\u005c-\\\\] - rdflib.exceptions.ParserError: Illegal escape at: \u005c...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\U0000005c-\\] - rdflib.exceptions.ParserError: Illegal escape at: \U0000005c...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\U0000005c-\\\\] - rdflib.exceptions.ParserError: Illegal escape at: \U0000005...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\u000a-\n] - rdflib.exceptions.ParserError: Illegal escape at: \u000a...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\u000a-\\\n] - rdflib.exceptions.ParserError: Illegal escape at: \u000a...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\U0000000a-\n] - rdflib.exceptions.ParserError: Illegal escape at: \U0000000a...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\U0000000a-\\\n] - rdflib.exceptions.ParserError: Illegal escape at: \U0000000...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\u00f8-\xf8] - rdflib.exceptions.ParserError: Illegal escape at: \u00f8...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\u00f8-\\\xf8] - rdflib.exceptions.ParserError: Illegal escape at: \u00f8...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\U000000f8-\xf8] - rdflib.exceptions.ParserError: Illegal escape at: \U000000f8...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\U000000f8-\\\xf8] - rdflib.exceptions.ParserError: Illegal escape at: \U00000...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\u00e6-\xe6] - rdflib.exceptions.ParserError: Illegal escape at: \u00e6...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\u00e6-\\\xe6] - rdflib.exceptions.ParserError: Illegal escape at: \u00e6...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\U000000e6-\xe6] - rdflib.exceptions.ParserError: Illegal escape at: \U000000e6...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\U000000e6-\\\xe6] - rdflib.exceptions.ParserError: Illegal escape at: \U00000...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\u00e5-\xe5] - rdflib.exceptions.ParserError: Illegal escape at: \u00e5...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\u00e5-\\\xe5] - rdflib.exceptions.ParserError: Illegal escape at: \u00e5...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\U000000e5-\xe5] - rdflib.exceptions.ParserError: Illegal escape at: \U000000e5...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\U000000e5-\\\xe5] - rdflib.exceptions.ParserError: Illegal escape at: \U00000...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\b-\x08] - rdflib.exceptions.ParserError: Illegal escape at: \b...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\b-\\\x08] - rdflib.exceptions.ParserError: Illegal escape at: \b...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\f-\x0c] - rdflib.exceptions.ParserError: Illegal escape at: \f...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\f-\\\x0c] - rdflib.exceptions.ParserError: Illegal escape at: \f...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\'-'] - rdflib.exceptions.ParserError: Illegal escape at: \'...
FAILED test/test_turtle_quoting.py::test_unquote_correctness[ntriples_unquote_validate-\\\\\\'-\\'] - rdflib.exceptions.ParserError: Illegal escape at: \'...
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0041-\\u00410] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0041-\\u00411] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000041-\\U000000410] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000041-\\U000000411] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0031-\\u00310] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0031-\\u00311] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000031-\\U000000310] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000031-\\U000000311] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0061-\\u00610] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0061-\\u00611] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000061-\\U000000610] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000061-\\U000000611] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u005c-\\u005c0] - AssertionError: assert '\\' == '\\u005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u005C-\\u005C0] - AssertionError: assert '\\' == '\\u005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000005c-\\U0000005c0] - AssertionError: assert '\\' == '\\U0000005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000005C-\\U0000005C0] - AssertionError: assert '\\' == '\\U0000005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u000a-\\u000a0] - AssertionError: assert '\n' == '\\u000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u000A-\\u000A0] - AssertionError: assert '\n' == '\\u000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000000a-\\U0000000a0] - AssertionError: assert '\n' == '\\U0000000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000000A-\\U0000000A0] - AssertionError: assert '\n' == '\\U0000000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00f8-\\u00f80] - AssertionError: assert 'ø' == '\\u00f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00F8-\\u00F80] - AssertionError: assert 'ø' == '\\u00F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000f8-\\U000000f80] - AssertionError: assert 'ø' == '\\U000000f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000F8-\\U000000F80] - AssertionError: assert 'ø' == '\\U000000F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00e6-\\u00e60] - AssertionError: assert 'æ' == '\\u00e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00E6-\\u00E60] - AssertionError: assert 'æ' == '\\u00E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000e6-\\U000000e60] - AssertionError: assert 'æ' == '\\U000000e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000E6-\\U000000E60] - AssertionError: assert 'æ' == '\\U000000E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00e5-\\u00e50] - AssertionError: assert 'å' == '\\u00e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00E5-\\u00E50] - AssertionError: assert 'å' == '\\u00E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000e5-\\U000000e50] - AssertionError: assert 'å' == '\\U000000e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000E5-\\U000000E50] - AssertionError: assert 'å' == '\\U000000E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\t-\\t0] - AssertionError: assert '\\\t' == '\\t'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\b-\\b0] - AssertionError: assert '\\\x08' == '\\b'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\n-\\n0] - AssertionError: assert '\\\n' == '\\n'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\r-\\r0] - AssertionError: assert '\\\r' == '\\r'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\f-\\f0] - AssertionError: assert '\\\x0c' == '\\f'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0041-\\u00412] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0041-\\u00413] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000041-\\U000000412] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000041-\\U000000413] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0031-\\u00312] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0031-\\u00313] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000031-\\U000000312] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000031-\\U000000313] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0061-\\u00612] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u0061-\\u00613] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000061-\\U000000612] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U00000061-\\U000000613] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u005c-\\u005c1] - AssertionError: assert '\\' == '\\u005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u005C-\\u005C1] - AssertionError: assert '\\' == '\\u005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000005c-\\U0000005c1] - AssertionError: assert '\\' == '\\U0000005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000005C-\\U0000005C1] - AssertionError: assert '\\' == '\\U0000005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u000a-\\u000a1] - AssertionError: assert '\n' == '\\u000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u000A-\\u000A1] - AssertionError: assert '\n' == '\\u000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000000a-\\U0000000a1] - AssertionError: assert '\n' == '\\U0000000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U0000000A-\\U0000000A1] - AssertionError: assert '\n' == '\\U0000000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00f8-\\u00f81] - AssertionError: assert 'ø' == '\\u00f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00F8-\\u00F81] - AssertionError: assert 'ø' == '\\u00F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000f8-\\U000000f81] - AssertionError: assert 'ø' == '\\U000000f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000F8-\\U000000F81] - AssertionError: assert 'ø' == '\\U000000F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00e6-\\u00e61] - AssertionError: assert 'æ' == '\\u00e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00E6-\\u00E61] - AssertionError: assert 'æ' == '\\u00E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000e6-\\U000000e61] - AssertionError: assert 'æ' == '\\U000000e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000E6-\\U000000E61] - AssertionError: assert 'æ' == '\\U000000E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00e5-\\u00e51] - AssertionError: assert 'å' == '\\u00e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\u00E5-\\u00E51] - AssertionError: assert 'å' == '\\u00E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000e5-\\U000000e51] - AssertionError: assert 'å' == '\\U000000e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\U000000E5-\\U000000E51] - AssertionError: assert 'å' == '\\U000000E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\t-\\t1] - AssertionError: assert '\\\t' == '\\t'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\b-\\b1] - AssertionError: assert '\\\x08' == '\\b'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\n-\\n1] - AssertionError: assert '\\\n' == '\\n'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\r-\\r1] - AssertionError: assert '\\\r' == '\\r'
FAILED test/test_turtle_quoting.py::test_parse_correctness[ntriples-\\\\f-\\f1] - AssertionError: assert '\\\x0c' == '\\f'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0041-\\u00410] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0041-\\u00411] - AssertionError: assert 'A' == '\\u0041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000041-\\U000000410] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000041-\\U000000411] - AssertionError: assert 'A' == '\\U00000041'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0031-\\u00310] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0031-\\u00311] - AssertionError: assert '1' == '\\u0031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000031-\\U000000310] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000031-\\U000000311] - AssertionError: assert '1' == '\\U00000031'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0061-\\u00610] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u0061-\\u00611] - AssertionError: assert 'a' == '\\u0061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000061-\\U000000610] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U00000061-\\U000000611] - AssertionError: assert 'a' == '\\U00000061'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u005c-\\u005c] - AssertionError: assert '\\' == '\\u005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u005C-\\u005C] - AssertionError: assert '\\' == '\\u005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U0000005c-\\U0000005c] - AssertionError: assert '\\' == '\\U0000005c'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U0000005C-\\U0000005C] - AssertionError: assert '\\' == '\\U0000005C'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u000a-\\u000a] - AssertionError: assert '\n' == '\\u000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u000A-\\u000A] - AssertionError: assert '\n' == '\\u000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U0000000a-\\U0000000a] - AssertionError: assert '\n' == '\\U0000000a'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U0000000A-\\U0000000A] - AssertionError: assert '\n' == '\\U0000000A'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00f8-\\u00f8] - AssertionError: assert 'ø' == '\\u00f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00F8-\\u00F8] - AssertionError: assert 'ø' == '\\u00F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000f8-\\U000000f8] - AssertionError: assert 'ø' == '\\U000000f8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000F8-\\U000000F8] - AssertionError: assert 'ø' == '\\U000000F8'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00e6-\\u00e6] - AssertionError: assert 'æ' == '\\u00e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00E6-\\u00E6] - AssertionError: assert 'æ' == '\\u00E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000e6-\\U000000e6] - AssertionError: assert 'æ' == '\\U000000e6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000E6-\\U000000E6] - AssertionError: assert 'æ' == '\\U000000E6'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00e5-\\u00e5] - AssertionError: assert 'å' == '\\u00e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\u00E5-\\u00E5] - AssertionError: assert 'å' == '\\u00E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000e5-\\U000000e5] - AssertionError: assert 'å' == '\\U000000e5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\U000000E5-\\U000000E5] - AssertionError: assert 'å' == '\\U000000E5'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\t-\\t] - AssertionError: assert '\\\t' == '\\t'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\b-\\b] - AssertionError: assert '\\\x08' == '\\b'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\n-\\n] - AssertionError: assert '\\\n' == '\\n'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\r-\\r] - AssertionError: assert '\\\r' == '\\r'
FAILED test/test_turtle_quoting.py::test_parse_correctness[nquads-\\\\f-\\f] - AssertionError: assert '\\\x0c' == '\\f'
FAILED test/test_variants.py::test_variants[variants/special_chars] - AssertionError: in both:
========================================================= 175 failed, 1873 passed, 23 xfailed, 34 warnings in 5.61s =========================================================

@aucampia aucampia force-pushed the iwana-20220112T2104-fix_nt_unquote branch 3 times, most recently from 3cffd4d to 47ac111 Compare January 12, 2022 23:01
…sers.

These parsers will now correctly handle strings like `"\\r"`.

The time it takes for these parsers to parse strings with escape
sequences will be increased, and the increase will be correlated with
the amount of escape sequences that occur in a string.

For strings with many escape sequences the parsing speed seems to be
almost 4 times slower.

Also:

- Add graph variant test scaffolding. Multiple files representing the
  same graph can now easily be tested to be isomorphic by just adding
  them in `test/variants`.
- Add more things to `testutils.GraphHelper`, including some methods that does
  asserts with better messages. Also include some tests for GraphHelper.
- Add some extra files to test_roundtrip, set the default identifier
  when parsing, and change verbose flag to rather be based on debug
  logging.
- move one test from `test/test_issue247.py` to variants.
- Fix problems with `.editorconfig` which prevents it from working
  properly.
- Add xfail tests for a couple of issues
  This includes xfails for the following issues:
  - RDFLib#1216
  - RDFLib#1649
@aucampia aucampia force-pushed the iwana-20220112T2104-fix_nt_unquote branch from 47ac111 to 8bb9dbd Compare January 15, 2022 03:04
@aucampia aucampia marked this pull request as ready for review January 15, 2022 03:14
Copy link
Member

@nicholascar nicholascar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, thanks @aucampia

@aucampia
Copy link
Member Author

aucampia commented Jan 16, 2022

Thanks to KirkMcDonald on irc://libera.chat/python I did a comparison between using a dict lookup and str.translate, and str.translate for this use case (1 char string) is actually about twice as slow, so I'm going to change the str.translate to just dict lookups.

---------------------------------------------------------------------------------------- benchmark: 2 tests ---------------------------------------------------------------------------------------
Name (time in ms)                      Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_translate_performance[dt]      5.4709 (1.0)      13.2681 (1.0)       6.4253 (1.0)      1.7428 (2.19)      5.6425 (1.0)      0.8129 (1.0)         12;13  155.6354 (1.0)         101           1
test_translate_performance[tr]     11.5679 (2.11)     16.4249 (1.24)     12.2162 (1.90)     0.7971 (1.0)      12.0116 (2.13)     0.8868 (1.09)          8;2   81.8584 (0.53)         82           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Test code: https://github.com/aucampia/rdflib-scratchpad/blob/e72d3e3cfcb2edc5763797899f292a865fdb1758/tests/test_translate_performance.py

`str.translate` is not faster than dict lookup for this case, which is a
one char lookup. For more info see
RDFLib#1663 (comment)
@aucampia
Copy link
Member Author

so I'm going to change the str.translate to just dict lookups.

done now.

@aucampia
Copy link
Member Author

Will merge tonight (CET) I think. Planning to add tests for #1664 (which seem to not be an issue anymore) and #1661 on top of this.

@aucampia aucampia merged commit b654142 into RDFLib:master Jan 18, 2022
@aucampia aucampia deleted the iwana-20220112T2104-fix_nt_unquote branch April 9, 2022 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some special characters might be parsed wrongly (?)
2 participants