Strange bug of notation3 serialization that occur with “probability” #1701

Closed
blackwint3r opened this issue Jan 31, 2022 · 5 comments · Fixed by #1858
Labels
bug Something isn't working format: N3 Related to N3 format. serialization Related to serialization.

Comments

@blackwint3r

blackwint3r commented Jan 31, 2022

Version:
rdflib 6.1.1
python 3.8.10

I have a test.py:

import rdflib
data1='''
@prefix : <http://www.example.com/dev#>.
:a :b :c.
'''
data2='''
@prefix : <http://www.example.com/dev#>.
{:a :b :c}=>{:d :e :f}.
'''
g = rdflib.Graph()
g.parse(data=data1, format = 'n3')
g.parse(data=data2, format = 'n3')
r = g.serialize(format='n3')
print(r)

The output of this test.py varies between two possibilities.
The first one (which is correct):

@prefix : <http://www.example.com/dev#> .

:a :b :c .

{
    :a :b :c .

} => {
        :d :e :f .

    } .

The second one (missing the triple from the first parse):

@prefix : <http://www.example.com/dev#> .

{
    :a :b :c .

} => {
        :d :e :f .

    } .

This strange bug appears seemingly at random. To estimate the "probability" of its occurrence, I wrote a loop like this:

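# Appended to test.py above; reuses rdflib, data1 and data2.
num = 10000
error = 0
for i in range(num):
    g = rdflib.Graph()
    g.parse(data=data1, format='n3')
    g.parse(data=data2, format='n3')
    r = g.serialize(format='n3')
    # In the correct output ":a :b :c" first appears at index 43 (the
    # 41-character @prefix line plus two newlines). A larger index means
    # only the copy inside the quoted graph was found.
    index = r.find(":a :b :c")
    if index > 43:  # second case occurred
        error += 1
print("error =", error)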

Each execution gives a different result, e.g. error = 8533, 5693, 437... As num increases, the error rate does not converge to a specific value, and I can't find any pattern.
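The offset check `index > 43` depends on the exact length of the `@prefix` line. A sturdier way to classify an output is to ask whether the triple appears outside any `{ ... }` block. The helper below is a hypothetical sketch, not part of rdflib; it assumes `{` and `}` only ever delimit formulas, which holds for this example but not for literals that contain braces.

```python
def has_top_level_triple(serialized: str, triple: str) -> bool:
    """Return True if a line starting with `triple` occurs at brace depth 0.

    Hypothetical helper: assumes '{' and '}' only delimit formulas,
    so it is not safe for literals that contain braces.
    """
    depth = 0
    for line in serialized.splitlines():
        if depth == 0 and line.strip().startswith(triple):
            return True
        # Update the nesting depth after checking the start of the line.
        depth += line.count("{") - line.count("}")
    return False


correct = """@prefix : <http://www.example.com/dev#> .

:a :b :c .

{
    :a :b :c .

} => {
        :d :e :f .

    } .
"""

buggy = """@prefix : <http://www.example.com/dev#> .

{
    :a :b :c .

} => {
        :d :e :f .

    } .
"""

print(has_top_level_triple(correct, ":a :b :c"))  # True
print(has_top_level_triple(buggy, ":a :b :c"))    # False
```

In the loop above, `if not has_top_level_triple(r, ":a :b :c"): error += 1` would then replace the offset comparison.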

@aucampia aucampia self-assigned this Apr 9, 2022
@aucampia aucampia added the bug Something isn't working label Apr 9, 2022
@aucampia
Member

Seems to be the same issue as #1807

@aucampia
Member

This seems to happen every time the first element in subjects_list here is the quoted graph:

subjects_list = self.orderSubjects()

I'm guessing this may be because the serializer thinks it has already serialized the triple, since it was serialized inside the quoted graph. Checking further.

@aucampia
Member

Indeed it is; debugging inside master...aucampia:iwana-20220419T2343-n3_serialize_quoted_graph

20220419T234518 iwana@iwana-pc00.coop.no:~/sw/d/github.com/iafork/rdflib.cleanish
$ .venv/bin/python3 -m pytest  'test/test_issues/test_issue1701.py::test_issue1701_a' -rA --log-cli-level DEBUG
============================================================================ test session starts ============================================================================
platform linux -- Python 3.7.13, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/iwana/sw/d/github.com/iafork/rdflib.cleanish, configfile: pyproject.toml
plugins: subtests-0.7.0, md-report-0.2.0, cov-3.0.0
collected 1 item                                                                                                                                                            

test/test_issues/test_issue1701.py::test_issue1701_a 
------------------------------------------------------------------------------- live log call -------------------------------------------------------------------------------
2022-04-19T23:45:23 DEBUG    root         turtle.py:228:serialize entry ...
2022-04-19T23:45:23 DEBUG    root         turtle.py:242:serialize subjects_list = [<Graph identifier=_:Formula3 (<class 'rdflib.graph.QuotedGraph'>)>, rdflib.term.URIRef('http://example.com/a'), rdflib.term.URIRef('http://example.com/d')]
2022-04-19T23:45:23 DEBUG    root         turtle.py:249:serialize subject (isDone=False) = {this rdflib.identifier _:_:Formula3;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']}
2022-04-19T23:45:23 DEBUG    root         turtle.py:228:serialize entry ...
2022-04-19T23:45:23 DEBUG    root         turtle.py:242:serialize subjects_list = [rdflib.term.URIRef('http://example.com/a')]
2022-04-19T23:45:23 DEBUG    root         turtle.py:249:serialize subject (isDone=False) = http://example.com/a
2022-04-19T23:45:23 DEBUG    root         turtle.py:228:serialize entry ...
2022-04-19T23:45:23 DEBUG    root         turtle.py:242:serialize subjects_list = [rdflib.term.URIRef('http://example.com/d')]
2022-04-19T23:45:23 DEBUG    root         turtle.py:249:serialize subject (isDone=False) = http://example.com/d
2022-04-19T23:45:23 DEBUG    root         turtle.py:249:serialize subject (isDone=True) = http://example.com/a
2022-04-19T23:45:23 DEBUG    root         turtle.py:249:serialize subject (isDone=True) = http://example.com/d
2022-04-19T23:45:23 DEBUG    root         test_issue1701.py:42:test_issue1701_a data_s = @prefix : <http://example.com/> .

{
    :a :b :c .

} => {
        :d :e :f .

    } .


XFAIL ()                                                                                                                                                              [100%]

========================================================================== short test summary info ==========================================================================
XFAIL test/test_issues/test_issue1701.py::test_issue1701_a
  
https://github.com/RDFLib/rdflib/issues/1701

============================================================================ 1 xfailed in 0.11s =============================================================================

@aucampia
Member

Still digging around here, but I suspect this is wrong:

def preprocessTriple(self, triple):
    super(N3Serializer, self).preprocessTriple(triple)
    if isinstance(triple[0], Graph):
        for t in triple[0]:
            self.preprocessTriple(t)
    if isinstance(triple[2], Graph):
        for t in triple[2]:
            self.preprocessTriple(t)

I don't think this should be processing the triples in quoted graphs.
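A toy model of what that recursion does to the outer serializer's bookkeeping, assuming the base-class `preprocessTriple` records each visited triple's subject (consistent with the `subjects_list` in the log above, which contains `http://example.com/d` even though that subject only occurs inside a quoted graph). `collect_subjects` and the string/tuple encoding of terms are hypothetical, not rdflib code.

```python
def collect_subjects(triples, recurse_into_quoted):
    """Hypothetical model of a preprocessing pass that records subjects.

    Terms are strings; a quoted graph is a tuple of (s, p, o) triples.
    """
    subjects = []

    def visit(triple):
        s, _, o = triple
        if s not in subjects:
            subjects.append(s)  # every visited triple's subject is recorded
        if recurse_into_quoted:
            # Mirrors the loops over triple[0] and triple[2] quoted above.
            for term in (s, o):
                if isinstance(term, tuple):
                    for inner in term:
                        visit(inner)

    for t in triples:
        visit(t)
    return subjects


F = ((":a", ":b", ":c"),)  # quoted graph {:a :b :c}
G = ((":d", ":e", ":f"),)  # quoted graph {:d :e :f}
graph = [(":a", ":b", ":c"), (F, "=>", G)]

print(":d" in collect_subjects(graph, recurse_into_quoted=True))   # True
print(":d" in collect_subjects(graph, recurse_into_quoted=False))  # False
```

With the recursion, `:d` becomes a top-level subject of the outer graph even though the outer graph has no triple with that subject.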

@aucampia
Member

This seems to do the trick; will look further tomorrow:

diff --git a/rdflib/plugins/serializers/n3.py b/rdflib/plugins/serializers/n3.py
index f82a08a2..1e91d8f3 100644
--- a/rdflib/plugins/serializers/n3.py
+++ b/rdflib/plugins/serializers/n3.py
@@ -25,13 +25,14 @@ class N3Serializer(TurtleSerializer):
 
     def subjectDone(self, subject):
         super(N3Serializer, self).subjectDone(subject)
-        if self.parent:
-            self.parent.subjectDone(subject)
+        # if self.parent:
+        #     self.parent.subjectDone(subject)
 
     def isDone(self, subject):
-        return super(N3Serializer, self).isDone(subject) and (
-            not self.parent or self.parent.isDone(subject)
-        )
+        return super(N3Serializer, self).isDone(subject)
+        # return super(N3Serializer, self).isDone(subject) and (
+        #     not self.parent or self.parent.isDone(subject)
+        # )
 
     def startDocument(self):
         super(N3Serializer, self).startDocument()
@@ -65,12 +66,12 @@ class N3Serializer(TurtleSerializer):
 
     def preprocessTriple(self, triple):
         super(N3Serializer, self).preprocessTriple(triple)
-        if isinstance(triple[0], Graph):
-            for t in triple[0]:
-                self.preprocessTriple(t)
-        if isinstance(triple[2], Graph):
-            for t in triple[2]:
-                self.preprocessTriple(t)
+        # if isinstance(triple[0], Graph):
+        #     for t in triple[0]:
+        #         self.preprocessTriple(t)
+        # if isinstance(triple[2], Graph):
+        #     for t in triple[2]:
+        #         self.preprocessTriple(t)
 
     def getQName(self, uri, gen_prefix=True):
         qname = None
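The diff stops the quoted-graph serializer from marking subjects as done in its parent (`subjectDone`) and makes `isDone` consult only the serializer's own state. The toy model below is a hypothetical sketch, not rdflib code: `ToySerializer`, `share_done` and the string terms are made up, and only the `subjectDone` propagation is modelled. Subject order is passed in explicitly; in rdflib it comes from `orderSubjects()`.

```python
class ToySerializer:
    """Hypothetical model of a serializer that tracks written subjects.

    Quoted graphs are modelled as tuples of (s, p, o) string triples.
    share_done=True mimics the old subjectDone(), which also marked the
    subject as done in the parent serializer; False mimics the patch.
    """

    def __init__(self, triples, parent=None, share_done=True):
        self.triples = triples
        self.parent = parent
        self.share_done = share_done
        self.done = set()
        self.lines = []

    def subject_done(self, subject):
        self.done.add(subject)
        if self.share_done and self.parent is not None:
            self.parent.subject_done(subject)

    def term(self, t):
        if isinstance(t, tuple):  # quoted graph: write it with a child serializer
            child = ToySerializer(t, parent=self, share_done=self.share_done)
            child.serialize([s for s, _, _ in t])
            return "{ " + " ".join(child.lines) + " }"
        return t

    def serialize(self, order):
        for subject in order:
            if subject in self.done:
                continue  # believed to be written already, so it is skipped
            for s, p, o in self.triples:
                if s == subject:
                    self.lines.append(f"{self.term(s)} {p} {self.term(o)} .")
            self.subject_done(subject)
        return self.lines


F = ((":a", ":b", ":c"),)  # quoted graph {:a :b :c}
G = ((":d", ":e", ":f"),)  # quoted graph {:d :e :f}
graph = [(":a", ":b", ":c"), (F, "=>", G)]

for share in (True, False):
    for order in ([":a", F], [F, ":a"]):
        lines = ToySerializer(graph, share_done=share).serialize(order)
        print(share, len(lines))  # prints: True 2 / True 1 / False 2 / False 2
```

With `share_done=True` the top-level `:a :b :c .` is lost whenever the quoted graph comes first, matching the two outputs in the report; with `share_done=False` both orders produce both statements.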

@aucampia aucampia added serialization Related to serialization. format: N3 Related to N3 format. labels Apr 19, 2022
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch fixes two issues with the N3 serializer:
- The N3 serializer incorrectly considered a subject as already
  serialized if it has been serialized inside a quoted graph.
- The N3 serializer does not consider that the predicate of
  a triple can also be a graph.

Other changes included in this patch:
- Added the N3 test suite from https://github.com/w3c/N3/tree/master/tests
- Added `test/data/fetcher.py` which fetches remote test data.
- Changed `test.testutils.GraphHelper` to support nested graphs.

Fixes:
- RDFLib#1807
- RDFLib#1701

Related:
- RDFLib#1840
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch adds the N3 test suite from https://github.com/w3c/N3/tree/master/tests
and also adds `test/data/fetcher.py` which fetches remote test data.

Remotes are added for some data in the test data directory; more will be
added later, and the data itself will be corrected.

I'm mainly doing this because I want N3 test data to test the fix I'm
making for these issues:
- RDFLib#1807
- RDFLib#1701

Related to:
- RDFLib#1840
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch fixes two issues with the N3 serializer:
- The N3 serializer incorrectly considered a subject as already
  serialized if it has been serialized inside a quoted graph.
- The N3 serializer does not consider that the predicate of
  a triple can also be a graph.

Other changes included in this patch:
- Changed `test.testutils.GraphHelper` to support nested graphs.

Fixes:
- RDFLib#1807
- RDFLib#1701
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 23, 2022
This patch fixes two issues with the N3 serializer:
- The N3 serializer incorrectly considered a subject as already
  serialized if it has been serialized inside a quoted graph.
- The N3 serializer does not consider that the predicate of
  a triple can also be a graph.

Other changes included in this patch:
- Changed `test.testutils.GraphHelper` to support nested/quoted graphs.
- Moved the tests from `test/test_n3_formula.py` into
  `test/test_serializers/test_serializer_n3.py`.
- Include positive syntax tests from the N3 test suite that are smaller
  than 1024KB and do not use new N3 syntax in round-trip tests.
  This is mainly to check that there are no regressions after the changes
  made.

Fixes:
- RDFLib#1807
- RDFLib#1701
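The last bullet selects N3 test-suite files by size before using them in round-trip tests. A minimal sketch of just the size filter, assuming the files end in `.n3`, sit in a single directory, and that "1024KB" means 1024 * 1024 bytes; `round_trip_candidates` and `MAX_BYTES` are hypothetical names, the real test harness may select files differently, and the "not using new N3 syntax" criterion is not modelled.

```python
import tempfile
from pathlib import Path

# Assumption: "1024KB" is read as 1024 * 1024 bytes.
MAX_BYTES = 1024 * 1024


def round_trip_candidates(test_dir, max_bytes=MAX_BYTES):
    """Hypothetical helper: .n3 files in test_dir smaller than max_bytes.

    Only the size criterion is modelled; filtering out files that use
    new N3 syntax is left out.
    """
    return sorted(
        path
        for path in Path(test_dir).glob("*.n3")
        if path.stat().st_size < max_bytes
    )


# Usage: one small file is kept, one oversized file is skipped.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "small.n3").write_text(":a :b :c .\n")
    Path(tmp, "big.n3").write_bytes(b"#" * (MAX_BYTES + 1))
    selected = [p.name for p in round_trip_candidates(tmp)]

print(selected)  # ['small.n3']
```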
@aucampia aucampia linked a pull request Apr 23, 2022 that will close this issue
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 24, 2022
aucampia added a commit to aucampia/rdflib that referenced this issue Apr 30, 2022
aucampia added a commit to aucampia/rdflib that referenced this issue May 3, 2022
aucampia added a commit to aucampia/rdflib that referenced this issue May 4, 2022
aucampia added a commit to aucampia/rdflib that referenced this issue May 12, 2022
aucampia added a commit that referenced this issue May 16, 2022
This patch fixes two issues with the N3 serializer:
- The N3 serializer incorrectly considered a subject as already
  serialized if it has been serialized inside a quoted graph.
- The N3 serializer does not consider that the predicate of
  a triple can also be a graph.

Other changes included in this patch:
- Changed `test.testutils.GraphHelper` to support nested/quoted graphs.
- Moved the tests from `test/test_n3_formula.py` into
  `test/test_serializers/test_serializer_n3.py`.
- Include positive syntax tests from the N3 test suite that are smaller
  than 1024KB and do not use new N3 syntax in round-trip tests.
  This is mainly to check that there are no regressions after the changes
  made.

Fixes:
- #1807
- #1701