-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues/#207 turtle pretty print #290
Issues/#207 turtle pretty print #290
Conversation
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Note: trig pretty writer is failing for some tests still at this point Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
TriGWriter wasn't sending statements to be buffered correctly Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
@@ -100,6 +100,32 @@ public void handleStatement(Statement st) | |||
throw new RuntimeException("Document writing has not yet been started"); | |||
} | |||
|
|||
// If we are pretty-printing, all writing is buffered until endRDF is | |||
// called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid OoM errors, perhaps we can use a configurable buffer size instead of just waiting for endRDF; similar to what is done in BufferedGroupingRDFHandler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with that is that it isn't a full solution and will corrupt some datasets that rely heavily on blank node use if we anonymise any blank nodes at all. For example, there are parts of the OWL to RDF translation which insist on the use of blank nodes and not IRIs which could be corrupted by a buffering/anonymising solution depending on how the statements are delivered. The pure streaming approach and the full buffering approach will never corrupt datasets that don't OOM.
There may need to be another option to switch on/off anonymising blank nodes that could be used to specify whether local grouping/buffering occurred rather than a full buffer.
One issue I have noticed is that Pretty-Printing setting is on by default, and we probably want to switch that in this pull request as I don't see pretty-printing as a production feature for any scalable system, even for the other formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, hadn't thought of that.
…us blank node notation Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…o break Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…ll stack traces get through Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…ter tests for future use Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
… often Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
… by design or default Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…with strange IRIs Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…contexts Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…heck, so isolate it further Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…ect/object Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
As they both close the active context, they will both interfere with pretty-printing in some corner cases and need to check the spec to make sure they need to have this effect. Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…Comment closes context Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…king pretty printing Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…d handleComment Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…ot with it Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Preparing to move these tests to RDFWriterTest to verify the patterns don't fail parsing for any other parsers other than TriG Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…tions The patterns shouldn't fail for any writer/parser combinations. Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…failing tests Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
…r BinaryRDF Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
They needed to be protected while the tests were being bootstrapped in TriGPrettyWriterTest, but not after they were relocated to RDFWriterTest Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
Signed-off-by: Peter Ansell <p_ansell@yahoo.com>
@jeenbroekstra and others, this is ready for more review (after the 2.0 release is out the door). Many regression tests have been added to verify round tripping for the blank node anonymisation for Turtle/TriG, some of which also made visible regressions in the other parsers which have also been fixed as part of this pull request. No attempt was made to pretty-print list structures so far, but this could still be useful on its own. |
Hudson reports a test failure on this PR. See https://hudson.eclipse.org/rdf4j/job/rdf4j-verify-pr/10/ |
I can't replicate that failure locally and I didn't modify any of the query result io code in this pull request so this pull request may not be the cause there. It is hypothetically possible that it may have been related to a previous pull request of mine that did change query resultio code, but the exception is intermittent and NPE isn't the easiest thing to diagnose with multi-threading code :( I thought we had fixed the NPE's from BackgroundQueryResult at some stage in the past as they haven't occurred for a while.
|
One bug report from sesame history that references handleClose and NPE is: |
I also reported a very similar stack trace during the migration to the new httpclient as part of the pull request review: |
This is good to be reviewed/merged from my point of view, as the two intermittent bugs that my testing (and hudson) has been finding are being dealt with separately and are not only happening on this branch. It is a fairly useful form of Turtle pretty-printing with the major exception that it does not include lists so far, and would need quite a bit of work to support lists in general as there are quite a few rabbit holes to encounter on the way to that goal. |
Looks good to me. This is already a useful improvement in itself and it nicely localizes the pretty-printing logic, so that for future extensions (handling lists etc) we need only tweak that part of the code. |
// Cannot shorten this blank node as it is used as the object of a statement somewhere | ||
// so must be written in a non-anonymous form | ||
canShortenSubjectBNode = false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be able to handle this kind of nested blank nodes (such as in anonymous objects, or list structures) we need to build up a dependency mapping of some sort. Or rather: if we have a case where the subject is the object of another statement, we can still shorten it if we first print that other statement.
Anyway, I realize there's more to it than that, so I'm fine with the implementation as-is for now. Just trying to think about future extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking around future extensions, as you say: it seems to me someway similar to a naive framing algorithm: one could populate a "lazy" collection (stream?) of subject references, then try to resolve them for formatting when needed.
At the same time it could be possible to apply CURIEs or other short representation on the subject uris, as they can improve readability for turtle.
I'm happy with this as-is. Do you wish to merge now or do you have further things you want to put in? |
No more things for now, merge as is. |
This PR addresses GitHub issue: #207
Briefly describe the changes proposed in this PR:
Make sure you've followed the Contributor Guidelines. In particular (please tick to indicate you've taken care of it):
Note that the current structure is not designed to allow pretty-printing of nested anonymous structures, and in particular, does not support pretty printed lists/collections. It is a medium ground between almost no pretty printing, and something that looks simpler to verify compared to arbitrary N-Triples/N-Quads.
Note, this is targeted as master, so does not involve the 2.0 release.