Make parsers CharacterStream aware #1145

ashleysommer · 2020-08-19T02:27:17Z

Went to do a simple fix for #1144 and ended up creating quite a big (and IMHO important) set of changes.
There are two NTriples parsers in RDFLib. One in /plugins/parsers/nt.py called NTParser and another in /plugins/parsers/ntriples.py called NTriplesParser.
The latter is the original reference implementation of the NTriples W3C standard as provided by W3C.
It is a legacy-style parser that takes an open file pointer and, when run, emits triples into a Sink.
The other parser, NTParser in nt.py, is a wrapper around the legacy parser. It adds rdflib compatibility: it takes an rdflib.InputSource as input and emits triples to an rdflib.Graph.
This PR puts both in the same file, and renames the legacy NTriplesParser to W3CNTriplesParser to avoid confusion.

The most important change in here is adding CharacterStream support to the rdflib InputSource. This allows parsers to read unicode streams directly from the input source, as opposed to reading from the inputsource.ByteStream and then converting to str with data.decode(). Often the InputSource was already a string to begin with. This PR changes some parsers to prefer reading from the inputsource.CharacterStream, if available, instead of the ByteStream. This removes many useless string->bytes->bytestream->textstream->string conversions which were happening in the Parser pipelines.
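To illustrate the "prefer the CharacterStream, fall back to the ByteStream" idea, here is a minimal sketch. It is not the code from this PR: it uses the standard library's xml.sax.xmlreader.InputSource (which rdflib's InputSource builds on), and the helper name text_stream_for and the UTF-8 fallback are assumptions made for the example.

```python
import codecs
from io import BytesIO, StringIO
from xml.sax.xmlreader import InputSource


def text_stream_for(source):
    """Return a text stream for `source` (hypothetical helper, not rdflib API).

    Prefers the character stream so that text input is never round-tripped
    through bytes; otherwise wraps the byte stream in a UTF-8 decoding reader.
    """
    stream = source.getCharacterStream()
    if stream is not None:
        # Already unicode text: no decode step needed.
        return stream
    byte_stream = source.getByteStream()
    if byte_stream is None:
        raise ValueError("input source has neither a character nor a byte stream")
    # Assumption for this sketch: the bytes are UTF-8 encoded.
    return codecs.getreader("utf-8")(byte_stream)


line = '<http://example.org/s> <http://example.org/p> "caf\u00e9" .\n'

# Source that was text to begin with (e.g. a str or a file opened in text mode).
text_source = InputSource()
text_source.setCharacterStream(StringIO(line))

# Source that only offers bytes (e.g. a file opened in binary mode).
byte_source = InputSource()
byte_source.setByteStream(BytesIO(line.encode("utf-8")))

text_result = text_stream_for(text_source).read()
byte_result = text_stream_for(byte_source).read()
```

Both calls hand the parser the same unicode text, but only the byte-backed source pays for a decode step.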

Merged the two NTriples parser files
Renamed NTriplesParser to W3CNTriplesParser; it is the legacy parser
Populate the CharacterStream attr on several types of rdflib InputSource, to provide a unicode text stream in addition to the ByteStream
Add support to the N3, Trig, NTriples, and NQuads parsers to use the CharacterStream instead of the ByteStream where possible
Reduces many useless string->bytes->string conversions in parsers
Added tests for the #1144 fix (N-triples parser: reading a file fails without binary mode on Python 3.6)
All tests pass after these changes
Fixes #1144 (N-triples parser: reading a file fails without binary mode on Python 3.6)

coveralls · 2020-08-19T02:30:13Z

Coverage decreased (-0.1%) to 75.65% when pulling ceab6b2 on ashleysommer:fix_1144 into 89cb369 on RDFLib:master.

coveralls · 2020-08-19T02:30:13Z

Coverage decreased (-0.2%) to 75.627% when pulling ceab6b2 on ashleysommer:fix_1144 into 89cb369 on RDFLib:master.

nicholascar

Good tidy-up and it's closing an issue while passing all tests so an easy approve!

ashleysommer requested a review from nicholascar August 19, 2020 02:30

ashleysommer mentioned this pull request Aug 19, 2020

N-triples parser: reading a file fails without binary mode on Python 3.6 #1144

Closed

nicholascar approved these changes Aug 23, 2020

nicholascar merged commit 9429538 into RDFLib:master Aug 23, 2020

osma mentioned this pull request Nov 4, 2020

Work around N-Triples parser issue by using N3 parser instead NatLibFi/Skosify#78

Merged
