Dataset cleanup - Store API for graph method #309

gromgull · 2013-06-26T18:26:30Z

Cleaning up Dataset class, adding graph tracking to store API, as
discussed in #307

Summary of changes:

* added methods add_graph and remove_graph to the Store API, and implemented these for Sleepycat and IOMemory. A flag, graph_awareness, is set on the store if these methods are supported; the default implementations raise an exception.
* made the dataset require a store with the graph_awareness flag set.
* removed the graph state kept directly in the Dataset class.
* removed the dataset.add_quads and remove_quads methods. The add/remove methods of ConjunctiveGraph are smart enough to work with triples or quads.
* removed the dataset.graphs method - it now does exactly the same as contexts.
* added a default_union flag to Graphs; ConjunctiveGraph has this set to True, and Dataset to False.
* cleaned up a bit more of the confusion over whether Graph instances or Graph identifiers are passed to store methods. (Think about __iadd__, __isub__ etc. for ConjunctiveGraph #225)
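To make the first two bullets concrete, here is a minimal, self-contained sketch of the idea. It does not import rdflib; the classes ToyStore, ToyMemoryStore and ToyDataset are hypothetical stand-ins, and the flag is spelled graph_aware here purely for illustration (the summary above calls it graph_awareness).

```python
class ToyStore:
    """Hypothetical base store: not graph-aware unless a subclass says so."""

    graph_aware = False

    def add_graph(self, identifier):
        # Default implementation: graph tracking is not supported.
        raise NotImplementedError("this store cannot track graphs")

    def remove_graph(self, identifier):
        raise NotImplementedError("this store cannot track graphs")


class ToyMemoryStore(ToyStore):
    """Toy graph-aware store that remembers graphs even when they are empty."""

    graph_aware = True

    def __init__(self):
        self._quads = set()   # (s, p, o, graph identifier)
        self._graphs = set()  # identifiers of all known graphs

    def add_graph(self, identifier):
        self._graphs.add(identifier)

    def remove_graph(self, identifier):
        # Forget the graph and drop every quad stored in it.
        self._graphs.discard(identifier)
        self._quads = {q for q in self._quads if q[3] != identifier}

    def add(self, triple, context):
        self._graphs.add(context)
        self._quads.add(tuple(triple) + (context,))

    def contexts(self):
        return set(self._graphs)


class ToyDataset:
    """Toy dataset that refuses stores without graph tracking."""

    def __init__(self, store):
        if not store.graph_aware:
            raise ValueError("ToyDataset needs a graph-aware store")
        self.store = store
```

With this shape, an empty graph registered through add_graph still shows up in contexts(), which is the behaviour the Dataset relies on.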

Comments:

* The use-case where a Dataset exposes only some of the graphs that exist in the store is not supported.
* I have not thought about transactions, and how creating or removing a graph fits into a transaction. It is sort of irrelevant, as we do not have any transaction-aware stores.

Questions:

* Should dataset.contexts return the default graph?
  dataset.quads returns None as the context for the triples in the default graph, which would suggest "no". (Currently it does.)
* Do we really need dataset.graph AND dataset.get_context?
  get_context returns a Graph, but does not create it. graph creates the graph AND will also make a new graph identified by a skolemized bnode if called without arguments.
* Should it be possible to disable graph_awareness for a Store?
  The only change would be whether a graph is removed once it is empty. I guess no one relies on this, as it is not implemented correctly in IOMemory or Sleepycat at the moment :)
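The difference between the two accessors in the second question can be sketched as follows. This is a toy model under stated assumptions, not rdflib code: ToyGraph and ToyDataset are hypothetical, and the generated identifier is only a stand-in for a skolemized bnode.

```python
import uuid


class ToyGraph:
    """Hypothetical graph handle that only carries an identifier."""

    def __init__(self, identifier):
        self.identifier = identifier


class ToyDataset:
    def __init__(self):
        self._known = set()  # identifiers of graphs the dataset has created

    def get_context(self, identifier):
        # Returns a handle, but does NOT register the graph in the dataset.
        return ToyGraph(identifier)

    def graph(self, identifier=None):
        # Creates (registers) the graph; invents an identifier if none is given.
        if identifier is None:
            # Assumption: any unique URN works as a stand-in for a skolemized bnode.
            identifier = "urn:x-example:genid:" + uuid.uuid4().hex
        self._known.add(identifier)
        return ToyGraph(identifier)

    def contexts(self):
        return set(self._known)
```

In this model, calling get_context never changes what contexts() reports, while every call to graph does.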

gromgull · 2013-06-26T21:07:05Z

And ignore the fact that some of the old unit tests fail. If we agree that this is ok, I'll tidy it up.

uholzer · 2013-06-27T09:51:31Z

Some comments:

Dataset.DEFAULT does not exist anymore, but I want to get the default graph as context. Can I use DATASET_DEFAULT_GRAPH_ID now? Is there another way? Of course I considered using the Dataset directly to mess with the default graph, but this way I would need to add several case distinctions in my code. Is there another way to parse into the default graph using Dataset.parse?

The second thing is that I tried to parse an empty graph:

>>> d = rdflib.Dataset()
>>> d.parse(data='', format='turtle')
<Graph identifier=N3759e73365ba47b7b4002c3558ba4f01 (<class 'rdflib.graph.Graph'>)>
>>> for g in d.contexts(): print(g)
... 
<urn:x-rdflib:default> a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory'].

As you see, it's not there, but since parse returns a graph, I expect that it has been added to the Dataset.

gromgull · 2013-06-27T10:03:06Z

You can use DATASET_DEFAULT_GRAPH_ID, but I didn't really consider it part of the public API.

Perhaps adding a get_default_graph method?

The parse problem is interesting, I'll have to look at the internals of the parser... and that means each and every parser....

uholzer · 2013-06-27T10:08:37Z

The parse problem is interesting, I'll have to look at the internals of the parser... and that means each and every parser...

Really? I mean, it DOES return a graph ...
Isn't the problem just that ConjunctiveGraph.parse does not add the graph it has created? The graph only turns up as a context if it is non-empty, because it is backed by the same store ...

gromgull · 2013-06-27T10:13:13Z

Hmm - yes, that solves the issue when only a single graph is created. But what if you parse a TriX document with empty graphs (or TriG, if we had a parser)? They will create several graphs, but each parser is free to decide whether to use addN or get_context to add triples to a particular graph/context.

I can do the simple fix now, and maybe clean up the other one when we solve #283 (I started this in a not-yet-pushed branch).
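The empty-graph problem discussed here can be illustrated with a small toy model. It assumes nothing about rdflib internals: QuadOnlyStore, TrackingStore and parse_into are hypothetical names, and parse_into merely stands in for a parse routine that registers the graph it creates.

```python
class QuadOnlyStore:
    """Toy store whose contexts are derived purely from the stored quads."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, graph_id):
        self.quads.add((s, p, o, graph_id))

    def contexts(self):
        return {q[3] for q in self.quads}


class TrackingStore(QuadOnlyStore):
    """Toy store that also remembers graphs registered explicitly."""

    def __init__(self):
        super().__init__()
        self.graphs = set()

    def add_graph(self, graph_id):
        self.graphs.add(graph_id)

    def contexts(self):
        return self.graphs | super().contexts()


def parse_into(store, graph_id, triples):
    """Hypothetical parse step: register the graph first, then add its triples."""
    if hasattr(store, "add_graph"):
        store.add_graph(graph_id)
    for s, p, o in triples:
        store.add(s, p, o, graph_id)
    return graph_id
```

Parsing an empty document into QuadOnlyStore leaves contexts() empty, because no quad mentions the graph; with TrackingStore the graph is listed even though it holds no triples.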

iherman · 2013-06-27T10:26:48Z

At the moment, what I have (in my internal stuff) is

ds.default_graph_id
ds.default_graph

I also kept Dataset.DEFAULT as a shorthand that a user can use, although,
thinking about it further, it is probably superfluous. The difference is that
Dataset.DEFAULT is a symbolic constant, so to say, independent of the Dataset
instance, whereas ds.default_graph_id is an instance variable, initialized by
the Dataset. But that should be ok...

Ivan


uholzer · 2013-06-27T14:47:40Z

I forgot about ConjunctiveGraph.default_context. Is this one part of the public API?

iherman · 2013-06-27T14:52:33Z

Yep.

Ivan


gromgull · 2013-07-29T16:44:18Z

I've made all the tests pass in this branch, and adapted the dawg test and sparql engine a tiny bit.

The only remaining question may be if contexts() should return the default graph

gromgull · 2013-07-29T16:46:43Z

Note: I've also rebased this onto the latest master, if you had a local copy you may have to force pull

coveralls · 2013-07-29T16:47:57Z

Coverage increased (+0%) when pulling 6c026d0 on graphaware into 8fad4ed on master.

uholzer · 2013-08-10T12:30:01Z

I have one wish: could you add a Graph.graph_aware flag that is set to True by the subclass Dataset? This would be useful for code that makes full use of Dataset but is also able to work with ConjunctiveGraph or a plain Graph. Graph.context_aware and Graph.default_union are already there.
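A rough sketch of how such flags could be used for feature detection is given below. The classes are toy stand-ins rather than the rdflib ones; the default_union values follow the PR summary, and graph_aware on the dataset class is the flag being requested here, so its presence is an assumption.

```python
class ToyGraph:
    """Toy stand-in for a plain graph: no contexts, no graph tracking."""

    context_aware = False
    default_union = False
    graph_aware = False


class ToyConjunctiveGraph(ToyGraph):
    """Toy stand-in: context-aware, default graph is the union of all graphs."""

    context_aware = True
    default_union = True


class ToyDataset(ToyConjunctiveGraph):
    """Toy stand-in: default graph is a graph of its own, graphs are tracked."""

    default_union = False
    graph_aware = True  # assumption: the flag requested in this comment


def describe(graph):
    """Pick a code path from the flags instead of using isinstance checks."""
    if graph.graph_aware:
        return "dataset"
    if graph.context_aware:
        return "conjunctive graph"
    return "plain graph"
```

Code written against the flags keeps working when a different class with the same capabilities is passed in, which is the point of the request.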

gromgull · 2013-08-11T18:49:56Z

It seems we have more or less agreement, so I will merge this now - we can sort out the remaining bits in issues of their own!


2013/12/31 RELEASE 4.1
======================

This is a new minor version of RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer) - it is up-to-date wrt. the newest spec from: http://www.w3.org/TR/trig/
* The Turtle parser was made up to date wrt. the latest Turtle spec.
* Many more tests have been added - RDFLib now has over 2000 (passing!) tests. This is mainly thanks to the NT, Turtle, TriG, NQuads and SPARQL test-suites from W3C. This also included many fixes to the nt and nquad parsers.
* `ConjunctiveGraph` and `Dataset` now support directly adding/removing quads with `add/addN/remove` methods.
* `rdfpipe` command now supports datasets, and reading/writing context sensitive formats.
* Optional graph-tracking was added to the Store interface, allowing empty graphs to be tracked for Datasets. The DataSet class also saw a general clean-up, see: RDFLib/rdflib#309
* After long deprecation, `BackwardCompatibleGraph` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh
* The new `IOMemory` store was optimised a bit
* `SPARQL(Update)Store` has been made more generic.
* MD5 sums were never reinitialized in `rdflib.compare`
* Correct default value for empty prefix in N3 [#312](RDFLib/rdflib#312)
* Fixed tests when running in a non UTF-8 locale [#344](RDFLib/rdflib#344)
* Prefix in the original turtle have an impact on SPARQL query resolution [#313](RDFLib/rdflib#313)
* Duplicate BNode IDs from N3 Parser [#305](RDFLib/rdflib#305)
* Use QNames for TriG graph names [#330](RDFLib/rdflib#330)
* \uXXXX escapes in Turtle/N3 were fixed [#335](RDFLib/rdflib#335)
* A way to limit the number of triples retrieved from the `SPARQLStore` was added [#346](RDFLib/rdflib#346)
* Dots in localnames in Turtle [#345](RDFLib/rdflib#345) [#336](RDFLib/rdflib#336)
* `BNode` as Graph's public ID [#300](RDFLib/rdflib#300)
* Introduced ordering of `QuotedGraphs` [#291](RDFLib/rdflib#291)

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release. This is a bug-fixing release:

* the new URI validation caused lots of problems, but is necessary to avoid "RDF injection" vulnerabilities. In the spirit of "be liberal in what you accept, but conservative in what you produce", we moved validation to serialisation time.
* the `rdflib.tools` package was missing from the `setup.py` script, and was therefore not included in the PYPI tarballs.
* RDF parser choked on empty namespace URI [#288](RDFLib/rdflib#288)
* Parsing from `sys.stdin` was broken [#285](RDFLib/rdflib#285)
* The new IO store had problems with concurrent modifications if several graphs used the same store [#286](RDFLib/rdflib#286)
* Moved HTML5Lib dependency to the recently released 1.0b1 which supports python3

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in the core distribution. SPARQL 1.1 queries and updates should work out of the box.
* SPARQL paths are exposed as operators on `URIRefs`, these can then be used with graph.triples and friends:

  ```py
  # List names of friends of Bob:
  g.triples(( bob, FOAF.knows/FOAF.name , None ))

  # All super-classes:
  g.triples(( cls, RDFS.subClassOf * '+', None ))
  ```

* a new `graph.update` method will apply SPARQL update statements
* Several RDF 1.1 features are available:
  * A new `DataSet` class
  * `XMLLiteral` and `HTMLLiterals`
  * `BNode` (de)skolemization is supported through `BNode.skolemize`, `URIRef.de_skolemize`, `Graph.skolemize` and `Graph.de_skolemize`
* Handling of Literal equality was split into lexical comparison (for normal `==` operator) and value space (using new `Node.eq` methods). This introduces some slight backwards incompatible changes, but was necessary, as the old version had inconsistent hash and equality methods that could lead to literals not working correctly in dicts/sets. The new way is more in line with how SPARQL 1.1 works. For the full details, see: https://github.com/RDFLib/rdflib/wiki/Literal-reworking
* Iterating over `QueryResults` will generate `ResultRow` objects, these allow access to variable bindings as attributes or as a dict. I.e.

  ```py
  for row in graph.query('select ... ') :
      print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar: ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]  # -> generator over the names of Bob's friends
  ```

* The `SPARQLStore` and `SPARQLUpdateStore` are now included in the RDFLib core
* The documentation has been given a major overhaul, and examples for most features have been added.
Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))

  ```py
  >>> URIRef('http://example.org/')+'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just by plugin name: ([#277](RDFLib/rdflib#277))
* `Namespace` is no longer a subclass of `URIRef`
* URIRefs and Literal language tags are validated on construction, avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed

gromgull mentioned this pull request Jun 26, 2013

Dataset / ConjunctiveGraph #307

Closed

gromgull added 5 commits July 29, 2013 18:43

added support for toggleable union of default graph

06be678

whitespace gardening

8b110ca

made dawg tests use dataset class

5aaa6f4

Dataset and graph_aware store fixes

6c026d0

Add graphs when parsing, so also empty graphs are added.

gromgull added a commit that referenced this pull request Aug 11, 2013

Merge pull request #309 from RDFLib/graphaware

e70451e

Dataset cleanup - Store API for graph method

gromgull merged commit e70451e into master Aug 11, 2013

gromgull deleted the graphaware branch August 11, 2013 18:50

This was referenced Aug 11, 2013

Dataset: Choosing default graph to be the union or a graph on its own #303

Closed

Dataset.parse broken #301

Open

Dataset: Deleting default graph #302

Closed

