This project enables you to store, process and query RDF-based Linked Data in Apache CouchDB.
There exists a number of related efforts, such as ipublic/rdf-couchdb. However, with LD-in-Couch - which has been designed from scratch - we have the following requirements in mind:
- efficient query processing of RDF (SPARQL, etc.)
- support for sharding graphs automatically (or with minimal user intervention)
- minimise the time spent in scanning single documents for fields and values (P-O)
See the Wiki page Mapping RDF to JSON documents for details about LD-in-Couch's design considerations.
First, in case you haven't already done so, you want to install Apache CouchDB and set it up, that is, create a user called admin
with the password admin
- alternatively you can update the main script with the username and password you want to use. Then you go and install couchdbkit. Last but not least, you need to create a view in CouchDB, namely as shown in views/lookup-by_subject.txt
.
After all that, you're good to go. Now, for each RDF Triples document you want to process, you run once the import task, for example:
$ python ld-in-couch.py -i data/example_0.nt -g http://example.org
2012-10-06T10:38:04 INFO --------------------------------------------------------------------------------
2012-10-06T10:38:04 INFO *** CONFIGURATION ***
2012-10-06T10:38:04 INFO --------------------------------------------------------------------------------
2012-10-06T10:38:04 INFO Starting import ...
2012-10-06T10:38:04 INFO Importing NTriples file '/Users/michau/Documents/dev/ld-in-couch/data/example_0.nt' into graph <http://example.org>
2012-10-06T10:38:04 DEBUG --------------------
2012-10-06T10:38:04 DEBUG #1: S: http://example.org/#m P: http://www.w3.org/1999/02/22-rdf-syntax-ns#type O: http://xmlns.com/foaf/0.1/Person
2012-10-06T10:38:04 DEBUG http://example.org/#m is a resource I haven't seen in subject position, yet
2012-10-06T10:38:04 DEBUG ... created new entity with ID ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//xmlns.com/foaf/0.1/Person"
2012-10-06T10:38:04 DEBUG The entity document with http://xmlns.com/foaf/0.1/Person in subject position does not exist, yet.
2012-10-06T10:38:04 DEBUG ... created new back-link entity with ID ac595818ad836bcda35ea9c9eea6a82f with back-link ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG --------------------
2012-10-06T10:38:04 DEBUG #2: S: http://example.org/#m P: http://www.w3.org/2000/01/rdf-schema#label O: Michael
2012-10-06T10:38:04 DEBUG I've seen http://example.org/#m already in subject position
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//example.org/%23m"
2012-10-06T10:38:04 DEBUG The entity document with http://example.org/#m in subject position has the ID ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG ... updated existing entity with ID ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG --------------------
2012-10-06T10:38:04 DEBUG #3: S: http://example.org/#m P: http://xmlns.com/foaf/0.1/knows O: http://example.org/#r
2012-10-06T10:38:04 DEBUG I've seen http://example.org/#m already in subject position
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//example.org/%23m"
2012-10-06T10:38:04 DEBUG The entity document with http://example.org/#m in subject position has the ID ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG ... updated existing entity with ID ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//example.org/%23r"
2012-10-06T10:38:04 DEBUG The entity document with http://example.org/#r in subject position does not exist, yet.
2012-10-06T10:38:04 DEBUG ... created new back-link entity with ID ac595818ad836bcda35ea9c9eea6a789 with back-link ac595818ad836bcda35ea9c9eea6b73a
2012-10-06T10:38:04 DEBUG --------------------
2012-10-06T10:38:04 DEBUG #4: S: http://example.org/#r P: http://www.w3.org/1999/02/22-rdf-syntax-ns#type O: http://xmlns.com/foaf/0.1/Person
2012-10-06T10:38:04 DEBUG I've seen http://example.org/#r already in subject position
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//example.org/%23r"
2012-10-06T10:38:04 DEBUG The entity document with http://example.org/#r in subject position has the ID ac595818ad836bcda35ea9c9eea6a789
2012-10-06T10:38:04 DEBUG ... updated existing entity with ID ac595818ad836bcda35ea9c9eea6a789
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//xmlns.com/foaf/0.1/Person"
2012-10-06T10:38:04 DEBUG The entity document with http://xmlns.com/foaf/0.1/Person in subject position has the ID ac595818ad836bcda35ea9c9eea6a82f
2012-10-06T10:38:04 DEBUG ... updated existing entity with ID ac595818ad836bcda35ea9c9eea6a82f with back-link ac595818ad836bcda35ea9c9eea6a789
2012-10-06T10:38:04 DEBUG --------------------
2012-10-06T10:38:04 DEBUG #5: S: http://example.org/#r P: http://www.w3.org/2000/01/rdf-schema#label O: Richard
2012-10-06T10:38:04 DEBUG I've seen http://example.org/#r already in subject position
2012-10-06T10:38:04 DEBUG ... querying view http://127.0.0.1:5984/rdf/_design/lookup/_view/by_subject?key="http%3A//example.org/%23r"
2012-10-06T10:38:04 DEBUG The entity document with http://example.org/#r in subject position has the ID ac595818ad836bcda35ea9c9eea6a789
2012-10-06T10:38:04 DEBUG ... updated existing entity with ID ac595818ad836bcda35ea9c9eea6a789
2012-10-06T10:38:04 INFO Import completed. I've processed 6 triples and seen 3 subjects (incl. back-links).
Now you could, for example, look up the entity http://example.org/#m
in the graph http://example.org
like so:
$ curl 'http://127.0.0.1:5984/rdf/_design/entity/_view/by_subject?key="http%3A//example.org/%23mhttp://example.org"'
{
"total_rows": 6,
"offset": 1,
"rows": [{
"id": "ea479b6dad91e36e1cefac33b57ad884",
"key": "http://example.org/#mhttp://example.org",
"value": [
[{
"g": "http://example.org",
"s": "http://example.org/#m",
"p": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"o": "http://xmlns.com/foaf/0.1/Person",
"o_type": "uri"
}],
[{
"g": "http://example.org",
"s": "http://example.org/#m",
"p": "http://www.w3.org/2000/01/rdf-schema#label",
"o": "Michael",
"o_type": "literal"
}],
[{
"g": "http://example.org",
"s": "http://example.org/#m",
"p": "http://xmlns.com/foaf/0.1/knows",
"o": "http://example.org/#r",
"o_type": "uri"
}]
]
}]
}
- retain subject and object type (add uri, bNode, literal flags)
- add
o_in__with_p
to record with which predicate the resource is back-linked - use proper NTriples parser, for example, Sean's impl
- SPARQL support, for example through fyzz
This software is licensed under Apache 2.0 Software License. In case you have any questions, ask Michael Hausenblas.
The design for LD-in-Couch has been influenced and inspired by Ilya Katsov's NoSQL Data Modeling Techniques as well as the wonderful book Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement written by Eric Redmond and Jim R. Wilson.