HDT support #894
Comments
@chrysn I made a quick proof of concept here: https://github.com/FlorianLudwig/rdflib-hdt/blob/master/rdflib_hdt.py It has no optimizations, since those would need knowledge of how the query processor works, which I don't have (yet :)). But it is already quite fast: I used the wikidata set for a quick benchmark and it looked quite good!
I'm interested in this so I've allocated this to the 6.0.0 release (hopefully July 2020). @FlorianLudwig if you're interested in progressing this to a full Store implementation within rdflib, please let me (one of the new rdflib maintainers) know!
Hi @nicholascar, what would it take to make this a full store implementation? While my implementation is really simple, it worked well enough for my purposes, so I never looked into any optimizations. Creating HDT files might be of interest, but I don't think that fits well into rdflib's model / API, as HDT doesn't allow single writes, only writing everything at once.
@FlorianLudwig perhaps nothing much more than what you have already is needed! We should just document it nicely, especially if it needs to declare itself read-only etc. (as the SPARQL Store does), and ensure that we have a copy of the HDT code that is bundled or managed in some way so we aren't exposed to the repo going away. I might invite @Callidon to bring his HDT library into the rdflib family! Give me a couple of weeks until 5.0.0 is out and I'll get back to you about this. If you did want to put in a PR for this against master in the meanwhile, that would be great. We will just flag it for 6.0.0 as I've done for this Issue.
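To make the "read-only store" idea in this thread concrete, here is a minimal sketch of the shape such a store might take. It deliberately uses only the standard library: `ReadOnlyTripleSource` is a hypothetical stand-in, not rdflib's `Store` class and not the code from the linked proof of concept. Representing terms as plain strings, using `None` as a wildcard, and raising `TypeError` on writes are all assumptions of this sketch.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class ReadOnlyTripleSource:
    """Hypothetical read-only triple source backed by static data."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        # A real HDT-backed store would open the file here instead of
        # copying an in-memory collection.
        self._triples = tuple(triples)

    def triples(self, pattern: Pattern) -> Iterator[Triple]:
        """Yield every triple matching the pattern; None is a wildcard."""
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple

    def __len__(self) -> int:
        return len(self._triples)

    def add(self, triple: Triple) -> None:
        # Static data cannot accept single writes, so reject them loudly.
        raise TypeError("this store is read-only")

    def remove(self, pattern: Pattern) -> None:
        raise TypeError("this store is read-only")


source = ReadOnlyTripleSource([
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:carol", "ex:knows", "ex:bob"),
])
for s, p, o in source.triples((None, "ex:knows", "ex:bob")):
    print(s, p, o)
```

A linear scan like this ignores the indexes that make HDT attractive; an actual integration would presumably hand the pattern to the HDT library's own search instead.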
Hi everyone, thank you for all the interest in pyHDT, it's amazing to see a small side project like that turning into something bigger 😄 In my opinion, the first piece of work is to adapt the inputs and outputs of pyHDT to use the data model of RDFlib (Literal, URIRef, BNode), because it currently only uses strings to represent RDF terms. I'm unsure if we should do this directly in pyHDT (in the C++ binding code) or as a layer over it, as part of the "rdflib integration". Concerning the creation of HDT files, I agree with @FlorianLudwig that it might not fit well in the whole rdflib model, as an HDT file cannot be modified after creation.
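As an illustration of the "layer over it" option, the sketch below classifies term strings into typed terms. It is standard-library only: `IRI`, `Blank` and `Lit` are hypothetical stand-ins for rdflib's `URIRef`, `BNode` and `Literal`, and the string syntax it accepts (N-Triples-like literals, `_:` blank node labels, bare or angle-bracketed IRIs) is an assumption, not a statement of what pyHDT actually returns. Escape sequences inside literals are left untouched.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Blank:
    label: str


@dataclass(frozen=True)
class Lit:
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Term = Union[IRI, Blank, Lit]

# "lexical form", optionally followed by @lang or ^^<datatype IRI>.
_LITERAL = re.compile(
    r'^"(?P<lex>.*)"(?:@(?P<lang>[A-Za-z0-9-]+)|\^\^<(?P<dt>[^>]*)>)?$',
    re.DOTALL,
)


def parse_term(text: str) -> Term:
    """Map one term string to a typed term (assumed syntax, see above)."""
    if text.startswith("_:"):
        return Blank(text[2:])
    match = _LITERAL.match(text)
    if match:
        return Lit(match.group("lex"), match.group("lang"), match.group("dt"))
    if text.startswith("<") and text.endswith(">"):
        return IRI(text[1:-1])
    # Anything else is treated as a bare IRI.
    return IRI(text)


print(parse_term('"hello"@en'))
print(parse_term("http://example.org/a"))
```

Doing the conversion in Python keeps the binding code unchanged, at the cost of one extra parse per term; doing it in the C++ binding would avoid that overhead.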
We have HDT handling within RDFlib with the rdflib-hdt project. Please take up all RDFlib/+HDT issues there!
The HDT format appears to be a promising way to access large static databases. There exists a Python wrapper around the standard HDT library. (It has an open issue about result correctness, and I didn't at first glance find how blank nodes or literals are handled, but anyway, it's active so can grow to fit).
I'm not well versed in rdflib internals, but it appears that one could construct a read-only store in a quite straight-forward fashion (mainly implementing `.triples()`). Main questions (were I to find the time to implement this myself) would be