Feature Request: Support for Offline Triplestore Dump to RDF Formats #1291

Open
arcangelo7 opened this issue Feb 29, 2024 · 1 comment

@arcangelo7

Hello QLever Team,

I've been exploring the capabilities of QLever and its control script, qlever-control, for managing SPARQL queries and datasets. As far as I can tell, there is no feature for dumping the entire triplestore to an RDF file. This functionality is crucial for handling very large datasets efficiently.

For large triplestores, dumping the data by paginating through the results with OFFSET and LIMIT queries takes too long to be practical. Likewise, a single massive CONSTRUCT query over the entire dataset is not feasible because of memory limitations.

In comparison, Blazegraph offers a solution for this issue with its com.bigdata.rdf.sail.ExportKB class, enabling offline dumps of the triplestore in various formats such as N-Quads, JSON-LD, etc. This feature significantly simplifies managing and archiving large datasets.

My use case involves working with OpenCitations Meta, which comprises 4,236,287,432 triples for data and an additional 5,540,033,781 triples for provenance. Being able to dump our data from the triplestore into RDF formats is essential for our operations, and a similar feature in QLever would greatly benefit us and likely many others in the community.

Could you consider adding such a feature to QLever or qlever-control? An offline dump feature for the triplestore that supports multiple RDF formats would be a tremendous asset, especially for those of us dealing with extensive datasets.

Thank you for considering this request. Your efforts in developing and maintaining QLever are greatly appreciated.

@hannahbast
Member

hannahbast commented Feb 29, 2024

@arcangelo7 Two questions:

  1. Can you briefly explain the advantage of dumping the complete dataset from a SPARQL endpoint vs. just downloading the dataset from which the SPARQL endpoint was constructed?

  2. What exactly is impractical about multiple queries involving OFFSET and LIMIT? QLever does not support OFFSET for ?s ?p ?o queries yet, but that would be a relatively easy fix.
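
For reference, the OFFSET/LIMIT approach discussed in this thread can be sketched as follows. This is only an illustration, not something QLever or qlever-control provides: the function names `page_count` and `paged_dump_queries` and the page size are hypothetical, and the `ORDER BY` clause is added on the assumption that pages must be stable across requests (SPARQL does not guarantee a predictable order without it). The sketch only builds the query strings; it does not contact any endpoint.

```python
from typing import Iterator


def page_count(total_triples: int, page_size: int) -> int:
    """Number of LIMIT/OFFSET pages needed to cover all triples (ceiling division)."""
    return (total_triples + page_size - 1) // page_size


def paged_dump_queries(total_triples: int, page_size: int) -> Iterator[str]:
    """Yield one CONSTRUCT query per page (hypothetical helper for illustration)."""
    for page in range(page_count(total_triples, page_size)):
        offset = page * page_size
        # ORDER BY is an assumption here: it makes the paging deterministic,
        # but it also forces the engine to sort the full result for every page.
        yield (
            "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } "
            f"ORDER BY ?s ?p ?o LIMIT {page_size} OFFSET {offset}"
        )


if __name__ == "__main__":
    # Triple counts taken from the issue text; the page size is an arbitrary choice.
    data_triples = 4_236_287_432
    provenance_triples = 5_540_033_781
    page_size = 1_000_000

    print(page_count(data_triples, page_size))        # pages for the data
    print(page_count(provenance_triples, page_size))  # pages for the provenance
    print(next(paged_dump_queries(data_triples, page_size)))
```

With the triple counts quoted above and an assumed page size of 1,000,000, this comes to 4,237 queries for the data and 5,541 for the provenance. How long each page takes depends on the engine and on whether OFFSET is supported for `?s ?p ?o` patterns.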
