
Merge feature_update_graph_pattern #43

Merged 78 commits, Nov 10, 2021
Commits on Apr 13, 2021

  1. e018a08
  2. d9e67c1
  3. 05741d9

Commits on Apr 22, 2021

  1. 4cfebac

Commits on Apr 24, 2021

  1. 109bc54
  2. 8378a34

Commits on Apr 30, 2021

  1. Completely refactor d1lod package

    The old Interface and Graph classes had a lot of cruft and were really confusing to keep straight. I've done a refactor that uses a different class structure that's a lot easier for me to understand. Hopefully it's easier for you too.
    
    See the README for more info, including a fancy picture, but the Interface class is now wrapped into a top-level SlinkyClient class and the old Graph class is now tied into a SparqlTripleStore class. With the old setup, you had to instantiate an Interface and a Graph. Now you just instantiate a SlinkyClient and you're good to go.
    
    Here's some more detail, copied from the README:
    
    - `SlinkyClient`: Entrypoint class that manages a connection to DataONE, a triple store, and Redis for short-term persistence and delayed jobs
    - `FilteredCoordinatingNodeClient`: A view into a Coordinating Node that can limit what content appears to be available based on a Solr query. e.g., a CN client that can only see datasets that are part of a specific EML project or in a particular region
    - `SparqlTripleStore`: Handles inserting into and querying a generic SPARQL-compliant RDF triplestore via SPARQL queries. Designed to be used with multiple triple stores.
    - `Processor`: Set of classes that convert documents of various formats (e.g., XML, JSON-LD) into a set of RDF statements
    
    The old package code is left in the legacy submodule (to be deleted in the future) and its tests are still alive and working via pytest.
    amoeba committed Apr 30, 2021 (478b62f)
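The class list above can be sketched as a skeleton. The class names come from the README excerpt quoted in the commit message; the constructor arguments and attributes below are hypothetical placeholders, not the real d1lod API.

```python
# Skeleton of the class layout described above. Class names are from the
# README excerpt; constructor arguments and attributes are hypothetical.

class FilteredCoordinatingNodeClient:
    """A view into a Coordinating Node, limited by a Solr filter query."""
    def __init__(self, base_url, filter_query=None):
        self.base_url = base_url
        self.filter_query = filter_query

class SparqlTripleStore:
    """Inserts into and queries a generic SPARQL-compliant triple store."""
    def __init__(self, endpoint):
        self.endpoint = endpoint

class SlinkyClient:
    """Entrypoint: wires together DataONE, the triple store, and Redis."""
    def __init__(self, cn_url, store_endpoint):
        self.d1client = FilteredCoordinatingNodeClient(cn_url)
        self.store = SparqlTripleStore(store_endpoint)

# Old setup: instantiate an Interface and a Graph separately.
# New setup: one object and you're good to go.
client = SlinkyClient("https://cn.dataone.org/cn", "http://localhost:8890/sparql")
```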
  2. Remove top-level makefile

    amoeba committed Apr 30, 2021 (46009a7)
  3. Convert web front-end to use a SlinkyClient

    This is pretty basic, but you can run this container and hit /get?id=foo to get the Slinky RDF for a given DataONE PID.
    amoeba committed Apr 30, 2021 (062c562)
  4. 2fca953
  5. a782ff9

Commits on May 1, 2021

  1. e1dd51f

Commits on May 4, 2021

  1. 85163f6
  2. fd6f7cb

Commits on May 12, 2021

  1. ac73868
  2. 104e190
  3. 777cf1f
  4. 04f9407

Commits on May 15, 2021

  1. be4d94a

Commits on May 18, 2021

  1. Create new Virtuoso-specific store model

    Closes #30
    
    I couldn't find a way to send very large SPARQL queries to Virtuoso but Virtuoso does have an HTTP API that takes Turtle/NTriples/etc. Since this is specific to Virtuoso, I've made a separate model from SparqlTripleStore.
    amoeba committed May 18, 2021 (9783fb6)
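The bulk-load approach described above can be sketched with the SPARQL 1.1 Graph Store Protocol, which Virtuoso exposes over HTTP. The host, endpoint path, and graph name below are assumptions, and the sketch only builds the request rather than sending it.

```python
# Hypothetical sketch of bulk-loading Turtle over HTTP instead of a very
# large SPARQL INSERT. Virtuoso exposes a Graph Store Protocol endpoint
# (commonly /sparql-graph-crud-auth); host, path, and graph name here
# are assumptions, and no request is actually sent.

from urllib.parse import urlencode

def build_insert_request(turtle: str, graph: str,
                         base="http://localhost:8890/sparql-graph-crud-auth"):
    """Return (url, headers, body) for a POST appending Turtle to a named graph."""
    url = f"{base}?{urlencode({'graph-uri': graph})}"
    headers = {"Content-Type": "text/turtle"}
    return url, headers, turtle.encode("utf-8")

url, headers, body = build_insert_request(
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
    "https://dataone.org/slinky",
)
```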
  2. Change BlazegraphStore's default port

    I don't know why I had tweaked this.
    amoeba committed May 18, 2021 (08cbd7a)
  3. 025933b
  4. cb2ff80
  5. Adjust logic for when update_job runs or doesn't

    We don't want to run multiple update jobs at once, and we also don't want to run update_job when the dataset queue is saturated. This change guards against both scenarios.
    amoeba committed May 18, 2021 (414fce3)
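The gating logic can be sketched as a single predicate: skip scheduling when an update job is already running or when the dataset queue is saturated. The function name and saturation threshold are stand-ins, not taken from the repository.

```python
# Hypothetical sketch of the update_job gate described above. The
# threshold and names are illustrative stand-ins, not the d1lod code.

SATURATION_LIMIT = 100  # assumed queue-size threshold

def should_run_update_job(running_update_jobs: int, dataset_queue_size: int) -> bool:
    if running_update_jobs > 0:
        return False  # never run two update jobs at once
    if dataset_queue_size >= SATURATION_LIMIT:
        return False  # back off while the dataset queue drains
    return True
```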
  6. 9daeef9
  7. 65879b9
  8. Make get_new_datasets_since query range-exclusive

    This prevents repeated calls to get_new_datasets_since from inserting the most recent dataset over and over again
    amoeba committed May 18, 2021 (729d615)
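In Solr range syntax, a curly brace makes a bound exclusive, which is one way to express the fix above: the dataset sitting exactly at the cursor is not returned again by the next poll. The field name `dateModified` is an assumption.

```python
# Sketch of a range-exclusive Solr filter for get_new_datasets_since.
# `dateModified` as the cursor field is an assumption; in Solr range
# syntax, { } bounds are exclusive and [ ] bounds are inclusive.

def new_datasets_filter(since: str) -> str:
    # {since TO *] : exclusive lower bound, open upper bound
    return f"dateModified:{{{since} TO *]"

q = new_datasets_filter("2021-05-18T00:00:00.000Z")
```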

Commits on May 19, 2021

  1. Use response.content instead of response.text

    Turns out you _really_ need to pass binary data to ElementTree because it'll treat your XML content as ASCII if you have requests/httpx decode it first.
    amoeba committed May 19, 2021 (776f842)
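The failure mode can be demonstrated with the standard library alone: give ElementTree raw bytes and it decodes the document itself; pre-decode the bytes with the wrong charset and the mojibake is baked in before the parser ever sees it.

```python
# Why response.content (bytes) beats response.text (str) here: with
# bytes, ElementTree decodes the XML itself (UTF-8 by default); if the
# HTTP client already decoded with the wrong charset, the damage is done.

import xml.etree.ElementTree as ET

raw = "<title>Años</title>".encode("utf-8")

good = ET.fromstring(raw).text                    # parser decodes correctly
bad = ET.fromstring(raw.decode("latin-1")).text   # wrongly pre-decoded text
```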
  2. dd233d9
  3. Begin work refactoring setups/environments

    We need a way to override default behavior depending on context. Development, production, etc. I'm not sure if I want to do this via configuration or via environment variables just yet. This'll all probably change once I start building Docker images.
    amoeba committed May 19, 2021 (21e4323)
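One possible shape for the environment-variable route is shown below; the variable names (`SLINKY_ENVIRONMENT`, `SLINKY_VIRTUOSO_HOST`) and defaults are hypothetical, not taken from the repository.

```python
# Hypothetical sketch of environment-based overrides with development
# defaults. Variable names and defaults are assumptions.

import os

def load_settings(environ=os.environ):
    env = environ.get("SLINKY_ENVIRONMENT", "development")
    return {
        "environment": env,
        "virtuoso_host": environ.get(
            "SLINKY_VIRTUOSO_HOST",
            # In development, talk to localhost; in a Docker/production
            # context, use a service name instead.
            "localhost" if env == "development" else "virtuoso",
        ),
    }
```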

Commits on May 26, 2021

  1. Remove unused code from cli.py

    amoeba committed May 26, 2021 (d1af30a)
  2. aab1c30
  3. Prevent EMLProcessor from re-inserting identifier blank nodes

    We use blank nodes for schema:identifier statements. If we re-insert the same dataset, we get what are effectively duplicate blank nodes for things like identifiers. I.e., if we inserted a dataset twice where PersonA is a creator with ORCID O, we'd have two blank nodes for PersonA, both with the value O. This isn't wrong, but it makes queries harder. So now we query first and don't re-insert the schema:identifier blank node (and its triples) if an equivalent set of triples already exists.
    amoeba committed May 26, 2021 (a6ef121)
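The query-before-insert idea can be sketched in pure Python, with the graph as a set of (subject, predicate, object) tuples. Slinky does the equivalent check with a SPARQL query against the store; the `schema:value` predicate and example ORCID below are illustrative stand-ins.

```python
# Pure-Python sketch of query-before-insert for identifier blank nodes.
# The graph is a set of (s, p, o) tuples; `schema:value` is a stand-in
# predicate, not necessarily what Slinky emits.

def identifier_bnode_exists(graph, creator, orcid):
    """True if creator already has an identifier blank node carrying orcid."""
    for s, p, o in graph:
        if s == creator and p == "schema:identifier":
            if (o, "schema:value", orcid) in graph:
                return True
    return False

def insert_identifier(graph, creator, orcid, bnode):
    if identifier_bnode_exists(graph, creator, orcid):
        return  # equivalent triples already present; skip re-insert
    graph.add((creator, "schema:identifier", bnode))
    graph.add((bnode, "schema:value", orcid))

graph = {
    ("ex:PersonA", "schema:identifier", "_:b0"),
    ("_:b0", "schema:value", "https://orcid.org/0000-0000-0000-0000"),
}
```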
  4. ae07392
  5. 1c315ed
  6. 9b35c2a
  7. 9958cf3
  8. 49d78bd
  9. 7eee7bd
  10. 2a8586b
  11. c3eadc4
  12. 7f64dbe
  13. c954903
  14. 1d43a69

Commits on May 28, 2021

  1. ee1907c
  2. 9eaec1e
  3. 6c60a8f

Commits on Jun 3, 2021

  1. da28ab4
  2. Finish spdx:Checksum support

    The old code was just a placeholder; the new code should be logically correct.
    amoeba committed Jun 3, 2021 (dffb273)
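An spdx:Checksum description typically attaches a blank node carrying the algorithm and digest. The sketch below uses standard SPDX vocabulary terms, but the exact predicates Slinky emits aren't shown in this log, so treat them as illustrative.

```python
# Hypothetical sketch of emitting spdx:Checksum triples for an object.
# Predicate names are standard SPDX vocabulary but are illustrative
# here, not confirmed against the d1lod processors.

import hashlib

def checksum_triples(obj_uri, data: bytes):
    digest = hashlib.sha256(data).hexdigest()
    bnode = "_:checksum0"
    return [
        (obj_uri, "spdx:checksum", bnode),
        (bnode, "rdf:type", "spdx:Checksum"),
        (bnode, "spdx:algorithm", "spdx:checksumAlgorithm_sha256"),
        (bnode, "spdx:checksumValue", digest),
    ]
```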
  3. ee4e1a6

Commits on Jun 4, 2021

  1. fa74fa5
  2. Clean up whitespace in readme

    amoeba committed Jun 4, 2021 (c154540)
  3. Add in support for SOSO PropertyValue model for attributes

    This commit also brings in a helper in processor_util, model_has_statement, to fill a gap in Redland.
    amoeba committed Jun 4, 2021 (7482b19)
  4. 6d15143
  5. Add count and format options to cli's get method

    `get` can now serialize turtle, ntriples, rdfxml, and jsonld. It can also now just return the number of triples in the graph instead of a serialization of the graph.
    
    This introduces rdflib and rdflib's jsonld plugin as dependencies because Redland doesn't support jsonld.
    amoeba committed Jun 4, 2021 (d4ac71c)
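The options described above could look like the argparse sketch below. The flag names (`--format`, `--count`) and the example PID are assumptions, not taken from the actual cli.py.

```python
# Hypothetical sketch of the cli `get` subcommand with the new options.
# Flag names and the example identifier are assumptions.

import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="slinky")
    sub = parser.add_subparsers(dest="command")
    get = sub.add_parser("get")
    get.add_argument("id")
    get.add_argument("--format",
                     choices=["turtle", "ntriples", "rdfxml", "jsonld"],
                     default="turtle")
    get.add_argument("--count", action="store_true",
                     help="print the number of triples instead of a serialization")
    return parser

args = build_parser().parse_args(["get", "some-pid", "--format", "jsonld"])
```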

Commits on Jun 5, 2021

  1. 30e06a6
  2. aeec699
  3. Remove RQ Dashboard from compose file

    This is easy enough to run standalone
    amoeba committed Jun 5, 2021 (471dda8)
  4. 8f1217d
  5. Remove test for double-processing

    I'm not really sure what to do about this yet. It's a bit painful to write code to handle every blank node pattern SOSO is going to throw at us. I might re-introduce this at some later point.
    amoeba committed Jun 5, 2021 (59ca2cb)
  6. 91aa958
  7. d89cf23
  8. aa24cf2
  9. Fix bug in FilteredCoordinatingNodeClient logic

    I had originally designed the filtered client to take a base filter plus an extra filter. I didn't account for the fact that the filtered client needs to manage three filters: (1) the base, (2) the actual filter of interest (e.g., SASAP-only or ARCTICA-only), and (3) the filter we use as the cursor to determine when there are new datasets. This change makes the class fully aware of all three.
    amoeba committed Jun 5, 2021 (3928b6d)
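One way to picture the three-filter arrangement is composing them into a single Solr query, AND-ing whichever parts are present. The field names and the AND composition are assumptions for illustration.

```python
# Sketch of composing the three filters described above: the base
# filter, the filter of interest, and the cursor filter for new
# datasets. Field names and AND composition are assumptions.

def combined_query(base, interest, cursor):
    parts = [p for p in (base, interest, cursor) if p]
    return " AND ".join(f"({p})" for p in parts)

q = combined_query(
    "formatType:METADATA",
    'projectText:"SASAP"',
    "dateModified:{2021-06-05T00:00:00Z TO *]",
)
```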

Commits on Jun 23, 2021

  1. 81f81c3
  2. 72b00fa

Commits on Jul 8, 2021

  1. 7ba9b9c

Commits on Aug 19, 2021

  1. 8d664dc

Commits on Nov 5, 2021

  1. 4b78c5c
  2. 16c9632
  3. Refactor the Scheduler and SlinkyClient interactions to support service-based network addresses

    ThomasThelen committed Nov 5, 2021 (50aa722)
  4. 8e03b92
  5. Refactor the scheduler to always pull an image to avoid using old cached images

    Change the cli parameters to use the slinky cli. Use the slinky base image, which has the d1lod library installed on it.
    ThomasThelen committed Nov 5, 2021 (3051b99)
  6. 832c7a2
  7. Remove the 'docker' folder since the d1lod image is now being used by the scheduler and worker

    ThomasThelen committed Nov 5, 2021 (ae271ea)
  8. b938f59
  9. 913791c
  10. 49e9c72