
Merge feature_update_graph_pattern #43

Merged 78 commits, Nov 10, 2021
Commits on Apr 13, 2021

  1. e018a08
  2. d9e67c1
  3. 05741d9

Commits on Apr 22, 2021

  1. 4cfebac

Commits on Apr 24, 2021

  1. 109bc54
  2. 8378a34

Commits on Apr 30, 2021

  1. Completely refactor d1lod package

    The old Interface and Graph classes had a lot of cruft and were really confusing to keep straight. I've done a refactor that uses a different class structure that's a lot easier for me to understand. Hopefully it's easier for you too.
    
    See the README for more info, including a fancy picture, but the Interface class is now wrapped into a top-level SlinkyClient class and the old Graph class is now tied into a SparqlTripleStore class. With the old setup, you had to instantiate an Interface and a Graph. Now you just instantiate a SlinkyClient and you're good to go.
    
    Here's some more detail, copied from the README:
    
    - `SlinkyClient`: Entrypoint class that manages a connection to DataONE, a triple store, and Redis for short-term persistence and delayed jobs
    - `FilteredCoordinatingNodeClient`: A view into a Coordinating Node that can limit what content appears to be available based on a Solr query. e.g., a CN client that can only see datasets that are part of a specific EML project or in a particular region
    - `SparqlTripleStore`: Handles inserting into and querying a generic SPARQL-compliant RDF triplestore via SPARQL queries. Designed to be used with multiple triple stores.
    - `Processor`: Set of classes that convert documents of various formats (e.g., XML, JSON-LD) into a set of RDF statements
    
    The old package code is left in the legacy submodule (to be deleted in the future) and its tests are still alive and working via pytest.
    amoeba committed Apr 30, 2021 (478b62f)
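The class list above can be sketched as a skeleton. The class names come from the README excerpt quoted in the commit message; the constructor arguments and attributes below are hypothetical placeholders, not the real d1lod API.

```python
# Skeleton of the class layout described above. Class names are from the
# README excerpt; constructor arguments and attributes are hypothetical.

class FilteredCoordinatingNodeClient:
    """A view into a Coordinating Node, limited by a Solr filter query."""
    def __init__(self, base_url, filter_query=None):
        self.base_url = base_url
        self.filter_query = filter_query

class SparqlTripleStore:
    """Inserts into and queries a generic SPARQL-compliant triple store."""
    def __init__(self, endpoint):
        self.endpoint = endpoint

class SlinkyClient:
    """Entrypoint: wires together DataONE, the triple store, and Redis."""
    def __init__(self, cn_url, store_endpoint):
        self.d1client = FilteredCoordinatingNodeClient(cn_url)
        self.store = SparqlTripleStore(store_endpoint)

# Old setup: instantiate an Interface and a Graph separately.
# New setup: one object and you're good to go.
client = SlinkyClient("https://cn.dataone.org/cn", "http://localhost:8890/sparql")
```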
  2. Remove top-level makefile

    amoeba committed Apr 30, 2021 (46009a7)
  3. Convert web front-end to use a SlinkyClient

    This is pretty basic, but you can run this container and hit /get?id=foo to get the Slinky RDF for a given DataONE PID.
    amoeba committed Apr 30, 2021 (062c562)
  4. 2fca953
  5. a782ff9

Commits on May 1, 2021

  1. e1dd51f

Commits on May 4, 2021

  1. 85163f6
  2. fd6f7cb

Commits on May 12, 2021

  1. ac73868
  2. 104e190
  3. 777cf1f
  4. 04f9407

Commits on May 15, 2021

  1. be4d94a

Commits on May 18, 2021

  1. Create new Virtuoso-specific store model

    Closes #30
    
    I couldn't find a way to send very large SPARQL queries to Virtuoso but Virtuoso does have an HTTP API that takes Turtle/NTriples/etc. Since this is specific to Virtuoso, I've made a separate model from SparqlTripleStore.
    amoeba committed May 18, 2021 (9783fb6)
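The bulk-load approach described above can be sketched with the SPARQL 1.1 Graph Store Protocol, which Virtuoso exposes over HTTP. The host, endpoint path, and graph name below are assumptions, and the sketch only builds the request rather than sending it.

```python
# Hypothetical sketch of bulk-loading Turtle over HTTP instead of a very
# large SPARQL INSERT. Virtuoso exposes a Graph Store Protocol endpoint
# (commonly /sparql-graph-crud-auth); host, path, and graph name here
# are assumptions, and no request is actually sent.

from urllib.parse import urlencode

def build_insert_request(turtle: str, graph: str,
                         base="http://localhost:8890/sparql-graph-crud-auth"):
    """Return (url, headers, body) for a POST appending Turtle to a named graph."""
    url = f"{base}?{urlencode({'graph-uri': graph})}"
    headers = {"Content-Type": "text/turtle"}
    return url, headers, turtle.encode("utf-8")

url, headers, body = build_insert_request(
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .",
    "https://dataone.org/slinky",
)
```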
  2. Change BlazegraphStore's default port

    I don't know why I had tweaked this.
    amoeba committed May 18, 2021 (08cbd7a)
  3. 025933b
  4. cb2ff80
  5. Adjust logic for when update_job runs or doesn't

    We don't want to run multiple update jobs at once, and we also don't want to run update_job when the dataset queue is saturated. This change guards against both scenarios.
    amoeba committed May 18, 2021 (414fce3)
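The gating logic can be sketched as a single predicate: skip scheduling when an update job is already running or when the dataset queue is saturated. The function name and saturation threshold are stand-ins, not taken from the repository.

```python
# Hypothetical sketch of the update_job gate described above. The
# threshold and names are illustrative stand-ins, not the d1lod code.

SATURATION_LIMIT = 100  # assumed queue-size threshold

def should_run_update_job(running_update_jobs: int, dataset_queue_size: int) -> bool:
    if running_update_jobs > 0:
        return False  # never run two update jobs at once
    if dataset_queue_size >= SATURATION_LIMIT:
        return False  # back off while the dataset queue drains
    return True
```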
  6. 9daeef9
  7. 65879b9
  8. Make get_new_datasets_since query range-exclusive

    This prevents repeated calls to get_new_datasets_since from inserting the most recent dataset over and over again
    amoeba committed May 18, 2021 (729d615)
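In Solr range syntax, a curly brace makes a bound exclusive, which is one way to express the fix above: the dataset sitting exactly at the cursor is not returned again by the next poll. The field name `dateModified` is an assumption.

```python
# Sketch of a range-exclusive Solr filter for get_new_datasets_since.
# `dateModified` as the cursor field is an assumption; in Solr range
# syntax, { } bounds are exclusive and [ ] bounds are inclusive.

def new_datasets_filter(since: str) -> str:
    # {since TO *] : exclusive lower bound, open upper bound
    return f"dateModified:{{{since} TO *]"

q = new_datasets_filter("2021-05-18T00:00:00.000Z")
```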

Commits on May 19, 2021

  1. Use response.content instead of response.text

    Turns out you _really_ need to pass binary data to ElementTree because it'll treat your XML content as ASCII if you have requests/httpx decode it first.
    amoeba committed May 19, 2021 (776f842)
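The failure mode can be demonstrated with the standard library alone: give ElementTree raw bytes and it decodes the document itself; pre-decode the bytes with the wrong charset and the mojibake is baked in before the parser ever sees it.

```python
# Why response.content (bytes) beats response.text (str) here: with
# bytes, ElementTree decodes the XML itself (UTF-8 by default); if the
# HTTP client already decoded with the wrong charset, the damage is done.

import xml.etree.ElementTree as ET

raw = "<title>Años</title>".encode("utf-8")

good = ET.fromstring(raw).text                    # parser decodes correctly
bad = ET.fromstring(raw.decode("latin-1")).text   # wrongly pre-decoded text
```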
  2. dd233d9
  3. Begin work refactoring setups/environments

    We need a way to override default behavior depending on context. Development, production, etc. I'm not sure if I want to do this via configuration or via environment variables just yet. This'll all probably change once I start building Docker images.
    amoeba committed May 19, 2021 (21e4323)
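One possible shape for the environment-variable route is shown below; the variable names (`SLINKY_ENVIRONMENT`, `SLINKY_VIRTUOSO_HOST`) and defaults are hypothetical, not taken from the repository.

```python
# Hypothetical sketch of environment-based overrides with development
# defaults. Variable names and defaults are assumptions.

import os

def load_settings(environ=os.environ):
    env = environ.get("SLINKY_ENVIRONMENT", "development")
    return {
        "environment": env,
        "virtuoso_host": environ.get(
            "SLINKY_VIRTUOSO_HOST",
            # In development, talk to localhost; in a Docker/production
            # context, use a service name instead.
            "localhost" if env == "development" else "virtuoso",
        ),
    }
```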

Commits on May 26, 2021

  1. Remove unused code from cli.py

    amoeba committed May 26, 2021 (d1af30a)
  2. aab1c30
  3. Prevent EMLProcessor from re-inserting identifier blank nodes

    We use blank nodes for schema:identifier statements. If we re-insert the same dataset, we get what are effectively duplicate blank nodes for things like identifiers. I.e., if we inserted a dataset twice where PersonA is a creator with ORCID O, we'd have two blank nodes for PersonA, both with the value O. This isn't wrong, but it makes queries harder. So now we query first and don't re-insert the schema:identifier blank node (and its triples) if an equivalent set of triples already exists.
    amoeba committed May 26, 2021 (a6ef121)
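The query-before-insert idea can be sketched in pure Python, with the graph as a set of (subject, predicate, object) tuples. Slinky does the equivalent check with a SPARQL query against the store; the `schema:value` predicate and example ORCID below are illustrative stand-ins.

```python
# Pure-Python sketch of query-before-insert for identifier blank nodes.
# The graph is a set of (s, p, o) tuples; `schema:value` is a stand-in
# predicate, not necessarily what Slinky emits.

def identifier_bnode_exists(graph, creator, orcid):
    """True if creator already has an identifier blank node carrying orcid."""
    for s, p, o in graph:
        if s == creator and p == "schema:identifier":
            if (o, "schema:value", orcid) in graph:
                return True
    return False

def insert_identifier(graph, creator, orcid, bnode):
    if identifier_bnode_exists(graph, creator, orcid):
        return  # equivalent triples already present; skip re-insert
    graph.add((creator, "schema:identifier", bnode))
    graph.add((bnode, "schema:value", orcid))

graph = {
    ("ex:PersonA", "schema:identifier", "_:b0"),
    ("_:b0", "schema:value", "https://orcid.org/0000-0000-0000-0000"),
}
```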
  4. ae07392
  5. 1c315ed
  6. 9b35c2a
  7. 9958cf3
  8. 49d78bd
  9. 7eee7bd
  10. 2a8586b
  11. c3eadc4
  12. 7f64dbe
  13. c954903
  14. 1d43a69

Commits on May 28, 2021

  1. ee1907c
  2. 9eaec1e
  3. 6c60a8f

Commits on Jun 3, 2021

  1. da28ab4
  2. Finish spdx:Checksum support

    The old code was just a placeholder; the new code should be logically correct.
    amoeba committed Jun 3, 2021 (dffb273)
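An spdx:Checksum description typically attaches a blank node carrying the algorithm and digest. The sketch below uses standard SPDX vocabulary terms, but the exact predicates Slinky emits aren't shown in this log, so treat them as illustrative.

```python
# Hypothetical sketch of emitting spdx:Checksum triples for an object.
# Predicate names are standard SPDX vocabulary but are illustrative
# here, not confirmed against the d1lod processors.

import hashlib

def checksum_triples(obj_uri, data: bytes):
    digest = hashlib.sha256(data).hexdigest()
    bnode = "_:checksum0"
    return [
        (obj_uri, "spdx:checksum", bnode),
        (bnode, "rdf:type", "spdx:Checksum"),
        (bnode, "spdx:algorithm", "spdx:checksumAlgorithm_sha256"),
        (bnode, "spdx:checksumValue", digest),
    ]
```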
  3. ee4e1a6

Commits on Jun 4, 2021

  1. fa74fa5
  2. Clean up whitespace in readme

    amoeba committed Jun 4, 2021 (c154540)
  3. Add in support for SOSO PropertyValue model for attributes

    This commit also brings in a helper in processor_util, model_has_statement, to fill a gap in Redland.
    amoeba committed Jun 4, 2021 (7482b19)
  4. 6d15143
  5. Add count and format options to cli's get method

    `get` can now serialize turtle, ntriples, rdfxml, and jsonld. It can also now just return the number of triples in the graph instead of a serialization of the graph.
    
    This introduces rdflib and rdflib's jsonld plugin as dependencies because Redland doesn't support jsonld.
    amoeba committed Jun 4, 2021 (d4ac71c)
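The options described above could look like the argparse sketch below. The flag names (`--format`, `--count`) and the example PID are assumptions, not taken from the actual cli.py.

```python
# Hypothetical sketch of the cli `get` subcommand with the new options.
# Flag names and the example identifier are assumptions.

import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="slinky")
    sub = parser.add_subparsers(dest="command")
    get = sub.add_parser("get")
    get.add_argument("id")
    get.add_argument("--format",
                     choices=["turtle", "ntriples", "rdfxml", "jsonld"],
                     default="turtle")
    get.add_argument("--count", action="store_true",
                     help="print the number of triples instead of a serialization")
    return parser

args = build_parser().parse_args(["get", "some-pid", "--format", "jsonld"])
```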

Commits on Jun 5, 2021

  1. 30e06a6
  2. aeec699
  3. Remove RQ Dashboard from compose file

    This is easy enough to run standalone
    amoeba committed Jun 5, 2021 (471dda8)
  4. 8f1217d
  5. Remove test for double-processing

    I'm not really sure what to do about this yet. It's a bit painful to write code to handle every blank node pattern SOSO is going to throw at us. I might re-introduce this at some later point.
    amoeba committed Jun 5, 2021 (59ca2cb)
  6. 91aa958
  7. d89cf23
  8. aa24cf2
  9. Fix bug in FilteredCoordinatingNodeClient logic

    I had originally designed the filtered client to take a base filter plus an extra filter. I didn't account for the fact that the filtered client needs to manage three filters: (1) the base, (2) the actual filter of interest (e.g., SASAP-only or ARCTICA-only), and (3) the filter we use as the cursor to determine when there are new datasets. This change makes the class fully aware of all three.
    amoeba committed Jun 5, 2021 (3928b6d)
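One way to picture the three-filter arrangement is composing them into a single Solr query, AND-ing whichever parts are present. The field names and the AND composition are assumptions for illustration.

```python
# Sketch of composing the three filters described above: the base
# filter, the filter of interest, and the cursor filter for new
# datasets. Field names and AND composition are assumptions.

def combined_query(base, interest, cursor):
    parts = [p for p in (base, interest, cursor) if p]
    return " AND ".join(f"({p})" for p in parts)

q = combined_query(
    "formatType:METADATA",
    'projectText:"SASAP"',
    "dateModified:{2021-06-05T00:00:00Z TO *]",
)
```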

Commits on Jun 23, 2021

  1. 81f81c3
  2. 72b00fa

Commits on Jul 8, 2021

  1. 7ba9b9c

Commits on Aug 19, 2021

  1. 8d664dc

Commits on Nov 5, 2021

  1. 4b78c5c
  2. 16c9632
  3. Refactor the Scheduler and SlinkyClient interactions to support service-based network addresses

    ThomasThelen committed Nov 5, 2021 (50aa722)
  4. 8e03b92
  5. Refactor the scheduler to always pull an image to avoid using old cached images

    Change the cli parameters to use the slinky cli. Use the slinky base image, which has the d1lod library installed on it.
    ThomasThelen committed Nov 5, 2021 (3051b99)
  6. 832c7a2
  7. Remove the 'docker' folder since the d1lod image is now being used by the scheduler and worker

    ThomasThelen committed Nov 5, 2021 (ae271ea)
  8. b938f59
  9. 913791c
  10. 49e9c72