This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

As a node operator, I want to harvest and ingest a subset of a bundle based on existing registered data. #130

Closed
tloubrieu-jpl opened this issue Jan 29, 2021 · 7 comments

Comments

@tloubrieu-jpl
Member

tloubrieu-jpl commented Jan 29, 2021

For more information on how to populate this new feature request, see the PDS Wiki on User Story Development:

https://github.com/NASA-PDS/nasa-pds.github.io/wiki/Issue-Tracking#user-story-development

Motivation

...so that I can ingest just the "new" data to the registry. This should also improve performance for the tools.

Additional Details

Currently we can load a full PDS4 archive or bundle into the registry, but when a bundle or archive has been updated we would like to help the user load only the updated part of the archive, rather than (1) making them select which files have been updated or (2) reloading the full archive.

Acceptance Criteria

Given a bundle that has previously been registered, but now has 1 or more new products to register
When I perform harvest / registry manager
Then I expect to only harvest and ingest those 1 or more new products
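
The acceptance criteria above amount to a set difference between the products found in the bundle and the products already registered. A minimal sketch of that idea follows; the function name `select_new_products` and the LIDVIDs are hypothetical, chosen for illustration only, and this is not how Harvest or Registry Manager is actually implemented.

```python
from typing import Iterable, Set


def select_new_products(archive_lidvids: Iterable[str],
                        registered_lidvids: Iterable[str]) -> Set[str]:
    """Return the LIDVIDs present in the archive but not yet in the registry."""
    # A set difference keeps only identifiers the registry has never seen.
    return set(archive_lidvids) - set(registered_lidvids)


# Hypothetical identifiers, for illustration only.
registered = {
    "urn:nasa:pds:example_bundle:data:product_a::1.0",
    "urn:nasa:pds:example_bundle:data:product_b::1.0",
}
on_disk = registered | {"urn:nasa:pds:example_bundle:data:product_c::1.0"}

to_ingest = select_new_products(on_disk, registered)
print(sorted(to_ingest))
```

With this input, only `product_c` would be selected for harvest and ingestion; re-running against an unchanged bundle would select nothing.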

Engineering Details

This EPIC is likely to require the creation of a new repository enabling both harvest and registry-manager functions and their synchronization with the Elasticsearch content.

@tloubrieu-jpl tloubrieu-jpl added the enhancement and triage-needed labels Jan 29, 2021
@tloubrieu-jpl tloubrieu-jpl self-assigned this Jan 29, 2021
@tloubrieu-jpl
Member Author

@tdddblog I created the task as an EPIC (it did not fit into an existing EPIC). You can add tasks under it if you feel it needs to be broken up.

@tloubrieu-jpl tloubrieu-jpl added this to the 02.Marion.Jones milestone Feb 2, 2021
@jordanpadams jordanpadams changed the title from "Registry synchronization with PDS4 archive files" to "Update Registry with incremental ingestion of data based on existing registry products" Feb 2, 2021
@jordanpadams jordanpadams changed the title from "Update Registry with incremental ingestion of data based on existing registry products" to "Update Registry with incremental ingestion of archive products based on existing registry products" Feb 2, 2021
@jordanpadams jordanpadams changed the title from "Update Registry with incremental ingestion of archive products based on existing registry products" to "Update Registry with incremental ingestion of archive products based on existing registered data" Feb 2, 2021
@tloubrieu-jpl tloubrieu-jpl removed this from the 03.Wyomia.Tyus milestone Feb 11, 2021
@jordanpadams jordanpadams changed the title from "Update Registry with incremental ingestion of archive products based on existing registered data" to "As a node operator, I want to register a subset of a bundle based on existing registered data" Mar 27, 2021
@jordanpadams jordanpadams changed the title from "As a node operator, I want to register a subset of a bundle based on existing registered data" to "As a node operator, I want to harvest and ingest a subset of a bundle based on existing registered data" Mar 27, 2021
@jordanpadams
Member

@tdddblog here is another one we should consider

@jordanpadams jordanpadams changed the title from "As a node operator, I want to harvest and ingest a subset of a bundle based on existing registered data" to "As a node operator, I want to harvest and ingest a subset of a bundle based on existing registered data." Mar 29, 2021
@jordanpadams
Member

Already implemented by using the bundle section of the config file.

@rchenatjpl

rchenatjpl commented May 26, 2021

@jordanpadams @tdddblog @tloubrieu-jpl
I think this issue is getting resolved via the collection filter, which works well for NAIF, so I'm sorry the sample bundle I gave you was NAIF-based. Collections more typically grow by adding new products into the subdir, appending to collection.tab (which consists entirely of Primary LIDVIDs), and incrementing collection.xml's VID. So if I filter using `<bundle>/<collection lidvid="something::8.0"/>`, nothing gets excluded. (I'm assuming this. I didn't test this.)

So I tried to filter via `<harvest>/<registry>`, but when I harvested a collection, added to it, incremented the collection VID, and re-harvested, the entire collection (not just the new products) showed up in /tmp/harvest/out/registry-docs.json. Since I really, really want to be done with testing, I'm calling this issue fixed because other, unchanged collections did not show up in the .json.

By the way, https://nasa-pds.github.io/pds-registry-app/operate/harvest.html# should indicate that `<registry>` only works with `<bundles>`. In email, I proposed changes to that doc, but apparently GitHub is the way to get things done.

My suggestion: I believe this issue asks for a product-by-product filter. That may be too costly performance- or people-wise. If so, feel free to ignore this comment.
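
To make the distinction in this comment concrete, here is a toy model contrasting a collection-level filter with a product-by-product filter. It is only a sketch under stated assumptions: the data layout (a dict mapping a collection LIDVID to its member LIDVIDs), the function names, and the identifiers are all hypothetical and do not reflect Harvest's actual code or configuration.

```python
from typing import Dict, List, Set


def harvest_by_collection(collections: Dict[str, List[str]],
                          registered: Set[str]) -> List[str]:
    """Coarse filter: skip a collection only if its own LIDVID is registered."""
    selected: List[str] = []
    for collection_lidvid, products in collections.items():
        if collection_lidvid in registered:
            continue  # unchanged collection: nothing is re-harvested
        selected.append(collection_lidvid)
        selected.extend(products)  # every member comes along, old or new
    return selected


def harvest_by_product(collections: Dict[str, List[str]],
                       registered: Set[str]) -> List[str]:
    """Fine filter: check every LIDVID individually against the registry."""
    selected: List[str] = []
    for collection_lidvid, products in collections.items():
        for lidvid in [collection_lidvid, *products]:
            if lidvid not in registered:
                selected.append(lidvid)
    return selected


# Hypothetical registry state after a first harvest.
registered = {
    "urn:nasa:pds:b:grown::1.0", "urn:nasa:pds:b:grown:p1::1.0",
    "urn:nasa:pds:b:static::1.0", "urn:nasa:pds:b:static:q1::1.0",
}
# On disk: "grown" gained product p2 and its collection VID was incremented.
on_disk = {
    "urn:nasa:pds:b:grown::2.0": ["urn:nasa:pds:b:grown:p1::1.0",
                                  "urn:nasa:pds:b:grown:p2::1.0"],
    "urn:nasa:pds:b:static::1.0": ["urn:nasa:pds:b:static:q1::1.0"],
}

coarse = harvest_by_collection(on_disk, registered)
fine = harvest_by_product(on_disk, registered)
print(coarse)
print(fine)
```

In this model the coarse filter re-harvests the whole grown collection (including the already registered `p1`) while leaving the unchanged collection alone, which matches the behavior described above. The fine filter selects only the new collection version and `p2`, at the cost of one registry lookup per product.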

@jordanpadams
Member

jordanpadams commented May 26, 2021

@rchenatjpl FYI, I updated your comment above to wrap your XML in backticks (`). Otherwise, GitHub Markdown sees a < and thinks it is HTML.

@tdddblog
Contributor

tdddblog commented Jun 8, 2021

@rchenatjpl Fixed. See pull request: NASA-PDS/harvest#51

@tloubrieu-jpl
Member Author

The code has been included in release pds-registry-app v0.3.2


4 participants