Update software to only include latest collection when bundle references LIDs #24

Closed
jordanpadams opened this issue Mar 16, 2020 · 6 comments · Fixed by #68
Labels
enhancement New feature or request


@jordanpadams
Member

jordanpadams commented Mar 16, 2020

Is your feature request related to a problem? Please describe.
Currently, when a bundle only references LIDs, the software looks for all matches for a LID in collection products. We should only grab the latest version.

NOTE 💥 : There should be a flag to ignore this so we can use this software on previous releases of PDS4 data. Something like:

--include-all-collections     For bundles that reference collections by LID, this flag
                              will include ALL versions of collections in the bundle.
                              By default, the software only includes the latest
                              version of the collection.
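
A minimal sketch of how such a flag could be wired up with Python's standard `argparse` module. The function name `build_parser` and the parser description are assumptions for illustration; only the flag name and help text come from the proposal above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the flag proposed in this issue (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Hypothetical bundle packager")
    parser.add_argument(
        "--include-all-collections",
        action="store_true",  # flag absent -> False, so only the latest version is kept
        help=(
            "For bundles that reference collections by LID, include ALL versions "
            "of collections in the bundle. By default, only the latest version "
            "of each collection is included."
        ),
    )
    return parser
```

With `store_true`, `build_parser().parse_args([])` leaves `include_all_collections` set to `False`, which matches the "latest only" default described in the proposal.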

Applicable requirements
Primary - 🦄 #50 (see Assumption 3)

@nutjob4life
Member

Here's where my lack of familiarity with PDS concepts is showing.

Could I see some examples of bundle.xml files that have collection products with LIDs with multiple versions?

@jordanpadams
Member Author

@nutjob4life I updated the test data on pds-dev-el7 to now include 2 data collections.

@nutjob4life
Member

Thanks @jordanpadams! I can log in successfully to pds-dev-el7; could you point me to a specific file path that exhibits a LID-only reference to multiple versions of collections? (Yes, I need my hand held.)

$ find /data -name harvest-2.0.0 -prune -o \( -iname '*bundle*.xml' -print \) 2>/dev/null
/data/home/pds4/insight_cameras/bundle.xml
/data/home/pds4/validate_regression_data/issue_42/V1900/dph_example_archive/bundle_izenberg_pdart14_meap.xml
/data/home/pds4/validate_regression_data/dph_example_archive/bundle_izenberg_pdart14_meap.xml
/data/home/pds4/testdata/dph_example_archive_VG2PLS/bundle_checksums.xml
/data/home/pds4/testdata/dph_example_archive_VG2PLS/bundle.xml
/data/home/pds4/testdata/urn-nasa-pds-kaguya_grs_spectra/bundle_kaguya_derived.xml

I should really read https://pds.nasa.gov/datastandards/documents/current-version.shtml some day, right?

@jordanpadams
Member Author

/data/home/pds4/insight_cameras/bundle.xml

I modified the data you were using before to now use LID references and the data collection has multiple versions.

  <Bundle_Member_Entry>
    <lid_reference>urn:nasa:pds:insight_cameras:data</lid_reference> <<<<<<<------
    <member_status>Primary</member_status>
    <reference_type>bundle_has_data_collection</reference_type>
  </Bundle_Member_Entry>

Note: The software should throw a warning or something, because several collections referenced in the bundle.xml do not exist.
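
One way the two behaviors discussed so far (latest version only, plus a warning for missing collections) could fit together is sketched below. Everything here is hypothetical: `select_collections`, `version_key`, and the `known_versions` mapping are illustrative names, not the project's code, and versions are assumed to be dotted integers.

```python
import logging
from typing import Dict, List, Tuple

logger = logging.getLogger(__name__)


def version_key(vid: str) -> Tuple[int, ...]:
    """Turn a version ID such as '2.10' into (2, 10) so it sorts numerically."""
    return tuple(int(part) for part in vid.split("."))


def select_collections(
    lid_references: List[str],
    known_versions: Dict[str, List[str]],
    include_all: bool = False,
) -> List[str]:
    """Return the LIDVIDs to package for each LID referenced by the bundle."""
    selected = []
    for lid in lid_references:
        vids = known_versions.get(lid, [])
        if not vids:
            # Referenced in bundle.xml, but no matching collection was found
            logger.warning("Collection %s is referenced but does not exist", lid)
            continue
        ordered = sorted(vids, key=version_key)
        # Keep every version when asked to, otherwise just the newest one
        chosen = ordered if include_all else ordered[-1:]
        selected.extend(f"{lid}::{vid}" for vid in chosen)
    return selected
```

Here `known_versions` stands in for whatever the software discovers on disk, for example `{"urn:nasa:pds:insight_cameras:data": ["1.0", "2.0"]}` (version numbers made up for the example).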

@jordanpadams
Member Author

@nutjob4life ☝️

@nutjob4life
Member

Thanks @jordanpadams. Adding a note to @me:

A LID-only reference looks like:

<Bundle_Member_Entry>
  <lid_reference>urn:nasa:pds:whatever</lid_reference>
  …
</Bundle_Member_Entry>

while a full LIDVID reference goes:

<Bundle_Member_Entry>
  <lidvid_reference>urn:nasa:pds:whatever::1.0</lidvid_reference>
  …
</Bundle_Member_Entry>
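
To make the distinction concrete, here is a rough sketch of pulling both kinds of reference out of a bundle label with the standard library's `xml.etree.ElementTree`. The helper names and the `SAMPLE` document (including its `Product_Bundle` root) are assumptions for illustration. Real labels carry an XML namespace, so the sketch compares local tag names only.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, stripped-down label used only to exercise the function below
SAMPLE = """
<Product_Bundle>
  <Bundle_Member_Entry>
    <lid_reference>urn:nasa:pds:whatever</lid_reference>
  </Bundle_Member_Entry>
  <Bundle_Member_Entry>
    <lidvid_reference>urn:nasa:pds:whatever::1.0</lidvid_reference>
  </Bundle_Member_Entry>
</Product_Bundle>
"""


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def member_references(bundle_xml: str) -> List[Tuple[str, str]]:
    """Return (kind, identifier) pairs, where kind is 'lid' or 'lidvid'."""
    references = []
    root = ET.fromstring(bundle_xml.strip())
    for element in root.iter():
        if local_name(element.tag) != "Bundle_Member_Entry":
            continue
        for child in element:
            name = local_name(child.tag)
            if name == "lid_reference":
                references.append(("lid", (child.text or "").strip()))
            elif name == "lidvid_reference":
                references.append(("lidvid", (child.text or "").strip()))
    return references
```

Calling `member_references(SAMPLE)` yields one `"lid"` entry and one `"lidvid"` entry, in document order.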

nutjob4life added a commit that referenced this issue Apr 22, 2020
-   #39: SIP contains just one product for Insight Spice example
    -   We match the lidvid against all primaries found
    -   Take advantage of a new LogicalReference class for lid+lidvid manipulations
        -   This probably already exists in Java land?
    -   Look for either lid_reference or lidvid_reference in bundle XML
-   Add a way to compare, hash, and manipulate lidvids with optional vids: class LogicalReference
    -   Representation, stringification
    -   Hashing, equality, comparisons
        -   Compare version IDs smartly (i.e., 2.9 < 2.10)
    -   Matching based on partial lid or full lidvid
    -   Battery of unit tests
-   Begin support for #24 with command-line handling and flag passing
    -   But default it to True until we have time to actually put it in
        -   Include prominent warning that we're doing that
-   Remove redundant database connection use and commit
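
For readers without the commit at hand, the following is an illustrative sketch of what a class along those lines might look like. It is not the actual `LogicalReference` from the commit, whose implementation is not shown in this thread. The `parse` and `matches` method names are assumptions, and version IDs are assumed to be dotted integers.

```python
from functools import total_ordering
from typing import Optional, Tuple


@total_ordering
class LogicalReference:
    """A LID with an optional VID, e.g. 'urn:nasa:pds:whatever::1.0'."""

    def __init__(self, lid: str, vid: Optional[str] = None):
        self.lid = lid
        self.vid = vid

    @classmethod
    def parse(cls, text: str) -> "LogicalReference":
        """Split 'lid::vid' on the '::' separator; the vid part may be absent."""
        lid, _, vid = text.partition("::")
        return cls(lid, vid or None)

    def _vid_key(self) -> Tuple[int, ...]:
        # Compare version components as integers so that 2.9 < 2.10
        return tuple(int(p) for p in self.vid.split(".")) if self.vid else ()

    def _key(self):
        return (self.lid, self._vid_key())

    def __eq__(self, other):
        if not isinstance(other, LogicalReference):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, LogicalReference):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())

    def __str__(self):
        return f"{self.lid}::{self.vid}" if self.vid else self.lid

    def __repr__(self):
        return f"LogicalReference({str(self)!r})"

    def matches(self, other: "LogicalReference") -> bool:
        """A bare LID matches any version of that LID; a LIDVID must match exactly."""
        if self.lid != other.lid:
            return False
        return self.vid is None or self._vid_key() == other._vid_key()
```

Comparing integer tuples rather than strings is what makes `2.9` sort before `2.10`; a plain string comparison would get that backwards.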
jordanpadams pushed a commit that referenced this issue Apr 22, 2020
nutjob4life added a commit that referenced this issue Jun 14, 2020
…effort

When SIP generation was originally done, the requirements weren't fully appreciated, and they also spread fairly wide (examination of local filesystems versus querying a registry service, using pre-computed checksums (digests) in unspecified file formats, etc.). AIP generation was purely based on directory contents and not label files. This commit fixes all that with a unified bundle comprehension method (using a temporary `sqlite3` database) that's shared between AIP and SIP and architected with multiprocessing capability for the future.

It specifically addresses the following:

-   Resolves #24 by including support for the `--include-all-collections` argument for both AIP and SIP generation
-   Resolves #41 by building a structure for the PDS labels instead of blindly accepting what's in the bundle's directory structure
-   Resolves #65 by adding support for the `<directory_path_name>` element in XML labels
-   Resolves #64 by testing on the Insight Cameras dataset three times with reliable termination on the host `pds-dev-el7.jpl.nasa.gov` in approximately 1h 33m each time.
-   Resolves #63 by adding timestamps to logical identifiers and generated filenames.

This commit also:

-   Updates the documentation with some additional notes and diagnostics produced by this version of the software
    -   It also cleans up documentation with some cosmetic adjustments
    -   It fixes the base URL described in the example usage (missing trailing slash)
-   Removes extraneous package dependencies
-   Factors bundle comprehension and database generation for shared use by AIP and SIP generation
    -   Removes redundant database generation: if it's done for AIP, just feed it into SIP
-   Adds functional tests for AIP generation
    -   Factors SIP and AIP test cases for code reuse
-   Removes hundreds of lines of now redundant code
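
As a rough illustration of the idea of a temporary `sqlite3` database serving latest-version lookups, here is a sketch. The table name, columns, and function names are all invented for the example, and an in-memory database is assumed. The schema actually used by the commit is not shown in this thread.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_database(labels: Iterable[Tuple[str, int, int]]) -> sqlite3.Connection:
    """Load (lid, major, minor) rows into an in-memory database (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE labels (lid TEXT, major INTEGER, minor INTEGER, "
        "PRIMARY KEY (lid, major, minor))"
    )
    con.executemany("INSERT INTO labels VALUES (?, ?, ?)", labels)
    con.commit()
    return con


def latest_versions(con: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Return the highest (major, minor) recorded for each LID."""
    query = """
        SELECT lid, major, minor FROM labels AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM labels AS b
            WHERE b.lid = a.lid
              AND (b.major > a.major OR (b.major = a.major AND b.minor > a.minor))
        )
        ORDER BY lid
    """
    return con.execute(query).fetchall()
```

Storing the major and minor parts as integers keeps the comparison numeric inside SQL, so version 2.10 outranks 2.9. Once the database is built, the same connection could be handed to both the AIP and SIP steps.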
jordanpadams added a commit that referenced this issue Jun 18, 2020
Bug fixes and improvements per #24, #41, #63, #64, #65
jordanpadams added a commit that referenced this issue Jun 19, 2020
Per some discussions with Steve and Co, by default, our software should include all the products and let NSSDCA figure out which ones they already have. We should still provide the flag, but we just needed to change the default functionality.

refs #24