API not returning value for NAIF bundles #305

Closed
jordanpadams opened this issue Apr 4, 2023 · 9 comments · Fixed by #312

@jordanpadams
Member

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I ran queries for a few NAIF bundles, the API did not return anything. This may or may not be related to NASA-PDS/registry#180

🕵️ Expected behavior

I expected the API to return the product metadata

📜 To Reproduce

Here are a few examples that I can confirm, from the Kibana dashboard, exist in the NAIF production Registry:

https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:mars2020.spice::6.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:insight.spice::15.0
https://pds.nasa.gov/api/search/1/products/urn:esa:psa:em16_spice::6.0
https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:maven.spice::5.0

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

Not sure if this is an API bug or a bug in the CCS config in the production registry

@jimmie
Member

jimmie commented Apr 4, 2023

Querying the NAIF registry directly gives the same result, so this is not a CCS issue.

curl "https://pds.nasa.gov/api/search-naif/1/products/urn:nasa:pds:maven.spice::5.0"
{"request":"/products/urn:nasa:pds:maven.spice::5.0","message":"The lidvid urn:nasa:pds:maven.spice was not found"}

Will examine the CW logs to see if there's any smoking gun.

@jimmie
Member

jimmie commented Apr 4, 2023

Seeing odd things w/ the data - e.g. looking at the last few versions of the lid urn:esa:psa:em16_spice, archive_status disappears starting w/ vid 5.0 and superseded_by disappears starting w/ vid 6.0

     {
        "_id" : "urn:esa:psa:em16_spice::4.0",
        "_index" : "registry",
        "_score" : 7.4576373,
        "_source" : {
           "lidvid" : "urn:esa:psa:em16_spice::4.0",
           "ops:Provenance/ops:superseded_by" : "urn:esa:psa:em16_spice::5.0",
           "ops:Tracking_Meta/ops:archive_status" : "archived",
           "vid" : "4.0"
        },
        "_type" : "_doc"
     },
     {
        "_id" : "urn:esa:psa:em16_spice::5.0",
        "_index" : "registry",
        "_score" : 7.330605,
        "_source" : {
           "lidvid" : "urn:esa:psa:em16_spice::5.0",
           "ops:Provenance/ops:superseded_by" : "urn:esa:psa:em16_spice::6.0",
           "vid" : "5.0"
        },
        "_type" : "_doc"
     },
     {
        "_id" : "urn:esa:psa:em16_spice::6.0",
        "_index" : "registry",
        "_score" : 7.350583,
        "_source" : {
           "lidvid" : "urn:esa:psa:em16_spice::6.0",
           "vid" : "6.0"
        },
        "_type" : "_doc"
     },
     {
        "_id" : "urn:esa:psa:em16_spice::7.0",
        "_index" : "registry",
        "_score" : 7.330605,
        "_source" : {
           "lidvid" : "urn:esa:psa:em16_spice::7.0",
           "vid" : "7.0"
        },
        "_type" : "_doc"
     }
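The pattern in these hits can be checked mechanically. The following is a minimal sketch, assuming the hits have already been fetched and parsed into Python dictionaries; the helper name `missing_fields` is hypothetical and is not part of any PDS tool. The sample data is a trimmed copy of the four hits above.

```python
from typing import Dict, List

# Field names copied from the hits shown above.
SUPERSEDED_BY = "ops:Provenance/ops:superseded_by"
ARCHIVE_STATUS = "ops:Tracking_Meta/ops:archive_status"


def missing_fields(hits: List[dict]) -> Dict[str, List[str]]:
    """Map each hit's _id to the tracked fields absent from its _source."""
    report: Dict[str, List[str]] = {}
    for hit in hits:
        source = hit.get("_source", {})
        absent = [f for f in (SUPERSEDED_BY, ARCHIVE_STATUS) if f not in source]
        if absent:
            report[hit["_id"]] = absent
    return report


# Trimmed copy of the four hits above (only _id and _source are kept).
hits = [
    {"_id": "urn:esa:psa:em16_spice::4.0",
     "_source": {"lidvid": "urn:esa:psa:em16_spice::4.0",
                 SUPERSEDED_BY: "urn:esa:psa:em16_spice::5.0",
                 ARCHIVE_STATUS: "archived", "vid": "4.0"}},
    {"_id": "urn:esa:psa:em16_spice::5.0",
     "_source": {"lidvid": "urn:esa:psa:em16_spice::5.0",
                 SUPERSEDED_BY: "urn:esa:psa:em16_spice::6.0", "vid": "5.0"}},
    {"_id": "urn:esa:psa:em16_spice::6.0",
     "_source": {"lidvid": "urn:esa:psa:em16_spice::6.0", "vid": "6.0"}},
    {"_id": "urn:esa:psa:em16_spice::7.0",
     "_source": {"lidvid": "urn:esa:psa:em16_spice::7.0", "vid": "7.0"}},
]

for doc_id, absent in missing_fields(hits).items():
    print(doc_id, "is missing", absent)
```

A missing superseded_by on 7.0 is expected, since it is the latest version here; the missing value on 6.0 and the missing archive_status from 5.0 onward are the anomalies.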

@jimmie
Member

jimmie commented Apr 4, 2023

I've manually run provenance, and now all vids prior to 7 have appropriate superseded_by values. However, this query (by lidvid) fails to return the record:

curl -XPOST "https://search-naif-prod-pm7hsg36wqejex3whlnpj3d6ma.us-west-2.es.amazonaws.com:443/registry/_search" -H"Content-type:application/json" -u naif_registry_prod:${PASS} -d'{ "query": { "bool": { "must": [ { "term" : { "lidvid": { "value": "urn:esa:psa:em16_spice::7.0" } } } ] } }, "_source": { "includes": [ "lidvid", "ops:Provenance/ops:superseded_by", "ops:Tracking_Meta/ops:archive_status" ] } }'

Oddly, both of the following queries do (first by _id, the second by lid and vid):

curl -XPOST "https://search-naif-prod-pm7hsg36wqejex3whlnpj3d6ma.us-west-2.es.amazonaws.com:443/registry/_search" -H"Content-type:application/json" -u naif_registry_prod:${PASS} -d'{ "query": { "bool": { "must": [ { "term" : { "_id": { "value": "urn:esa:psa:em16_spice::7.0" } } } ] } }, "_source": { "includes": [ "lidvid", "ops:Provenance/ops:superseded_by", "ops:Tracking_Meta/ops:archive_status" ] } }'

and

curl -XPOST "https://search-naif-prod-pm7hsg36wqejex3whlnpj3d6ma.us-west-2.es.amazonaws.com:443/registry/_search" -H"Content-type:application/json" -u naif_registry_prod:${PASS} -d'{ "query": { "bool": { "must": [ { "term" : { "lid": { "value": "urn:esa:psa:em16_spice", "boost": 1 } } }, { "term" : { "vid": { "value": "7.0", "boost": 1 } } } ] } }, "_source": { "includes": [ "lidvid", "ops:Provenance/ops:superseded_by", "ops:Tracking_Meta/ops:archive_status" ] } }'

In the results of either of these two queries, you'll see the intended value for lidvid. I have no idea what's going on here but this is clearly the root cause of the inability to access the document through the API. Thoughts anyone?
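For anyone repeating this comparison on other lidvids, here is a minimal sketch that builds the three request bodies used above (by lidvid, by _id, and by lid plus vid). The helper names are hypothetical, and the `boost` values from the third query are left out on the assumption that they do not affect which documents match. The bodies would be POSTed to the same `/registry/_search` endpoint as in the curl commands.

```python
import json
from typing import Any, Dict, List, Tuple

# Fields requested in the curl commands above.
SOURCE_FIELDS: List[str] = [
    "lidvid",
    "ops:Provenance/ops:superseded_by",
    "ops:Tracking_Meta/ops:archive_status",
]


def term(field: str, value: str) -> Dict[str, Any]:
    """Build a single term clause in the same shape as the curl bodies."""
    return {"term": {field: {"value": value}}}


def wrap(must_clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap term clauses in a bool/must query with the _source filter."""
    return {
        "query": {"bool": {"must": must_clauses}},
        "_source": {"includes": SOURCE_FIELDS},
    }


def split_lidvid(lidvid: str) -> Tuple[str, str]:
    """Split 'lid::vid' at the last '::' separator."""
    lid, vid = lidvid.rsplit("::", 1)
    return lid, vid


def diagnostic_queries(lidvid: str) -> Dict[str, Dict[str, Any]]:
    """Return the three equivalent lookups compared in this comment."""
    lid, vid = split_lidvid(lidvid)
    return {
        "by_lidvid": wrap([term("lidvid", lidvid)]),
        "by_id": wrap([term("_id", lidvid)]),
        "by_lid_and_vid": wrap([term("lid", lid), term("vid", vid)]),
    }


if __name__ == "__main__":
    for name, body in diagnostic_queries("urn:esa:psa:em16_spice::7.0").items():
        print(name, json.dumps(body))
```

If the index is healthy, all three bodies should return the same document; a difference between them points at the mapping or indexing of the field that fails.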

@jimmie
Member

jimmie commented Apr 5, 2023

If TLDR, skip to the bolded part as there is another issue.

We reindexed the registry and the OpenSearch queries are now returning the expected results. I am a bit concerned about why this happened. The best case is that I didn't complete the reindexing at the time we added the superseded mapping (although we are seeing superseded_by values, so that seems unlikely). The worst case is that something else screwed up the lidvid mapping. We should continue to keep an eye on this (more urgent justification for fully automated, routine testing).

Note however that for the lidvids indicated in the description, data won't be returned for the following reasons:

  • urn:nasa:pds:mars2020.spice::6.0 : no archive status
  • urn:nasa:pds:insight.spice::15.0 : no archive status, superseded by 16.0
  • urn:esa:psa:em16_spice::6.0 : no archive status, superseded by 7.0
  • urn:nasa:pds:maven.spice::5.0 : superseded by 30.0 (25 versions have been added after 5.0!)

The first lidvid demonstrates a mismatch between the API and provenance. The former looks for a particular archive status (e.g. archive_status = 'archived'), whereas the latter selects documents by excluding a particular archive status (i.e. archive_status != 'staged' or, in OpenSearch terms, "must_not"). Note that "must_not" will return documents that don't have a value for archive_status (which is semantically correct), so provenance will assign superseded_by values pointing to documents that do not have an archive status. But these will be excluded by the API.

So while we can now get results for direct-to-Opensearch queries looking for a particular lidvid, the API will continue to not return results in the cases of no archive_status. I will create a separate ticket for this.
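The mismatch described above can be illustrated without a cluster. This is a simplified in-memory simulation of how a "must" term clause and a "must_not" term clause treat a document with no archive_status; it is not OpenSearch itself, and the lidvids are made-up placeholders.

```python
ARCHIVE_STATUS = "ops:Tracking_Meta/ops:archive_status"


def matches_must(doc: dict, field: str, value: str) -> bool:
    """A 'must' term clause only matches when the field is present and equal."""
    return doc.get(field) == value


def matches_must_not(doc: dict, field: str, value: str) -> bool:
    """A 'must_not' term clause matches anything that is not equal,
    including documents where the field is absent."""
    return doc.get(field) != value


# Hypothetical documents: one archived, one staged, one with no status at all.
docs = [
    {"lidvid": "urn:example:bundle::1.0", ARCHIVE_STATUS: "archived"},
    {"lidvid": "urn:example:bundle::2.0", ARCHIVE_STATUS: "staged"},
    {"lidvid": "urn:example:bundle::3.0"},
]

# API-style selection: archive_status must equal 'archived'.
api_view = [d["lidvid"] for d in docs
            if matches_must(d, ARCHIVE_STATUS, "archived")]

# Provenance-style selection: archive_status must not equal 'staged'.
provenance_view = [d["lidvid"] for d in docs
                   if matches_must_not(d, ARCHIVE_STATUS, "staged")]

# Documents provenance treats as valid but the API will never return.
invisible_to_api = [lv for lv in provenance_view if lv not in api_view]

print("API sees:        ", api_view)
print("Provenance sees: ", provenance_view)
print("Hidden from API: ", invisible_to_api)
```

The document without a status is selected by provenance, so an older version can end up superseded_by a version that the API filters out.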

@tloubrieu-jpl
Member

Thanks @jimmie, so for now the immediate solution I see is to add a criterion on archive_status in the provenance script.

@jimmie
Member

jimmie commented Apr 6, 2023

Provenance is already filtering on archive_status; it's the manner in which it does so that should be changed (i.e. go from 'must_not' to 'must') to conform to how the API does it.

@tloubrieu-jpl
Member

I see, thanks @jimmie .

@alexdunnjpl
Contributor

alexdunnjpl commented Apr 6, 2023

Currently confirming

  • Products have archive-status correctly assigned by default in harvest (see assignment in registry-common)
  • Products cannot override default archive-status if label contains empty ops:Tracking_Meta/ops:archive_status field (can probably be assumed on the basis that the path is completely registry-constructed)
  • registry-mgr can only set status to one of an enumerated set of values (tested bad value, and null value - neither worked)

If all three are confirmed, then we can probably just fix the way provenance handles targeting to avoid breaking when erroneous null-statuses are present and call it a day... but it also leaves the question of how the null-status records came to be in the first place.

@tloubrieu-jpl
Member

tloubrieu-jpl commented Apr 10, 2023

Actions forward:

  • make the provenance script process only documents whose archive_status is in a list of values; the list needs to be the same as for registry-api
  • run registry-manager to set the archive status to archived on all the collections
  • check archive_status for the other nodes
  • investigate how a field in OpenSearch can become non-searchable although it is present.
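The first action could take roughly the following shape. This is only a sketch: the helper name is hypothetical, it is not the change that was actually committed, and the status list below is a placeholder, since the real list has to be copied from registry-api.

```python
from typing import Any, Dict, Iterable

ARCHIVE_STATUS = "ops:Tracking_Meta/ops:archive_status"

# Placeholder values only; the real list must match what registry-api uses.
ASSUMED_ALLOWED_STATUSES = ["archived", "certified"]


def archive_status_filter(allowed_statuses: Iterable[str]) -> Dict[str, Any]:
    """Build a bool query that requires archive_status to be one of the
    allowed values, instead of excluding 'staged' with must_not."""
    statuses = list(allowed_statuses)
    if not statuses:
        raise ValueError("at least one archive status is required")
    return {"bool": {"must": [{"terms": {ARCHIVE_STATUS: statuses}}]}}


query = {"query": archive_status_filter(ASSUMED_ALLOWED_STATUSES)}
print(query)
```

Because a `terms` clause under "must" only matches documents that actually carry one of the listed values, documents with a missing archive_status are no longer selected.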

alexdunnjpl added a commit that referenced this issue Apr 11, 2023
…ther than excluding status==staged

this aligns provenance script's behaviour with the API's default behaviour and prevents problems where archive_status of a product is erroneously nulled in db

fixes #305