This repository has been archived by the owner on Dec 22, 2022. It is now read-only.

blob is returned in product results if fields to return are not specified #51

Closed
jimmie opened this issue Jul 7, 2021 · 10 comments
Labels: B12.0, bug (Something isn't working), c.api

Comments

@jimmie
Member

jimmie commented Jul 7, 2021

🐛 Describe the bug

If I submit a products request and do not specify which fields to return, the blob is included in the results. If I specify one or more fields, the blob is not returned and behavior is as expected.

📜 To Reproduce

Steps to reproduce the behavior:
This is easiest to reproduce using the Swagger UI:

  1. go to GET products
  2. for simplicity, set the limit to 1
  3. leave fields blank
  4. click "Try it out!"
  5. "ops.Label_File_Info.ops.blob" is one of the returned "property" values

Log output shows:
2021-07-07 15:53:15.170 DEBUG 21821 --- [/O dispatcher 3] org.apache.http.wire : http-outgoing-2 >> "{"from":0,"size":1,"timeout":"60s","query":{"bool":{"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":[],"excludes":["ops:Label_File_Info/ops:blob"]}}"

  1. Try again, this time in the fields, enter: summary,lidvid,lid
  2. click "Try it out!"
  3. "ops.Label_File_Info.ops.blob" is not included in the results

Log output in this case shows:
2021-07-07 15:30:58.994 DEBUG 21821 --- [/O dispatcher 2] org.apache.http.wire : http-outgoing-1 >> "{"from":0,"size":1,"timeout":"60s","query":{"bool":{"must":[{"bool":{"should":[{"exists":{"field":"summary","boost":1.0}},{"exists":{"field":"lidvid","boost":1.0}},{"exists":{"field":"lid","boost":1.0}}],"adjust_pure_negative":true,"minimum_should_match":"1","boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":["summary","pds:File/pds:creation_date_time","ref_lid_instrument_host","pds:Time_Coordinates/pds:start_date_time","lid","ref_lid_investigation","lidvid","title","pds:Modification_Detail/pds:modification_date","ref_lid_instrument","pds:Time_Coordinates/pds:stop_date_time","product_class","vid","ref_lid_target","ops:Label_File_Info/ops:file_ref"],"excludes":["ops:Label_File_Info/ops:blob"]}}"

Note that this is running the service locally against the AWS ES instance (search-pds-dev-esext-kcq7xxa4lsrakjw33lywpjdyfy.us-west-2.es.amazonaws.com)

@jimmie added the bug and needs:triage labels Jul 7, 2021
@tloubrieu-jpl
Member

I'm not reproducing the error with ES version 7.10:

% elasticsearch -V    
Version: 7.10.1, Build: default/tar/1c34507e66d7db1211f66f3513706fdf548736aa/2020-12-05T01:00:33.671820Z, JVM: 15.0.1

The log of the request to ES is similar:
request elasticSearch :SearchRequest{searchType=QUERY_THEN_FETCH, indices=[registry], indicesOptions=IndicesOptions[ignore_unavailable=false, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, expand_wildcards_hidden=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=null, allowPartialSearchResults=null, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"from":0,"size":1,"timeout":"60s","query":{"bool":{"adjust_pure_negative":true,"boost":1.0}},"_source":{"includes":[],"excludes":["ops:Label_File_Info/ops:blob"]}}}

@jimmie
Member Author

jimmie commented Jul 7, 2021

AWS ES is version 7.9.1:

{
"name" : "3334c6c25678e7620e61ad3838447dec",
"cluster_name" : "445837347542:pds-dev-esext",
"cluster_uuid" : "o0Ed4jbcQDakdxNsVu0gNQ",
"version" : {
"number" : "7.9.1",
"build_flavor" : "oss",
"build_type" : "tar",
"build_hash" : "unknown",
"build_date" : "2020-11-03T09:54:32.349659Z",
"build_snapshot" : false,
"lucene_version" : "8.6.2",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

@tloubrieu-jpl
Member

And I am wondering, @jimmie, are we using the open source Elasticsearch on AWS yet?

@jimmie
Member Author

jimmie commented Jul 8, 2021

In gov.nasa.pds.api.engineering.elasticsearch.entities.EntitytProductWithBlob, the BLOB_PROPERTY constant is declared as:

public static final String BLOB_PROPERTY = "ops:Label_File_Info/ops:blob";

but should(?) instead be:

public static final String BLOB_PROPERTY = "ops/Label_File_Info/ops/blob";

After making this change, the BLOB is omitted even when no 'includes' are specified.
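For illustration only, here is a minimal sketch of a more defensive variant that excludes both spellings of the blob field, so the blob stays hidden whichever name the index uses. The class name, the `LEGACY_BLOB_PROPERTY` constant, and the `blobExcludes()` helper are hypothetical and not part of the service; the two field names are the ones quoted above.

```java
public class BlobExcludesSketch {
    // Field name the service currently excludes (quoted in this issue).
    public static final String BLOB_PROPERTY = "ops:Label_File_Info/ops:blob";

    // Hypothetical constant: the slash-separated name under which the
    // blob is actually stored in the index queried in this issue.
    public static final String LEGACY_BLOB_PROPERTY = "ops/Label_File_Info/ops/blob";

    // Hypothetical helper: the list of field names to put in the
    // "excludes" part of the _source clause.
    public static String[] blobExcludes() {
        return new String[] { BLOB_PROPERTY, LEGACY_BLOB_PROPERTY };
    }

    public static void main(String[] args) {
        // Print the excludes that would be sent with every search.
        System.out.println(String.join(", ", blobExcludes()));
    }
}
```

Assuming the request is built with the Elasticsearch Java client, an array like this could be passed as the excludes argument of `SearchSourceBuilder.fetchSource(String[] includes, String[] excludes)`.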

@jimmie
Member Author

jimmie commented Jul 9, 2021

The registry schema was updated such that the proper field name (& convention) is ops:Label_File_Info/ops:blob. I have updated the registry to the 0.3.2 schema and ingested insight camera data - the blobs are no longer returned by default. I am closing the issue.

Perhaps in the future schema changes should be publicized - e.g. posted to Slack as a Fun Fact?

Other findings (that I will find the proper medium to make more well-known):

  • the directory in the harvest config file is always relative - you cannot use an absolute path (i.e. one beginning with '/')
  • we should document the format of the registry authentication properties file. I always have to go look in the code to see what it's supposed to be.

@jimmie closed this as completed Jul 9, 2021
@tloubrieu-jpl
Member

@jimmie thanks for your feedback.

Regarding harvest, I usually use an absolute path beginning with / and it works. What issue did you have with that? An error message? If so, could you create a ticket in the harvest or pds-registry-app repos?

@jordanpadams
Member

@jimmie @tloubrieu-jpl I also would like to see what is going on here. There should never be a need to "update schema". harvest and registry-mgr have both been updated over the last several months, so maybe those both need to be upgraded?

@jordanpadams reopened this Jul 12, 2021
@jordanpadams added this to the 10.Lynn.Jennings milestone Jul 12, 2021
@jimmie
Member Author

jimmie commented Jul 12, 2021

@tloubrieu-jpl - let me try to reproduce. If so, I'll create a ticket.

@jordanpadams - the problem I ran into was that the blob field name changed from ops/Label_File_Info/ops/blob to ops:Label_File_Info/ops:blob. The current version (3.2) of the registry api service explicitly excludes the latter, but since the blobs were stored under the former, they were being included in the results.
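To make the mismatch concrete, here is a simplified model of `_source` filtering. It is a sketch under assumptions, not Elasticsearch's implementation: it matches field names exactly (real Elasticsearch also accepts wildcard patterns), and the class name, method name, and sample document values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SourceFilterSketch {
    // Simplified model: a field is returned when it is included
    // (an empty includes list means "everything") and not excluded.
    // Names are compared exactly; wildcards are not modeled.
    public static Map<String, Object> filter(Map<String, Object> doc,
                                             List<String> includes,
                                             List<String> excludes) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            String field = e.getKey();
            boolean included = includes.isEmpty() || includes.contains(field);
            if (included && !excludes.contains(field)) {
                out.put(field, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical document stored under the old, slash-separated name.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("lidvid", "example-lidvid");
        doc.put("ops/Label_File_Info/ops/blob", "example-blob");

        // The service excludes the colon-separated name only.
        List<String> excludes = List.of("ops:Label_File_Info/ops:blob");

        // No fields requested: the exclude does not match, so the blob leaks.
        System.out.println(filter(doc, List.of(), excludes).keySet());

        // Fields requested: only the listed fields come back, so no blob.
        System.out.println(filter(doc, List.of("lidvid"), excludes).keySet());
    }
}
```

In this model the first call keeps the blob because the excluded name never matches the stored one, while the second call drops it only because the blob is not in the includes list. That matches the two behaviors reported at the top of this issue.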

@jimmie
Member Author

jimmie commented Jul 12, 2021

@tloubrieu-jpl - I was unable to recreate the issue with absolute vs. relative directories. Obviously PEBKAC... Apologies.

@tloubrieu-jpl
Member

@jimmie used the registry-app tools to upgrade the registry.
