API crashes with JVM memory error on data sets with very large labels (>1MB) #296

jordanpadams · 2023-03-23T15:36:15Z

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I queried for products in the SHERLOC bundle, I noticed that after performing the query a few times, the API stopped working. On further investigation, it appears the API crashed; our initial suspicion is a JVM out-of-memory error.

🕵️ Expected behavior

I expected the query to return results normally.

📜 To Reproduce

Performed the following query a few times:

curl --GET "https://pds.nasa.gov/api/search/1/products?q=lidvid%20like%20%22urn:nasa:pds:mars2020_sherloc*%22"

Eventually, all endpoints started returning 500 errors.
The server logs showed the following error:

2023-03-22 19:01:32.751 DEBUG 1 --- [-nio-80-exec-10] org.opensearch.client.RestClient         : request [POST https://search-en-prod-di7dor7quy7qwv3husi2wt5tde.us-west-2.es.amazonaws.com:443/registry,psa-prod:registry,naif-prod-ccs:registry,rms-prod:registry,sbnumd-prod-ccs:registry,geo-prod-ccs:registry,atm-prod-ccs:registry,sbnpsi-prod-ccs:registry,ppi-prod-ccs:registry,img-prod-ccs:registry/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true] failed

🖥 Environment Info

No response

📚 Version of Software Used

No response

🩺 Test Data / Additional context

The bundle in question is here: https://pds-geosciences.wustl.edu/m2020/urn-nasa-pds-mars2020_sherloc/
The specific collection where I am finding some large labels: https://pds-geosciences.wustl.edu/m2020/urn-nasa-pds-mars2020_sherloc/data_raw/
An example of a very large label: https://pds-geosciences.wustl.edu/m2020/urn-nasa-pds-mars2020_sherloc/data_raw/sol_00004/SS__0004_0667298185_056ECA__0010052SRLC10002_0_____J10.xml

🦄 Related requirements

No response

⚙️ Engineering Details

No response

jimmie · 2023-03-23T20:51:25Z

In the ecs.tf Terraform script, bumped CPU to 1024 units (1 full vCPU in AWS terms) and memory to 8096 MiB (roughly 8 GB). Applied to EN only.

One concern: while the service was returning 500s in response to API requests, the ECS health check continued to succeed (it only verifies a 200 redirect from a request for the swagger docs). We should consider a more meaningful request; I think /classes would be a good one.
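The idea of probing a real endpoint can be sketched as a small shell check. This is a hypothetical sketch, not the deployed health check: the `health_check` function name, the base URL passed to it, and the assumption that `/classes` sits directly under that base URL are all illustrative.

```shell
# Hypothetical health-check sketch: report healthy only if the API can
# actually answer a request, not just serve the swagger docs.
# Assumptions: the base URL argument and the /classes path layout are
# illustrative, not the deployed configuration.
health_check() {
  base_url="$1"
  # -s: no progress output; -o /dev/null: discard the body;
  # -w '%{http_code}': print only the HTTP status ("000" if no response).
  # --max-time bounds the probe so a hung JVM also counts as unhealthy.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "${base_url}/classes")
  if [ "$status" = "200" ]; then
    return 0
  fi
  echo "unhealthy: /classes returned ${status}" >&2
  return 1
}
```

For example, `health_check http://localhost:80` succeeds only on a 200 from `/classes` (port 80 is inferred from the `nio-80` thread name in the log above and may not match the container). Unlike the swagger check, a backend that returns 500s fails this probe. If it runs as an ECS container health check, the image must include `curl`.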

jimmie · 2023-03-23T23:14:52Z

I suggest we deploy the updated health check at the same time as the updated Docker image with explicit JVM memory controls from #300.
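As a rough sizing check, combining the 8096 MiB task memory above with the "75% of available RAM" allocation from #303 gives the approximate heap ceiling computed below. This is a hedged sketch: the variable names and the `registry-api.jar` artifact name are placeholders, and the JVM's actual MaxHeapSize may differ slightly from this simple arithmetic.

```shell
# Hypothetical sizing sketch for the JVM memory controls (#300, #303).
# Assumption: the container memory limit equals the 8096 MiB ECS task size.
TASK_MEMORY_MIB=8096
HEAP_PERCENT=75

# Approximate heap ceiling the JVM derives from -XX:MaxRAMPercentage.
expected_heap_mib=$((TASK_MEMORY_MIB * HEAP_PERCENT / 100))
echo "expected max heap: ~${expected_heap_mib} MiB"

# Launch with the heap sized relative to the container limit
# (registry-api.jar is a placeholder, not the project's real artifact name):
#   java -XX:MaxRAMPercentage=${HEAP_PERCENT}.0 -jar registry-api.jar
# Print the value the JVM actually chose (MaxHeapSize is reported in bytes):
#   java -XX:MaxRAMPercentage=${HEAP_PERCENT}.0 -XX:+PrintFlagsFinal -version | grep -w MaxHeapSize
```

Sizing the heap as a percentage of the container limit, rather than hard-coding `-Xmx`, leaves headroom for non-heap JVM memory and follows any future change to the task size.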

jordanpadams added bug Something isn't working needs:triage labels Mar 23, 2023

jordanpadams assigned jordanpadams, jimmie, tloubrieu-jpl and alexdunnjpl and unassigned jordanpadams Mar 23, 2023

jordanpadams added B15.0 sprint-backlog s.medium Medium level severity B14.0 and removed needs:triage B15.0 labels Mar 23, 2023

viviant100 mentioned this issue Mar 23, 2023

Verify Registry API/OpenSearch work with manually created label NASA-PDS/system-i-n-t#40

Closed

jordanpadams mentioned this issue Mar 23, 2023

Queries for data products with lots of metadata attributes crash in browser #292

Closed

alexdunnjpl mentioned this issue Mar 29, 2023

increase java process memory allocation to 75% of available RAM... #303

Merged

alexdunnjpl closed this as completed in #303 Mar 31, 2023

gxtchen added the i&t.done label Jul 17, 2023
