[OPS] S1 L0ASP Job stuck in preparation worker due to true Ghost #1089

Open
12 tasks
suberti-ads opened this issue Jul 5, 2024 · 1 comment
Labels
bug Something isn't working CCB Issue for CCB ops Ticket from ADS operation team workaround Workaround activated

Comments

@suberti-ads

Environment:

  • Delivery tag:
  • Platform: OPS Orange Cloud
  • Configuration:
    rs-addon s1-l0asp v1.15

Traceability:

Current Behavior:
L0ASP does not publish a job despite having received the FULL segment.

Expected Behavior:
When all data is available, the preparation worker should generate a job to be processed.

Steps To Reproduce:
This issue has been seen on S1 Sample production.
To reproduce it:

  • Ingest a "true ghost" segment, then ingest the FULL segment associated with the same DT

Test execution artefacts (i.e. logs, screenshots…)
Preparation log (job 152500 impacted)
s1-l0asp-part1-preparation-worker-v31-5b75f68487-86wmp.log

Whenever possible, first analysis of the root cause
DT 06A5D5 was downloaded at 2 stations (SGS and MTI).
2 segments were generated:

  • S1A_IW_RAW__0SSH_20240704T150234_20240704T150234_054612_06A5D5_8834.SAFE
  • S1A_IW_RAW__0SSH_20240704T150048_20240704T150234_054612_06A5D5_AB85.SAFE
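The two names above can be compared programmatically. The following is a minimal sketch, not the preparation worker's logic: it assumes the sensing start and stop times are the fifth and fourth fields from the end of the SAFE name, and it uses a purely illustrative heuristic (start equal to stop at one-second resolution) to flag a ghost candidate. The function names `parse_segment_name` and `ghost_candidates` are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the segment names listed above.
FMT = "%Y%m%dT%H%M%S"

SEGMENTS = [
    "S1A_IW_RAW__0SSH_20240704T150234_20240704T150234_054612_06A5D5_8834.SAFE",
    "S1A_IW_RAW__0SSH_20240704T150048_20240704T150234_054612_06A5D5_AB85.SAFE",
]


def parse_segment_name(name):
    """Extract start, stop, datatake ID and duration from a segment name.

    Fields are read from the end of the name so that the empty field
    produced by the double underscore after RAW does not matter.
    """
    stem = name.rsplit(".", 1)[0]
    fields = stem.split("_")
    start = datetime.strptime(fields[-5], FMT)
    stop = datetime.strptime(fields[-4], FMT)
    return {
        "name": name,
        "start": start,
        "stop": stop,
        "datatake": fields[-2],
        "duration_s": (stop - start).total_seconds(),
    }


def ghost_candidates(names):
    """Return names whose start and stop are identical (assumed heuristic)."""
    return [s["name"] for s in map(parse_segment_name, names) if s["duration_s"] == 0]


if __name__ == "__main__":
    for segment in map(parse_segment_name, SEGMENTS):
        print(segment["datatake"], segment["duration_s"], segment["name"])
    print("ghost candidates:", ghost_candidates(SEGMENTS))
```

With the two names from this issue, only the `_8834` segment has identical start and stop times; the `_AB85` segment spans 15:00:48 to 15:02:34.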

Due to an ingestion issue, the first segment generated was the end of this DT (S1A_IW_RAW__0SSH_20240704T150234_20240704T150234_054612_06A5D5_8834.SAFE).
The preparation worker job is stuck in INITIAL state (another test, with job 152757):

{ "_id" : NumberLong(152757), "productName" : "S1A_IW_RAW__0SSH_20240704T150048_20240704T150234_054612_06A5D5_AB85.SAFE", "generation" : { "state" : "INITIAL" } }

According to the information in this job, only the following inputs are missing:
ORBSCT and MPL_ORBPRE.
These 2 AUX files were available in the catalog.

When we search the l0asp preparation logs, it seems that the segment was not detected as complete:

{"header":{"type":"LOG","timestamp":"2024-07-04T17:14:01.781837Z","level":"INFO","line":70,"file":"InputSearchService.java","thread":"KafkaConsumerDestination{consumerDestinationName='s1-l0asp-part1.message-filter', partitions=6, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"Main input search did not complete successfully: [input l0_segments_for_06A5D8] [reason Missing segments for the coverage of polarisation VV:  2024-07-04T15:18:41.466957Z 2024-07-04T15:20:00.274177Z | END 2024-07-04T15:19:59.725517Z 2024-07-04T15:20:00.274177Z | ][msg Missing inputs]"},"custom":{"logger_string":"esa.s1pdgs.cpoc.preparation.worker.service.InputSearchService"}}
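To find such entries across a large preparation log, the structured line above can be parsed rather than read by eye. The sketch below assumes every log line is a JSON object with `header.level` and `message.content` fields, as in the entry above, and that the content follows the `[input ...] [reason ...][msg ...]` layout seen there. The function name `parse_missing_input` and the regular expressions are illustrative, not part of the preparation worker.

```python
import json
import re

# Layout assumed from the log entry above: [input X] [reason Y][msg Z]
PATTERN = re.compile(
    r"\[input (?P<input>[^\]]+)\]\s*\[reason (?P<reason>.*?)\]\s*\[msg (?P<msg>[^\]]+)\]"
)
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z")

# message.content copied from the log entry above.
SAMPLE_CONTENT = (
    "Main input search did not complete successfully: "
    "[input l0_segments_for_06A5D8] "
    "[reason Missing segments for the coverage of polarisation VV:  "
    "2024-07-04T15:18:41.466957Z 2024-07-04T15:20:00.274177Z | "
    "END 2024-07-04T15:19:59.725517Z 2024-07-04T15:20:00.274177Z | ]"
    "[msg Missing inputs]"
)


def parse_missing_input(line):
    """Return the input, reason, message and timestamps of a log line.

    Returns None when the content does not match the assumed layout.
    """
    record = json.loads(line)
    content = record["message"]["content"]
    match = PATTERN.search(content)
    if match is None:
        return None
    reason = match.group("reason").strip()
    return {
        "level": record["header"]["level"],
        "input": match.group("input"),
        "reason": reason,
        "msg": match.group("msg"),
        "timestamps": TIMESTAMP.findall(reason),
    }


if __name__ == "__main__":
    line = json.dumps({"header": {"level": "INFO"}, "message": {"content": SAMPLE_CONTENT}})
    print(parse_missing_input(line))
```

Filtering a log file on a non-`None` result lists every input search that the worker reported as incomplete, together with the time intervals named in the reason.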

Hypothesis: the "true ghost" prevents the preparation worker from detecting the FULL segment.

Test: delete the ghost from the catalog and regenerate the job ==> production OK

Impact: Timeliness on DT.

Workaround: delete the job, scale down the preparation worker, delete the "ghost" segment, scale up the preparation worker, publish the "FULL" segment.

Bug Generic Definition of Ready (DoR)

  • The affected version in which the bug has been found is mentioned
  • The context and environment of the bug is detailed
  • The description of the bug is clear and unambiguous
  • The procedure (steps) to reproduce the bug is clearly detailed
  • The tested User Story / features is linked to the bug if available
  • Logs are attached if available
  • A data set is attached if available

Bug Generic Definition of Done (DoD)

  • the modification implemented (the solution to fix the bug) is described in the bug.
  • Unit tests & Continuous integration performed - Test results available - Structural Test coverage reported by SONAR
  • Code committed in GIT with right tag or Analysis/Trade Off documentation up-to-date in reference-system-documentation repository
  • Code is compliant with coding rules (SONAR Report as evidence)
  • Acceptance criteria of the related User story are checked and Passed
@suberti-ads suberti-ads added bug Something isn't working CCB Issue for CCB ops Ticket from ADS operation team workaround Workaround activated labels Jul 5, 2024
@suberti-ads
Author

Note: the workaround allows production to start before the timeout.
It will not prevent the next occurrence, and timeliness can still be impacted if the issue is not detected early.
