diego-release version or other BOSH releases you have deployed - Diego v2.66.2
Summary
If an LRP has already crashed once and then crashes again more than 5 minutes later, no crash event is reported for the later crash.
Steps to Reproduce
while true; do cf ssh spring-music -c "kill -9 \$(pidof java)" ; sleep 600; done
cf events
and see a sequence of instead of
Diego repo
Environment Details
Possible Causes or Fixes (optional)
The reason seems to be that, after the initial crash, BBS records in the DB that the "crash_count" for this LRP is 1 (or more in the case of frequent subsequent crashes). Single sporadic crashes, however, result in crash_count = 1.
Then, on subsequent crashes, this code in actual_lrp_db will actually set newCrashCount = 1, and later actual_lrp_event_calculator.go/generateUnclaimedInstanceEvents will not append the NewActualLRPCrashedEvent because both CrashCounts are equal.
Not quite sure how to properly fix it, but this fix in actual_lrp_lifecycle_controller/CrashActualLRP definitely mitigates the issue.
What it does: since we are in CrashActualLRP anyway, we just need to ensure that the crash event is sent. Setting the CrashCount of the before LRP to after-1 seems to be enough to pass the check.
Additional Text Output, Screenshots, contextual information (optional)