In LHCb we've seen the callstack below from time to time. I think it's a race between different RPC calls that asynchronously set the `Status` parameter. Elasticsearch has a `retry_on_conflict` option, but I'm not convinced that using it is the right thing to do.
This is not specific to the 8.0 release: it was already happening in the LHCbDIRAC installation before that. Colleagues running older versions of DIRAC might see the same. The error can also show up in slightly different forms, e.g.
2023-01-23 16:13:09 UTC WorkloadManagement/JobStateUpdate NOTICE: Returning response ([::ffff:202.122.32.249]:49810)[lhcb_mc:fstagni] (0.49 secs) ERROR: Server error while serving setJobParameters: ConflictError(409, 'version_conflict_engine_exception', '[709513455]: version conflict, required seqNo [8530694], primary term [1]. current document has seqNo [8531980] and primary term [1]')
But the underlying problem is the same.
The indices where we are doing these updates have only 1 replica, but in any case the update operation performed here is "heavy" from OpenSearch's point of view.
I am also not convinced that using retry_on_conflict is the right thing to do, but at the moment I don't see a better option.
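For reference, here is a minimal sketch of what retrying on conflict amounts to, assuming the conflict comes from optimistic concurrency control on a per-document sequence number (as the `required seqNo` / `current document has seqNo` message suggests). `FakeIndex`, `VersionConflict` and `update_with_retry` are hypothetical names made up for this illustration; none of this is DIRAC or OpenSearch code, and the store is just an in-memory stand-in.

```python
class VersionConflict(Exception):
    """Stand-in for the 409 version_conflict_engine_exception."""


class FakeIndex:
    """Hypothetical in-memory store that tracks a sequence number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (seq_no, document)

    def get(self, doc_id):
        seq_no, doc = self._docs.get(doc_id, (0, {}))
        return seq_no, dict(doc)

    def index(self, doc_id, doc, if_seq_no):
        # Reject the write if someone else updated the document since we read it.
        current, _ = self._docs.get(doc_id, (0, {}))
        if current != if_seq_no:
            raise VersionConflict(
                f"required seqNo [{if_seq_no}], current document has seqNo [{current}]"
            )
        self._docs[doc_id] = (current + 1, dict(doc))


def update_with_retry(store, doc_id, changes, retry_on_conflict=3):
    """Read-modify-write; on conflict, re-read and re-apply the changes.

    Returns the number of retries that were needed.
    """
    attempts = 0
    while True:
        seq_no, doc = store.get(doc_id)
        doc.update(changes)
        try:
            store.index(doc_id, doc, if_seq_no=seq_no)
            return attempts
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise
            attempts += 1
```

Because each retry re-reads the document before re-applying the partial update, a concurrent write to a different field is preserved. Two writers setting the same field (e.g. `Status`) still end up as "last writer wins", which is the part I'm unsure about.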