[8.0] feat: Make RemoteRunner more resilient to CE issues #7606

Merged

@aldbr aldbr commented May 8, 2024

In LHCb, we are using a new HPC that is in "pre-production", and the connection to it is quite unstable for now.
To avoid reporting a Done job as Failed just because we cannot get its status or its outputs, we retry contacting the CE a few times.

BEGINRELEASENOTES
*WorkloadManagement
CHANGE: Make RemoteRunner more resilient to CE issues
ENDRELEASENOTES
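
For readers unfamiliar with the pattern, the retry idea described above can be sketched roughly as follows. This is not the code merged in this PR: `retry_ce_call`, `make_flaky_ce`, the attempt count and the delay are hypothetical names and values chosen for illustration, and the `{"OK": ...}` result dictionary is only assumed to resemble the return structure the CE calls use.

```python
import time


def retry_ce_call(call, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call `call()` until it reports success or the attempts run out.

    `call` is a hypothetical zero-argument function standing in for a CE
    query (job status or output retrieval). It is assumed to return a dict
    with an "OK" key, or to raise on connection problems.
    """
    result = {"OK": False, "Message": "no attempt made"}
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
        except Exception as exc:
            # Treat a transport error like a failed reply so it can be retried.
            result = {"OK": False, "Message": str(exc)}
        if result.get("OK"):
            return result
        if attempt < max_attempts:
            # Wait before contacting the CE again.
            sleep(delay_seconds)
    # Every attempt failed: return the last error to the caller.
    return result


def make_flaky_ce(failures):
    """Build a fake CE query that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def get_status():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("CE unreachable")
        return {"OK": True, "Value": "Done"}

    return get_status, state


if __name__ == "__main__":
    get_status, state = make_flaky_ce(failures=2)
    print(retry_ce_call(get_status, max_attempts=3), state["calls"])
```

With this shape, a job is only treated as failed after the last attempt still returns an error, so a single dropped connection does not turn a Done job into a Failed one.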

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label May 8, 2024
@fstagni fstagni merged commit 89c2b7a into DIRACGrid:rel-v8r0 May 10, 2024
26 checks passed
@DIRACGridBot DIRACGridBot added the sweep:done All sweeping actions have been done for this PR label May 10, 2024
DIRACGridBot pushed a commit to DIRACGridBot/DIRAC that referenced this pull request May 10, 2024
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/9029762875

Successful:

  • integration
