task/install/deb: retry installing the package if resource temporarily unavailable #1572
        return remote.run(args=args, stdout=stdout)
    except run.CommandFailedError:
        if "Resource temporarily unavailable" in stdout.getvalue().lower():
            continue
I am not sure this would work, since test nodes are always locked for exclusive use. It would be better to figure out which other user is holding the dpkg frontend lock.
I think the first step is to catch the CommandFailedError exception and, if "Could not get lock" is in the error message, run
sudo fuser -v /var/lib/dpkg/lock-frontend
to understand the root cause.
Also, in newer versions of apt-get the error message might vary, see https://github.com/Debian/apt/blob/289ee74dd23cba7e08b08c6c3602bcf4bf8167bc/apt-pkg/contrib/fileutl.cc#L310-L313
so I'd suggest searching for "Could not get lock" instead.
You might also want to capture stderr instead of stdout for the error message.
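A rough sketch of what this suggestion could look like, not the code in this PR. `run_command` and `CommandFailed` are hypothetical stand-ins for teuthology's `remote.run` and `CommandFailedError`, and the retry count and delay are arbitrary assumptions. It checks the captured stderr for "Could not get lock", lists the lock holders with fuser, and only then retries.

```python
import time

LOCK_ERROR = "Could not get lock"
LOCK_FILE = "/var/lib/dpkg/lock-frontend"


class CommandFailed(Exception):
    """Hypothetical stand-in for teuthology's CommandFailedError."""

    def __init__(self, stderr):
        super().__init__(stderr)
        self.stderr = stderr


def install_with_lock_diagnosis(run_command, install_args, retries=3, delay=5):
    """Run install_args; on a dpkg lock error, list the lock holders, then retry.

    run_command is a hypothetical callable: it takes an argument list,
    returns the command output as a string, and raises CommandFailed
    carrying the captured stderr when the command exits non-zero.
    """
    for attempt in range(retries):
        try:
            return run_command(install_args)
        except CommandFailed as err:
            # Any failure that is not the lock error is passed through.
            if LOCK_ERROR not in err.stderr:
                raise
            # Record who holds the lock before retrying, so the log
            # contains a clue about the root cause.
            try:
                holders = run_command(["sudo", "fuser", "-v", LOCK_FILE])
            except CommandFailed as fuser_err:
                # fuser exits non-zero when no process has the file open.
                holders = fuser_err.stderr
            print("holders of %s: %s" % (LOCK_FILE, holders))
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```

Note that fuser prints its verbose listing on stderr, so a real `run_command` would need to capture both streams for the log line to be useful.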
Makes sense and fixed it.
I am not sure this would work, since test nodes are always locked for exclusive use. It would be better to figure out which other user is holding the dpkg frontend lock.
I still think we need to root-cause the issue before using "retry" as the cure. It's like,
well, have you tried turning it off and on again?
I checked the teuthology logs again more carefully and couldn't find any clue about who is holding the lock. We need this patch to reproduce the failure and list all the users of lock-frontend so we can dig into it further.
e65dcc0 to 0b9d582
Fixes: https://tracker.ceph.com/issues/46878
Signed-off-by: Xiubo Li <xiubli@redhat.com>
@susebot run deploy
Commit f032123 is OK.
Could you keep the tracker ticket open until the issue is root-caused?
Sure. Thanks.