task/install/deb: retry installing the package if resource temporarily unavailable #1572

Merged
tchaikov merged 1 commit into ceph:master on Nov 9, 2020


Member

@lxbsz lxbsz commented Oct 23, 2020

        return remote.run(args=args, stdout=stdout)
    except run.CommandFailedError:
        if "Resource temporarily unavailable" in stdout.getvalue().lower():
            continue
Contributor

i am not sure if this would work. as test nodes are always locked for exclusive use. would be better if we can figure out the other user holding the frontend lock of dpkg.

i think the first step is to catch CommandFailedError exception and run

sudo fuser -v /var/lib/dpkg/lock-frontend

if "Could not get lock" is in the error message, to understand the root cause.

also in newer versions of apt-get the error message might vary, see https://github.com/Debian/apt/blob/289ee74dd23cba7e08b08c6c3602bcf4bf8167bc/apt-pkg/contrib/fileutl.cc#L310-L313

so i'd suggest searching for "Could not get lock" instead.

and you might want to capture the stderr instead of stdout for the error message.
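
A minimal sketch of the suggestions above, not the code that was merged. It uses plain `subprocess` as a stand-in for teuthology's `remote.run`; the names `install_with_retry`, `is_lock_error`, `retries` and `delay` are hypothetical. It captures stderr, matches "Could not get lock" case-insensitively by lowercasing both sides, and runs `fuser -v` on the frontend lock before each retry so the lock holder ends up in the log.

```python
import subprocess
import time

LOCK_FILE = "/var/lib/dpkg/lock-frontend"
# Kept lowercase on purpose: it is compared against lowercased stderr.
LOCK_ERROR = "could not get lock"


def is_lock_error(stderr_text):
    """Return True if the captured stderr reports a dpkg/apt lock failure."""
    return LOCK_ERROR in stderr_text.lower()


def install_with_retry(args, run=subprocess.run, retries=5, delay=1.0,
                       sleep=time.sleep):
    """Run an install command, retrying only when the lock is held.

    `run` and `sleep` are injectable so the logic can be exercised
    without touching a real package manager.
    """
    for attempt in range(1, retries + 1):
        proc = run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                   text=True)
        if proc.returncode == 0:
            return proc
        if not is_lock_error(proc.stderr):
            # Unrelated failure: do not hide it behind a retry.
            raise subprocess.CalledProcessError(
                proc.returncode, args, proc.stdout, proc.stderr)
        # Record who holds the lock, to help root-cause the contention.
        holders = run(["sudo", "fuser", "-v", LOCK_FILE],
                      stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                      text=True)
        print("attempt %d: lock held: %s %s"
              % (attempt, holders.stdout.strip(), holders.stderr.strip()))
        sleep(delay)
    raise RuntimeError("gave up after %d attempts: lock still held" % retries)
```

Retrying only on the lock message keeps unrelated install failures visible, and the `fuser` output is what would identify the other process holding the lock.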

Member Author

Makes sense and fixed it.

Contributor

> i am not sure if this would work. as test nodes are always locked for exclusive use. would be better if we can figure out the other user holding the frontend lock of dpkg.

i still think we need to root cause the issue before using "retry" as the cure. it's like,

well, have you tried turning it off and on again?

Member Author

I checked the teuthology logs again more carefully and couldn't find any clue about who is holding the lock. We need this patch so that, when the failure reproduces, it lists all the users of lock-frontend and we can dig into it further.

Contributor

kshtsk commented Nov 4, 2020

@susebot run deploy


susebot commented Nov 4, 2020

Commit f032123 is OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/267/

@tchaikov tchaikov changed the title from "deb: retry installing the package if resource temporarily unavailable" to "task/install/deb: retry installing the package if resource temporarily unavailable" on Nov 9, 2020
Contributor

@tchaikov tchaikov left a comment

could you keep the tracker ticket open before root causing it?

Member Author

lxbsz commented Nov 9, 2020

> could you keep the tracker ticket open before root causing it?

Sure. Thanks.

@tchaikov tchaikov merged commit 79c79b9 into ceph:master Nov 9, 2020
4 participants