Getting PKGBUILD file can fail due to pacman database lock #88

Closed
fergus-dall opened this issue Dec 30, 2020 · 2 comments
Comments

@fergus-dall
Contributor

When it is not run directly on a PKGBUILD, repro runs a short script inside an ephemeral container based on the downloaded chroot. This script runs pacman -S asp devtools --noconfirm to install the tools needed to download and validate the PKGBUILD file, and that step fails if pacman cannot take the lock on its database.
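
For concreteness, the step amounts to something like the following — a sketch only: the flags and paths are illustrative of the mechanism, not repro's exact invocation:

systemd-nspawn --ephemeral -q -D "$BUILDDIRECTORY"/root \
    pacman -S asp devtools --noconfirm
# Fails with pacman's "unable to lock database" error if the
# ephemeral snapshot contains a stale /var/lib/pacman/db.lck.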

Early in its run, repro also tries to update the chroot by running pacman -Syu --noconfirm inside it, which likewise takes the pacman database lock. To guard against two instances updating at once, repro locks the file root.lock before this step and skips the update if the lock is already held. The PKGBUILD step above, however, does not take this lock, so two instances of repro can end up running these two steps simultaneously.
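
The guard around the update step follows roughly this pattern — a sketch in terms of plain flock(1); repro uses its own lock helpers, and the fd number here is arbitrary:

exec 9>"$BUILDDIRECTORY"/root.lock      # open (and create) the lock file on fd 9
if flock --nonblock --exclusive 9; then
    systemd-nspawn -q -D "$BUILDDIRECTORY"/root pacman -Syu --noconfirm
    flock --unlock 9
fi  # lock already held: another instance is updating, so skip this step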

The fact that the PKGBUILD step runs in an ephemeral container prevents it from damaging the update step, but not vice versa: if the snapshot used by the ephemeral container is taken while an update is in progress, it will include pacman's lock file (var/lib/pacman/db.lck), and the PKGBUILD step will fail.

This can be seen by doing something like:

for i in {1..20}
do
    CACHEDIR=/path/to/cache repro ./acl-2.2.53-3-x86_64.pkg.tar.zst >output-${i} 2>&1 &
    sleep 1
done

Most of the spawned repro instances will fail due to this issue.

Taking a shared lock on root.lock for this step (lines 301-318 in repro.in) solves this issue, and will also prevent issues that may arise from operating in a chroot that is mid-upgrade. This should also be done for the other ephemeral containers used to set up the build chroot, though I haven't actually seen any issues from this yet, probably because chroot updates are relatively rare.
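
A sketch of that fix, again in terms of plain flock(1) (fd and invocation illustrative): the PKGBUILD step takes a shared lock on root.lock, so it waits only while an updater holds the exclusive lock, and concurrent PKGBUILD steps do not block each other:

exec 8<>"$BUILDDIRECTORY"/root.lock     # shared locks coexist with each other...
flock --shared 8                        # ...but block while an exclusive update runs
systemd-nspawn --ephemeral -q -D "$BUILDDIRECTORY"/root \
    pacman -S asp devtools --noconfirm  # then fetch and validate the PKGBUILD
flock --unlock 8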

Also related to this lock: when setting up the chroot for the first time, repro attempts to take the lock, but doesn't check whether the operation succeeds (line 215). As a result, multiple instances of repro can try to install the chroot at the same time. The correct order of operations should be:

if [ ! -d "$BUILDDIRECTORY"/root ]; then
    # block on the lock file
    nlock 9 "$BUILDDIRECTORY"/root.lock
    if [ -d "$BUILDDIRECTORY"/root ]; then
        # chroot was created by another instance while we were blocked,
        # so there is nothing left to do
        unlock 9
    else
        # set up chroot
        unlock 9    # release only once setup is complete
    fi
fi
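
For reference, the same double-checked pattern expressed with plain flock(1) — a sketch only: nlock/unlock above stand for repro's own helpers, and setup_chroot is a hypothetical placeholder for the elided setup code:

if [ ! -d "$BUILDDIRECTORY"/root ]; then
    exec 9>"$BUILDDIRECTORY"/root.lock  # open (and create) the lock file on fd 9
    flock --exclusive 9                 # block until we hold the lock
    if [ ! -d "$BUILDDIRECTORY"/root ]; then
        setup_chroot                    # hypothetical helper for the actual setup
    fi
    flock --unlock 9
fi

Re-checking for the directory after the lock is acquired is what makes this safe: another instance may have created the chroot while we were blocked.
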
@Foxboron
Member

Yes, this was working correctly before the reimplementation with ephemeral nspawn containers. Limited testing with multiple workers probably meant this was only discovered by accident.

I actually have local patches for this sitting in my tree; I just haven't had time to assemble a POC so I can verify the fix. There is also a locking issue for the keyring, so I need to rework the locking functions to support both exclusive and shared locks and make sure we take them in the right places.

Thanks for the detailed issue!

@Foxboron
Member

I believe the series of commits from March solving quite a few locking issues fixes this problem. I forgot to tag the issue :) Please reopen if that is not the case.
