Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config/functions: eliminate unecessary dashboard flock() #3447

Merged
merged 3 commits into from
Apr 25, 2019
Merged

config/functions: eliminate unecessary dashboard flock() #3447

merged 3 commits into from
Apr 25, 2019

Conversation

MilhouseVH
Copy link
Contributor

We only update the dashboard after updating the history, and since we already have an exclusive lock on the history there's no need to take a lock on the status (dashboard) file too, so we can dispense with the flock() on .status entirely.

(Hiding the whitespace changes makes the relevant changes much easier to see!)

@MilhouseVH
Copy link
Contributor Author

Added a second commit which removes support for non-multithreaded calls to update_dashboard() - nothing currently uses this functionality, and is unlikely to in future.

@MilhouseVH
Copy link
Contributor Author

MilhouseVH commented Apr 24, 2019

Third commit added: "remove owner when unlocking"

One of the Jenkins builds (Allwinner/A64 with UBOOT_SYSTEM=orangepi_win) has been failing repeatedly with the following:

rm: cannot remove '/var/lib/jenkins/LE/build2/workspace/orangepi_win/build.LibreELEC-A64.arm-9.1-devel/.threads/locks/distutilscross:host.build.failed': No such file or directory
FAILURE: scripts/install simplejson has failed!
[213/240] [FAIL] install simplejson

The following logs for this failure are available:
  stdout: /var/lib/jenkins/LE/build2/workspace/orangepi_win/build.LibreELEC-A64.arm-9.1-devel/.threads/logs/219/stdout
  stderr: /var/lib/jenkins/LE/build2/workspace/orangepi_win/build.LibreELEC-A64.arm-9.1-devel/.threads/logs/219/stderr

parallel: This job failed:
package_worker 8 219 240 'install simplejson'

Grepping the history for distutilscross:host we can see:

2019-04-24 13:09:56.342641875: <09699> [04/216] LOCKED  build   distutilscross:host
2019-04-24 13:10:01.575549377: <09699> [04/216] ACTIVE  build   distutilscross:host
2019-04-24 13:10:05.529836257: <09699> [04/216] UNLOCK  build   distutilscross:host                 (built)
2019-04-24 13:10:47.120378959: <13125> [05/218] LOCKED  build   distutilscross:host
2019-04-24 13:10:47.450057692: <13125> [05/218] UNLOCK  build   distutilscross:host                 (already built)
2019-04-24 13:10:54.627084365: <17956> [08/219] LOCKED  build   distutilscross:host
2019-04-24 13:10:54.837867158: <17956> [08/219] UNLOCK  build   distutilscross:host                 (already built)
2019-04-24 13:11:15.299462253: <31153> [08/219] LOCKED  install distutilscross:host
2019-04-24 13:11:15.501959560: <31342> [01/220] LOCKED  build   distutilscross:host
2019-04-24 13:11:15.613853444: <31482> [08/219] LOCKED  build   distutilscross:host
2019-04-24 13:11:15.722740910: <31342> [01/220] UNLOCK  build   distutilscross:host                 (already built)
2019-04-24 13:11:15.933875098: <31482> [08/219] UNLOCK  build   distutilscross:host                 (already built)

What is happening is that seq 01/220 acquires the exclusive build lock, but before it is able to set itself as the new owner seq 08/219 attempts to also gain the exclusive lock, which fails. However because seq 08/219 is still recorded as the current owner (from when it previously held the lock) the build system permits seq 08/219 to continue processing - so now we have two processes building distutilscross:host concurrently.

Since the package has already been built this isn't really an issue. However once the .failed marker file is removed by seq 01/220 (when it "unlocks" the package), there is no .failed file for seq 08/219 to remove when it subsequently unlocks the same package, so seq 08/219 - which is actually the simplejson process - fails with the error at the top of this comment.

Removing the .owner file when unlocking the package avoids this racy behaviour.

@MilhouseVH
Copy link
Contributor Author

Removing the .owner file revealed another race situation, whereby we could spin indefinitely trying to read a non-existent .owner file if the .owner file is removed in the period between failing to acquire the flock() and reading the .owner file with read -r.

The updated commit now replaces sleep 1 with flock --wait 1 which fixes the new race, and is also a small performance improvement. It also means we can refactor the locking code to always read the .owner file as this should not exist unless the package is already locked, and if it doesn't exist we will attempt to acquire the lock until we succeed or the .owner file appears.

@jernejsk jernejsk merged commit 45899b4 into LibreELEC:master Apr 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants