-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libsolv %_dbpath fixes rollup #2553
Conversation
/retest |
Ohh, I think Prow may not do CI for draft PRs by default and so switching to draft cancelled those jobs? Let's see if the explicit test request works. |
OK, successfully failed here:
|
24f929d
to
d2f6e00
Compare
Looks like commit message needs adjusting too, otherwise LGTM! |
One thing to bear in mind is this will also break the use case of e.g. |
Anyways, LGTM - let's lift the draft and remove "DNM:" tag? |
I'm waiting on CI to make sure it's green with the new libsolv before dropping that commit and tweaking the commit message. Although... probably not a bad idea to just keep that override anyway. |
d2f6e00
to
7db243e
Compare
7db243e
to
afeef97
Compare
Yeah let's keep the override for a little bit at least I think; we have coverage of older libsolv via kola tests. |
Huh interesting, upgrade test failed with
This is the OK so the main CoreOS CI flow all uses the updated rpm-ostree compose side, where as this test will end up exercising "new rpm-ostree client side, old server side". |
OK coreos-assembler already has the new libsolv, so that was used at compose time. It'll be in the target ostree too. The upgrade test grabbed...ah yes, the testing-devel build with libsolv reverted. It was also built with an rpm-ostree before this patch, obviously. So our starting state is (old rpm-ostree, old libsolv) and then kola pulls an ostree commit with (new rpm-ostree, new libsolv). I...think what's happening is it's actually the old rpm-ostree failing to load pkglist from the new ostree commit. |
Aaand...trying to reproduce this locally it passes, of course. |
@jlebon OK if we restart CI on this? I want to see if this is somehow a flake (seems unlikely though). Or we could try to push more debugging bits to this PR. |
@@ -5,7 +5,8 @@ parallel rpms: { | |||
def n = 5 | |||
cosaPod(buildroot: true, runAsUser: 0, memory: "2Gi", cpu: "${n}") { | |||
checkout scm | |||
shwrap("""RPM_BUILD_NCPUS=${n} ./ci/coreosci-rpmbuild.sh | |||
// 2:1 job to CPU at most should keep us from getting kicked out | |||
shwrap("""RPM_BUILD_NCPUS=${n} CARGO_BUILD_JOBS=${n} ./ci/coreosci-rpmbuild.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm shouldn't cargo respect the make jobserver? rust-lang/cargo#4110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Eh we can look at this later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, seems like it would.
Ohhh, I think it might be the cmake call for libdnf which is not capped. I wonder if the issue is that the jobserver isn't surviving the make
-> cmake
-> make
chain. Briefly searching around, it's not clear to me whether cmake supports using a parent job server for the children make processes.
Next time we hit the libgomp thing, let's try something like this instead:
diff --git a/.cci.jenkinsfile b/.cci.jenkinsfile
index 5cd9e699..29d11897 100644
--- a/.cci.jenkinsfile
+++ b/.cci.jenkinsfile
@@ -6,7 +6,7 @@ parallel rpms: {
cosaPod(buildroot: true, runAsUser: 0, memory: "2Gi", cpu: "${n}") {
checkout scm
// 2:1 job to CPU at most should keep us from getting kicked out
- shwrap("""RPM_BUILD_NCPUS=${n} CARGO_BUILD_JOBS=${n} ./ci/coreosci-rpmbuild.sh
+ shwrap("""RPM_BUILD_NCPUS=${n} CMAKE_BUILD_PARALLEL_LEVEL=${n} ./ci/coreosci-rpmbuild.sh
mv packaging/*/*.rpm .
""")
// make it easy for anyone to download the RPMs
Hmm, or maybe that should be in the spec file instead since RPM_BUILD_NCPUS
really should be the only top-level knob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh, I think it might be the cmake call for libdnf which is not capped. I wonder if the issue is that the jobserver isn't surviving the make -> cmake -> make chain.
I think it should all be inherited by the environment by default? Unless the child make explicitly invokes make -jN
which AIUI resets things.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, jlebon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
OK next failure is
which must be related; I bet we were implicitly relying on |
Testing out locally
|
New changes are detected. LGTM label has been removed. |
Yep that works, pushed. |
We're definitely reaping the value of all that work on the test suite here! |
Yup thanks, makes sense to me! I think a lot of the work we're doing here is good prep for when we actually move the rpmdb to |
We need to make sure that we can work with newer libsolv, which changed how the rpmdb is found (see coreos#2548).
…hare/rpm We trigger a librpm macro file load in many of our paths. Since the default value shipped by rpm's macro file sets `_dbpath` to `/var/lib/rpm`, we have to explicitly set that back to `/usr/share/rpm` in those paths. This became more problematic recently with libsolv v0.7.17 which fully keys off of `_dbpath` to find the rpmdb path to load: openSUSE/libsolv@04d4d03 And it's not technically wrong; we really should make that macro not lie. This is what this patch does by injecting an RPM macro file in our composes which sets it to /usr/share/rpm. So then e.g. the `rpm` CLI doesn't actually need the `/var/lib/rpm` backcompat link anymore, though there's no harm in leaving it. In the future, we should be able to drop this once we move all of Fedora to `/usr/lib/sysimage/rpm` (see coreos/fedora-coreos-tracker#639). Closes: coreos#2548
We do fallback here.
We don't technically need this yet, but it mirrors how it's set up in our composes so that if there's code that wants to use the new location too, it'll just work.
Follow-up to previous commit: we had another path where we made a temporary rootfs and symlinked `/var/lib/rpm` to the base rpmdb. That of course broke now that we inject a macro to point the rpmdb to `/usr/share/rpm`. Rework this to use `/usr/share/rpm` since that's our canonical location for now, but also add the compat symlinks so that this logic should keep working even on trees without the injected macro yet.
We lost this at some point during the CI re-shuffle. We need to constrain cargo builds too to respect our CPU allocation. This doesn't totally keep all jobs under 5 since e.g. we could have 5 make jobs and 5 cargo codegen builds going at once, but I think as long as it's not something ridiculous like 40, it should be fine. Otherwise we'll tighten it more.
Now that we inject the `%_dbpath /usr/share/rpm` macro, `rpm -q` will start using it. But in RPM script invocation, we don't want them to see any RPM database at all - trying to query it should be a clean failure.
23a05d2
to
6e7107a
Compare
New changes are detected. LGTM label has been removed. |
Fixed the compose test case to look for |
will this be backported in https://src.fedoraproject.org/rpms/rpm-ostree ? |
Yes, either that or we do a new release. Probably makes the most sense to backport. |
We trigger a librpm macro file load in many of our paths. Since the
default value shipped by rpm's macro file sets
_dbpath
to/var/lib/rpm
, we have to explicitly set that back to/usr/share/rpm
in those paths.
This became more problematic recently with libsolv v0.7.17 which fully
keys off of
_dbpath
to find the rpmdb path to load:openSUSE/libsolv@04d4d03
And it's not technically wrong; we really should make that macro not
lie. This is what this patch does by injecting an RPM macro file in our
composes which sets it to /usr/lib/sysimage/rpm (which currently is a
symlink to /usr/share/rpm, but eventually will become the canonical
location). So then e.g. the
rpm
CLI doesn't actually need the/var/lib/rpm
backcompat link anymore, though there's no harm inleaving it.
In the future, we should be able to drop this once we move all of Fedora
to
/usr/lib/sysimage/rpm
(seecoreos/fedora-coreos-tracker#639).
Closes: #2548