issues with sysroot.bootprefix: true #1667

Open
dustymabe opened this issue Feb 7, 2024 · 9 comments

@dustymabe
Member

dustymabe commented Feb 7, 2024

I think we are seeing a few problems since we implemented setting sysroot.bootprefix: true in coreos/coreos-assembler@f5677a3

EDIT: this was unrelated to the `sysroot.bootprefix` change. See coreos/coreos-assembler#3728

The first problem is that in the OSBuild workflow (i.e. on rawhide right now) about half the time we end up with artifacts that fail to boot:

  Booting `Fedora CoreOS 40.20240207.dev.1 (ostree:0)'

error: ../../grub-core/fs/fshelp.c:257:file `/boot/ostree/fedora-coreos-f25027bf4dfb22137b7c8401ed2396924ba112482afc30d62b7a4ab1310b09db/vmlinuz-6.8.0-0.rc0.20240112git70d201a40823.5.fc40.x86_64' not found.
error: ../../grub-core/loader/i386/pc/linux.c:422:you need to load the kernel first.
 
Press any key to continue...

but it doesn't appear to be every time, which is suspect. One thing about OSBuild versus non-OSBuild is that we are using bootupd to do the bootloader install there.

The other issue we are hitting in the non-OSBuild workflow is that coreos-installer on s390x appears to not be able to handle this change either (pipeline run link) (at least I suspect the failure is related to the sysroot.bootprefix change).

 Read disk 2.2 GiB/2.2 GiB (100%)
 Read disk 2.2 GiB/2.2 GiB (100%)
 Writing Ignition config
 Copying networking configuration from /etc/NetworkManager/system-connections/
 Copying /etc/NetworkManager/system-connections/coreos-dhcp.nmconnection to installed system
 Copying /etc/NetworkManager/system-connections/br-ex.nmconnection to installed system
 Error: Could not add image file '/boot/ostree/fedora-coreos-36affd073876e61712d708a2e6801d958a88beae5a44525cce7fdb77e8e1cabc/vmlinuz-6.6.14-200.fc39.s390x': Could not get disk geometry
 Using config file '/tmp/coreos-installer-zipl.znTNvH' (from command line)
 Using BLS config file '/tmp/coreos-installer-zipl-bls-Zix4Hk/loader/entries/ostree-1-fedora-coreos.conf'
 Building bootmap in '/tmp/coreos-installer-B6PWED'
 Building menu 'zipl-automatic-menu'
 Adding #1: IPL section 'Fedora CoreOS 39.20240205.20.2 (ostree:0)' (default)
 Error: Command {
     program: "zipl",
     args: [
         "zipl",
         "--blsdir",
         "/tmp/coreos-installer-zipl-bls-Zix4Hk/loader/entries",
         "--config",
         "/tmp/coreos-installer-zipl.znTNvH",
     ],
     create_pidfd: false,
 } failed with exit status: 1
 Resetting partition table
@dustymabe
Member Author

revert PR for now: coreos/coreos-assembler#3723

dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 7, 2024
This reverts commit f5677a3.

We're seeing some related problems. Tracked in
coreos/fedora-coreos-tracker#1667
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 8, 2024
This reverts commit 2a8d1e6.

After some further testing this might be what is causing the GRUB
failure mentioned in coreos/fedora-coreos-tracker#1667.
Let's revert to see if the rawhide pipeline failures clear up.
@dustymabe
Member Author

The s390x coreos-installer failure has cleared up now that we have done coreos/coreos-assembler#3723

The GRUB failure did not clear up, however. I think it's related to coreos/coreos-assembler@2a8d1e6. Revert in coreos/coreos-assembler#3725

dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 8, 2024
This reverts commit 2a8d1e6.

After some further testing this might be what is causing the GRUB
failure mentioned in coreos/fedora-coreos-tracker#1667.
Let's revert to see if the rawhide pipeline failures clear up.
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 10, 2024
Upstream PR: osbuild/osbuild#1574

Right now we set compression to `true` (the default) because I'm
still investigating if turning off compression is somehow causing
our artifacts to not boot: coreos/fedora-coreos-tracker#1667 (comment)
dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 12, 2024
Upstream PR: osbuild/osbuild#1574

Right now we set compression to `true` (the default) because I'm
still investigating if turning off compression is somehow causing
our artifacts to not boot: coreos/fedora-coreos-tracker#1667 (comment)
@dustymabe
Member Author

I split out the GRUB failure into a separate issue in coreos/coreos-assembler#3728

dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 14, 2024
And drop all patches that have now been upstreamed. The only remaining
patches are one to enable s390x builds to work while we figure out [1]
and another that adds a log statement when cache eviction happens, which
I plan to upstream at some point.

[1] coreos/fedora-coreos-tracker#1667
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 15, 2024
And drop all patches that have now been upstreamed. The only remaining
patches are one to enable s390x builds to work while we figure out [1]
and another that adds a log statement when cache eviction happens, which
I plan to upstream at some point.

[1] coreos/fedora-coreos-tracker#1667
@nikita-dubrovskii

nikita-dubrovskii commented Feb 16, 2024

I've checked the issue on s390x. Because the linux and initrd entries now contain the full path with /boot/ostree/..., this happens:

  • /boot is a squashfs filesystem
  • /tmp/coreos-installer-zipl-bls-xyz is an ext4 filesystem
  • both contain an ostree/fedora-coreos-xyz123/ folder with vmlinuz-6.8.0.fc40 and initrd-6.8.0.fc40
  • zipl parses bls.conf and grabs the kernel and initrd, which it now finds first on the squashfs
  • zipl fails with "Could not get disk geometry"

We could probably change the installer to do one of the following:

  • modify bls.conf to drop /boot/ before running zipl
  • parse bls.conf and use the zipl -t -i -r syntax. As I remember, we had that before but switched to --blsdir mode

Or we don't use bootprefix: true.

(By the way, during installation we already copy and modify bls.conf, so sed -i 's/\/boot//' could easily be performed.)
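
A minimal sketch of the first option (dropping the /boot prefix from the BLS entry before zipl runs), in Python for illustration only. coreos-installer itself is written in Rust, and `strip_boot_prefix` is a hypothetical helper, not an existing function. Unlike a blanket sed, it only touches the linux and initrd keys, so a /boot that happens to appear in the options line is left alone.

```python
# Hypothetical helper for illustration; not part of coreos-installer.
PATH_KEYS = ("linux", "initrd")


def strip_boot_prefix(bls_text, prefix="/boot"):
    """Drop a leading `prefix` from the linux/initrd paths of a BLS entry."""
    out = []
    for line in bls_text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2 and parts[0] in PATH_KEYS:
            key, value = parts
            # An initrd line may carry several space-separated paths.
            paths = [
                p[len(prefix):] if p.startswith(prefix + "/") else p
                for p in value.split()
            ]
            line = key + " " + " ".join(paths)
        out.append(line)
    return "\n".join(out) + "\n"
```

Entries that were written without bootprefix pass through unchanged, so the same code path could handle both kinds of image.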

/cc @jlebon

dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 16, 2024
And drop all patches that have now been upstreamed. The only remaining
patches are one to enable s390x builds to work while we figure out [1]
and another that adds a log statement when cache eviction happens, which
I plan to upstream at some point.

[1] coreos/fedora-coreos-tracker#1667
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-installer that referenced this issue Feb 19, 2024
There is no need to copy the `boot/loader/entries` folder or to create a
temporary `zipl.conf`; instead we can parse the BLS config and tell
`zipl` which image and disk to use. This also fixes an issue when
images come with `bootprefix`.

Issue: coreos/fedora-coreos-tracker#1667
nikita-dubrovskii added a commit to nikita-dubrovskii/coreos-installer that referenced this issue Feb 21, 2024
There is no need to copy the `boot/loader/entries` folder or to create a
temporary `zipl.conf`; instead we can parse the BLS config and tell
`zipl` which image and disk to use. This also fixes an issue when
images come with `bootprefix`.

Issue: coreos/fedora-coreos-tracker#1667
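
The approach in the commit message above (parse the BLS config and pass the image to zipl explicitly) could look roughly like the sketch below. It is an illustration in Python, not the actual coreos-installer change, which is written in Rust and may differ in every detail. `parse_bls_entry`, `zipl_args`, `boot_root` and `target` are hypothetical names; `-t`, `-i` and `-r` are the zipl options mentioned earlier in this thread, and `-P` is assumed here for passing kernel parameters.

```python
from pathlib import Path


def parse_bls_entry(text):
    """Parse 'key value' lines of a BLS entry, skipping blanks and comments."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        entry[parts[0]] = parts[1] if len(parts) == 2 else ""
    return entry


def zipl_args(entry, boot_root, target, prefix="/boot"):
    """Build a zipl argv that names the kernel and initrd explicitly.

    Paths are resolved against `boot_root` (the mounted boot partition of
    the installed system), with or without a bootprefix in the entry.
    """
    def resolve(path):
        if path.startswith(prefix + "/"):
            path = path[len(prefix):]
        return str(Path(boot_root) / path.lstrip("/"))

    args = ["zipl", "-t", target, "-i", resolve(entry["linux"])]
    if "initrd" in entry:
        args += ["-r", resolve(entry["initrd"])]
    if "options" in entry:
        args += ["-P", entry["options"]]  # assumed flag for kernel parameters
    return args
```

Because the kernel and initrd are given as absolute paths on the target disk, zipl never has a chance to pick up the same relative path from the live squashfs /boot.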
@nikita-dubrovskii

> I split out the GRUB failure into a separate issue in coreos/coreos-assembler#3728

Should we close this issue now?

@dustymabe
Member Author

dustymabe commented Feb 27, 2024

Yes. This was fixed by coreos/coreos-installer#1422, which landed in rawhide via coreos/fedora-coreos-config@d517266, and we switched over to setting sysroot.bootprefix: true for OSBuild-built images in coreos/coreos-assembler@a7d4312

@dustymabe
Member Author

OK, I found a new issue with the length of the kernel file paths, which bootprefix seems to make worse :(

#1647 (comment)

I'll revert the change in COSA for now.

dustymabe reopened this Feb 27, 2024
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Feb 27, 2024
Workaround an issue we are seeing where apparently the length
of the entry in the BLS config is causing systems to not boot.

- coreos/fedora-coreos-tracker#1667 (comment)
- coreos/fedora-coreos-tracker#1647 (comment)
@dustymabe
Member Author

> I'll revert the change in COSA for now.

OK. Rather than doing a full revert of coreos/coreos-assembler@a7d4312, I added a commit to coreos/coreos-assembler#3746 that skips setting sysroot.bootprefix: true on aarch64 only, while we investigate #1647 further.

dustymabe added a commit to coreos/coreos-assembler that referenced this issue Feb 28, 2024
Workaround an issue we are seeing where apparently the length
of the entry in the BLS config is causing systems to not boot.

- coreos/fedora-coreos-tracker#1667 (comment)
- coreos/fedora-coreos-tracker#1647 (comment)
dustymabe added a commit to dustymabe/coreos-assembler that referenced this issue Apr 8, 2024
We were blocked by coreos/fedora-coreos-tracker#1647
but the 6.9 kernel series doesn't appear to have the same problem. The 6.8
kernel does have the problem but our kernel filenames in stable releases
of Fedora (i.e. not `rawhide`) won't be long enough to trigger the bug
so we should be able to safely remove this since `rawhide` has moved on
to 6.9 rc kernels.

This should be the final piece to close out
coreos/fedora-coreos-tracker#1667
@dustymabe
Member Author

PR to tie off the remaining loose end here:

dustymabe added a commit to coreos/coreos-assembler that referenced this issue Apr 10, 2024
We were blocked by coreos/fedora-coreos-tracker#1647
but the 6.9 kernel series doesn't appear to have the same problem. The 6.8
kernel does have the problem but our kernel filenames in stable releases
of Fedora (i.e. not `rawhide`) won't be long enough to trigger the bug
so we should be able to safely remove this since `rawhide` has moved on
to 6.9 rc kernels.

This should be the final piece to close out
coreos/fedora-coreos-tracker#1667