Some AWS instance types fail to get networking starting with 35.20211226.20.0 #1066

Closed
dustymabe opened this issue Jan 11, 2022 · 43 comments
Labels: jira (for syncing to jira), kind/bug

@dustymabe
Member

Describe the bug
It seems as if networking never comes up enough to reach the metadata service:

[    7.950274] ignition[559]: INFO     : PUT http://169.254.169.254/latest/api/token: attempt #1
[    7.956165] ignition[559]: INFO     : PUT error: Put "http://169.254.169.254/latest/api/token": dial tcp 169.254.169.254:80: connect: network is unreachable

Full log: i-0ab8a90bd2a4814db.log

You can reproduce easily with something like:

cosa buildfetch --stream=testing-devel --build=35.20220107.20.0 --force

cosa kola run --build=35.20220107.20.0 -p=aws --aws-credentials-file=/srv/creds --aws-region=us-east-1 basic --aws-type=m4.large

The instance never boots. Checking out the system log shows the infinite Ignition attempts.
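To tell this failure apart from other boot hangs, one option is to count the metadata-service errors in the console log. The following is a minimal sketch; the function name `count_imds_failures` is made up for illustration, and the pattern is based only on the two log lines quoted above.

```sh
# Count Ignition's failed attempts to reach the EC2 metadata service in a
# console log read from stdin. Prints the count (0 if there are none).
count_imds_failures() {
    # grep -c exits non-zero when nothing matches, so mask that status.
    grep -c 'ignition\[[0-9]*]: .*169\.254\.169\.254.*network is unreachable' || true
}
```

Assuming the AWS CLI is configured, something like `aws ec2 get-console-output --instance-id <instance-id> --output text | count_imds_failures` could feed it the system log of a stuck instance.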

@dustymabe
Member Author

dustymabe commented Jan 11, 2022

It's also notable that the cosa kola run command never seems to time out so there is some gap in our timeout logic here. The Jenkins runs are timing out at the Jenkins level and we're leaking instances (never cleaning them up).
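As a stopgap for the missing timeout, the test command could be wrapped in a hard deadline. This is only a sketch: `run_with_deadline` is a hypothetical helper, it relies on coreutils `timeout`, and it does not clean up any instance that the interrupted run leaves behind.

```sh
# Run a command with a hard deadline. Relies on coreutils `timeout`, which
# exits with status 124 when the deadline is hit.
run_with_deadline() {
    deadline=$1
    shift
    status=0
    timeout "$deadline" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after $deadline: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_deadline 30m cosa kola run --build=35.20220107.20.0 -p=aws ... basic --aws-type=m4.large` would turn the hang into a failure with exit status 124; the 30-minute value is an arbitrary placeholder.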

Because this showed up as a "Jenkins timeout" rather than a hard failure, we assumed Jenkins was just having trouble and ignored the behavior. Combine that with the holidays, when we aren't paying as much attention, and we get this perfect storm.

dustymabe added the jira (for syncing to jira) label Jan 11, 2022
@mike-nguyen
Member

mike-nguyen commented Jan 11, 2022

I performed a bisect to find when the m4.large instances started failing. It seems that 35.20211226.20.0 is when the m4.large instances stopped working. Prior builds are working fine.
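The bisect itself can be sketched as a binary search over an ordered list of build IDs. The helper below is an illustration, not the procedure actually used: `first_bad_build` is a made-up name, it assumes the last build in the list is bad, and it assumes that every build after the first bad one is also bad.

```sh
# first_bad_build IS_GOOD BUILD...
# IS_GOOD is the name of a command or function that returns 0 when the given
# build boots. Builds must be listed oldest first, and the last one is
# assumed to be bad. Prints the first bad build.
first_bad_build() {
    is_good=$1
    shift
    lo=1
    hi=$#    # invariant: the first bad build sits at an index in [lo, hi]
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "build=\${$mid}"
        if "$is_good" "$build"; then
            lo=$((mid + 1))
        else
            hi=$mid
        fi
    done
    eval "build=\${$lo}"
    printf '%s\n' "$build"
}
```

Because the failure later turned out to be intermittent, a real `IS_GOOD` check would need to boot each build several times before calling it good.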

The change in 35.20211226.20.0 was a kernel bump for CVE fixes.

Upgraded:

    kernel 5.15.10-200.fc35.x86_64 → 5.15.11-200.fc35.x86_64
    kernel-core 5.15.10-200.fc35.x86_64 → 5.15.11-200.fc35.x86_64
    kernel-modules 5.15.10-200.fc35.x86_64 → 5.15.11-200.fc35.x86_64
    libzstd 1.5.0-2.fc35.x86_64 → 1.5.1-1.fc35.x86_64 

SecAdvisories:

    FEDORA-2021-4f1a2cdf2e (moderate severity)
        Packages:
            kernel-5.15.11-200.fc35.x86_64
            kernel-core-5.15.11-200.fc35.x86_64
            kernel-modules-5.15.11-200.fc35.x86_64 
        CVEs:
            CVE-2021-28714 CVE-2021-28715 xen: guest can force Linux netback driver to hog large amounts of kernel memory
            CVE-2021-28711 CVE-2021-28712 CVE-2021-28713 xen: rogue backends can cause DoS of guests via high frequency events

@mike-nguyen
Member

mike-nguyen commented Jan 11, 2022

I took the latest testing-devel and pinned the kernel to 5.15.10-200.fc35.x86_64 and booting m4.large instances works.
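The pin above was done in the build configuration. On an already running host, a different route to the same kernel is to override the packages with RPMs from Koji. The sketch below only builds the download URLs; `koji_kernel_urls` is a hypothetical helper and the URL layout is an assumption about how kojipkgs.fedoraproject.org organizes packages.

```sh
# Print Koji download URLs for the three kernel packages that changed in the
# bad build. The URL layout is assumed, not taken from this thread.
koji_kernel_urls() {
    version=$1
    release=$2
    arch=$3
    base="https://kojipkgs.fedoraproject.org/packages/kernel/$version/$release/$arch"
    for pkg in kernel kernel-core kernel-modules; do
        printf '%s/%s-%s-%s.%s.rpm\n' "$base" "$pkg" "$version" "$release" "$arch"
    done
}
```

The output could then be passed to `sudo rpm-ostree override replace $(koji_kernel_urls 5.15.10 200.fc35 x86_64)`, followed by a reboot.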

@mike-nguyen
Member

I ran a few more tests:

  • Started with 35.20211224.20.0 and ran rpm-ostree rebase --bypass-driver fedora-compose: to update. I could no longer connect through SSH afterwards. The console shows the host is up but networking doesn't seem to be working.
  • Used the latest Fedora Cloud (ami-08b4ee602f76bff79) and it booted fine. Updated with dnf update -y, rebooted, and it was fine. I noticed only kernel-core is installed for Fedora Cloud.
  • Used the latest Fedora Cloud (ami-08b4ee602f76bff79), installed the latest kernel, kernel-core, and kernel-modules (5.15.13-200.fc35.x86_64), and rebooted. Everything worked fine.

dustymabe changed the title from "AWS m4.large instances fail to boot starting with 35.20220106.20.0" to "AWS m4.large instances fail to boot starting with 35.20211226.20.0" on Jan 11, 2022
@dustymabe
Member Author

One thing to note here is that @mike-nguyen is reporting that 35.20211224.20.0 is the last good testing-devel, but we have testing (35.20220103.2.0) and next (35.20220103.1.0) releases that went out last week that both passed tests on m4.large instance types. I just spawned one successfully with:

cosa kola spawn -b fcos --stream testing -p aws --aws-credentials-file /srv/creds --aws-type=m4.large

So there's at least something to investigate there.

@mike-nguyen
Member

mike-nguyen commented Jan 11, 2022

This was still hung after 20 minutes (m4.large):

[coreos-assembler]$ kola spawn -b fcos --stream testing -p aws --aws-credentials-file /srv/.aws/credentials --aws-type=m4.large
Resolved distro=fcos stream=testing platform=aws arch=x86_64 to release=35.20220103.2.0 (region us-east-1, ami-0b50205d463aa46d9)
^C

I ran an m5.large instance with the same stream and it booted up right away. Seems like testing could also be broken.

[coreos-assembler]$ kola spawn -b fcos --stream testing -p aws --aws-credentials-file /srv/.aws/credentials --aws-type=m5.large
Resolved distro=fcos stream=testing platform=aws arch=x86_64 to release=35.20220103.2.0 (region us-east-1, ami-0b50205d463aa46d9)
Fedora CoreOS 35.20220103.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[bound] -bash-5.1$ exit
logout

@dustymabe
Member Author

Weird. Here's my terminal output from the run:

[dustymabe@media fcos]$ cosa kola spawn -b fcos --stream testing -p aws --aws-credentials-file /srv/creds --aws-type=m4.large
...
<snip>
...
kola --output-dir tmp/kola spawn -b fcos --stream testing -p aws --aws-credentials-file /srv/creds --aws-type=m4.large
Resolved distro=fcos stream=testing platform=aws arch=x86_64 to release=35.20220103.2.0 (region us-east-1, ami-0b50205d463aa46d9)
Fedora CoreOS 35.20220103.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[bound] -bash-5.1$ 
[bound] -bash-5.1$ 
[bound] -bash-5.1$ 
[bound] -bash-5.1$ rpm-ostree status 
State: idle
Deployments:
* fedora:fedora/x86_64/coreos/testing
                   Version: 35.20220103.2.0 (2022-01-04T23:47:56Z)
                    Commit: dfcd79a42dfd10d9f60c8dd8bd63c6c5f976af64738ce42720b0120d1d20b26b
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
[bound] -bash-5.1$

@dustymabe
Member Author

And running it again failed. So yeah. Looks like we just got lucky when we ran basic tests on the single m4.large instance for testing and next streams.

@dustymabe
Member Author

Some more information. The difference between Fedora Cloud Base and
FCOS seems to be in the NIC driver:

Fedora 35 Cloud Base - ami-08b4ee602f76bff79 - (from release day)

$ rpm -qa | grep kernel
kernel-core-5.14.10-300.fc35.x86_64

$ ethtool -i eth0
driver: vif
version: 5.14.10-300.fc35.x86_64
firmware-version:
expansion-rom-version:
bus-info: vif-0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

After full update (sudo dnf update -y && sudo reboot):

$ rpm -qa | grep kernel
kernel-core-5.14.10-300.fc35.x86_64
kernel-core-5.15.13-200.fc35.x86_64

$ ethtool -i eth0
driver: vif
version: 5.15.13-200.fc35.x86_64
firmware-version:
expansion-rom-version:
bus-info: vif-0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

On latest FCOS stable 35.20211215.3.0:

$ ethtool -i ens3
driver: ixgbevf
version: 5.15.7-200.fc35.x86_64
firmware-version: 
expansion-rom-version: 
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

In Fedora Cloud Base it's using the xen_netfront module which is aliased to vif.
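When comparing many instances, the driver can be pulled out of the `ethtool -i` output automatically. A small sketch follows; the helper names `nic_driver` and `nic_kind` are invented, and only the two drivers seen in the outputs above are classified.

```sh
# Read `ethtool -i <iface>` output on stdin and print the driver name.
nic_driver() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}

# Classify the two drivers that show up in this thread.
nic_kind() {
    case "$(nic_driver)" in
        vif) echo "xen_netfront (paravirtual)" ;;
        ixgbevf) echo "SR-IOV (enhanced networking)" ;;
        *) echo "other" ;;
    esac
}
```

On a host this would be used as `ethtool -i ens3 | nic_kind`.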

Indeed in FCOS we have enhanced networking enabled so that's why we're using the ixgbevf module.

[dustymabe@media ~]$ aws ec2 describe-images --image-id ami-0b50205d463aa46d9 --query "Images[].SriovNetSupport" # testing 35.20220103.2.0
simple
[dustymabe@media ~]$ aws ec2 describe-images --image-id ami-007fb1d49ba422a31 --query "Images[].SriovNetSupport" # stable 35.20211215.3.0
simple
[dustymabe@media ~]$ aws ec2 describe-images --image-id ami-08b4ee602f76bff79 --query "Images[].SriovNetSupport" # 35 cloud base
[dustymabe@media ~]$

So it must be some issue with the driver in the newer kernel.
Unfortunately, while you can enable the enhanced networking for an
instance,

aws ec2 modify-instance-attribute --instance-id instance_id --sriov-net-support simple

you can't disable it.

@dustymabe
Member Author

dustymabe commented Jan 12, 2022

Until we get a fix we need to investigate if we can properly denylist the ixgbevf module and successfully have it use the vif(xen_netfront) module even if enhanced networking is enabled for an instance.

If we can't we'll probably need to revert to the older kernel in testing/next.
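One way to try the denylist idea is through kernel arguments. The sketch below just prints the arguments; `denylist_kargs` is a hypothetical helper. Whether an instance with enhanced networking enabled then falls back to xen_netfront is not confirmed anywhere in this thread.

```sh
# Print kernel arguments that keep the given modules from loading, both in
# the initramfs (rd.driver.blacklist, handled by dracut) and in the real
# root (modprobe.blacklist).
denylist_kargs() {
    for module in "$@"; do
        printf 'modprobe.blacklist=%s rd.driver.blacklist=%s\n' "$module" "$module"
    done
}
```

On a running host each argument could be added with `rpm-ostree kargs --append=<arg>`, but since Ignition fails on first boot the arguments would have to be baked into the image to help there.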

@mike-nguyen
Member

I got similar results overnight. I ran the basic test with 35.20220107.20.0 (version in the reproducer) 20 times last night and it passed 1 time. With fedora-coreos-config commit 11ec3b601acec7e8a8d1e2fd730c14aff07ae24d (latest testing-devel) with the kernel pinned at 5.15.10-200.fc35.x86_64 it passed 20 times.
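A repeated run like this can be scripted with a small loop that reports how many attempts passed. This is a sketch, with `pass_rate` being a made-up name; it treats exit status 0 as a pass and anything else as a failure.

```sh
# pass_rate RUNS CMD [ARG...]
# Run CMD RUNS times and print "passed/total".
pass_rate() {
    runs=$1
    shift
    passed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            passed=$((passed + 1))
        fi
        i=$((i + 1))
    done
    printf '%s/%s\n' "$passed" "$runs"
}
```

For example, `pass_rate 20 cosa kola run --build=35.20220107.20.0 -p=aws --aws-credentials-file=/srv/creds --aws-region=us-east-1 basic --aws-type=m4.large` repeats the reproducer from the top of this issue. Since that command was seen to hang, each run probably needs its own timeout as well.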

@mike-nguyen
Member

mike-nguyen commented Jan 12, 2022

For completeness, I enabled enhanced networking on Fedora Cloud Base 35 from release day (ami-08b4ee602f76bff79) and I was not able to SSH to the VM afterwards.

Here are the steps:

  • Provisioned m4.large instances of ami-08b4ee602f76bff79 (from release day) and updated with dnf -y update
  • Verified login after rebooting system
  • Verified the system had the ixgbevf module by running modinfo ixgbevf
  • Shut down the VM
  • Enabled enhanced networking with aws ec2 modify-instance-attribute --instance-id <instance-id> --sriov-net-support simple
  • Started the VM
  • Verified the host was up by looking at the system screenshot through the AWS web console
  • Verified inability to SSH to the VM (note: the IP address changed after enabling enhanced networking)
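The AWS side of the steps above can be sketched as one function. `enable_enhanced_networking` is a hypothetical name, and the `AWS` variable is an assumption added here so the sequence can be dry-run with `AWS=echo`; the `aws ec2` subcommands are the standard CLI ones.

```sh
# Stop an instance, turn on SR-IOV (enhanced networking), and start it again.
# Set AWS=echo to print the commands instead of running them.
enable_enhanced_networking() {
    instance_id=$1
    aws_cmd=${AWS:-aws}
    "$aws_cmd" ec2 stop-instances --instance-ids "$instance_id" &&
    "$aws_cmd" ec2 wait instance-stopped --instance-ids "$instance_id" &&
    "$aws_cmd" ec2 modify-instance-attribute --instance-id "$instance_id" --sriov-net-support simple &&
    "$aws_cmd" ec2 start-instances --instance-ids "$instance_id"
}
```

As noted earlier in the thread, the attribute cannot be disabled again afterwards, so this is best tried on a throwaway instance.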

@dustymabe
Member Author

I finally managed to get into a testing (35.20220103.2.0) machine (as @mike-nguyen mentioned above the success rate is ~5%):

Fedora CoreOS 35.20220103.2.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[bound] -bash-5.1$ 
[bound] -bash-5.1$ rpm -q kernel
kernel-5.15.12-200.fc35.x86_64
[bound] -bash-5.1$ 
[bound] -bash-5.1$ ip -4 -o a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
2: ens3    inet 172.31.24.155/20 brd 172.31.31.255 scope global dynamic noprefixroute ens3\       valid_lft 3153sec preferred_lft 3153sec
[bound] -bash-5.1$ ethtool -i ens3
driver: ixgbevf
version: 5.15.12-200.fc35.x86_64
firmware-version: 
expansion-rom-version: 
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: yes

In case it ends up mattering this particular instance is in us-east-1b.

@dustymabe
Member Author

After a bunch more tries I got another instance up in us-east-1a, so the availability zone doesn't seem to be the contributing factor.

@davdunc
Contributor

davdunc commented Jan 12, 2022

AWS Internal Review Reference: tt:V502410588

@dustymabe
Member Author

> Until we get a fix we need to investigate if we can properly denylist the ixgbevf module and successfully have it use the vif(xen_netfront) module even if enhanced networking is enabled for an instance.
>
> If we can't we'll probably need to revert to the older kernel in testing/next.

For now we'll revert the kernel. Options:

Looking at CVE info (details below) there were no kernel CVEs fixed between kernel-5.15.7-200.fc35 and kernel-5.15.10-200.fc35 so let's stick with the more conservative approach of kernel-5.15.7-200.fc35.

'35.20220111.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^ (7c0c009111c60e35ab551b2d56dacdc7b516f4ec37fed8c9280fa82965c1956e)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel (1887b30792f3e884015420fe4fdeff257b05e43cae0f4d51756bd876f06b5254)
Upgraded:
  kernel 5.15.12-200.fc35 -> 5.15.13-200.fc35
  kernel-core 5.15.12-200.fc35 -> 5.15.13-200.fc35
  kernel-modules 5.15.12-200.fc35 -> 5.15.13-200.fc35
  libuv 1:1.42.0-2.fc35 -> 1:1.43.0-2.fc35

'35.20220110.20.1'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^ (735ed358cc020931b0931c7a7e651bed14d4a4c00bfaa890b9d80a2c7681af72)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^ (7c0c009111c60e35ab551b2d56dacdc7b516f4ec37fed8c9280fa82965c1956e)
Upgraded:
  git-core 2.33.1-2.fc35 -> 2.34.1-1.fc35
  ostree 2021.6-3.fc35 -> 2022.1-1.fc35
  ostree-libs 2021.6-3.fc35 -> 2022.1-1.fc35
  rpm-ostree 2021.14-2.fc35 -> 2022.1-2.fc35
  rpm-ostree-libs 2021.14-2.fc35 -> 2022.1-2.fc35
  selinux-policy 35.7-1.fc35 -> 35.8-1.fc35
  selinux-policy-targeted 35.7-1.fc35 -> 35.8-1.fc35

'35.20220107.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^ (71cb6b657b22bcab45a5a76211215caebc6c9a557a287c359a6bc23182844483)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^ (735ed358cc020931b0931c7a7e651bed14d4a4c00bfaa890b9d80a2c7681af72)
Upgraded:
  vim-data 2:8.2.3755-1.fc35 -> 2:8.2.4006-1.fc35
  vim-minimal 2:8.2.3755-1.fc35 -> 2:8.2.4006-1.fc35

SecAdvisories:
    FEDORA-2022-a3d70b50f0  Low        vim-data-2:8.2.4006-1.fc35.noarch
    FEDORA-2022-a3d70b50f0  Low        vim-minimal-2:8.2.4006-1.fc35.x86_64
      CVE-2021-4136 vim: heap-based buffer overflow in eval_lambda() in src/eval.c
      https://bugzilla.redhat.com/show_bug.cgi?id=2034720
      CVE-2021-4166 vim: out-of-bounds read in do_arg_all() in src/arglist.c
      https://bugzilla.redhat.com/show_bug.cgi?id=2035928
      CVE-2021-4173 vim: use-after-free with nested :def function
      https://bugzilla.redhat.com/show_bug.cgi?id=2035930
      CVE-2021-4187 vim: use-after-free vulnerability
      https://bugzilla.redhat.com/show_bug.cgi?id=2036129
'35.20220106.20.1'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^ (6d8b724642228e09340c67cab2ed160f29d5fda5dc8fdc2d34b8fe85dc3f20a3)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^ (71cb6b657b22bcab45a5a76211215caebc6c9a557a287c359a6bc23182844483)

'35.20220106.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^ (6294f72a791a0b7b1c0b832e02cc123cb31008965bc41fad30e4d7eaddf17f2c)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^ (6d8b724642228e09340c67cab2ed160f29d5fda5dc8fdc2d34b8fe85dc3f20a3)
Upgraded:
  libcap-ng 0.8.2-6.fc35 -> 0.8.2-8.fc35
  libzstd 1.5.1-3.fc35 -> 1.5.1-4.fc35

'35.20220103.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^ (4315b7332222716c73cf7b0112f4bf243ac0d659eccb5d85c7d698221bd4e704)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^ (6294f72a791a0b7b1c0b832e02cc123cb31008965bc41fad30e4d7eaddf17f2c)
Upgraded:
  coreos-installer 0.12.0-1.fc35 -> 0.12.0-2.fc35
  coreos-installer-bootinfra 0.12.0-1.fc35 -> 0.12.0-2.fc35

'35.20220101.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^ (7f3f4420e736af69e39d76ebfa67292145d13b7c25b88c4c1f63248bcf94832d)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^ (4315b7332222716c73cf7b0112f4bf243ac0d659eccb5d85c7d698221bd4e704)
Upgraded:
  kernel 5.15.11-200.fc35 -> 5.15.12-200.fc35
  kernel-core 5.15.11-200.fc35 -> 5.15.12-200.fc35
  kernel-modules 5.15.11-200.fc35 -> 5.15.12-200.fc35

SecAdvisories:
    FEDORA-2021-a7a558062e  Important  kernel-5.15.12-200.fc35.x86_64
    FEDORA-2021-a7a558062e  Important  kernel-core-5.15.12-200.fc35.x86_64
    FEDORA-2021-a7a558062e  Important  kernel-modules-5.15.12-200.fc35.x86_64
      CVE-2021-45469 kernel: out-of-bounds memory access in __f2fs_setxattr() in fs/f2fs/xattr.c when an inode has an invalid last xattr entry
      https://bugzilla.redhat.com/show_bug.cgi?id=2035817
'35.20211231.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^ (c81987715cfd689acaac9e205527d7b8ba108e84405ba79cdbd1ad7c5c4a55a0)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^ (7f3f4420e736af69e39d76ebfa67292145d13b7c25b88c4c1f63248bcf94832d)
Upgraded:
  crun 1.3-1.fc35 -> 1.4-1.fc35
  libzstd 1.5.1-2.fc35 -> 1.5.1-3.fc35

'35.20211230.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^ (971748df89e4daa9261fbdc3cbb50fd8019e370749ab9a70ff305836f67588e0)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^ (c81987715cfd689acaac9e205527d7b8ba108e84405ba79cdbd1ad7c5c4a55a0)
Upgraded:
  libzstd 1.5.1-1.fc35 -> 1.5.1-2.fc35

'35.20211226.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^ (2da47ccd0eb17138609d87fe8bd2934e542593a3b98517a1e0c2b7c11765f065)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^ (971748df89e4daa9261fbdc3cbb50fd8019e370749ab9a70ff305836f67588e0)
Upgraded:
  kernel 5.15.10-200.fc35 -> 5.15.11-200.fc35
  kernel-core 5.15.10-200.fc35 -> 5.15.11-200.fc35
  kernel-modules 5.15.10-200.fc35 -> 5.15.11-200.fc35
  libzstd 1.5.0-2.fc35 -> 1.5.1-1.fc35

SecAdvisories:
    FEDORA-2021-4f1a2cdf2e  Moderate   kernel-5.15.11-200.fc35.x86_64
    FEDORA-2021-4f1a2cdf2e  Moderate   kernel-core-5.15.11-200.fc35.x86_64
    FEDORA-2021-4f1a2cdf2e  Moderate   kernel-modules-5.15.11-200.fc35.x86_64
      CVE-2021-28714 CVE-2021-28715 xen: guest can force Linux netback driver to hog large amounts of kernel memory
      https://bugzilla.redhat.com/show_bug.cgi?id=2031199
      CVE-2021-28711 CVE-2021-28712 CVE-2021-28713 xen: rogue backends can cause DoS of guests via high frequency events
      https://bugzilla.redhat.com/show_bug.cgi?id=2034940
'35.20211224.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^ (97a0f141f2cd3f9f32beb26c94ce64767e524cf82f7aa94ae8dec1264059d280)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^ (2da47ccd0eb17138609d87fe8bd2934e542593a3b98517a1e0c2b7c11765f065)
Upgraded:
  gnupg2 2.3.3-2.fc35 -> 2.3.4-1.fc35
  selinux-policy 35.6-1.fc35 -> 35.7-1.fc35
  selinux-policy-targeted 35.6-1.fc35 -> 35.7-1.fc35

'35.20211222.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^ (4cfeb90becc7ac07a8287819d9773bfbfa63473251c013f5a0559fc34666acb3)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^ (97a0f141f2cd3f9f32beb26c94ce64767e524cf82f7aa94ae8dec1264059d280)
Upgraded:
  bind-libs 32:9.16.23-1.fc35 -> 32:9.16.24-1.fc35
  bind-license 32:9.16.23-1.fc35 -> 32:9.16.24-1.fc35
  bind-utils 32:9.16.23-1.fc35 -> 32:9.16.24-1.fc35
  linux-firmware 20211027-126.fc35 -> 20211216-127.fc35
  linux-firmware-whence 20211027-126.fc35 -> 20211216-127.fc35

'35.20211221.20.1'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^ (2d5a8af2fc30f20c1d8e5728e4236de3ec5f35139993182973d1251fd465d5ab)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^ (4cfeb90becc7ac07a8287819d9773bfbfa63473251c013f5a0559fc34666acb3)

'35.20211221.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^ (347a053bec10b3e05069aa99eb310091046515d8cb9f5dd985f0840fb2181025)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^ (2d5a8af2fc30f20c1d8e5728e4236de3ec5f35139993182973d1251fd465d5ab)
Upgraded:
  kernel 5.15.8-200.fc35 -> 5.15.10-200.fc35
  kernel-core 5.15.8-200.fc35 -> 5.15.10-200.fc35
  kernel-modules 5.15.8-200.fc35 -> 5.15.10-200.fc35

'35.20211220.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^ (02689ff7dca982fbb3e581bf261b8abc5a7888dbb4afe705f4969977a6a96dca)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^ (347a053bec10b3e05069aa99eb310091046515d8cb9f5dd985f0840fb2181025)
Upgraded:
  fwupd 1.7.2-1.fc35 -> 1.7.3-1.fc35
  libxcrypt 4.4.26-4.fc35 -> 4.4.27-1.fc35

'35.20211218.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^ (c0650a5c94c0de87826d68540d2dedcccaec32be025a499a7027c78ab2a1dea8)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^ (02689ff7dca982fbb3e581bf261b8abc5a7888dbb4afe705f4969977a6a96dca)
Upgraded:
  ca-certificates 2021.2.50-3.fc35 -> 2021.2.52-1.0.fc35
  console-login-helper-messages 0.21.2-2.fc35 -> 0.21.2-3.fc35
  console-login-helper-messages-issuegen 0.21.2-2.fc35 -> 0.21.2-3.fc35
  console-login-helper-messages-motdgen 0.21.2-2.fc35 -> 0.21.2-3.fc35
  console-login-helper-messages-profile 0.21.2-2.fc35 -> 0.21.2-3.fc35
  coreos-installer 0.11.0-1.fc35 -> 0.12.0-1.fc35
  coreos-installer-bootinfra 0.11.0-1.fc35 -> 0.12.0-1.fc35
  kernel 5.15.7-200.fc35 -> 5.15.8-200.fc35
  kernel-core 5.15.7-200.fc35 -> 5.15.8-200.fc35
  kernel-modules 5.15.7-200.fc35 -> 5.15.8-200.fc35
  podman 3:3.4.2-1.fc35 -> 3:3.4.4-1.fc35
  podman-plugins 3:3.4.2-1.fc35 -> 3:3.4.4-1.fc35

SecAdvisories:
    FEDORA-2021-6bc3fe7129  Moderate   podman-3:3.4.4-1.fc35.x86_64
    FEDORA-2021-6bc3fe7129  Moderate   podman-plugins-3:3.4.4-1.fc35.x86_64
      CVE-2021-4024 podman: podman machine spawns gvproxy with port bound to all IPs
      https://bugzilla.redhat.com/show_bug.cgi?id=2026675
'35.20211215.20.1'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^ (c478fa0ff9444078e7ef45f8350a265ced0e862592c16505cbc0896b1b8f54f9)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^ (c0650a5c94c0de87826d68540d2dedcccaec32be025a499a7027c78ab2a1dea8)

'35.20211215.20.0'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^^ (98164e633b7db8f45f5d6d5824138d5524bc596f4b477028b67efcdfb9690cf6)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^ (c478fa0ff9444078e7ef45f8350a265ced0e862592c16505cbc0896b1b8f54f9)
Upgraded:
  kernel 5.15.6-200.fc35 -> 5.15.7-200.fc35
  kernel-core 5.15.6-200.fc35 -> 5.15.7-200.fc35
  kernel-modules 5.15.6-200.fc35 -> 5.15.7-200.fc35

'35.20211213.20.2'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^^^ (b605bf93e0cdb0b32e542031aba609ce3285b6804169b76c3e85eaf1c396ee12)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^^ (98164e633b7db8f45f5d6d5824138d5524bc596f4b477028b67efcdfb9690cf6)

'35.20211213.20.1'
ostree diff commit from: compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^^^^ (c6eda5655c8c3c95e568f1a13d96e1f9711bd1c2eac519c436be365eb8b42443)
ostree diff commit to:   compose:fedora/x86_64/coreos/testing-devel^^^^^^^^^^^^^^^^^^^ (b605bf93e0cdb0b32e542031aba609ce3285b6804169b76c3e85eaf1c396ee12)


dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jan 12, 2022
Newer kernels seem to have an issue with enhanced networking on some
AWS instance types so they don't boot. Let's pin on the older kernel
for now while we investigate and find a proper solution.

See coreos/fedora-coreos-tracker#1066
@dustymabe
Member Author

pin PR: coreos/fedora-coreos-config#1416

dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Jan 12, 2022
Newer kernels seem to have an issue with enhanced networking on some
AWS instance types so they don't boot. Let's pin on the older kernel
for now while we investigate and find a proper solution.

See coreos/fedora-coreos-tracker#1066
@dustymabe
Member Author

I just built a rawhide locally and uploaded an AMI. It seems to be running and passing kola tests (run hasn't completed yet) with kernel-5.16.0-60.fc36.

@dustymabe
Member Author

> I just built a rawhide locally and uploaded an AMI. It seems to be running and passing kola tests (run hasn't completed yet) with kernel-5.16.0-60.fc36.

I spoke too soon. It seems like the failure rate is more like 50-60% though, rather than 95%, which is why I thought it was fixed.

dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Jan 12, 2022
Newer kernels seem to have an issue with enhanced networking on some
AWS instance types so they don't boot. Let's pin on the older kernel
for now while we investigate and find a proper solution.

See coreos/fedora-coreos-tracker#1066

(cherry picked from commit 467af82)
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Mar 9, 2022
This is the first kernel with the most recent revert that allows
us to not regress on some AWS instance types [1]. Because it is
newer than 5.16.11 it also allows for us to pick up the fix to
CVE-2022-0847 [2].

[1] coreos/fedora-coreos-tracker#1066
[2] coreos/fedora-coreos-tracker#1118
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Mar 9, 2022
This is the first kernel with the most recent revert that allows
us to not regress on some AWS instance types [1]. Because it is
newer than 5.16.11 it also allows for us to pick up the fix to
CVE-2022-0847 [2].

[1] coreos/fedora-coreos-tracker#1066
[2] coreos/fedora-coreos-tracker#1118
dustymabe removed the meeting (topics for meetings) label Mar 16, 2022
@davdunc
Contributor

davdunc commented Apr 9, 2022

@carnil

carnil commented Apr 28, 2022

@dustymabe
Member Author

We did the revert to get our streams back in shape, and since then none of our tests have failed, so I assume the necessary fixes landed in the kernel. Closing this out.

HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
Newer kernels seem to have an issue with enhanced networking on some
AWS instance types so they don't boot. Let's pin on the older kernel
for now while we investigate and find a proper solution.

See coreos/fedora-coreos-tracker#1066
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
We shipped out updates this morning to `testing` and `next`
with a downgraded kernel that matches what is already in `stable`.
For our upcoming regularly scheduled releases let's at least get to
the latest possible known good kernel.

See coreos/fedora-coreos-tracker#1066
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This kernel has a revert [1] that allows us to get AWS instance types
working again [2] and also is newer so it includes a fix for recent
CVE-2022-0185 [3].

[1] https://gitlab.com/cki-project/kernel-ark/-/commit/63aede4
[2] coreos/fedora-coreos-tracker#1066 (comment)
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2042052
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This allows us to get the latest kernel-5.16.12-200.fc35. Moving to
a kernel newer than 5.16.11 picks up the fix for CVE-2022-0847. We're
able to do this because the Fedora kernel maintainers agreed to again
pick up a revert that allows us to not regress on some AWS instance
types (coreos/fedora-coreos-tracker#1066).

Closes coreos/fedora-coreos-tracker#1118
HuijingHei pushed a commit to HuijingHei/fedora-coreos-config that referenced this issue Oct 10, 2023
This is the first kernel with the most recent revert that allows
us to not regress on some AWS instance types [1]. Because it is
newer than 5.16.11 it also allows for us to pick up the fix to
CVE-2022-0847 [2].

[1] coreos/fedora-coreos-tracker#1066
[2] coreos/fedora-coreos-tracker#1118
7 participants