[BUG] Failed to update pull secret on the disk on Centos 9 Stream #3446

Open
danpawlik opened this issue Dec 7, 2022 · 9 comments
Labels
kind/bug Something isn't working nested-virt To identify issues that are related to nested virtualization os/linux resolution/unsupported Unsupported setup; hardware and/or software, nested virtualization

Comments

@danpawlik

danpawlik commented Dec 7, 2022

General information

  • OS: Linux
  • Hypervisor: KVM
  • Did you run crc setup before starting it (Yes/No)? - Yes
  • Running CRC on: VM

CRC version

CRC version: 2.11.0+823e40d
OpenShift version: 4.11.13
Podman version: 4.2.0

CRC config

- consent-telemetry                     : no
- kubeadmin-password                   : 123456789
- pull-secret-file                      : pull-secret.txt

Host Operating System

NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

Steps to reproduce

  1. Update CentOS Stream 9
  2. Set up CRC
  3. Start CRC

Expected

CRC will start normally.

Actual

DEBU Making call to close connection to plugin binary
Failed to update pull secret on the disk: Temporary error: pull secret not updated to disk (x159)

Logs

https://gist.github.com/danpawlik/608fd45ce9e8642ce43baace625575d4

Before gathering the logs, try the following to see if it fixes your issue

$ crc delete -f
$ crc cleanup
$ crc setup
$ crc start --log-level debug

This does not help either.

@danpawlik danpawlik added kind/bug Something isn't working status/need triage labels Dec 7, 2022
@praveenkumar
Member

@danpawlik are you able to reproduce it consistently?

@danpawlik
Author

danpawlik commented Dec 7, 2022

Unfortunately yes.

@praveenkumar
Member

I can also see "Running CRC on: VM", which means you are using a nested virtualization setup, which we do not test. Can you follow the debugging guide at https://github.com/crc-org/crc/wiki/Debugging-guide, ssh into the VM, and check whether the /var/lib/kubelet/config.json file exists and contains your pull secret? (That is what the failing check verifies.)

@danpawlik
Author

danpawlik commented Dec 7, 2022

Thanks @praveenkumar.

So on the VM, the file contains:

{}

so it was not copied.
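
The manual check suggested above can be approximated with a small helper run inside the CRC VM. This is a minimal sketch, not CRC's actual check: the function name pull_secret_present is hypothetical, and it assumes a real pull secret contains a top-level "auths" key, which an empty {} file lacks.

```shell
#!/bin/sh
# Hypothetical helper (not part of crc): succeeds only if the given file
# exists, is non-empty, and mentions an "auths" key.
# Assumption: a valid pull secret always has a top-level "auths" object.
pull_secret_present() {
    file="$1"
    [ -s "$file" ] && grep -q '"auths"' "$file"
}
```

Inside the VM, as a user who can read the file, it could be called as: pull_secret_present /var/lib/kubelet/config.json && echo present || echo missing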

I see a lot of tracebacks in dmesg:

[  880.704548] RIP: 0033:0x7f8b19a3ec6b
[  880.704736] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[  880.705596] RSP: 002b:00007f88b7ffe4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  880.705966] RAX: ffffffffffffffda RBX: 00007f88c4ff8e50 RCX: 00007f8b19a3ec6b
[  880.706317] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001c
[  880.706684] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
[  880.707052] R10: 00007f88b0051580 R11: 0000000000000246 R12: 000055b9750bf620
[  880.707412] R13: 00007f88c4ff8ff0 R14: 9ebdce38c25acb00 R15: 00007f88c4ff8e48
[  880.707754]  </TASK>
[  880.707887] Call Trace:
[  880.708010]  <TASK>
[  880.708116]  x86_pmu_stop+0x50/0xb0
[  880.708289]  x86_pmu_del+0x73/0x190
[  880.708463]  event_sched_out.part.0+0x7a/0x1f0
[  880.708679]  group_sched_out.part.0+0x93/0xf0
[  880.708898]  ctx_sched_out+0x124/0x2a0
[  880.709083]  perf_event_context_sched_out+0x1a5/0x460
[  880.709329]  __perf_event_task_sched_out+0x50/0x170
[  880.709572]  ? pick_next_task+0x51/0x940
[  880.709766]  prepare_task_switch+0xbd/0x2a0
[  880.709997]  __schedule+0x1cb/0x620
[  880.710172]  schedule+0x5a/0xc0
[  880.710331]  xfer_to_guest_mode_handle_work+0xac/0xe0
[  880.710578]  vcpu_run+0x1f5/0x250 [kvm]
[  880.710801]  kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
[  880.711079]  kvm_vcpu_ioctl+0x271/0x670 [kvm]
[ 1898.648039] RIP: 0033:0x7f8b19a3ec6b
[ 1898.648213] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[ 1898.649089] RSP: 002b:00007f88c57f84a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1898.649448] RAX: ffffffffffffffda RBX: 00007f88c5ffae50 RCX: 00007f8b19a3ec6b
[ 1898.649790] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b
[ 1898.650141] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
[ 1898.650481] R10: 00007f88b0051580 R11: 0000000000000246 R12: 000055b9750af730
[ 1898.650815] R13: 00007f88c5ffaff0 R14: 9ebdce38c25acb00 R15: 00007f88c5ffae48
[ 1898.651154]  </TASK>
[ 1898.651824] Call Trace:
[ 1898.652155]  <TASK>
[ 1898.652401]  amd_pmu_enable_all+0x44/0x60
[ 1898.652851]  __perf_install_in_context+0x16c/0x220
[ 1898.653372]  remote_function+0x47/0x50
[ 1898.653781]  generic_exec_single+0x78/0xb0
[ 1898.654254]  smp_call_function_single+0xeb/0x130
[ 1898.654569]  ? sw_perf_event_destroy+0x60/0x60
[ 1898.654871]  perf_install_in_context+0xcf/0x200
[ 1898.655173]  ? ctx_resched+0xe0/0xe0
[ 1898.655416]  perf_event_create_kernel_counter+0x114/0x180
[ 1898.655776]  pmc_reprogram_counter.constprop.0+0xec/0x220 [kvm]
[ 1898.656230]  amd_pmu_set_msr+0x106/0x170 [kvm_amd]
[ 1898.656562]  ? __svm_vcpu_run+0x67/0x110 [kvm_amd]
[ 1898.656898]  ? get_gp_pmc_amd+0x129/0x200 [kvm_amd]
[ 1898.657235]  __kvm_set_msr+0x7f/0x1c0 [kvm]
[ 1898.657567]  kvm_emulate_wrmsr+0x52/0x1b0 [kvm]
[ 1898.657923]  vcpu_enter_guest+0x667/0x1010 [kvm]
[ 1898.658277]  ? kvm_get_rflags+0xe/0x30 [kvm]
[ 1898.658606]  ? svm_get_if_flag+0x1d/0x50 [kvm_amd]
[ 1898.658931]  ? kvm_apic_has_interrupt+0x32/0x90 [kvm]
[ 1898.659311]  ? kvm_cpu_has_interrupt+0x60/0x80 [kvm]
[ 1898.659681]  vcpu_run+0x33/0x250 [kvm]
[ 1898.659977]  kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
[ 1898.660365]  kvm_vcpu_ioctl+0x271/0x670 [kvm]
[ 1898.660702]  ? __seccomp_filter+0x45/0x470

The odd thing here is that on CentOS Stream 8 it works normally (same hypervisor with an AMD CPU; only the instance has been rebuilt).

I ran one more test on a different cloud provider with the same image, and there it works normally (but that host had an Intel CPU).

I will try to dig further into what is breaking crc start there. It may be helpful to others who hit the same issue.
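
Because the failure appears on the AMD host but not on the Intel one, it can help to record the CPU vendor and the KVM nested setting alongside a report. The following is a minimal sketch under stated assumptions: the function names cpu_vendor and nested_status are hypothetical, and the optional path arguments exist only so the functions can be exercised against sample files.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU vendor string from a cpuinfo-style file
# (defaults to /proc/cpuinfo), e.g. AuthenticAMD or GenuineIntel.
cpu_vendor() {
    awk -F': ' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Hypothetical helper: print the KVM "nested" module parameter for the given
# vendor, or "unknown" if the module is not loaded or the vendor is not
# recognised. The second argument (sysfs module root) is for testing only.
nested_status() {
    root="${2:-/sys/module}"
    case "$1" in
        AuthenticAMD) param="$root/kvm_amd/parameters/nested" ;;
        GenuineIntel) param="$root/kvm_intel/parameters/nested" ;;
        *) echo unknown; return 0 ;;
    esac
    if [ -r "$param" ]; then
        cat "$param"
    else
        echo unknown
    fi
}
```

On the host that runs crc, nested_status "$(cpu_vendor)" prints the current setting; an enabled module typically reports 1 or Y depending on the module and kernel version.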

@danpawlik
Author

Workaround, as an Ansible playbook:

---
# This playbook deploys CRC and prepares the VM for a snapshot that can later
# be deployed in CI.
- hosts: crc.dev
  become: true
  tasks:
    - name: Install packages
      yum:
        name:
          - qemu-kvm-common
        state: present

    - name: Ensure CentOS runs with selinux permissive
      selinux:
        policy: targeted
        state: permissive

    - name: Enable nested virtualization
      lineinfile:
        path: /etc/modprobe.d/kvm.conf
        regexp: '^#options kvm_amd nested=1'
        line: 'options kvm_amd nested=1'

    # From https://lore.kernel.org/lkml/20220830235537.4004585-8-seanjc@google.com/T/
    - name: Disable ept
      shell: |
        sed -i 's/net.ifnames=0/net.ifnames=0 ept=0/g' /etc/default/grub

    - name: Regenerate grub
      shell: |
        grub2-mkconfig -o /boot/grub2/grub.cfg

# REBOOT HOST.

After applying the playbook, it seems to get further.
IMO the issue is not on the crc side but in kvm/libvirt.

@danpawlik
Author

It still does not deploy CRC. I filed a bug against kvm: https://bugzilla.redhat.com/show_bug.cgi?id=2151878

@cfergeau
Contributor

cfergeau commented Dec 8, 2022

For what it's worth, there have been multiple similar reports in the past:
#3366 (comment)
#1830

(searching closed issues for "AMD Intel" might give more results)

@gbraad gbraad added os/linux nested-virt To identify issues that are related to nested virtualization resolution/unsupported Unsupported setup; hardware and/or software, nested virtualization and removed status/need triage labels Dec 20, 2022
@gbraad
Contributor

gbraad commented Dec 20, 2022

Marking as unsupported. This is not something we can resolve, as it is related to nested virtualization and the 'incompatibility' (read existing bugs: https://marc.info/?l=kvm&m=166886061623174&w=2) with some AMD Ryzen/Epyc CPUs.

@danpawlik
Author

Workaround on CentOS Stream 9: install the kernel from elrepo.org.

Steps:

  1. rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
  2. dnf install -y https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
  3. dnf --enablerepo=elrepo-kernel install -y kernel-ml
  4. reboot
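
After the reboot it is worth confirming that the host actually booted the new kernel before retrying crc start. A minimal sketch, assuming ELRepo kernel release strings contain the text "elrepo"; the function name is_elrepo_kernel is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release string (defaults to the
# running kernel from `uname -r`) looks like an ELRepo build.
# Assumption: ELRepo release strings contain "elrepo".
is_elrepo_kernel() {
    case "${1:-$(uname -r)}" in
        *elrepo*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_elrepo_kernel && echo "running ELRepo kernel" || echo "still on the distribution kernel"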

4 participants