BUILD_BUG() failed with arm64's gki_defconfig #1503

Closed
nathanchance opened this issue Nov 12, 2021 · 4 comments
Labels
[ARCH] arm64 This bug impacts ARCH=arm64 [BUG] linux A bug that should be fixed in the mainline kernel. [FEATURE] CFI Related to building the kernel with Clang Control Flow Integrity [FIXED][LINUX] 5.16 This bug was fixed in Linux 5.16

Comments

@nathanchance
Member

During Android's mainline merges, I noticed a patch removing some BUILD_BUG() calls that were triggering. This can be reproduced with a ToT LLVM and mainline kernel with Android's gki_defconfig so this is probably something that should be investigated and fixed.

$ clang --version | head -n1
ClangBuiltLinux clang version 14.0.0 (https://github.com/llvm/llvm-project 9b6f264d2b09ab5e970e54f77119eb823f0fcc50)

$ git show -s --format='%h ("%s")'
5833291ab6de ("Merge tag 'pci-v5.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci")

# gitiles's text export is base64 encoded, so it has to be decoded...
$ curl -LSs 'https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/arch/arm64/configs/gki_defconfig?format=TEXT' | base64 -d > .config

# Android uses full LTO, which takes forever; this bug can be reproduced with ThinLTO
$ scripts/config -d LTO_CLANG_FULL -e LTO_CLANG_THIN

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig vmlinux
ld.lld: error: call to __compiletime_assert_200 marked "dontcall-error": BUILD_BUG failed
...

Brief initial triage leads me to believe this is an LTO problem, as outright disabling LTO makes the problem go away:

$ curl -LSs 'https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/arch/arm64/configs/gki_defconfig?format=TEXT' | base64 -d > .config

$ scripts/config -d LTO_CLANG_FULL

$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 olddefconfig vmlinux

$ echo $?
0

I will see if I can figure out exactly which configuration options trigger this, then create a small reproducer and report it upstream.

@nathanchance nathanchance added [BUG] Untriaged Something isn't working [ARCH] arm64 This bug impacts ARCH=arm64 [FEATURE] LTO Related to building the kernel with LLVM Link Time Optimization labels Nov 12, 2021
@nathanchance
Member Author

nathanchance commented Nov 13, 2021

There is probably a much better IR reproducer for this, but this is what cvise came up with (in reality, it appears to be CFI related):

$ cat arm_arch_timer.i
struct thread_info {
  struct {
    struct {
      int count;
    } preempt;
  };
} preempt_schedule_notrace(void);
struct arch_timer_erratum_workaround {
  long (*read_cntvct_el0)();
} timer_unstable_counter_workaround;
long arch_timer_reg_read_cp15___val, __arch_counter_get_cntvct_stable___ptr,
    erratum_set_next_event_generic_cval;
typeof(&timer_unstable_counter_workaround)
    __arch_counter_get_cntvct_stable_pscr_ret__;
long arch_timer_read_cntvct_el0();
inline long arch_timer_reg_read_cp15(int access) {
  if (access == 0)
    return ({
      asm("");
      arch_timer_reg_read_cp15___val;
    });
  if (access == 1)
    return ({
      asm("");
      arch_timer_reg_read_cp15___val;
    });
  __attribute__((__noreturn__)) void __compiletime_assert_200(void)
      __attribute__((__error__("BUILD_BUG failed")));
  __compiletime_assert_200();
}
void __arch_counter_get_cntvct_stable() {
  __asm__("");
  ({
    struct arch_timer_erratum_workaround *__wa = ({
      ({
        __arch_counter_get_cntvct_stable_pscr_ret__ = ({
          *({
            ({
              (typeof((typeof(&timer_unstable_counter_workaround) *)0))
                  __arch_counter_get_cntvct_stable___ptr;
            });
          });
        });
      });
    });
    __wa && __wa->read_cntvct_el0 ? ({
      asm("");
      __wa->read_cntvct_el0;
    })
                                  : arch_timer_read_cntvct_el0;
  })("");
  long ti_0_0 = ({
    ({
      typeof(&ti_0_0) __x = &ti_0_0;
      asm("" : : "Q"(__x));
      0;
    });
  });
  preempt_schedule_notrace();
}
void arch_counter_get_cntpct_stable() {
  typeof(&((struct thread_info *)0)->preempt.count) __x =
      &((struct thread_info *)0)->preempt.count;
  switch (((struct thread_info *)0)->preempt.count)
  case 4:
    asm("" : : "Q"(__x));
  __asm__("");
  long ti_0_0 = ({
    ({
      typeof(&ti_0_0) __x = &ti_0_0;
      asm("" : : "Q"(__x));
      0;
    });
  });
}
static void erratum_set_next_event_generic(int access) {
  arch_timer_reg_read_cp15(access);
  arch_counter_get_cntpct_stable();
  asm("msr "
      "cntp_cval_el0"
      ", %x0"
      :
      : "rZ"(0));
  __arch_counter_get_cntvct_stable();
  asm("msr "
      "cntv_cval_el0"
      ", %x0"
      :
      : "rZ"(erratum_set_next_event_generic_cval));
}
void erratum_set_next_event_virt() {
  erratum_set_next_event_generic(1);
  erratum_set_next_event_generic(0);
}

# ThinLTO + CFI has the issue
$ clang -O2 --target=aarch64-linux-gnu -flto=thin -fsplit-lto-unit -fvisibility=hidden -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fno-sanitize-trap=cfi -fno-sanitize-blacklist -c -o arm_arch_timer.o arm_arch_timer.i

$ llvm-ar cDPrST arm_arch_timer.a arm_arch_timer.o

$ ld.lld -EL -maarch64elf -z norelro -mllvm -import-instr-limit=5 -r -o /dev/null --whole-archive arm_arch_timer.a
ld.lld: error: call to __compiletime_assert_200 marked "dontcall-error": BUILD_BUG failed

# Just ThinLTO is fine
$ clang -O2 --target=aarch64-linux-gnu -flto=thin -c -o arm_arch_timer.o arm_arch_timer.i

$ llvm-ar cDPrST arm_arch_timer.a arm_arch_timer.o

$ ld.lld -EL -maarch64elf -z norelro -mllvm -import-instr-limit=5 -r -o /dev/null --whole-archive arm_arch_timer.a

$ echo $?
0

cvise files here: https://github.com/nathanchance/creduce-files/tree/1475da898a391c5eeb19eaf57c5656530d21e920/cbl-1503
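
For readers skimming the reduced file, the shape that matters can be sketched in a few lines of C. This is only an illustration, not the kernel code. The names fake_build_bug, read_reg, set_next_event_generic and set_next_event_virt are hypothetical stand-ins, and a runtime counter replaces the __error__ attribute so that the sketch builds at any optimization level. In the real code the fall-through call is rejected at build time whenever it survives optimization. Presumably that only goes unnoticed when the compiler can see that the callers pass just the constants 0 and 1.

```c
/* Hypothetical stand-in for BUILD_BUG(): counts how often the
 * "impossible" path is reached instead of failing the build.
 * The kernel version calls a function declared with
 * __attribute__((__error__("BUILD_BUG failed"))). */
static int build_bug_hits;

static void fake_build_bug(void)
{
	build_bug_hits++;
}

/* Same shape as arch_timer_reg_read_cp15() in the reduced file:
 * both supported access values return early, anything else falls
 * through to the assertion. Return values are placeholders. */
static inline long read_reg(int access)
{
	if (access == 0)
		return 100;
	if (access == 1)
		return 200;
	fake_build_bug();
	return -1;
}

/* Same shape as erratum_set_next_event_generic(): it only forwards
 * its argument, so the value is not a constant inside this function
 * unless it is inlined into its callers. */
static long set_next_event_generic(int access)
{
	return read_reg(access);
}

/* Same shape as erratum_set_next_event_virt(): only 1 and 0 are passed. */
static long set_next_event_virt(void)
{
	return set_next_event_generic(1) + set_next_event_generic(0);
}
```

Calling set_next_event_virt() never reaches fake_build_bug(), while read_reg(2) does. In the sketch that is a runtime observation. In the kernel the same property has to be proven by the optimizer, which is why a change in inlining decisions can turn it into a link error.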

@nathanchance nathanchance added the [FEATURE] CFI Related to building the kernel with Clang Control Flow Integrity label Nov 13, 2021
@nathanchance
Member Author

It looks like this can be reproduced with just defconfig + CONFIG_LTO_CLANG_THIN=y + CONFIG_CFI_CLANG=y (our CI is now hitting this): https://github.com/ClangBuiltLinux/continuous-integration2/runs/4212005132?check_suite_focus=true

@nathanchance
Member Author

Marc Zyngier sent https://lore.kernel.org/r/20211117113532.3895208-1-maz@kernel.org/ to resolve this.

@nathanchance nathanchance added [BUG] linux A bug that should be fixed in the mainline kernel. [PATCH] Submitted A patch has been submitted for review and removed [BUG] Untriaged Something isn't working [FEATURE] LTO Related to building the kernel with LLVM Link Time Optimization labels Nov 17, 2021
@nathanchance
Member Author

@nathanchance nathanchance added [FIXED][LINUX] 5.16 This bug was fixed in Linux 5.16 and removed [PATCH] Submitted A patch has been submitted for review labels Dec 12, 2021