plumbers-2020-slides

Slides/videos from Linux Plumbers Conf 2020 (Virtual)

Recording

Slides: TODO: links

Notes

The LLVM MC and LLVM BoF at Linux Plumbers Conf 2020 went well; we had 9 talks/session, 1 call for a new mailing list, 2 maintainers named (Nathan Chancellor, Nick Desaulniers), 3 sent improving documentation and committing to a support model around the latest release of clang (clang-10/10.0.1), and we recorded 172 attendees in our MC at one point.

Dependency ordering in the Linux kernel

Will Deacon gave a great background with simple examples defining Control vs Data vs Address dependencies. He highlighted how modern hardware can reorder read->read control dependencies, and specifically that compiler transforms that could convert read->read address dependencies into control dependencies would break hardware ordering.
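The three dependency kinds can be sketched in a few lines of C. This is only an illustration, not code from the talk: the variable and function names are hypothetical, and C11 relaxed atomics stand in for the kernel's own accessors. The second function shows, by hand, the kind of rewrite a compiler could legally perform that turns an address dependency into a control dependency.

```c
#include <stdatomic.h>

/* Hypothetical shared state, for illustration only. */
static int table[2] = {10, 20};
static atomic_int shared_idx = 1;
static int payload = 7;

/* Address dependency: the value returned by the first load is used to
 * compute the address of the second load. */
int address_dep(void)
{
    int idx = atomic_load_explicit(&shared_idx, memory_order_relaxed);
    return table[idx & 1];
}

/* Same result, but written the way a compiler might transform it once it
 * knows the index is 0 or 1. The second load now depends on a branch, so
 * the address dependency has become a read->read control dependency. */
int address_dep_transformed(void)
{
    int idx = atomic_load_explicit(&shared_idx, memory_order_relaxed);
    if (idx & 1)
        return table[1];
    return table[0];
}

/* Data dependency: the value returned by the load is used to compute the
 * value written by a later store. */
void data_dep(int *dst)
{
    int v = atomic_load_explicit(&shared_idx, memory_order_relaxed);
    *dst = v + 1;
}

/* Control dependency: the value returned by the load decides whether the
 * later access to payload executes at all. */
int control_dep(void)
{
    if (atomic_load_explicit(&shared_idx, memory_order_relaxed))
        return payload;
    return -1;
}
```

In a single thread, address_dep() and address_dep_transformed() always return the same value; the difference only matters for ordering when another CPU updates the shared data concurrently.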

Being able to identify such cases during a build would be helpful. There was some discussion about bringing the terminology of the 3 different dependencies into the ISO WG14 standards body. Peter Zijlstra and Paul McKenney discussed a feature request for annotating control dependencies, possibly by using the keyword volatile on the closing } of a loop. It was noted that loop exit in general was a concern. Marco Elver suggested marking a block statement volatile.

Peter had the idea to start a kernel toolchain agnostic mailing list to discuss the details and design goals further. Nick Desaulniers emailed VGER postmaster about the idea: https://groups.google.com/g/clang-built-linux/c/GLEkFKlDXfo/m/o6UmfyvDAAAJ.

Barriers to in-tree Rust

Geoffrey Thomas presented the work around prototyping in-tree support for writing Linux kernel drivers in Rust. Improved memory safety was referenced, along with statistics that around 2/3 of bugs in code written in C/C++ are memory safety issues.

It was noted that the suggestion was not to attempt to rewrite the kernel in Rust, but rather to provide Kbuild integration such that greenfield drivers may be written in the language. There was also discussion about using cargo while requiring that all modules be in tree.

There were a fair number of questions around rustc's dependence on LLVM, which lacks backends for some of the more obscure ISAs that the Linux kernel supports. It was noted that there's interest and potentially a bounty on implementing a Rust frontend to GCC, though it was suggested that having rustc emit a GCC IR might be less work.

There were a few questions about the use of bindgen for automated language bindings to the kernel. bindgen requires libclang to parse kernel headers. The generated bindings don't provide lifetime annotations, so it's common to wrap autogenerated bindings in manual wrappers that provide them. Auto-generating bindings can help detect when interfaces change.

There were also questions around whether rustc not targeting the same version of LLVM IR as the Clang used for kernel C code would imply an ABI breakage. It likely would, though in practice for small samples this has yet to be an issue. For strict guarantees, Clang and rustc would have to use precisely the same version of LLVM.

LTO, PGO, and AutoFDO

Sami Tolvanen, Bill Wendling, and Nick Desaulniers gave an overview of what these are, and some brief numbers showing their successful use in 3 different kernel distributions from Google. Upstreaming this work was a major question, as the build times went up significantly for LTO or even ThinLTO builds. Also, the profiling data had to be post-processed using an open source utility that's out of tree both for the kernel and LLVM. Mark Brown suggested that maybe git wasn't the best place to store binary data that undergoes significant churn, and that maybe CI systems in place for the kernel could provide relevant training data.

The talk covered some topics similar to those in Ian Bearman's talk "Exploring Profile Guided Optimization of the Linux Kernel." https://linuxplumbersconf.org/event/7/contributions/771/ Collaboration between toolchain implementations on kernel patches was encouraged.

Measuring Kernel Compile Times w/ Clang

Nathan Chancellor and Nathan Huckleberry presented data and techniques for measuring compile times of the Linux kernel with Clang. Mr. Chancellor presented data showing GCC beating Clang across the board. It was noted that the use of profiling data and LTO builds of Clang could bring Clang more in line to be competitive, but the same modifications and measurements were not done for GCC which likely would see significant performance improvements as well.

Mr. Huckleberry presented graphs and profile reports of builds with Clang, noting that there was significant low-hanging fruit around inline asm statements (13% of a build wasted recomputing values, since fixed in clang-11) and macros with large token counts, such as the kernel's use of GNU C statement expressions (identified but not yet fixed). Profile data showed that significant time was spent in the compiler front end (lexing, parsing, and semantic analysis) rather than the backend (optimization and codegen). Work was also shown for Perfetto, which allowed graphical profiles to be shared and queried. It was noted that this is early days of compiler performance optimization research, and that there was still a lot of work to do here.
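To make the point about token counts concrete, here is a minimal sketch of a macro built on a GNU C statement expression. MAX_SE is a simplified, hypothetical stand-in and not the kernel's actual definition. Each argument appears twice in the expansion (once inside __typeof__ and once as an initializer), so in this sketch every level of nesting roughly doubles the tokens the front end has to lex and parse for the inner argument.

```c
/* Simplified, hypothetical type-generic max macro using a GNU C statement
 * expression. Each argument is spelled out twice in the expansion, but
 * evaluated only once because __typeof__ does not evaluate its operand
 * for non-variably-modified types. */
#define MAX_SE(a, b) ({            \
    __typeof__(a) _a = (a);        \
    __typeof__(b) _b = (b);        \
    _a > _b ? _a : _b;             \
})

/* One level of nesting: the inner MAX_SE(y, z) is expanded twice inside
 * the outer macro, once for __typeof__(b) and once for the initializer. */
int max3(int x, int y, int z)
{
    return MAX_SE(x, MAX_SE(y, z));
}
```

Running the file through the preprocessor only (for example with the -E flag of GCC or Clang) shows how much text a single max3() body expands to.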

Jason Gunthorpe asked about the use of precompiled headers for the kernel, which Arnd Bergmann reported was problematic last time Arnd tried.

Arnd hinted at a WIP series of patches that significantly cut down on the compile times with both GCC and Clang by minimizing header dependencies. https://drive.google.com/file/d/1GFCmN3r93EJImvo-cbYJLd-iY1vJ_G5i/view?usp=sharing provides a visualization of the problem. Nodes in the graph with high fan in or fan out may be interesting to break up.
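As a rough illustration of what fan in and fan out mean for an include graph, the sketch below counts both over a tiny, made-up graph. The header names and edges are hypothetical and are not taken from Arnd's visualization.

```c
#define N_HEADERS 4

/* Hypothetical include graph: includes[i][j] != 0 means that header i
 * directly includes header j. Rows and columns are a.h, b.h, c.h, d.h. */
static const unsigned char includes[N_HEADERS][N_HEADERS] = {
    /* a.h */ {0, 1, 1, 0},
    /* b.h */ {0, 0, 1, 0},
    /* c.h */ {0, 0, 0, 1},
    /* d.h */ {0, 0, 0, 0},
};

/* Fan out: how many headers does header h include directly? */
int fan_out(int h)
{
    int n = 0;
    for (int j = 0; j < N_HEADERS; j++)
        n += includes[h][j] != 0;
    return n;
}

/* Fan in: how many headers include header h directly? */
int fan_in(int h)
{
    int n = 0;
    for (int i = 0; i < N_HEADERS; i++)
        n += includes[i][h] != 0;
    return n;
}
```

In this toy graph, a.h has the highest fan out and c.h the highest fan in, which would make them the first candidates to look at.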

Marco Elver asked about "include-what-you-use" (IWYU), a tool commonly used to solve this problem. Ilie Halip and Arnd Bergmann both reported issues running that tool on the kernel sources due to it not understanding the kernel's sometimes-config-based includes.

Using clang-tidy and clang-format

Miguel Ojeda and Nathan Huckleberry presented work they've done to support automating fixing kernel style nits via clang-tidy rules and help catch bugs via static analysis via clang-tidy and scan-build (Clang's static analyzer).

When polled, there was a split between maintainers that would and would not consider running clang-format on their whole subtree, inducing churn. git clang-format was suggested for developers to format just their patches and not the rest of whole files. Will Deacon noted that many maintainers no longer run ./scripts/checkpatch.pl on their trees due to false positives.

clang-tidy was presented as a codebase specific linter for writing codebase specific warnings. Masahiro Yamada asked if warnings could live in-tree of the kernel. Stephen Hines clarified that clang-tidy warnings were appropriate to upstream into LLVM, as many projects have custom rules in clang-tidy and that it was easy to specify the checks you want. Nathan's patch to enable clang-tidy already disables all checks, then re-enables Linux kernel specific ones.

Asm Goto with Outputs

Bill Wendling gave a presentation on how he designed and implemented an extension to the GNU C extension asm goto to support outputs along the fallthrough path. This feature was requested by Linus and other kernel developers to improve some of the code for the happy path of get_user/put_user. Some ambiguous cases were pointed out.

Collaboration with GCC developers was welcomed in implementing. A shared kernel toolchain mailing list would be preferred to do such design collaboration in the future. Since then, Segher Boessenkool has reached out to Nick Desaulniers, Bill Wendling, and James Knight to discuss nitty gritty details.

Towards Learning From Linux Kernel Configurations' Failures with Clang

Prof. Mathieu Acher from University of Rennes 1, Inria presented research on the use of Machine Learning (ML) classification via statistical analysis and use of decision trees to help identify broken kernel configurations. Mathieu noted that the kernel has 10^6000 configurations, and that a decision tree made it easy to visualize what commonalities various broken builds may have. Further research was meant to analyze which configs hurt either binary size (such as CONFIG_DEBUG_INFO and friends) or compile times.

Arnd Bergmann noted that in his randconfig testing, he observed about 1.6% of randconfig builds failing with Clang, which was not much more than he observed in builds with GCC.

Mathieu noted that their testing was done against x86_64, and that other, less heavily tested architectures would likely face more build failures with GCC (or Clang).

TuxMake and TuxBuild

Dan Rue and Antonio Terceiro presented work they've done to build the TuxMake and TuxBuild microservices, to help maintainers or CI system developers solve build scaling related issues, and maintain artifacts for reproducibility.

It was demonstrated that LKFT is already making use of the services to run ~70 builds in ~15 minutes.

Kees Cook asked about boot-testing related microservices.

Khem Raj asked about distributed builds. Dan explained that most kernel developers don't do such builds so they avoid them in case that causes differences in the resulting binaries.

Dan recommended checking out https://gitlab.com/Linaro/tuxbuild or emailing tuxbuild@linaro.org for access.

CI Systems and Clang

Nick Desaulniers gave a quick overview of architectures supported by both LLVM and the Linux kernel (arm, arm64, x86, powerpc, mips, arc, hexagon, riscv, s390, sparc) and mentioned to CI implementors that we'd like to get environments set up to test those ISAs.

Then the move to using LLVM=1 to simplify testing was discussed. Such a move would help give test coverage to LLVM's binutils substitutes (ld.lld, llvm-nm, etc.). Also, this would help minimize the command line to build the kernel. Kevin Hilman implemented support for LLVM=1 in KernelCI shortly thereafter.

Nick mentioned that it would also be nice to omit CROSS_COMPILE, since it was mostly redundant and could be inferred from ARCH in most cases. Masahiro Yamada asked if CROSS_COMPILE could/should be omitted if LLVM=1 LLVM_IAS=1. Mark Brown noted this could be tricky for environments with multiple versions of toolchains installed, such as for KernelCI.

Guillaume Tucker mentioned that LLVM=1 should be tested with scripts/merge_config.sh, which has problems with CC=clang. Guillaume later noted that LLVM=1 was good to go with merge_config.sh.

Finally, Geoffrey Thomas recommended checking out a tool called "rust crater."

Thank Yous

Thank you to all of our great speakers, those that submitted proposals for the MC, and attendees. Particularly:

  • Aditya Kumar
  • Alessandro Decina
  • Alex Gaynor
  • Antonio Terceiro
  • Bill Wendling
  • Dan Rue
  • Geoffrey Thomas
  • John Baublitz
  • Josh Triplett
  • Mathieu Acher
  • Miguel Ojeda
  • Nathan Chancellor
  • Nathan Huckleberry
  • Nick Desaulniers
  • Paul McKenney
  • Peter Parkanyi
  • Peter Zijlstra
  • Sami Tolvanen
  • Will Deacon

Thank you to the MC leads for putting together the proposal and reviewing submissions, as well as moderating:

  • Behan Webster
  • Nick Desaulniers

Thank you to the Planning Committee, for the tireless effort involved in planning and building infrastructure for the virtual event. We saw many of you working hard behind the scenes on the day, resolving minor issues. This was a major contribution of time and effort to the Linux ecosystem, and we're so thankful for all of the work you did that made Linux Plumbers Conf 2020 such a success.

  • Carlos O'Donell
  • Christian Brauner
  • David Woodhouse
  • Elena Zannoni
  • Guy Lunardi
  • James Bottomley
  • Jon Corbet
  • Kate Stewart
  • Laura Abbott
  • Paul McKenney
  • Steven Rostedt
