Performance: LLVM 20 aggressively optimizes popcnt and results in worse performance #142042

Closed
@jabraham17

I am finding that the same microbenchmark is much slower when compiled with clang 20 than when compiled with clang 19.

This link has the full benchmark and the assembly for clang 19 and 20.

The LLVM 20 output is much longer, apparently because LLVM 20 vectorizes the code instead of using the popcnt instruction. For some reason, this is slower: the LLVM 20 version takes 0.15s, while the LLVM 19 version takes 0.05s.

Using the naive popcnt loop for the C version does seem to get pattern-matched better, and LLVM 20 is then just as fast, if not faster:

    uint64_t c = 0;
    while (n) {
        n &= (n - 1);
        c++;
    }
    return c;
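For anyone trying to reproduce this, here is a minimal sketch that wraps the loop above in a function and puts it next to the compiler builtin. The function names (`popcount_naive`, `popcount_builtin`, `check_popcounts_agree`) are made up for illustration and are not taken from the linked benchmark. `__builtin_popcountll` is the GCC/Clang builtin; whether either version lowers to a single popcnt instruction depends on the compiler version and target flags, so the generated assembly should be checked rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from this issue: clear the lowest set bit until none remain.
 * The number of iterations equals the number of set bits. */
static uint64_t popcount_naive(uint64_t n) {
    uint64_t c = 0;
    while (n) {
        n &= (n - 1);
        c++;
    }
    return c;
}

/* Assumption: building with GCC or Clang, which provide this builtin.
 * It is used here only as a reference to compare against. */
static uint64_t popcount_builtin(uint64_t n) {
    return (uint64_t)__builtin_popcountll(n);
}

/* Hypothetical helper: assert that both versions agree for one input. */
static void check_popcounts_agree(uint64_t n) {
    assert(popcount_naive(n) == popcount_builtin(n));
}
```

Compiling both functions with clang 19 and clang 20 at the same optimization level and diffing the assembly should show whether the loop is recognized as a population count in each version.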
