Performance: LLVM 20 aggressively optimizes popcnt and results in worse performance #142042

I am finding that the same microbenchmark is much slower when compiled with clang 20 than when compiled with clang 19.

[This link](https://godbolt.org/z/5Md1dG9YG) has the full benchmark and the assembly for clang 19 and 20.

The LLVM 20 code is much longer, apparently because LLVM 20 vectorizes the code instead of using the `popcnt` instruction. For some reason, the vectorized version is slower: the LLVM 20 build takes 0.15s, while the LLVM 19 build takes 0.05s (about 3x faster).

Using the naive popcnt loop in the C version does seem to get pattern-matched better, and with it LLVM 20 is just as fast, if not faster:

```
    uint64_t c = 0;
    // Each iteration clears the lowest set bit of n,
    // so the loop runs once per set bit.
    while (n) {
        n &= (n - 1);
        c++;
    }
    return c;
```
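For anyone who wants to try this locally rather than on Godbolt, here is a minimal sketch that wraps the loop above in a function and sums it over a buffer. The names `popcount_naive` and `sum_popcounts` and the buffer-summing driver are assumptions made for illustration; they are not taken from the linked benchmark, which may be structured differently.

```c
#include <stddef.h>
#include <stdint.h>

/* The naive loop from the snippet above, wrapped in a function.
 * The function name is hypothetical. */
static uint64_t popcount_naive(uint64_t n) {
    uint64_t c = 0;
    while (n) {
        n &= (n - 1);   /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Hypothetical driver: total number of set bits in a buffer.
 * A loop over an array like this is the kind of code the
 * vectorizer may choose to transform. */
static uint64_t sum_popcounts(const uint64_t *data, size_t len) {
    uint64_t total = 0;
    for (size_t i = 0; i < len; i++) {
        total += popcount_naive(data[i]);
    }
    return total;
}
```

Calling `sum_popcounts` on a large buffer from `main`, timing it, and building the same file with clang 19 and clang 20 at the same optimization level should show whether the two compilers diverge for this variant. The exact flags used in the linked benchmark are not repeated here, so match them to the Godbolt setup when comparing.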
