Clang performance regression since 40800 due to GCC 14 tool chain #3036
I ran serially without OpenMP. Same thing: "get properties" is slower using the GCC 14 toolchain (right column).
clang++ --gcc-install-dir=/usr/lib64/gcc/x86_64-generic-linux/13 ...
clang++ --gcc-install-dir=/usr/lib64/gcc/x86_64-generic-linux/14 ...
$ ./llil4map in/big* in/big* in/big* >/dev/null
llil4map (fixed string length=12) start
don't use OpenMP
use boost sort
get properties 228.544 secs 233.647 secs
phmap to vector 0.967 secs 0.964 secs
vector stable sort 22.299 secs 22.317 secs
write stdout 3.306 secs 3.363 secs
total time 255.117 secs 260.292 secs
count lines 970195200 970195200
count unique 200483043 200483043 |
hmm very surprised that gcc has so much impact on llvm
…On Mon, Feb 12, 2024 at 12:16 PM Mario Roy ***@***.***> wrote:
I ran serially without OpenMP. Same thing, "get properties" is slower
using the GCC 14 toolchain (right column).
|
OpenMP was removed in LLVM since CL 39970. To restore clang/clang++ OpenMP functionality, I have a script to install the missing OpenMP bits (headers and libs).
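#!/bin/bash
# Copy the OpenMP bits from a separate LLVM 17 install under /opt/llvm-17
# into the system LLVM 17 locations. Does nothing unless both trees exist.
set -e  # stop at the first failed cd or cp
if [[ -d /opt/llvm-17/lib64/clang/17 && -d /usr/lib64/clang/17 ]]; then
    # CMake package files for OpenMP
    cd /usr/lib64/cmake
    sudo cp -a /opt/llvm-17/lib64/cmake/openmp .
    # OpenMP headers
    cd /usr/lib64/clang/17/include
    sudo cp -a /opt/llvm-17/lib64/clang/17/include/omp*.h .
    # Runtime libraries; libiomp5.so is a compatibility symlink to libomp.so
    cd /usr/lib64
    sudo cp -a /opt/llvm-17/lib64/libarcher.so .
    sudo cp -a /opt/llvm-17/lib64/libompd.so .
    sudo cp -a /opt/llvm-17/lib64/libomp.so .
    sudo ln -sf libomp.so libiomp5.so
fi
That restored my sanity after the CL team removed OpenMP functionality from LLVM :) |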
The source for llil4map.cc resides at https://gist.github.com/marioroy/862fa2fc6aa3b6f523f7a6ef9dd8d157. The parallel chunked "get_properties" routine begins at line 190. Read I/O is serial; otherwise, threads append to the hash map concurrently. On my machine, LLVM with OpenMP functionality is amazing: on CL 40750, get_properties runs 25.17x faster in parallel than serially. |
One can specify which GCC tool chain to use to restore the prior performance.
|
it would be really weird that llvm depends on this but also .. quite interesting to figure out what is the issue |
I tried again. Testing was done on Clear 41120, LTS kernel 6.1.69-1331.ltsprev. I captured data using xcapture (via run_xcapture.sh) from https://0x.tools/. The data suggests more futex locking behind the scenes with the gcc14 toolchain than with gcc13.
clang++ using gcc14 toolchain:
clang++ using gcc13 toolchain:
|
I compared various mutex libraries. The issue is resolved at the application level.
// std::mutex L[NUM_MAPS];
// omp_lock_t L[NUM_MAPS];
// fast_mutex L[NUM_MAPS];
spinlock_mutex L[NUM_MAPS];
Selecting the gcc13 or gcc14 toolchain:
clang++ -o llil4emh \
--gcc-install-dir=/usr/lib64/gcc/x86_64-generic-linux/13 \
-std=c++20 -fopenmp -Wall -O3 llil4emh.cc
clang++ -o llil4emh \
--gcc-install-dir=/usr/lib64/gcc/x86_64-generic-linux/14 \
-std=c++20 -fopenmp -Wall -O3 llil4emh.cc
Testing involves 963 mutexes. Each OpenMP thread picks one based on the hash value % number of maps. The program outputs the time to populate the maps in parallel (get_properties).
There are ~1 billion lines read, of which ~200 million are unique.
|
I have an application that, when recompiled, shows a significant performance regression.
The times in the left and right columns are for CL 40750 and 40800, respectively. The "get properties" phase does parallel I/O (chunking), inserting into an emhash7::HashMap or phmap::parallel_flat_hash_map container. The other aspects of the application have similar times { map container to vector, vector stable sort, and write stdout }.
Were you guys expecting the performance regression with GCC 14? No improvements updating to CL 40830.
Selecting GCC 13 by passing --gcc-install-dir=/usr/lib64/gcc/x86_64-generic-linux/13 to clang++ resolves the issue.