gcc target_clones for math-heavy packages #522

Artoria2e5 · 2016-12-11T22:51:26Z

Modern instruction sets often provide significant boosts to math performance. Some time ago I proposed a /usr/lib/<march>/ structure with ld.conf, and recently Bai proposed /usr/<march>/ prefixes for Core 5, as seen on Clear Linux and Solus. Both of these solutions make shipping and handling packages harder. Is there an easier solution? Look, we can't use ICC.

Back then I looked into GCC 4.8+'s function multi-versioning (FMV), but it requires duplicating the functions manually and only works for C++. It turns out that GCC 6 can not only make clones and dispatch automatically, but also does so for C. And gcc even has a flag that gives suggestions for automatically tagging functions with FMV. With everything easier now, we should make an attempt at getting all of the performance bonus.

- Align make-fmv-patch to current autobuild3 practices.
- amd64: target_clones("default", "arch=ivybridge")
- Either create an autobuild3 pass for obtaining vec info like in PGO, or modify make-fmv-patch so it accepts some path:funcname form for predetermined patching.
- Use -O3 for mathy packages.
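
For illustration, here is a minimal sketch of what a function tagged with the amd64 targets listed above could look like. The function `dot` is a hypothetical example, not taken from any package discussed here, and the sketch assumes GCC 6 or later on x86-64 Linux. The compiler emits one clone per listed target plus a resolver that picks a clone at load time.

```c
#include <stddef.h>

/* Hypothetical math kernel. GCC builds a "default" clone and an
   "arch=ivybridge" clone, then dispatches between them automatically. */
__attribute__((target_clones("default", "arch=ivybridge")))
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Callers simply call `dot(a, b, n)`; no source changes are needed beyond the attribute, which is what makes a patch-generating tool plausible.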

MingcongBai · 2016-12-11T22:57:47Z

This is very good information indeed, and even better if we could work out a plan with specific changes. Currently I don't quite have the chance (time allowance) to participate in this change - would you be so kind as to go ahead and implement some of the changes so I could help with testing (glibc would be a very good place to start)?

Artoria2e5 · 2016-12-11T23:01:29Z

According to the LWN article, glibc already manually uses similar dispatch techniques for things like memcpy(), so I doubt the efficacy of doing so. I guess I will start with mpfr or gmp instead.
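
As a rough idea of what such manual dispatch looks like, the sketch below uses the GNU IFUNC attribute, which is the mechanism glibc relies on for this kind of selection. This is not glibc's code: `sum_ints`, `sum_baseline`, `sum_avx2` and `resolve_sum` are hypothetical names, and the sketch assumes GCC on x86-64 Linux.

```c
#include <stddef.h>

/* Baseline implementation, usable on any x86-64 CPU. */
static long sum_baseline(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same loop, but the compiler is allowed to use AVX2 instructions here. */
__attribute__((target("avx2")))
static long sum_avx2(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Resolver: called by the dynamic loader when the symbol is bound.
   It returns the implementation that every later call will use. */
static long (*resolve_sum(void))(const int *, size_t)
{
    __builtin_cpu_init();  /* required before __builtin_cpu_supports in a resolver */
    return __builtin_cpu_supports("avx2") ? sum_avx2 : sum_baseline;
}

/* Public symbol whose address is chosen by resolve_sum. */
long sum_ints(const int *v, size_t n) __attribute__((ifunc("resolve_sum")));
```

Each variant and the resolver are written by hand here, which is the duplication that `target_clones` is meant to remove.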

MingcongBai · 2016-12-11T23:02:59Z

Also for the record, Clear Linux and Solus only used /usr/lib/<march> specifically for AVX2 optimizations, and only applied it to glibc, nothing else.

My plan is to use /usr/<march> because this suggests a further generalization of this "optimization split/fork" - and only bin and lib would be included in the split/fork. That said, I would be interested to know whether any project varies the contents of its headers (include) based on optimization or instruction set availability.

Artoria2e5 · 2016-12-11T23:04:09Z

Hmm. I will try glibc first then.

So far I need a gcc wrapper that collects -fopt-info-vec for me. (gcc overwrites the file every single invocation.)
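
One possible shape for such a wrapper, sketched under assumptions: it gives each compiler invocation its own `-fopt-info-vec=<file>` target so that nothing is overwritten. The helper names, the `vec.<pid>.log` naming scheme and the log directory are all hypothetical, and PIDs can be reused during a long build, so a real tool would likely want a stronger unique suffix.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: format a per-process -fopt-info-vec option.
   Returns what snprintf returns (the length needed, excluding the NUL). */
int vec_log_option(char *buf, size_t len, const char *log_dir, long pid)
{
    return snprintf(buf, len, "-fopt-info-vec=%s/vec.%ld.log", log_dir, pid);
}

/* Hypothetical wrapper body: append the option to the original arguments
   and exec the real compiler. Returns only if something went wrong. */
int exec_gcc_with_vec_log(int argc, char **argv, const char *log_dir)
{
    char opt[4096];
    int n = vec_log_option(opt, sizeof opt, log_dir, (long)getpid());
    if (argc < 1 || n < 0 || (size_t)n >= sizeof opt)
        return 1;

    char **args = calloc((size_t)argc + 2, sizeof *args);
    if (!args)
        return 1;
    args[0] = "gcc";
    for (int i = 1; i < argc; i++)
        args[i] = argv[i];      /* pass the caller's arguments through */
    args[argc] = opt;           /* extra option goes last */
    args[argc + 1] = NULL;

    /* Assumes "gcc" on PATH is the real compiler, not this wrapper. */
    execvp("gcc", args);
    perror("execvp");
    free(args);
    return 127;
}
```

A wrapper binary would just call `exec_gcc_with_vec_log(argc, argv, dir)` from its `main`, and the per-invocation logs could be concatenated after the build.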

MingcongBai · 2016-12-11T23:04:33Z

> According to the LWN article, glibc already manually uses similar dispatch techniques for things like memcpy(), so I doubt the efficacy of doing so.

Hmm, but that is probably the only package Solus and Intel did the trick on, and they have included a patch specifically for AVX2-enabled processors.

> I guess I will start with mpfr or gmp instead.

But this could be a very good place to start as well.

ikeydoherty · 2016-12-13T18:43:32Z

Randomly came across this on Google, so I'll chip in.

The glibc patch modifies the dynamic loader to load the optimized AVX2 libraries from /usr/$LIB/avx2 if the CPU supports it. Thus any package that ships ET_DYN files in that directory will have those files loaded only in the presence of an AVX2-capable CPU. This allows a dual build strategy where the "normal" libraries are shipped, and capable computers automatically leverage the new instructions.
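
The lookup order described here can be sketched in user-space C as follows. This is only an emulation of the idea, not the actual glibc patch: `cpu_has_avx2` and `pick_library` are hypothetical helpers, and the sketch assumes GCC or Clang on x86.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns nonzero if the running CPU reports AVX2 (compiler builtin, x86 only). */
int cpu_has_avx2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
}

/* Hypothetical helper: write the path that should be loaded into buf.
   Returns 1 if the avx2 copy was chosen, 0 for the normal copy, -1 on error. */
int pick_library(char *buf, size_t len, const char *libdir,
                 const char *soname, int has_avx2)
{
    int n;

    if (has_avx2) {
        n = snprintf(buf, len, "%s/avx2/%s", libdir, soname);
        if (n < 0 || (size_t)n >= len)
            return -1;
        if (access(buf, R_OK) == 0)
            return 1;           /* an optimized copy is shipped, use it */
    }

    /* No AVX2 support, or no optimized copy: fall back to the normal library. */
    n = snprintf(buf, len, "%s/%s", libdir, soname);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

A call such as `pick_library(buf, sizeof buf, "/usr/lib", "libfoo.so", cpu_has_avx2())` (with `libfoo.so` as a placeholder name) mirrors the behaviour described: packages that ship nothing under the avx2 directory are unaffected.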

In terms of build system integration, it's a case of doing the build with AVX2 enabled first (-mavx2 in combination with the existing -O3-like flags), installing, and purging non-lib files from the avx2 dir before the next stage of the install.

While it might sound convoluted, you could perhaps look at how I integrated it into ypkg, the Solus build system:

https://github.com/solus-project/ypkg/blob/master/ypkg2/main.py#L225
https://github.com/solus-project/ypkg/blob/master/ypkg2/ypkgcontext.py#L20
https://github.com/solus-project/ypkg/blob/master/ypkg2/ypkgcontext.py#L275

MingcongBai · 2017-05-26T08:27:22Z

A package-manager-based optimization overlay system will be introduced with Core 5, so this proposal is now scrapped. See the discussions list for more.

MingcongBai assigned MingcongBai and Artoria2e5 Dec 11, 2016

MingcongBai added the discussion-needed, enhancement, investigating, and question labels Dec 11, 2016

MingcongBai closed this as completed May 26, 2017
