gcc target_clones for math-heavy packages #522

Closed
3 tasks
Artoria2e5 opened this issue Dec 11, 2016 · 7 comments

@Artoria2e5
Member

Artoria2e5 commented Dec 11, 2016

Modern instruction sets often provide significant boosts to math performance. Some time ago I proposed a /usr/lib/<march>/ structure with ld.conf, and recently Bai proposed /usr/<march>/ prefixes for Core 5, similar to what Clear Linux and Solus do. Both of these solutions make shipping and handling packages harder. Is there an easier solution? Look, we can't use ICC.

Back then I looked into GCC 4.8+'s function multi-versioning (FMV), but it requires duplicating the functions manually and only works for C++. With GCC 6, it turns out that it can not only make clones and dispatch automatically, but also works for C. GCC even has a flag that gives suggestions for automatically tagging functions with FMV. With everything easier now, we should make an attempt at getting all of the performance bonus.

  • Align make-fmv-patch to current autobuild3 practices.
    • amd64: target_clones("default", "arch=ivybridge")
  • Either create an autobuild3 pass for obtaining vec info like in PGO, or modify make-fmv-patch so it accepts some path:funcname form for predetermined patching.
  • Use -O3 for mathy packages.
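
For illustration, here is a minimal sketch of what a tagged function might look like. The `saxpy` function and the `MATH_CLONES` macro are hypothetical names, not something make-fmv-patch emits, and the guard assumes GCC 6+ on x86-64 Linux with glibc (the clones are dispatched through ifunc). On other toolchains the macro expands to nothing and the function builds as usual.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: target_clones needs GCC 6+ and ifunc support (glibc on Linux).
 * Anywhere else, fall back to a single plain build of the function. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && \
    defined(__linux__) && defined(__GLIBC__)
#define MATH_CLONES __attribute__((target_clones("default", "arch=ivybridge")))
#else
#define MATH_CLONES
#endif

/* Hypothetical hot loop: y[i] += a * x[i].
 * GCC compiles one copy per listed target and picks one at load time. */
MATH_CLONES
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers just call `saxpy()` as usual. The source stays single-copy, which is the main difference from the GCC 4.8 style of FMV.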
@MingcongBai
Member

This is very good information indeed, and even better if we could work out a plan with specific changes. Currently I don't quite have the time to participate in this change. Would you be so kind as to go ahead and implement some of the changes so I could help with testing (glibc would be a very good place to start)?

@MingcongBai MingcongBai added discussion-needed Further discussion needed enhancement Topic/issue involves an AOSC OS enhancement investigating Issue currently under investigation question Question or suggestions needed labels Dec 11, 2016
@Artoria2e5
Member Author

According to the LWN article, glibc already manually uses similar dispatch techniques for things like memcpy(), so I doubt the efficacy of doing so. I guess I will start with mpfr or gmp instead.
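
To make "manual dispatch" concrete, here is a rough sketch of the hand-written pattern. This is a simplification and not glibc's code: glibc resolves its variants through IFUNC, while this sketch uses a lazily set function pointer. All function names are made up for the example, and it assumes GCC or Clang on x86 for `__builtin_cpu_supports`.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline variant, built with the default flags. */
static double dot_generic(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

#if HAVE_X86_DISPATCH
/* Same source, but the compiler may use AVX2 instructions in this copy. */
__attribute__((target("avx2")))
static double dot_avx2(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Choose a variant based on what the running CPU reports. */
static dot_fn pick_dot(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return dot_avx2;
#endif
    return dot_generic;
}

/* Public entry point: resolves on first call (not thread-safe as written). */
double dot(const double *a, const double *b, size_t n)
{
    static dot_fn impl;
    assert(n == 0 || (a != NULL && b != NULL));
    if (impl == NULL)
        impl = pick_dot();
    return impl(a, b, n);
}
```

The duplicated function body is exactly the part that target_clones generates automatically.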

@MingcongBai
Member

Also, for the record, Clear Linux and Solus used /usr/lib/<march> only for AVX2 optimizations, and applied it only to glibc, nothing else.

My plan is to use /usr/<march> because this suggests a further generalization of this "optimization split/fork", and only bin and lib would be included in the split/fork. Having said that, I would be interested to know whether any project varies the contents of its headers/include based on optimization/instruction set availability.

@Artoria2e5
Member Author

Artoria2e5 commented Dec 11, 2016

Hmm. I will try glibc first then.

So far I need a gcc wrapper that collects the -fopt-info-vec output for me. (gcc overwrites the file on every single invocation.)
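
A rough sketch of how such a wrapper could work, assuming it is enough to give every compiler invocation its own log file via `-fopt-info-vec=FILE`. The helper names, the `vec-<pid>.log` naming scheme, and the log directory are all hypothetical. PIDs can be reused over a long build, so a real wrapper would probably want a stronger unique name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build "-fopt-info-vec=<dir>/vec-<pid>.log" into buf.
 * Returns 0 on success, -1 if dir is empty or the result was truncated. */
int format_vec_flag(char *buf, size_t len, const char *dir, long pid)
{
    assert(buf != NULL && dir != NULL);
    if (strlen(dir) == 0)
        return -1;
    int n = snprintf(buf, len, "-fopt-info-vec=%s/vec-%ld.log", dir, pid);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Meant to be called from the wrapper's main(argc, argv): re-executes the
 * real compiler with the original arguments plus one extra flag.
 * Only returns (with -1) if something failed. */
int exec_real_gcc(int argc, char **argv, const char *real_gcc, const char *dir)
{
    char flag[4096];
    if (argc < 1 || format_vec_flag(flag, sizeof flag, dir, (long)getpid()) != 0)
        return -1;

    /* Slots: argv[0], the original arguments, the new flag, NULL terminator. */
    char **new_argv = calloc((size_t)argc + 2, sizeof *new_argv);
    if (new_argv == NULL)
        return -1;
    new_argv[0] = (char *)real_gcc;
    for (int i = 1; i < argc; i++)
        new_argv[i] = argv[i];
    new_argv[argc] = flag;
    new_argv[argc + 1] = NULL;

    execvp(real_gcc, new_argv); /* replaces this process on success */
    free(new_argv);
    return -1;
}
```

The per-invocation logs can then be concatenated after the build and fed to whatever does the patching.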

@MingcongBai
Member

> According to the LWN article, glibc already manually uses similar dispatch techniques for things like memcpy(), so I doubt the efficacy of doing so.

Hmm, that's probably the only package Solus and Intel did the trick on, though they did include a patch specifically for AVX2-enabled processors.

> I guess I will start with mpfr or gmp instead.

But this could be a very good place to start as well.

@ikeydoherty

ikeydoherty commented Dec 13, 2016

Randomly came across this on Google, so I'll chip in.

The glibc patch modifies the dynamic loader to load the optimized AVX2 libraries from /usr/$LIB/avx2 if the CPU supports it. Thus any package that ships ET_DYN files in that directory will have those files loaded only in the presence of an AVX2-capable CPU. This allows a dual build strategy where the "normal" libraries are shipped, and capable computers automatically leverage the new instructions.
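
As a toy model of that lookup (this is not the actual glibc patch, just the idea written out in C), the loader's decision can be pictured like this. The function name and return codes are invented for the example, and the CPU check is passed in as a plain flag.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the path the toy "loader" would use for soname into buf.
 * Returns 1 if the avx2 copy was chosen, 0 for the normal copy,
 * -1 on bad input or truncation. */
int resolve_library(char *buf, size_t len, const char *libdir,
                    const char *soname, int cpu_has_avx2)
{
    assert(buf != NULL && libdir != NULL && soname != NULL);
    if (strchr(soname, '/') != NULL)
        return -1; /* expect a bare soname, not a path */

    if (cpu_has_avx2) {
        int n = snprintf(buf, len, "%s/avx2/%s", libdir, soname);
        /* Use the optimized copy only if it actually exists and is readable. */
        if (n >= 0 && (size_t)n < len && access(buf, R_OK) == 0)
            return 1;
    }

    /* Otherwise fall back to the "normal" library. */
    int n = snprintf(buf, len, "%s/%s", libdir, soname);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With GCC or Clang on x86, the flag could come from `__builtin_cpu_supports("avx2")`. Machines without AVX2, and packages that ship no avx2 copy, both end up on the normal path.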

In terms of build system integration, it's a case of doing the build with AVX2 enabled first (-mavx2 in combination with existing -O3-like flags), installing, and purging non-lib files from the avx2 dir before the next stage of the install.

While it might sound convoluted, you could perhaps look at how I integrated it into ypkg, the Solus build system:

https://github.com/solus-project/ypkg/blob/master/ypkg2/main.py#L225
https://github.com/solus-project/ypkg/blob/master/ypkg2/ypkgcontext.py#L20
https://github.com/solus-project/ypkg/blob/master/ypkg2/ypkgcontext.py#L275

@MingcongBai
Member

A package-manager-based optimization overlay system will be introduced with Core 5, so this is now scrapped. See the discussions list for more.
