Speed ups from compiling with specific arch #1799
Comments
This is surprisingly difficult to do. `import platform; platform.platform()`
can tell you if your system feels like it; parsing `/proc/cpuinfo` can as
well, again if your system feels like it. On my lab desktop, `/proc/cpuinfo`
tells me that I have an i7-3820, but not that it's a Sandy Bridge chip. On my
MacBook, `/usr/sbin/sysctl -e machdep.cpu` gives me a bunch of numerical codes
for model, family, etc., which can probably be translated but aren't
informative on their own.
The best option is probably to let users pass in their own `-march`, but I
don't know if that can be done with `pip` either...
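To illustrate the `/proc/cpuinfo` point, here is a minimal Python sketch. It assumes the x86 Linux layout with `model name` and `flags` keys; the sample text is abbreviated and made up for illustration, and `parse_cpuinfo` is a hypothetical helper, not something in khmer. It shows that the model string carries no microarchitecture name, while the feature flags are at least readable.

```python
# Abbreviated, illustrative sample in the x86 Linux /proc/cpuinfo layout.
SAMPLE_CPUINFO = (
    "processor\t: 0\n"
    "model name\t: Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz\n"
    "flags\t\t: fpu sse2 ssse3 sse4_2 popcnt avx\n"
)


def parse_cpuinfo(text):
    """Return (model_name, flags) taken from the first processor entry.

    Hypothetical helper: only understands the "key : value" layout used
    on x86 Linux. Returns (None, set()) if the keys are absent.
    """
    model = None
    flags = set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between processor entries
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return model, flags


model, flags = parse_cpuinfo(SAMPLE_CPUINFO)
print(model)                                # marketing name, no codename
print("popcnt" in flags, "avx2" in flags)   # feature flags are available
```

On a real Linux machine one would pass the contents of `/proc/cpuinfo` instead of the sample; other platforms expose this information differently, which is the difficulty described above.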
On Thu, Oct 12, 2017 at 1:07 AM, Tim Head wrote:
We should discuss how best to deal with the fact that compilers are
getting smarter but you need to tell them what arch you are working with.
For example https://godbolt.org/g/8EyZEJ counts the number of set bits,
which on a Haswell (any not-very-old Intel CPU) or newer results in a
single instruction specifically made for this. Remove the `-march=haswell`
to see the long form.
On my desktop compiling khmer with `-march=skylake` brings a few percent of
speed up.
Not sure what the recommended arch is for binaries distributed via PyPI
but I'd bet it isn't `-march=haswell`. So we can't just put it into `setup.py`.
Credit for making me think about this:
https://www.youtube.com/watch?v=bSkpMdDe4g4 also mentions various other tricks.
--
Camille Scott
Graduate Group for Computer Science
Lab for Data Intensive Biology
University of California, Davis
camille.scott.w@gmail.com
|
Doesn't solve the question of what arch we should use when building binaries for others to use.
You can try following what this library does:
https://github.com/kimwalisch/libpopcnt
(it detects at runtime what is available). I'm not sure how well the
approach scales to more instructions, or even whether it is a good idea
(since we want to let the compiler take care of it), but I thought it was
worth mentioning here.
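The runtime-detection idea can be sketched in Python as a dispatch that is decided once from a set of CPU feature flags. This only illustrates the pattern and is not how libpopcnt is implemented; the function names are hypothetical, and `popcount_fast` is merely a stand-in for a variant that would be compiled to use the hardware instruction.

```python
def popcount_portable(x):
    """Portable "long form" for non-negative ints: clear the lowest set bit per step."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def popcount_fast(x):
    """Stand-in for a build that uses the dedicated instruction (non-negative ints)."""
    return bin(x).count("1")


def select_popcount(cpu_flags):
    """Pick an implementation once, based on the detected feature flags."""
    return popcount_fast if "popcnt" in cpu_flags else popcount_portable


# The flag sets here are assumed inputs, e.g. obtained from /proc/cpuinfo.
popcount = select_popcount({"sse2", "popcnt"})
fallback = select_popcount({"sse2"})
print(popcount(0b1011), fallback(0b1011))
```

Each additional instruction would need its own pair of implementations and its own check, which is the scalability concern raised above.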
On Fri, Oct 20, 2017 at 12:08 PM, Tim Head wrote:
`-march=native` seems to do the right thing when testing on my laptop
(super old, no Haswell) and my Linux desktop.
Doesn't solve the question of what arch we should use when building
binaries for others to use.
|
Can … This is getting into the realm of hairy, limited-shelf-life solutions, admittedly.
Right now I think …
We should discuss how best to deal with the fact that compilers are getting smarter but you need to tell them what arch you are working with. For example https://godbolt.org/g/8EyZEJ counts the number of set bits which on a haswell (any not very old intel CPU) or newer results in a single instruction specifically made for this. Remove the `-march=haswell` to see the long form.
On my desktop compiling khmer with `-march=skylake` brings a few percent of speed up.
Not sure what the recommended arch is for binaries distributed via PyPI but I'd bet it isn't `-march=haswell`. So we can't just put it into `setup.py`.
Credit for making me think about this: https://www.youtube.com/watch?v=bSkpMdDe4g4 also mentions various other tricks.
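One possible middle ground is to keep the default build generic and make the arch flag opt-in for people building from source. Below is a minimal sketch, assuming a made-up environment variable `KHMER_MARCH` and a hypothetical helper `arch_compile_args`; neither exists in khmer. As far as I know, setuptools builds on Unix also pick up `CFLAGS` from the environment, so `CFLAGS="-march=native" pip install ...` may already work for source installs, but treat that as something to verify.

```python
import os
import re

# Hypothetical opt-in variable; not an existing khmer or pip setting.
MARCH_ENV_VAR = "KHMER_MARCH"

# Accept values such as "native", "haswell" or "x86-64"; reject anything
# containing spaces or shell metacharacters.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.+-]+")


def arch_compile_args(environ=os.environ):
    """Return extra compiler flags requested through the environment.

    An unset or empty variable means "no arch flag", which keeps builds
    meant for redistribution generic.
    """
    value = environ.get(MARCH_ENV_VAR, "").strip()
    if not value:
        return []
    if not _SAFE_VALUE.fullmatch(value):
        raise ValueError("unexpected %s value: %r" % (MARCH_ENV_VAR, value))
    return ["-march=" + value]


# In setup.py this could feed the extension definition, for example:
#   Extension(..., extra_compile_args=arch_compile_args())
print(arch_compile_args({MARCH_ENV_VAR: "native"}))
```

With this approach, binaries built for distribution never see an arch flag unless the builder sets the variable explicitly, so the question of which baseline arch to use for PyPI binaries stays separate from local source builds.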