Speed ups from compiling with specific arch #1799

Open
betatim opened this issue Oct 12, 2017 · 5 comments

@betatim
Member

betatim commented Oct 12, 2017

We should discuss how best to deal with the fact that compilers are getting smarter, but only if you tell them which arch you are targeting. For example, https://godbolt.org/g/8EyZEJ counts the number of set bits, which on Haswell (any not-very-old Intel CPU) or newer compiles to a single instruction made specifically for this (popcnt). Remove the -march=haswell to see the long form.

On my desktop, compiling khmer with -march=skylake gives a speedup of a few percent.

I'm not sure what the recommended arch is for binaries distributed via PyPI, but I'd bet it isn't -march=haswell, so we can't just put it into setup.py.

Credit for making me think about this goes to https://www.youtube.com/watch?v=bSkpMdDe4g4, which also mentions various other tricks.

@camillescott
Member

camillescott commented Oct 18, 2017 via email

@betatim
Member Author

betatim commented Oct 20, 2017

-march=native seems to do the right thing when testing on my laptop (very old, pre-Haswell) and on my Linux desktop.

That doesn't answer the question of which arch we should use when building binaries for others to use.

@luizirber
Member

luizirber commented Oct 31, 2017 via email

@standage
Member

Can setup.py execute some minimal code at compile time to detect the architecture and adjust options accordingly?

This is getting into the realm of hairy, limited-shelf-life solutions, admittedly.
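
One way the question above could be approached is a minimal sketch in which setup.py trial-compiles a trivial program to see whether the local compiler accepts a flag before using it. The helper names `compiler_accepts` and `extra_compile_args`, and the default compiler name `cc`, are assumptions for illustration and are not part of khmer's setup.py.

```python
import os
import subprocess
import tempfile


def compiler_accepts(flag, cc="cc"):
    """Return True if `cc` compiles a trivial C program with `flag`.

    Hypothetical helper for illustration only.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        with open(src, "w") as fh:
            fh.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [cc, flag, "-c", src, "-o", os.path.join(tmp, "probe.o")],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except OSError:
            # Compiler missing or not executable: treat the flag as unsupported.
            return False
        return result.returncode == 0


def extra_compile_args(cc="cc"):
    """Return -march=native only if the local compiler understands it."""
    return ["-march=native"] if compiler_accepts("-march=native", cc) else []
```

The resulting list could then be passed as `extra_compile_args` to a setuptools `Extension`. This only checks that the compiler accepts the flag; it does not decide whether the resulting binary is safe to ship to other machines.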

@betatim
Member Author

betatim commented Nov 1, 2017

Right now I think -march=native would be good enough for most people (the speedups seem to be small anyway, so it's probably not worth adding too much magic). We could add a few if statements to setup.py that detect when we are building wheels/binaries for distribution, and in that case turn the flag off or set it to whatever the recommended arch is.
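
The if statements described above might look roughly like the sketch below. It assumes a hypothetical environment variable `KHMER_MARCH` as an explicit override, and treats the presence of `bdist_wheel` on the command line as "building for distribution"; neither exists in khmer today. Since the thread leaves the recommended arch for distributed binaries open, the sketch simply omits the flag in that case.

```python
import os
import sys


def march_flags(argv=None, environ=None):
    """Choose -march flags for a build (hypothetical helper).

    KHMER_MARCH is an assumed, made-up override:
      unset        -> decide automatically
      empty string -> no -march flag at all
      any value    -> passed through as -march=<value>
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    override = environ.get("KHMER_MARCH")
    if override is not None:
        return ["-march=" + override] if override else []

    # Building a wheel: assume it may be distributed, so stay portable.
    if "bdist_wheel" in argv:
        return []

    # Local build: let the compiler target the machine it is running on.
    return ["-march=native"]
```

One caveat with this design: pip typically builds a wheel locally when installing from source, so checking for `bdist_wheel` alone would also switch the flag off for those users. That is why the sketch includes an explicit override rather than relying on detection only.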
