Is this version compiled with SSE4, AVX etc. #34

Closed
jhmenke opened this issue Jun 1, 2017 · 7 comments

Comments

@jhmenke

jhmenke commented Jun 1, 2017

Hello,

When I install tensorflow via "conda install tensorflow", running scripts with it displays several warnings about possible optimizations. These would speed up tensorflow significantly and, I think, are supported by many modern CPUs.

Does this version have SSE etc. enabled?

@jakirkham
Member

We do not compile this package from source currently. We repackage Google's wheels on PyPI. So you would have to ask them how they build it.

@jhmenke
Author

jhmenke commented Jun 1, 2017

For anyone wondering: the PyPI package is built without these optimizations (as of right now).

@jakirkham
Member

Thanks for the info.

We have attempted to build this from source and likely will again. That said, we still need to come up with some acceptable set of assembly instructions that will run on a range of hardware. Unless Tensorflow has some way of determining what the hardware can support at runtime, this will likely mean not having every optimization enabled. Though there will at least be a recipe that one can tweak and room for discussing how better to handle additional instructions in a reasonable way.

xref: #12
xref: conda-forge/conda-forge.github.io#49

@pkgw
Contributor

pkgw commented May 17, 2018

I just ran into a problem relating to this. On one of my machines, import tensorflow immediately kills my Python process due to an illegal instruction — it seems that the conda-forge packages use AVX instructions, which my particular machine doesn't support (cf here, here, although few commenters actually seem interested in identifying the actual cause of the problem ...).

Update: A snippet from running on another machine; seems that TensorFlow has some kind of CPU capability detection but it isn't actually wired up to anything yet:

[...tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Update 2: I'll also add that one of the big HPC clusters I use has a bunch of nodes that do not support AVX (I even wrote a tool to figure out which of my binaries were killing things) so the use of AVX here is a substantial pain point. On the other hand, I totally see how we want to use fancy CPU instructions if available ... hard to see how to balance things without someone doing the difficult work of generating code that can choose the right implementation at runtime.
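
As a rough illustration of the kind of per-node check described above, here is a minimal sketch that reads the CPU feature flags on a Linux x86 machine and reports which of the instruction sets mentioned in this thread are missing. It is not the tool referred to in the comment, and it does not inspect any binary; the function names and the list of features are assumptions chosen for the example. The flag spellings (`sse4_1`, `sse4_2`, `avx`, `avx2`, `fma`) are the ones Linux prints in `/proc/cpuinfo`.

```python
from pathlib import Path

# Flag names as they appear on the "flags" line of /proc/cpuinfo (Linux, x86).
# This particular selection is an assumption made for the example.
FEATURES_OF_INTEREST = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(flags, required=FEATURES_OF_INTEREST):
    """List the required features that are absent from `flags`, in order."""
    return [name for name in required if name not in flags]


def read_local_flags(path="/proc/cpuinfo"):
    """Read the flags of the current machine; empty set if the file is absent."""
    cpuinfo = Path(path)
    return parse_cpu_flags(cpuinfo.read_text()) if cpuinfo.exists() else set()


if __name__ == "__main__":
    flags = read_local_flags()
    if not flags:
        print("Could not read CPU flags (this sketch only handles Linux x86).")
    else:
        missing = missing_features(flags)
        print("Missing features:", ", ".join(missing) if missing else "none")
```

Running this on each node type of a cluster would show which nodes lack AVX before any job is submitted to them.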

@jjhelmus
Collaborator

Tensorflow does not have dynamic code paths based on CPU capabilities. Whatever target micro-architecture is selected at build time is the minimum CPU version required at run time. Starting with the 1.5.1 release, the wheels available on PyPI use AVX instructions. We are re-packaging the wheels for the conda package so these have the same AVX requirements.

If you need conda packages that do not require AVX, the conda packages in defaults are built from source and should work on nearly all x86_64 CPUs.

@pkgw
Contributor

pkgw commented May 17, 2018

@jjhelmus Ah, good to know that the defaults packages don't require AVX.

@dailybudushu

But I can't tell which packages require AVX.
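
One empirical way to approach this question, on a machine that lacks AVX, is to import the package in a child process and see whether that process is killed by an illegal-instruction signal. The sketch below assumes a POSIX system, where `subprocess` reports death by signal N as a return code of -N. The helper names `run_snippet`, `classify`, and `probe_import` are hypothetical. It only tells you that some unsupported instruction was hit on this machine, not which instruction set was responsible.

```python
import signal
import subprocess
import sys


def run_snippet(code, timeout=120):
    """Run Python code in a child interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode


def classify(returncode):
    """Turn a child exit status into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    # On POSIX, a return code of -N means the child was killed by signal N.
    if returncode == -signal.SIGILL:
        return "illegal instruction"
    return "failed for another reason"


def probe_import(module_name):
    """Import `module_name` in a child process so a crash cannot kill the caller."""
    return classify(run_snippet(f"import {module_name}"))


if __name__ == "__main__":
    # "tensorflow" is the package discussed in this thread; any module name works.
    print(probe_import("tensorflow"))
```

A result of "illegal instruction" on a CPU without AVX is consistent with the package requiring AVX (or another unsupported extension), while "ok" on the same machine suggests the build does not depend on it for import.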


5 participants