Is this version compiled with SSE4, AVX etc. #34

jhmenke · 2017-06-01T14:45:23Z

Hello,

when I install tensorflow via "conda install tensorflow", scripts that use it display several warnings about possible optimizations. These would speed up TensorFlow significantly, and I think they are supported by many modern CPUs.

Does this version have SSE etc. enabled?

jakirkham · 2017-06-01T14:56:14Z

We do not compile this package from source currently. We repackage Google's wheels on PyPI. So you would have to ask them how they build it.

jhmenke · 2017-06-01T14:57:26Z

For anyone wondering: the PyPI package is built without these optimizations (as of right now).

jakirkham · 2017-06-01T15:26:22Z

Thanks for the info.

We have attempted to build this from source and likely will again. That said, we still need to come up with some acceptable set of assembly instructions that will run on a range of hardware. Unless Tensorflow has some way of determining what the hardware can support at runtime, this will likely mean not having every optimization enabled. Though there will at least be a recipe that one can tweak and room for discussing how better to handle additional instructions in a reasonable way.

xref: #12
xref: conda-forge/conda-forge.github.io#49

pkgw · 2018-05-17T15:57:37Z

I just ran into a problem relating to this. On one of my machines, import tensorflow immediately kills my Python process due to an illegal instruction — it seems that the conda-forge packages use AVX instructions, which my particular machine doesn't support (cf here, here, although few commenters actually seem interested in identifying the actual cause of the problem ...).

Update: A snippet from running on another machine; seems that TensorFlow has some kind of CPU capability detection but it isn't actually wired up to anything yet:

[...tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
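To collect these warnings across several machines, a small parser can pull the instruction-set names out of the message. This is only a sketch: it assumes the message wording matches the line quoted above, and `unused_cpu_features` is a hypothetical helper name, not part of TensorFlow.

```python
import re

# Matches the wording of the cpu_feature_guard message quoted above.
# Assumption: other TensorFlow versions may phrase this differently.
_GUARD_RE = re.compile(
    r"Your CPU supports instructions that this TensorFlow binary "
    r"was not compiled to use:\s*(?P<features>.+)$"
)


def unused_cpu_features(log_line):
    """Return the instruction-set names listed in one warning line.

    Returns an empty list when the line is not a cpu_feature_guard warning.
    """
    match = _GUARD_RE.search(log_line.strip())
    if match is None:
        return []
    return match.group("features").split()


line = (
    "[...tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports "
    "instructions that this TensorFlow binary was not compiled to use: AVX2 FMA"
)
print(unused_cpu_features(line))
```

Note that this only reports features the CPU has and the binary does not use; it says nothing about instructions the binary requires and the CPU lacks, which is the case that causes the crash.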

Update 2: I'll also add that one of the big HPC clusters I use has a bunch of nodes that do not support AVX (I even wrote a tool to figure out which of my binaries were killing things) so the use of AVX here is a substantial pain point. On the other hand, I totally see how we want to use fancy CPU instructions if available ... hard to see how to balance things without someone doing the difficult work of generating code that can choose the right implementation at runtime.

jjhelmus · 2018-05-17T16:41:51Z

TensorFlow does not have dynamic code paths based on CPU capabilities. Whatever target micro-architecture is selected at build time is the minimum CPU required at run time. Starting with the 1.5.1 release, the wheels available on PyPI use AVX instructions. We are re-packaging those wheels for the conda package, so it has the same AVX requirement.

If you need conda packages which do not require AVX, the packages in defaults are built from source and should work on nearly all x86_64 CPUs.
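Since the requirement is fixed at build time, one way to avoid the illegal-instruction crash is to check the CPU before importing the package. The following is a minimal sketch, assuming an x86 Linux machine where `/proc/cpuinfo` contains a `flags` line; `parse_cpu_flags` and `cpu_supports` are hypothetical helper names, not part of TensorFlow or conda.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text.

    Only the first "flags" line is read; on x86 Linux every core normally
    reports the same flags. Returns an empty set if no such line exists.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_supports(feature, cpuinfo_path="/proc/cpuinfo"):
    """Report whether the CPU advertises `feature`, e.g. "avx" (Linux only)."""
    with open(cpuinfo_path) as handle:
        return feature in parse_cpu_flags(handle.read())
```

A script could call `cpu_supports("avx")` first and only run `import tensorflow` when it returns `True`, printing a readable error otherwise instead of letting the process be killed.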

pkgw · 2018-05-17T17:32:21Z

@jjhelmus Ah, good to know that the defaults packages don't require AVX.

dailybudushu · 2019-01-20T05:08:53Z

But I can't tell which packages require AVX.

jakirkham closed this as completed Jun 1, 2017
