archspec-enabled packages #1261

jaimergp opened this issue Feb 16, 2021 · 26 comments

@jaimergp
Member

jaimergp commented Feb 16, 2021

Comes from [1], [2], [3].


Building package variants for different instruction sets would be helpful for the community: for example, to support AVX on those CPUs that have it, while gracefully falling back to non-AVX variants on other CPUs (e.g. Atom). The current recommendation is to not build with AVX unless upstream handles the missing instructions at runtime.

conda/conda#9930 exposed part of archspec as a virtual package, __archspec, which can be inspected with conda info:

...
          conda version : 4.9.2
    conda-build version : 3.19.1
         python version : 3.7.6.final.0
       virtual packages : __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /opt/miniconda  (read only)
...

However, there's no way to leverage this information as a maintainer. What should we do?

  • Add run_constrained lines, the same way we deal with sysroots and glibc? I don't think __archspec alone provides enough information right now: how would a maintainer know which instruction sets a given microarchitecture supports?
  • @isuruf suggested a cpu_feature metapackage that could restrict this in a better way, with as many variants as there are architectures, I presume? This might put an additional burden on the maintainer, who would need to check which architectures support which instructions.

Is there a better way?

Idea 1

A Jinja function that translates instruction sets to an __archspec selection query:

run_constrained:
  - {{ cpu_feature('avx') }}  # this could be a logical expression if needed

would be

run_constrained:
  - __archspec * *sandybridge,*ivybridge,*haswell,*broadwell,*skylake,*mic_knl,*skylake_avx512,*cannonlake,*cascadelake,*icelake,*bulldozer,*piledriver,*steamroller,*excavator,*zen,*zen2
  # or whatever the "any of this would work" query syntax is in the conda selector

If a new architecture that also supports AVX were released, packages would have to be rebuilt to add the new constraint.

Idea 2

A cpu_feature metapackage with one variant per instruction set: these packages would need to be updated often so that their run_constrained metadata stays up to date with compatible architectures, but downstream packages wouldn't need rebuilding. How could maintainers specify multiple features at the same time? Would we need to build the Cartesian product of all the architecture combinations?
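For concreteness, here is a rough, purely illustrative sketch of what one variant of such a metapackage could look like (the cpu_feature name comes from the suggestion above; the build string and the __archspec match syntax are placeholders, not a working spec):

package:
  name: cpu_feature
  version: "1.0"

build:
  string: avx2_0    # one variant per instruction-set feature
  number: 0

requirements:
  run_constrained:
    # the metapackage, not the downstream package, pins the compatible
    # microarchitectures, so a new CPU only requires rebuilding this package
    - __archspec * *haswell,*broadwell,*skylake,*zen,*zen2  # placeholder list and syntax

A downstream recipe would then only depend on the feature it needs, e.g. cpu_feature * avx2*.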


I don't think either of these ideas is good enough to be the strategy we want to pursue, but hopefully they are enough to start a brainstorm!

@beckermr
Member

I've heard there is an upstream PR for conda to expose the exact set of CPU features. I don't know the status of that PR.

@chenghlee ?

@jaimergp
Member Author

Maybe this one? conda/conda#9461

@beckermr
Member

Good find!

@beckermr
Member

I think idea 2 is closer to what we want. I don't care about Haswell or whatever; I do care about AVX or AVX-512.

@chrisburr
Member

How about using the new "feature levels" of GCC 11 and Clang 12 to define the meta-package's build strings? (I don't mean we have to wait for the compilers to be updated, just imitate the same compatibility levels.)

@jaimergp
Member Author

Is that only for x86-64?

@beckermr
Member

Yes, it appears to be x86_64 only. We'd need to translate those levels into specific archspec targets, IIUIC. We should discuss this at the next core meeting.

@beckermr
Member

@chenghlee @wolfv It appears that archspec implements comparison operators for CPUs based on feature sets. This means you can do things like figure out if a build will run on the CPU you have and specify compatibility as things like >=haswell etc. Is there a way to feed this info into the conda solver that is scalable?

@isuruf
Member

isuruf commented Feb 16, 2021

See archspec/archspec#24

@beckermr
Member

Ohhhh nice. Thanks @isuruf!

@h-vetinari
Member

@isuruf

Shouldn't the "feature levels" that @chrisburr mentioned satisfy the requirement of a total ordering? That would also keep the build-matrix explosion to a minimum, because building just for v1, v2 and v3 would already be a good start.

  • x86-64: CMOV, CMPXCHG8B, FPU, FXSR, MMX, OSFXSR, SCE, SSE, SSE2
  • x86-64-v2: (close to Nehalem) CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
  • x86-64-v3: (close to Haswell) AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE
  • x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL
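For reference, GCC 11+ and Clang 12+ accept these levels directly via -march; a build script targeting one of them would look roughly like the following (a sketch only, assuming an autotools-style package and the usual conda-build environment variables):

# target the x86-64-v3 feature level (requires GCC >= 11 or Clang >= 12)
export CFLAGS="${CFLAGS} -march=x86-64-v3"
export CXXFLAGS="${CXXFLAGS} -march=x86-64-v3"
./configure --prefix="${PREFIX}"
make -j"${CPU_COUNT}" && make install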

@h-vetinari
Member

The other good thing is that these levels agree between GCC & Clang.

@h-vetinari
Member

Ping @isuruf re: using GCC/Clang feature levels for x86.

It depends on how granular we're aiming for the configuration to be, but - aside from keeping the build matrix explosion under control - having just v2/v3 from the above list would already help in the case of conda-forge/faiss-split-feedstock#23

@h-vetinari
Member

See archspec/archspec#24

That issue was closed without further action - what now?

@wolfv
Member

wolfv commented Jun 1, 2021

Recently my colleagues (ping @serge-sans-paille @JohanMabille) have implemented a SIMD instruction set detector in xsimd: https://github.com/xtensor-stack/xsimd/blob/master/include/xsimd/config/xsimd_arch.hpp

It also comes with some sort of ordering in the "best_version".

It has some interesting properties:

  • doesn't rely on the CPU lists / json files to be updated
  • we can easily wrap it for Python, and it works natively in C++ (for mamba + conda)
  • will be used in Apache Arrow, xtensor and (maybe / hopefully) NumPy

I am not sure if it's "too late" but maybe we could use this library? Either to directly create virtual packages for the different instruction sets (avx2, sse, avx512, neon), or in a different fashion to pre-filter packages.

I am very interested to ship more optimised binaries through conda-forge ... we need to save the environment :)
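As a purely hypothetical illustration of the first option (no such per-feature virtual packages exist today), a recipe could then constrain on a feature directly:

run_constrained:
  - __avx2    # hypothetical virtual package, only present on CPUs with AVX2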

@ngam
Contributor

ngam commented Apr 29, 2022

Just a small note to consider as this is implemented in the future:

Some users will likely be creating environments (e.g. conda create -n test xyz) on a device that is not the one used for deployment or production. A common example is a login node on an HPC system where all interactive work is done, while the actual work runs on compute nodes that don't have internet access (i.e. conda create -n test xyz will time out there). I'd guess similar things may happen with containers (e.g. if someone is using the Sylabs.io remote builder with a conda env inside the SIF image).

In my experience, the graceful-fallback strategy works all right if one is careful enough, though it is clearly not a perfect solution and seems to be causing headaches in certain places.

@h-vetinari
Member

Some users will likely be setting environments (e.g. conda create -n test xyz) on a device that is not the one for deployment or production.

I think that - like for CONDA_OVERRIDE_CUDA - this is a case where it would make sense to provide a similar override for those cases that need to transpose environments between different systems. Again, as in the CUDA case, the vast majority of users will be able to use the archspec virtual package behind the scenes, without having to do anything explicit in order to get the most appropriate binary.
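A hypothetical sketch of how such an override could be used, mirroring the existing CONDA_OVERRIDE_CUDA behaviour (the variable name and accepted values here are assumptions, not an implemented feature):

# on a login node, solve for the microarchitecture of the compute nodes
CONDA_OVERRIDE_ARCHSPEC=skylake_avx512 conda create -n test xyz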

@ngam
Contributor

ngam commented Apr 29, 2022

Yes, the override feature will be good enough :)

@alippai

alippai commented Apr 8, 2023

https://www.phoronix.com/news/Fedora-39-RPM-4.19
https://www.phoronix.com/news/openSUSE-TW-x86-64-v3-RPM
This might be relevant: both SUSE and Fedora are starting to roll out x86-64-v3 support.

Maybe distributing v2/v3/v4 binaries would be a great start. By adding x86-64-v3, conda-forge would instantly save some greenhouse gas emissions for the Earth 🌍

@h-vetinari
Member

So we have made a big move forward recently by adding the microarch feedstock, and some smaller PRs in many places. We're basically getting ready to actually start building these packages.

However, we need to come up with some common-sense rules to avoid a CI explosion: the number of packages where the benefits are substantial is expected to be small, but there will likely be highly motivated people who want to add this to feedstocks because "it must be faster".

One thing, for example, that should rule out building for multiple microarchitectures is the package having built-in runtime dispatch to them (e.g. numpy).

We at least need some documentation (and perhaps some automation?) for this.
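For anyone following along, opting a feedstock in currently looks roughly like the following (a sketch from memory; the exact package name and variant key should be checked against the microarch feedstock's documentation before copying):

# recipe/conda_build_config.yaml -- build a baseline and an x86-64-v3 variant
microarch_level:       # [x86_64]
  - 1                  # [x86_64]
  - 3                  # [x86_64]

# recipe/meta.yaml -- expected to set matching compiler flags and run-time constraints
requirements:
  build:
    - x86_64-microarch-level {{ microarch_level }}   # [x86_64]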

@h-vetinari
Member

The very recent archspec 0.2.3 now has Windows support, in large part due to @isuruf's work on this. 🥳

Not sure what else is necessary to wire this up, though; I just tried on a fully up-to-date environment:

>conda info
    [...]
       virtual packages : __archspec=1=x86_64
                          __conda=24.1.2=0
                          __win=0=0

@h-vetinari
Member

Not sure what else is necessary to wire this up though

This should be fixed by conda/conda#13641 in the next conda release.

@traversaro
Contributor

I started experimenting with microarch-optimized builds in conda-forge/mujoco-feedstock#45. I ran into some problems, which I reported in separate issues to avoid putting too much content into this one:

@beckermr
Member

beckermr commented Mar 10, 2024

Some naive questions etc.

  • Right now our setup is really targeted at fully native builds where the microarch is available at build and run time and the build and host machine match.

  • I think the issues here are because we'd like to have something closer to a partial cross compilation setup.

  • A partial cross-compilation setup will require either emulating instructions that are targeted at the host but not available on the build machine, or turning off tests.

  • If we do partial cross-compilation, then we either need to make smithy override the archspec virtual package, or remove the run export on the virtual package and create packages in host that export the virtual-package constraint to run. Probably there are other solutions here too.

  • Does qemu support emulating additional x86_64 instructions on an x86_64 machine that doesn't have those instructions?

@beckermr
Member

Also see this comment in the package description from the original implementation:

When building packages on CI, level=4 will not be guaranteed, so you can only use level<=3 to build.

It appears level 3 is not found on osx either in the linked builds above, but I think this answers the questions.

@stuarteberg

FWIW, we recently added level=4 packages to the graph-tool feedstock (via conda-forge/graph-tool-feedstock#140).

  • A partial cross compilation setup will require either emulation of instructions targeted at the host but not on the build machine or we turn off tests.

Indeed, we simply disabled tests for level 4.

  • If we do partial cross compilation then we need to either make smithy override the archspec virtual package or we need to remove the run export on the virtual package and create packages in host that export the virtual package constraint to run. Probably there are other solutions here too.

For level 4, we just added -march=x86-64-v4 to our build flags "by hand" in build.sh, and we also listed the appropriate run dependency directly in our meta.yaml file.

The new packages haven't been live for very long, but they seem to behave as expected.
