
Add a CPU features detection library #2824

Closed

wants to merge 4 commits into from

Conversation


@Nax Nax commented Jun 13, 2016

It implements CPUID using inline assembly, as well as a thin, cached layer of abstraction for common features.

Supported features:

  • SSE
  • SSE2
  • SSE3
  • SSSE3
  • SSE4.1
  • SSE4.2
  • AVX
  • AVX2

There is also vendor detection (Intel/AMD) and vendor string reporting.

No tests are provided, as the results of these method calls depend on the CPU model.
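
For illustration only, here is a minimal sketch of what such a cached abstraction could look like in Crystal. The module name, method names and the stubbed cpuid call are hypothetical, not this PR's actual code:

module CPU
  @[Flags]
  enum Features
    SSE
    SSE2
    # ... SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2 would follow
  end

  @@features : Features? = nil

  # Placeholder: a real implementation would execute the CPUID instruction via
  # inline assembly and return the four result registers {eax, ebx, ecx, edx}.
  def self.cpuid(fn : Int32, subfn : Int32 = 0) : {UInt32, UInt32, UInt32, UInt32}
    {0_u32, 0_u32, 0_u32, 0_u32}
  end

  # Detection runs once; later calls return the cached flags.
  def self.features : Features
    @@features ||= begin
      _, _, _, edx = cpuid(1)
      f = Features::None
      f |= Features::SSE if edx.bit(25) == 1  # CPUID leaf 1, EDX bit 25
      f |= Features::SSE2 if edx.bit(26) == 1 # CPUID leaf 1, EDX bit 26
      f
    end
  end

  def self.sse2? : Bool
    features.sse2?
  end
end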

@jhass
Member

jhass commented Jun 13, 2016

Mmh, interesting, is this in preparation for something? If not, what are the use cases common enough for it to make sense in a language's standard library? Please keep in mind that every piece of code we accept increases the technical debt and thus the maintenance burden :) If we decide not to accept it, it would make a great shard for sure!

@Nax
Author

Nax commented Jun 13, 2016

Indeed! We could use this to speed up critical portions of the standard library by branching at runtime to a different codepath, based on what extended instruction set the hardware provides. glibc does something similar internally, but it does not expose it.

The intended use case is the std lib itself, actually, but other people may find a use for it too!

@jhass
Member

jhass commented Jun 13, 2016

Sorry for being stubborn, but do you have something specific in mind already? Do LLVM intrinsics generate code that does similar things? If so, what examples are there that aren't covered by intrinsics?

@Nax
Author

Nax commented Jun 13, 2016

This does not generate any code per se; it just allows you to detect what your CPU can and can't do at runtime. The goal here is to prevent the CPU from executing instructions it doesn't support. If you generate an AVX2-only codepath, for example, and your CPU doesn't support AVX2, it's going to crash. It's useful if you want to redistribute binaries. With x86_64, you can't use anything more recent than SSE2 if you plan on redistributing binaries, and that severely limits performance in some heavy number-crunching algorithms. This pull request allows you to bypass this restriction by having more than one codepath and branching to the fastest one supported on your platform, at runtime.
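
A minimal sketch of that multi-codepath pattern, reusing the hypothetical CPU module sketched above (the SSE2 variant is a stub; a real one would contain the hand-optimised code):

module Checksum
  def self.compute(data : Bytes) : UInt32
    # CPU.features caches the CPUID result, so this is a cheap flag test,
    # not a CPUID execution per call.
    if CPU.features.sse2?
      compute_sse2(data)
    else
      compute_scalar(data)
    end
  end

  # Portable fallback, safe on any CPU.
  def self.compute_scalar(data : Bytes) : UInt32
    data.reduce(0_u32) { |acc, b| acc &+ b }
  end

  # Stub: a real SSE2 path would run vectorised code behind the runtime check.
  def self.compute_sse2(data : Bytes) : UInt32
    compute_scalar(data)
  end
end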

@asterite
Member

@Nax But this is only useful if you write inline assembly, right?

@Nax
Author

Nax commented Jun 13, 2016

@asterite Unless we have something equivalent to C compilers' intrinsics, then yes.

@jhass
Member

jhass commented Jun 13, 2016

Sorry, to clarify: I meant whether LLVM intrinsics already generate code that uses CPU feature detection to optimize.

@Nax
Author

Nax commented Jun 13, 2016

@jhass No, LLVM won't do that. You can ask it to emit extended instructions, but it will not insert runtime checks on its own.

@@features = Features::None

def self.cpuid(fn : Int32, subfn : Int32 = 0)
  buf = [0_u32, 0_u32, 0_u32, 0_u32]
Member

Should this rather be a StaticArray?

Author

Aren't StaticArrays bound to the stack? I assumed they could not be returned from methods.

Member

Ah well, yes. They can be returned, but then a copy is made, from my understanding. Might still be faster than a heap allocation?

Member

They can, they are passed by copy

Author

Oh okay, then I will change to StaticArray. Indeed, copying 128 bits should be way faster than allocating.
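
For reference, a sketch of what the StaticArray version could look like (illustrative, not necessarily the final code in this PR; the register-filling part is elided):

module CPU
  # StaticArray(UInt32, 4) lives on the stack and is returned by copy
  # (16 bytes), so no heap allocation happens per cpuid call.
  def self.cpuid(fn : Int32, subfn : Int32 = 0) : StaticArray(UInt32, 4)
    buf = StaticArray(UInt32, 4).new(0_u32)
    # ... fill buf[0]..buf[3] with eax/ebx/ecx/edx via inline assembly ...
    buf
  end
end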

@jhass
Member

jhass commented Jun 13, 2016

How about scoping this into the System module? System::CPU

@Nax
Author

Nax commented Jun 13, 2016

System::CPU is a nice idea indeed.

Internally use StaticArray for performance.
@ysbaddaden
Contributor

Is it cross-OS? Since it involves ASM, I guess it is. I understand it detects and runs on x86 CPUs, but what about other ones, like ARM or MIPS archs? And what about 32 bits?

It could be nice to have, but maybe premature for per-feature optimisations in the core/stdlib. Also, we'll have to maintain it, and it may make it more complex to introduce new architectures. Of course, great optimisations could help to integrate it nonetheless. We'll need input from @waj and @asterite here.

@Nax
Author

Nax commented Jun 13, 2016

It is cross-OS indeed, and it should work fine on both x86 and x86_64. For ARM and MIPS, this won't compile. On those archs, we should probably raise an exception when cpuid is called and report the CPU as unknown, with no features.

I don't think this would make new archs more complex to adopt if we do that. I'm going to add checks to make it compile on non-x86 archs.
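
A sketch of what such a guard could look like, using Crystal's compile-time `flag?` macro method (the exact flag names and fallback behaviour are illustrative):

module CPU
  {% if flag?(:i386) || flag?(:x86_64) %}
    # The inline-assembly CPUID implementation is only compiled on x86/x86_64.
  {% else %}
    # On other architectures, report failure at runtime instead of failing to
    # compile.
    def self.cpuid(fn : Int32, subfn : Int32 = 0) : StaticArray(UInt32, 4)
      raise "CPUID is not available on this architecture"
    end
  {% end %}
end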

@jhass
Member

jhass commented Jun 15, 2016

The file path should be src/system/cpu.cr now; otherwise it looks good to me. It should probably get a second good-to-go from @crystal-lang/crystallers.

@Nax
Author

Nax commented Aug 5, 2016

Any news on this?

@asterite
Member

asterite commented Aug 5, 2016

@Nax To merge this we'll need a real, concrete use case.

@Nax Nax closed this Aug 5, 2016
@Nax Nax deleted the cpu branch August 5, 2016 15:12
@RX14
Contributor

RX14 commented Aug 5, 2016

Please do release a shard for this though! I think it can be useful, possibly in a future SIMD library.

@HertzDevil
Contributor

For the few people who are looking for this, LLVM itself provides some of the functionality here, although you probably need to dig into its source code to parse those feature names:

require "llvm"

# target_machine.cr
lib LibLLVM
  fun get_host_cpu_features = LLVMGetHostCPUFeatures : Char*
end

LLVM.init_aarch64 # or `init_x86` etc.
features = LLVM.string_and_dispose(LibLLVM.get_host_cpu_features)
features.split(',') # => ["+fp-armv8", "+lse", "+neon", "+crc", "+crypto"]

My guess is LLVM needs this information for -march=native.
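
Building on the binding above, checking for a specific feature is then just a string lookup (the host target and feature name here are only an example):

LLVM.init_x86 # assuming an x86 host
features = LLVM.string_and_dispose(LibLLVM.get_host_cpu_features).split(',')
puts "AVX2 available" if features.includes?("+avx2")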
