
Add a CPU features detection library #2824

Closed

wants to merge 4 commits into from

Conversation


@Nax Nax commented Jun 13, 2016

It implements CPUID using inline assembly, as well as a thin, cached layer of abstraction for common features.

Supported features:

  • SSE
  • SSE2
  • SSE3
  • SSSE3
  • SSE4.1
  • SSE4.2
  • AVX
  • AVX2

There is also vendor detection (Intel/AMD) and vendor string reporting.

No tests are provided, as the results of these method calls depend on the CPU model.
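
For illustration only, here is a minimal sketch of what such a cached abstraction could look like in Crystal. The module name, method names and the stubbed cpuid call are hypothetical, not this PR's actual code:

module CPU
  @[Flags]
  enum Features
    SSE
    SSE2
    # ... SSE3, SSSE3, SSE4.1, SSE4.2, AVX and AVX2 would follow
  end

  @@features : Features? = nil

  # Placeholder: a real implementation would execute the CPUID instruction via
  # inline assembly and return the four result registers {eax, ebx, ecx, edx}.
  def self.cpuid(fn : Int32, subfn : Int32 = 0) : {UInt32, UInt32, UInt32, UInt32}
    {0_u32, 0_u32, 0_u32, 0_u32}
  end

  # Detection runs once; later calls return the cached flags.
  def self.features : Features
    @@features ||= begin
      _, _, _, edx = cpuid(1)
      f = Features::None
      f |= Features::SSE if edx.bit(25) == 1  # CPUID leaf 1, EDX bit 25
      f |= Features::SSE2 if edx.bit(26) == 1 # CPUID leaf 1, EDX bit 26
      f
    end
  end

  def self.sse2? : Bool
    features.sse2?
  end
end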

@jhass
Member

jhass commented Jun 13, 2016

Mmh, interesting, is this in preparation for something? If not, what are the use cases common enough for it to make sense in a language's standard library? Please keep in mind that every piece of code we accept increases the technical debt and thus the maintenance burden :) If we decide not to accept it, it would make a great shard for sure!

@Nax
Author

Nax commented Jun 13, 2016

Indeed! We could use this to speed up critical portions of the standard library by branching at runtime to a different codepath, based on what extended instruction set the hardware provides. glibc does something similar internally, but it does not expose it.

The intended use case is the std lib itself, actually, but other people may find a use for it too!

@jhass
Member

jhass commented Jun 13, 2016

Sorry for being stubborn, but do you have something specific in mind already? Do LLVM intrinsics generate code that does similar things? If so, what examples are there that aren't covered by intrinsics?

@Nax
Author

Nax commented Jun 13, 2016

This does not generate any code per se; it just allows you to detect what your CPU can and can't do at runtime. The goal here is to prevent the CPU from executing instructions it doesn't support. If you generate an AVX2-only codepath, for example, and your CPU doesn't support AVX2, it's going to crash. It's useful if you want to redistribute binaries. With x86_64, you can't use anything more recent than SSE2 if you plan on redistributing binaries, and that severely limits performance in some heavy number-crunching algorithms. This pull request allows you to bypass this restriction by having more than one codepath and branching to the fastest one supported on your platform, at runtime.
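
A minimal sketch of that multi-codepath pattern, reusing the hypothetical CPU module sketched above (the SSE2 variant is a stub; a real one would contain the hand-optimised code):

module Checksum
  def self.compute(data : Bytes) : UInt32
    # CPU.features caches the CPUID result, so this is a cheap flag test,
    # not a CPUID execution per call.
    if CPU.features.sse2?
      compute_sse2(data)
    else
      compute_scalar(data)
    end
  end

  # Portable fallback, safe on any CPU.
  def self.compute_scalar(data : Bytes) : UInt32
    data.reduce(0_u32) { |acc, b| acc &+ b }
  end

  # Stub: a real SSE2 path would run vectorised code behind the runtime check.
  def self.compute_sse2(data : Bytes) : UInt32
    compute_scalar(data)
  end
end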

@asterite
Member

@Nax But this is only useful if you write inline assembly, right?

@Nax
Author

Nax commented Jun 13, 2016

@asterite Unless we have something equivalent to C compilers' intrinsics, then yes.

@jhass
Member

jhass commented Jun 13, 2016

Sorry, to clarify: I meant whether LLVM intrinsics already generate code that uses CPU feature detection to optimize.

@Nax
Author

Nax commented Jun 13, 2016

@jhass No, LLVM won't do that. You can ask it to emit extended instructions, but it will not insert runtime checks on its own.

@@features = Features::None

def self.cpuid(fn : Int32, subfn : Int32 = 0)
  buf = [0_u32, 0_u32, 0_u32, 0_u32]
Member

Should this rather be a StaticArray?

Author

Aren't StaticArrays bound to the stack? I assumed they could not be returned from methods.

Member

Ah well, yes. They can be returned, but then a copy is made, from my understanding. Might still be faster than a heap allocation?

Member

They can, they are passed by copy

Author

Oh okay, then I will change to StaticArray. Indeed, copying 128 bits should be way faster than allocating.
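
For reference, a sketch of what the StaticArray version could look like (illustrative, not necessarily the final code in this PR; the register-filling part is elided):

module CPU
  # StaticArray(UInt32, 4) lives on the stack and is returned by copy
  # (16 bytes), so no heap allocation happens per cpuid call.
  def self.cpuid(fn : Int32, subfn : Int32 = 0) : StaticArray(UInt32, 4)
    buf = StaticArray(UInt32, 4).new(0_u32)
    # ... fill buf[0]..buf[3] with eax/ebx/ecx/edx via inline assembly ...
    buf
  end
end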

@jhass
Member

jhass commented Jun 13, 2016

How about scoping this into the System module? System::CPU

@Nax
Author

Nax commented Jun 13, 2016

System::CPU is a nice idea indeed.

Internally use StaticArray for performance.
@ysbaddaden
Contributor

Is it cross-OS? Since it involves ASM, I guess it is. I understand it detects and runs on x86 CPUs, but what about other ones, like ARM or MIPS archs? And what about 32 bits?

It could be nice to have, but maybe premature for per-feature optimisations in the core/stdlib. Also, we'll have to maintain it, and it may make it more complex to introduce new architectures. Of course, great optimisations could help to integrate it nonetheless. We'll need input from @waj and @asterite here.

@Nax
Author

Nax commented Jun 13, 2016

It is cross-OS indeed, and it should work fine on both x86 and x86_64. For ARM and MIPS, this won't compile. On those archs, we should probably raise an exception when cpuid is called and report the CPU as unknown, with no features.

I don't think this would make new archs more complex to adopt if we do that. I'm going to add checks to make it compile on non-x86 archs.
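
A sketch of what such a guard could look like, using Crystal's compile-time `flag?` macro method (the exact flag names and fallback behaviour are illustrative):

module CPU
  {% if flag?(:i386) || flag?(:x86_64) %}
    # The inline-assembly CPUID implementation is only compiled on x86/x86_64.
  {% else %}
    # On other architectures, report failure at runtime instead of failing to
    # compile.
    def self.cpuid(fn : Int32, subfn : Int32 = 0) : StaticArray(UInt32, 4)
      raise "CPUID is not available on this architecture"
    end
  {% end %}
end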

@jhass
Member

jhass commented Jun 15, 2016

The file path should be src/system/cpu.cr now; otherwise it looks good to me. It should probably get a second good-to-go from @crystal-lang/crystallers.

@Nax
Author

Nax commented Aug 5, 2016

Any news on this?

@asterite
Member

asterite commented Aug 5, 2016

@Nax To merge this we'll need a real, concrete use case.

@Nax Nax closed this Aug 5, 2016
@Nax Nax deleted the cpu branch August 5, 2016 15:12
@RX14
Contributor

RX14 commented Aug 5, 2016

Please do release a shard for this though! I think it can be useful, possibly in a future SIMD library.

@HertzDevil
Contributor

For the few people who are looking for this, LLVM itself provides some of the functionality here, although you probably need to dig into its source code to parse those feature names:

require "llvm"

# target_machine.cr
lib LibLLVM
  fun get_host_cpu_features = LLVMGetHostCPUFeatures : Char*
end

LLVM.init_aarch64 # or `init_x86` etc.
features = LLVM.string_and_dispose(LibLLVM.get_host_cpu_features)
features.split(',') # => ["+fp-armv8", "+lse", "+neon", "+crc", "+crypto"]

My guess is LLVM needs this information for -march=native.
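
Building on the binding above, checking for a specific feature is then just a string lookup (the host target and feature name here are only an example):

LLVM.init_x86 # assuming an x86 host
features = LLVM.string_and_dispose(LibLLVM.get_host_cpu_features).split(',')
puts "AVX2 available" if features.includes?("+avx2")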
