Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Math and System.MathF should be implemented in managed code, rather than as FCALLs to the C runtime #14155

Open
tannergooding opened this Issue Sep 23, 2017 · 10 comments

Comments

Projects
None yet
6 participants
@tannergooding
Copy link
Member

tannergooding commented Sep 23, 2017

As per the title, both System.Math and System.MathF should have most of their extern methods implemented in managed code rather than being FCALLs to the underlying C runtime.

This will ensure:

  • Consistency across operating systems and architectures
  • Implementations can be more readily updated without requiring changes in the runtime proper

Some of the functions (such as Abs, Ceil, Floor, Round, and Sqrt) are simple enough that they can be implemented in managed code today and still maintain the performance characteristics.

Other functions (such as Cos, Sin, and Tan) will need to wait until the hardware intrinsics proposal is more widely available (since maintaining perf numbers will require an implementation to call said intrinsics).

@tannergooding

This comment has been minimized.

Copy link
Member Author

tannergooding commented Sep 23, 2017

FYI. @mellinoe

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Sep 24, 2017

I do not think that this is necessarily a good idea. It makes the runtime less portable. The C runtime implementations of these functions are a fine default implementation.

If we want to make the implementation better on some platforms, that's ok - but it should not be the requirement for all platforms.

@tannergooding

This comment has been minimized.

Copy link
Member Author

tannergooding commented Sep 24, 2017

@jkotas, how does it make the runtime "less portable"?

Currently, we are tied to a particular implementation of the C runtime for each platform. This leads to:

  • inconsistencies in behavior when intrinsics are not used
  • requires runtime updates/workarounds to resolve bugs
  • has lead to additional overhead in the PAL layer to ensure other platforms "conform" to the Windows behavior
  • leads to significant perf differences between the OS platforms (when intrinsics are not being used)
  • etc

I would think providing a managed implementation makes it more portable since it means:

  • fewer inconsistencies when intrinsics are not used
  • updates can be made to just corlib (no changes to native code required)
    • This applies to bug fixes, IEEE spec compliances fixes, improvements, etc
  • no additional hacks, overhead, or conformance tests in the PAL layer
  • codegen/perf should be the same between OS platforms on the same hardware
  • etc

For all of these we should obviously ensure that codegen and perf remain on-par with what we have today.

Some of the functions, such as Abs (#14156) are trivial to do that with (since their implementation is so simple)

Others (such as Floor, Ceiling, and Round), are slightly more complicated, but should still be doable with hardware intrinsics (and keeping them equally as performant).

The remaining (such as Cos, Tan, and Sin) are known to require hardware intrinsics to complete this.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Sep 24, 2017

hardware intrinsics

If porting to a new hardware platform requires implementing a ton of intrinsics in the CodeGen, you have added like one man-year to the porting cost. It is what makes the runtime less portable.

@tannergooding

This comment has been minimized.

Copy link
Member Author

tannergooding commented Sep 24, 2017

If porting to a new hardware platform requires implementing a ton of intrinsics in the CodeGen, you have added like one man-year to the porting cost.

It does not, strictly speaking, require this. It only likely requires this for producing the most performant code.

It is also, strictly speaking, entirely possible to set the "intrinsic" for these functions on new hardware architectures to be the CRT implementation by default (provided they are IEEE compliant), if the software fallback's performance is considered too poor.

That being said, we already have the case today, when using the CRT implementation, that the majority of the functions on some platforms are considerably poorer (330% slower in the worst case): #9373.

This proposal gives us a standard baseline of "correctness" in the software implementation where any hardware specific improvements can readily be checked. It also allows us to check that the underlying CRT implementation (if it were to be used as the intrinsic) is compatible with our expectations.

@AndyAyersMS

This comment has been minimized.

Copy link
Member

AndyAyersMS commented Sep 25, 2017

A few places where this might unlock some perf:

  • On Windows x86 the the jit now uses SSE2 internally but X87 at the ABI boundaries, so there is extra overhead involved in calling into native math helpers. There are is a custom ABI library that the C++ compiler uses to get around this (libm_sse2) but we would need to do work in the runtime and jit to enable calling those library routines from managed code. So having managed implementations, especially for the simpler methods with short path lengths, could provide a nice perf boost on X86.

  • In the SYSV X64 ABI there are no callee-saved XMM registers, so the inability of the jit to inline math helpers leads to extra XMM spill/reloads.

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Sep 25, 2017

As I have said, I am fine with using C# implementation that depends on intrinsics on platforms where we have deeper codegen investments.

The best implementation of Abs for x64 is actually: public static double Abs(double value) => Abs(value). This assumes that the Abs intrinsic will be force expanded - being done as part of #14020. This implementation is both best performing and also guarantees consistent behavior between the inlined cases and the method being called by reflection or via delegate.

For bring up of new platforms or platforms with less codegen investment, the C runtime implementation is the default. It may not be as good or it can have different behavior in corner cases - but it is good enough. For example, I do not ever want to have a software fallback for cos to be implemented in managed code. I always want to use the C runtime implementation for that.

@fanoI

This comment has been minimized.

Copy link

fanoI commented Sep 26, 2017

Having a managed version will not do the porting easier? For example suppose you want to port Net Core to MIPS you have not worry to port Math / FMath initially and you have time to debug other issues, the Math optimization using Hardware Intrinsics / calling C run time could be done after.

Or I'm missing something?

@jkotas

This comment has been minimized.

Copy link
Member

jkotas commented Sep 26, 2017

For porting, you do not have to worry about the C runtime either (porting CoreCLR to a platform without C runtime is non-scenario).

And the C runtime functions will be better debugged and have better performance than a software based callback written in C# (on platforms without deeper codegen investment).

@migueldeicaza

This comment has been minimized.

Copy link
Member

migueldeicaza commented Sep 26, 2017

There are several problems with this proposal, but to me the major problem is that it moves the maintenance and fixing from the underlying platform to us for what is a very well established, maintained and universally understood API.

The stated goal of "consistency across operating systems and platforms" has been troublesome in the past for things like Unicode string collation. Mono emulated the Windows behavior, while .NET Core took the native approach - and I think that the approach that .NET core took of using libICU instead of trying to emulate the Windows behavior was the right one.

Porting to a new platform already requires an advanced libc to be available, or an equivalent. Not only for the managed bridges, but the runtime itself (CoreCLR and Mono) both consume abs, floor, ceil and for things like sqrt, they tend to be mapped to CPU instructions.

There is the performance issue as well, which Jan just touched on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.