
exp, @fastmath, SVML vectorization. #21454

Open
DrTodd13 opened this issue Apr 20, 2017 · 22 comments
Labels
codegen (Generation of LLVM IR and native code) · maths (Mathematical functions) · performance (Must go faster)

Comments

@DrTodd13
Contributor

In Julia 0.6, I noticed that exp is no longer a call to libm but has been implemented in Julia itself. I wonder whether this decision has performance implications not far down the road. Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through an SVML intrinsic or a call into libm. It won't vectorize when the inlined LLVM IR from Julia's exp implementation is what it sees. We can use fastmath to revert to a call to libm, but this raises the question of the semantics of fastmath. It seems like the semantics of fastmath should be a loss of accuracy in exchange for performance, and my understanding is that this is indeed what Julia's fastmath adds, in that Julia will set the LLVM fastmath flags. I also believe that currently Julia's fastmath exp is not consistent in that it does not signal a lower-accuracy version, so should we expect Julia's exp to have the same accuracy and performance as the fastmath/libm exp?

I have been told that SVML provides three points within the accuracy/performance tradeoff space. We can debate which, but it seems like fastmath should map to one of the two lower-accuracy (higher-performance) levels. The question then becomes: how do you get vectorization at the highest accuracy level with SVML? Implementing exp in Julia seems to preclude this possibility unless more code is added to detect potential vectorization with SVML and, in that case, revert to a libm call. Why not just always use libm then? In what circumstance is Julia's exp superior?
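To make the scenario concrete, here is a minimal sketch of the kind of loop at stake (names are illustrative): with SVML hooked up, LLVM can replace the scalar exp calls below with a vector call such as __svml_exp4; with Julia's inlined native exp it cannot.

```julia
# Illustrative loop: the pattern SVML-backed vectorization would target.
function expall!(out::Vector{Float64}, x::Vector{Float64})
    @inbounds for i in eachindex(x)
        out[i] = exp(x[i])  # scalar call LLVM could map to an SVML vector call
    end
    return out
end
```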

@yuyichao
Contributor

yuyichao commented Apr 20, 2017

Through SVML, LLVM is able to provide vectorization for exp, if exp is invoked through an SVML intrinsic or a call into libm.

Have you actually seen this happen? I don't think we have ever lowered it in any way that llvm can recognize.

@Keno
Member

Keno commented Apr 20, 2017

Yes, we don't lower exp in a way that LLVM can recognize at the moment. However, it should be fairly simple to add a generic hook to fix that. As you noted, manually calling the libm exp or the LLVM intrinsic will work for testing purposes. The Julia native implementation is generally faster than libm. In any case, there's no rush on this since we can't drop in SVML at the moment anyway.
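A minimal sketch of those two testing workarounds (assumptions: the "libm" library name resolves on your platform, and this llvmcall form matches your Julia version):

```julia
# Bypass Julia's native exp with a direct libm call.
libm_exp(x::Float64) = ccall((:exp, "libm"), Float64, (Float64,), x)

# Or call the LLVM intrinsic directly.
llvm_exp(x::Float64) = Base.llvmcall(
    ("declare double @llvm.exp.f64(double)",
     """
     %y = call double @llvm.exp.f64(double %0)
     ret double %y
     """),
    Float64, Tuple{Float64}, x)
```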

@JeffBezanson
Sponsor Member

I also believe that currently Julia's fastmath exp is not consistent in that it does not signal a lower-accuracy version

It seems intuitive to me that fastmath would allow calling a lower-accuracy version but not require it. In this case I believe the intent of the fastmath version was to skip error checks.
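For reference, @fastmath is a purely syntactic rewrite into Base.FastMath, so whether accuracy actually drops depends on what that module defines for each function (a sketch using 0.6-era syntax; exact names may differ across versions):

```julia
julia> macroexpand(:(@fastmath exp(x)))
:(Base.FastMath.exp_fast(x))
```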

I don't think we have ever lowered it in any way that llvm can recognize.

This doesn't really matter --- we could implement exp in such a way that llvm could recognize it, but that would mean skipping the julia implementation, so the point still stands.

@JeffBezanson added the codegen, maths, and performance labels Apr 20, 2017
@yuyichao
Contributor

We can use fastmath to revert to a call to libm

This is most likely an oversight that should be fixed. In fact, the fast math version seems slower.
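A quick way to check that claim (a sketch; assumes the BenchmarkTools package is installed):

```julia
using BenchmarkTools

function sumexp(x)          # native Julia exp
    s = 0.0
    for v in x
        s += exp(v)
    end
    return s
end

function sumexp_fast(x)     # @fastmath variant of the same loop
    s = 0.0
    for v in x
        s += @fastmath exp(v)
    end
    return s
end

x = rand(10_000)
@btime sumexp($x)
@btime sumexp_fast($x)
```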

This doesn't really matter

My point is that there should be no regression caused by this change in 0.6. Being able to vectorize it would obviously be even better.

that would mean skipping the julia implementation

Hopefully not, since that would mean a failure to vectorize produces slower code...

@Keno
Member

Keno commented Apr 20, 2017

There's no problem with just telling LLVM that our exp function is the same as what it considers exp to be. Just one extra hook in TargetLibraryInfo. Even better, we could come up with a generic way of annotating "this function is a vectorized version of this other function."

@yuyichao
Contributor

That'll be cool. How hard would it be to tell LLVM that a Julia function can be vectorized (either because there's no complex control flow in it or because we defined a version that operates on NTuple{...,VecElement{...}} directly)?
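For illustration, a sketch of that second option (exp4 and the 4-lane width are hypothetical; a real binding would call an SVML entry point such as __svml_exp4 rather than scalar exp):

```julia
const Vec4 = NTuple{4,VecElement{Float64}}

# Hypothetical 4-lane kernel operating on SIMD lanes directly.
exp4(v::Vec4) = ntuple(i -> VecElement(exp(v[i].value)), 4)
```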

@Keno
Member

Keno commented Apr 20, 2017

The hooks are already there in TargetLibraryInfo as I said, but may require some hacking to have it do anything other than what is hardcoded right now. For functions without complex control flow, LLVM should be able to figure out by itself that the function can be vectorized, so we should just fix that in LLVM.

@anton-malakhov

anton-malakhov commented Apr 20, 2017

We are working on an experimental patch to LLVM 4.0 which enables vectorization for all the SVML functions. Here is the list of enabled functions: sin cos pow exp log acos acosh asin asinh atan2 atan atanh cbrt cdfnorm cdfnorminv ceil cosd cosh erf erfc erfcinv erfinv exp10 exp2 expm1 floor fmod hypot invsqrt log10 log1p log2 logb nearbyint rint round sind sinh sqrt tan tanh trunc.
Moreover, Intel will soon provide you a license to redistribute SVML the same way as you redistribute MKL in your binary Julia distribution. It would be very cool if we could enable vectorization of these functions not only in fastmath mode; SVML provides HA functions for high accuracy as well.

@StefanKarpinski
Sponsor Member

We can't actually legally distribute Julia with MKL unless Julia is built without any GPL libraries, which is not a standard build setup, so we won't be able to ship with SVML either. If we get rid of all the GPL libraries from Base Julia (which is a long term goal) then we'll be able to ship with MKL and SVML.

@Keno
Member

Keno commented Apr 20, 2017

Of course if Intel wanted to open source SVML under a GPL-compatible license that'd be great (and we could start using it immediately).

@StefanKarpinski
Sponsor Member

Ditto with MKL 😀

@anton-malakhov

While we are considering open-sourcing SVML (though it might still be limited and will take time to release), it's quite unlikely to happen for MKL.

@simonbyrne
Contributor

I think this is a duplicate of #15265.

@anton-malakhov

anton-malakhov commented Apr 21, 2017

@StefanKarpinski @Keno, Viral assured us that "We expect that JuliaPro will start shipping with mkl by juliacon."
Thus our question about integrating SVML into the MKL build of the JuliaPro distro is still valid and urgent.

@Keno
Member

Keno commented Apr 21, 2017

As I said, SVML is not currently integrable into Julia for technical reasons. An Intel NDA prevents me from giving details in this forum. Feel free to email me.

@RoyiAvital

RoyiAvital commented Jul 13, 2018

Is there an update on having SVML under Julia?

It seems to be (at least part of) what is holding Julia back in the following test:

https://www.modelsandrisk.org/appendix/speed/

Though Python + Numba is still faster even when Julia uses @inbounds and Apple's libm (see https://julialang.slack.com/archives/C67910KEH/p1531490464000597?thread_ts=1531475750.000264&cid=C67910KEH).

@StefanKarpinski
Sponsor Member

No.

@simonbyrne
Contributor

In the long run, it would be neat to have something like ISPC (https://ispc.github.io/) in Julia itself.

@RoyiAvital

@simonbyrne, using SVML and an ISPC-like approach are complementary, aren't they?
Not that I'm an expert on this, but I would assume integrating SVML is easier, especially when Intel offers assistance.

@Keno closed this as completed Jul 19, 2018
@Keno
Member

Keno commented Jul 19, 2018

They are complementary, but properly integrating SVML requires the Julia frontend to be aware of vector lanes, which it currently isn't; that awareness would also be a prerequisite for exposing a general SPMD programming model.

@Keno reopened this Jul 19, 2018
@anton-malakhov

@Keno, Numba is not aware of vector lanes either; still, thanks to LLVM's ability to recognize libm calls and transform them into SVML calls, it enjoys nice speedups on transcendental functions. I know that Julia is moving away from libm calls, but could you consider a mixed or opt-in approach to enable them again? E.g. we could start with some @fastmath(SVML)-like macro which would emit the good old libm calls and switch LLVM into SVML mode.
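A sketch of what such an opt-in would automate; the hand-rolled form today would look something like this (illustrative only; it assumes an LLVM configured with an SVML veclib mapping and a resolvable "libm"):

```julia
# Emit plain libm calls so LLVM's vectorizer can map them to SVML.
function expall_libm!(out::Vector{Float64}, x::Vector{Float64})
    @inbounds for i in eachindex(x)
        out[i] = ccall((:exp, "libm"), Float64, (Float64,), x[i])
    end
    return out
end
```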

@Keno
Member

Keno commented Jul 20, 2018

Sure, that's why I said "properly integrating". A plethora of other hacks are and have always been possible. E.g. we used SVML for Celeste just fine.
