Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSVC does not support statement expressions #234

Closed
kdmadej opened this issue Jan 14, 2015 · 20 comments
Closed

MSVC does not support statement expressions #234

kdmadej opened this issue Jan 14, 2015 · 20 comments
Labels
Milestone

Comments

@kdmadej
Copy link

kdmadej commented Jan 14, 2015

When running some accelerate code on windows there are problems with nvcc.

The log:

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC>nvcc C:\Users\mwurb_000\AppData\Local\Temp\accelerate-cuda-9999\dragon5116.cu
dragon5116.cu
C:/Users/mwurb_000/AppData/Local/Temp/accelerate-cuda-9999/dragon5116.cu(11): error: expected an expression

C:/Users/mwurb_000/AppData/Local/Temp/accelerate-cuda-9999/dragon5116.cu(16): warning: parsing restarts here after previous syntax error

1 error detected in the compilation of "C:/Users/MWURB_~1/AppData/Local/Temp/tmpxft_00000614_00000000-10_dragon5116.cpp1.ii".

The mentioned kernel:

#include <accelerate_cuda.h>
extern "C" __global__ void map(const double* __restrict__ arrIn0_0, const Int32 shIn1_1, const Int32 shIn1_0, const double* __restrict__ arrIn1_0, const Int32 shOut_1, const Int32 shOut_0, double* __restrict__ arrOut_0)
{
const int shapeSize = shOut_1 * shOut_0;
const int gridSize = blockDim.x * gridDim.x;
int ix;

for (ix = blockDim.x * blockIdx.x + threadIdx.x; ix < shapeSize; ix += gridSize) {
const double x0 = arrIn1_0[ix];

arrOut_0[ix] = x0 / ({
const Int32 v0 = 0;

;
arrIn0_0[v0];
});
}
}

I will try and extract the exact haskell code that generates this kernel, although it's a part of a much larger project and it might take me a bit.

@tmcdonell
Copy link
Member

Oh, looks like [nvcc, msvc?] on windows doesn't have support for statement expressions?

Maybe we can achieve the same functionality using C++ lambdas. I am guessing that the quasi-quotation library won't support that though, so this could be tricky.

@kdmadej
Copy link
Author

kdmadej commented Jan 15, 2015

Any ETA for being able to tell if that would actually be tricky? We are quite concerned about running accelerate on windows.

@tmcdonell
Copy link
Member

I am a bit swamped at the moment...
I also don't have a windows machine set up to test on.

As a stop-gap solution, you could always revert the specific commit that introduced those changes. Although the change was made for correctness & performance reasons, it will at least get things compiling for you ...

@mchakravarty
Copy link
Member

To add to @tmcdonell's comment, we are really all Mac/Linux people here. We would like Accelerate to work well on Windows, but we don't have the resources or experience with Windows development to provide proper Windows support.

So, this is an area, where we do depend on help from the wider community of Accelerate users.

As a side note concerning this specific issue, I believe NVIDIA just announced a new CUDA version with many compiler improvements. It might be worth having a look whether statement expression are supported in that new version.

@awson
Copy link

awson commented Jan 18, 2015

Statement expressions are not supported in CUDA 7 for Windows either. NVIDIA supports Visual C only on Windows, which does not support statement expressions anyway.

The good thing on Windows is that NVIDIA drivers work out of the box without any problems on any NVIDIA hardware (Optimus included), while the problems on Linux are numerous and I fear will last forever. I've already spend hours and days fighting with NVIDIA-related things suddenly stopping working under Linux and it is simply a breath of fresh air to switch to Windows which just works. Except the new Accelerate 0.15 does not work anymore.

I was able to patch Accelerate 0.14 to make it generate correct code with no statement expressions, but in Accelerate 0.15 they are used ubiquitously. Is it such a big problem to get rid of statement expressions in Accelerate code generator?

@mchakravarty
Copy link
Member

@awson Statement expressions often do make code generation quite a lot simpler as they allow you to declare local variables for intermediate values.

@awson
Copy link

awson commented Jan 19, 2015

Yes.

And how big is the task to keep expression trees unflatten (==inline them all)? I've made this for 0.14 but intermediate constants usage was contained there, and this is not so in 0.15 AFAIUI.

@tmcdonell tmcdonell changed the title Some generated CUDA kernels are not compiling on Windows MSVC does not support statement expressions Jan 19, 2015
@tmcdonell
Copy link
Member

@awson instead of the code generator being based on generating C expressions, we need to move towards outputting C statements at every step. This is a bit tricky.

An example of the problems with 0.14: if ( i < length(a) && f (a ! i) ) then .... If we flatten this expression, then (!) always being evaluated; i.e. we lost the short-circuiting behaviour of (&&) that should have protected us from that. This kind of problem is why the use of statement expressions was introduced.

I'm happy to look for other approaches that also work on windows. But, as I said above, I don't have a lot of spare cycles at the moment, and I will need some assistance getting up to speed on windows as I do not have a lot of experience here.

@tmcdonell
Copy link
Member

That should really be ( i < length(a) && f (a ! i) ) ? ... : ..., but you get the idea.

@awson
Copy link

awson commented Feb 2, 2015

I think the ticket name is misleading.

Not only a host code (compiled with Visual C++) suffers from this incompatibility, but (and this is much more important) a device code (compiled with cuda front end) also can't be compiled on Windows.

NVidia Cuda front end uses EDG C++ front end both on Linux and Windows but in different compatibility modes -- GCC on Linux and Visual C++ on Windows.

@mwu-tow
Copy link

mwu-tow commented Aug 13, 2015

Hi!
Any chances for updates on this issue?

The lambda-expressions should be relatively similar in form to the statement expressions, ie.

({
const Int32 v0 = 0;

;
arrIn0_0[v0];
})

should be equivalent to:

[&]{
const Int32 v0 = 0;

;
return arrIn0_0[v0];
}()

Perhaps in the worst case the pretty-printer of the kernel code itself could be hacked to print expression statements as lambdas — if fixing the quasi-quotation library is too troublesome?

@tmcdonell
Copy link
Member

I haven't looked at the implementation of the quasi quoting library, language-c-quote, so I can't answer how difficult that functionality would be to add. If you are interested in adding that in, I can look into making the appropriate changes to accelerate-cuda.

@mwu-tow
Copy link

mwu-tow commented Aug 14, 2015

Hi, thank you for the reply!
I have done a quick hack to a language-c-quote pretty printer so the expression statements are printed as C++11 lambdas: mwu-tow/language-c-quote@e670cfd
If this approach is confirmed to work, we can make a somewhat nicer patch to language-c-quote that adds new structures for representing lambdas and use them in accelerate-cuda.

However, I am unable to confirm whether such hack works, as cuda package cannot be installed on windows (cf. tmcdonell/cuda#25 ). Well, something should be done about that issue as well.

There are two doubts related to this approach:

  • whether nvcc on Linux will accept lambdas (is it working in c++11 mode? can appropriate flags be supplied? In worst case we could differentiate pretty-printing depending on the platform)
  • whether generating lambdas can be accepted into language-c-quote, as it is a C++ feature not belonging to the C language.

Still, having the accelerate-cuda working seems currently more important to me than such small aesthetic details.

@mikusp
Copy link
Contributor

mikusp commented Aug 18, 2015

It seems that this hack works on Linux out-of-the-box. I didn't need to pass any additional options to nvcc.

@mwu-tow
Copy link

mwu-tow commented Aug 18, 2015

@mikusp Does this solve your issue described in #258 as well?

@tmcdonell
Copy link
Member

I wonder how this changes the generated code (PTX). In particular we should check whether it adversely affects performance.

But, since it seems like a way forward, then you might want to push this through with language-c-quote. I don't think that silently changing the pretty printing method is a good idea though, so then there is the question of where this part of a "language-c++-quote" or some other auxiliary package. If you can just provide a different pretty printer function, that might be an option, but I have no idea if all the required internals are exposed.

@mwu-tow
Copy link

mwu-tow commented Aug 25, 2015

@tmcdonell
Yes, the question whether the change impacts the generated code is relevant. Such lambdas calls are transparent for the compiler and it "should be" able to easily inline it / optimize it. Still, in such we can never be sure unless we check what the generated code looks like.

Would you be able to see how the replacing expression statement with lambda-call affects the generated code?

I already submitted a WIP pull request to language-c-quote that adds support for parsing certain subset of lambda expressions and related AST structures as CUDA extension — see mainland/language-c-quote/pull/53. It covers all the use-cases needed to replace expression statements. Not a particularly beautiful thing but still much better than hijacking the pretty-printer.

Having a separate pretty printer… does not seem easy to me, at least without writing much of duplicated code or revamping the pretty printing mechanisms on the side of language-c-quote itself. (Though I may be wrong here…)

@tmcdonell
Copy link
Member

I think that the most conclusive way would be to look at the generated PTX code. You are right and it should be possible for the compiler to inline them so that there is no or little difference, but maybe they are lifted out into separate function calls. Checking other metrics like register usage, instruction count, or just performance, could also be indicative before diving straight into the PTX.

@mwu-tow
Copy link

mwu-tow commented Sep 2, 2015

I have published changes needed to use the proposed lambda-expressions support from language-c-quote: https://github.com/mwu-tow/accelerate-cuda/tree/lambda-expressions

Before mainland/language-c-quote/pull/53 goes through, mine fork of language-c-quote is needed: https://github.com/mwu-tow/language-c-quote/tree/lambda-expressions

@tmcdonell tmcdonell modified the milestone: _|_ Apr 14, 2017
@tmcdonell
Copy link
Member

Closing as accelerate-cuda is deprecated and accelerate-llvm-ptx works on windows now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants