
Digamma function always executed in double precision on GPU? #6080

Open
botev opened this issue Jun 27, 2017 · 2 comments


Comments

@botev
Contributor

botev commented Jun 27, 2017

I'm not too competent on the topic, so I was wondering if someone could confirm that my assumption is correct. Given that in the c_code of the function in Theano it is defined as:

DEVICE double _psi(ga_double x)

how would it be possible to make the code generic so it could potentially run in float32? Also, is there a good reason not to do that?

@abergeron
Member

It is true that the code will always run in double precision the way it's written. We might want to make the function a template so that it can also use float32 for faster speed. However, the constants in the code are computed for float64, and I don't know how portable that would be to float32.

@botev
Contributor Author

botev commented Jun 29, 2017

So at this stage, first I'll ask: how exactly is the function's C code inserted when compiling a composite Elemwise operation or a CUDA kernel? Is it inserted as a separate function, which could take the dtype as a template parameter, or is it something that needs %(dtype)s substitution as in some of the other code? I just have no idea how this conversion works, so I'd like a better perspective on the problem.

As for the code of the two operators, it seems that the most unstable computations are of the form 1/x, 1/(x+1), ... and 1/x^2, 1/(x+1)^2, ..., and these are turned off when x is less than a predefined value. As far as numerical stability is concerned, the only issue is when x is really small, which is why the computation is truncated by default. One might need to define a slightly higher truncation threshold for lower-precision types, but other than that I don't see why the rest of the computation cannot be done in other types.
