[SYSTEMML-1034] implemented gpu solve #476
|
Build failed, see build log for details |
|
Refer to this link for build results (access rights to CI server needed): |
|
Well, I could indeed imagine such a speedup, as we're currently only calling out to commons-math because solve is by far not the bottleneck in ALS or LinregDS (it is only called for tiny matrices whose size is the rank or the number of features). |
|
@mboehm7 - understood, but this PR still provides value. The more operations in a loop that run on the GPU, the less data ping-pongs between host and device memory. |
|
Sure, this is absolutely fine; I'm just setting expectations straight: for LinregDS, for example, solve is called once and takes less than a second even for 1k features. However, down the road, once we have a distributed solve, more algorithms might benefit from it. |
|
LGTM. If this is GPU, I feel @nakul02 and @niketanpansare are the owners of this area and need to merge and move forward for our 1.0.0 release. |
|
thanks @deroneriksson ! |
|
I've checked the results from the GPU solve and they are correct. |
|
LGTM, Thanks Nakul 👍 |
|
Thanks, I shall merge. |
Implemented the GPU solve() function.
Ping @niketanpansare, @bertholdreinwald, @dusenberrymw
@iyounus - can you please try this out and also check it for correctness? I've checked on smaller data.
This will benefit us in some sense. I see it being used in these algorithms (based on a simple grep search):
I seem to get a 30x speedup in an example that I tried on my own machine (quad-core Core i7, 32 GB RAM, GTX 1070).
Program:
Output
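For readers unfamiliar with the operation under discussion: solve(A, b) returns the x that satisfies A x = b. In a direct-solve linear regression such as LinregDS, the system is, as far as I understand it, the normal equations A = t(X) X and b = t(X) y, so A is only as large as the number of features. The sketch below illustrates that idea in plain Python. It is not the benchmark program from this PR and not SystemML's implementation; the helper names `solve` and `normal_equations` and the toy data are assumptions made up for illustration.

```python
def solve(A, b):
    """Solve A x = b for a square A using Gaussian elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b] as a copy so the inputs are not modified.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest absolute value in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def normal_equations(X, y):
    """Return (A, b) with A = t(X) X and b = t(X) y (hypothetical helper)."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    return A, b


# Toy data generated from y = 2*x1 + 3*x2 with no noise (illustrative only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
A, b = normal_equations(X, y)  # A is 2x2 regardless of the number of rows in X
beta = solve(A, b)
```

Note that A here is 2x2 even though X has four rows, which is why the solve step stays cheap relative to forming t(X) X when the number of features is small.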