
Jialei gh297 speed up gpu computation on ei grad ei #351

Merged: 11 commits, Aug 8, 2014

Conversation

jialeiwang (Contributor)

********* PEOPLE *************
Primary reviewer: @suntzu86

Reviewers: @sc932

********* DESCRIPTION **************
Branch Name: jialei_gh297_speed_up_gpu_computation_on_EI_gradEI
Ticket(s)/Issue(s): Closes #297

********* TESTING DONE *************
make test
cpplint.py

@@ -134,6 +134,7 @@ __global__ void CudaComputeEIGpu(double const * __restrict__ mu, double const *
chunk_size = (num_union - 1)/ blockDim.x + 1;
CudaCopyElements(chunk_size * idx, chunk_size * (idx + 1), num_union, mu, mu_local);
__syncthreads();
double * normals = &mu_local[num_union];
Contributor:

__restrict__. Also, your other shared-memory double * pointers should be marked restrict too (and const if applicable).

Also, please edit the shared memory comments to describe where normals goes.

Contributor:

also, have you ever worked with "constant" memory? i'm not sure if that's appropriate for mu and chol_var: yes, they are constant, but GPU constant memory has some additional requirements on how a warp of threads accesses the data to get the best performance.

still, if it's usable, it could reduce some memory pressure.

Contributor:

one more: also indicate in the docs how the memory is laid out for random and how much memory is required (i.e., point out that it's sized [num_union][num_threads], so each thread has a block of num_union numbers).

you should mention this stuff in the function's docstring b/c callers have to know how much shared memory to specify when they launch the kernel.

Contributor:

two more:

  1. &mu_local[num_union] is the same as mu_local + num_union, and the latter is clearer imo.
  2. idx is fixed, right? Why not set normals = mu_local + num_union + idx * num_union? also, again, be very specific about the ordering of these matrices in shared memory.

Contributor Author:

fixed


suntzu86 commented Aug 5, 2014

woohoo speedups!
left you some mostly organizational and doc'ing comments

@jialeiwang (Contributor Author)

A few things to do next:

  1. check out "constant memory"
  2. reorder grad_chol_var_local to achieve more efficient reads, and optimize other similar cases in the same way

@@ -99,7 +99,7 @@ __forceinline__ __device__ void CudaCopyElements(int begin, int end, int bound,
}

/*!\rst
-Device code to compute Expected Improvement by Monte-Carlo on GPU
+GPU kernel function of computing Expected Improvement using Monte-Carlo.
\param
Contributor:

there needs to be a newline btwn the last text and param, e.g.,

blah blah blah

\param
  :foo: stuff
\output
  :bar: more stuff

o/w sphinx gets confused

Contributor Author:

fixed


suntzu86 commented Aug 8, 2014

  1. couple of docs-only changes. looking good!
  2. see my earlier comment about documenting max problem sizes before shared_mem runs out. (fixed, indicated in docs)
  3. you should ticket your two TODOs above (constant mem and reorganizing grad_chol_var)


* chol_var_local[num_union][num_union]: copy of chol_var in shared memory for each block
* mu_local[num_union]: copy of mu in shared memory for each block
* normals[num_union][num_threads]: shared memory for storage of normal random numbers for each block
Contributor:

oops, I goofed with the suggestion here. It should have been:

:chol_var_local[num_union][num_union]: blah blah
:mu_local[...]:
:etc:

that will format it like the parameter lists; there just isn't a \param shortcut to make a heading.

Contributor Author:

fixed


suntzu86 commented Aug 8, 2014

2 more docs-only changes.

Also, could you update CHANGELOG.md?

(num_union * num_union + num_union + num_union * num_threads)

doubles in total in shared memory. The order of the arrays placed in this shared memory is like
[chol_var_local, mu_local, normals]
Contributor:

let's put the mathish things and variable names in double backticks:

``(num_union * ...)``
``[chol_var_local, ...]``

etc

Contributor Author:

fixed


suntzu86 commented Aug 8, 2014

shipit

jialeiwang added a commit that referenced this pull request on Aug 8, 2014:
…on_on_EI_gradEI — Jialei gh297 speed up gpu computation on ei grad ei

jialeiwang merged commit 1eeb1a4 into master on Aug 8, 2014.
Successfully merging this pull request may close these issues.

Speed up GPU computation on EI & gradEI