Speed up GPU computation on EI & gradEI #297

jialeiwang · 2014-07-15T21:05:46Z

The original GPU implementation is slow because each thread calls malloc and keeps its normal random numbers in global memory. To fix this, we should stop calling malloc per thread and instead allocate shared memory within each block to hold the random numbers, so that the threads in that block read and write them there.
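A minimal sketch of the shared-memory idea, not the actual kernel from this repo. The names `SampleImprovement` and `RunSampleImprovement`, the one-sample-per-thread layout, and the simplified improvement formula (`max(0, best_so_far - min_i y_i)` with `y = mu + chol * z`) are all assumptions made for illustration. Each thread takes a `q`-sized slice of a dynamically sized `__shared__` buffer rather than calling malloc:

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel: one Monte Carlo sample per thread.
// Each thread owns a q-sized slice of the block's shared-memory buffer,
// so there is no per-thread malloc and no global-memory scratch space.
__global__ void SampleImprovement(const double* mu, const double* chol,
                                  double best_so_far, int q,
                                  unsigned long long seed, double* out) {
  extern __shared__ double normals[];      // blockDim.x * q doubles, sized at launch
  double* z = normals + threadIdx.x * q;   // this thread's private slice
  const int tid = blockIdx.x * blockDim.x + threadIdx.x;

  curandState state;
  curand_init(seed, tid, 0, &state);       // separate subsequence per thread
  for (int i = 0; i < q; ++i) {
    z[i] = curand_normal_double(&state);
  }

  // y = mu + chol * z, where chol is lower triangular, row-major, q x q.
  double improvement = 0.0;
  for (int i = 0; i < q; ++i) {
    double y = mu[i];
    for (int j = 0; j <= i; ++j) {
      y += chol[i * q + j] * z[j];
    }
    improvement = fmax(improvement, best_so_far - y);
  }
  out[tid] = improvement;
}

// Hypothetical host wrapper: returns one improvement value per thread.
std::vector<double> RunSampleImprovement(const std::vector<double>& mu,
                                         const std::vector<double>& chol,
                                         double best_so_far, int blocks,
                                         int threads, unsigned long long seed) {
  const int q = static_cast<int>(mu.size());
  const int total = blocks * threads;
  double *d_mu, *d_chol, *d_out;
  cudaMalloc(&d_mu, q * sizeof(double));
  cudaMalloc(&d_chol, q * q * sizeof(double));
  cudaMalloc(&d_out, total * sizeof(double));
  cudaMemcpy(d_mu, mu.data(), q * sizeof(double), cudaMemcpyHostToDevice);
  cudaMemcpy(d_chol, chol.data(), q * q * sizeof(double), cudaMemcpyHostToDevice);

  // Third launch parameter: bytes of dynamic shared memory per block.
  const size_t shared_bytes = static_cast<size_t>(threads) * q * sizeof(double);
  SampleImprovement<<<blocks, threads, shared_bytes>>>(d_mu, d_chol, best_so_far,
                                                       q, seed, d_out);

  std::vector<double> result(total);
  cudaMemcpy(result.data(), d_out, total * sizeof(double), cudaMemcpyDeviceToHost);
  cudaFree(d_mu);
  cudaFree(d_chol);
  cudaFree(d_out);
  return result;
}
```

The shared buffer has to fit in the per-block shared memory limit, so `threads * q` bounds how many samples a block can hold at once.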

The other potential improvement: instead of copying (no_of_blocks * no_of_threads) EI or grad_EI values from GPU to CPU and averaging them there, we can average them on the GPU and copy back only a single EI or grad_EI result. In practice, the gain from this turned out to be negligible for reasonably large q and p (q=4, p=4). We may need to revisit whether it is worth implementing for larger q and p.
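For reference, averaging on the GPU could look roughly like the sketch below. It is an assumption about how one might do it, not code from this repo: `BlockSum` and `MeanOnDevice` are hypothetical names, and the two-pass scheme (per-block partial sums, then a single block that sums the partials) is just one common way to reduce. Only one double is copied back to the host:

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums a strided share of in[0..n) using a
// shared-memory tree reduction and writes one partial sum to out[blockIdx.x].
// Assumes blockDim.x is a power of two.
__global__ void BlockSum(const double* in, int n, double* out) {
  extern __shared__ double partial[];      // blockDim.x doubles, sized at launch
  double sum = 0.0;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    sum += in[i];
  }
  partial[threadIdx.x] = sum;
  __syncthreads();

  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (threadIdx.x < stride) {
      partial[threadIdx.x] += partial[threadIdx.x + stride];
    }
    __syncthreads();
  }
  if (threadIdx.x == 0) {
    out[blockIdx.x] = partial[0];
  }
}

// Hypothetical host helper: mean of n device-resident values.
// Pass 1 yields one partial sum per block; pass 2 sums the partials in one block.
double MeanOnDevice(const double* d_values, int n) {
  const int threads = 256;                 // power of two, as BlockSum assumes
  const int blocks = 32;
  double* d_partial;
  cudaMalloc(&d_partial, (blocks + 1) * sizeof(double));

  const size_t shared_bytes = threads * sizeof(double);
  BlockSum<<<blocks, threads, shared_bytes>>>(d_values, n, d_partial);
  BlockSum<<<1, threads, shared_bytes>>>(d_partial, blocks, d_partial + blocks);

  double total = 0.0;
  cudaMemcpy(&total, d_partial + blocks, sizeof(double), cudaMemcpyDeviceToHost);
  cudaFree(d_partial);
  return total / n;
}
```

For grad_EI the same reduction would run once per gradient component. As noted above, the saving is small when only a few thousand values are involved, since the copy is already cheap at that size.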

jialeiwang added enhancement labels Jul 15, 2014

jialeiwang self-assigned this Jul 15, 2014

jialeiwang removed their assignment Jul 23, 2014

jialeiwang mentioned this issue Aug 1, 2014

Jialei gh297 speed up gpu computation on ei grad ei #351

Merged

jialeiwang closed this as completed in #351 Aug 8, 2014
