
Accelerate model compression #1274

Merged

Conversation

denghuilu
Member

Main changes:

  • Because the environment matrix is sorted, the model compression kernels can exploit this ordering to reuse GPU kernel registers. This accelerates the tabulate_fusion_grad_se_a kernel by a factor of 1.1 (see the sketch below).
  • Abstract out shared parts of the code.

A ROCm implementation will be added later.
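A minimal CUDA sketch of the register-reuse idea behind the first bullet; the kernel, its names, and the zero-padding convention are illustrative assumptions, not the actual `tabulate_fusion_grad_se_a` implementation:

```cpp
// Illustrative sketch only (assumed names, not the deepmd-kit kernel).
// Each row of the environment matrix is sorted so that real neighbor
// entries come before the zero padding; a thread can therefore keep its
// partial result in a register and stop as soon as it reaches the padding,
// instead of scanning and re-reading the whole row.
__global__ void row_sum_sorted(const float* em,  // [nrows * row_len], sorted per row
                               float* out,       // [nrows]
                               int nrows,
                               int row_len) {
  const int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= nrows) return;
  const float* p = em + static_cast<size_t>(row) * row_len;
  float acc = 0.f;                 // accumulator stays in a register
  for (int j = 0; j < row_len; ++j) {
    const float v = p[j];
    if (v == 0.f) break;           // sorted row: everything after this is padding
    acc += v;
  }
  out[row] = acc;
}
```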

@njzjz
Member

njzjz commented Nov 9, 2021

#1275 should fix the compilation error.

@wanghan-iapcm wanghan-iapcm merged commit 36275c8 into deepmodeling:devel Nov 10, 2021
@njzjz njzjz mentioned this pull request Sep 19, 2023
wanghan-iapcm pushed a commit that referenced this pull request Sep 22, 2023
Merge `source/lib/src/cuda` and `source/lib/src/rocm` into
`source/lib/src/gpu`.

- Define macros `gpuGetLastError`, `gpuDeviceSynchronize`, `gpuMemcpy`,
`gpuMemcpyDeviceToHost`, `gpuMemcpyHostToDevice`, and `gpuMemset` to
make them available for both CUDA and ROCm (a sketch is given after this list).
- Use the `<<< >>>` launch syntax for both CUDA and ROCm. Per ROCm/HIP@cf78d85,
it has been supported in HIP since 2018.
- Fix several integer constants that should be double or float.
- For tabulate:
  - Fix `WARP_SIZE` for ROCm. Per pytorch/pytorch#64302, `WARP_SIZE` can be
    32 or 64, so it should not be hardcoded to 64.
  - Add `GpuShuffleSync`, since per ROCm/HIP#1491 `__shfl_sync` is not
    supported by HIP (a sketch of both points is given after this list).
  - After merging the code, #1274 should also work for ROCm.
  - Use the same `ii` for #830 and #2357. Although both of them work, `ii`
    has different meanings in these two PRs; now it should be the same.
  - However, `ii` in `tabulate_fusion_se_a_fifth_order_polynomial` (ROCm)
    added by #2532 is wrong. After merging the codes, it should be corrected.
  - Optimization in #830 was not applied to ROCm.
  - `__syncwarp` is not supported by ROCm.
- After merging the code, #2661 will be applied to ROCm. Although the TF
ROCm stream is still blocking
(https://github.com/tensorflow/tensorflow/blob/9d1262082e761cd85d6726bcbdfdef331d6d72c6/tensorflow/compiler/xla/stream_executor/rocm/rocm_driver.cc#L566),
we don't know whether it will change to non-blocking.
- There are several other differences between CUDA and ROCm.
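For illustration, one possible shape of the `gpu*` aliases described above; the include guard (`USE_ROCM`) is an assumption for this example, and the actual header in `source/lib/src/gpu` may differ:

```cpp
// Illustrative sketch of gpu* aliases covering both CUDA and ROCm.
// The guard macro USE_ROCM is an assumption, not the project's actual guard.
#ifdef USE_ROCM
#include <hip/hip_runtime.h>
#define gpuGetLastError       hipGetLastError
#define gpuDeviceSynchronize  hipDeviceSynchronize
#define gpuMemcpy             hipMemcpy
#define gpuMemcpyDeviceToHost hipMemcpyDeviceToHost
#define gpuMemcpyHostToDevice hipMemcpyHostToDevice
#define gpuMemset             hipMemset
#else
#include <cuda_runtime.h>
#define gpuGetLastError       cudaGetLastError
#define gpuDeviceSynchronize  cudaDeviceSynchronize
#define gpuMemcpy             cudaMemcpy
#define gpuMemcpyDeviceToHost cudaMemcpyDeviceToHost
#define gpuMemcpyHostToDevice cudaMemcpyHostToDevice
#define gpuMemset             cudaMemset
#endif
```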
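Similarly, a hedged sketch of what a `GpuShuffleSync` wrapper and a non-hardcoded warp size could look like; the exact deepmd-kit definitions may differ:

```cpp
// Illustrative sketch only; the real definitions in deepmd-kit may differ.
#ifdef USE_ROCM
// AMD wavefronts are 32 or 64 lanes depending on the architecture, so the
// warp size must not be hardcoded to 64. warpSize is the HIP device-side
// built-in; real code may instead determine this value at build time.
#define GPU_WARP_SIZE warpSize
template <typename T>
__device__ __forceinline__ T GpuShuffleSync(unsigned mask, T val, int src) {
  (void)mask;             // HIP has no __shfl_sync; the mask is ignored
  return __shfl(val, src);
}
#else
#define GPU_WARP_SIZE 32
template <typename T>
__device__ __forceinline__ T GpuShuffleSync(unsigned mask, T val, int src) {
  return __shfl_sync(mask, val, src);
}
#endif
```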

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>