Add unit/integration testing #31

casper-hansen · 2023-09-06T17:41:28Z

A nice list of tests that I would like to implement in order to more easily make sure everything works.

test each model architecture by generating 1 token (fused + unfused)
test batched input
test quantization
test CUDA kernels
test multi-GPU
test fusing qkv operations
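
For the last item, one way to frame a fused-vs-unfused check is to assert that a single projection with concatenated weights gives the same Q, K, V as three separate projections. This is only a minimal pure-Python sketch of that idea; `matmul`, `qkv_unfused`, and `qkv_fused` are hypothetical helpers for illustration, not AutoAWQ functions, and a real test would compare model outputs on GPU with a tolerance.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product (illustration only)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def qkv_unfused(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Three separate projections, one per weight matrix."""
    return matmul(x, wq), matmul(x, wk), matmul(x, wv)


def qkv_fused(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """One projection with the weights concatenated column-wise, then split."""
    w_qkv = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
    out = matmul(x, w_qkv)
    dq, dk = len(wq[0]), len(wk[0])
    q = [row[:dq] for row in out]
    k = [row[dq:dq + dk] for row in out]
    v = [row[dq + dk:] for row in out]
    return q, k, v


def test_fused_matches_unfused() -> None:
    x = [[1.0, 2.0], [3.0, 4.0]]
    wq = [[1.0, 0.0], [0.0, 1.0]]
    wk = [[2.0], [1.0]]
    wv = [[0.5, 1.5, 2.0], [1.0, 0.0, 3.0]]
    assert qkv_fused(x, wq, wk, wv) == qkv_unfused(x, wq, wk, wv)
```

Exact equality works here because both paths sum the same products in the same order; with real half-precision kernels an approximate comparison would be the safer choice.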

bdambrosio · 2023-09-06T19:07:33Z

I'll take a shot at some of these. If nothing else I'll learn a lot.
Are you ok with api-level python tests?
I'm esp. interested in multi-GPU, but I'll start w some simpler ones.
Also, any hope of ever getting this to run on T4s (Kaggle...)? I'd be willing to dive pretty deep, and have the skills, but don't know enough about that level of CUDA to know if it's even remotely possible.

casper-hansen · 2023-09-06T19:12:00Z

I would love some help here for implementing the tests. T4 has compute capability 7.5, so it is not compatible with the AWQ CUDA kernel for running the quantized layers as they require 8.0 (Ampere architecture or later).

EDIT: To add support for earlier GPUs, you would have to implement a completely new CUDA kernel because the current one utilizes tensor cores that are 10x faster than CUDA cores. GPUs that are less than 8.0 in compute capability do not have tensor cores (I believe), so it cannot install or run the current CUDA kernel.

bdambrosio · 2023-09-06T19:37:42Z

Ok, will work on tests.

Switching from Tensor Cores to CUDA cores doesn't sound totally out of the realm of possibility, esp. since I'm only interested in inference for that task, but I won't even think about it for a while.
tnx

wanzhenchn · 2023-09-07T02:41:28Z

> I would love some help here for implementing the tests. T4 has compute capability 7.5, so it is not compatible with the AWQ CUDA kernel for running the quantized layers as they require 8.0 (Ampere architecture or later).
>
> EDIT: To add support for earlier GPUs, you would have to implement a completely new CUDA kernel because the current one utilizes tensor cores that are 10x faster than CUDA cores. GPUs that are less than 8.0 in compute capability do not have tensor cores (I believe), so it cannot install or run the current CUDA kernel.

@casper-hansen, @bdambrosio
Actually, the T4 GPU also has Tensor Cores (see Hardware-Specific). However, its compute capability is 7.5, as shown in the GPU List.

The real reason AWQ requires sm_80 or higher is that the gemm_cuda_gen.cu kernel uses the '.m16n8k16' instruction shape, which is only available on GPU architecture sm_80 or higher.
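
Given the sm_80 requirement, kernel tests could be gated on compute capability so they are skipped on cards like the T4. A minimal sketch, assuming PyTorch is what reports the device capability; `MIN_CAPABILITY`, `meets_kernel_requirement`, and `current_capability` are hypothetical names, not part of AutoAWQ.

```python
from typing import Optional, Tuple

# Assumption: the threshold is the sm_80 requirement discussed in this thread.
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def meets_kernel_requirement(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) is at least sm_80."""
    return tuple(capability) >= MIN_CAPABILITY


def current_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the current GPU, or None without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = current_capability()
    if cap is None:
        print("No CUDA device visible; kernel tests would be skipped.")
    elif meets_kernel_requirement(cap):
        print(f"sm_{cap[0]}{cap[1]}: kernel tests can run.")
    else:
        print(f"sm_{cap[0]}{cap[1]}: below sm_80, kernel tests would be skipped.")
```

In a pytest suite the same check could feed a `pytest.mark.skipif` condition, so a T4 (7.5) run reports the kernel tests as skipped rather than failed.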

casper-hansen added the "help wanted" and "good first issue" labels on Sep 6, 2023

casper-hansen mentioned this issue on Sep 6, 2023 in 📌 AutoAWQ Roadmap #32 (closed, 30 tasks)
