
Add unit/integration testing #31

Open
casper-hansen opened this issue Sep 6, 2023 · 4 comments
Labels: good first issue, help wanted

Comments

@casper-hansen
Owner

A list of tests I would like to implement so we can more easily verify that everything works:

  • test each model architecture by generating 1 token (fused + unfused)
  • test batched input
  • test quantization
  • test CUDA kernels
  • test multi-GPU
  • test fusing qkv operations
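A hedged sketch of how the first item could look as a parameterized pytest. The checkpoint name is a placeholder and the `AutoAWQForCausalLM.from_quantized` usage is an assumption about the AutoAWQ API; the test skips itself when no CUDA GPU is available:

```python
import pytest

# Hypothetical list of quantized checkpoints, one per supported architecture.
MODEL_PATHS = [
    "TheBloke/Llama-2-7B-AWQ",  # placeholder checkpoint name
]

@pytest.mark.parametrize("fuse_layers", [False, True])
@pytest.mark.parametrize("model_path", MODEL_PATHS)
def test_generate_one_token(model_path, fuse_layers):
    torch = pytest.importorskip("torch")
    awq = pytest.importorskip("awq")
    transformers = pytest.importorskip("transformers")
    if not torch.cuda.is_available():
        pytest.skip("AWQ kernels need a CUDA GPU (compute capability 8.0+)")

    # Assumed API: load the quantized model, fused or unfused.
    model = awq.AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=fuse_layers)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_path)

    inputs = tokenizer("Hello", return_tensors="pt").input_ids.cuda()
    out = model.generate(inputs, max_new_tokens=1)
    # Exactly one new token should be appended to the prompt.
    assert out.shape[1] == inputs.shape[1] + 1
```

The same skeleton extends naturally to the batched-input case by passing a list of prompts with padding enabled.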
@casper-hansen casper-hansen added help wanted Extra attention is needed good first issue Good for newcomers labels Sep 6, 2023
@casper-hansen casper-hansen mentioned this issue Sep 6, 2023
@bdambrosio

I'll take a shot at some of these. If nothing else, I'll learn a lot.
Are you OK with API-level Python tests?
I'm especially interested in multi-GPU, but I'll start with some simpler ones.
Also, is there any hope of ever getting this to run on T4s (Kaggle, etc.)? I'd be willing to dive pretty deep, and I have the skills, but I don't know enough about CUDA at that level to know whether it's even remotely possible.

@casper-hansen
Owner Author

casper-hansen commented Sep 6, 2023

I would love some help here implementing the tests. The T4 has compute capability 7.5, so it is not compatible with the AWQ CUDA kernel used to run the quantized layers, which requires 8.0 (Ampere architecture or later).

EDIT: To add support for earlier GPUs, you would have to implement a completely new CUDA kernel, because the current one uses tensor cores that are 10x faster than CUDA cores. GPUs below compute capability 8.0 do not have tensor cores (I believe), so they cannot install or run the current CUDA kernel.

@bdambrosio

OK, I will work on the tests.

Switching from Tensor Cores to CUDA cores doesn't sound totally out of the realm, especially since I'm only interested in inference for that task, but I won't even think about it for a while.
Thanks.

@wanzhenchn

wanzhenchn commented Sep 7, 2023

> I would love some help here implementing the tests. The T4 has compute capability 7.5, so it is not compatible with the AWQ CUDA kernel used to run the quantized layers, which requires 8.0 (Ampere architecture or later).
>
> EDIT: To add support for earlier GPUs, you would have to implement a completely new CUDA kernel, because the current one uses tensor cores that are 10x faster than CUDA cores. GPUs below compute capability 8.0 do not have tensor cores (I believe), so they cannot install or run the current CUDA kernel.

@casper-hansen, @bdambrosio
Actually, the T4 GPU does have Tensor Cores; however, its compute capability is 7.5, as shown in the GPU list.

The real reason AWQ requires sm_80 or higher is that the gemm_cuda_gen.cu kernel uses the 'm16n8k16' mma shape, which requires GPU architecture sm_80 or newer.
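That requirement can be expressed as a small version guard on the Python side (a sketch; the function name is hypothetical, and the sm_80 threshold comes from the m16n8k16 mma shape discussed above):

```python
def supports_awq_gemm(major: int, minor: int) -> bool:
    """Whether the AWQ GEMM kernel can run on a GPU with the given
    compute capability. The m16n8k16 mma shape used by
    gemm_cuda_gen.cu requires sm_80 (Ampere) or newer, so a GPU with
    Tensor Cores but a lower capability (e.g. the T4) still fails."""
    return (major, minor) >= (8, 0)

# T4 is sm_75: it has Tensor Cores, but not this mma shape.
assert not supports_awq_gemm(7, 5)
# A100 is sm_80: supported.
assert supports_awq_gemm(8, 0)
```

In practice the `(major, minor)` pair would come from something like `torch.cuda.get_device_capability()`.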
