
vulkan: unpack more values at a time for iquants mat mul #14485



Open: wants to merge 1 commit into base: master


netrunnereve
Collaborator

This change was taken from @remyoudompheng's #12260 and rebased, as the original PR has been abandoned for a while. @remyoudompheng, if you'd rather submit this yourself, please let me know.

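On my RX 470, it's around 10-15% faster for most of the iquant types benchmarked below (iq4_nl is essentially unchanged), and pp512 prompt processing goes from about 172-175 t/s to about 201 t/s.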
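The shader diff is not reproduced on this page, so the following is only a generic sketch of what "unpacking more values at a time" can mean, written in Python for readability rather than GLSL. It assumes a simple layout of 4-bit codebook indices packed two per byte (low nibble first); the codebook values and the names `CODEBOOK`, `unpack_per_byte` and `unpack_per_word` are hypothetical, and the real iquant formats (iq2_xxs, iq3_s, and so on) use their own bit layouts and grid tables.

```python
import struct

# Hypothetical 16-entry codebook standing in for an iquant lookup table.
# These values are illustrative only and are not taken from ggml.
CODEBOOK = [-60, -45, -33, -23, -15, -9, -4, -1, 1, 4, 9, 15, 23, 33, 45, 60]


def unpack_per_byte(packed):
    """Decode 4-bit indices one byte (two values) at a time."""
    out = []
    for b in packed:
        out.append(CODEBOOK[b & 0x0F])  # low nibble holds the earlier value
        out.append(CODEBOOK[b >> 4])    # high nibble holds the next value
    return out


def unpack_per_word(packed):
    """Decode 4-bit indices one little-endian 32-bit word (eight values) at a time."""
    if len(packed) % 4 != 0:
        raise ValueError("packed data must be a whole number of 32-bit words")
    out = []
    for (word,) in struct.iter_unpack("<I", packed):
        # One mask extracts all four low nibbles and one shift+mask all four
        # high nibbles; each result holds one 4-bit index per byte lane.
        lo = word & 0x0F0F0F0F
        hi = (word >> 4) & 0x0F0F0F0F
        for lane in range(4):
            shift = 8 * lane
            out.append(CODEBOOK[(lo >> shift) & 0xFF])
            out.append(CODEBOOK[(hi >> shift) & 0xFF])
    return out


# Bytes 0x21, 0x43, 0x65, 0x87 encode indices 1, 2, ..., 8 in order.
print(unpack_per_word(bytes([0x21, 0x43, 0x65, 0x87])))
# [-45, -33, -23, -15, -9, -4, -1, 1]
```

On a GPU, any benefit from this style would come from fewer loads and fewer shift/mask instructions per decoded value; the Python version only shows that the word-at-a-time decoder produces the same values as the byte-at-a-time one.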

PR:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    44 runs - 23378.84 us/run -  60.13 GFLOP/run -   2.57 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23188.05 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27454.82 us/run -  60.13 GFLOP/run -   2.19 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    42 runs - 24248.86 us/run -  60.13 GFLOP/run -   2.48 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23257.50 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23611.98 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23516.11 us/run -  60.13 GFLOP/run -   2.56 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24849.00 us/run -  60.13 GFLOP/run -   2.42 TFLOPS
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.65 ± 0.32 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.98 ± 0.13 |
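The GFLOP/run and TFLOPS columns follow from the matrix shapes: the reported 60.13 GFLOP/run is consistent with treating m and n as the output dimensions, k as the shared dimension, and counting each of the m·n·k multiply-adds as two floating-point operations. A small sketch of that arithmetic (the helper names are just for illustration):

```python
def mul_mat_gflop(m, n, k):
    """GFLOP for an (m x k) by (k x n) product, counting a multiply-add as 2 ops."""
    return 2 * m * n * k / 1e9


def tflops(gflop_per_run, us_per_run):
    """Throughput in TFLOP/s given the work per run and microseconds per run."""
    seconds = us_per_run * 1e-6
    return gflop_per_run / seconds / 1e3


work = mul_mat_gflop(4096, 512, 14336)
print(f"{work:.2f} GFLOP/run")                                  # 60.13 GFLOP/run
print(f"iq2_xxs, PR:     {tflops(work, 23378.84):.2f} TFLOPS")  # 2.57 TFLOPS
print(f"iq2_xxs, master: {tflops(work, 25567.17):.2f} TFLOPS")  # 2.35 TFLOPS
```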

Master:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    40 runs - 25567.17 us/run -  60.13 GFLOP/run -   2.35 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     38 runs - 26685.89 us/run -  60.13 GFLOP/run -   2.25 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      34 runs - 31151.91 us/run -  60.13 GFLOP/run -   1.93 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    38 runs - 26525.08 us/run -  60.13 GFLOP/run -   2.27 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24953.29 us/run -  60.13 GFLOP/run -   2.41 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 26897.55 us/run -  60.13 GFLOP/run -   2.24 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23591.05 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27017.13 us/run -  60.13 GFLOP/run -   2.23 TFLOPS
| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 172.03 ± 0.38 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 174.84 ± 0.36 |
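The relative change between the PR and master runs can be computed directly from the numbers reported here. This sketch uses only those values: for MUL_MAT it compares time per run (master/PR, since lower is better), and for pp512 it compares tokens per second (PR/master, since higher is better).

```python
# Microseconds per run from the MUL_MAT results: (master, PR). Lower is better.
mul_mat_us = {
    "iq2_xxs": (25567.17, 23378.84),
    "iq2_xs": (26685.89, 23188.05),
    "iq2_s": (31151.91, 27454.82),
    "iq3_xxs": (26525.08, 24248.86),
    "iq1_s": (24953.29, 23257.50),
    "iq1_m": (26897.55, 23611.98),
    "iq4_nl": (23591.05, 23516.11),
    "iq3_s": (27017.13, 24849.00),
}

# pp512 tokens per second from the llama-bench tables: (master, PR). Higher is better.
pp512_tps = {
    "IQ1_M": (172.03, 200.65),
    "IQ2_S": (174.84, 200.98),
}


def speedup_from_times(master_us, pr_us):
    """Percent speedup of the PR when the metric is time per run."""
    return (master_us / pr_us - 1.0) * 100.0


def speedup_from_rates(master_rate, pr_rate):
    """Percent speedup of the PR when the metric is a throughput."""
    return (pr_rate / master_rate - 1.0) * 100.0


for name, (master, pr) in mul_mat_us.items():
    print(f"MUL_MAT {name:8s} {speedup_from_times(master, pr):+5.1f}%")
for name, (master, pr) in pp512_tps.items():
    print(f"pp512   {name:8s} {speedup_from_rates(master, pr):+5.1f}%")
```

With these inputs, most MUL_MAT cases land roughly between +7% and +15%, iq4_nl barely moves, and pp512 improves by about 15-17%.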

Commit taken from remyoudompheng's PR ggml-org#12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
github-actions bot added the Vulkan and ggml labels on Jul 1, 2025.