vulkan: unpack more values at a time for iquants mat mul #14485

netrunnereve · 2025-07-01T16:48:14Z

This change was taken from @remyoudompheng's #12260 and rebased, as the original PR has been inactive for a while. @remyoudompheng, if you'd rather submit this yourself, please let me know.

On my RX 470 it's around 10-15% faster. The test-backend-ops MUL_MAT and llama-bench pp512 results for this PR and for master are below.
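One way to picture "unpacking more values at a time" for the i-quants is through their codebook grids. This assumes ggml's convention that an IQ1/IQ2 grid entry is a 64-bit word packing eight byte-sized values and an IQ3 grid entry is a 32-bit word packing four. Under that convention, a thread that loads 8 (IQ1/IQ2) or 4 (IQ3) values can decode a whole entry from a single lookup instead of fetching the same entry once per value. The PR text does not state this rationale. The sketch below uses made-up entry values and a deliberately simplified fetch-count model (`unpack_entry` and `grid_lookups` are hypothetical helpers); it is not the shader's dequantization code, which also applies scales and signs.

```python
def unpack_entry(entry: int, width: int) -> list[int]:
    """Split a packed grid entry into `width` unsigned bytes, lowest byte first."""
    return [(entry >> (8 * i)) & 0xFF for i in range(width)]


def grid_lookups(n_values: int, load_vec: int, grid_width: int) -> int:
    """Count grid-table fetches needed to dequantize `n_values` values.

    Simplified model (an assumption, not measured): every load of `load_vec`
    consecutive values is aligned to entry boundaries and fetches each grid
    entry it touches once, so a load narrower than an entry still costs a fetch.
    """
    loads = n_values // load_vec
    entries_per_load = max(1, load_vec // grid_width)
    return loads * entries_per_load


# Made-up values in the style of an IQ1/IQ2 entry (8 bytes) and an IQ3 entry (4 bytes).
IQ2_STYLE_ENTRY = 0x2B19080808082B08
IQ3_STYLE_ENTRY = 0x0C04043E

print(unpack_entry(IQ2_STYLE_ENTRY, 8))  # [8, 43, 8, 8, 8, 8, 25, 43]
print(unpack_entry(IQ3_STYLE_ENTRY, 4))  # [62, 4, 4, 12]

# Fetches for a hypothetical run of 256 values as the load width grows,
# for an 8-wide grid and a 4-wide grid.
for vec in (1, 2, 4, 8):
    print(vec, grid_lookups(256, vec, grid_width=8), grid_lookups(256, vec, grid_width=4))
```

Under this model, widening a load past the entry width stops helping (a 4-wide grid needs 64 fetches at both 4 and 8), which would be consistent with picking 8 for IQ1/IQ2 and 4 for IQ3, though that is an interpretation rather than something the PR claims.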

PR:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    44 runs - 23378.84 us/run -  60.13 GFLOP/run -   2.57 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23188.05 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27454.82 us/run -  60.13 GFLOP/run -   2.19 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    42 runs - 24248.86 us/run -  60.13 GFLOP/run -   2.48 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23257.50 us/run -  60.13 GFLOP/run -   2.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      44 runs - 23611.98 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23516.11 us/run -  60.13 GFLOP/run -   2.56 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24849.00 us/run -  60.13 GFLOP/run -   2.42 TFLOPS

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.65 ± 0.32 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 200.98 ± 0.13 |

Master:

  MUL_MAT(type_a=iq2_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    40 runs - 25567.17 us/run -  60.13 GFLOP/run -   2.35 TFLOPS
  MUL_MAT(type_a=iq2_xs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     38 runs - 26685.89 us/run -  60.13 GFLOP/run -   2.25 TFLOPS
  MUL_MAT(type_a=iq2_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      34 runs - 31151.91 us/run -  60.13 GFLOP/run -   1.93 TFLOPS
  MUL_MAT(type_a=iq3_xxs,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                    38 runs - 26525.08 us/run -  60.13 GFLOP/run -   2.27 TFLOPS
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      42 runs - 24953.29 us/run -  60.13 GFLOP/run -   2.41 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 26897.55 us/run -  60.13 GFLOP/run -   2.24 TFLOPS
  MUL_MAT(type_a=iq4_nl,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                     44 runs - 23591.05 us/run -  60.13 GFLOP/run -   2.55 TFLOPS
  MUL_MAT(type_a=iq3_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=0):                      38 runs - 27017.13 us/run -  60.13 GFLOP/run -   2.23 TFLOPS

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| llama 8B IQ1_M - 1.75 bpw | 2.01 GiB | 8.03 B | Vulkan | 100 | pp512 | 172.03 ± 0.38 |
| llama 8B IQ2_S - 2.5 bpw | 2.56 GiB | 8.03 B | Vulkan | 100 | pp512 | 174.84 ± 0.36 |
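As a quick cross-check of the 10-15% figure, the relative change for each row can be computed from the reported numbers. The values below are copied from the PR and Master results above. TFLOPS are rounded to two decimals in the original output, so the kernel-level percentages are approximate.

```python
# (type, PR TFLOPS, master TFLOPS) copied from the MUL_MAT results above.
mul_mat = [
    ("iq2_xxs", 2.57, 2.35),
    ("iq2_xs", 2.59, 2.25),
    ("iq2_s", 2.19, 1.93),
    ("iq3_xxs", 2.48, 2.27),
    ("iq1_s", 2.59, 2.41),
    ("iq1_m", 2.55, 2.24),
    ("iq4_nl", 2.56, 2.55),
    ("iq3_s", 2.42, 2.23),
]

# (model, PR t/s, master t/s) for pp512 copied from the llama-bench tables above.
pp512 = [
    ("IQ1_M", 200.65, 172.03),
    ("IQ2_S", 200.98, 174.84),
]


def speedup_pct(new: float, old: float) -> float:
    """Percentage increase in throughput of `new` over `old`."""
    return (new / old - 1.0) * 100.0


for name, pr, master in mul_mat + pp512:
    print(f"{name:8s} {speedup_pct(pr, master):+5.1f}%")
```

By this measure the MUL_MAT gains range from about +7.5% (iq1_s) to +15.1% (iq2_xs), iq4_nl is essentially unchanged (+0.4%; it is not among the types named in the LOAD_VEC_A commit), and pp512 improves by about +15.0% (IQ2_S) to +16.6% (IQ1_M).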

Commit taken from remyoudompheng's PR ggml-org#12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>

vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3)

5712c2f

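For context on what raising LOAD_VEC_A changes: in ggml's Vulkan `mul_mm.comp` shader, as far as we can tell, each invocation loads LOAD_VEC_A consecutive values of matrix A per step while filling the shared-memory tile, with a row stride of `loadstride_a = workgroup_size * LOAD_VEC_A / BK`. A larger LOAD_VEC_A therefore means fewer load iterations per thread, each doing more unpacking work. The sketch below models only that loop count. The helper name and the tile configuration (BM = 64, BK = 32, 128 invocations per workgroup) are hypothetical values chosen for illustration, not taken from this PR or the shader.

```python
def a_tile_load_iterations(bm: int, bk: int, wg_size: int, load_vec_a: int) -> int:
    """Load-loop iterations each invocation runs to fill a BM x BK tile of A.

    Modelled (an assumption) on the pattern we believe mul_mm.comp uses:
      loadstride_a = wg_size * LOAD_VEC_A / BK   (tile rows covered per iteration)
      for (l = 0; l < BM; l += loadstride_a) { load LOAD_VEC_A values }
    """
    if bk % load_vec_a != 0:
        raise ValueError("BK must be a multiple of LOAD_VEC_A")
    loadstride_a = wg_size * load_vec_a // bk
    if loadstride_a == 0 or bm % loadstride_a != 0:
        raise ValueError("tile shape is not evenly covered by the workgroup")
    return bm // loadstride_a


# Hypothetical tile configuration, for illustration only.
BM, BK, WG = 64, 32, 128
for vec in (1, 2, 4, 8):
    print(f"LOAD_VEC_A={vec}: {a_tile_load_iterations(BM, BK, WG, vec)} iterations per thread")
```

With these made-up sizes the count falls from 16 iterations at LOAD_VEC_A=1 to 2 at LOAD_VEC_A=8. The model ignores register pressure, memory coalescing and dequantization cost, all of which also shape the measured speedup.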

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Jul 1, 2025
