@Qubitium (Collaborator) commented Apr 30, 2025

Refactor multi-GPU quantization to use Python threads. Compared to previous efforts:

  • 2-3x lower memory usage when you assign multiple GPUs: up to 3 for Llama/Qwen-style models, or up to 5 (tested) for MoE-style models.
  • The performance hit is much smaller than with the previous method. A small model saw only a 1% slowdown.
  • This is not tensor parallelism, so it will not make quantization faster; it is actually around 1-5% slower, depending on the model, the GPU, and GPU-to-GPU memory latency.
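
To illustrate the general idea of spreading per-module quantization work across GPUs with plain Python threads, here is a minimal stdlib-only sketch. It is not the code in this PR: the device names, the module names, the round-robin placement, and the `quantize_module` placeholder are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical device and module names, for illustration only.
DEVICES = ["cuda:0", "cuda:1", "cuda:2"]
MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]


def assign_round_robin(modules, devices):
    """Map each module name to a device, cycling through the device list."""
    return {name: devices[i % len(devices)] for i, name in enumerate(modules)}


def quantize_module(name, device):
    """Placeholder for the real per-module quantization work.

    A real implementation would run the quantizer for this module on
    `device`; here we only report where the work would happen.
    """
    return name, device


def quantize_layer(modules, devices):
    """Quantize one layer's modules with a thread pool sized to the device count."""
    placement = assign_round_robin(modules, devices)
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = [pool.submit(quantize_module, n, d) for n, d in placement.items()]
        # Collect results in submission order so the output order is deterministic.
        return [f.result() for f in futures]


if __name__ == "__main__":
    for name, device in quantize_layer(MODULES, DEVICES):
        print(f"{name} -> {device}")
```

In a layout like this, each device handles only its own share of a layer's modules, which is one plausible way to reduce memory pressure on any single GPU. The threads still share a single GIL, which is consistent with the note above that this does not speed anything up.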

With the advent of Python 3.13-3.14 and PyTorch 2.7 concurrent GIL contexts, it may be possible to get an Nx GPU improvement for MoE quantization in the next major refactor, provided we can cleanly decouple the thread/module code across multiple Python/Torch GIL contexts without any use of tensor parallelism, which I want to avoid at all costs for now.

Qubitium added 11 commits April 30, 2025 13:50
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium Qubitium marked this pull request as ready for review May 1, 2025 08:46
Qubitium added 8 commits May 1, 2025 09:21
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium Qubitium merged commit e5ffe12 into main May 2, 2025
5 checks passed
@Qubitium Qubitium deleted the process-threads branch May 2, 2025 05:23
2 participants