
Qubitium (Collaborator) commented Sep 17, 2025

Ugly, ugly hacks. Do not read this PR, for your own sanity. It will be cleaned up later: lots of debug code, random variable names, etc. Seriously, don't read the commits.

Current progress:

  • 23.9% CPU memory saving for the entire end-to-end quantization + packing process.
  • 27% CPU memory saving if only the quantization stage is counted, excluding packing. Packing introduces additional CPU RAM usage, which can be fixed in future commits.

Target: > 75%

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Qubitium marked this pull request as draft September 17, 2025 13:46
Qubitium (Collaborator, Author):

6.4 GB to 4.1 GB during quantization (excluding packing) => 35.9% saving...

Pizza time!

Qubitium (Collaborator, Author):

6.4 GB to 2.4 GB (excluding packing) => 62.5% saving...

Ice cream?

Qubitium marked this pull request as ready for review September 18, 2025 08:00
Qubitium marked this pull request as draft September 18, 2025 08:20
LRL-ModelCloud and others added 3 commits September 18, 2025 16:46
Qubitium marked this pull request as ready for review September 19, 2025 00:21
Qubitium (Collaborator, Author):

6.4 GB to 1.7 GB (excluding packing) => 73.5% saving...
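Each saving figure in these comments is relative to the 6.4 GB baseline, i.e. (before - after) / before. A minimal Python sketch that recomputes them from the rounded GB readings quoted above; the names `BASELINE_GB`, `readings_gb` and `saving_pct` are hypothetical and not part of the PR. Because the inputs are rounded to 0.1 GB, the last reading works out to about 73.4%, which is within rounding of the reported 73.5%.

```python
# CPU memory readings quoted in the PR comments (GB, rounded to 0.1 GB).
# Hypothetical names for illustration only; not taken from the PR's code.
BASELINE_GB = 6.4
readings_gb = [4.1, 2.4, 1.7]


def saving_pct(before: float, after: float) -> float:
    """Percentage of memory saved when usage drops from `before` to `after`."""
    return (before - after) / before * 100


for after in readings_gb:
    # Prints each reading with its saving relative to the baseline.
    print(f"{BASELINE_GB} GB -> {after} GB: {saving_pct(BASELINE_GB, after):.1f}% saving")
```

With the rounded inputs this prints 35.9%, 62.5% and 73.4%, so the first two match the comments exactly.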

Qubitium merged commit 785ea82 into main Sep 19, 2025
5 checks passed

2 participants