
Conversation

@Qubitium (Collaborator) commented on Oct 1, 2025

@avtc Refactored code. May resolve your Q.to() crash.

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium Qubitium merged commit ec31f3d into main Oct 2, 2025
4 checks passed
@Qubitium Qubitium deleted the fix-q-to branch October 2, 2025 00:45
@avtc (Contributor) commented on Oct 2, 2025

Confirmed, this PR fixed the issue with Q.to().

@Qubitium (Collaborator, Author) commented on Oct 2, 2025

> Confirmed, this PR fixed the issue with Q.to().

Finally! A thorn in my backside finally plucked. You can pull main for a GPTQ quantization quality fix. You can now properly increase batch_size as high as your GPU can manage to raise throughput at the expense of more VRAM usage.

Each GPU/model pair has its own sweet spot for batch_size; there is no one-size-fits-all setting. A larger batch is sometimes slower, but not always.
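For reference, a minimal sketch of where that knob sits, assuming the GPTQModel `quantize()` call accepts `batch_size` as this thread describes. The model id, save path, and toy calibration strings below are placeholders, not part of this PR:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Placeholder model id and save path; substitute your own.
model_id = "meta-llama/Llama-3.2-1B"
quant_path = "Llama-3.2-1B-gptq-4bit"

# Toy calibration data; a real run needs a few hundred representative samples.
calibration_dataset = [
    "GPTQ quantizes weights layer by layer against calibration activations.",
    "Raising batch_size trades VRAM for quantization throughput.",
]

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, quant_config)

# batch_size is the sweet-spot knob from the comment above: raise it until
# you approach the GPU's VRAM ceiling, then benchmark, since a bigger batch
# is not always faster.
model.quantize(calibration_dataset, batch_size=4)
model.save(quant_path)
```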

