Qubitium (Collaborator) commented Oct 1, 2025

No description provided.

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Qubitium changed the title from "Replicate" to "Replicate + Turtle state fix" on Oct 1, 2025
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Qubitium merged commit 3da0344 into main on Oct 1, 2025
5 checks passed
Qubitium deleted the replicate branch on October 1, 2025 13:13

Qubitium (Collaborator, Author) commented Oct 1, 2025

@avtc So, to confirm: this PR fixed your meta load issues, right? But you still needed the Q.to() device lock to avoid the CUDA memory errors?

Btw, this PR also sped up MoE quantization by 5% or so.

avtc (Contributor) commented Oct 1, 2025

> @avtc So, to confirm: this PR fixed your meta load issues, right? But you still needed the Q.to() device lock to avoid the CUDA memory errors?

Yep, fixed. And yep, I still need the lock around Q.to(). I tried running with CUDA_LAUNCH_BLOCKING=1 and without the Q.to() lock, but it deadlocked.
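
For readers unfamiliar with the idea, here is a minimal sketch of what a per-device lock around a `.to()` call could look like. This is not the code from this PR: `device_lock` and `locked_to` are hypothetical names, and the sketch assumes only that the moved object exposes a `.to(device)` method, as a tensor would.

```python
import threading
from collections import defaultdict

# Hypothetical registry: one lock per target device, keyed by its string
# form (e.g. "cuda:0"). Note that "cuda" and "cuda:0" are distinct keys.
_registry_guard = threading.Lock()
_device_locks = defaultdict(threading.Lock)


def device_lock(device):
    """Return the lock for `device`, creating it on first use."""
    with _registry_guard:
        return _device_locks[str(device)]


def locked_to(obj, device):
    """Call obj.to(device) while holding that device's lock.

    Worker threads moving objects to the same device are serialized;
    moves to different devices can still proceed in parallel.
    """
    with device_lock(device):
        return obj.to(device)
```

In a multi-threaded quantization loop, each worker would call something like `locked_to(Q, target_device)` instead of `Q.to(target_device)` directly, assuming `Q` is the tensor being moved.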
