@avtc
Contributor

@avtc avtc commented Nov 4, 2025

@Qubitium Hi!
I was not able to proceed without the second retry, so I am adjusting it back. This is the first MoE layer with experts for GLM-4.5-Air, with 1534 samples and the "balanced" strategy, on 8 x 3090.

image

@Qubitium
Collaborator

Qubitium commented Nov 4, 2025

@avtc Adjust the first delay to 0.25s (250ms) and the second delay to 0.75s. That keeps the same total delay but makes the first retry 2x as fast, which may satisfy 99% of the cases.

See if you can push 0.25 even lower and increase the second retry timeout, so the first retry is as short as possible while still satisfying 90% of the cases. For example, 0.125s for the first retry and 0.5s for the second.

Run the values and check maybe 4-5 layers.
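
For anyone reading along, the two-stage delay schedule discussed above could be sketched roughly as follows. This is only an illustration and not the code in this PR: `call_with_retries`, `total_delay`, the `retry_on` exception type, and the injectable `sleep` parameter are all hypothetical names, and the thread does not say which operation is being retried.

```python
import time
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    delays: Sequence[float] = (0.25, 0.75),  # first and second retry delay, in seconds
    retry_on: Tuple[Type[BaseException], ...] = (RuntimeError,),  # assumed failure type
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(); after the i-th failure, wait delays[i] seconds and retry."""
    for delay in delays:
        try:
            return op()
        except retry_on:
            sleep(delay)
    # Last attempt after all delays are used up: any exception propagates.
    return op()


def total_delay(delays: Sequence[float]) -> float:
    """Worst-case time spent sleeping when every retry is needed."""
    return sum(delays)
```

With a helper like this, trying a different schedule only means passing another tuple, e.g. `call_with_retries(op, delays=(0.125, 0.5))`, and `total_delay` gives the worst-case wait for a candidate schedule before running it on a few layers.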

@avtc
Contributor Author

avtc commented Nov 4, 2025

Will check tomorrow; I think it will also work.

@Qubitium Qubitium merged commit f4984c8 into ModelCloud:main Nov 6, 2025