@Qubitium
Collaborator

@Qubitium Qubitium commented Sep 26, 2025

@nbasyl There are two issues between EoRA and main (lots of refactoring and multi-threading changes in prep for the pending multi-GPU data-parallel inference). You can track this here. I will merge once test_quant_and_eora passes CI.

  • Module offloading to disk broke EoRA processor compatibility.
  • Multi-GPU startup causes EoRA to fail/crash.

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
…uted on primary processor lifecycle

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium Qubitium marked this pull request as ready for review September 26, 2025 13:43
@Qubitium Qubitium marked this pull request as draft September 26, 2025 13:46
@Qubitium
Collaborator Author

Fixed the execution-order bug, which was caused by threading scope issues. The offloading bug is also fixed. But there is a secondary offloading bug that appears to be breaking LoRA module save. I will check this a bit later.

@Qubitium
Collaborator Author

Qubitium commented Sep 26, 2025

Found the save bug. Will fix it soon. EoRA needs to join the offload-to-disk life cycle. Right now main enjoys 75%+ CPU memory savings during quantization thanks to inline packing and offloading to disk right after processing. EoRA is missing this bit of logic, so that needs to be fixed.

Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
Signed-off-by: Qubitium <Qubitium@modelcloud.ai>
@Qubitium
Collaborator Author

Merging. CI runs, but the quality is bad (not just EoRA). Other CI tests don't have this quality issue. Not sure what's going on, but it looks to be specific to this CI test.

@Qubitium Qubitium marked this pull request as ready for review September 26, 2025 16:35
@Qubitium Qubitium merged commit f586c8b into main Sep 26, 2025
5 checks passed
@Qubitium Qubitium deleted the fix-eora-compat branch September 26, 2025 16:35