Conversation
Greptile Summary

This PR makes the Qwen3-VL NVFlare example self-contained by vendoring the upstream `qwenvl/train/*` and `qwenvl/data/*` code.

Confidence Score: 3/5

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Server as FL Server
    participant R0 as Client rank-0
    participant Rn as Client rank-N
    loop FL Round
        Server->>R0: "FLModel(params) via flare.receive()"
        Note over R0: "Determine exchange mode"
        alt "In-memory single-rank (full or LoRA)"
            R0->>R0: "round_initial_state_dict = _strip_model_prefix(params)"
            R0->>R0: "train(base_hf_id, initial_state_dict, return_state_dict=True)"
            R0->>Server: "flare.send(trained params from memory)"
        else "Multi-rank full-model checkpoint"
            R0->>R0: "save_pretrained(input_model_dir)"
            R0-->>Rn: "_dist_barrier"
            R0->>Rn: "torchrun train(input_model_dir)"
            Rn-->>R0: "DDP training complete"
            R0->>R0: "load_state_dict_from_checkpoint(output_model_dir)"
            R0->>Server: "flare.send(full params ~4651 MB)"
        else "Multi-rank LoRA checkpoint"
            R0->>R0: "_save_lora_adapter_for_training(adapter_model.safetensors)"
            R0-->>Rn: "_dist_barrier"
            R0->>Rn: "torchrun train(adapter_dir, lora_enable=True)"
            Rn-->>R0: "DDP training complete"
            R0->>R0: "load_state_dict_from_checkpoint(lora_only=True)"
            R0->>Server: "flare.send(adapter params ~98 MB)"
        end
    end
```
@greptileai review again

@greptileai review latest changes

@greptileai review the latest version of this PR

/build

@greptileai review

/build
Fixes # .
Description
Qwen3-VL example: LoRA training fixes
Summary
This PR makes the Qwen3-VL NVFlare example self-contained and stabilizes both full-model and LoRA federated training paths.
Key outcomes:
- Vendored the `qwenvl` train/data code so the example runs standalone.
- Fixed LoRA training failures (`trainable params: 0`, empty optimizer groups, `grad_fn`/backward issues).

Changes

1) Self-contained Qwen3-VL example

Vendored `qwenvl/train/*` and `qwenvl/data/*` into the example.

2) LoRA training correctness

In the training entry point (`train_qwen.py`), explicitly enables LoRA params for training.

3) Single-process in-memory exchange

With `WORLD_SIZE=1`, the client no longer relies on a checkpoint round-trip between receive/train/send.

4) Multi-rank path performance and safety

Saves `adapter_model.safetensors` + `adapter_config.json` directly, removing an unnecessary extra base-model load in the adapter save helper.

5) Optimizer robustness and config alignment

Aligned optimizer/config defaults (`0.0`) to avoid a mode-dependent mismatch.

Testing
Lint and static checks (`isort --check-only`, `black --check`, `py_compile`) passed.

Types of changes

Ran `./runtest.sh`.
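The LoRA correctness fix in change 2 — explicitly enabling LoRA params so the run no longer hits `trainable params: 0` — can be sketched as follows. This is a hypothetical illustration, not the PR's code: the `lora_` naming convention follows PEFT's usual adapter parameter names, and the tiny `Param` class stands in for `torch.nn.Parameter` so the sketch is self-contained.

```python
from dataclasses import dataclass


@dataclass
class Param:
    # Minimal stand-in for torch.nn.Parameter (illustration only).
    requires_grad: bool = False


def enable_lora_params(named_params):
    """Freeze base weights and enable only LoRA adapter weights.

    Without a step like this, every parameter can end up frozen, which
    produces the 'trainable params: 0' / empty-optimizer-group failures
    this PR fixes. Returns the number of parameters enabled.
    """
    enabled = 0
    for name, p in named_params:
        is_lora = "lora_" in name
        p.requires_grad = is_lora
        enabled += is_lora
    return enabled


if __name__ == "__main__":
    params = [
        ("model.layers.0.self_attn.q_proj.weight", Param()),
        ("model.layers.0.self_attn.q_proj.lora_A.weight", Param()),
        ("model.layers.0.self_attn.q_proj.lora_B.weight", Param()),
    ]
    print(enable_lora_params(params))  # 2: only the adapter weights train
```

Running this over `model.named_parameters()` after wrapping the model with adapters guarantees the optimizer sees a non-empty trainable parameter group.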