This issue was opened automatically by the Test Playbooks workflow after the test quick-train-llamafactory-lora failed on the main branch.
Failure scope
- Playbook: llama-factory-finetuning
- Test id: quick-train-llamafactory-lora
- Device: halo
- Operating system: windows
- Runner labels: self-hosted, Windows, halo
- Runner name: xsj-aimlab-halo-0
- Commit: 950e710579a751d40d80a40f8c2d323930685d25
- Workflow run: https://github.com/amd/playbooks/actions/runs/27079427346
Hardware / OS to use to reproduce
Run the failing test on a machine that matches the runner labels above (OS = windows, device = halo). The repo's self-hosted runners already advertise these labels; if you reproduce locally, use the same OS family and the same AMD device class.
How to dispatch the same test from CI
Re-run only the failing playbook on the same matrix entry by triggering the workflow with the playbook id:
gh workflow run test-playbooks.yml --repo amd/playbooks -f playbook_id=llama-factory-finetuning
The workflow's matrix narrows down to this (device, platform) combination automatically based on the playbook's tested_platforms.
How to run just this test locally
python .github/scripts/run_playbook_tests.py --playbook llama-factory-finetuning --platform windows --device halo
The runner extracts test blocks from playbooks/*/llama-factory-finetuning/README.md (the failing block starts around line 490).
Failing test (verbatim from the README)
- Setup: venv\Scripts\activate
- Timeout: 3600s
Set-Location -Path "LlamaFactory"
Copy-Item -Path "examples/train_lora/qwen3_lora_sft.yaml" -Destination "examples/train_lora/qwen3_lora_sft_ci.yaml"
$filePath = "examples/train_lora/qwen3_lora_sft_ci.yaml"
(Get-Content -Path $filePath) -replace 'lora_rank: 8', 'lora_rank: 6' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'bf16:\s*true', 'fp16: true' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'dataloader_num_workers:\s*4', 'dataloader_num_workers: 0' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'output_dir: .*', 'output_dir: saves/qwen3_lora_sft_ci' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'overwrite_output_dir: false', 'overwrite_output_dir: true' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'per_device_train_batch_size: .*', 'per_device_train_batch_size: 1' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'gradient_accumulation_steps: .*', 'gradient_accumulation_steps: 1' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'num_train_epochs: .*', 'num_train_epochs: 1' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'logging_steps: .*', 'logging_steps: 1' | Set-Content -Path $filePath
(Get-Content -Path $filePath) -replace 'save_steps: .*', 'save_steps: 5' | Set-Content -Path $filePath
llamafactory-cli train examples/train_lora/qwen3_lora_sft_ci.yaml
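PowerShell's -replace leaves a line untouched when its pattern does not match, so an override can be skipped without any error. The sketch below is a hypothetical debugging aid, not part of the playbook or the runner: it replays the same pattern/replacement pairs line by line in Python (case-insensitively, as -replace does by default) and reports which patterns never matched. The function name `apply_overrides` and the sample YAML text are assumptions made for illustration.

```python
import re

# Same (pattern, replacement) pairs as the -replace calls in the failing test.
OVERRIDES = [
    (r"lora_rank: 8", "lora_rank: 6"),
    (r"bf16:\s*true", "fp16: true"),
    (r"dataloader_num_workers:\s*4", "dataloader_num_workers: 0"),
    (r"output_dir: .*", "output_dir: saves/qwen3_lora_sft_ci"),
    (r"overwrite_output_dir: false", "overwrite_output_dir: true"),
    (r"per_device_train_batch_size: .*", "per_device_train_batch_size: 1"),
    (r"gradient_accumulation_steps: .*", "gradient_accumulation_steps: 1"),
    (r"num_train_epochs: .*", "num_train_epochs: 1"),
    (r"logging_steps: .*", "logging_steps: 1"),
    (r"save_steps: .*", "save_steps: 5"),
]


def apply_overrides(text, overrides=OVERRIDES):
    """Apply each override to every line, in order.

    Returns the rewritten text and the list of patterns that matched nothing.
    """
    lines = text.splitlines()
    unmatched = []
    for pattern, replacement in overrides:
        regex = re.compile(pattern, re.IGNORECASE)
        hits = 0
        rewritten = []
        for line in lines:
            new_line, count = regex.subn(replacement, line)
            hits += count
            rewritten.append(new_line)
        lines = rewritten
        if hits == 0:
            unmatched.append(pattern)
    return "\n".join(lines), unmatched


if __name__ == "__main__":
    # Hypothetical sample; substitute the contents of qwen3_lora_sft.yaml.
    sample = "lora_rank: 8\nbf16: true\noutput_dir: saves/x\noverwrite_output_dir: false\n"
    new_text, unmatched = apply_overrides(sample)
    print(new_text)
    print("patterns with no match:", unmatched)
```

On the sample text, the unanchored pattern `output_dir: .*` also matches inside the `overwrite_output_dir` line, so the later `overwrite_output_dir: false` pattern no longer finds anything to replace. Whether the real YAML contains such a line has not been checked here, and nothing in this report ties the timeout to these overrides.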
Result
- Exit code: -1
- Runner error: Test timed out after 3600 seconds
This issue is opened and deduplicated by .github/scripts/create_failure_issues.py. Close it once the failure is fixed; subsequent failures with the same scope will open a fresh issue.