nasbackup.sh: verify backup integrity with qemu-img check #12845
jmsperu wants to merge 1 commit into apache:4.20
Add verify_backup() that runs qemu-img check on all qcow2 files after backup completes. Catches corrupt or truncated backup files (e.g. from NFS I/O errors or full storage) before reporting success. Applied to both running and stopped VM backup paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@             Coverage Diff              @@
##               4.20   #12845   +/-   ##
=========================================
  Coverage     16.24%   16.24%
- Complexity    13411    13412     +1
=========================================
  Files          5664     5664
  Lines        500463   500463
  Branches      60779    60779
=========================================
+ Hits          81308    81323    +15
+ Misses       410059   410051     -8
+ Partials       9096     9089     -7
Pull request overview
This PR strengthens KVM NAS backup reliability by adding a post-backup integrity verification step (qemu-img check) so corrupted or truncated qcow2 backups are detected before the backup is reported as successful to CloudStack.
Changes:
- Added `verify_backup()` to validate all `*.qcow2` files in the backup destination using `qemu-img check`.
- Invoked verification after both running-VM (push) and stopped-VM (`qemu-img convert`) backup flows.
verify_backup() {
    local backup_dir="$1"
    local failed=0
    for img in "$backup_dir"/*.qcow2; do
        [[ -f "$img" ]] || continue
        if ! qemu-img check "$img" > /dev/null 2>&1; then
            echo "Backup verification failed for $img"
            log -ne "Backup verification FAILED: $img"
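The review excerpt above stops partway through the function. As a rough illustration only, here is one way such a check could be completed. This is a sketch, not the PR's actual code: the `QEMU_IMG` override variable, the `checked` counter, and the choice to treat a directory with no qcow2 files as a failure are all assumptions added for illustration, and plain `echo` to stderr stands in for the script's own `log` helper.

```shell
# Hypothetical completion of verify_backup(); NOT the code from the PR.
# QEMU_IMG is an illustrative override so the checker can be stubbed in tests.
QEMU_IMG="${QEMU_IMG:-qemu-img}"

verify_backup() {
    local backup_dir="$1"
    local failed=0
    local checked=0
    local img
    for img in "$backup_dir"/*.qcow2; do
        # An unmatched glob stays literal, so skip anything that is not a regular file.
        [ -f "$img" ] || continue
        checked=$((checked + 1))
        # Any nonzero exit status from the checker is treated as a failure here.
        if ! "$QEMU_IMG" check "$img" > /dev/null 2>&1; then
            echo "Backup verification failed for $img" >&2
            failed=1
        fi
    done
    # Assumption: finding no qcow2 files at all is reported as a failure, not a pass.
    if [ "$checked" -eq 0 ]; then
        echo "Backup verification found no qcow2 files in $backup_dir" >&2
        return 1
    fi
    return "$failed"
}
```

A caller would typically run `verify_backup "$dest"` after the copy finishes and abort the backup on a nonzero return. Note that this sketch checks every image before returning, so a single run reports all failing files rather than stopping at the first one.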
… integrity check

Adds four optional features to NAS backup operations, configurable at zone scope via CloudStack global settings:

- Compression (-c): qcow2 internal compression of backup files
  Config: nas.backup.compression.enabled (default: false)
- LUKS Encryption (-e): encrypt backup files at rest using qemu-img
  Config: nas.backup.encryption.enabled (default: false)
  Config: nas.backup.encryption.passphrase (Secure category)
- Bandwidth Throttle (-b): limit backup I/O bandwidth via virsh blockjob for running VMs or qemu-img -r for stopped VMs
  Config: nas.backup.bandwidth.limit.mbps (default: 0/unlimited)
- Integrity Check (--verify): qemu-img check after backup creation
  Config: nas.backup.integrity.check (default: false)

All features are disabled by default and fully backward compatible. Settings are read from zone-scoped ConfigKeys in NASBackupProvider, passed to the KVM agent via TakeBackupCommand details map, and translated to nasbackup.sh CLI flags in LibvirtTakeBackupCommandWrapper.

Changes:
- nasbackup.sh: add -c, -b, -e, --verify flags with encrypt_backup() and verify_backup() helper functions
- TakeBackupCommand.java: add details map for passing config to agent
- NASBackupProvider.java: add 5 ConfigKeys, populate command details
- LibvirtTakeBackupCommandWrapper.java: extract details, build CLI args, handle passphrase temp file lifecycle

Combines and supersedes PRs apache#12844, apache#12846, apache#12848, apache#12845
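To make the flag handling described in this commit message concrete, the following is a minimal sketch of how a shell script might parse `-c`, `-b`, `-e`, and `--verify`. It is not taken from the PR. The function name `parse_backup_flags`, the variable names, and the assumption that `-b` takes an Mbps value and `-e` takes a path to a passphrase file are all hypothetical; the commit message does not state the argument shapes.

```shell
# Hypothetical flag parsing; function name, variable names, and the argument
# shapes for -b (Mbps value) and -e (passphrase file path) are assumptions.
parse_backup_flags() {
    COMPRESS=0
    BANDWIDTH_MBPS=0
    PASSPHRASE_FILE=""
    VERIFY=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -c)
                COMPRESS=1
                ;;
            -b)
                # Reject -b when no value follows it.
                [ "$#" -ge 2 ] || { echo "Option -b needs a value" >&2; return 1; }
                BANDWIDTH_MBPS="$2"
                shift
                ;;
            -e)
                # Reject -e when no value follows it.
                [ "$#" -ge 2 ] || { echo "Option -e needs a value" >&2; return 1; }
                PASSPHRASE_FILE="$2"
                shift
                ;;
            --verify)
                VERIFY=1
                ;;
            *)
                echo "Unknown option: $1" >&2
                return 1
                ;;
        esac
        shift
    done
}
```

With every variable defaulting to its "off" value, a call with no flags leaves all four features disabled, which mirrors the backward-compatible defaults the commit message describes.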
Summary
- `verify_backup()` function that runs `qemu-img check` on all qcow2 backup files after backup completes

Motivation
Without verification, a backup that was silently corrupted by NFS errors or disk-full conditions is reported as successful. The admin only discovers the corruption when attempting to restore, which is the worst possible time.
`qemu-img check` is fast (reads metadata only, not full data) and catches structural corruption.

Test plan
- `qemu-img check` passes in agent log