Convert command's timeout for snapshots commands #13210

Open
erikbocks wants to merge 1 commit into apache:4.20 from scclouds:fix-snapshot-timeout

@erikbocks
Collaborator

@erikbocks erikbocks commented May 21, 2026

Description

PR #9659 introduced the commands.timeout global configuration, which allows timeouts to be defined per command. An operator who wants to increase snapshot-related timeouts can raise the CreateObjectCommand timeout in commands.timeout. The timeouts defined there are expressed in seconds.
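
As a rough illustration of the per-command setting, the sketch below parses a commands.timeout value into a map of command names to timeouts in seconds. It assumes the value is a comma-separated list of `CommandName=seconds` pairs; the class and method names are hypothetical and are not CloudStack code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of CloudStack.
final class CommandsTimeoutSketch {

    /**
     * Parses a value such as "CreateObjectCommand=300" into a map of
     * command name to timeout in seconds. The comma-separated
     * "Name=seconds" format is an assumption of this sketch.
     */
    static Map<String, Integer> parse(String value) {
        Map<String, Integer> timeouts = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return timeouts;
        }
        for (String pair : value.split(",")) {
            String[] keyAndValue = pair.trim().split("=", 2);
            if (keyAndValue.length != 2) {
                throw new IllegalArgumentException("Malformed entry: '" + pair + "'");
            }
            timeouts.put(keyAndValue[0].trim(), Integer.parseInt(keyAndValue[1].trim()));
        }
        return timeouts;
    }

    public static void main(String[] args) {
        Map<String, Integer> timeouts = parse("CreateObjectCommand=300");
        System.out.println("CreateObjectCommand timeout (seconds): "
                + timeouts.get("CreateObjectCommand"));
    }
}
```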

However, the normal and incremental snapshot creation flows use qemu-img scripts to execute some of the necessary operations. These scripts expect timeouts in milliseconds but receive the CreateObjectCommand value in seconds, so the effective timeout is 1000 times shorter than the configured one.

Therefore, this PR converts the CreateObjectCommand timeout from seconds to milliseconds before passing it to the qemu-img scripts.
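
The conversion itself can be sketched as follows. This is a minimal, self-contained illustration of the seconds-to-milliseconds step, not the code from this PR's diff; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; names are not taken from the PR.
final class SnapshotTimeoutSketch {

    /** Converts a command wait time given in seconds into milliseconds. */
    static long toScriptTimeoutMillis(int waitSeconds) {
        if (waitSeconds < 0) {
            throw new IllegalArgumentException("Wait time must not be negative: " + waitSeconds);
        }
        return TimeUnit.SECONDS.toMillis(waitSeconds);
    }

    public static void main(String[] args) {
        int waitSeconds = 300; // e.g. CreateObjectCommand=300 in commands.timeout

        // Without conversion, a millisecond-based script would wait only 300 ms.
        System.out.println("Unconverted value read as ms: " + waitSeconds);

        // With conversion, the script waits the intended 5 minutes.
        System.out.println("Converted value in ms: " + toScriptTimeoutMillis(waitSeconds));
    }
}
```

Using `TimeUnit` rather than multiplying by a literal 1000 keeps the unit change explicit at the call site.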

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

First, I set the commands.timeout configuration to CreateObjectCommand=300, which sets this command's timeout to 5 minutes. Then, I tried to create a full volume snapshot. The process failed due to a timeout.

Command timeout log
2026-05-21 15:04:44,122 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (Work-Job-Executor-13:[ctx-fe95241a, job-77/job-78, ctx-11fcb589]) (logid:d462b71a) Wait time setting on org.apache.cloudstack.storage.command.CreateObjectCommand is 300 seconds
Failure log
2026-05-21 15:04:56,776 DEBUG [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:d462b71a) Executing command [qemu-img convert -O qcow2 -U --image-opts driver=qcow2,file.filename=/mnt/4244ccc6-8f06-3f9c-87bc-a7ba8c1caae9/5331f95a-ec63-4a79-a28f-7642ed095875 /mnt/507fca2c-a424-3ffc-b5f6-f7fe9e7c17e7/snapshots/2/4/2ad278b7-e1fa-4331-ae36-4ba67db4097b ].
2026-05-21 15:04:57,080 WARN  [utils.script.Script] (AgentRequest-Handler-1:[]) (logid:d462b71a) Process [14516] for command [qemu-img convert -O qcow2 -U --image-opts driver=qcow2,file.filename=/mnt/4244ccc6-8f06-3f9c-87bc-a7ba8c1caae9/5331f95a-ec63-4a79-a28f-7642ed095875 /mnt/507fca2c-a424-3ffc-b5f6-f7fe9e7c17e7/snapshots/2/4/2ad278b7-e1fa-4331-ae36-4ba67db4097b ] timed out. Output is [].
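
The two Script log lines above are 304 ms apart, which is consistent with the configured value of 300 being read as milliseconds (the few extra milliseconds are presumably process overhead). The small sketch below reproduces that arithmetic from the log timestamps; the class name and the timestamp pattern are assumptions made for this illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only.
final class LogElapsedSketch {

    // Timestamp layout as it appears in the log lines quoted above.
    private static final DateTimeFormatter LOG_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the number of milliseconds between two log timestamps. */
    static long elapsedMillis(String start, String end) {
        LocalDateTime startTime = LocalDateTime.parse(start, LOG_TIMESTAMP);
        LocalDateTime endTime = LocalDateTime.parse(end, LOG_TIMESTAMP);
        return Duration.between(startTime, endTime).toMillis();
    }

    public static void main(String[] args) {
        // "Executing command" line versus the "timed out" line.
        long elapsed = elapsedMillis("2026-05-21 15:04:56,776", "2026-05-21 15:04:57,080");
        System.out.println("Elapsed before timeout (ms): " + elapsed);
    }
}
```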

I installed the packages with the timeout conversion in my local environment, then tried to create another snapshot. Using a debug breakpoint, I validated that the qemu-img script instance had the converted timeout. The new snapshot was created successfully.

When trying to create an incremental snapshot, the same error occurred. Then, I installed the packages with the necessary changes and tried to create another incremental snapshot. For this process, I also used debug breakpoints to validate the script's timeout.

@codecov

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.09%. Comparing base (1fe486f) to head (c60546b).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java | 16.66% | 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13210      +/-   ##
============================================
- Coverage     18.09%   18.09%   -0.01%     
- Complexity    16732    16733       +1     
============================================
  Files          6037     6037              
  Lines        542780   542812      +32     
  Branches      66464    66471       +7     
============================================
  Hits          98233    98233              
- Misses       433499   433531      +32     
  Partials      11048    11048              
Flag | Coverage Δ
uitests | 3.51% <ø> (ø)
unittests | 19.26% <16.66%> (-0.01%) ⬇️

Member

@winterhazel winterhazel left a comment

Looks good (did not test).

@erikbocks does this apply to 4.20/4.22? If yes, can you rebase?

@erikbocks
Collaborator Author

@winterhazel

This change can be applied to both 4.20 and 4.22. However, the other changes can only be applied to the 4.22 branch, as they are dependent on code introduced in #9270.

I will move the 4.22 changes to another PR and rebase this one onto the 4.20 branch.

@winterhazel
Member

@blueorangutan package

@blueorangutan

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17972

@DaanHoogland
Contributor

Could this be combined with #13212?

4 participants