Skip to content

fix(linstor): surface ambiguous template fallbacks and legacy orphan cleanup#13078

Open
jmsperu wants to merge 1 commit intoapache:mainfrom
jmsperu:feature/linstor-template-hygiene
Open

fix(linstor): surface ambiguous template fallbacks and legacy orphan cleanup#13078
jmsperu wants to merge 1 commit intoapache:mainfrom
jmsperu:feature/linstor-template-hygiene

Conversation

@jmsperu
Copy link
Copy Markdown
Contributor

@jmsperu jmsperu commented Apr 28, 2026

Summary

Two small visibility improvements to the LINSTOR template-handling path. Both preserve existing behaviour and only add log output that surfaces conditions operators have currently no easy way to detect.

1. `LinstorUtil.findResourceDefinition` — log on ambiguous fallback

When clone-from-template runs, the method scans for a resource whose name starts with the template prefix AND whose aux properties include `_cs-template-for-`. If no exact-property match exists, the method falls back to the first matching resource by name and returns it silently.

In setups with multiple resource groups on the same controller (or with legacy templates cached before the ref-count convention was added), this fallback can return a template that belongs to the wrong resource group, and the clone produces an unexpected result.

This change keeps the fallback behaviour but logs a WARN naming:

  • the requested resource name and resource group
  • the fallback resource that was chosen
  • the actual `_cs-template-for-*` aux properties present on the fallback (so operators can see what RGs do claim it)

2. `LinstorStorageAdaptor.deRefOrDeleteResource` — log on legacy-template orphan cleanup

The ref-count branch already deletes resources that have zero remaining `_cs-template-for-` aux properties. Two conditions reach this branch:

  • a normal resource (no template-for properties expected) — already commented in code
  • a legacy template with no template-for properties (predates the ref-count convention)

Currently both look identical in the logs. Operators upgrading from older versions can't tell how many orphan legacy templates were cleaned up.

This change logs an INFO line in the second case, identifying the resource as a legacy template and naming the resource group context. Behaviour unchanged.

Test plan

  • CI build + unit tests
  • Manual: deploy a VM from template in a single-RG setup — no new logs
  • Manual: deploy a VM from template in a multi-RG setup where the template lacks the exact aux property — observe the new WARN
  • Manual: evict a legacy template (no aux properties) — observe the new INFO line during cleanup

Why three small PRs

Per the suggestion from earlier review, splitting LINSTOR plugin improvements into focused PRs:

All three target the LINSTOR primary-storage plugin and can be reviewed independently.

…e orphans

Two small visibility improvements that make existing template
behaviour easier to audit, especially after upgrading from versions
that predated the ref-count convention.

LinstorUtil.findResourceDefinition
  When no resource has the exact "_cs-template-for-<rscGrpName>" Aux
  property, the method silently returned the first resource whose name
  starts with the requested prefix. With multiple resource groups on a
  single controller, this can route a clone to the wrong template. Now
  logs a WARN naming the requested rscGrpName, the fallback resource,
  and the actual aux properties present. Behaviour unchanged: still
  returns the first match. Operators can act on the warning by setting
  the property explicitly or removing the unrelated definition.

LinstorStorageAdaptor.deRefOrDeleteResource
  When deleting a resource that has zero `_cs-template-for-` aux
  properties AND whose name starts with the template-name prefix the
  caller is acting on, log an INFO line. These are legacy templates
  cached before the ref-count convention was introduced — they get
  picked up by the existing "if expectedProps == 0" branch and
  deleted. Surfacing them lets operators see how many orphans existed
  at upgrade time and confirm the cleanup happened.
@sureshanaparti
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves operator visibility in the LINSTOR template-handling path by adding targeted log lines around (1) ambiguous template selection fallback and (2) legacy/orphan template cleanup, while intending to preserve existing behavior.

Changes:

  • Add a WARN log in LinstorUtil.findResourceDefinition when falling back to the first name-prefix match due to missing expected _cs-template-for-<rscGrp> aux property.
  • Add an INFO log in LinstorStorageAdaptor.deRefOrDeleteResource intended to surface deletion of legacy templates that have zero _cs-template-for- aux properties.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
plugins/storage/volume/linstor/src/main/java/org/apache/cloudstack/storage/datastore/util/LinstorUtil.java Adds WARN logging to make ambiguous template fallback selection visible (includes chosen fallback and related aux keys).
plugins/storage/volume/linstor/src/main/java/com/cloud/hypervisor/kvm/storage/LinstorStorageAdaptor.java Adds INFO logging intended to distinguish legacy-template orphan cleanup during deref/delete logic.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +515 to +524
// Surface the case where a resource that LOOKS like a template (resource name
// starts with the requested prefix) has zero `_cs-template-for-` aux properties
// even though we never decremented one — that's a legacy template predating the
// ref-count convention. Logging it before deletion lets operators audit how
// many such orphans existed at upgrade time.
if (expectedProps == 0 && rd.getName().toLowerCase().startsWith(rscName.toLowerCase())) {
logger.info("Linstor: deleting resource {} which has no _cs-template-for- aux properties " +
"(legacy template predating the ref-count convention, or a stale orphan). " +
"Resource group context: {}", rd.getName(), rscGrpName);
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new INFO log for “legacy template predating the ref-count convention” will currently fire for all non-template resource deletions too. existingRDs is already built via getRDListStartingWith(api, rscName), so rd.getName().toLowerCase().startsWith(rscName.toLowerCase()) is always true; whenever expectedProps == 0 (normal volume/resource with no _cs-template-for- props), this will emit the legacy-template message and add significant log noise/misleading attribution. Tighten the condition so it only matches the intended legacy-template-orphan case (e.g., by using a reliable template discriminator, passing an isTemplate flag from the caller, or checking for a template-specific naming/property convention), and avoid logging for ordinary resource deletions.

Suggested change
// Surface the case where a resource that LOOKS like a template (resource name
// starts with the requested prefix) has zero `_cs-template-for-` aux properties
// even though we never decremented one — that's a legacy template predating the
// ref-count convention. Logging it before deletion lets operators audit how
// many such orphans existed at upgrade time.
if (expectedProps == 0 && rd.getName().toLowerCase().startsWith(rscName.toLowerCase())) {
logger.info("Linstor: deleting resource {} which has no _cs-template-for- aux properties " +
"(legacy template predating the ref-count convention, or a stale orphan). " +
"Resource group context: {}", rd.getName(), rscGrpName);
}

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.02%. Comparing base (6f4445c) to head (70eed01).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...cloudstack/storage/datastore/util/LinstorUtil.java 0.00% 10 Missing ⚠️
.../hypervisor/kvm/storage/LinstorStorageAdaptor.java 0.00% 6 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13078      +/-   ##
============================================
- Coverage     18.02%   18.02%   -0.01%     
+ Complexity    16621    16620       -1     
============================================
  Files          6029     6029              
  Lines        542184   542194      +10     
  Branches      66451    66452       +1     
============================================
- Hits          97740    97735       -5     
- Misses       433428   433442      +14     
- Partials      11016    11017       +1     
Flag Coverage Δ
uitests 3.52% <ø> (ø)
unittests 19.18% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17649

@DaanHoogland
Copy link
Copy Markdown
Contributor

@rp- , will you have a look at this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants