THREESCALE-14651 Fix HPA reconciler to skip API delete calls when HPA already absent#1175

Open
urbanikb wants to merge 1 commit into 3scale:master from urbanikb:THREESCALE-14651

Conversation

@urbanikb
Contributor

@urbanikb urbanikb commented May 7, 2026

Summary

Fixes THREESCALE-14651: after an HPA had been deleted successfully, ReconcileHpa() kept calling DeleteResource() directly on every subsequent reconcile cycle, so each cycle hit a 404 indefinitely.

  • Replace DeleteResource() with TagObjectToDelete + ReconcileResource, which does a cache-backed Get first and is a no-op when the object is already absent — consistent with how PDB and monitoring resources are handled
  • Affects all three HPA targets: backend-listener, backend-worker, apicast-production
  • Add table-driven tests covering enabled, disabled, and async-disable annotation scenarios for all three targets
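
The difference between the two delete strategies in the first bullet can be modelled with a small, self-contained sketch. It does not use the operator's real TagObjectToDelete or ReconcileResource code; `fakeCluster`, `deleteDirectly`, `deleteIfPresent`, and `countDeleteCalls` are hypothetical names introduced only for illustration, and the map stands in for both the API server and the cache.

```go
package main

import "fmt"

// fakeCluster is a hypothetical stand-in for the API server plus the
// operator's cache; it is not part of the 3scale-operator code base.
type fakeCluster struct {
	objects     map[string]bool // HPA name -> exists
	deleteCalls int             // number of Delete requests sent to the API
}

// Get models a cache-backed lookup; it does not count as an API call.
func (c *fakeCluster) Get(name string) bool { return c.objects[name] }

// Delete models an API delete request and reports whether the object
// existed (false corresponds to a 404 Not Found response).
func (c *fakeCluster) Delete(name string) bool {
	c.deleteCalls++
	existed := c.objects[name]
	delete(c.objects, name)
	return existed
}

// deleteDirectly models the old behaviour: always issue a Delete request.
func deleteDirectly(c *fakeCluster, name string) { c.Delete(name) }

// deleteIfPresent models the new behaviour: look the object up first and
// do nothing when it is already absent.
func deleteIfPresent(c *fakeCluster, name string) {
	if !c.Get(name) {
		return
	}
	c.Delete(name)
}

// countDeleteCalls runs the given number of reconcile cycles for one
// disabled HPA and returns how many Delete requests were issued.
func countDeleteCalls(reconcile func(*fakeCluster, string), cycles int) int {
	c := &fakeCluster{objects: map[string]bool{"backend-listener": true}}
	for i := 0; i < cycles; i++ {
		reconcile(c, "backend-listener")
	}
	return c.deleteCalls
}

func main() {
	fmt.Println("direct delete:    ", countDeleteCalls(deleteDirectly, 5))
	fmt.Println("get before delete:", countDeleteCalls(deleteIfPresent, 5))
}
```

In this model, five reconcile cycles produce five delete requests with the direct approach (four of them against an object that no longer exists) and a single request with the lookup-first approach.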

Test plan

  • Unit tests pass (make test-unit)
  • Manually verified on fyre cluster: disabled all three HPAs via spec.backend.listenerSpec.hpa, spec.backend.workerSpec.hpa, spec.apicast.productionSpec.hpa — no repeated delete attempts in operator logs after HPAs removed

Manual testing instructions

Tested with the following initial setup:

export NAMESPACE=3scale-test
make NAMESPACE=$NAMESPACE cluster/prepare/local

cat << EOF | oc create -f -
kind: Secret
apiVersion: v1
metadata:
  name: s3-credentials
  namespace: $NAMESPACE
data:
  AWS_ACCESS_KEY_ID: c29tZXRoaW5nCg==
  AWS_BUCKET: c29tZXRoaW5nCg==
  AWS_REGION: dXMtd2VzdC0xCg==
  AWS_SECRET_ACCESS_KEY: c29tZXRoaW5nCg==
type: Opaque
EOF

DOMAIN=$(oc get routes console -n openshift-console -o json | jq -r '.status.ingress[0].routerCanonicalHostname' | sed 's/router-default.//')
cat << EOF | oc create -f -
kind: APIManager
apiVersion: apps.3scale.net/v1alpha1
metadata:
  name: 3scale
  namespace: $NAMESPACE
spec:
  wildcardDomain: $DOMAIN
  system:
    fileStorage:
      simpleStorageService:
        configurationSecretRef:
          name: s3-credentials
  externalComponents:
    backend:
      redis: true
    system:
      database: true
      redis: true
EOF

The operator was running locally with make run.

Once the system was provisioned, there were no HPAs:
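
The data values in the s3-credentials Secret are base64-encoded placeholders. A minimal sketch for inspecting them is shown here; `decode` is a hypothetical helper, not something from the operator. Decoding shows that each value ends with a newline, which is what piping echo output into base64 would produce (an assumption about how they were generated), and is worth keeping in mind if real credentials are prepared the same way.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decode returns the plain-text value of a base64-encoded Secret data field.
func decode(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	fields := []struct{ key, value string }{
		{"AWS_ACCESS_KEY_ID", "c29tZXRoaW5nCg=="},
		{"AWS_REGION", "dXMtd2VzdC0xCg=="},
	}
	for _, field := range fields {
		plain, err := decode(field.value)
		if err != nil {
			fmt.Println(field.key, "error:", err)
			continue
		}
		// %q makes the trailing newline visible in the output.
		fmt.Printf("%s = %q\n", field.key, plain)
	}
}
```

This prints `AWS_ACCESS_KEY_ID = "something\n"` and `AWS_REGION = "us-west-1\n"`.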

% oc get hpa -n 3scale-test
No resources found in 3scale-test namespace.

Test scenarios:

  1. enable HPAs on listener/worker/prod apicast
% oc patch apimanager 3scale -n 3scale-test --type=merge \
    -p '{"spec":{"backend":{"listenerSpec":{"hpa":true},"workerSpec":{"hpa":true}},"apicast":{"productionSpec":{"hpa":true}}}}'
apimanager.apps.3scale.net/3scale patched
% oc get hpa -n 3scale-test
NAME                 REFERENCE                       TARGETS                        MINPODS   MAXPODS   REPLICAS   AGE
apicast-production   Deployment/apicast-production   memory: 50%/85%, cpu: 0%/85%   1         5         1          79s
backend-listener     Deployment/backend-listener     memory: 13%/85%, cpu: 0%/85%   1         5         1          80s
backend-worker       Deployment/backend-worker       memory: 19%/85%, cpu: 0%/85%   1         5         1          79s
  2. annotate with disable-async - as expected the backend HPAs have been removed
% oc patch apimanager 3scale -n 3scale-test --type=merge \
    -p '{"metadata":{"annotations":{"apps.3scale.net/disable-async":"true"}}}'
apimanager.apps.3scale.net/3scale patched
% oc get hpa -n 3scale-test
NAME                 REFERENCE                       TARGETS                        MINPODS   MAXPODS   REPLICAS   AGE
apicast-production   Deployment/apicast-production   memory: 50%/85%, cpu: 0%/85%   1         5         1          8m44s
  3. remove annotation and hpa flag - all HPAs removed
% oc patch apimanager 3scale -n 3scale-test --type=json \
    -p '[{"op":"remove","path":"/metadata/annotations/apps.3scale.net~1disable-async"},{"op":"replace","path":"/spec/backend/listenerSpec/hpa","value":false},{"op":"replace","path":"/spec/backend/workerSpec/hpa","value":false},{"op":"replace","path":"/spec/apicast/productionSpec/hpa","value":false}]'
apimanager.apps.3scale.net/3scale patched
% oc get hpa -n 3scale-test
No resources found in 3scale-test namespace.
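
The JSON patch in the last scenario writes the annotation key as apps.3scale.net~1disable-async because "/" separates path segments in a JSON Pointer and must be escaped inside a segment. The escaping rule from RFC 6901 can be sketched as follows; `escapeToken` and `annotationPath` are hypothetical helper names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeToken escapes one JSON Pointer reference token per RFC 6901:
// "~" becomes "~0" and "/" becomes "~1".
func escapeToken(token string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(token)
}

// annotationPath builds the JSON Patch path for a metadata annotation key.
func annotationPath(key string) string {
	return "/metadata/annotations/" + escapeToken(key)
}

func main() {
	fmt.Println(annotationPath("apps.3scale.net/disable-async"))
}
```

Running it prints `/metadata/annotations/apps.3scale.net~1disable-async`, which matches the path used in the remove operation.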

🤖 Co-authored with Claude Code

@urbanikb urbanikb requested a review from a team as a code owner May 7, 2026 17:41
@urbanikb urbanikb changed the title from "THREESCALE-14651 (THREESCALE-14224): Fix HPA reconciler to skip API delete calls when HPA already absent" to "THREESCALE-14651: Fix HPA reconciler to skip API delete calls when HPA already absent" May 8, 2026
@urbanikb urbanikb changed the title from "THREESCALE-14651: Fix HPA reconciler to skip API delete calls when HPA already absent" to "THREESCALE-14651 Fix HPA reconciler to skip API delete calls when HPA already absent" May 8, 2026
@urbanikb urbanikb force-pushed the THREESCALE-14651 branch from 286bd6d to d8fbaaf Compare May 8, 2026 05:15
@codecov-commenter

codecov-commenter commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.03%. Comparing base (4963add) to head (d8fbaaf).
⚠️ Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1175      +/-   ##
==========================================
+ Coverage   41.84%   44.03%   +2.18%     
==========================================
  Files         203      204       +1     
  Lines       20859    20923      +64     
==========================================
+ Hits         8729     9213     +484     
+ Misses      11350    10913     -437     
- Partials      780      797      +17     
Flag   Coverage Δ
unit   44.03% <100.00%> (+2.18%) ⬆️

Components                       Coverage Δ
apis/apps/v1alpha1 (u)           63.56% <ø> (+5.01%) ⬆️
apis/capabilities/v1alpha1 (u)   3.50% <ø> (ø)
apis/capabilities/v1beta1 (u)    20.21% <ø> (ø)
controllers (i)                  12.08% <88.23%> (+2.76%) ⬆️
pkg (u)                          63.75% <81.90%> (+2.03%) ⬆️

Files with missing lines                                Coverage Δ
...e/amp/operator/base_apimanager_logic_reconciler.go   56.80% <100.00%> (+4.17%) ⬆️

... and 10 files with indirect coverage changes

tkan145 previously approved these changes May 13, 2026
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go
@tkan145 tkan145 dismissed their stale review May 13, 2026 03:21

Missing test cases

@urbanikb urbanikb force-pushed the THREESCALE-14651 branch from d8fbaaf to 52b1142 Compare May 13, 2026 23:11
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go Outdated
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go Outdated
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go Outdated
Comment thread pkg/3scale/amp/operator/base_apimanager_logic_reconciler_test.go Outdated
ReconcileHpa() was calling DeleteResource() directly when HPA is disabled,
which issues a Client().Delete() API call every reconcile cycle regardless
of whether the HPA exists. After the first successful delete, every
subsequent cycle hit a 404 indefinitely.

Replace with TagObjectToDelete + ReconcileResource, which does a
cache-backed Get first and is a no-op when the object is already gone.
This is consistent with how PDB and monitoring resources are handled.

Add table-driven tests covering all three HPA targets (backend-listener,
backend-worker, apicast-production) across enabled, disabled, and
async-disable annotation scenarios.

Related: THREESCALE-14224

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@urbanikb urbanikb force-pushed the THREESCALE-14651 branch from 52b1142 to 61ef18d Compare May 14, 2026 14:33
@tkan145
Contributor

tkan145 commented May 14, 2026

/retest

@tkan145
Contributor

tkan145 commented May 15, 2026

/retest

@tkan145
Contributor

tkan145 commented May 15, 2026

/lgtm
/approve

Thanks. CI is failing, however, and Prow seems to be a bit unstable at the moment. I will approve this PR, but please postpone the merge until Monday.

@tkan145
Contributor

tkan145 commented May 17, 2026

/retest

@tkan145
Contributor

tkan145 commented May 18, 2026

Integration tests are failing because the route never becomes available. I get similar results when running the tests locally.

Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'
Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'
Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'
Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'
Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'
Waiting for availability of Route with host 'api-3scale-apicast-production.test1.127.0.0.1.nip.io'

Can you check whether it's something on our side or whether zync is actually broken?

@urbanikb
Contributor Author

It did create routes locally. It seems the timeout of 900s might be too short.

/retest

@openshift-ci

openshift-ci Bot commented May 18, 2026

@urbanikb: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                  Commit    Details   Required   Rerun command
ci/prow/test-integration   61ef18d   link      true       /test test-integration

@tkan145
Contributor

tkan145 commented May 19, 2026

The latest porta image is not backward compatible. I ran the integration tests locally with the 2.16 porta image and they passed.
Feel free to merge.

3 participants