Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(health): route health check stuck in 'Progressing' (#10037) #12959

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hille721
Copy link

@hille721 hille721 commented Mar 22, 2023

There was already a PR which tries to fix that the Openshift route health check stuck in 'Progressing' (#8170). But there is still a case when this current implementation of the healthcheck does not work:

status:
  ingress:
    - conditions:
        - lastTransitionTime: '2023-03-24T07:58:05Z'
          status: 'True'
          type: Admitted
        - message: xxx successfully claimed in Infoblox
          reason: ClaimedInInfoblox
          status: 'True'
          type: Admitted
      host: xxx
      routerCanonicalHostname: xxx
      routerName: default
      wildcardPolicy: None

As you can see there are two conditions with type: Admitted and status: 'True' which results that the counter numTrue will be greater than numIngressRules. But the current health check wants an equals to (https://github.com/argoproj/argo-cd/blob/master/resource_customizations/route.openshift.io/Route/health.lua#L20).
It can be simple fix if we make the == to >=.

Fixes #10037

Note on DCO:

If the DCO action in the integration test fails, one or more of your commits are not signed off. Please click on the Details link next to the DCO action for instructions on how to resolve this.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • Optional. My organization is added to USERS.md.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.

Please see Contribution FAQs if you have questions about your pull-request.

fixes argoproj#10037

Signed-off-by: Christoph Hille <38468371+hille721@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (6bef7fb) 47.98% compared to head (10e8a79) 47.98%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12959   +/-   ##
=======================================
  Coverage   47.98%   47.98%           
=======================================
  Files         246      246           
  Lines       42221    42221           
=======================================
  Hits        20259    20259           
  Misses      19939    19939           
  Partials     2023     2023           

see 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@crenshaw-dev crenshaw-dev changed the title fix: route health check stuck in 'Progressing' (fixes #10037) fix(health): route health check stuck in 'Progressing' (#10037) May 28, 2023
Copy link
Collaborator

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hille721 would you mind adding a test for this health check to prevent regressions?

Copy link
Member

@ishitasequeira ishitasequeira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hille721 I don't think that just changing == to >= would fix the issue you are describing.
If there are multiple hosts included as part of the Ingress spec, then status for each host should have all the conditions to be True.

Your check will fail if one host has both status conditions false and one host has both conditions true. In this case the status will still show as Healthy wherein it needs to be Degraded.

@hille721 WDYT?

@hille721
Copy link
Author

hille721 commented Jun 9, 2023

Your check will fail if one host has both status conditions false and one host has both conditions true. In this case the status will still show as Healthy wherein it needs to be Degraded.

@ishitasequeira Same happen with current implementation, as if I understand your scenario correctly, you will end up with numIngressRules=2 and numTrue=2. Which is equal and will lead to status Healthy.

But yeah maybe we should extend the condition to if numTrue >= numIngressRules and numFalse != 0

@hille721
Copy link
Author

hille721 commented Jun 9, 2023

@hille721 would you mind adding a test for this health check to prevent regressions?

Hi @crenshaw-dev,
is there already a test which I can extend. Otherwise I'm not sure how to do that, as I don't know the scenarios how the route status could look like

@ishitasequeira
Copy link
Member

Your check will fail if one host has both status conditions false and one host has both conditions true. In this case the status will still show as Healthy wherein it needs to be Degraded.

@ishitasequeira Same happen with current implementation, as if I understand your scenario correctly, you will end up with numIngressRules=2 and numTrue=2. Which is equal and will lead to status Healthy.

But yeah maybe we should extend the condition to if numTrue >= numIngressRules and numFalse != 0

Yes, the current implementation does not consider the scenario raised in #10037. Adding if numTrue >= numIngressRules and numFalse != 0 should surely help.

@hille721 would you mind adding a test for this health check to prevent regressions?

Hi @crenshaw-dev, is there already a test which I can extend. Otherwise I'm not sure how to do that, as I don't know the scenarios how the route status could look like

For adding a test case, you would need to add a file in the testdata folder similar to https://github.com/argoproj/argo-cd/blob/10e8a79be3899c2cf0aee19d069f52260c33c364/resource_customizations/route.openshift.io/Route/testdata/degraded.yaml.

The Yaml file would include the route definition which defines the scenario you have specified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Openshift Routes are "always" Progressing
3 participants