Skip to content

Multi-cluster Istio components status #8170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Mar 28, 2025
Merged

Conversation

hhovsepy
Copy link
Contributor

@hhovsepy hhovsepy commented Feb 17, 2025

Describe the change

Kiali Masthead which shows Home cluster name and the components statuses are merged now.
It shows single icon with tooltip:
Screenshot from 2025-03-17 14-39-49
Added multicluster support of components statuses:
Screenshot from 2025-03-17 12-12-44
Screenshot from 2025-03-17 12-13-23
Always showing sorted the worst status cluster the first.

From Control plane card removed "Remote Cluster" label as all cluster's are listed in Masthead.

Steps to test the PR

Test for primary-remote and multy-primary cases.
Regression test for single cluster case.

Automation testing

Modified the cypress and unit tests.

Issue reference

#7724

@hhovsepy hhovsepy force-pushed the issue7724 branch 2 times, most recently from 0f5e8bb to 231ddb7 Compare February 20, 2025 13:41
@hhovsepy
Copy link
Contributor Author

Screenshot from 2025-03-12 12-27-22
Screenshot from 2025-03-12 12-27-17

@hhovsepy
Copy link
Contributor Author

Screenshot from 2025-03-17 12-13-23
Screenshot from 2025-03-17 12-12-44

@hhovsepy hhovsepy changed the title Issue7724 Multi-cluster Istio components status Mar 17, 2025
@hhovsepy hhovsepy marked this pull request as ready for review March 17, 2025 13:01
@hhovsepy hhovsepy requested review from ferhoyos and nrfox March 17, 2025 13:45
@hhovsepy hhovsepy self-assigned this Mar 17, 2025
@jshaughn jshaughn moved this from 📋 Backlog to 👀 In review in Kiali Sprint 25-10 | Kiali v2.13 Mar 17, 2025
Comment on lines 116 to 117
// if no control plane and no any other control plane which manages external
if len(istiodStatus) == 0 && !managesExternal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this covers the case where a controlplane may be managing an external cluster but it's not the current cluster. e.g.

You have 3 clusters: A, B, C. A manages B but nothing manages C.

I don't think you need to look at ManagesExternal. You just need to know if there's a controlplane managing this cluster or not e.g. if cluster not in controlPlanes.managedClusters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure if this is always an error. You could have an "observability" cluster for instance where the observability components live separate from any istio components.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A problem with showing an error when the istiod is not found is that a user may not have access to the cluster where the istiod lives so this would show an error even though the istiod does exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have somewhere documented all those combinations? or it is only here? https://istio.io/latest/docs/setup/install/multicluster/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have changed the logic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have somewhere documented all those combinations? or it is only here? https://istio.io/latest/docs/setup/install/multicluster/

It's just there but there but you can also combine some of those e.g. external control plane + primary-remote.

@hhovsepy hhovsepy requested a review from nrfox March 18, 2025 12:45
Copy link
Contributor

@nrfox nrfox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good. Any new multi-cluster cypress tests that should be added for this to test the new masthead functionality?

@hhovsepy
Copy link
Contributor Author

hhovsepy commented Mar 18, 2025

code looks good. Any new multi-cluster cypress tests that should be added for this to test the new masthead functionality?

there are existing multicluster tests which are passing now, when all components are running fine, can be added a new ones to check that all cluster are in tooltip

@nrfox
Copy link
Contributor

nrfox commented Mar 18, 2025

can be added a new ones to check that all cluster are in tooltip

Since we're adding new functionality, it'd be good to also add new tests to cover it.

Added component unhealthy test
Then user sees a tooltip with text "west"
Then user does not see any "Not" in the tooltip
Then user does not see any "Unreachable" in the tooltip
Then user hovers out the cluster icon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this last assertion testing? Seems like it performs an action but does no assertions afterward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was like a cleanup, to close the tooltip, will move it to separate after

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@multi-cluster
@component-health
Scenario: Istio components unhealthy
When user scales to "0" the "prometheus" in namespace "istio-system"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A faster way to do this is to just intercept and mock the status response to change one of them to unhealthy. I'm worried that actually scaling prom up/down could affect other tests after this one that rely on metrics data like graph tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by mocking, we could not really test all those changes in the backend, it will be only UI functionality check,
but I agree that up/down of prometheus can cause some flake tests in backend, maybe I can up/down another component which is less sensitive, for instance 'grafana'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrfox thanks for the review, made 'grafana' up/down scale instead of 'prometheus', and '@component-health-upscale' after operaton upscales it.

When user hovers over the cluster icon
Then user does not see any "Not" in the tooltip
Then user does not see any "Unreachable" in the tooltip
Then user hovers out the cluster icon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here about this action at the end. Seems like it's not being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@hhovsepy hhovsepy requested a review from nrfox March 27, 2025 08:40
Copy link
Contributor

@ferhoyos ferhoyos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the code and tested locally, everything looks fine for me, good job @hhovsepy 🙂

However, clarifying that this is not related to this PR, when I forced Grafana to be unreachable to trigger a warning, the Mesh page did not show this warning. Instead, it displayed a valid connection.

image

Does it happen the same to you?

@hhovsepy
Copy link
Contributor Author

I reviewed the code and tested locally, everything looks fine for me, good job @hhovsepy 🙂

However, clarifying that this is not related to this PR, when I forced Grafana to be unreachable to trigger a warning, the Mesh page did not show this warning. Instead, it displayed a valid connection.
Does it happen the same to you?

Thanks for the review and cooperation @ferhoyos , yes this happens to me as well. Probably we need another issue logged.

@hhovsepy hhovsepy merged commit 0ba7415 into kiali:master Mar 28, 2025
19 of 20 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Kiali Sprint 25-10 | Kiali v2.13 Mar 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants