mgr/dashboard: remove daemon list for badges #46565

Closed
wants to merge 3 commits into from

Conversation

Pegonzal
Contributor

@Pegonzal Pegonzal commented Jun 8, 2022

Signed-off-by: Pedro Gonzalez Gomez pegonzal@redhat.com
Fixes: https://tracker.ceph.com/issues/55895

Changes the UI for the rbd-mirroring daemons from a table to badges, color-coded according to each daemon's health property.

rbd-mirror_badges.mp4

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
@Pegonzal Pegonzal requested a review from a team as a code owner June 8, 2022 10:52
@Pegonzal Pegonzal added this to In progress in Dashboard via automation Jun 8, 2022
@Pegonzal Pegonzal requested review from nSedrickm and Sarthak0702 and removed request for a team June 8, 2022 10:52
Dashboard automation moved this from In progress to Review in progress Jun 8, 2022
@Sarthak0702
Member

jenkins test make check

Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Contributor

@pereman2 pereman2 left a comment

nits

… pipe function and updated unit test

Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
@epuertat
Member

Interesting clean-up @Pegonzal !

image

This led me to think about a couple of things:

  • In general, Dashboard pages should be simple and follow a single component/pattern: table, form, multi-card layout (landing page), etc. Why? Because each pattern serves a single purpose or concern: the table is for displaying a collection of items and their status/details, the form is for creating/editing each item's settings, etc. However, in the rbd-mirroring page I still see different purposes:
    • Connecting-configuring multiple clusters (bootstrapping)
    • Listing the rbd-mirroring daemons and status (but nothing can be changed there, so shouldn't this be on the services section?).
    • Listing the RBD pools (but RBD pools are also listed in the Pools section...)
    • Listing the mirror-enabled RBD images (but RBD images are also listed in the Block section...)
    • Rather than displaying multiple tables in a page, the tabs allow users to navigate through them, but only showing 1 table at a time.
  • From a user perspective, what's their workflow? They probably had their RBD pool configured, with tons of RBD images serving as disks for virtual machines, and they now want to ensure that if this Ceph cluster becomes unavailable, the VMs will be able to fail over to another Ceph cluster. So:
    • They want to easily sync this current cluster to a newer one (the bootstrap process fulfills this)
    • They would want to receive alerts if something is not working or is underperforming: RBD images out of sync or not able to keep pace (no rbd-mirror available, bandwidth or IOPS bottlenecks, split-brain situations, etc.)
    • If that happens, they might want to have a deeper look at it and check the RBD image stats (progress, syncing status, etc).

So all the above considered:

  • Now, if there is no rbd-mirroring daemon and/or no RBD pools, a new empty-state page will be displayed, prompting the user to click a button that configures the prerequisites (RBD pool and rbd-mirror service).
  • Then, we should immediately display the import-export form (this probably calls for a Wizard)
  • After that, what should we display? I'd say: just a single table, probably the list of mirrored RBD images, pools and daemons:
    image
  • We have some RBD mirror metrics exported to Prometheus (not sure if all that we need, and more will come with the new exporter and per-image metrics), but none of them is used in the monitoring/ceph-mixins/prometheus_alerts.yml to trigger alerts, so perhaps we should add some:
ceph_prioritycache_cache_bytes{ceph_daemon="rbd-mirror.4203"} 2845415832.0
ceph_prioritycache_heap_bytes{ceph_daemon="rbd-mirror.4203"} 15409152.0
ceph_prioritycache_mapped_bytes{ceph_daemon="rbd-mirror.4203"} 15114240.0
ceph_prioritycache_target_bytes{ceph_daemon="rbd-mirror.4203"} 4294967296.0
ceph_prioritycache_unmapped_bytes{ceph_daemon="rbd-mirror.4203"} 294912.0
ceph_objecter_op_active{ceph_daemon="rbd-mirror.4203"} 0.0
ceph_objecter_op_r{ceph_daemon="rbd-mirror.4203"} 2.0
ceph_objecter_op_rmw{ceph_daemon="rbd-mirror.4203"} 0.0
ceph_objecter_op_w{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_replay Replays
# TYPE ceph_rbd_mirror_replay counter
ceph_rbd_mirror_replay{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_replay_bytes Replayed data
# TYPE ceph_rbd_mirror_replay_bytes counter
ceph_rbd_mirror_replay_bytes{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_replay_latency_sum Replay latency Total
# TYPE ceph_rbd_mirror_replay_latency_sum counter
ceph_rbd_mirror_replay_latency_sum{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_replay_latency_count Replay latency Count
# TYPE ceph_rbd_mirror_replay_latency_count counter
ceph_rbd_mirror_replay_latency_count{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_snapshot_replay_bytes Replayed data
# TYPE ceph_rbd_mirror_snapshot_replay_bytes counter
ceph_rbd_mirror_snapshot_replay_bytes{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_snapshot_snapshots Snapshots
# TYPE ceph_rbd_mirror_snapshot_snapshots counter
ceph_rbd_mirror_snapshot_snapshots{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_snapshot_snapshots_time_sum Snapshots time Total
# TYPE ceph_rbd_mirror_snapshot_snapshots_time_sum counter
ceph_rbd_mirror_snapshot_snapshots_time_sum{ceph_daemon="rbd-mirror.4203"} 0.0
# HELP ceph_rbd_mirror_snapshot_snapshots_time_count Snapshots time Count
# TYPE ceph_rbd_mirror_snapshot_snapshots_time_count counter
ceph_rbd_mirror_snapshot_snapshots_time_count{ceph_daemon="rbd-mirror.4203"} 0.0
  • All the alerts we are now triggering are pure Dashboard-generated notifications (in the absence of Prometheus alerts, better something than nothing). I like the red-orange badge thing, but we should ensure it gives the user a hint about what's going on and how to fix it (the following badge doesn't have any description/tooltip):
    image

(fetchData)="refresh()"
[status]="tableStatus">
</cd-table>
<ng-container *ngIf="empty">
Member

Let's always use the ngIf="this else that".

}

daemonStatus(daemon_status: string): string {
const display_names = {success: 'UP', error: 'DOWN', warning: 'WARNING', info: 'UNKNOWN'};
Member

This is rather an enum.

Contributor Author

I don't see how this would work as an enum, since I'm getting the values from the keys in the HTML.
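
For illustration, a minimal sketch of how a string enum could still be looked up by the plain string keys coming from the template. The names `DaemonHealth` and `DISPLAY_NAMES` are hypothetical and not part of this PR; the labels mirror the `display_names` object in the snippet under review.

```typescript
// Hypothetical enum: string values match the health colors used in the template.
enum DaemonHealth {
  success = 'success',
  error = 'error',
  warning = 'warning',
  info = 'info'
}

// Label per health color; the Record type makes a missing key a compile error.
const DISPLAY_NAMES: Record<DaemonHealth, string> = {
  [DaemonHealth.success]: 'UP',
  [DaemonHealth.error]: 'DOWN',
  [DaemonHealth.warning]: 'WARNING',
  [DaemonHealth.info]: 'UNKNOWN'
};

// Accepts the raw string key from the template; unknown keys are returned unchanged.
function daemonStatus(healthColor: string): string {
  return Object.prototype.hasOwnProperty.call(DISPLAY_NAMES, healthColor)
    ? DISPLAY_NAMES[healthColor as DaemonHealth]
    : healthColor;
}
```

Because the enum values are the same strings the template already passes around, the HTML side would not need to change under this assumption.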

(fetchData)="refresh()"
[status]="tableStatus">
</cd-table>
<ng-container *ngIf="empty">
Member

No need to cache data yourself. Angular works better when you check the data you want. Angular does the memoization for you.

Suggested change
<ng-container *ngIf="empty">
<ng-container *ngIf="data">

Comment on lines +42 to +52
const daemon_states = { info: 0, error: 0, warning: 0, success: 0 };
this.empty = this.data.length > 0 ? false : true;
for (let i = 0; i < this.data.length; i++) {
const health_color = this.data[i]['health_color'];
daemon_states[health_color]++;
}
return daemon_states;
}

daemonStatus(daemon_status: string): string {
const display_names = {success: 'UP', error: 'DOWN', warning: 'WARNING', info: 'UNKNOWN'};
Member

Are we sure this mapping is ok? error and down seem 2 different conditions (down means that the service is not running, while error means the service is running but encountered a condition that impedes it from continuing).

And info = unknown is a bit shocking to me...

Contributor Author

You are right, I think it should be 'error'.

I also agree that info = unknown is a little bit weird, but that's what I'm getting from the backend, from the get_daemon_health function at src/pybind/mgr/dashboard/controllers/rbd_mirroring.py.

data: [];
columns: {};
data: any[];
daemons: {};
Member

Let's name things clearly: this is count/histogram of the daemon statuses, not the daemons themselves. Additionally, a proper type should be here

Suggested change
daemons: {};
daemonCount: Record<'up' | 'down' | ..., number>;

Or:

enum DaemonStatus {
  up,
  down,
}

let countDaemon: {[Key in DaemonStatus as string]: number} = {up: 1, down: 2}; 
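
As a rough sketch of the typed count suggested here, assuming the four health colors from the snippet under review; `HealthColor`, `DaemonRow` and `countDaemonStates` are hypothetical names, and the real row type has more fields than shown.

```typescript
// Health colors taken from the `daemon_states` object in the snippet under review.
type HealthColor = 'info' | 'error' | 'warning' | 'success';

// Hypothetical shape of one daemon row; only `health_color` is used here.
interface DaemonRow {
  health_color: HealthColor;
}

// Counts how many daemons report each health color.
function countDaemonStates(daemons: DaemonRow[]): Record<HealthColor, number> {
  const counts: Record<HealthColor, number> = { info: 0, error: 0, warning: 0, success: 0 };
  for (const daemon of daemons) {
    counts[daemon.health_color]++;
  }
  return counts;
}
```

With the `Record<HealthColor, number>` type, indexing with a misspelled color is rejected at compile time rather than silently creating a new key.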

Contributor Author

I will do it. I'll be more careful with the naming.

@Pegonzal
Contributor Author

Thanks for your comments @epuertat! I have some more questions and comments regarding them:

Since this PR was meant to clean up some of the information regarding the daemons and display the daemons' health status in a simpler, more visual way, should we tackle the next improvements in other issues/PRs?

                         -----------------------------------------
  • Listing the rbd-mirroring daemons and status (but nothing can be changed there, so shouldn't this be on the services section?).

Couldn't that be a problem since we need an orchestrator to see that information from the services section?

  • In general, Dashboard pages should be simple and follow a single component/pattern: table, form, multi-card layout (landing page), etc. Why? Because each pattern serves a single purpose or concern: the table is for displaying a collection of items and their status/details, the form is for creating/editing each item's settings, etc. However, in the rbd-mirroring page I still see different purposes:

That was the goal, to make it simple. A table is a great way to display information, but it seemed that there was not enough meaningful information inside the Daemons table to justify a whole table when there are two more tables displaying information as well. There might be too much going on at the same time, which most of the time might not be needed.

  • Rather than displaying multiple tables in a page, the tabs allow users to navigate through them, but only showing 1 table at a time.

This could be a great solution overall. However, I think two main issues might arise from it.
Right now the Images table already has three different tabs, so how could we manage that with the proposed change to tabs?
And again, is a whole table really necessary to display the daemons data? Is it something the user would be interested in?

  • Now, if there is no rbd-mirroring daemon and/or no RBD pools, a new empty-state page will be displayed, prompting the user to click a button that configures the prerequisites (RBD pool and rbd-mirror service).

Would this PR cover that #46527 ?

  • We have some RBD mirror metrics exported to Prometheus (not sure if all that we need, and more will come with the new exporter and per-image metrics), but none of them is used in the monitoring/ceph-mixins/prometheus_alerts.yml to trigger alerts, so perhaps we should add some:

Perfect, will look into that

  • All the alerts we are now triggering are pure Dashboard-generated notifications (in the absence of Prometheus alerts, better something than nothing). I like the red-orange badge thing, but we should ensure it gives the user a hint about what's going on and how to fix it (the following badge doesn't have any description/tooltip):
    image

Letting the user know what is going on could be a great improvement. Will do.

@epuertat
Member

Thanks for your comments @epuertat! I have some more questions and comments regarding them:

Since this PR was meant to clean up some of the information regarding the daemons and display the daemons' health status in a simpler, more visual way, should we tackle the next improvements in other issues/PRs?

Sure, feel free to take/drop the suggestions.

  • Listing the rbd-mirroring daemons and status (but nothing can be changed there, so shouldn't this be on the services section?).

Couldn't that be a problem since we need an orchestrator to see that information from the services section?

While technically we could get some service information without orchestrator (Ceph tracks internally the status of Ceph services) you're right that this page won't work without orchestrator.

  • In general, Dashboard pages should be simple and follow a single component/pattern: table, form, multi-card layout (landing page), etc. Why? Because each pattern serves a single purpose or concern: the table is for displaying a collection of items and their status/details, the form is for creating/editing each item's settings, etc. However, in the rbd-mirroring page I still see different purposes:

That was the goal, to make it simple. A table is a great way to display information, but it seemed that there was not enough meaningful information inside the Daemons table to justify a whole table when there are two more tables displaying information as well. There might be too much going on at the same time, which most of the time might not be needed.

My idea of simplification here is more about adhering to generic patterns and removing ad-hoc (per-component) customizations, so the goal would be that every Dashboard page is just a table + form component. Anything else that adds specificities to a page would 'complicate' it. If we find that these new daemon labels make sense for ALL components, yes, let's go for it. But again, it's a suggestion.

  • Rather than displaying multiple tables in a page, the tabs allow users to navigate through them, but only showing 1 table at a time.

This could be a great solution overall. However, I think two main issues might arise from it. Right now the Images table already has three different tabs, so how could we manage that with the proposed change to tabs? And again, is a whole table really necessary to display the daemons data? Is it something the user would be interested in?

Agree on the daemons table. Tables should display a medium-to-large amount of data. For displaying just 1-2 daemons, a table is a waste of space.

  • Now, if there is no rbd-mirroring daemon and/or no RBD pools, a new empty-state page will be displayed, prompting the user to click a button that configures the prerequisites (RBD pool and rbd-mirror service).

Would this PR cover that #46527 ?

If there are no daemons, then the "welcome screen" would replace this page.

@Pegonzal
Contributor Author

Then, after our discussion and your comments @epuertat my conclusion would be as follows:

This PR is a great implementation because it shows relevant information (the daemon status) to the user in a clear and simple way, at all times, at the top of the page. However, it doesn't follow the single component/pattern approach we are looking for in our Dashboard pages. So it should either be discarded, or added to the other pages where it might be needed to keep consistency, at the cost of adding another pattern, which will reduce the simplicity of the pages. Which of the two options would be best?

However, it is clear to all of us that this page would benefit from some kind of cleanup. The following things could be added/modified to do so:

  • If there is no rbd-mirroring daemon and/or no RBD pools, a new empty-state page will be displayed, prompting the user to click a button that configures the prerequisites (RBD pool and rbd-mirror service).
  • Then, we should immediately display the import-export form (through a Wizard)
  • Display a single table, the list of mirrored RBD images, pools and daemons:
    image
  • Add Prometheus metrics to trigger alerts
  • Add hints for the user to know what's going on

@pereman2 since you were also in favor of this PR, I would also like to know your thoughts about these conclusions.

Thank you guys for your help and comments. I'll wait for your response about this.

@pereman2
Contributor

@epuertat I think this is a problem of effectively displaying relevant data. I understand that you want to have consistency across components and try not to do ad-hoc implementations.
In this case we don't have a single data source in rbd-mirroring like in pools, osds, hosts... The closest thing would be RGW, where we have a link to a daemons table where we see a lot of information related to that rgw daemon, but that's the thing: there is relevant info in that table. Relevant info includes a whole list of details, realms, performance counters...

On the other side, we can agree that rbd-mirror is a different beast which works in a peculiar way. In the daemons table the relevant info is maybe the name and the state of the daemon, because you cannot operate if all daemons are down or have issues. If you were about to do something on a mirrored image and all daemons are down, you'd prefer not having to go to another tab before performing another action or checking whether something is happening with the daemons. If we kept the same layout we've always had, it wouldn't be a problem, but you would still have a huge table taking unnecessary space. With the current approach in this PR, I believe you can have the same info the table provides in a compact way, without having to context switch between tables, which I think is important.

If your problem is consistency and that we would have to implement this everywhere next time, I don't think we should avoid changing something just because it doesn't stick to our usual bloated, unreadable tables. We should strive to display the data in the most efficient way possible, and it has to be relevant.

Member

@nizamial09 nizamial09 left a comment

While I agree that the rbd-mirroring page needs to be improved a lot, we should still be following some UX patterns. I know that we are kind of limited in that area, but at least we can try to follow the general theme and improve upon it. Our goal here in the end is to show what is relevant to the user upfront. And while showing the Daemons table in the first view is mostly not relevant, we can still show it. Just not in the front.

I think the Images (or the status of the images) is the important thing here, and the one that users will keep coming back to look at even after setting rbd-mirror up. So showing the info in the Images tab first and putting everything else in a different tab would make a lot of sense here. It'll adhere to the UX we have in place too.

While the Daemons page doesn't give a lot of info, it still shows which daemon is up and which is not. I don't like the idea of this being lost under a tooltip TBH.

A picture is worth a thousand words, so my vision is to have something like this in place:
Screenshot from 2022-06-29 20-40-22

I really don't want to lose this opportunity to do some cleanups around this page. So let's try to bring a good design that we all can agree on.

Let me know your thoughts on this too. @Pegonzal @epuertat @pereman2

Contributor

@pereman2 pereman2 left a comment

@nizamial09 I'm ok with that approach. My only concern is having 3 tables displayed in rows again. It feels weird.

@epuertat
Member

I agree with @pereman2. We should find a way to display 1 piece of information at once (that's how some view abstractions work: pages, tabs) and avoid showing too much information simultaneously (collapsible/accordion components are acceptable, but risky if abused).

I'm fine with any improvement that reduces/condenses the amount of information displayed.

My comment today about adding badges/labels/chips (whichever we call them) is that we should prefer highlighting information that requires attention:

  • Using a red label for the "critical" alert label in the alerts table column is better than not highlighting anything.
  • Using a red label for displaying the count of critical alerts in the tab header, menu entry and similar places is even better (it aggregates information and directs the user to the place where the detailed info is).
  • Using the labels to display the number of UP/DOWN servers is ok too (better than the non-highlighted version), but I had 2 issues with that: the placement was not clear (floating between the breadcrumbs and the tabs), and if we can detect the undesired situation (1 server DOWN), why not report that instead of forcing the user to digest and process both the ok info (X servers UP) and the not-ok info (X servers DOWN)?
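
To make the last point concrete, here is a minimal sketch of reporting only the states that need attention. `DaemonCounts` and `attentionMessage` are hypothetical names, the count keys follow the health colors used in this PR, and the message wording is only a placeholder.

```typescript
// Hypothetical count shape, keyed by the health colors used in this PR.
interface DaemonCounts {
  success: number;
  warning: number;
  error: number;
  info: number;
}

// Returns a short message only when something needs attention,
// or null when every daemon is healthy (nothing to highlight).
function attentionMessage(counts: DaemonCounts): string | null {
  const parts: string[] = [];
  if (counts.error > 0) parts.push(`${counts.error} in error`);
  if (counts.warning > 0) parts.push(`${counts.warning} with warnings`);
  if (counts.info > 0) parts.push(`${counts.info} in unknown state`);
  if (parts.length === 0) return null;
  return `rbd-mirror daemons: ${parts.join(', ')}`;
}
```

A caller could render a single highlighted label (with the message as its tooltip) only when the result is not null, so healthy daemons add no visual noise.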

@epuertat epuertat moved this from Review in progress to Inactive in Dashboard Sep 2, 2022
@github-actions

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Mar 28, 2023
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Apr 27, 2023
Projects
Archived in project
Dashboard
  
Inactive
5 participants