Update grafana-dashboard.json #1916

Merged
merged 2 commits into cloudnative-pg:main from the grafana-dashboard branch on May 31, 2023

itay-grudev (Contributor) commented Apr 12, 2023

New Cluster Overview

A new cluster overview section at the top that features:

  • Base backup and last archived WAL information
  • Total CPU and memory usage across all nodes
  • An Alerts section
  • Last failover/switchover time
  • Cluster PostgreSQL version
  • Transactions per second
  • Volume usage and total database size
  • Replication/Write/Flush/Replay lag

image

Server Health

  • Added instance zone
  • Displaying the full PostgreSQL version
  • Bug Fix: When there are redundant kube-state-metrics instances there are duplicate status gauges
  • Bug Fix: Max Connections not displaying a progress bar in the gauge due to missing min and max values.
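
A minimal sketch of the second fix, assuming the gauge bounds live under `fieldConfig.defaults` in the panel JSON (the usual place in recent Grafana versions). The panel below and the helper name `ensure_gauge_bounds` are hypothetical, not taken from the dashboard file:

```python
import copy


def ensure_gauge_bounds(panel, minimum=0, maximum=100):
    """Return a copy of a panel with explicit min/max defaults.

    Bounds that are already present are left untouched.
    """
    patched = copy.deepcopy(panel)
    defaults = patched.setdefault("fieldConfig", {}).setdefault("defaults", {})
    defaults.setdefault("min", minimum)
    defaults.setdefault("max", maximum)
    return patched


# Hypothetical panel resembling a gauge that has no bounds set.
panel = {
    "type": "gauge",
    "title": "Max Connections",
    "fieldConfig": {"defaults": {"unit": "percentunit"}},
}
patched = ensure_gauge_bounds(panel, minimum=0, maximum=1)
print(patched["fieldConfig"]["defaults"])
```

Without both bounds the gauge has no range to scale the bar against, which matches the symptom described above.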

Before:

image

After:

image

Configuration

The configuration parameters are transposed such that every row now contains a parameter, while every column contains the parameter setting across individual database instances. This makes it much easier to scroll through settings.
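
To illustrate the transposition outside Grafana, here is a rough sketch in plain Python. The instance names and values are made up, and the dashboard itself presumably does this with panel transformations rather than code like this:

```python
def transpose_settings(per_instance):
    """Turn {instance: {param: value}} into {param: {instance: value}}."""
    instances = list(per_instance)
    params = []
    for settings in per_instance.values():
        for name in settings:
            if name not in params:
                params.append(name)  # keep first-seen parameter order
    return {
        name: {inst: per_instance[inst].get(name) for inst in instances}
        for name in params
    }


# Hypothetical instance names and values, for illustration only.
per_instance = {
    "cluster-example-1": {"max_connections": "100", "shared_buffers": "128MB"},
    "cluster-example-2": {"max_connections": "100", "shared_buffers": "256MB"},
}
by_parameter = transpose_settings(per_instance)
for name, values in by_parameter.items():
    print(name, values)
```

Each output row is one parameter, so a setting that differs between instances shows up side by side in a single row.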

Before:

image

After:

image

New Storage & IO Space and Inode Usage metrics

Added volume space and inode usage gauges.
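
As a rough sketch of what such a gauge computes: usage is the used amount divided by capacity. The kubelet normally exposes `kubelet_volume_stats_used_bytes`, `kubelet_volume_stats_capacity_bytes`, `kubelet_volume_stats_inodes_used` and `kubelet_volume_stats_inodes`; whether the dashboard queries exactly those metrics is an assumption here, and the numbers are invented:

```python
def usage_percent(used, capacity):
    """Percentage of a volume resource in use, or None when capacity is unknown."""
    if not capacity:
        return None  # no data reported, nothing meaningful to display
    return 100.0 * used / capacity


# Invented sample values standing in for scraped metric samples.
space = usage_percent(used=6 * 1024**3, capacity=8 * 1024**3)
inodes = usage_percent(used=12_000, capacity=524_288)
missing = usage_percent(used=0, capacity=0)
print(space, inodes, missing)
```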

image

Accumulated Tuple/IO graph to make the data more comprehensible

Before:

image

After:

image

General

  • Fixed tooltips and set shared crosshairs of all graphs.
  • Bug Fix: Cluster variable query picking up any metric ending with cluster
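
For reference, the shared crosshair is a dashboard-level setting in the JSON model: `graphTooltip` is 0 for the default, 1 for a shared crosshair and 2 for a shared tooltip. A small sketch that flips it on an exported dashboard (the sample JSON and the helper name are hypothetical):

```python
import json

SHARED_CROSSHAIR = 1  # Grafana: 0 = default, 1 = shared crosshair, 2 = shared tooltip


def enable_shared_crosshair(dashboard_json):
    """Return dashboard JSON text with the shared crosshair enabled."""
    dashboard = json.loads(dashboard_json)
    dashboard["graphTooltip"] = SHARED_CROSSHAIR
    return json.dumps(dashboard, indent=2)


# Hypothetical, heavily trimmed dashboard export.
source = '{"title": "Example", "graphTooltip": 0, "panels": []}'
result = json.loads(enable_shared_crosshair(source))
print(result["graphTooltip"])
```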

github-actions bot added the backport-requested ◀️, release-1.18, and release-1.19 labels Apr 12, 2023
github-actions (Contributor) commented

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

jsilvela (Contributor) commented May 5, 2023

Sorry for the delay. Been deploying this locally.
There is a problem with the datasource references as you have them in your files.

"datasource": {
  "type": "prometheus",
  "uid": "${DS_PROMETHEUS}"
}

These will not work properly with Grafana as deployed from the quickstart, which uses the kube-prometheus-stack.
We can substitute "${DS_PROMETHEUS}" with "prometheus", as that is the uid generated for the data source in the kube-prometheus-stack deployment.
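
A minimal sketch of that substitution, assuming the dashboard has been loaded into a Python dict (for example with `json.load`). The helper name and the trimmed sample dashboard are hypothetical; only the two uid strings come from the comment above:

```python
def replace_datasource_uid(node, old="${DS_PROMETHEUS}", new="prometheus"):
    """Return a copy of node with matching datasource uids replaced, at any depth."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "datasource" and isinstance(value, dict) and value.get("uid") == old:
                out[key] = {**value, "uid": new}
            else:
                out[key] = replace_datasource_uid(value, old, new)
        return out
    if isinstance(node, list):
        return [replace_datasource_uid(item, old, new) for item in node]
    return node


# Hypothetical, heavily trimmed dashboard with panel-level and target-level references.
dashboard = {
    "panels": [
        {
            "datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"},
            "targets": [
                {"datasource": {"type": "prometheus", "uid": "${DS_PROMETHEUS}"}, "expr": "up"}
            ],
        }
    ]
}
fixed = replace_datasource_uid(dashboard)
print(fixed["panels"][0]["datasource"])
```

Walking the structure instead of doing a plain text replace leaves any other use of the `${DS_PROMETHEUS}` string, such as an input declaration, untouched.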

jsilvela (Contributor) commented May 5, 2023

Since the JSON file you generated came from a Grafana deployed with a different mechanism, I think there is value in keeping it as-is.
The YAML, though, I would like to alter to make sure beginners following the quickstart get the dashboard working out of the box.
So I plan to add a bit more context to the documentation, and perhaps fix your files while adding your JSON untouched under a new name.

jsilvela (Contributor) commented May 5, 2023

On the dashboard itself, the amount of work and polish is ... 👏 bravo.

jsilvela (Contributor) commented May 8, 2023

With Itay's latest commit, the dashboard works out of the box if following the quickstart document.
Doing a detailed review of the panels...

sxd (Member) commented May 23, 2023

@itay-grudev following @jsilvela's advice, I think we should add a disclaimer to the quickstart in this PR, noting that some features may fail because they need a really new/early version of Grafana.

I'm also adding myself as a reviewer to this PR because I really want to try it! :D

@sxd sxd self-assigned this May 23, 2023
benoitschipper commented May 23, 2023

@sxd and @itay-grudev I'll happily test/review what I can. Already have the dashboard running :)

itay-grudev (Contributor Author) commented

@sxd I discussed offline with @jsilvela that the alpha metrics are enabled by default on Prometheus. I think it's a limitation of KinD that the volume data is not reported. I've tested it on AWS EKS and Digital Ocean and there it works fine.

I still need to add the change @benoitschipper proposed.

sxd (Member) commented May 23, 2023

@itay-grudev I've added a warning message that will help users understand why the graphs don't have data. Can you review it, please? I understand that those are limited in KinD, but lots of users use KinD for testing, so we should at least be nice and include the warning message.

benoitschipper commented May 24, 2023

> @sxd I discussed offline with @jsilvela that the alpha metrics are enabled by default on Prometheus. I think it's a limitation of KinD that the volume data is not reported. I've tested it on AWS EKS and Digital Ocean and there it works fine.
>
> I still need to add the change @benoitschipper proposed.

I'll check/review it by testing it as soon as you have added the regex proposal, so I can confirm whether it works.

@itay-grudev itay-grudev force-pushed the grafana-dashboard branch 3 times, most recently from ae65888 to 4df2bb3 Compare May 25, 2023 16:00
itay-grudev (Contributor Author) commented

@benoitschipper Done. Test if it works for you now.

benoitschipper commented

> @benoitschipper Done. Test if it works for you now.

Will do!

@benoitschipper benoitschipper left a comment

@itay-grudev I loaded the dashboard and went through all the graphs live on two separate clusters. Everything worked without me making any changes (only the name). Works like a charm!

image

sxd added the ok to merge 👌 label May 27, 2023
Itay Grudev added 2 commits May 31, 2023 18:02
Signed-off-by: Itay Grudev <igrudev@clustermarket.com>
Signed-off-by: Itay Grudev <igrudev@clustermarket.com>
@jsilvela jsilvela merged commit aea4365 into cloudnative-pg:main May 31, 2023
13 checks passed
cnpg-bot pushed a commit that referenced this pull request May 31, 2023
* Base backup and last archived WAL information
* Total CPU and memory usage across all nodes
* An Alerts section
* Last failover/switchover time
* Cluster PostgreSQL version
* Transactions per second
* Volume usage and total database size
* Replication/Write/Flush/Replay lag
* Added instance zone
* Displaying the full PostgreSQL version
* Bug Fix: When there are redundant `kube-state-metrics` instances there are duplicate status gauges
* Bug Fix: Max Connections not displaying a progress bar in the gauge due to missing `min` and `max` values.
* Transposed configuration section
* Added volume space and inode usage gauges.
* Fixed tooltips and set shared crosshairs of all graphs.
* Bug Fix: Cluster variable query picking up any metric ending with cluster

---------

Signed-off-by: Itay Grudev <igrudev@clustermarket.com>
(cherry picked from commit aea4365)
cnpg-bot pushed a commit that referenced this pull request May 31, 2023
cnpg-bot pushed a commit that referenced this pull request May 31, 2023
@itay-grudev itay-grudev deleted the grafana-dashboard branch June 2, 2023 08:33
Labels
backport-requested ◀️, no-issue, ok to merge 👌, release-1.18, release-1.19, release-1.20
5 participants