-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/dashboard: add prometheus federation config for mullti-cluster monitoring #54964
mgr/dashboard: add prometheus federation config for mullti-cluster monitoring #54964
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some initial impressions!
src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
Outdated
Show resolved
Hide resolved
8e64de8
to
9ad8cd4
Compare
9ad8cd4
to
bc95834
Compare
jenkins retest this please |
jenkins test make check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments. Can't speak much to the changes to the actual prometheus conf, but generally the code looks okay outside of the tests failing.
96182fd
to
c21d7ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just left some minor comments + some other more specific to security. Plz, I'd like to know if we have done any security assessment of what are the implications of enabling the security + this new feature. Is the system still secure? if not what security issues could we face when enabling this new feature and what should we do to overcome them.
relabel_configs: | ||
- source_labels: [__address__] | ||
target_label: cluster | ||
replacement: {{ cluster_fsid }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section assumes you are using secure communication. Is this the case? what security implications has this new feature? are we taking them into account? have we did any security assessment for the impact?
@@ -964,6 +964,18 @@ def _set_prometheus_access_info(self, username: Optional[str] = None, password: | |||
except ArgumentError as e: | |||
return HandleCommandResult(-errno.EINVAL, "", (str(e))) | |||
|
|||
@_cli_write_command('orch prometheus set-target') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I give a target like http://<ip>:port
it kind of kills the prometheus daemon and i had to remove the target and restart prometheus module to get it working. Should it fail like that for a simple error. And if this is a crucial mistake, then it should have a proper validation set-up or we might end up breaking a deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atleast some helpers mentioning how the prometheus target should look like would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nizamial09 , this issue is being tracked here - https://tracker.ceph.com/issues/64369, Will open a separate PR for the mentioned issues soon.
5fa6d16
to
d7faaaf
Compare
@adk3798 any updates? |
d7faaaf
to
852ecb8
Compare
@aaSharma14 i saw these in the unit test failures
|
failures were all the |
852ecb8
to
f8c0940
Compare
monitoring Signed-off-by: Aashish Sharma <aasharma@redhat.com>
f8c0940
to
82b50b4
Compare
jenkins test api |
Rendering the dashboards and alerts with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards and alerts with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards and alerts with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Resolves: rhbz#2275936 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de> (cherry picked from commit 2457451)
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
Rendering the dashboards with showMultiCluster=True allows for them to work with multiple clusters storing their metrics in a single Prometheus instance. This works via the cluster label and that functionality already existed. This just fixes some inconsistencies in applying the label filters. Additionally this contains updates to the tests to have them succeed with with both configurations and avoid the introduction of regressions in regards to multiCluster in the future. There also are some consistency cleanups here and there: * `datasource` was not used consistently * `cluster` label_values are determined from `ceph_health_status` * `job` template and filters on this label were removed to align multi cluster support solely via the `cluster` label * `ceph_hosts` filter now uses label_values from any ceph_metadata metrici to now show all instance values, but those of hosts with some Ceph component / daemon. * Enable showMultiCluster=True since `cluster` label is now always present, via ceph#54964 Improves: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de> (cherry picked from commit 090b8e1)
Introduce prometheus fedeartion in ceph dashboard. This is done by adding a
federate
job to the prometheus configuration. We can add/remove targets (remote cluster's prometheus service endpoint) to this job to scrape data from different clusters. These targets are getting added in the prometheus config file by exposing two new orch clis -Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
x
between the brackets:[x]
. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e