Adding support for ceph mgmt-gateway #57535
I will not review the patch but I will take this opportunity to bike-shed the name a little bit: cluster-gateway doesn't make it clear that this is mainly for stuff like the dashboard and grafana, etc.
@phlogistonjohn no worries, and I think it's a good time to discuss the service name :) The rev-proxy includes the monitoring stack (prometheus, alertmanager, grafana, ..). In addition, it can include any service app that we would run for cluster mgmt in the future. The ideal name, I think, would be "ingress" (to be aligned with k8s), but it's already in use, as you know. I'm OK with going with another name as long as it better describes the purpose of the service 👍
jenkins retest this please
Nice. Approval is for docs only. I've made a number of nitpicky wording requests.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
jenkins test rook e2e
Always tough to review PRs of this size, but given you already addressed my last set of review comments and nothing jumped out at me as an issue when checking over again, I approve
adding mgmt-gateway, a new cephadm service based on nginx, to act as the front-end and single entry point to the cluster. This gateway offers unified access to all Ceph applications, including the Ceph dashboard and monitoring tools (Prometheus, Grafana, ..), while enhancing security and simplifying access management through nginx. Fixes: https://tracker.ceph.com/issues/66095 Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Failures:
Generally a good run, nothing to block merging most of the PRs.
Adding support for ceph mgmt-gateway Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
TODO
review default SSL/TLS configuration (addressed by mgr/cephadm: adding mTLS for ceph mgmt-gateway and backend services communication #58402)
`secure_monitoring_stack` is enabled (addressed by mgr/cephadm: adding mTLS for ceph mgmt-gateway and backend services communication #58402)
fix the nginx image (now the image from quay is used)
This pull request introduces a new design for Ceph applications based on a modular, service-based architecture. It adds a new cephadm service, `mgmt-gateway`, based on nginx, an open-source, high-performance web server known for its scalability, efficiency, and versatility. It will act as the new front end and single entry point to the cluster, providing unified access to all Ceph applications, including the Dashboard and monitoring applications. In addition, nginx enhances security and simplifies access management thanks to its robust community and high security standards.
Benefits of the new service
High availability enhancements
The current cephadm/dashboard implementation lacks HA for the monitoring services. Even though cephadm can deploy N instances of services such as Grafana, Prometheus, or Alertmanager, when it configures the dashboard (using the `dashboard set-<service>-api-host` API) it just picks the last configured daemon. If this daemon goes down, there is no automated failover to a redundant healthy instance. The following diagram reflects the current architecture (notice that the dashboard is configured to access the different monitoring services directly).
This problem is solved by using the upstream HA features provided by nginx. The proposed solution makes use of a dedicated internal server that acts as a reverse proxy for the monitoring services. The dashboard is configured to use nginx endpoints instead of the IPs/hosts of the monitoring daemons directly. Following is a diagram of the new architecture:
As we can see in the above diagram, the new architecture has two servers:
External server: this server is responsible for handling and routing external user requests. The idea is to also use this server for any extra processing we would like to perform for external users, such as authentication, authorization, etc. This server relies on the nginx upstream feature to group the monitoring applications (by category). The HA mechanism is implemented by selecting one of the available healthy servers.
Internal server: this server is responsible for handling and routing internal requests only. As in the external case, this server relies on the nginx upstream feature to provide monitoring HA, this time for internal services. This server uses its own self-signed certificates to secure communication with other internal clients.
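To make the difference between the old and new selection behaviour concrete, here is a minimal Python sketch. It is only an illustration: the `Backend` class and both helper functions are hypothetical names, not cephadm or dashboard code, and real nginx upstream groups balance across healthy members (round-robin by default) rather than always taking the first one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """One monitoring daemon endpoint (hypothetical model, not a cephadm type)."""
    address: str
    healthy: bool = True


def pick_last_configured(backends: List[Backend]) -> Optional[Backend]:
    """Behaviour described for the old setup: use the last configured daemon."""
    return backends[-1] if backends else None


def pick_first_healthy(backends: List[Backend]) -> Optional[Backend]:
    """Simplified upstream-style selection: skip members that are down."""
    for backend in backends:
        if backend.healthy:
            return backend
    return None


# Two Grafana instances; the one configured last is currently down.
grafana = [Backend("host1:3000"), Backend("host2:3000", healthy=False)]
print(pick_last_configured(grafana).address)  # host2:3000 (unreachable)
print(pick_first_healthy(grafana).address)    # host1:3000
```

With the old approach the dashboard keeps pointing at the failed daemon; with a health-aware group, requests still reach a working instance.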
Usage
cephadm:
ceph orch apply mgmt-gateway --placement=<your-destination-node>
Or by providing a detailed spec file (e.g. for custom certificates):
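As a rough illustration of what such a spec could look like, the Python sketch below assembles one as text. The field names under `spec` (`port`, `ssl_certificate`, `ssl_certificate_key`), the host name, the port, and the PEM placeholders are all assumptions made for this example and should be checked against the cephadm documentation for the release in use; `build_spec` is a hypothetical helper.

```python
import textwrap


def build_spec(host: str, port: int, cert_pem: str, key_pem: str) -> str:
    """Assemble a mgmt-gateway service spec as YAML text (field names assumed)."""
    indent = " " * 4  # PEM bodies are nested under YAML block scalars
    return (
        "service_type: mgmt-gateway\n"
        "placement:\n"
        "  hosts:\n"
        f"    - {host}\n"
        "spec:\n"
        f"  port: {port}\n"
        "  ssl_certificate: |\n"
        f"{textwrap.indent(cert_pem.strip(), indent)}\n"
        "  ssl_certificate_key: |\n"
        f"{textwrap.indent(key_pem.strip(), indent)}\n"
    )


# Placeholder values only; substitute a real host, certificate and key.
spec_text = build_spec(
    host="ceph-node-0",
    port=8443,
    cert_pem="-----BEGIN CERTIFICATE-----\n<certificate body>\n-----END CERTIFICATE-----",
    key_pem="-----BEGIN PRIVATE KEY-----\n<key body>\n-----END PRIVATE KEY-----",
)
print(spec_text)
```

The resulting text could be saved to a file such as `mgmt-gateway.yaml` and applied with `ceph orch apply -i mgmt-gateway.yaml`.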
Example of the generated nginx config:
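The following Python sketch shows the general shape such a config might take for one group of monitoring daemons; it is not the template cephadm actually uses. The helper names, the upstream name `grafana_servers`, the `/grafana` prefix, the endpoints, and the `max_fails`/`fail_timeout` values are all illustrative assumptions. In a real config the `location` block would sit inside a `server` block.

```python
from typing import List


def render_upstream(name: str, endpoints: List[str]) -> str:
    """Render an nginx `upstream` group with one `server` line per endpoint."""
    lines = [f"upstream {name} {{"]
    for endpoint in endpoints:
        # max_fails/fail_timeout drive nginx's passive health checking;
        # the values here are illustrative, not what cephadm generates.
        lines.append(f"    server {endpoint} max_fails=1 fail_timeout=10s;")
    lines.append("}")
    return "\n".join(lines)


def render_location(path: str, upstream: str) -> str:
    """Render a `location` block that proxies a URL prefix to an upstream group."""
    return (
        f"location {path} {{\n"
        f"    proxy_pass https://{upstream};\n"
        f"}}"
    )


config = render_upstream("grafana_servers", ["host1:3000", "host2:3000"])
config += "\n" + render_location("/grafana", "grafana_servers")
print(config)
```

Grouping the daemons of one service under a single upstream name is what lets the dashboard point at one stable nginx endpoint while nginx routes around failed members.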
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an `x` between the brackets: `[x]`. Spaces and capitalization matter when checking off items this way.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e