[WIP] [DO NOT REVIEW] adding support for ceph admin-gateway #57535
I will not review the patch but I will take this opportunity to bike-shed the name a little bit: cluster-gateway doesn't make it clear that this is mainly for stuff like the dashboard and grafana, etc.
@phlogistonjohn no worries, and I think it's a good time to discuss the service name :) The rev-proxy includes the monitoring stack (prometheus, alertmanager, grafana, ...). In addition, it can include any service app that we would run for cluster mgmt in the future. The ideal name, I think, would be "ingress" (to be aligned with k8s), but it's already in use, as you know. I'm OK with going with another name as long as it better describes the purpose of the service 👍
Fixes: https://tracker.ceph.com/issues/66095 Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
adding support for dynamic re-configuration. admin-gateway should be reconfigured automatically anytime there's a change on alertmanager, grafana or prometheus since url_prefix of these services depends on the presence (or not) of the admin-gateway. Similarly, these services must be reconfigured whenever the admin-gateway is added or removed. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
there should be only one instance of the admin-gateway service Signed-off-by: Redouane Kachach <rkachach@ibm.com>
adding support to populate nginx configuration automatically by adding all the currently active mgrs. Nginx redirection mechanism is used to choose automatically the active mgr instance. This way, we redirect the user to the right instance in case of mgr failover. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
so far, we've been using only the first instance of each monitoring service (e.g., alertmanager, prometheus, etc) to configure nginx locations. In real deployments, multiple instances of each service may be active simultaneously. This change uses nginx's 'upstream' feature to configure all running instances as backends. This allows nginx to automatically choose a healthy instance to process incoming requests. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
so far the implementation was relying on a complex redirection mechanism. The new mechanism makes use of the nginx backends feature to define a set of available dashboard servers. This way nginx can automatically pick the active server. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
when the admin-gateway is removed we have to restore the dashboard default configuration for standby behavior Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
certificates (and private key) must not be generated when https is disabled. Additionally, grafana protocol must be the same as the rev-proxy. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
let's use 80 and 443 as default ports. Users can customize the port by using the spec.port option. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
secure_monitoring_stack dependency is added so whenever the value of this configuration variable is changed we reconfigure the nginx to use the corresponding protocol. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
jenkins retest this please |
so far we have been using only one file with all the configuration. This has the benefit of maintaining everything within the same file but it can be really complex when more advanced configuration is added for authentication for example. The new approach splits the configuration into three files: main, external and internal server configuration for better maintainability. Signed-off-by: Redouane Kachach <rkachach@ibm.com>
TODO
secure_monitoring_stack is enabled
This pull request introduces a new design for Ceph applications based on a modular, service-based architecture. It adds a new cephadm service, admin-gateway, based on nginx, an open-source, high-performance web server known for its scalability, efficiency, and versatility. It will act as the new front end and single point of entry to the cluster, providing unified access to all Ceph applications, including the Dashboard and monitoring applications. In addition, nginx enhances security and simplifies access management due to its robust community and high security standards.
Benefits of the new service
High availability enhancements
The current cephadm/dashboard implementation lacks HA when it comes to monitoring services. Even though cephadm is able to deploy N instances of services such as grafana, prometheus or alertmanager, when configuring the dashboard (using the dashboard set-<service>-api-host API) it just picks the last configured daemon. If this daemon goes down, there is no automated failover to a redundant healthy instance. The following diagram reflects the current architecture (notice that the dashboard is configured to access the different monitoring services directly).
This problem is solved by using the upstream HA features provided by nginx. The proposed solution makes use of a dedicated internal server that acts as a rev-proxy for the monitoring services. The dashboard is configured to use nginx endpoints instead of directly using the ip/host of the monitoring daemons. Following is a diagram of the new architecture:
As we can see in the above diagram, in the new architecture there are two servers:
External server: this server is responsible for handling and routing external user requests. The idea is to also use this server for any extra processing we would like to perform for external users, such as authentication, authorization, etc. This server relies on the nginx upstream feature to group the monitoring applications (by category). The HA mechanism is implemented by selecting one of the available healthy servers.
Internal server: this server is responsible for handling and routing internal requests only. As in the external case, this server relies on the nginx upstream feature to provide monitoring HA, this time for internal services. This server uses its own self-signed certificates to secure the communication with other internal clients.
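To make the upstream grouping described above more concrete, here is a minimal Python sketch of how one nginx upstream block per monitoring category could be rendered from the list of running daemons. It is not the code from this PR: the helper names (render_upstreams, render_location), the "<name>_servers" naming scheme, and the addresses and ports are all assumptions made for illustration.

```python
from typing import Dict, List


def render_upstreams(backends: Dict[str, List[str]]) -> str:
    """Render one nginx `upstream` block per service category.

    `backends` maps a category name (e.g. "grafana") to the host:port
    endpoints of every running instance. Hypothetical helper, not cephadm API.
    """
    blocks = []
    for name in sorted(backends):
        endpoints = backends[name]
        if not endpoints:
            continue  # nginx refuses to load an upstream with no servers
        lines = [f"upstream {name}_servers {{"]
        lines += [f"    server {ep};" for ep in endpoints]
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


def render_location(name: str, scheme: str = "https") -> str:
    """Render a location that proxies /<name>/ to the matching upstream group."""
    return (
        f"location /{name}/ {{\n"
        f"    proxy_pass {scheme}://{name}_servers;\n"
        f"}}"
    )


# Made-up endpoints (documentation address range), for illustration only.
config = render_upstreams({
    "grafana": ["192.0.2.11:3000", "192.0.2.12:3000"],
    "prometheus": ["192.0.2.11:9095"],
})
print(config)
print(render_location("grafana"))
```

With every instance listed as a server in the group, nginx's default passive health checks temporarily skip a backend that fails to respond, which is the failover behavior the two servers rely on.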
Usage
cephadm:
ceph orch apply admin-gateway --placement=<your-destination-node>
Or by providing a detailed spec file (e.g. for custom certificates):
Example of the generated nginx config: