mgr/cephadm: allow ingress services with same frontend port and different VIP #53008
Conversation
src/cephadm/cephadm.py (Outdated)

```diff
@@ -3231,15 +3238,22 @@ def fetch_custom_config_files(ctx: CephadmContext) -> List[Dict[str, Any]]:
         return []


-def fetch_tcp_ports(ctx: CephadmContext) -> List[int]:
+def fetch_tcp_ports(ctx: CephadmContext) -> Tuple[List[int], Dict[str, str]]:
```
Instead of returning a tuple, how about returning a new type (something like `PortSelection` perhaps)? The new type would have methods (or properties) for port listing or port-to-IP mappings. This type can encapsulate the knowledge/logic around the daemon's port configuration.

As an additional benefit, instead of adding a new parameter to many functions you can just change the type of the existing `daemon_ports` argument. Then only functions that use `daemon_ports` need to change; other ones that just pass the argument along can continue to be the same code (modulo the type signature update).
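A minimal sketch of what the suggested type could look like. The name `PortSelection` comes from the comment above; the methods and field layout here are purely illustrative, not the actual cephadm API:

```python
from typing import Dict, List, Optional


class PortSelection:
    """Illustrative sketch: encapsulates a daemon's port configuration,
    both the plain port list and an optional port -> IP mapping for
    ports that bind to a specific address (e.g. an haproxy VIP)."""

    def __init__(self, ports: List[int], ip_map: Optional[Dict[int, str]] = None) -> None:
        self._ports = ports
        self._ip_map = ip_map or {}

    @property
    def ports(self) -> List[int]:
        # the port listing fetch_tcp_ports returned before this change
        return list(self._ports)

    @property
    def port_to_ip(self) -> Dict[int, str]:
        # port -> IP mapping; ports missing here bind on any host IP
        return dict(self._ip_map)

    def ip_for(self, port: int, default: str = '') -> str:
        return self._ip_map.get(port, default)


# usage: one frontend port pinned to a VIP, one ordinary port
sel = PortSelection([8080, 9049], {8080: '192.168.122.100'})
print(sel.ports)         # [8080, 9049]
print(sel.ip_for(8080))  # 192.168.122.100
```

Callers that only pass `daemon_ports` along would then be untouched apart from the type annotation.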
Now that I'm thinking about this some more, is the information in `port_ips` a strict superset of the information in `tcp_ports`?
Only if we set it up so `port_ips` has an entry for every port in `tcp_ports`. Currently, it's only adding entries for things it knows are binding to special IPs (which is just the frontend port for haproxy).
OK, I was wondering whether, if the info in `port_ips` were a superset, we could just convert to `port_ips` everywhere. Maybe that'd be a good idea in the future, but I won't push for it in this PR. My other comment still stands: I think encapsulating these items together (since they are still logically coupled) would be a good thing.
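The "convert to `port_ips` everywhere" idea could be sketched as folding the plain port list into the mapping, using an empty string (meaning "any IP") as the default so the mapping becomes a strict superset. This helper is hypothetical, not code from the PR; the string keys follow the `Dict[str, str]` shape in the new `fetch_tcp_ports` signature:

```python
from typing import Dict, List


def merge_ports(tcp_ports: List[int], port_ips: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: make port_ips cover every port in tcp_ports.

    Ports without a special binding map to '' (check on any host IP);
    entries already in port_ips (special IPs like a VIP) take priority.
    """
    merged = {str(p): '' for p in tcp_ports}
    merged.update(port_ips)
    return merged


print(merge_ports([8080, 9049], {'8080': '192.168.122.100'}))
# {'8080': '192.168.122.100', '9049': ''}
```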
I think there's definitely an argument to be made that we want to pass only one object down, and that `tcp_ports` and `port_ips` should be merged somehow into one thing. That would require a more in-depth change, as we'd then have to modify the handling of `tcp_ports` in the binary and how `tcp_ports` was generated to begin with in the mgr module. I can at least take a look at what that sort of change would look like, and at the very least handle the suggestion from your first comment.
I've modified this to use the `EndPoint` class that was already defined in cephadm, which combines a port and an IP. I also ended up needing an additional commit to fix some error handling in `port_in_use`/`attempt_bind`. Not sure how that didn't come up the first time I was trying this; probably a messy test env after so many failed haproxy deployments and upgrades.
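The rough shape of the `EndPoint` idea (a simplified stand-in for the class in cephadm.py, not its exact definition) is a single object carrying both the IP and the port, so the two values travel together instead of as parallel arguments:

```python
class EndPoint:
    """Simplified stand-in for cephadm's EndPoint: an (ip, port) pair."""

    def __init__(self, ip: str, port: int) -> None:
        self.ip = ip
        self.port = port

    def __str__(self) -> str:
        return f'{self.ip}:{self.port}'


# two haproxy frontends on the same port but different VIPs
a = EndPoint('192.168.122.100', 8080)
b = EndPoint('192.168.122.101', 8080)
print(a, b)  # 192.168.122.100:8080 192.168.122.101:8080
```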
Force-pushed from 2877d42 to 95ba35e (compare)
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
Force-pushed from 95ba35e to 322160b (compare)
4 failures, 2 dead jobs. Nothing to block merging. Also verified that neither of the 2 PRs in the run caused selinux denials in the test_extra_daemon_features test, by running 20 instances of it on a variety of distros: https://pulpito.ceph.com/adking-2023-08-21_18:29:22-orch:cephadm-wip-adk4-testing-2023-08-21-1021-distro-default-smithi/
This is mostly for checking for port conflicts. Currently, we just check whether the port is bound on any IP on the host. This mechanism should allow certain daemon types to specify a port -> IP mapping that will be passed to the cephadm binary. That mapping will then be used by cephadm to only check whether the port is bound on that specific IP, rather than on any IP on the host. The end result is that we could have daemons bound to the same port on different IPs on the same node.

It's expected that daemon types will set this up as part of their prepare_create or generate_config functions, where they may have more info about the specific IPs and ports they need.

Signed-off-by: Adam King <adking@redhat.com>
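The check described above can be sketched as follows. These are simplified stand-ins for cephadm's `attempt_bind`/`port_in_use` helpers, not the actual code: with a port -> IP mapping available, the bind test is made against that specific IP only, instead of against every IP on the host.

```python
import socket


def attempt_bind(ip: str, port: int) -> None:
    """Try to bind (ip, port); raises OSError if the bind fails."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
    finally:
        s.close()


def port_in_use_on_ip(ip: str, port: int) -> bool:
    """True if the port is already bound on this specific IP only."""
    try:
        attempt_bind(ip, port)
        return False  # bind succeeded -> port is free on that IP
    except OSError:
        return True


print(port_in_use_on_ip('127.0.0.1', 54321))
```

With this shape, two daemons can hold the same port number on the same host as long as each binds a different IP.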
If we know what IP the frontend_port will be binding to, we can pass that down through the port_ips mapping, so cephadm will only check whether that port is in use on that specific VIP. This allows multiple haproxy daemons to be bound to the same port on different VIPs on the same host.

Note that you still must use a different monitor port for the two different ingress services, as that port is bound on the actual IP of the host. Only the frontend port can be the same for haproxies on the same host, as long as the VIP is different.

Fixes: https://tracker.ceph.com/issues/57614

Signed-off-by: Adam King <adking@redhat.com>
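For illustration, a pair of ingress specs exercising what this commit enables might look like the following. The service IDs, VIPs, and port values are invented for the example; note the shared frontend_port but distinct virtual_ip and monitor_port values:

```yaml
# hypothetical example specs: two ingress services on the same host
# sharing frontend_port 8080 on different VIPs; monitor_port must
# still differ, since it binds on the host's real IP
service_type: ingress
service_id: rgw.foo
spec:
  backend_service: rgw.foo
  virtual_ip: 192.168.122.100/24
  frontend_port: 8080
  monitor_port: 1967
---
service_type: ingress
service_id: rgw.bar
spec:
  backend_service: rgw.bar
  virtual_ip: 192.168.122.101/24
  frontend_port: 8080
  monitor_port: 1968
```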
Before, it was always converting the OSError to our self-defined "Error" class. This causes an issue with the port_in_use function, which has special handling for OSError when the errno is EADDRNOTAVAIL or EAFNOSUPPORT. Since the error being raised was no longer an OSError, it wasn't being caught and checked properly in port_in_use.

This also turns out to be necessary for checking port availability for haproxy on its VIP. If we fail deployment when EADDRNOTAVAIL is raised, it becomes difficult to deploy the ingress service. If we deploy haproxy first, it fails because the VIP isn't available yet (since keepalive isn't up), and it fails saying the port it wants to bind to is unavailable (specifically EADDRNOTAVAIL). If we try to deploy keepalive first, it fails because it needs to know the location of the haproxy daemons in order to build its config file. This has worked in the past by just having the haproxy fail to bind at first and then fix itself once the keepalive daemon is deployed. That no longer works if the haproxy daemon fails to deploy because cephadm is reporting the port it needs is unavailable.

Since EADDRNOTAVAIL when deploying haproxy likely means the VIP is not up, rather than something else taking up the port it needs, fixing the handling of this allows ingress deployment to work while also allowing multiple haproxy daemons on the same host to use the same frontend port bound to different VIPs.

Signed-off-by: Adam King <adking@redhat.com>
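The error-handling fix described above can be sketched like this (simplified stand-ins, not the actual cephadm code): `attempt_bind` must let the original OSError propagate instead of wrapping it in a custom Error class, so `port_in_use` can special-case EADDRNOTAVAIL/EAFNOSUPPORT, e.g. a VIP that keepalived hasn't brought up yet.

```python
import errno
import socket


def attempt_bind(ip: str, port: int) -> None:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
    finally:
        # re-raise the OSError as-is; converting it to a custom Error
        # class here is what broke the errno check downstream
        s.close()


def port_in_use(ip: str, port: int) -> bool:
    try:
        attempt_bind(ip, port)
    except OSError as e:
        if e.errno in (errno.EADDRNOTAVAIL, errno.EAFNOSUPPORT):
            # the IP isn't present on the host (VIP not up yet):
            # treat the port as available so haproxy can deploy and
            # bind once keepalived brings the VIP up
            return False
        return True
    return False


# 203.0.113.1 (TEST-NET-3) is not configured locally, so the bind
# raises EADDRNOTAVAIL and the port is reported as available
print(port_in_use('203.0.113.1', 8080))  # False
```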
Force-pushed from 322160b to b2f133f (compare)
LGTM