Make ansible_host a list of possible addresses #97
Instead of hacking this into ansible_host, an inventory plugin could resolve it, see ansible/ansible#32857 as an example. Instead of querying the network, it would go over the provided list and return to Ansible the 'first reachable IP' for that host.
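As a rough sketch of that suggestion (the plugin name, the `candidates` key and the probing logic here are illustrative assumptions, not an existing plugin):

```python
# first_reachable.py - hypothetical inventory plugin sketch.
# Assumed inventory source format:
#   plugin: first_reachable
#   hosts:
#     apic1: {candidates: [10.0.0.1, 10.0.0.2, 10.0.0.3], port: 443}
import socket

from ansible.plugins.inventory import BaseInventoryPlugin


class InventoryModule(BaseInventoryPlugin):

    NAME = 'first_reachable'  # hypothetical plugin name

    def verify_file(self, path):
        return path.endswith(('first_reachable.yml', 'first_reachable.yaml'))

    def parse(self, inventory, loader, path, cache=True):
        super(InventoryModule, self).parse(inventory, loader, path, cache)
        data = loader.load_from_file(path)

        for name, spec in data.get('hosts', {}).items():
            inventory.add_host(name)
            address = self._first_reachable(spec['candidates'],
                                            spec.get('port', 22))
            if address:
                # Hand Ansible a single, currently reachable address.
                inventory.set_variable(name, 'ansible_host', address)

    @staticmethod
    def _first_reachable(candidates, port, timeout=2):
        # Probe each candidate with a plain TCP connect and return the
        # first one that answers. This only proves reachability *now*,
        # at inventory-parse time.
        for address in candidates:
            try:
                socket.create_connection((address, port), timeout=timeout).close()
                return address
            except OSError:
                continue
        return None
```

Note that, as the next comment argues, such a check only holds at the moment the inventory is parsed.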
I don't think you fully understand what we need here. In the case of Cisco ACI we would need to connect to one APIC in the cluster, authenticate to it and check the status of that APIC in the cluster (is it read-only or read-write) to understand whether this APIC is one we can use for communication. This means that the inventory plugin would be connecting, authenticating and querying the device before returning it in the inventory. If things change during the playbook run, this would fail hard.

That's not what we are looking for. We are looking for a solution where the connection is aware of this. This means the inventory needs to include a list of hosts, the persistent connection tests which APIC in the cluster can be used, and in case the cluster somehow no longer has quorum, it starts using the right APIC from the cluster (not necessarily the one it was using). So doing this once before the playbook run, or some time in advance, is NOT going to work. It will not offer any redundancy.
vars plugin then
Doesn't make sense either, sorry. It's part of making and maintaining the persistent connection. It does not belong in a vars plugin, and it wouldn't work because vars are evaluated early as well.
@sivel Why the thumbs-down?
@dagwieers no, they are not evaluated early, that changed in 2.4. Though I'm confused: if they are changing .. it won't be a persistent connection ...
@bcoca They are not re-evaluated within the connection plugin or module when a connection error, or a cluster state change, has happened. So if the connection plugin or the module only received a single host, and that host is not (or is no longer) valid as a host, how would the vars plugin kick in and provide a different host? So yes, a vars plugin is evaluated too early.
I don't like the idea of this. I'd much rather be explicit. This was proposed before; we came back to saying to use (at the time) This is much the same as "I don't know if I bootstrapped the host yet, and I change the SSH port, which is correct?" I even have my playbook used as the response to that request:
In the end I don't think it should be core functionality of connections. Use
@dagwieers the point was doing the validation in the plugin and returning only the 'valid' data for the connection to consume
So again, both @bcoca and @sivel are not reading into what we need. I guess I am not explaining myself.

We have 3 or more APICs in a cluster. Each of these APICs can be used, but you only know by connecting to one and seeing if it is part of the cluster (authenticate, query for node status in the cluster). This is related to the ACI REST connection plugin (currently it is part of the module, but that's going to change). At any point in time the node you are talking to could no longer be a node that you can use for making changes, because it may be isolated from the cluster for whatever reason. Sure, it could suddenly be down as well, that's the easier case I guess. So any solution where you check before running a task, or where you provide one working APIC, is not a redundant solution, because the very next moment that system may not be working while we still have at least 2 other nodes that are working fine. (A single module is doing multiple requests to the APIC, so every one of these requests could fail and require an evaluation to use a different APIC.)

So the solution for real redundancy cannot come from the inventory, or from a variable; it needs to come from the layer that makes the connection, manages the connection or reconnects when there are issues. That layer needs to be aware of the existing nodes, the node being used and the fallback options. So any solution that makes the decision beforehand will never work, because it means that the layer that actually needs the whole picture only knows about a single node. FAIL
Valid is in the eye of the beholder: it was valid at the time of testing, but may no longer be valid at the time of using. And 'valid' here could mean making the actual connection and requesting the status from the system, which I don't want to duplicate in an inventory plugin or vars plugin, not only because it does not make sense, but because it is irrelevant.
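To illustrate the distinction being made here, the failover logic would live in the layer that owns the connection, roughly like the following sketch (the endpoints, the payload and the 'read-write' check are placeholder assumptions, not the actual ACI plugin code):

```python
# Sketch of connection-level failover: the layer making the requests
# keeps the whole node list and can switch nodes at any point.
import requests


class ClusterAwareSession(object):
    """Keeps the full list of cluster nodes and fails over per request."""

    def __init__(self, nodes, username, password):
        self.nodes = list(nodes)   # every APIC in the cluster
        self.active = None         # node currently in use
        self.username = username
        self.password = password

    def _usable(self, node):
        # Authenticate against the node and verify its cluster status.
        # A real implementation would cache the session and only re-check
        # after an error, instead of before every request as done here.
        try:
            session = requests.Session()
            resp = session.post(
                'https://%s/api/aaaLogin.json' % node,
                json={'aaaUser': {'attributes': {'name': self.username,
                                                 'pwd': self.password}}},
                timeout=5, verify=False)
            resp.raise_for_status()
            status = session.get('https://%s/api/cluster-status.json' % node,
                                 timeout=5, verify=False).json()
            if status.get('mode') == 'read-write':  # placeholder check
                return session
        except (requests.RequestException, ValueError):
            pass
        return None

    def request(self, method, path, **kwargs):
        # Prefer the node used last, but fall back to any other node; the
        # task only fails once the whole cluster is unusable.
        candidates = ([self.active] if self.active else []) + \
            [n for n in self.nodes if n != self.active]
        for node in candidates:
            session = self._usable(node)
            if session is not None:
                self.active = node
                return session.request(
                    method, 'https://%s%s' % (node, path), **kwargs)
        raise RuntimeError('no usable APIC in cluster: %s' % self.nodes)
```

Because every request goes through this layer, a node that drops out of the cluster mid-play is simply skipped on the next request, which is the redundancy being asked for.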
cc @rsmeyers
In my eyes Dag's explanation seems very valid. Btw, we could have a similar functionality requirement with other controllers, for example OpenStack controllers; we have a similar behaviour there.
@dagwieers no need to re-implement, can use same connection code ... not sure how things can change that fast between test and usage .. but then wouldn't that also change mid-connection and between commands? I think what I am missing is what would cause these seemingly uncontrolled changes from one second to the next, and is it really Ansible's job to compensate for them constantly? In any case, it should be possible to build this into the specific plugins w/o modifying ansible core.
@dagwieers just tested, there is nothing in core ansible that validates that
@bcoca So on one hand we have the need for ACI, but I am looking at the general case as well: multi-homed systems, etc. So I don't want to do this only for ACI, but also allow this for other connection types, like SSH and/or WinRM (use case #2).
It does not matter what is causing it; Ansible should be able to work redundantly if there's a highly available setup. It's not my wish, it's what customers are demanding. But as I already indicated, this could be planned downtime (migration, upgrade, ...), network-related issues, hardware failure, software errors, or whatever reason people demand redundancy for on highly critical systems.
sorry, but for me a 'highly available setup' would mean that the connection info should always work ... I seem to not be getting 'something' here. In any case you can easily do a proof of concept with a new 'ssh_cluster_aware' connection plugin
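Such a proof of concept might look roughly like this sketch, assuming the Ansible 2.x connection-plugin API and a made-up convention where ansible_host holds a comma-separated list of candidate addresses:

```python
# ssh_cluster_aware.py - hypothetical proof-of-concept connection plugin.
# Assumed convention: ansible_host is a comma-separated candidate list,
# e.g. ansible_host=10.0.0.1,10.0.0.2,10.0.0.3
import socket

from ansible.plugins.connection.ssh import Connection as SSHConnection


class Connection(SSHConnection):

    transport = 'ssh_cluster_aware'  # hypothetical transport name

    def _connect(self):
        # Re-probe on every (re)connect, so a node that died mid-play is
        # skipped the next time the connection is established.
        for address in self.host.split(','):
            try:
                socket.create_connection((address.strip(), self.port or 22),
                                         timeout=2).close()
                self.host = address.strip()
                break
            except OSError:
                continue
        return super(Connection, self)._connect()
```

This keeps the fallback decision inside the connection layer without touching core, which is the point being made above.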
@bcoca More and more we see highly available setups not using a VIP anymore, as this adds a complexity which in the end could also go wrong. So they put it on the client to determine who is active and available; hence, this needs to be controlled from the Ansible side.
My thoughts are: suppose I have a list of IPs for a given physical server. SSH may only be listening on some of them, or, based on where the control host is, not all may be reachable. From a security stance it's not stellar, but in the event that the list gets out of date and somehow starts spanning multiple hosts, there will be an SSH host key mismatch error. However, this leaves the door open in the case where the entire list is new and you don't know if the host key is the one you want.
Proposal: host-redundancy
Author: Dag Wieers @dagwieers
Date: 2018-02-01
Motivation
Some infrastructure is redundantly set up to not rely on a single node/route/interface. We would like a redundant way to access those resources when using Ansible.
Problems
Use case #1: Some systems are designed for redundancy, e.g. ACI works with a cluster of APICs (3 or more) to manage the ACI fabric. Any of these APICs can be used, and in case one of the APICs is unfit for use (planned or unplanned, e.g. isolated from the cluster or out of quorum), Ansible should use a fully fit APIC to perform the required tasks.
Use case #2: Some systems have more than one interface for redundancy (or convenience), e.g. when managing Windows laptops they can have either a wired or a wireless IP address, but it's still the same system. We would like to target this system no matter whether it's connected by wire or wirelessly (because we don't know which one is being used).
Solution proposal
ansible_host could be defined as a list, and Ansible would attempt to use the next address if the previous one is unreachable, possibly providing a methodology on how to select the order to try (e.g. consecutively, randomly, ...)
delegate_to
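As a sketch of the address-selection logic the ansible_host proposal implies (the function name and the strategy values are illustrative, not a proposed API):

```python
import random
import socket


def pick_address(addresses, strategy='consecutive', port=22, timeout=2):
    """Return the first reachable address from a candidate list.

    strategy: 'consecutive' keeps the given order (primary first);
              'random' spreads connections across the candidates.
    """
    ordered = list(addresses)
    if strategy == 'random':
        random.shuffle(ordered)
    for address in ordered:
        try:
            socket.create_connection((address, port), timeout=timeout).close()
            return address
        except OSError:
            continue
    raise LookupError('none of %s is reachable' % (addresses,))


if __name__ == '__main__':
    # With ansible_host: [10.0.0.1, 10.0.0.2] the connection layer would
    # use whichever address answers first in the chosen order.
    print(pick_address(['10.0.0.1', '10.0.0.2'], strategy='consecutive'))
```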