To run Ansible Playbooks and Ansible Tower Job Templates against your target devices, Ansible must be configured to connect to them. Typically, Ansible communicates with target devices/servers (managed nodes) directly from the Ansible Control Node or Ansible Tower Node. This requires no special settings beyond the basic user, password, SSH key, etc. needed to access the target. However, the situation gets more complex when you are required to go through one or more bastion or jump hosts.
A more advanced, real-world scenario involves two completely separate infrastructures from two different organizations that are partnering and require integration between them. This often means Ansible and/or Ansible Tower sits in one of the networks and must manage resources in the other. For security and various other reasons, this connection can involve multiple jump hosts.
This gets more complex when you additionally have to consider the fact that Ansible uses a different connection type when running network automation as compared with typical platform automation over standard SSH.
Lastly, these jump hosts are required not only for network or platform automation: we also need to jump when we want to access an API service. This is the case when we want to pull dynamic inventory from a system that can only be reached from the final jumphost. How do we handle this, given that Ansible's dynamic inventory plugins typically assume a direct connection?
We also have to handle the likelihood that each jumphost requires a different SSH port, a different SSH key, and so on.
So let's look at how to solve this.
Jumps over SSH can be configured using one of two methods, both of which require setting the Ansible `ansible_ssh_common_args` inventory variable to pass some SSH parameters into the connection request.
The simplest solution is to use an SSH config file. For Ansible, this requires setting the `ansible_ssh_common_args` variable in your inventory as shown here. This tells SSH to load a specific config file for all the SSH settings.

```
ansible_ssh_common_args: '-F ssh-config'
```
The SSH config file can define multiple jumps: each host entry uses the `ProxyJump` option to reference the host entry it must be reached through.
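To make the shape of such a config file concrete, here is a minimal Python sketch that renders one for a chain of hosts. The aliases, host names, ports, users, and key paths are all hypothetical; only the SSH keywords (`Host`, `HostName`, `User`, `Port`, `IdentityFile`, `ProxyJump`) are real OpenSSH options.

```python
def render_ssh_config(hosts):
    """Render an SSH config in which each host is reached through the previous one.

    `hosts` is ordered from the first jumphost to the final target.
    """
    lines = []
    previous_alias = None
    for host in hosts:
        lines.append(f"Host {host['alias']}")
        lines.append(f"    HostName {host['hostname']}")
        lines.append(f"    User {host['user']}")
        lines.append(f"    Port {host['port']}")
        lines.append(f"    IdentityFile {host['key']}")
        if previous_alias is not None:
            # ProxyJump names the Host entry this connection is tunnelled through.
            lines.append(f"    ProxyJump {previous_alias}")
        lines.append("")
        previous_alias = host["alias"]
    return "\n".join(lines)


# Hypothetical chain: control node -> jh1 -> jh2 -> switch
EXAMPLE_HOSTS = [
    {"alias": "jh1", "hostname": "jh1.example.com", "user": "vagrant",
     "port": 22, "key": "~/.ssh/key-jh1"},
    {"alias": "jh2", "hostname": "jh2.example.com", "user": "vagrant",
     "port": 2222, "key": "~/.ssh/key-jh2"},
    {"alias": "switch", "hostname": "switch.example.com", "user": "root",
     "port": 3080, "key": "~/.ssh/key-switch"},
]

config_text = render_ssh_config(EXAMPLE_HOSTS)
print(config_text)
```

Writing `config_text` to a file named `ssh-config` would match the `-F ssh-config` setting above. Because every entry carries its own `Port` and `IdentityFile`, each hop can use a different port and key.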
However, in some cases you may not be able to use an SSH config file, as it presents a possible security risk if other users can access the same file.
Instead of using an SSH config file, the jumps can alternatively be defined by using the `ProxyCommand` SSH option. Note that you cannot use `ProxyJump` (a newer option that is meant to simplify the `ProxyCommand` argument) because it does not allow defining multiple jumps on the command line with different SSH keys for each jump.
For a single jump, it's fairly straightforward: we set the `ansible_ssh_common_args` variable within our inventory to point to the single jumphost. Notice that it's critical to add the extra options `-oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null` to ignore host key checking for both the jumphost and the target host/device. If you do not add them, you may get a strange "banner" error message from SSH that is very difficult to analyze.

```
ansible_ssh_common_args= -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -o ProxyCommand="ssh -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -i {{ lookup('env', 'JH1_SSH_PRIVATE_KEY') }} -W %h:%p -q {{ jh1_ssh_user }}@{{ jh1_ip }}"
```
This uses an environment variable and extra variables to allow for a flexible solution. These can be set on the command line or, optimally, using Ansible Tower's custom Credential Type, which is explained later in this document. A more advanced `ProxyCommand` is required for handling 2 or 3 jumphosts; this is shown in the inventory file.
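The nesting needed for 2 or 3 jumphosts is easy to get wrong by hand, so here is a rough Python sketch of how such a string could be assembled. It is not the string from the inventory file: the function names and the dictionary keys are hypothetical, and the sketch assumes inner hops forward to an explicit `host:port` so that only the outermost command contains the `%h:%p` tokens. Quoting is delegated to `shlex.quote`.

```python
import shlex

# Options the text above calls critical for every hop.
HOST_KEY_OPTS = ["-oStrictHostKeyChecking=no", "-oUserKnownHostsFile=/dev/null"]


def build_proxy_command(jumps):
    """Build a (possibly nested) ProxyCommand for an ordered list of jumps.

    `jumps` runs from the first jumphost (nearest the control node) to the last.
    """
    command = None
    for index, jump in enumerate(jumps):
        if index + 1 < len(jumps):
            # Intermediate hop: forward to the next jumphost explicitly.
            nxt = jumps[index + 1]
            destination = f"{nxt['host']}:{nxt['port']}"
        else:
            # Last hop: forward to whatever target SSH is connecting to.
            destination = "%h:%p"
        parts = ["ssh"] + HOST_KEY_OPTS + [
            "-q", "-i", jump["key"], "-p", str(jump["port"]), "-W", destination,
        ]
        if command is not None:
            # Reach this hop through the chain built so far.
            parts.append("-oProxyCommand=" + command)
        parts.append(f"{jump['user']}@{jump['host']}")
        command = " ".join(shlex.quote(part) for part in parts)
    return command


def ssh_common_args(jumps):
    """Return a value suitable as a starting point for ansible_ssh_common_args."""
    proxy = shlex.quote("ProxyCommand=" + build_proxy_command(jumps))
    return " ".join(HOST_KEY_OPTS) + " -o " + proxy


# Hypothetical jumphosts with different ports, users, and keys.
JH1 = {"host": "jh1", "port": 22, "user": "vagrant", "key": "/k1"}
JH2 = {"host": "jh2", "port": 2222, "user": "admin", "key": "/k2"}

print(ssh_common_args([JH1]))
print(ssh_common_args([JH1, JH2]))
```

The generated string is meant for command-line use; the value may need to be quoted differently when pasted into Ansible Tower.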
For platform automation, Ansible uses the `ssh` connection plugin that supports OpenSSH.
For network automation, Ansible has various connection plugins, but `network_cli` is recommended. This plugin allows you to use either `paramiko` or `libssh` as the transport method. The `libssh` transport is preferred, as it will be the standard going forward for future releases of Ansible. `libssh` support was first introduced with the Ansible collection `netcommon`. For more information, see New LibSSH Connection Plugin for Ansible Network in the references.
We have therefore defined the collection within our `collections/requirements.yml`. Additionally, we need to tell Ansible where to download the collections by adding options in the `setup.cfg`.
Now we can download the collections and install the required python library to prepare for using libssh.
```
# Download required collections
ansible-galaxy collection install -r collections/requirements.yml -f
# Install python library dependency with libssh
pip install ansible-pylibssh
```
As previously stated, having one or more jumphosts often impacts access to any API services in the destination infrastructure, for example when we want to pull dynamic inventory from SolarWinds, ServiceNow, or others.
We can handle this in a similar manner as we did above with server/network automation endpoints, but with a slightly different approach. Instead of focusing on formulating the `ansible_ssh_common_args` variable to perform the jumps, we need to modify an existing dynamic inventory script and adapt it to support multiple jumps.
In a recent case with a customer, we needed to pull dynamic inventory from SolarWinds, which was not directly available. We had an existing python-based dynamic inventory script that worked well but required a direct connection.
The following changes were applied to the original python script. These changes can be easily adapted to any inventory script by understanding the techniques and applying them to your own situation.
- The `jumpssh` python module provides the ability to perform one or more jumps and prepare a requests connection from a specific jumphost
- Code was added to allow import of `jumpssh` and create a `requests` session
- To allow flexibility in the inventory script, code was added to pull jumphost connectivity information using environment variables
- Code is flexible and only prepares as many jumps as defined by the environment variables; if you only define 1 jump, then only 1 jump is configured.
- Environment variables are injected either from the command line or using an Ansible Tower custom Credential Type
- Add the `jumpssh` python module to your Ansible virtual environment or existing Ansible Tower virtual environment: `pip install jumpssh`
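The changes listed above can be sketched roughly as follows. This is not the actual inventory script: apart from `JH1_SSH_PRIVATE_KEY`, which appears earlier in this document, the environment variable names (`JH<n>_IP`, `JH<n>_SSH_PORT`, `JH<n>_SSH_USER`) and both function names are assumptions made for illustration. The `jumpssh` calls used are `SSHSession(...).open()` and `get_remote_session(...)`.

```python
import os


def read_jumps(env=None, max_jumps=3):
    """Collect jumphost settings from environment variables.

    Stops at the first jump whose JH<n>_IP is missing, so defining only
    jump 1 configures only one jump.
    """
    env = os.environ if env is None else env
    jumps = []
    for number in range(1, max_jumps + 1):
        host = env.get(f"JH{number}_IP")  # hypothetical variable name
        if not host:
            break
        jumps.append({
            "host": host,
            "port": int(env.get(f"JH{number}_SSH_PORT", "22")),  # hypothetical
            "user": env.get(f"JH{number}_SSH_USER"),  # hypothetical
            "key": env.get(f"JH{number}_SSH_PRIVATE_KEY"),
        })
    return jumps


def open_jump_chain(jumps):
    """Open an SSH session on the last jumphost by chaining through the others."""
    # Imported lazily so read_jumps() works even where jumpssh is not installed.
    from jumpssh import SSHSession

    first = jumps[0]
    session = SSHSession(
        first["host"], first["user"],
        port=first["port"], private_key_file=first["key"],
    ).open()
    for jump in jumps[1:]:
        session = session.get_remote_session(
            jump["host"], username=jump["user"],
            port=jump["port"], private_key_file=jump["key"],
        )
    return session


# Example with an explicit mapping instead of the real environment.
example_env = {
    "JH1_IP": "192.168.34.10", "JH1_SSH_USER": "vagrant",
    "JH1_SSH_PRIVATE_KEY": "/tmp/key-jh1",
    "JH2_IP": "192.168.34.11", "JH2_SSH_PORT": "2222",
    "JH2_SSH_USER": "vagrant", "JH2_SSH_PRIVATE_KEY": "/tmp/key-jh2",
}
print(read_jumps(example_env))
```

The session returned by `open_jump_chain` lives on the last jumphost; `jumpssh` also provides a `RestSshClient` wrapper that can issue HTTP requests from such a session.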
It is important to understand that the python code establishes a direct session with the target system over one or more jumps. However, Ansible still executes the python code locally; it is not run on the jumphost.
The following information covers the overarching solution. In our specific case we wanted the following:
- Pull dynamic inventory data from SolarWinds
- Target the network devices using `network_cli` and `libssh`
To support those requirements, the following actions were done.
- Create `Credential Type` within Ansible Tower using the following inputs and injectors.
- Create `Credential` within Ansible Tower based on the new Credential Type
  - Set all necessary jumphost fields
  - If only 1 jumphost, then only set fields pertaining to jumphost 1
  - Set all fields for SolarWinds
- Add the `jumpssh` python module to your Ansible Tower virtual environment
- Create new `Inventory Script` within Ansible Tower
  - Paste the Python inventory script
Now that we have created all the necessary objects, we can create the Inventory within Ansible Tower.
- Create `Inventory` within Ansible Tower.
  - Set the `Variables` field with the proper `ansible_ssh_common_args` value (depending on how many jumphosts you have). See above for explanation.
- Create `Inventory Source` within the same Inventory object
  - Select `Custom Script` for Source
  - Select the correct Ansible Environment that contains the `jumpssh` module
  - Set the `Credential` field to the Credential created earlier
  - Set `Custom Inventory Script` to the new inventory script that was created earlier
  - Enable both `Overwrite` and `Overwrite Variables` options
  - (Optional) Enable the `Update on Launch` option to force inventory sync whenever a related Job Template is launched
  - Set the `Environment Variables` field with any necessary SolarWinds settings. Review all settings from `env.sh`. For example:

    ```
    ---
    SW_HOSTNAME_FIELD: SysName
    SW_CATEGORY_FIELD: MachineType
    SW_QUERY: "SELECT SysName, DNS, IP, MachineType FROM Orion.Nodes"
    SW_HOSTVAR_FIELDS: "SysName,DNS,MachineType"
    ```
  - Save the inventory source
- Synchronize the Inventory Source to pull data back from SolarWinds
- Review the data returned and customize the environment variables as needed to get expected results
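When reviewing the returned data, it can help to picture how the `SW_*` settings might shape the inventory. The sketch below is one plausible reading, not the actual script: it assumes `SW_HOSTNAME_FIELD` names the column used as the host name, `SW_CATEGORY_FIELD` the column used as the group, and `SW_HOSTVAR_FIELDS` the columns copied into host variables. The function name and sample rows are hypothetical; the output layout is the standard JSON structure Ansible expects from an inventory script.

```python
import json


def build_inventory(rows, hostname_field, category_field, hostvar_fields):
    """Turn query result rows into Ansible's script-inventory JSON structure."""
    inventory = {"_meta": {"hostvars": {}}}
    for row in rows:
        host = row[hostname_field]
        # Group names should not contain spaces, so replace them.
        group = str(row.get(category_field) or "ungrouped").replace(" ", "_")
        inventory.setdefault(group, {"hosts": []})["hosts"].append(host)
        inventory["_meta"]["hostvars"][host] = {
            field: row.get(field) for field in hostvar_fields
        }
    return inventory


# Hypothetical rows shaped like the columns in the SW_QUERY example above.
sample_rows = [
    {"SysName": "sw1", "DNS": "sw1.example.com", "IP": "10.0.0.1",
     "MachineType": "Cisco Catalyst"},
    {"SysName": "sw2", "DNS": "sw2.example.com", "IP": "10.0.0.2",
     "MachineType": "Cisco Catalyst"},
]

inventory = build_inventory(
    sample_rows,
    hostname_field="SysName",
    category_field="MachineType",
    hostvar_fields="SysName,DNS,MachineType".split(","),
)
print(json.dumps(inventory, indent=2))
```

Under this reading, changing `SW_CATEGORY_FIELD` regroups the hosts and changing `SW_HOSTVAR_FIELDS` changes which columns appear as host variables.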
The following was used to develop and test functionality for multiple jump hosts.
To test and demo this functionality, `vagrant` was used to spin up the jumphosts.
To test network automation using the Ansible `network_cli` connection type, run the network device emulator `fake-switches` on one of your jump hosts. Depending on your situation, this may be the third jumphost, the second, etc.

```
# Start machines
vagrant up
```
In order to fully develop and test functionality against a network device, the emulator fake-switches was used as a dummy network device that could accept basic network switch commands.
To run `fake-switches` from one of the jumphosts, you will need to provision the server using the custom shell script.

```
# Start machines
vagrant up
# Login to final jumphost server
vagrant ssh jh3
# Provision server to install fake-switches
chmod +x python3.sh
./python3.sh
# Start network device emulator
fake-switches --listen-host localhost --listen-port 3080 --hostname switch.example.com
```
Start a new terminal window, connect to the machine, and verify that you can SSH to the network device service.

```
vagrant ssh jh3
ssh root@switch.example.com -p 3080 -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null
```
Additionally, SolarWinds was not available at the time, so a REST API emulator was created that handles logins and returns an inventory structure similar to what would be expected from SolarWinds. It uses `flask` to rapidly develop a web service that accepts requests.
It can easily be adapted for other systems or services.
To run the emulator, log in to the final jump host server and run the python script.

```
# Login to final jumphost server
vagrant ssh jh3
# Provision server to install python if not done already
chmod +x python3.sh
./python3.sh
# Start REST API emulator
python restapi.py
```
- The `jumpssh` python module uses `paramiko` for SSH connections, and I initially hit an error that occurs when your SSH private key file is not in `pem` format. The solution was to convert my private key file to `pem` format using this command: `ssh-keygen -f my-rsa-key -m pem -p`
- It helps a lot to use an online YAML validator to ensure the complex jump host string is valid before using it in Ansible Tower.
- Ansible network modules require the ability to run some `terminal` commands. Ensure your network device actually supports these commands with your credential by using a direct PuTTY session to log in and run them manually. It could be that your credential does not have permission to run these commands. See https://stackoverflow.com/questions/49701471/ansible-cisco-ios-command-module-unable-to-set-terminal-parameters for details.
- The formatting for the `ansible_ssh_common_args` variable is different when used in Ansible Tower versus on the command line! Be careful when formulating your own string.
- This error shows up sometimes and it's difficult to determine the root cause: `Error reading SSH protocol banner`. One possible cause is an incorrect or malformed SSH private key file. Convert your key files to the right format (see the `ssh-keygen` command above) and make sure the matching public key is installed on the jumphost, for example:

  ```
  ssh-copy-id -f -o 'IdentityFile ~/.ssh/vagrant_rsa' -i ./key-jh1.pub vagrant@jh1.example.com
  ```
- Debug Ansible Tower issues by disabling cleanup of temporary execution environments. This will allow you to see what Ansible Tower is generating locally. Follow the steps below.
```
# Add this line to the Ansible Tower configuration to disable cleanup
vi /etc/tower/conf.d/postgres.py
AWX_CLEANUP_PATHS = False
# Restart Ansible Tower to load the new configuration
ansible-tower-service restart
```

To debug an existing Job Template:
- Run Job Template in Tower
- The Job Template output window will state the /tmp folder created for this job
- Login to Tower server
- Sudo to `awx` user: `sudo su - awx`
- Navigate to the /tmp folder: `cd /tmp/awx_250_0z8b10uf/`
- List files: `ls -l`
- Examine the environment variables: `cat env/envvars`
- Determine the tmp files used for the private keys, for example: `"JH1_SSH_PRIVATE_KEY": "/tmp/awx_250_0z8b10uf/tmpm7dahf7w"`
- Test connectivity using this private key to first jumphost: `ssh -i /tmp/awx_250_0z8b10uf/tmpm7dahf7w -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null vagrant@jh1.example.com`. In my case I got:
```
Warning: Permanently added 'jh1.example.com,192.168.34.10' (RSA) to the list of known hosts.
Load key "/tmp/awx_250_0z8b10uf/tmpm7dahf7w": invalid format
vagrant@jh1.example.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
```
- Edit the private key and retest until it works
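Since several of the issues above come down to the private key format, a quick check of the key's header line can save some trial and error. The helper below is a minimal sketch with hypothetical function names. It relies only on the well-known header lines: keys in the newer OpenSSH format start with `-----BEGIN OPENSSH PRIVATE KEY-----`, while PEM keys start with headers such as `-----BEGIN RSA PRIVATE KEY-----`.

```python
def key_format(text):
    """Classify private key material by its first line: 'openssh', 'pem', or 'unknown'."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    if first_line == "-----BEGIN OPENSSH PRIVATE KEY-----":
        return "openssh"
    if first_line.startswith("-----BEGIN ") and first_line.endswith("PRIVATE KEY-----"):
        return "pem"
    return "unknown"


def key_format_of_file(path):
    """Read a key file and classify it; the path is supplied by the caller."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return key_format(handle.read())


# Header-only examples; real keys contain base64 data between the markers.
print(key_format("-----BEGIN OPENSSH PRIVATE KEY-----\n..."))
print(key_format("-----BEGIN RSA PRIVATE KEY-----\n..."))
```

Running `key_format_of_file` against the temporary key file found in `env/envvars` shows whether the key needs the `ssh-keygen -m pem` conversion. An `unknown` result suggests the file content itself was mangled, for example by lost line breaks.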
- IBM Multiple Jumphost in Ansible Tower: https://developer.ibm.com/recipes/tutorials/multiple-jumphosts-in-ansible-tower-part-1/ and https://github.com/thinkahead/DeveloperRecipes/tree/master/Jumphosts
- Deep dive with network connection plugins - AnsibleFest 2019: https://www.ansible.com/hubfs//AnsibleFest%20ATL%20Slide%20Decks/Deep%20dive%20with%20network%20connection%20plugins%20-%20AnsibleFest%202019.pdf
- Fake Switches Python Tool: https://github.com/internap/fake-switches
- New LibSSH Connection Plugin for Ansible Network: https://www.ansible.com/blog/new-libssh-connection-plugin-for-ansible-network
- Ansible Connection Plugins from Netcommon Collection: https://github.com/ansible-collections/ansible.netcommon/tree/main/plugins/connection
- Ansible 2.9 Network Platform Options: https://docs.ansible.com/ansible/2.9/network/user_guide/platform_index.html#settings-by-platform