New RA: sshtunnel #113


This new resource agent automatically creates ssh connections (tunnels)
between all configured pacemaker nodes and/or other ssh servers.
With this RA, the admin can benefit from transparent ssh, connection multiplexing and
other advantages of pre-established ssh connections.

I rewrote some code as fghass asked.

The basics of this RA:
It obtains the number and the names of the nodes in the pacemaker cluster.
Then it creates an SSH connection to every node except the local node executing the RA.
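
As an illustration of the node selection described above, here is a minimal sketch. It is not the RA's actual code: `peer_nodes` is a hypothetical helper, and the use of `crm_node -n` (local node name) and `crm_node -p` (partition members) as inputs is an assumption about how the list could be obtained.

```shell
#!/bin/sh
# Sketch only: peer_nodes is a hypothetical helper, not taken from the RA.

# Print every node from the list except the local one, one per line.
# $1: local node name; remaining arguments: names of all cluster nodes.
peer_nodes() {
    local_node=$1
    shift
    for node in "$@"; do
        [ "$node" = "$local_node" ] || printf '%s\n' "$node"
    done
}

# In a running cluster the inputs might come from crm_node, for example:
#   peer_nodes "$(crm_node -n)" $(crm_node -p)
# Each printed name would then be handed to ssh to open one tunnel.
```

Keeping the filtering in a function that takes plain arguments makes it possible to try it without a cluster, e.g. `peer_nodes B A B C`.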

If you deploy this as a clone, all nodes get connected to each other via SSH. If you have previously configured your ssh clients and servers properly (key exchange, multiplexing enabled, etc.), all tools and services in the cluster that are based on SSH will benefit from this previously established connection (tunnel): for example rsync, recurrent scp, remote commands, and others.
It can also connect to external ssh servers (not cluster nodes), so it's easy to deploy VPN-like ssh tunnels and make them highly available.
Both modes (interconnecting pacemaker nodes and connecting to external ssh servers) can be used at the same time, or just one of the two.

There is no concrete problem to solve; this is (from my point of view) a nice, interesting and optional piece of functionality.

Arturo Borre... and others added some commits Jul 6, 2012
Arturo Borrero Gonzalez High: sshtunnel: new resource agent
@aborrero aborrero Reinplementing several things I don't like. 8c07eea
Arturo Borrero Gonzalez sshtunnel RA now fully tested an operational cfbd1c7
Arturo Borrero Gonzalez Fixed typo 4b6d4b7
Arturo Borrero Gonzalez fixed another typo 93bc2fa
Arturo Borrero Gonzalez Fixed another error in naming convention inside the RA d27f56a
Arturo Borrero Gonzalez Tiny fixes c901caf

There is an issue I don't know how to solve. I need to trigger some actions when a node fails over or fails back, and I can't find a way to do it.

The case:

You have nodes: A, B and C.
A is up and running with sshtunnel
B is up and running with sshtunnel
C is offline.

A and B are connected via ssh. All monitor operations succeed because of how monitor works: it obtains the names of the running nodes (A, B) and checks whether there is an ssh connection to each of them.

Now C comes online (failback). The monitor operation then fails on both A and B, because monitor obtains the names of the running nodes (now A, B, C), but there is no ssh connection to node C from A or B.
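
To make the failure mode concrete, here is a rough sketch of a monitor check of this kind. The function name and the directory argument are assumptions; the PID file naming `sshtunnel_<node>.pid` follows the pattern mentioned in the review. It is not the RA's real monitor code.

```shell
#!/bin/sh
# Sketch only: tunnels_ok is a hypothetical name for the monitor logic.

# Return 0 if every expected node has a PID file pointing at a live
# process, 1 otherwise. $1: PID file directory; remaining args: node names.
tunnels_ok() {
    pid_dir=$1
    shift
    rc=0
    for node in "$@"; do
        pidfile="$pid_dir/sshtunnel_${node}.pid"
        if [ -r "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            continue
        fi
        # Report every missing tunnel instead of stopping at the first.
        echo "no tunnel to $node" >&2
        rc=1
    done
    return $rc
}
```

Seen from node A, `tunnels_ok "$dir" B` succeeds while only B is online, but `tunnels_ok "$dir" B C` fails as soon as C appears in the node list, because nothing has created a tunnel (and PID file) for C yet.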

Using sshtunnel the other way, where you explicitly choose the ssh servers to connect to, doesn't cause any problem.

Could anyone help? Any idea?


If the RA is run as a clone, I think the instances get notifications when another instance starts or stops. Not sure about that, though.

Otherwise, it seems the only way to establish a tunnel to new node(s) is: monitor fails, the resource is stopped, then started.
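
For reference, a sketch of what a notification handler could look like in the clone case. Pacemaker passes notification details to clone instances through `OCF_RESKEY_CRM_meta_notify_*` environment variables when the clone has `notify=true` set. Everything else here is an assumption: `sshtunnel_notify`, `open_tunnel` and `LOCAL_NODE` are hypothetical names, and `open_tunnel` is only a placeholder.

```shell
#!/bin/sh
# Sketch only: sshtunnel_notify, open_tunnel and LOCAL_NODE are hypothetical.

# Placeholder for whatever the RA does to open one tunnel.
open_tunnel() {
    echo "opening tunnel to $1"
}

# Handle a clone notification: once peer instances have started,
# connect to the nodes they started on (skipping the local node).
sshtunnel_notify() {
    type_op="${OCF_RESKEY_CRM_meta_notify_type}-${OCF_RESKEY_CRM_meta_notify_operation}"
    case "$type_op" in
        post-start)
            for node in $OCF_RESKEY_CRM_meta_notify_start_uname; do
                [ "$node" = "$LOCAL_NODE" ] || open_tunnel "$node"
            done
            ;;
    esac
    return 0
}
```

This only covers the case where the resource runs as a clone with notifications enabled; it does nothing for a non-cloned instance.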


OK, so I think we need some kind of additional notification to be sent to resource agents in some situations, such as a node failover/failback, a failed operation, or whatever.

I will send a suggestion/bug report for that here Engine so that sshtunnel and other RAs can benefit.

The RA is not intended to run always as a clone.


Sort of difficult to do the code review now as there were a few large patches in between. I'll write remarks here. Please do a new pull request afterwards.

  • sshtunnel_validate_all is invoked twice, not necessary
  • sshtunnel_validate_all will sometimes fail also for the stop op, but that's usually wrong (see ocf-rarun for a correct way to handle this)
  • line 27: need to use $HA_VARRUN/
  • the pid_dir parameter is not needed (just use HA_VARRUN)
  • crm_node_binary also (doubt that this will ever be used without pacemaker)
  • [style] lines 30-38: use ': ${...}' to set defaults
  • setting IFS not really necessary, it will by default split string on whitespace
  • [style] lines 57-59: one could do test_pidfile $OCF_RESKEY_pid_dir/sshtunnel_${node}.pid and handle all details there
  • line 69: best to just use the monitor function for test; and do that in a loop, i.e. start all tunnels, then do sth like
    while sleep 1; do sshtunnel_monitor && break; done
  • line 91: it's safe to expect empty string output if there are no nodes defined or found (the test looks very confusing); you can just "for node in $( get_server_list ); do .... done"; actually, if no server is defined/found, the RA should probably say "not configured" or "not installed" (in validate)
  • line 111: perhaps better to log all tunnels that don't work (remove break)
  • lines 138-141, 146: overkill
  • line 142: there is ocf_is_decimal()
  • line 270: normally shouldn't ever happen that a node name is empty
  • line 277: why -w? shouldn't it be -e or similar
  • lines 296-323: code duplication; and, after all, is this necessary; this RA has absolute control over PID files; if there's a tunnel, there should be a pid file and vice versa
  • line 337: no need to run in background, kill returns immediately
  • line 367: echo superfluous (it also makes the code wrong)
  • line 369: echo (almost) always exits with success
  • line 379: eval unnecessary (and dangerous); you can remove obtain_nodes completely, and just do crm_node -p
  • line 393: better use pgrep(1)
  • lines 394, 396: superfluous
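
A few of the remarks above (defaults via `: ${...}`, iterating directly over the server list, treating an empty list as "not configured", and waiting on monitor in a loop) could be combined roughly as follows. This is only a sketch: `OCF_RESKEY_servers` is a hypothetical parameter name, and `open_tunnel` and `sshtunnel_monitor` are trivial stand-ins so the example runs on its own.

```shell
#!/bin/sh
# Sketch only: parameter names and stub helpers are illustrative.

# ':' is a no-op that still expands its arguments; ':=' assigns the
# default only when the variable is unset or empty.
: "${OCF_ERR_CONFIGURED:=6}"

# Hypothetical parameter holding a whitespace-separated server list.
# Unquoted on purpose so the shell splits it on whitespace.
get_server_list() { echo $OCF_RESKEY_servers; }

# Stand-ins for the real tunnel setup and monitor code.
open_tunnel() { STARTED="$STARTED $1"; }
sshtunnel_monitor() { [ -n "$STARTED" ]; }

sshtunnel_start() {
    # An empty server list is a configuration error.
    [ -n "$(get_server_list)" ] || return "$OCF_ERR_CONFIGURED"

    # Start all tunnels first ...
    for node in $(get_server_list); do
        open_tunnel "$node"
    done

    # ... then poll monitor until it reports success. In a real RA the
    # operation timeout set in the cluster bounds this loop.
    while sleep 1; do
        sshtunnel_monitor && break
    done
    return 0
}
```

With `OCF_RESKEY_servers="hostA hostB"` the stubbed `sshtunnel_start` opens both tunnels and returns after the first successful monitor pass; with an empty list it returns the "not configured" code without looping.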

OK, I'm starting to work on this now. It may take me several months to go through everything.

Thanks for your review.


Any news here?

aborrero commented Jan 2, 2015

I'm no longer using this RA, you can safely drop the pull request.

