
stackable connection plugins #25

Open
bcoca opened this Issue Jul 23, 2016 · 17 comments


bcoca commented Jul 23, 2016

Proposal: stackable connection plugins

Author: Brian Coca @bcoca

Date: 2016/07/23

  • Status: New
  • Proposal type: core design
  • Estimated time to implement: weeks

Motivation

A single 'connection' does not always get us where we want to go. This is NOT an attempt to solve ssh proxying, but rather the more complex cases of ssh + chroot or ssh + docker to access hosts.

Problems

  • Container hosts running containers where the management daemon (if one exists) cannot be made to respond over the network, and where ssh cannot be run on the containers themselves.
  • A container host that cannot have Ansible itself installed (atomic/coreos/etc), but WILL need any dependencies of the stackable plugin.

Solution proposal

  • being able to mark existing connection plugins (or a new type of plugin) as 'shippable' with ansiballz and execute the module within its context:

    - user: name=insidechroot state=present
      connection: ssh+chroot

  • these plugins should have minimal dependencies, and those dependencies must be provided on the 'intermediate' target hosts.
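The ssh + chroot idea can be illustrated as command wrapping: the inner 'stacked' plugin never opens its own transport, it only rewrites the command that the outer connection executes on the intermediate host. A minimal sketch, assuming hypothetical names (ChrootWrapper, build_chain, wrap_command — none of this is real Ansible plugin API):

```python
import shlex

class ChrootWrapper:
    """Hypothetical inner layer of a stacked connection. It does not
    open a transport of its own; it rewrites the command so that the
    outer connection (e.g. ssh) runs it inside the chroot."""

    stackable = True  # proposed marker for stack-capable plugins

    def __init__(self, chroot_path):
        self.chroot_path = chroot_path

    def wrap_command(self, cmd):
        # The outer transport executes this string on the intermediate
        # host; quoting keeps the inner command a single argument.
        return "chroot %s /bin/sh -c %s" % (
            shlex.quote(self.chroot_path), shlex.quote(cmd))

def build_chain(cmd, wrappers):
    """Apply the inner-most wrapper first; the outer transport then
    carries the fully wrapped command."""
    for w in wrappers:
        cmd = w.wrap_command(cmd)
    return cmd
```

For `ssh+chroot` the ssh plugin would simply transmit `build_chain(cmd, [ChrootWrapper("/srv/jail1")])` instead of `cmd`; stacking a docker layer would just add another wrapper to the list.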

Dependencies

  • fine-grained connection variable passing: currently we pass all vars, and this can be huge; only connection vars and 'connection plugin specific' vars should be passed (e.g. ansible_winrm_* vars for the winrm plugin).
  • ansiballz being able to execute the plugin and wrap the module inside.
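The fine-grained variable passing dependency amounts to filtering host vars down to the generic connection vars plus the prefixes each plugin in the stack claims. A sketch under assumed naming conventions (connection_vars is an illustrative helper, and the generic-var set shown is not exhaustive):

```python
def connection_vars(hostvars, plugin_names):
    """Keep only generic connection vars plus vars belonging to the
    named plugins, e.g. plugin_names=('ssh', 'chroot') for ssh+chroot.
    The ansible_<plugin>_ prefix convention mirrors ansible_winrm_*."""
    generic = {"ansible_host", "ansible_port", "ansible_user",
               "ansible_connection"}
    prefixes = tuple("ansible_%s_" % n for n in plugin_names)
    return {k: v for k, v in hostvars.items()
            if k in generic or k.startswith(prefixes)}
```

With this, a winrm connection would receive `ansible_winrm_*` vars and the generic set, while unrelated (and potentially huge) host vars are never serialized across the wire.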

Problems

  • action plugins cannot easily be made to work with this system, so copy/template/etc will not work, or at best will be very limited, fragile and error prone.
  • no support for duplicate transports (ssh + ssh); a plugin can only be used once in a chain (hence no ssh proxy support, which is already available through ProxyCommand).
  • error handling becomes more fragile the more plugins are involved (ssh + docker + chroot), making debugging that much harder.
  • possible complex interactions with proposed auth plugins

Documentation

Plugins should clearly state target dependencies and required 'connection data'.
The 'location' of the dependencies has to be made very clear in the general docs.

mscherer commented Jul 23, 2016

Well, you could make ssh+paramiko work, thus solving the proxy issue :)

Infiniverse commented Mar 20, 2017

Hey @bcoca, have there been any more thoughts on this since it was proposed (~1 yr ago)? It appears that there have been several attempts by people to solve this, but not generically as proposed.

We've got lots of FreeBSD jails (actually iocages) running remotely, which we would like to manage - but they need an ssh connection to the box first.

bcoca commented Mar 20, 2017

@Infiniverse sadly there doesn't seem to be a good way around the problems above, mostly for action plugins like 'copy' or 'template'; direct actions on the target itself are much easier to deal with.

Infiniverse commented Mar 20, 2017

@bcoca It seems a bit strange, because in principle it ought to be really easy. The connector interface appears simple enough, what it appears to be missing is the ability to run a connector at the far end of the ssh connection - i.e. use the ssh connector to copy the Jail (for instance) connector to the far end and then connect them together. Without this ability it's never going to work properly.

What's ansiballz? (Ah, ok, just read up on it.)

What's the challenge in packaging up, sending, and executing a connector plugin at the remote end of a connection?

bcoca commented Mar 21, 2017

In part because the plugins currently depend on Ansible code itself; they are not designed to be independent the way modules are. The other big issues are file transfer and error propagation.

Infiniverse commented Mar 21, 2017

Ok, hmm, so until plugins can be transported like modules are there isn't going to be a generic solution to this. How much work would it be to extend the "shippability" to plugins?

Doing a deeper search on this issue, it appears to have been upsetting folk for years and years :) And there doesn't yet appear to be a good consensus as to how to proceed.

Infiniverse commented Mar 21, 2017

The other way to proceed is to layer connections on top of ssh, in much the same way that iocage is layered on top of jail. It might break streamlining, but that might be a price worth paying.

With at least chroot and jail, the remote location for files comes down to specifying a per-host prefix, so that playbooks can work with locations from the perspective of the root of the chroot/jail. Then it's just a matter of getting the execution context set up correctly before executing.

If we teach the ssh connection to be a bit more general about how it acts, then it ought to be possible to wrap it inside the chroot/jail/iocage/etc connectors.
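The per-host prefix idea above can be sketched as a small path-translation step: the playbook addresses files relative to the jail root, and the outer ssh connection operates on the prefixed host path. A toy illustration (to_host_path is a made-up helper, not an existing Ansible function):

```python
import posixpath

def to_host_path(jail_root, inner_path):
    """Map a path as seen from inside the chroot/jail to the path the
    outer ssh connection should operate on. Illustrative only."""
    # Anchor at '/' and normalise, so '..' components cannot climb
    # above the jail root, then strip leading slashes so join keeps
    # jail_root as the prefix.
    inner = posixpath.normpath("/" + inner_path).lstrip("/")
    return posixpath.join(jail_root, inner)
```

With that mapping in place, a copy of `/etc/rc.conf` into jail `web1` becomes an ordinary ssh-side operation on `/usr/jails/web1/root/etc/rc.conf`, while the playbook never needs to know the host layout.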

Have you seen any discussion on this approach anywhere?

bcoca commented Mar 21, 2017

My thought was adding a stackable=False|True property to connection plugins and redesigning them to be shippable; that is not a big issue.

bcoca commented Mar 21, 2017

This is in limbo until we find a good way around the problems I listed above. Pushing the plugin is the easy part; handling action plugins, chained file transfers and error propagation in a consistent and resilient way ... that is not as easy.

dw commented Apr 27, 2018

There is an initial connection delegation ("stacking") implementation in Mitogen that covers most of the problems mentioned in this proposal, but introduces one or two of its own:

  • It has work-alike lxc, lxd, jail and docker connection types (just enable the extension in ansible.cfg)
  • Error handling and reporting is identical to locally created connections, regardless of nesting depth
  • File transfers never touch the disk of any intermediary, and function independently of nesting
  • The connection implementations exist in a layer completely distinct from the rest of Ansible, and don't need a copy of Ansible preinstalled anywhere, just a Python interpreter.

The workalike connection types require Python to be installed in the target container unlike how (I think) the existing plugins work. For example, the "raw" module wouldn't work to permit installing Python in the container. That's actually avoidable, but it's more work :)

dagwieers commented Apr 27, 2018

@dw Avoidable by moving to Go ? :-)

dw commented Apr 27, 2018

It's hairy but it's doable, and kept nicely buried away out of sight :) dw/mitogen#223

bcoca commented Apr 27, 2018

@dw when you bypass tools like sftp for file transfer you lose a lot of safeguards, retries and verification. Did you reimplement those? Will it work with non-ssh transports (winrm)?

dw commented Apr 27, 2018

I did a quick review of scp.c to see what functionality is missing. While Ansible prefers sftp, that's 5500 LOC to read rather than 1140 :)

  • 1-160: boilerplate, most interesting part is bandwidth limiting. FileService does this, but it's hard-coded for a pipe that saturates at 112MiB/sec. dw/mitogen#212 would make that configurable but for a different reason. An identical KB/s-style knob could be exposed instead
  • 161-237: redundant subprocess utility functions
  • 238-311: redundant stream/subprocess setup. This stuff is done at a lower layer in Mitogen, file transfer uses primitives that simply deliver messages or raise exceptions
  • 312-362: local-local copies, a special case of something handled generically and without special cases in Mitogen. Code that consumes the library doesn't need to care who it exchanges messages with unless it really wants to know
  • 363-528: redundant boilerplate, arg parsing.
  • 529-561: selection of transfer direction. FileService is unidirectional until dw/mitogen#213 is done, which would permit selection with ~4 LOC worth of if statement, just like scp here.
  • 562-593: cleanup, bandwidth management
  • 594-609: missing feature! Timestamps aren't preserved. dw/mitogen#226
  • 610-673: URI parsing, more setup.
  • 674-693: special case for streaming copy, didn't know scp supported this. Mitogen handles zero/one/multi hop generically.
  • 694-755: redundant junk
  • 756-814: special case for local-local: call out to /bin/cp.
  • 815-848: missing feature! FileService would attempt to open sockets and FIFOs. dw/mitogen#227
  • 849-876: misc setup, progress bars
  • 877-915: the actual file transfer
  • 916-962: support for recursion. Ansible doesn't expose an API allowing for recursion, easy to add if it did
  • 963-1272: utterly unreadable receive loop, mostly equivalent to 13 LOC in target.py::_get_file().
  • 1273-1423: misc utility functions.

That's a total of 2 known bigger bugs, and 2 new bugs of about 15 minutes each. In terms of LOC this replaces 1100 lines of memory-unsafe C with 77 lines of easily understood and memory-safe Python server (with 131 comment lines), and 23 lines of client. Finally it's important to note the size of scp.c is due in large part to the requirement for explicit error handling in C. That is done implicitly (and for the most part, perfectly) by Python. While scp.c is an old and extremely well-audited piece of code, I'd be willing to bet a round of drinks that FileService/_get_file() already handles errors that scp fails to catch.
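The KB/s-style knob proposed for the bandwidth-limiting gap (scp.c lines 1-160) could be as simple as a sleep-based pacer between chunks. A toy sketch, not Mitogen's actual mechanism (throttled_chunks is an invented name):

```python
import time

def throttled_chunks(data, kbytes_per_sec, chunk_size=16384):
    """Yield chunks of `data` no faster than the configured rate,
    sleeping off any unused time budget after each chunk."""
    budget = chunk_size / (kbytes_per_sec * 1024.0)  # seconds per chunk
    for off in range(0, len(data), chunk_size):
        start = time.monotonic()
        yield data[off:off + chunk_size]
        elapsed = time.monotonic() - start
        if elapsed < budget:
            time.sleep(budget - elapsed)
```

A real implementation would meter at the stream layer rather than per call site, but the user-facing knob (a KB/s number) is the same either way.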

I found ansible/ansible#18643 . The original intention for file transfer was to use Mitogen's fakessh support. It still needs to become battle-ready to support transparent multi-hop synchronize, however the same functionality could also support the option to fall back to scp or sftp just like today, configured using the same ansible.cfg keys as before.

I found https://github.com/ansible/ansible/pull/20187/files . It's not clear if this is handling connection retries or file transfer retries. In either case, the extension has no equivalent retry logic, but it could be added easily, and since retry is not an inherent property of scp/sftp, it doesn't really fit into this comparison.

Searching git log --grep '\bscp\b' --grep '\bsftp\b' doesn't seem to reveal any further interesting bugs that weren't related to coping with Ansible's integration with scp/sftp (e.g. fixing error handling), rather than specific issues being addressed. I'd love more examples if you have any.

For me the real worry with an in-process approach is upper-bound performance; however, I was surprised to discover that thanks to SSH's single-threaded crypto, the bottleneck (by a factor of ~4x on my laptop) is SSH itself rather than Python.

I have a half-hacked example of something FileService /could/ do that scp will never be able to do: channel bonding. It simply opens duplicate SSH connections and round-robins the queue it decides to send the next chunk on. The result is multi-core crypto, and scp faster than scp, with nothing installed on the target. I'll check it into the examples/ directory sometime next week, but it's a tiny example of the kind of flexibility gained here that was never possible before.
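The channel-bonding idea reduces to tagging each chunk with its offset, round-robining chunks across connections, and reordering on the receiving side. A toy model (BondedSender is an invented name; channels are plain lists standing in for separate SSH connections):

```python
import itertools

class BondedSender:
    """Toy model of channel bonding: round-robin file chunks across
    several independent connections so each gets its own
    (single-threaded) crypto stream."""

    def __init__(self, n_channels, chunk_size=131072):
        self.channels = [[] for _ in range(n_channels)]
        self.rr = itertools.cycle(range(n_channels))
        self.chunk_size = chunk_size

    def send(self, data):
        # Tag each chunk with its offset so ordering survives the split.
        for off in range(0, len(data), self.chunk_size):
            chunk = (off, data[off:off + self.chunk_size])
            self.channels[next(self.rr)].append(chunk)

    def reassemble(self):
        # The receiver orders chunks by offset, ignoring which channel
        # carried them.
        chunks = sorted(c for ch in self.channels for c in ch)
        return b"".join(payload for _, payload in chunks)
```

Because chunks carry their own offsets, the receiver needs no coordination between channels, which is what makes the multi-connection trick cheap to bolt on.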

edit: forgot one more, which is timeouts. scp times out when SSH times out if configured. This stuff isn't tested enough yet. That and a bunch more improvements are in dw/mitogen#212

edit: no, it won't work for WinRM, but neither will scp/sftp :)

bcoca commented Apr 27, 2018

The reason we prefer sftp is that it implements many of the safeguards I mentioned, which probably accounts for the difference in code size.

The FileService limitation is something OpenSSH has had for a long time; in environments that needed the throughput I've normally just patched it. It's not really important, since most of the time our payloads are tiny and we should not hit it in internal use. Those using Ansible to distribute large files, or a huge number of them ... well, they already had this issue, and it is not the focus of this tool. It would be interesting to have a 'fast file copy' that worked over ssh, but that is a tangent at this point.

My question about other connections is not related to scp/sftp specifically but on being able to handle files across the connections, which is the main reason this proposal stalled.

dw commented Apr 28, 2018

The two smaller bits of scp functionality were added, alongside a safeguard scp lacks: writes proceed to a uniquely named temporary file which can be synced and renamed over the target. It's no longer possible for simultaneous ansible-playbook runs writing the same file to leave a corrupted file, or for the file to become inconsistent if it is being written during a crash.

Simultaneous scp to a file when the contents mismatch can result in any of (depending on when the overlap began):

  1. a corrupt file with holes in it (i.e. a sparse file)
  2. a correct but otherwise truncated file
  3. a corrupt file with data incorporated from both attempts.

And in every case with scp, the partially written file is observable to that file's users during the transfer. I can't begin to fathom how many web site playbooks exist out there where half-uploaded and truncated static assets are served to end users while ansible-playbook is running.
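The temp-file-and-rename safeguard described above fits in a few lines. A minimal sketch (atomic_write is an illustrative name, not FileService's actual code):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write to a uniquely named temp file in the target's directory,
    fsync, then rename over the destination, so readers only ever see
    either the old file or the complete new one."""
    d = os.path.dirname(os.path.abspath(path))
    # mkstemp in the same directory keeps the rename on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=d, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)  # never leave the partial file behind
        raise
```

Concurrent writers then race only on the final rename, so the last complete write wins; no reader can observe a truncated or interleaved file.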

Without knowledge of the specific safeguards sftp has that you're referring to (they don't appear to be spelled out anywhere above), time will be made to review the sftp source and incorporate any additional checks found into FileService.

The opportunity was used to add one more optimization: removal of the final roundtrip for files under 32KB. It's not exposed to Ansible yet (no suitable interface), but small files can be written with 0 roundtrips, assuming a step exists afterwards to gather return values and verify success.

Finally, it's worth noting that scp.c is absolutely littered with calls to stat() where the return value isn't checked, but control logic proceeds using the uninitialized struct st regardless. These are all failure conditions that Python handles for free, yet apparently have never been noticed by any scp user.
