
Expose realtime output from 'shell' #3887

Closed
rektide opened this Issue Aug 19, 2013 · 25 comments
@rektide
rektide commented Aug 19, 2013

Would love to see each 'shell' (and other) command have an option that creates a pipe for the duration of the process and streams its output into the pipe in real time.

I'm doing some long-lived load generation where Ansible is crucial, but I don't know how to monitor the process short of adding more layers.

@mpdehaan

This is the "async should return partial output". I'm not sure this would make sense for many operations with a large number of hosts but it it's been discussed...

@fdhenard

+1

@dwt
dwt commented Dec 5, 2013

+1

Please let me discuss the use case I would like to tackle with this.

Right now we are using puppet to deploy our systems. We are currently evaluating the switchover to ansible for at least part of the workload.

Our current process has been to develop the puppet recipes against a VM and then run them against a subset of the target hosts, only showing what they would change (without applying it yet). We evaluate that output and, if it looks good, run a second time to actually apply the changes.

Since actually getting this to run is a multi-step process, automating it completely with ansible was our first goal. This works, but it's hard to impossible to get the complete output of the remote tool (I'm still investigating whether it may be possible using the API). More importantly, the output only arrives after the remote tool has finished, which means we lose the ability to monitor what it is doing while it runs. That ability gives a better continuous workflow and keeps one in the zone, instead of having to concentrate on something else and then come back to the output.

Also important to us: if we only get the output after the fact, there is no way to stop the tool halfway through should we realize it is doing something horrible. (We luckily haven't needed this ability yet, but defense in depth is better than believing one is always right.)

Substitute 'puppet' with any other long-running tool that generates a lot of output and does important work, and I believe the use case remains valid, even if you think we could replace puppet entirely with ansible to achieve most of what I'm asking for here.

@candlerb

Here is another real-world use case: copying very large files (1TB+) from an NFS server to a target host. A single copy takes upwards of an hour.

While the playbook is running, I want to be sure that it is making progress (not stalled) and to get some feedback on how far this step has progressed.

I can use ansible to run a pipeline on the target host like:

pv /path/to/src | cat >/path/to/dest

which will give progress output to stderr. This could be captured into a buffer and passed back periodically to the client. I want to drive this from the API, so I'd also like to have periodic callbacks when stderr is being written to. Using callbacks also relieves ansible from having to maintain an infinite buffer.

Aside: this stderr information could also be shown in the client tools with enough -v options; it would make debugging hung tasks far easier.

Some apps might want to write periodic JSON or YAML docs to stderr, so being able to batch up to line breaks could be useful.
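
To make this concrete, a minimal standalone sketch of the callback idea (the run_with_progress name and callback signature here are invented for illustration; this is not Ansible's API):

```python
# Standalone sketch (not Ansible's API): run a command, batch its stderr up
# to line breaks, and hand each line to a user-supplied callback as it
# arrives, so nothing needs an unbounded buffer.
import subprocess

def run_with_progress(cmd, on_stderr_line):
    """Run `cmd` through the shell; call on_stderr_line(line) per stderr line."""
    # stdout is left alone here to keep the sketch free of pipe-deadlock
    # concerns; a real implementation would multiplex both streams.
    proc = subprocess.Popen(cmd, shell=True, stderr=subprocess.PIPE,
                            universal_newlines=True, bufsize=1)
    for line in proc.stderr:
        # Tools that redraw progress with carriage returns (pv among them)
        # would also need '\r' treated as a separator.
        on_stderr_line(line.rstrip("\n"))
    return proc.wait()

rc = run_with_progress("pv /path/to/src | cat >/path/to/dest",
                       lambda line: print("progress:", line))
```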

All this applies to "shell" and similar modules. At a more general level, it would be good for all modules to have a way to report their progress. Then for example the 'copy' module could be enhanced to report progress directly, without having to mess with external programs and capturing stderr.

Finally, one more use case: running "python-vmbuilder" on a target host to build a VM image. This can take 3-20 minutes, and it can generate copious progress messages to stderr with the --debug flag, but at the moment this is all hidden by ansible until it completes.

@vdboor
vdboor commented Jan 2, 2014

+1 I'd also like to see the output of pip install -r requirements.txt, apt upgrade=yes, or a database import, all of which take some time. :-)

@wojtekhaj

+1. I could use something that shows the overall progress of playbook execution based on the number of tasks specified by "- name:" blocks. This would be especially useful for a software deployment playbook targeting remote machines via yum repositories. Some output during installation/download/update of yum packages would also be handy: with remote repo machines on slow connections, the whole playbook sometimes freezes and there is no way of knowing whether anything is still running under ansible or something is wrong.

I had to create my own very crude counter and progress display with variable juggling between ansible and shell, but it does the job. (The downside is that it floods the screen with a lot of unnecessary output; it would also be nice to have some way to suppress a task's output by indicating something like "report: yes/no" under the task name.)

Thanks, and keep up the good work.

@jirutka
jirutka commented Jan 24, 2014

+1

@jacobweber

+1

@cliffano
cliffano commented Feb 6, 2014

+1
I'm using Ansible to execute some long-running tasks (~5 minutes) remotely, having real time output would help tremendously.

@rameshraithatha

+1
It would really help to see in real time how your commands are behaving.

@lorin lorin added a commit to lorin/ansible that referenced this issue Feb 16, 2014
@lorin lorin Add support for realtime output
This patch adds a new "show_output" option that can be enabled on
individual tasks, plays, or globally via a config option.

When enabled, whenever a running module calls AnsibleModule.run_command,
the output will appear in the user's console as the task is running.

Implements #3887
eb34890
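
For context, the general shape of such an option might look like the following sketch (illustrative only, not the actual eb34890 patch; the function name and show_output handling are assumptions):

```python
# Illustrative sketch only -- not the actual patch. The idea: when a
# show_output-style flag is set, mirror each line of the child process's
# output as it is produced, while still collecting it for the final JSON
# result. How those mirrored lines travel back to the controller is the
# hard part discussed below.
import subprocess
import sys

def run_command_streaming(args, show_output=False):
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True, bufsize=1)
    collected = []
    for line in proc.stdout:
        collected.append(line)
        if show_output:
            sys.stderr.write(line)   # echo in real time
            sys.stderr.flush()
    rc = proc.wait()
    return rc, "".join(collected)
```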
@mpdehaan

So the problem behind this is that with a very large number of hosts (--forks 150) this ceases to make sense; it would basically require an out-of-band (non-SSH) backchannel to each host and callbacks to plumb the data through, along with instrumenting various modules with something like a module.intermediate_status.

Architecturally, I'm not entirely positive this makes sense, and in many cases, if someone just wants a "ping" checkpoint, async polling may suffice.

Hanging commands are usually hanging due to a lack of standard input, so this may be better resolved by feeding in some explicit /dev/null input and talking about what the commands in question actually are.

@candlerb

Hanging stdin is only a small part of the problem set. When I'm running a task which copies 1TB from A to B, I want to see some progress indication; it may still be running but far slower than expected, and I want to see this while it runs.

Making this work with 150 forks is a matter of user-interface design, but it's not limited to the CLI.

One class of user is the API user. They want access to the stdout/stderr streams and can present it however they like. They may write the output into a database, for example, for browsing.

A second class of user is the CLI user. The CLI is just one front-end to the API. Various options are available: the CLI could by default just show the output from the first host. Or it could show output from one host selected by the user, or all of them intermingled (yes that could be confusing), or the last host to generate output after 5 seconds of inactivity.

When someone writes a GUI to sit on top of the API then more options are available - e.g. multiple windows, or a drop-down to select one host.

@jirutka
jirutka commented Mar 14, 2014

I have the same opinion as @candlerb.

@mahemoff

The 150-host scenario focuses on a high-scale production use case, whereas a lot of the need for this comes while developing the project in the first place, as well as while learning Ansible for that matter. Both of those are common scenarios, not edge cases.

As for the production scenario, I agree with @candlerb. There are plenty of ways output could be provided, even if in a limited form.

@mahemoff

The hanging-command problem can happen by accident: maybe there's a lock file present that causes the command to ask for confirmation, or maybe it's a server timeout outside of my control, e.g. GitHub is down for a few minutes. Without output, I have no way to tell the cause, so it might take several tries before I manually ssh in and discover there's a problem with the Ansible config.

@lorin
Ansible member
lorin commented Mar 14, 2014

The underlying problem is that there is no way to debug an Ansible task that is unexpectedly "stuck". Unless a proposed solution provides a user with a way to determine why an Ansible task is currently stuck, it isn't going to address the issue.

@mahemoff

@lorin An example would be seeing "There is a lock file present. Do you wish to continue?". That doesn't address all hanging situations, but it does provide real assistance to some. (While we'd hope modules would prevent interactive prompts, there's no guarantee and it can certainly arise with command/shell modules.)

To be clear, hanging is just one of several reasons why output makes sense imo.

@dwt
dwt commented Mar 18, 2014

I would say that for now this feature makes the most sense to me at the API level. That means that if I talk to ansible via the API, I can request that the execution of one command deliver live output to me via a callback.

In the standard case the feature would then be neither enabled nor generating network traffic, but it would still allow building custom clients, with very little code, that monitor specific activities in a playbook.

Additional steps could then be taken to think about keywords on remote executions or debugging flags that make this ability available from the command-line client.

This allows a first round of experimentation with the feature, to see whether and how it is best used to solve real problems (also for larger deployments), before we have to decide on a user interface through the ansible YAML or the command line.

Nonetheless, for the command line I can easily imagine that a debug switch showing output from one host or a small set of hosts in real time, instead of as summaries, would be very useful.

@mpdehaan mpdehaan added the P3 label Mar 19, 2014
@mpdehaan

So, ignoring the issue of display, there's still the question of how information would need to be posted back.

At the most basic level this would require a few things:

  • a server to accept information out of band
  • a configuration option to indicate to the modules this server was enabled
  • instrumentation of that server to do something with the data
  • etc

This original ticket is about the shell module, and in many cases, modules aren't going to be about the shell.

For instance, if we are waiting on an EC2 instance, we're waiting on boto to return, etc.

Any solution that just slurps in CLI output is incomplete (that channel carries the JSON stream); this is really about having yet another output stream for modules to send, and something to receive it.

I can't really see this happening anytime soon.

The question of hanging modules is usually one of something going interactive, and I think it's pragmatically difficult to anticipate any given command waiting for input. The best thing that can be done, I think, is to identify the problem causes on a case-by-case basis and feed them /dev/null as input.

If those are modules other than shell, it's quite easy to take additions for them. If it's the shell module, that's more involved and more of a documentation issue.

Most of the Linux issues with services were dealt with by an extra layer of daemonization in the service module.

Regardless, GitHub is an issue tracker, not a forum, and nothing's really happening here, so I'm going to close this ticket. There are really compound ideas here: (A) the backchannel for intermediate status -- I don't see this happening anytime soon (SORRY); (B) isolating the problem cases where commands go interactive in the shell module and need to be fed /dev/null as input; (C) any cases of other modules getting into hang scenarios -- which absolutely should be opened as separate bugs on a case-by-case basis.

I'm unwilling to accept a solution that assumes a single thread of output from one machine and doesn't scale at least well into the 500+ simultaneous system range -- and yes, for stock Ansible, the "no servers required" piece tends to imply we'd rather not have the complexity of requiring an event consumer for the out-of-band traffic.

That being said, it could fork off one -- it COULD ... but then there's the question of security of that channel.

In the spirit of keeping things simple, and given my view of where this is going, I'm going to close this ticket -- I don't see this happening -- but we should definitely address any hang issues that occur in non-command/shell modules on a case-by-case basis. Those are all separate bugs, wherever they lie.

I also understand what is written above about progress notification, though there are ways to deal with this. I'd hope Ansible isn't doing a 150 TB copy, for one... but if it did, starting it as a background operation and then using an "until" loop to completion may help with output considerations.

Further discussion of this would be welcomed on ansible-devel, though keep in mind the no-server requirements, security requirements, and need for output to scale to a large number of systems.

A more specifically focused mechanism for modules that wish to report some degree of completion might be interesting, but I suspect very few will be able to report that kind of detail.

@mpdehaan mpdehaan closed this Mar 29, 2014
@candlerb

for stock Ansible, the "no servers required" piece tends to imply we'd rather not have the complexity of requiring an event consumer for the out-of-band traffic

I don't see any need for a side channel or a separate service to receive the progress information; I was envisioning using the same channel.

At the moment, ansible just sits there waiting for the far side to complete and return a blob of JSON. I thought that instead the far side could send a stream of results -- zero or more intermediate results, followed by a final result. This could just be a stream of JSON objects one after another, with the last one tagged as final.

In the case of a shell module the intermediate result could include chunks of stdout and stderr. In the case of a copy module the intermediate result could include a count of the bytes copied so far. And so on.

The idea of an XML "stream" is common (e.g. XMPP), but it generally involves a single open document which grows incrementally and needs a stream parser to read. For JSON, what I've seen done elsewhere[^1] is simply to send multiple JSON objects, one after the other, each terminated by a newline. I'm not aware of that being any sort of standard, but it's extremely simple. If you replace newlines within strings with the escape sequence \n (which is valid JSON), then you can simply use gets() to retrieve each object.

[^1]: I'm thinking of the couchdb changes API with feed=continuous. See http://guide.couchdb.org/draft/notifications.html#continuous
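
A tiny sketch of how that could look (illustrative only; the emit/read_results names and the "final" tag are just placeholders I'm making up here):

```python
# Illustrative sketch of newline-delimited JSON results: the remote side
# emits zero or more intermediate objects and then one object tagged as
# final, each on its own line; the controller reads them back one at a time.
import json
import sys

def emit(obj, final=False):
    obj["final"] = final
    # json.dumps escapes embedded newlines as \n, so one object == one line
    sys.stdout.write(json.dumps(obj) + "\n")
    sys.stdout.flush()

def read_results(stream):
    """Yield each result object from `stream` until one is tagged final."""
    for line in stream:
        obj = json.loads(line)
        yield obj
        if obj.get("final"):
            break

# The remote side of a copy-like task might do:
#   emit({"bytes_copied": 1048576})
#   emit({"bytes_copied": 2097152})
#   emit({"rc": 0, "changed": True}, final=True)
```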

@teamon
teamon commented Mar 29, 2014

JSON streams are used quite commonly -- for example, https://github.com/brianmario/yajl-ruby supports them out of the box, and its example use case is consuming the Twitter streaming API.

@ktosiek
ktosiek commented Aug 3, 2014

Hint to all the +1'ers: this is a closed issue, so it won't get a lot of attention from devs, and with all the +1s it's getting hard to read.
Try the mailing list if you have any new ideas about how this should work.

@vdboor
vdboor commented Sep 2, 2014

I'm not sure how this could be done, but I was wondering whether something like MIME-multipart-style output would work:

==== BEGIN SHELL OUTPUT ====
<all stdout streaming>
==== BEGIN JSON DATA ====
<the final JSON response>

And perhaps something like this in tasks:

stream_output: true
store_output: {{ host }}.log
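
One way a client might consume that layout (illustrative only; the consume function and marker handling below are assumptions based on the example above):

```python
# Illustrative sketch: stream everything before the JSON marker as live
# shell output, then parse whatever follows it as the task's final JSON
# response. The marker strings come from the example above.
import json

SHELL_MARKER = "==== BEGIN SHELL OUTPUT ===="
JSON_MARKER = "==== BEGIN JSON DATA ===="

def consume(stream, on_output_line):
    in_json = False
    json_lines = []
    for line in stream:
        stripped = line.rstrip("\n")
        if stripped == SHELL_MARKER:
            continue                      # start of the streamed section
        if stripped == JSON_MARKER:
            in_json = True                # everything after this is JSON
            continue
        if in_json:
            json_lines.append(line)
        else:
            on_output_line(stripped)      # live output, e.g. print or append to a log
    return json.loads("".join(json_lines))
```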

I'll leave it up to the really smart people on the ML to discuss all the +1s and -1s on such an idea.
This shell-output thing is obviously for single hosts, taking all scenarios into account.

@jasonfharris

+1

It would be really nice to have progress bars at various levels. When running a playbook that you are not debugging, it would be very nice to get a progress bar, much like git or mercurial have, instead of screenfuls of output, and just send the details to some log file. For a video of the kind of progress indicator I am talking about, see: https://github.com/noamraph/tqdm

At the other extreme, it would be nice to have progress indicators / feedback mechanisms for really long-running tasks, to figure out whether they have hung or not. I don't know the details of ansible's internals or how to make this work, but from this bug report it is obviously a highly desired feature...

@mpdehaan mpdehaan locked and limited conversation to collaborators Sep 5, 2014
@mpdehaan
mpdehaan commented Sep 5, 2014

Hi everyone,

I've closed this ticket as I think many people understand what is going on here and why we have acted on it.

These reasons have been enumerated previously and include:

1) Ansible contains 235+ modules and the APIs used in basically all of these modules don't have any capacity for reporting status in the middle of an async operation

2) Doing this properly for N-node management implies setting up a server and a crypto layer, which eliminates much of the architectural elegance of ansible

3) If you just need to know if a module is still running, async is available - http://docs.ansible.com/playbooks_async.html

4) The CLI has no good way to output standard out changes when running in parallel against several hundred hosts

5) The performance implications of doing this over an additional SSH channel or other mechanism are decidedly non-trivial

It's not that we are denying anyone this out of spite, it's actually not a good fit for the system.

We strongly encourage usage of async to show that long-running tasks are alive. Consider having custom modules, if any, log their actions so problems can be diagnosed should they occur.

Ansible does in fact report in when each host comes back in a large set of hosts, so that progress is definitely available.

Thanks for your understanding.

So that the comments are easy to read, I am also removing the various (and numerous) +1s on this issue. I appreciate the feedback, but voting cannot make it so.
