Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture the beginning of service output during run #4738

Closed
wants to merge 1 commit into from

Conversation

loomchild
Copy link

Issue

docker-compose up always prints the entire container output, but docker-compose run sometimes skips the beginning of it.

I wasn't able to reproduce this issue directly on local Linux machine, but when running docker-compose in docker container I can reproduce it every time.

Steps to reproduce:

  1. Create the following docker-compose.yml:

     version: '2'
     services:
         main:
             image: debian
             command: ["bash", "-c", "echo -n Running; for i in {1..3}; do echo -n .; sleep 0.1; done; echo"]
    
  2. Create the following Dockerfile:

     FROM python:3 
     RUN pip install docker-compose
     COPY docker-compose.yml /
    
  3. Build the image

     docker build -t loomchild/docker-compose-bug .
    
  4. Pull the debian image to avoid doing it later (you can skip this step if you already have debian)

     docker pull debian
    
  5. Run the container:

     docker run -it -v /var/run/docker.sock:/var/run/docker.sock loomchild/docker-compose-bug bash
    
  6. Execute docker-compose inside the container:

     docker-compose run main
    

Expected output:

Running...

Actual output:

..

In other words everything before first sleep instruction is not displayed. However, when I add an additional sleep instruction at the beginning so the command in docker-compose.yml looks like:

command: ["bash", "-c", "sleep 0.1; echo -n Running; for i in {1..3}; do echo -n .; sleep 0.1; done; echo"]

And rebuild the container and re-execute the test, the output is correct.

The described issue could be related to #3239 / #3267 since the behavior described there seems to be pseudo random and it happens inside a VM.

Analysis

This seems to be caused by the fact that we start the container before initializing pseudoterminal (from cli/main.py, run_one_off_container()):

...
pty = PseudoTerminal(project.client, operation)
sockets = pty.sockets()
service.start_container(container)
pty.start(sockets)
exit_code = container.wait()
... 

I see that this code was introduced in below commit:

commit f3e55568d1a7c111398876bec84da8ed8b42da7f
Author: Aanand Prasad <aanand.prasad@gmail.com>
Date:   Thu Jan 21 18:34:18 2016 +0000

    Fix interactive run with networking

    Make sure we connect the container to all required networks *after*
    starting the container and *before* hijacking the terminal.

However, at this time the start_container() function was different and connected container to its networks after starting the container, as above commit message suggests:

def start_container(self, container):
    container.start()
    self.connect_container_to_networks(container)
    return container

In the current code the order is inverted:

def start_container(self, container):
    self.connect_container_to_networks(container)
    try:
        container.start()
    except APIError as ex:
        raise OperationFailedError("Cannot start service %s: %s" % (self.name, ex.explanation))
    return container

Proposed solution

Perhaps the sequence should be the following (see pull request):

  1. connect container to networks
  2. start pty
  3. start container

I have tested the proposed fix locally and it resolves the original problem and doesn't break any of the automated tests. However, it can potentially reintroduce the aforementioned "interactive run with networking" bug - could you explain how to reproduce it?

Considering this issue occurs semi-randomly and only in specific circumstances, I am not sure how to correctly implement a test for it. However, seeing the extensive testing suite based on Docker, I'd be happy to dig deeper if you think it's useful and give me some hints.

Environment

docker-compose version

docker-compose version 1.12.0, build b31ff33
docker-py version: 2.2.1
CPython version: 3.6.1
OpenSSL version: OpenSSL 1.0.1t  3 May 2016

docker version

Client:
 Version:      17.03.1-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   c6d412e32
 Built:        Fri Mar 24 00:39:57 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.03.1-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   c6d412e32
 Built:        Fri Mar 24 00:39:57 2017
 OS/Arch:      linux/amd64
 Experimental: false

The output of docker info.
docker-compose version

docker info

Containers: 8
 Running: 0
 Paused: 0
 Stopped: 8
Images: 774
Server Version: 17.03.1-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1298
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-2-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.508 GiB
Name: stinky
ID: 2WWX:OYMN:5QA4:L3VF:TOW2:2MN6:WCU5:FXY6:P7H4:NPW3:GRWS:5ITZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: loomchild
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Previously the container was started before pseudoterminal was started,
therefore some of the output could've been lost. Goal of this commit is
to make sure that all the input is printed out. To that end, the run
command executes the following sequence:

1. connect container to networks
2. start pty
3. start container

The original code was introduced in
f3e5556

Signed-off-by: Jarek Lipski <loomchild@loomchild.net>
@@ -1160,8 +1160,9 @@ def remove_container(force=False):
)
pty = PseudoTerminal(project.client, operation)
sockets = pty.sockets()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could simplify the code as follows:

pty.start(sockets)
service.start_container(container)

but I was worried to reintroduce a bug during interactive run with networking, as described in f3e5556

loomchild added a commit to loomchild/volume-backup that referenced this pull request Apr 14, 2017
@shin-
Copy link

shin- commented Apr 14, 2017

Thank you for the detailed writeup and proposed fix.

For context, Aanand's commit you're quoting is part of PR #2723, which has some tests attached. I want to assume that if those tests aren't breaking, that means the fix is preserved, but obviously it's good to be prudent.

I'll take a closer look and do a real review when I have some time, but at first glance this looks reasonable.

@loomchild
Copy link
Author

Interesting. I tried adding a couple of networks to my docker-compose.yml and when I run the following using patched docker-compose:

docker-compose run main 

ip link show

then all of them are correctly displayed.

However, if I try to apply the simple fix mentioned in my commit comment (invert pty.start(sockets) and service.start_container(container) instructions) instead of the full fix, then only one network is connected. I believe it means that I am able to reproduce the original interactive run with networking issue fixed in f3e5556. What's weird is that when I run the automatic test suite, then all tests are green, including the ones added in a14906f.

@loomchild
Copy link
Author

Hi, did you have some time to look into this? Would you be interested in merging this pull request, perform some more tests or drop it as obsolete?

Please let me know if I can help somehow.

@ulyssessouza
Copy link
Contributor

Hello @loomchild. Thanks for the PR, I could not reproduce the issue anymore.

I'm closing this because the issue looks obsolete.
Please, reopen if that's still happening.

@loomchild
Copy link
Author

@ulyssessouza I also cannot reproduce it anymore, must've been fixed in the meantime. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants