Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

investigate terminal resize issues, and (if possible) set size on creation #3554

Closed
thaJeztah opened this issue Apr 12, 2022 · 17 comments
Closed
Labels

Comments

@thaJeztah
Copy link
Member

relates to:

The TestInitTtySizeErrors test is failing frequently (see #3441), and it looks like it's not just a flaky test, but a bug / regression in the CLI:

--- FAIL: TestInitTtySizeErrors (0.16s)
    tty_test.go:29: assertion failed: 
        --- expectedError
        +++ →
        @@ -1,2 +1 @@
        -failed to resize tty, using default size

We had issues in this area before;

Previous "fixes" implemented a retry loop to try resizing the terminal if it didn't work the first time (there's some race between the container being created and it being receptacle for resizing), so the existing code continues to be a "best effort";

if (config.AttachStdin || config.AttachStdout || config.AttachStderr) && config.Tty && dockerCli.Out().IsTerminal() {
if err := MonitorTtySize(ctx, dockerCli, createResponse.ID, false); err != nil {
fmt.Fprintln(stderr, "Error monitoring TTY size:", err)
}
}

Reproduction steps

The issue can be reproduced with this script (which just starts a lot of containers in a loop); I haven't counted the success/failure ratio, but it's failing very frequently:

#!/bin/bash

A=0
while true; do
    for a in $(seq 1 15); do
        com.docker.cli run --rm -t alpine echo $A/$a &
    done

    wait
    date
    A=$((A+1))
    echo $A
done

Coincidentally, when using a Windows machine, the CLI sets the initial size when creating the container.

// Telling the Windows daemon the initial size of the tty during start makes
// a far better user experience rather than relying on subsequent resizes
// to cause things to catch up.
if runtime.GOOS == "windows" {
hostConfig.ConsoleSize[0], hostConfig.ConsoleSize[1] = dockerCli.Out().GetTtySize()
}

The code above looks wrong though;

  • it checks for the operating system on the CLI side, but does not take into account that the daemon could be either a Linux or a Windows daemon
  • which could be either an "omission" (initial size supported on both Linux and Windows daemons)
  • or (if Linux daemons do not support this) result in Linux container shells never getting the right size when started from a Windows CLI.
  • or "redundant" (initial size already connect, but after this, doing a resize to the same size again)

What's needed?

  • Investigate what's missing (if anything) to set the correct terminal size when running (docker run), attaching (docker attach) and exec-ing (docker exec) a container
  • If setting the initial size is not supported on a Linux daemon, why not?
  • ISTR containerd did not support setting the initial size in all of these cases (is this just something that's not "punched through" in the containerd API, or is more needed?)

If it's possible to set the initial size, we should prefer that over "resizing", although resizing would still be needed if the terminal from which the container was started is resized. Of course, we also need to take into account whether the daemon (and containerd) supports setting the initial size (which may depend on the API version).

@rumpl
Copy link
Member

rumpl commented Apr 27, 2022

Just a quick comment, if I change the amount of sleep between two resizes to 200ms then the error goes away completely for me, this is obviously not ideal and makes me think that this is more an error on the daemon side, I'll continue looking.

@thaJeztah
Copy link
Member Author

thaJeztah commented Apr 28, 2022

200ms seems quite long to wait between tries; so seems to indicate something has become "slower" 🤔 was the 200ms just a "let's try something bigger", or did you try incrementing in smaller steps? (perhaps we can do a quick try to see where the "it mostly works" threshold is).

For a short term "quick fix", we could increase the timeout (although I hope if doesn't have to be 200ms!!!); of course the ideal solution is to (as described above) look if we can set the correct size when creating (so that we don't have to do the resize loop); of course if we do, that likely would have to be done only on "newer" APIs only (would have to check); and of course if dockerd and containerd actually support it (and if not "why not?")

@rumpl
Copy link
Member

rumpl commented Apr 28, 2022

We could either do 200ms, or 100ms but retry 10 times, it looks like 1 second is the lowest we should go, see this comment by @cpuguy83

I did a quick check and it seems that you can set the TTY when you create a container, here is the test:

containerd code to set TTY size on creation
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"syscall"
	"time"

	"github.com/containerd/console"
	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
	"github.com/containerd/containerd/namespaces"
	"github.com/containerd/containerd/oci"
)

func main() {
	if err := redisExample(); err != nil {
		log.Fatal(err)
	}
}

func redisExample() error {
	// create a new client connected to the default socket path for containerd
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		return err
	}
	defer client.Close()

	// create a new context with an "example" namespace
	ctx := namespaces.WithNamespace(context.Background(), "example")

	// pull the redis image from DockerHub
	image, err := client.Pull(ctx, "docker.io/library/busybox:latest", containerd.WithPullUnpack)
	if err != nil {
		return err
	}

	// create a container
	container, err := client.NewContainer(
		ctx,
		"redis-server",
		containerd.WithImage(image),
		containerd.WithNewSnapshot("redis-server-snapshot", image),
		// Setting a small size terminal to make sure this is taken into account, to test run "ls", it should wrap
		// its output to fit into a small window
		containerd.WithNewSpec(oci.Compose(oci.WithImageConfig(image), oci.WithTTY, oci.WithTTYSize(30, 30))),
	)

	if err != nil {
		return err
	}
	defer container.Delete(ctx, containerd.WithSnapshotCleanup)

	var con console.Console
	con = console.Current()
	defer con.Reset()
	if err := con.SetRaw(); err != nil {
		return fmt.Errorf("failed to SetRaw,err is: %w", err)
	}
	if con == nil {
		return errors.New("got nil con with flagT=true")
	}
	// create a task from the container
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStreams(con, con, nil), cio.WithTerminal))
	if err != nil {
		return err
	}
	defer task.Delete(ctx)

	// make sure we wait before calling start
	exitStatusC, err := task.Wait(ctx)
	if err != nil {
		fmt.Println(err)
	}

	// call start on the task to execute the redis server
	if err := task.Start(ctx); err != nil {
		return err
	}

	// sleep for a lil bit to see the logs
	time.Sleep(30 * time.Second)

	// kill the process and get the exit status
	if err := task.Kill(ctx, syscall.SIGTERM); err != nil {
		return err
	}

	// wait for the process to fully exit and print out the exit status

	status := <-exitStatusC
	code, _, err := status.Result()
	if err != nil {
		return err
	}
	fmt.Printf("redis-server exited with status: %d\n", code)

	return nil
}

Running this code doesn't fail and the terminal size is right:

containerdtest sudo ./containerdtest
/ # ls
bin   home  run   usr
dev   proc  sys   var
etc   root  tmp

@thaJeztah
Copy link
Member Author

hm.... rendering bug on GitHub?? The > triangle widget no longer renders on collapsed things (😱); see

Screenshot 2022-04-28 at 14 12 26

@rumpl
Copy link
Member

rumpl commented Apr 28, 2022

That's odd, works on my machine :D

@thaJeztah
Copy link
Member Author

you on Safari, or on Chrome? (looks indeed to still work on Chrome 🤔)

@rumpl
Copy link
Member

rumpl commented Apr 28, 2022

I'm on Firefox and yeah, it works well

rumpl added a commit to rumpl/cli that referenced this issue Apr 28, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry more times, 10 instead of 5 and we wait 100ms
between two calls to resize the TTY.

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
rumpl added a commit to rumpl/cli that referenced this issue Apr 28, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry more times, 10 instead of 5 and we wait 100ms
between two calls to resize the TTY.

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
rumpl added a commit to rumpl/cli that referenced this issue Apr 28, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry more times, 10 instead of 5 and we wait 100ms
between two calls to resize the TTY.

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@thaJeztah
Copy link
Member Author

Arf; didn't post 😂

Back to "on topic"; with your test, could you try if it picks up the size immediately? (I recall the issue when not resizing is that everything "seems" ok, only to discover the size is limited to the default 80-chars, and output getting truncated.

For example;

On my host;

tput cols && tput lines
204
37

And in a container:

docker run -it --rm ubuntu sh -c 'tput cols; tput lines'
80
24

(this is with the current release, so the initial size is still 80x24, but then gets resized if you would run interactively and continue working in that shell

@rumpl
Copy link
Member

rumpl commented Apr 28, 2022

Tested with my code above, I changed the image to be ubuntu and added oci.WithProcessArgs("tput", "lines") and it outputs the right number of lines (30), outputing cols works too. So it looks like containerd really does pick this up immediately. On moby/moby it might be a bit trickier (?), I saw somewhere a comment that we attach a dummy terminal. Will have to dig moby/moby a bit more

@thaJeztah
Copy link
Member Author

Nice. so perhaps "all" that would be needed are the changes I mentioned in #3572 (comment)

applying the changes in Moby isn't too difficult (but as described there, the "expectations" on the client side will have to be gated by API version (pre v1.42 -> do the resize loop, API v1.42 or up -> initial size should already be good)

@thaJeztah
Copy link
Member Author

Updating this thread; this PR is making the time between retries a bit longer, and changes the logic for the initial resize to try 10 times (instead of 5);

Copying my comment (#3573 (comment)) from that PR so that it doesn't get lost once that's merged;

Let me comment here, as we were discussing this;

  • First of all, this loop is for the initial resize (docker run -> resize to desired dimensions), and not used for the "monitor for terminal to be resized" case
  • The retry loop is only hit after we tried a resize immediately (line 54 does a resize immediately after attaching, and we only hit the retry loop if that fails)
  • so: incrementing this sleep does not delay the initial resize (which is good)

Thinking about this a bit more;

  1. for a "daemon under load", it may be expected that we need slightly longer (or more retries)
  2. if the resize is taking longer for other (daemon not under load) situations, it may be that there's a regression in performance
  3. so even if we fix the initial size (to be defined when creating the container, and without a "resize" after), the "monitor for resize" would still need to trigger resizes
  4. so: if there's a regression in performance, that "monitor" case would still be affected by the regression (if any), so should still be looked into. (at least, we can keep a tracking ticket to do so)

rumpl added a commit to rumpl/cli that referenced this issue May 2, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry 10 times instead of 5 and we wait
incrementally from 10ms to 100ms

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
rumpl added a commit to rumpl/cli that referenced this issue May 2, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry 10 times instead of 5 and we wait
incrementally from 10ms to 100ms

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
rumpl added a commit to rumpl/cli that referenced this issue May 2, 2022
I some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the cli retry 10 times instead of 5 and we wait
incrementally from 10ms to 100ms

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@thaJeztah
Copy link
Member Author

So fixes are being made in moby/moby#43593 and #3619 was merged, which also sets the size on docker create (not only on docker run).

Let me copy my comment from a Slack thread with @vvoland

I was digging a bit in the code, and wondering if we could also make it work for docker exec; and it looks like this is where we call out to containerd; https://github.com/moby/moby/blob/7b9275c0da707b030e62c96b679a976f31f929d3/libcontainerd/remote/client.go#L307-L310

	p, err = t.Exec(ctx, processID, spec, func(id string) (cio.IO, error) {
		rio, err = c.createIO(fifos, containerID, processID, stdinCloseSync, attachStdio)
		return rio, err
	})

In which spec is a spec *specs.Process, which has a ConsoleSize.

So if I'm right, it would be possible to have docker exec also set the initial size, but would mean we need to add extra options to the API to allow passing the size, but perhaps it's possible 🤔

Gandalf-the-Grey pushed a commit to openhive-network/hive that referenced this issue Jun 9, 2023
…inate terminal size problems (and other strange docker bugs related to display). See docker/cli#3554
@markdascher
Copy link

Is this fully fixed now in v23? moby/moby#43622 and #3627 seem to address the recent comment.

@thaJeztah
Copy link
Member Author

I think it may be yes; @vvoland was there anything left to look into?

@vvoland
Copy link
Contributor

vvoland commented Jul 25, 2023

No, this was fully addressed as far as I know, so I think this can be closed.

@markdascher are you still experiencing any of the issues described here?

@markdascher
Copy link

I think my issues were gone after the initial fixes somewhere around v18, so I'm probably not the best judge. (But for what it's worth, I haven't seen any issues with recent versions.)

Mostly I was just poking around to see when all the remaining edge cases would've been fixed, and the answer seems to be v23. But then saw this issue was still open.

@thaJeztah
Copy link
Member Author

Alright; looks like it's time to close this one then! Thanks for the reminder 🤗

Gandalf-the-Grey pushed a commit to openhive-network/hive that referenced this issue Aug 11, 2023
…inate terminal size problems (and other strange docker bugs related to display). See docker/cli#3554
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants