Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli-plugins: use system dial-stdio to contact the engine. #1654

Merged
merged 2 commits into from
Feb 21, 2019

Conversation

ijc
Copy link
Contributor

@ijc ijc commented Feb 1, 2019

- What I did

Arranged so that cli-plugins will use docker system dial-stdio to connect to the engine as proposed by @tonistiigi in #1534 (comment).

- How I did it

The main commit message goes into some detail:

    cli-plugins: use `docker system dial-stdio` to call the daemon
    
    This means that plugins can use whatever methods the monolithic CLI supports,
    which is good for consistency.
    
    This relies on `os.Args[0]` being something which can be executed again to
    reach the same binary, since it is propagated (via an envvar) to the plugin for
    this purpose. This essentially requires that the current working directory and
    path are not modified by the monolithic CLI before it launches the plugin nor
    by the plugin before it initializes the client. This should be the case.
    
    Previously the fake apiclient used by `TestExperimentalCLI` was not being used,
    since `cli.Initialize` was unconditionally overwriting it with a real one
    (talking to a real daemon during unit testing, it seems). This wasn't expected
    nor desirable and no longer happens with the new arrangements, exposing the
    fact that no `pingFunc` is provided, leading to a panic. Add a `pingFunc` to
    the fake client to avoid this.
    
    Signed-off-by: Ian Campbell <ijc@docker.com>

It's a bit tricky, so could use some careful thought around the possible pitfalls.

Things I've considered doing differently:

  • Passing the original CLI via a new global argument instead of an envvar.
  • Passing the entire dial-stdio command line to use explicitly to the plugin. Not sure how best to do that without getting into confusion around quoting all the way through.

- How to verify it

There is an e2e test which confirms correct argument parsing. Also the existing TestCliInitialized happens to use the mechanism.

- Description for the changelog

N/A should be covered by #1564's entry.

- A picture of a cute animal (not mandatory but encouraged)

This can be obtained with the `.Name()` method.

Signed-off-by: Ian Campbell <ijc@docker.com>
@codecov-io
Copy link

codecov-io commented Feb 1, 2019

Codecov Report

Merging #1654 into master will decrease coverage by 0.07%.
The diff coverage is 5.71%.

@@            Coverage Diff             @@
##           master    #1654      +/-   ##
==========================================
- Coverage   56.12%   56.04%   -0.08%     
==========================================
  Files         306      306              
  Lines       20964    20925      -39     
==========================================
- Hits        11766    11728      -38     
- Misses       8345     8354       +9     
+ Partials      853      843      -10

cli/command/cli.go Outdated Show resolved Hide resolved
cli/command/cli.go Outdated Show resolved Hide resolved
@@ -53,6 +53,16 @@ func GetConnectionHelper(daemonURL string) (*ConnectionHelper, error) {
return nil, err
}

// GetCommandConnectionHelper returns a ConnectionHelp constructed from an arbitrary command.
func GetCommandConnectionHelper(cmd string, flags ...string) (*ConnectionHelper, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering why we didn't call the other one NewConnectionHelper (and this one NewCommandConnectionHelper)

(just thinking out loud 🤔)

Dialer: func(ctx context.Context, network, addr string) (net.Conn, error) {
return newCommandConn(ctx, cmd, flags...)
},
Host: "http://docker",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking out loud while trying to grasp this PR; we should add comments with these (future readers may be confused and think "what's this magic docker host?"

Copy link
Contributor Author

@ijc ijc Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came pretty much verbatim from the existing example, I don't know where it came from nor really where it shows up...

docs/extend/cli_plugins.md Show resolved Hide resolved
@ijc
Copy link
Contributor Author

ijc commented Feb 15, 2019

@thaJeztah ping -- I think your remaining comments were just thinking out loud rather than things that should be changed.

Copy link
Contributor

@silvin-lubecki silvin-lubecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, even if I think the design seems fragile: trying to call the CLI back with what we hope to be the same context (all the global flags we retrieve), but I don't see any cleaner way to do it.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but left one small comment

also would love to see @tonistiigi have a quick look

cli/command/cli.go Outdated Show resolved Hide resolved
@ijc
Copy link
Contributor Author

ijc commented Feb 18, 2019

LGTM, even if I think the design seems fragile: trying to call the CLI back with what we hope to be the same context (all the global flags we retrieve), but I don't see any cleaner way to do it.

This is certainly true. The other choice is the status quo which is to do the dial in the plugin itself, with whatever capabilities the vendored cli libraries happen to have, which may differ from what the calling top-level CLI supports (which might be confusing for users).

This means that plugins can use whatever methods the monolithic CLI supports,
which is good for consistency.

This relies on `os.Args[0]` being something which can be executed again to
reach the same binary, since it is propagated (via an envvar) to the plugin for
this purpose. This essentially requires that the current working directory and
path are not modified by the monolithic CLI before it launches the plugin nor
by the plugin before it initializes the client. This should be the case.

Previously the fake apiclient used by `TestExperimentalCLI` was not being used,
since `cli.Initialize` was unconditionally overwriting it with a real one
(talking to a real daemon during unit testing, it seems). This wasn't expected
nor desirable and no longer happens with the new arrangements, exposing the
fact that no `pingFunc` is provided, leading to a panic. Add a `pingFunc` to
the fake client to avoid this.

Signed-off-by: Ian Campbell <ijc@docker.com>
@thaJeztah
Copy link
Member

Let's get this merged; thanks!

@thaJeztah thaJeztah merged commit 06b837a into docker:master Feb 21, 2019
@GordonTheTurtle GordonTheTurtle added this to the 19.03.0 milestone Feb 21, 2019
@ijc
Copy link
Contributor Author

ijc commented Feb 21, 2019

Thanks, I ticked the box in #1661.

@ijc ijc deleted the plugins-dial-stdio branch February 21, 2019 11:14
@ijc
Copy link
Contributor Author

ijc commented Mar 1, 2019

Hrm, I hadn't appreciated that this would require 18.09 on the daemon side;

error during connect: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/json?all=1&«...»&limit=0: command [docker system dial-stdio] has exited with exit status 1, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=

This seems like a problem to me, @thaJeztah @tonistiigi WDYT?

Not sure how to address, since figuring out the daemon version requires calling out to the dameon, which currently required dial-stdio, rinse and repeat...

@ijc
Copy link
Contributor Author

ijc commented Mar 4, 2019

The above was due to me misreading the logs -- in fact it was telling me that my test had accidentally found /usr/bin/docker which was 18.06 and this didn't support docker system dial-stdio. Now that I fixed the test to find the right cli this aspect is ok.

However, it appears that dial-stdio does not work on Windows:

docker@WIN-NUC0 C:\Users\docker>.\docker-windows-amd64.exe version
Client:
 Version:           19.03.0-dev
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.11.5
 Git commit:        8ddde26a
 Built:             Mon Mar  4 16:34:00 2019
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       4d60db4
  Built:            Wed Nov  7 00:55:00 2018
  OS/Arch:          linux/amd64
  Experimental:     false

docker@WIN-NUC0 C:\Users\docker>.\docker-windows-amd64.exe system dial-stdio
the raw stream connection does not implement halfCloser

Basically it seems that net.Conn doesn't have ReadClose on Windows (from my tests it does have WriteClose):

docker@WIN-NUC0 C:\Users\docker>.\docker-windows-amd64.exe -D system dial-stdio
DEBU[0000] conn is a *winio.win32MessageBytePipe
DEBU[0000] conn is a halfReadCloser: false
DEBU[0000] conn is a halfWriteCloser: true
the raw stream connection does not implement halfCloser

(this seems to be independent of plugins using this, it's just the base functionality is not there in windows)

@tonistiigi
Copy link
Member

@AkihiroSuda If ReadClose is missing in win, could we add a fallback that doesn't need that.

@ijc
Copy link
Contributor Author

ijc commented Mar 4, 2019

I raised #1710 in the meantime.

ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `ReadClose` method) and problems with using tcp with tls (the tls
connection also lacks `ReadClose`). Both of these were workedaround in docker#1718
which added a nop `ReadClose` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

Signed-off-by: Ian Campbell <ijc@docker.com>
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `CloseRead` method) and problems with using tcp with tls (the tls
connection also lacks `CloseRead`). Both of these were workedaround in docker#1718
which added a nop `CloseRead` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

Signed-off-by: Ian Campbell <ijc@docker.com>
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `CloseRead` method) and problems with using tcp with tls (the tls
connection also lacks `CloseRead`). Both of these were workedaround in docker#1718
which added a nop `CloseRead` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

The e2e test for this case is disabled unless the trap door envvar is set. I
also renamed the test to clarify that it is about cli plugins.

Signed-off-by: Ian Campbell <ijc@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants