Ignition's "gcloud" alias doesn't work with torcx-ified docker #2112

Closed
euank opened this Issue Aug 17, 2017 · 20 comments

euank (Contributor) commented Aug 17, 2017

Ignition defines a gcloud alias here: https://github.com/coreos/ignition/blob/v0.17.2/internal/oem/oem.go#L202

It bindmounts in /usr/bin/docker; however, in a torcx world that is not supported.

The simplest reproduction is the following:

$ gcloud docker -- pull busybox
The program docker is managed by torcx, which did not run.

euank (Contributor, issue author) commented Aug 17, 2017

cc @crawford @lucab

The possible options that make sense to me are:

  1. Replace the alias with a function or shell script added to the user's PATH, with enough torcx-aware logic to bindmount that docker binary into the container.
  2. Fork the gcloud image and install a docker client binary in it.
  3. Bindmount /run/metadata and /run/torcx as well.

I think 2 is the most correct, but 3 is the easiest. I'm in favor of doing 2 and am willing to create and maintain the image (i.e., set up an auto-update Jenkins job).
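As a rough illustration of option 3, the alias could become a shell function that bindmounts the torcx runtime directories alongside the docker binary and socket. This is a hypothetical sketch, not Ignition's actual definition: the google/cloud-sdk image name comes from this thread, while the function itself and the $HOME/.config to /root/.config mount are assumptions. vlerenc's command later in the thread reports that these extra mounts were enough for docker ps in a golang:1.8.0 container; whether the dynamically linked client also works in the cloud-sdk image is not confirmed here.

```shell
# Hypothetical replacement for the gcloud alias (option 3). Besides the
# docker socket and /usr/bin/docker, it also bindmounts /run/torcx and
# /run/metadata so the torcx-managed docker wrapper can find its binary.
# Assumption: gcloud config lives under /root/.config inside the image.
gcloud() {
  docker run --rm -t -i --net=host \
    -v "$HOME/.config:/root/.config" \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /usr/bin/docker:/usr/bin/docker \
    -v /run/torcx:/run/torcx \
    -v /run/metadata:/run/metadata \
    google/cloud-sdk gcloud "$@"
}
```

With this function defined, the reproduction from the top of the issue (gcloud docker -- pull busybox) passes its arguments straight through to gcloud inside the container. A function is used instead of an alias so the arguments land after the image name via "$@".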

lucab (Member) commented Aug 18, 2017

I didn't know about this. It looks like 1 and 3 would also require bindmounting lib directories (I'm not sure how it works right now). With 2, however, I'm not sure how to handle discrepancies between client and server protocol versions.

As I expect this same scenario to happen again, should we consider a symlink /usr/bin/docker -> /run/torcx/bin/docker? Or perhaps just change the alias to resolve the binary with which docker on a torcx-aware PATH?

rojer commented Aug 18, 2017

It's not just gcloud, though. This just hit us in stable: we are no longer able to run docker from within our containers. There needs to be a generic solution for running docker-within-docker. Traditionally, bind-mounting the docker binary was enough; it no longer is. I'd say this change should be rolled back until a proper solution is available.

rojer commented Aug 18, 2017

Also:

Torcx is currently in an experimental state. The API and CLI have no guarantees of stability, and the design is not yet finalized.
Running torcx in production is not recommended.

And this gets pushed to stable? Not pleased.

euank (Contributor, issue author) commented Aug 18, 2017

@lucab

It looks like 1 and 3 would also require bindmounting lib directories

Since gcloud only uses the client portion, which doesn't touch those directories directly, I'm not sure why they'd be needed.

With 2 however I'm not sure how to handle discrepancies between client/server protocol versions.

At this point, docker client/server version mismatch is largely a historical problem. We'd have to double-check that a modern client talks to 1.12.6 correctly, but I do think that's true nowadays.


@rojer It's unfortunate this impacted you. We announced this impending change some months ago, and kept an ear out while it was in alpha and beta.

there needs to be a generic solution for running docker-within-docker. traditionally bind-mounting docker binary was enough, it no longer is.

That could always break, given that we ship a dynamically linked docker client (e.g., even before this change it wouldn't work in an Alpine container).
For just running the docker client, installing or downloading the client binary in the container itself and bindmounting in the socket has always worked and still does.

.. should be rolled back until a better solution is available.

Does including the docker client binary in your container work as a solution for you?

crawford (Member) commented Aug 21, 2017

I think the best way forward is going to be to tear out the GCE Agent and related tooling. Their Python code has been a pain to maintain over the past few years, and it seemingly provides very little benefit. Ignition and coreos-metadata already contain most of the functionality needed. The big thing we lose is gcloud ssh (which is a gaping security hole, in my opinion).

euank (Contributor, issue author) commented Aug 21, 2017

@crawford This issue (the aliases Ignition creates) is entirely unrelated to the GCE agent or Python.

crawford (Member) commented Aug 21, 2017

It's related if the solution is to remove the alias completely.

rojer commented Aug 21, 2017

FWIW, I ended up installing docker inside the container and all is well now. Sorry for raising a fuss; the solution wasn't right to begin with, and the CoreOS changes just exposed this fact.

vlerenc commented Aug 27, 2017

Same problem here. This is totally unrelated to GCE. I simply can't bind-mount the Docker socket and binary anymore, which used to be enough for running Docker. Now I get this torcx problem, from something that's still in an experimental state?

Is there a decent proposal for what to do if I need to run Docker in my containers?

Does somebody have the list of things I now need to bind-mount? Or should I really install Docker in my containers? That's a lot of work, since I have many (Jenkins scenario).

vlerenc commented Aug 27, 2017

OK, this seems to work:

docker run --name docker-ps-test -it \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /usr/bin/docker:/usr/bin/docker \
  -v /run/metadata:/run/metadata \
  -v /run/torcx:/run/torcx \
  golang:1.8.0 docker ps

euank (Contributor, issue author) commented Aug 31, 2017

Specifically for the gcloud alias issue, I've filed GoogleCloudPlatform/cloud-sdk-docker#97 upstream. If they accept that approach, we can more easily fix our issue.

@vlerenc It really is generally better to install the docker client in the containers themselves.

An alternate hack would be to download a statically linked docker client from Docker's releases and bindmount that in instead. As long as the upstream project continues to provide statically compiled clients, that should work.
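A minimal sketch of that static-client hack, assuming Docker's static release tarballs are published as https://download.docker.com/linux/static/stable/ARCH/docker-VERSION.tgz and unpack into a docker/ directory containing the client. The helper names, the /opt/docker-static location, and the 17.06.0-ce version are hypothetical choices for illustration; pick a client version that works against the host's daemon (the thread mentions 1.12.6).

```shell
# Hypothetical helpers for the "bindmount a static docker client" workaround.
# Assumption: static tarballs follow this URL layout and each one unpacks
# into a docker/ directory that contains the docker client binary.
static_docker_url() {
  local version="$1"            # e.g. 17.06.0-ce (assumed release name)
  local arch="${2:-x86_64}"
  printf 'https://download.docker.com/linux/static/stable/%s/docker-%s.tgz\n' \
    "$arch" "$version"
}

fetch_static_docker() {
  local version="$1"
  local dest="${2:-/opt/docker-static}"   # hypothetical install location
  mkdir -p "$dest"
  # Stream the tarball and extract only the client, dropping the docker/ prefix.
  curl -fsSL "$(static_docker_url "$version")" \
    | tar -xz -C "$dest" --strip-components=1 docker/docker
}

# Usage (not run here):
#   fetch_static_docker 17.06.0-ce
#   docker run --rm -it \
#     -v /var/run/docker.sock:/var/run/docker.sock \
#     -v /opt/docker-static/docker:/usr/bin/docker \
#     golang:1.8.0 docker ps
```

Because the downloaded client is statically linked, only the binary and the socket need to be bindmounted; no torcx directories or host libraries are involved.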

JasonGiedymin commented Aug 31, 2017

Got bit by this today. Was looking to just pull from my registry. No go.

bgilbert (Member) commented Sep 11, 2017

According to GoogleCloudPlatform/cloud-sdk-docker#97 (comment), Google Cloud upstream is okay with adding a docker binary to the SDK container. In anticipation, coreos/ignition#439 removes the /usr/bin/docker bindmount from the gcloud alias. Leaving this bug open until the docker binary lands upstream.

sfchrisgleason commented Sep 27, 2017

Hey, I'm not sure if this matters, but the -v for docker.sock is misspelled: it's spelled doker.sock.

I copied that alias to a Container-Optimized OS instance and it worked fine. I will try updating it to see if it works on CoreOS.

sfchrisgleason commented Sep 27, 2017

Nope. It did not make any difference on the CoreOS one. When it is fixed, you might want to add the --rm flag to the docker run command so it doesn't leave a bunch of stopped containers behind as detritus.

bgilbert (Member) commented Sep 29, 2017

@sfchrisgleason Thanks for pointing out the missing --rm. I've submitted coreos/ignition#464 to fix it.

bgilbert (Member) commented Oct 23, 2017

google/cloud-sdk 176.0.0 now includes the docker client. gcloud should now work correctly with docker on Container Linux instances originally provisioned with CL 1535.0.0 or newer. If you've previously used gcloud on such an instance, run docker pull google/cloud-sdk to update.

bgilbert closed this Oct 23, 2017

rushins commented Aug 4, 2018

I hit the same issue on bare metal with the 1.96 Tectonic installer, and docker does not start at all.

Any help?

seitbekir commented Mar 10, 2019

So there is still no solution? I just restarted the machine and it no longer runs docker, and I don't see what to do. Nothing I try to run seems to have any effect on torcx.
