Docker in docker actions no longer work in version v0.27.2 and v0.27.3 #2519

Closed
mrparkers opened this issue Apr 17, 2023 · 9 comments · Fixed by #2536
Labels
bug Something isn't working needs triage Requires review from the maintainers

Comments

@mrparkers

mrparkers commented Apr 17, 2023

Checks

Controller Version

v0.27.3

Helm Chart Version

v0.23.0

CertManager Version

v1.11.0

Deployment Method

Helm

cert-manager installation

I followed the installation instructions and I installed cert-manager from an official source.

Checks

  • This isn't a question or user support case (for Q&A and community support, go to Discussions; it might also be a good idea to contract with any of the contributors and maintainers if your business is critical and you need priority support)
  • I've read the release notes before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
  • My actions-runner-controller version (v0.x.y) does support the feature
  • I've already upgraded ARC (including the CRDs, see charts/actions-runner-controller/docs/UPGRADING.md for details) to the latest version and it didn't fix the issue
  • I've migrated to the workflow job webhook event (if you are using webhook-driven scaling)

Resource Definitions

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org
spec:
  template:
    spec:
      organization: redacted
      resources:
        requests:
          ephemeral-storage: 20Gi
        limits:
          ephemeral-storage: 30Gi
      labels:
        - cluster_staging

To Reproduce

1. Set up a GitHub Actions workflow that uses a Docker-based action which itself tries to invoke the `docker` command (a minimal workflow sketch follows below).
2. See the error message `Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?`
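
For reference, a minimal workflow that exercises this path might look like the following. This is a hypothetical sketch: the label and step names are illustrative, and any Docker-based (container) action that invokes the docker CLI should hit the same failure.

name: dind-repro
on: workflow_dispatch
jobs:
  repro:
    # illustrative runner label matching the RunnerDeployment above
    runs-on: [self-hosted, cluster_staging]
    steps:
      # Run the official docker image as a container action and call `docker ps`
      # inside it; the runner bind-mounts the host docker socket into the action
      # container at /var/run/docker.sock.
      - name: docker ps inside a container action
        uses: docker://docker:23.0
        with:
          args: docker ps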

Describe the bug

A change in v0.27.2 caused docker within Docker-based actions to stop working, due to the docker socket being mounted to a different location.

Here is all of the information that I provided in a previous comment:

In v0.27.2, I've verified that the runner pod is still able to use docker just fine:

$ kubectl exec -it -c runner org-m7skv-sd6hx -- /bin/bash
runner@org-m7skv-sd6hx:/$ docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS     NAMES
731aed091e81   60e226:eb06da2dd3ec44b28ceed4eea31cebcf   "/opt/action-run.sh"   2 seconds ago   Up 2 seconds             e226eb06da2dd3ec44b28ceed4eea31cebcf_381567

However, this container isn't able to use docker, despite this working in v0.27.1:

runner@org-m7skv-sd6hx:/$ docker exec -it 731aed091e81 /bin/bash
root@731aed091e81:/github/workspace# docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I think this is broken because GitHub Actions automatically mounts the docker socket for docker-based actions, but it assumes that the docker socket is available at /var/run/docker.sock, which is no longer the case as of v0.27.2:

runner@org-m7skv-sd6hx:/$ docker ps
CONTAINER ID   IMAGE                                     COMMAND                  CREATED         STATUS         PORTS     NAMES
731aed091e81   60e226:eb06da2dd3ec44b28ceed4eea31cebcf   "/opt/action-run.sh"   4 minutes ago   Up 4 minutes             e226eb06da2dd3ec44b28ceed4eea31cebcf_381567
runner@org-m7skv-sd6hx:/$ docker inspect -f '{{ .Mounts }}' 731aed091e81
[{bind  /runner/_work/_temp/_github_home /github/home   true rprivate} {bind  /runner/_work/_temp/_github_workflow /github/workflow   true rprivate} {bind  /runner/_work/_temp/_runner_file_commands /github/file_commands   true rprivate} {bind  /runner/_work/test-build/test-build /github/workspace   true rprivate} {bind  /var/run/docker.sock /var/run/docker.sock   true rprivate}]

It's easier to see in the JSON returned via docker inspect:

        "Mounts": [
            // other mounts omitted for brevity
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

This mount isn't correct anymore - the source should be /var/run/docker/docker.sock now. Unfortunately, it doesn't seem possible to override this via GitHub Actions (actions/runner#1754).

I was not able to use a tool like socat to make the socket available at the expected location. Symlinking the socket also didn't work, presumably because the directory the symlink points to would also have to be mounted into the action container.
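
For completeness, a socat workaround of this kind would look roughly like the following (a sketch run inside the runner container; as noted above, it did not resolve the problem here):

# forward the legacy socket path to the new daemon socket location
socat UNIX-LISTEN:/var/run/docker.sock,fork,mode=660 \
      UNIX-CONNECT:/var/run/docker/docker.sock &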

Describe the expected behavior

Invoking docker within a Docker-based action should work, as it did in v0.27.1.

Whole Controller Logs

https://gist.github.com/mrparkers/b619003caff3dd15b1f568ecb93f2a35

Whole Runner Pod Logs

https://gist.github.com/mrparkers/d1d75da1808abdc3c40306eacc118293
@mrparkers mrparkers added bug Something isn't working needs triage Requires review from the maintainers labels Apr 17, 2023
@github-actions
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

@LorenzoRogai

I can confirm the same problem in our organization. We had to revert to v0.27.1.

@LorenzoRogai

Probably introduced in 878c9b8.

@mumoshu
Collaborator

mumoshu commented Apr 24, 2023

Hey @mrparkers @LorenzoRogai! Thanks for the detailed report. I can definitely see that docker ps is working in the runner containers since v0.27.3. However, I can also see your point:

but it assumes that the docker socket is available at /var/run/docker.sock

The socket is still at /var/run/docker/docker.sock with the fixed permissions in v0.27.3. Even though docker ps works, anything that depends on /var/run/docker.sock would still fail in v0.27.3...

I'll try to see how we could fix it. Perhaps we might just default to /var/run/docker.sock when composing runner pod specs.
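
For illustration only (not necessarily how the eventual fix is implemented), one way this could look at the pod-spec level is to share /var/run between the runner and dind containers so the daemon socket lands back at the legacy path:

# hypothetical fragment of a runner pod spec
containers:
  - name: runner
    env:
      - name: DOCKER_HOST
        value: unix:///var/run/docker.sock
    volumeMounts:
      - name: var-run
        mountPath: /var/run
  - name: docker          # dind sidecar
    volumeMounts:
      - name: var-run
        mountPath: /var/run
volumes:
  - name: var-run
    emptyDir: {}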

@m-barczyk

@mumoshu A similar problem exists for actions-runner-dind-rootless. In the runner, the docker socket is available at:

runner@runner-test-9krrs-b4ck8:~$ echo $DOCKER_HOST
unix:///run/user/1000/docker.sock

runner@runner-test-9krrs-b4ck8:~$ ls -l /run/user/1000/docker.sock 
srw-rw---T 1 runner runner 0 Apr 25 14:23 /run/user/1000/docker.sock

But all the containers executed on this runner:

runner@runner-test-9krrs-b4ck8:~$ docker ps
CONTAINER ID   IMAGE         COMMAND               CREATED       STATUS       PORTS           NAMES
f1b51ef8d50f   docker:23.0   "tail -f /dev/null"   4 hours ago   Up 4 hours   2375-2376/tcp   3206c77686c54c2aa231d55c5e7a014c_docker230_78e894

are trying to bind the non-existent /var/run/docker.sock:

runner@runner-test-9krrs-b4ck8:~$ docker inspect f1b51ef8d50f 
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },

As a result, docker inside such containers does not work:

runner@runner-test-9krrs-b4ck8:~$ docker exec -it f1b51ef8d50f sh
/__w/my-repository/my-repository # docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

@mumoshu
Collaborator

mumoshu commented Apr 26, 2023

Hey @m-barczyk! I was going to fix this via #2536, and yours seems to require a completely different fix.
Would you mind submitting a brand-new issue for your problem so we can continue the discussion and work on a potential fix there?
Thanks in advance for your support!

@mumoshu
Collaborator

mumoshu commented Apr 26, 2023

Hey @mrparkers @LorenzoRogai, some good news here: I think I was able to reproduce the issue locally, and #2536 fixed it for me.
I'm still not 100% sure it fixes the issue for you, so it would be awesome if you could give it a shot in your own environment by building ARC from #2536 and deploying it onto your test environment!

@vishu42

vishu42 commented Apr 26, 2023

#2536 fixes the issue for me.
Anyone who wants to verify #2536 can use the ARC image I built from #2536 and pushed to Docker Hub as vishu42/actions-runner-controller:canary.
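
If you want to try that image against an existing Helm release, something along these lines should work (assuming the chart's standard image.repository and image.tag values; adjust the release name and namespace to your setup):

helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system \
  --set image.repository=vishu42/actions-runner-controller \
  --set image.tag=canary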

cc: @mumoshu @LorenzoRogai @mrparkers

mumoshu added a commit that referenced this issue Apr 27, 2023
Starting with ARC v0.27.2, we changed the `docker.sock` path from `/var/run/docker.sock` to `/var/run/docker/docker.sock`. That broke some container-based actions due to the `docker.sock` path being hard-coded in various places.

It seems even `actions/runner` uses `/var/run/docker.sock` for building container-based actions and for service containers.

Anyway, this fixes that by moving the sock file back to the previous location.

Once this gets merged, users stuck on ARC v0.27.1 (those who previously upgraded to v0.27.2 or v0.27.3 and reverted back to v0.27.1 due to #2519) should be able to upgrade to the upcoming v0.27.4.

Resolves #2519
Resolves #2538
@m-barczyk

Hey @m-barczyk! I was going to fix this via #2536, and yours seems to require a completely different fix. Would you mind submitting a brand-new issue for your problem so we can continue the discussion and work on a potential fix there? Thanks in advance for your support!

@mumoshu of course, here is a new issue: #2548
