Connection Refused on example-runner #767

Closed
1 of 2 tasks
nicon89 opened this issue Aug 25, 2021 · 7 comments
Labels
istio Issues caused by istio

Comments

@nicon89

nicon89 commented Aug 25, 2021

Describe the bug
While running the latest version of actions-runner-controller, I'm getting a "Connection refused" error:

$ k logs -f example-runner --all-containers
Github endpoint URL https://github.com/

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication

Connection refused

This is a private cluster with Istio in place. I have whitelisted api.github.com, github.com, and pipelines.actions.githubusercontent.com.

Checks

  • My actions-runner-controller version (v0.x.y) does support the feature
  • I'm using an unreleased version of the controller I built from HEAD of the default branch

To Reproduce
Steps to reproduce the behavior:

  1. Set up the controller with https://github.com/actions-runner-controller/actions-runner-controller/releases/download/v0.18.2/actions-runner-controller.yaml
  2. Set up example-runner
  3. Check the logs

Expected behavior
It should work :)

Environment (please complete the following information):

  • Controller Version [0.18.2]
  • Deployment Method [Kustomize]

Additional context
When I run ./config.sh manually from inside the pod, it actually works:

runner@example-runner:/runner$ ./config.sh --unattended --replace --name example-runner --url https://github.com/XXXX --token YYYY --runnergroup '' --labels '' --work /runner/_work
--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication
√ Connected to GitHub

# Runner Registration
√ Runner successfully added
@callum-tait-pbx
Contributor

callum-tait-pbx commented Aug 25, 2021

Is the runner starting before the Istio proxy is ready?

Can you set holdApplicationUntilProxyStarts (see istio/istio#11130)? If not, can you add a startup delay to the runner instead?

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeployment
spec:
  template:
    spec:
      env:
        - name: STARTUP_DELAY_IN_SECONDS
          value: "2"

I highly doubt this is an actions-runner-controller issue.
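A fixed STARTUP_DELAY_IN_SECONDS is a guess about how long the sidecar takes to come up. As a minimal sketch of the alternative, the Python below polls until the proxy reports ready and only then lets the runner continue. Assumptions, stated loudly: the Istio sidecar (pilot-agent) serves a readiness endpoint at 127.0.0.1:15021/healthz/ready, which matches the Istio releases I know of but should be verified for yours; `wait_for_proxy` and `http_probe` are hypothetical helpers, not part of actions-runner-controller or the runner image; and the image would need a Python interpreter. Where it is available, Istio's own holdApplicationUntilProxyStarts option is still the cleaner fix.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# ASSUMPTION: the Istio sidecar (pilot-agent) reports readiness at this URL.
# Verify the port and path for your Istio version before relying on it.
ISTIO_READY_URL = "http://127.0.0.1:15021/healthz/ready"


def http_probe(url: str = ISTIO_READY_URL, timeout: float = 1.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def wait_for_proxy(
    probe: Callable[[], bool] = http_probe,
    max_wait: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` every `interval` seconds until it returns True.

    Returns True as soon as the proxy looks ready, or False once roughly
    `max_wait` seconds have been spent sleeping without success.
    """
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

The probe and sleep functions are injectable so the loop can be exercised without a cluster. In a pod you would call `wait_for_proxy()` with the defaults and exit non-zero if it returns False, before invoking the runner's usual entrypoint.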

@nicon89
Author

nicon89 commented Aug 26, 2021

It went a bit further...

$ k logs -f example-runnerdeployment-9mgrx-vhlwx runner
Delaying startup by 20 seconds
Github endpoint URL https://github.com/

--------------------------------------------------------------------------------
|        ____ _ _   _   _       _          _        _   _                      |
|       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
|      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
|      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
|       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
|                                                                              |
|                       Self-hosted runner registration                        |
|                                                                              |
--------------------------------------------------------------------------------

# Authentication


√ Connected to GitHub

# Runner Registration




√ Runner successfully added
The HTTP request timed out after 00:01:40.
16c16
< ./externals/node12/bin/node ./bin/RunnerService.js &
---
> ./externals/node12/bin/node ./bin/RunnerService.js $* &
20c20
< wait $PID
---
> wait $PID
\ No newline at end of file
19c19
< var runService = function () {
---
> var runService = function() {
23c23
<     if (!stopping) {
---
>     if(!stopping) {
27c27
<                 listener = childProcess.spawn(listenerExePath, ['run'], { env: process.env });
---
>                 listener = childProcess.spawn(listenerExePath, ['run'].concat(process.argv.slice(3)), { env: process.env });
30c30
<                 listener = childProcess.spawn(listenerExePath, ['run', '--startuptype', 'service'], { env: process.env });
---
>                 listener = childProcess.spawn(listenerExePath, ['run', '--startuptype', 'service'].concat(process.argv.slice(2)), { env: process.env });
33c33
<             console.log(`Started listener process, pid: ${listener.pid}`);
---
>             console.log('Started listener process');
43,46d42
<             listener.on("error", (err) => {
<                 console.log(`Runner listener fail to start with error ${err.message}`);
<             });
<
64c60
<                 if (!stopping) {
---
>                 if(!stopping) {
69c65
<         } catch (ex) {
---
>         } catch(ex) {
78c74
< var gracefulShutdown = function (code) {
---
> var gracefulShutdown = function(code) {
85,86c81
<         console.log('Sending SIGKILL to runner listener');
<         setTimeout(() => listener.kill('SIGKILL'), 30000);
---
>         // TODO wait for 30 seconds and send a SIGKILL
96c91
< });
---
> });
\ No newline at end of file
Passing --once to runsvc.sh to enable the legacy ephemeral runner.
.path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/runner/.local/bin
Starting Runner listener with startup type: service
Started listener process
Started running service
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.
Starting Runner listener with startup type: service
Started listener process
An error occurred: Not configured. Run config.(sh/cmd) to configure the runner.
Runner listener exited with error code 2
Runner listener exit with retryable error, re-launch runner in 5 seconds.

I can see the runner in the runners list, but its status is offline.

@mumoshu
Collaborator

mumoshu commented Aug 26, 2021

> The HTTP request timed out after 00:01:40.

I can see you still have an error. You probably need to add another host to some kind of egress allow-list (I'm no Istio expert, so I'm not sure what it's called in Istio land).

@nicon89
Author

nicon89 commented Aug 26, 2021

Yeah, it was missing access to vstoken.actions.githubusercontent.com.
Thanks for shipping a Debian-based image with sudo: it made it easy to install tcpdump to troubleshoot this :)
For posterity, you need to whitelist the following hosts:

* api.github.com
* github.com
* pipelines.actions.githubusercontent.com
* vstoken.actions.githubusercontent.com
* codeload.github.com

and for Google Cloud GCR access:

* gcr.io
* dl.google.com
* www.googleapis.com
* accounts.google.com
* cloudresourcemanager.googleapis.com
* oauth2.googleapis.com
* storage.googleapis.com
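The missing host was found with tcpdump; a lighter-weight check is to attempt a TLS handshake to every allow-listed host from inside the runner pod. A minimal sketch, assuming port 443 for all hosts and a Python 3 interpreter in the image (neither is stated in this thread); `check_egress` and `tls_handshake` are hypothetical names. A full TLS handshake is used rather than a bare TCP connect because, with sidecar traffic capture, the local Envoy proxy may accept the TCP connection even when the upstream host is blocked (how a blocked host then fails depends on the mesh's outbound traffic policy).

```python
import socket
import ssl
from typing import Callable, Dict, Iterable

# Hosts listed in this thread as needing egress access (port 443 assumed).
GITHUB_HOSTS = [
    "api.github.com",
    "github.com",
    "pipelines.actions.githubusercontent.com",
    "vstoken.actions.githubusercontent.com",
    "codeload.github.com",
]

GCR_HOSTS = [
    "gcr.io",
    "dl.google.com",
    "www.googleapis.com",
    "accounts.google.com",
    "cloudresourcemanager.googleapis.com",
    "oauth2.googleapis.com",
    "storage.googleapis.com",
]


def tls_handshake(host: str, port: int = 443, timeout: float = 5.0) -> None:
    """Connect and complete a verified TLS handshake; raises OSError on failure."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # wrap_socket performs the handshake immediately on a connected socket.
        with ctx.wrap_socket(sock, server_hostname=host):
            pass


def check_egress(
    hosts: Iterable[str],
    connect: Callable[[str], None] = tls_handshake,
) -> Dict[str, str]:
    """Return a map of host -> "ok" or "FAILED: <error>" for each host."""
    results: Dict[str, str] = {}
    for host in hosts:
        try:
            connect(host)
            results[host] = "ok"
        except OSError as exc:  # ssl.SSLError and socket errors are OSErrors
            results[host] = f"FAILED: {exc}"
    return results
```

For example, `print(check_egress(GITHUB_HOSTS + GCR_HOSTS))` run inside the pod would report each host as `ok` or with the error that stopped the handshake.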

@nicon89 nicon89 closed this as completed Aug 26, 2021
@nicon89
Author

nicon89 commented Aug 26, 2021

I need to reopen this, as I have one more problem that is preventing runners from being deleted.

I tried to remove the RunnerDeployment and it got stuck.

In the controller logs I can see:

2021-08-26T10:21:39.623Z        ERROR   actions-runner-controller.runner        Failed to update runner for finalizer removal   {"runner": "actions-runner-system/example-runnerdeployment-9mgrx-vhlwx", "error": "Internal error occurred: failed calling webhook \"mutate.runner.actions.summerwind.dev\": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=30s: x509: certificate is not valid for any names, but wanted to match webhook-service.actions-runner-system.svc"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
github.com/summerwind/actions-runner-controller/controllers.(*RunnerReconciler).Reconcile
        /workspace/controllers/runner_controller.go:137
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021-08-26T10:21:39.623Z        ERROR   controller-runtime.controller   Reconciler error        {"controller": "runner-controller", "request": "actions-runner-system/example-runnerdeployment-9mgrx-vhlwx", "error": "Internal error occurred: failed calling webhook \"mutate.runner.actions.summerwind.dev\": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=30s: x509: certificate is not valid for any names, but wanted to match webhook-service.actions-runner-system.svc"}
github.com/go-logr/zapr.(*zapLogger).Error
        /go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
        /go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190913080033-27d36303b655/pkg/util/wait/wait.go:88
2021/08/26 10:24:53 http: TLS handshake error from 127.0.0.1:49670: EOF
2021/08/26 10:24:53 http: TLS handshake error from 127.0.0.1:49672: EOF
2021/08/26 10:25:16 http: TLS handshake error from 127.0.0.1:49824: EOF
2021/08/26 10:25:16 http: TLS handshake error from 127.0.0.1:49826: EOF

Or:

Error from server (InternalError): error when replacing "/opt/tmp/kubectl-edit-csa4l.yaml": Internal error occurred: failed calling webhook "mutate.runner.actions.summerwind.dev": Post https://webhook-service.actions-runner-system.svc:443/mutate-actions-summerwind-dev-v1alpha1-runner?timeout=30s: x509: certificate is not valid for any names, but wanted to match webhook-service.actions-runner-system.svc
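The message `x509: certificate is not valid for any names` comes from Go's certificate hostname check and essentially means the certificate the apiserver received carried no DNS subject alternative names at all, not merely the wrong ones. The cert-manager certificate for the webhook presumably lists webhook-service.actions-runner-system.svc, so one plausible reading, consistent with mTLS turning out to be the culprit, is that a different certificate was presented on that connection; Istio workload certificates, for example, identify workloads with a SPIFFE URI SAN rather than DNS names. The Python below is a simplified, hypothetical re-implementation of that check (exact and single-label wildcard matching only, no IP SANs or legacy Common Name handling), not Go's actual code.

```python
from typing import Sequence


def dns_name_matches(pattern: str, hostname: str) -> bool:
    """Simplified match: exact, or a leading '*.' covering exactly one label."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".actions-runner-system.svc"
        head, sep, rest = hostname.partition(".")
        return bool(head) and sep == "." and "." + rest == suffix
    return pattern == hostname


def verify_hostname(dns_sans: Sequence[str], hostname: str) -> str:
    """Return "ok", or an error shaped like Go's x509 hostname errors."""
    if not dns_sans:
        # No DNS names at all: the case seen in the logs above.
        return f"x509: certificate is not valid for any names, but wanted to match {hostname}"
    if any(dns_name_matches(p, hostname) for p in dns_sans):
        return "ok"
    return f"x509: certificate is valid for {', '.join(dns_sans)}, not {hostname}"
```

The distinction matters for troubleshooting: an empty SAN list points at the wrong certificate being served, while a "valid for X, not Y" error would instead point at a cert-manager certificate issued for the wrong service name.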

@mumoshu: any advice?

@nicon89 nicon89 reopened this Aug 26, 2021
@nicon89
Author

nicon89 commented Aug 26, 2021

It appears that Istio's mTLS was causing another issue.
I disabled mTLS, and it seems to work fine now.

@nicon89 nicon89 closed this as completed Aug 26, 2021
@mumoshu
Collaborator

mumoshu commented Aug 27, 2021

@nicon89 Thanks for sharing your solution!

> For posterity, you need to whitelist:

That's good to know! Thanks again for sharing.

> It appears that mTLS from istio was causing another issues.

Makes sense.
You've already resolved it, but I can suggest another solution: can you disable mTLS or sidecar injection only for actions-runner-controller's controller-manager deployment and its pods?

That's where the mutating (and validating) webhook server runs. It does its own TLS (not mutual TLS, as far as I know) using certs provided by cert-manager. Disabling sidecar injection only for the webhook server would restore the direct TLS connection between the K8s apiserver and the webhook server.
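The per-pod variant of this suggestion can use Istio's injection opt-out annotation, `sidecar.istio.io/inject: "false"`. It has to sit on the Deployment's pod template rather than on the Deployment's own metadata, since injection happens when pods are created. A minimal sketch that builds the corresponding `kubectl patch` command is below; the Deployment name `controller-manager` is an assumption based on the kustomize release manifest (check `kubectl get deploy -n actions-runner-system`), and the helper names are hypothetical.

```python
import json
import shlex


def sidecar_opt_out_patch() -> dict:
    """Strategic-merge patch adding Istio's injection opt-out annotation
    to a Deployment's pod template (not the Deployment's own metadata)."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {"sidecar.istio.io/inject": "false"}
                }
            }
        }
    }


def kubectl_patch_command(deployment: str, namespace: str) -> str:
    """Build (but do not run) the kubectl command that applies the patch."""
    patch = json.dumps(sidecar_opt_out_patch(), separators=(",", ":"))
    return " ".join(
        [
            "kubectl", "patch", "deployment", shlex.quote(deployment),
            "-n", shlex.quote(namespace),
            "--type", "strategic",
            "-p", shlex.quote(patch),  # single-quoted so the shell keeps the JSON intact
        ]
    )
```

Printing `kubectl_patch_command("controller-manager", "actions-runner-system")` gives a command ready to paste. The patch changes the pod template, so it triggers a rollout; and because re-applying the release manifest would drop it, adding the same annotation in a kustomize overlay is probably the more durable option.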

@mumoshu mumoshu added the istio Issues caused by istio label Aug 27, 2021
3 participants