Sensor for reacting to Kubernetes objects fails #80

Closed
shrinandj opened this issue Jul 31, 2018 · 8 comments
Labels: bug (Something isn't working)

Comments

@shrinandj
Contributor

Describe the bug
I am trying to run a workflow in response to a Kubernetes object being created (specifically, a namespace). I am running the sensor-controller and the resource signal. However, when I create the sensor CR, the sensor-controller crashes with an error:

2018-07-31T21:56:17.374Z	ERROR	controller/controller.go:149	Error syncing sensor 'default/resource-example': the signal 'resource' does not exist with the signal universe. please choose one from: [calendar artifact webhook]
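
The message reads like a lookup by signal type name against the set of signal types the controller has registered. Below is a minimal sketch of that kind of lookup. The names `signalUniverse` and `lookup` are hypothetical and are not taken from the argo-events source; only the error wording is copied from the log above.

```go
package main

import (
	"fmt"
	"sort"
)

// signalUniverse is a hypothetical stand-in for the set of signal types
// the controller knows about. The real argo-events type may differ.
type signalUniverse map[string]struct{}

// lookup returns nil when the signal type is registered, and otherwise an
// error that lists the registered type names.
func (u signalUniverse) lookup(name string) error {
	if _, ok := u[name]; ok {
		return nil
	}
	known := make([]string, 0, len(u))
	for k := range u {
		known = append(known, k)
	}
	sort.Strings(known) // map iteration order is random; sort for stable output
	return fmt.Errorf("the signal '%s' does not exist with the signal universe. please choose one from: %v", name, known)
}

func main() {
	u := signalUniverse{"calendar": {}, "artifact": {}, "webhook": {}}
	fmt.Println(u.lookup("resource")) // fails: "resource" is not registered

	u["resource"] = struct{}{}
	fmt.Println(u.lookup("resource")) // succeeds once the type is registered
}
```

In this sketch the lookup fails until the `resource` type is registered, which matches the symptom but says nothing definite about the actual cause in the controller.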

To Reproduce

  1. Run sensor-controller
  2. Run the resource signal.
$ kp
NAME                                 READY     STATUS    RESTARTS   AGE
artifacts-minio-85547b6bd9-vtbfd     1/1       Running   0          14d
sensor-controller-766675b9df-dzm64   1/1       Running   0          23m
signal-calendar-78d7c8f5c-v7r8j      1/1       Running   0          22m
signal-resource-64d57998d9-kj87m     1/1       Running   0          7m
signal-webhook-6c896d9b8f-xwnc2      1/1       Running   0          22m
  3. Verify that the service objects exist in the cluster:
$  k get svc
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
artifacts-minio   ClusterIP   None            <none>        9000/TCP   14d
calendar          ClusterIP   100.70.43.60    <none>        8080/TCP   6d
kubernetes        ClusterIP   100.64.0.1      <none>        443/TCP    15d
resource          ClusterIP   100.71.110.61   <none>        8080/TCP   8m
  4. Create the following sensor object:
$ cat /stash/argo-events-examples/k8s.yaml
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: resource-example
  labels:
    sensors.argoproj.io/controller-instanceid: axis
spec:
  signals:
    - name: worklow-1
      resource:
        namespace: default
        group: ""
        version: "v1"
        kind: "Namespace"
        filter:
          prefix: scripts-bash
  triggers:
    - name: ns-workflow
      resource:
        namespace: default
        group: argoproj.io
        version: v1alpha1
        kind: Workflow
        source:
          inline: |
              apiVersion: argoproj.io/v1alpha1
              kind: Workflow
              metadata:
                generateName: hello-world-
              spec:
                entrypoint: whalesay
                templates:
                  -
                    container:
                      args:
                        - "hello world"
                      command:
                        - cowsay
                      image: "docker/whalesay:latest"
                    name: whalesay
  5. When the above sensor is created using kubectl create, the sensor controller throws the following errors:
2018-07-31T21:56:17.364Z	INFO	controller/signal.go:65	WARNING: event stream for signal 'worklow-1' is missing - could have missed events! reconnecting stream...	{"sensor": "resource-example", "namespace": "default"}
2018-07-31T21:56:17.374Z	ERROR	controller/controller.go:149	Error syncing sensor 'default/resource-example': the signal 'resource' does not exist with the signal universe. please choose one from: [calendar artifact webhook]
github.com/argoproj/argo-events/controller.(*SensorController).handleErr
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/controller/controller.go:149
github.com/argoproj/argo-events/controller.(*SensorController).processNextItem
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/controller/controller.go:121
github.com/argoproj/argo-events/controller.(*SensorController).runWorker
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/controller/controller.go:185
github.com/argoproj/argo-events/controller.(*SensorController).(github.com/argoproj/argo-events/controller.runWorker)-fm
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/controller/controller.go:178
github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait.Until
	/Users/sjavadekar/ws/go/src/github.com/argoproj/argo-events/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
ERROR: logging before flag.Parse: W0731 22:02:50.478195       1 reflector.go:341] github.com/argoproj/argo-events/controller/config.go:61: watch of *v1.ConfigMap ended with: too old resource version: 2561690 (2562505)

Expected behavior
The sensor-controller should correctly execute the workflow when a namespace is created.

@magaldima
Contributor

@shrinandj I ran into this error myself today. It was caused by a careless mistake in resource_signal.go. I pushed a fix for this in: 29f06e4

I'm still attempting to run the resource signal myself (in preparation for a demo tomorrow) and running into some trouble. I'll update this ticket with my findings.

We may have to push out a patch release on top of the v0.5-beta1 for this.

@magaldima
Contributor

I'm trying to create an argo "workflow" watch signal:

    - name: worklow-1
      resource:
        namespace: dev-axis
        group: argoproj.io
        version: v1alpha1
        kind: Workflow
        filter:
          prefix: coinflip-recursive
          labels:
            workflows.argoproj.io/phase: Succeeded

and I get this error in the controller logs:

time="2018-07-31T22:57:38Z" level=error msg="Error syncing sensor 'dev-axis/resource-example': rpc error: code = Unknown desc = unknown (get workflows.argoproj.io)"
time="2018-07-31T22:57:59Z" level=warning msg="WARNING: event stream for signal 'worklow-1' is missing - could have missed events! reconnecting stream..." namespace=dev-axis sensor=resource-example

I think the ResourceInterface is failing to Watch. I'm going to investigate this tomorrow. It could be tied to the client-go version or to watching Workflows directly. I think the argo workflow-controller doesn't even watch workflows directly but instead watches unstructured objects...
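
For reference, one plausible reading of the filter in the signal above is "the object name starts with `prefix` and every listed label is present with the same value". The sketch below models that reading only. The type `resourceFilter` and its method are hypothetical, and the actual matching rules in the resource signal may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceFilter is a hypothetical model of the `filter` block of a
// resource signal. It is not the argo-events type.
type resourceFilter struct {
	Prefix string
	Labels map[string]string
}

// matches reports whether an object passes the filter: its name must start
// with Prefix, and every filter label must be set to the same value.
func (f resourceFilter) matches(name string, labels map[string]string) bool {
	if !strings.HasPrefix(name, f.Prefix) {
		return false
	}
	for k, want := range f.Labels {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	f := resourceFilter{
		Prefix: "coinflip-recursive",
		Labels: map[string]string{"workflows.argoproj.io/phase": "Succeeded"},
	}
	done := map[string]string{"workflows.argoproj.io/phase": "Succeeded"}
	running := map[string]string{"workflows.argoproj.io/phase": "Running"}

	fmt.Println(f.matches("coinflip-recursive-abc12", done))    // true
	fmt.Println(f.matches("coinflip-recursive-abc12", running)) // false
	fmt.Println(f.matches("hello-world-xyz34", done))           // false
}
```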

@shrinandj
Contributor Author

The watch on the namespace resource fails because the signal deployment tries to watch the namespace scripts-bash in the namespace default, which just seems wrong. If instead the K8s object to watch were, say, a ConfigMap, the signal deployment would watch for a ConfigMap in the default namespace, which is correct.
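
To make the scoping difference concrete, here is a small sketch of the Kubernetes REST path convention for namespaced versus cluster-scoped resource types. The helper `watchPath` is hypothetical and written for illustration; it is not client-go code and does not show what the signal deployment actually requests.

```go
package main

import (
	"fmt"
	"path"
)

// watchPath builds the REST path used to list or watch a resource type.
// group is "" for the core API group. namespaced says whether the type is
// namespace-scoped; for cluster-scoped types the namespace is ignored.
func watchPath(group, version, resource, namespace string, namespaced bool) string {
	root := path.Join("/apis", group, version)
	if group == "" {
		root = path.Join("/api", version) // the core group lives under /api
	}
	if namespaced && namespace != "" {
		return path.Join(root, "namespaces", namespace, resource)
	}
	return path.Join(root, resource)
}

func main() {
	// Namespace is cluster-scoped, so "default" plays no part in the path.
	fmt.Println(watchPath("", "v1", "namespaces", "default", false))
	// prints /api/v1/namespaces

	// ConfigMap is namespaced, so the namespace is part of the path.
	fmt.Println(watchPath("", "v1", "configmaps", "default", true))
	// prints /api/v1/namespaces/default/configmaps

	// Workflow is a namespaced custom resource in a named API group.
	fmt.Println(watchPath("argoproj.io", "v1alpha1", "workflows", "dev-axis", true))
	// prints /apis/argoproj.io/v1alpha1/namespaces/dev-axis/workflows
}
```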

@magaldima
Copy link
Contributor

The resource signal uses the client-go dynamic client to find the right ResourceInterface for the desired object. This method is used:

// Resource returns an API interface to the specified resource for this client's
// group and version.  If resource is not a namespaced resource, then namespace
// is ignored.  The ResourceInterface inherits the parameter codec of this client.
Resource(resource *metav1.APIResource, namespace string) ResourceInterface

I'm not sure what's going on here or whether the default namespace is being respected. Do you have any logs from the resource-signal pod?

shrinandj added the bug label Aug 6, 2018
shrinandj self-assigned this Aug 6, 2018
@shrinandj
Contributor Author

The signal-resource deployment throws the following errors:

time="2018-08-06T20:31:00Z" level=info msg="Resource signal listening: worklow-1\n"
time="2018-08-06T20:31:00Z" level=info msg="Adding /v1, Kind=Namespace to resources\n"
time="2018-08-06T20:31:00Z" level=info msg="Error watching resource &{0xc4205880c0 0xc420173040 default <nil>} with options {{ }   false true  <nil> 0 }\n"

The sensor-controller throws the following errors:

time="2018-08-06T20:31:00Z" level=info msg="Signal 'worklow-1' initialized: " namespace=default sensor=resource-example-9gm72
time="2018-08-06T20:31:00Z" level=info msg="Found signals: [calendar resource artifact]"
time="2018-08-06T20:31:00Z" level=info msg="Marking signal error! 1" namespace=default sensor=resource-example-9gm72
time="2018-08-06T20:31:00Z" level=info msg="Signal 'worklow-1' phase  -> Error" namespace=default sensor=resource-example-9gm72
time="2018-08-06T20:31:00Z" level=info msg="Signal 'worklow-1' message  -> rpc error: code = Unknown desc = unknown (get namespaces)" namespace=default sensor=resource-example-9gm72
time="2018-08-06T20:31:00Z" level=info msg="Signal 'worklow-1' completed: 2018-08-06 20:31:00.111040859 +0000 UTC" namespace=default sensor=resource-example-9gm72
time="2018-08-06T20:31:01Z" level=error msg="Error syncing sensor 'default/resource-example-9gm72': rpc error: code = Unknown desc = unknown (get namespaces)"
time="2018-08-06T20:31:01Z" level=warning msg="WARNING: event stream for signal 'worklow-1' is missing - could have missed events! reconnecting stream..." namespace=default sensor=resource-example-9gm72

@shrinandj
Contributor Author

I have a suspicion that this might be a client-go bug.

  1. resource.Listen in resource.go calls r.discoverResources.
  2. r.discoverResources creates a new dynamicClient and a new discoveryClient.
  3. For the given object's group version (v1 in this case), all the resource interfaces are then obtained.
  4. From the resource interface, we extract the apiResources and match the apiResource whose kind matches the given object's kind (Namespace in this case).
  5. Then, it is confirmed that the apiResource supports the watch API.
  6. And after this, the dynamicClient for this object is obtained and the resource object from the client is appended to resources.
  7. This resource object is used to invoke the Watch api. The watch API throws the error: rpc error: code = Unknown desc = unknown (get namespaces)

While this is fairly cumbersome, it seems like the correct sequence of steps with appropriate checks in place.

@magaldima Do you have any other ideas? Otherwise, I can file a bug in the main kubernetes repo with the appropriate code snippets to check if anyone else can either validate that the bug exists or suggest changes to work around it.

@magaldima
Contributor

magaldima commented Aug 7, 2018

I know many enhancements were made to the discovery client in the v8 go client. Maybe try upgrading and see if this also solves the config map watch issue. If that doesn’t work, we can raise an issue in the main kubernetes repo.

EDIT: I don't think upgrading will help, as client-go version 7.0 should be in sync with Kubernetes v1.10.

@VaibhavPage
Contributor

Addressed in PR #100
