Error when SCHEDULER_CRON_EXPRESSION is set without UPDATE_WINDOW_START & UPDATE_WINDOW_STOP #428

Closed
prashant-prodigal opened this issue Mar 9, 2023 · 15 comments

Comments

@prashant-prodigal

We are installing bottlerocket update operator in an EKS with no internet access. However, the operator deployment keeps failing with this error:

2023-03-09T13:12:28.369373Z INFO actix_server::builder: starting 2 workers
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200

2023-03-09T13:12:28.369436Z ERROR controller: controller exited
at controller/src/main.rs:110

I am using the latest image as per the docs. Could you please point me to what could be wrong here?

@jpmcb
Contributor

jpmcb commented Mar 9, 2023

Hi @prashant-prodigal - the error you are seeing at controller/src/main.rs:110 is related to the controller's metric server attempting to bind to the local loopback network and start serving metrics.

We are installing bottlerocket update operator in an EKS with no internet access.

Note that the Bottlerocket update operator requires network access to updates.bottlerocket.aws: this is how the update operator queries for new OS updates. Read more about it here: https://github.com/bottlerocket-os/bottlerocket-update-operator#why-are-my-bottlerocket-nodes-egressing-to-httpsupdatesbottlerocketaws

Does your node have some network attached? In order for the prometheus server to come up, it'll at least need to be able to bind on 0.0.0.0 for IPv4 clusters or [::] for IPv6 clusters.

Can you provide the full logs from the failed controller deployment?

 kubectl logs -n brupop-bottlerocket-aws pod/brupop-controller-deployment-{YOUR-DEPLOYMENT}
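
A quick way to rule out the bind problem mentioned above is to try the same kind of bind from a pod on the affected node. The following is a minimal Python sketch, not the operator's own code: `can_bind` is a hypothetical helper, and port 0 (any free port) is used because the metrics port is not stated in this thread.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can bind to host:port.

    Hypothetical helper for illustration only. Port 0 asks the OS for
    any free port, since the real metrics port is not given here.
    """
    # Pick the address family from the literal: "::" style means IPv6.
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        # Covers a missing address family as well as a refused bind.
        return False


if __name__ == "__main__":
    # 0.0.0.0 for IPv4 clusters, :: for IPv6 clusters.
    for host in ("0.0.0.0", "::"):
        print(host, can_bind(host))
```

If both calls report False, the problem is likely in the pod's networking rather than in the operator configuration.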

@prashant-prodigal
Author

Hello, I have allowed the URL https://updates.bottlerocket.aws, but we are still getting the error below from this command:
kubectl logs -n brupop-bottlerocket-aws pod/brupop-controller-deployment-{YOUR-DEPLOYMENT}

2023-03-10T04:36:28.369259Z INFO actix_server::builder: starting 2 workers
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200

2023-03-10T04:36:28.369368Z ERROR controller: controller exited
at controller/src/main.rs:110

@jpmcb
Contributor

jpmcb commented Mar 10, 2023

What's the shape of your network? Are there any other logs from the other update operator components?

@blakeromano

blakeromano commented Apr 11, 2023

I am seeing the same thing even though my node has egress access.

brupop-controller-deployment-875956b84-l42nf   0/1     CrashLoopBackOff   7 (2m32s ago)   13m
2023-04-11T20:56:31.570124Z  INFO actix_server::builder: starting 1 workers
at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200
2023-04-11T20:56:31.570208Z ERROR controller: controller exited
at controller/src/main.rs:110

With a deployment like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: brupop-controller
    app.kubernetes.io/managed-by: brupop
    app.kubernetes.io/part-of: brupop
    brupop.bottlerocket.aws/component: brupop-controller
  name: brupop-controller-deployment
  namespace: brupop-bottlerocket-aws
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      brupop.bottlerocket.aws/component: brupop-controller
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        brupop.bottlerocket.aws/component: brupop-controller
      namespace: brupop-bottlerocket-aws
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
      containers:
      - command:
        - ./controller
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: SCHEDULER_CRON_EXPRESSION
          value: '* * * * * * *'
        - name: MAX_CONCURRENT_UPDATE
          value: "1"
        image: public.ecr.aws/bottlerocket/bottlerocket-update-operator:v1.1.0
        imagePullPolicy: IfNotPresent
        name: brupop
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 3m
            memory: 8Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      priorityClassName: brupop-controller-high-priority
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: brupop-controller-service-account
      serviceAccountName: brupop-controller-service-account
      terminationGracePeriodSeconds: 30

@ghost

ghost commented Apr 20, 2023

Good afternoon team,

Is there any further information regarding this issue? We're currently facing the same issue in an installation we did this morning using operator version v1.1.0.

$> kubectl logs deployment/brupop-controller-deployment --namespace brupop-bottlerocket-aws
  2023-04-20T09:54:18.670695Z  INFO actix_server::builder: starting 1 workers
    at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/builder.rs:200

  2023-04-20T09:54:18.670766Z  INFO actix_server::server: Actix runtime found; starting in Actix runtime
    at /src/.cargo/registry/src/github.com-1ecc6299db9ec823/actix-server-2.2.0/src/server.rs:196

   2023-04-20T09:54:18.968337Z ERROR controller: controller exited
    at controller/src/main.rs:110

It is deployed in a regular EKS cluster with no customizations. Services are configured to use IPv4 addresses. The current Bottlerocket version is 1.12.0 and the only workloads currently installed besides the default ones are:

  • cert-manager 1.11.0
  • Nginx Ingress Controller 1.7.0

Please let us know if we can help by providing any other information.

@tmahalligan

For me this error happens when SCHEDULER_CRON_EXPRESSION is set and UPDATE_WINDOW_START & UPDATE_WINDOW_STOP are removed

If all three are present then the controller runs fine, though I am pretty sure my SCHEDULER_CRON_EXPRESSION is ignored

In my case I am rolling back to using UPDATE_WINDOW_START & UPDATE_WINDOW_STOP to control the update window.

@ghost

ghost commented May 13, 2023

Thanks for the tip @tmahalligan, we're going to give it a try!!

@prashant-prodigal
Author

prashant-prodigal commented May 16, 2023

Thanks @tmahalligan. This has solved the problem and I am able to run the controller now.
@jpmcb This might be a bug you would like to address?
Also, @jpmcb, are UPDATE_WINDOW_START & UPDATE_WINDOW_STOP ignored when SCHEDULER_CRON_EXPRESSION is set?

@stmcginnis changed the title from "Error while installing bottlerocket-update-operator" to "Error when SCHEDULER_CRON_EXPRESSION is set without UPDATE_WINDOW_START & UPDATE_WINDOW_STOP" on May 16, 2023
@stmcginnis
Contributor

Updated title to reflect what I think is the root issue here. Please correct me if I'm wrong.

@stmcginnis
Contributor

stmcginnis commented May 16, 2023

Verified this is expected behavior when both a time window and a cron expression are provided:

return scheduler_error::DisallowSetTimeWindowAndSchedulerSnafu {}.fail();

This could be handled a little more gracefully though...

Edit: Actually... that is the opposite of what is noted above:

For me this error happens when SCHEDULER_CRON_EXPRESSION is set and UPDATE_WINDOW_START & UPDATE_WINDOW_STOP are removed

If all three are present then the controller runs fine, though I am pretty sure my SCHEDULER_CRON_EXPRESSION is ignored

More investigation needed then.

@gthao313
Member

@tmahalligan Hi. What version of the bottlerocket update operator container were you using? I think it might be because you were using the latest version of the yaml file with an older bottlerocket update operator image. The cron scheduler is a new feature which we will introduce in the next release, so the errors on the controller could be because the system still needs a time window but was given a cron expression schedule instead. Can you try to use this yaml file? Thanks!

@tmahalligan

I am using v1.1.0. Here is the relevant config, @gthao313:

      containers:
      - command:
        - ./controller
        env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: MAX_CONCURRENT_UPDATE
          value: "1"
        - name: SCHEDULER_CRON_EXPRESSION
          value: '* * * * 6'
        - name: UPDATE_WINDOW_START
          value: "0:0:0"
        - name: UPDATE_WINDOW_STOP
          value: "0:0:0"
        image: public.ecr.aws/bottlerocket/bottlerocket-update-operator:v1.1.0
        imagePullPolicy: IfNotPresent
        name: brupop
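
The two cron expressions posted in this thread differ in length: the earlier deployment uses seven fields and this one uses five. Which lengths the operator accepts is not stated here, so the sketch below only counts whitespace-separated fields to make the difference visible. `cron_field_count` is a hypothetical helper, not part of the operator.

```python
def cron_field_count(expression: str) -> int:
    """Count whitespace-separated fields in a cron expression.

    Hypothetical helper for illustration; it does not validate the
    fields themselves, only how many there are.
    """
    return len(expression.split())


if __name__ == "__main__":
    # The two expressions quoted in this thread.
    for expr in ("* * * * * * *", "* * * * 6"):
        print(repr(expr), cron_field_count(expr))
```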

@gthao313
Member

@tmahalligan yeah, v1.1.0 doesn't have SCHEDULER_CRON_EXPRESSION, and we plan to release v1.2.0 later, which will introduce the cron scheduler. For now, can you remove SCHEDULER_CRON_EXPRESSION from the config? Everything should then work. This is the v1.1.0 config. : )
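
The behaviour described in this thread can be summarised as a small pre-deploy check. This is only a sketch under stated assumptions, not the controller's logic: `check_scheduler_env` and the `supports_cron` flag are hypothetical, the rule for images without cron support follows what is reported here for v1.1.0, and the rule for cron-capable builds follows the DisallowSetTimeWindowAndScheduler error quoted earlier.

```python
from typing import Dict, List

CRON = "SCHEDULER_CRON_EXPRESSION"
WINDOW = ("UPDATE_WINDOW_START", "UPDATE_WINDOW_STOP")


def check_scheduler_env(env: Dict[str, str], supports_cron: bool) -> List[str]:
    """Return a list of likely problems with the controller's env vars.

    Hypothetical helper. `supports_cron` is an assumption the caller
    makes about the image (False for v1.1.0, per this thread).
    """
    problems: List[str] = []
    has_cron = CRON in env
    windows_set = [name for name in WINDOW if name in env]

    if not supports_cron:
        # Reported for v1.1.0: the cron variable is not supported and
        # the time window variables are still needed.
        if has_cron:
            problems.append(f"{CRON} is not supported by this image; remove it")
        missing = [name for name in WINDOW if name not in env]
        if missing:
            problems.append("missing " + ", ".join(missing))
    elif has_cron and windows_set:
        # Mirrors the quoted error: a time window and a cron
        # expression are not allowed together.
        problems.append(f"{CRON} cannot be combined with " + ", ".join(windows_set))
    return problems


if __name__ == "__main__":
    # The failing combination from this thread on a v1.1.0 image.
    for problem in check_scheduler_env({CRON: "* * * * * * *"}, supports_cron=False):
        print(problem)
```

An empty list means the combination matches what the thread reports as working for that kind of image.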

@tmahalligan

I was under the impression from the documentation https://github.com/bottlerocket-os/bottlerocket-update-operator#set-scheduler that the released version of the Operator supported the cron functionality. I assume others may have made the same mistake, so perhaps the docs should be amended. @gthao313

Thanks for the follow-up. I will adjust and wait for the next release.

@jpmcb
Contributor

jpmcb commented May 17, 2023

Also note that we attach the relevant configs to the release for each version: https://github.com/bottlerocket-os/bottlerocket-update-operator/releases/tag/v1.1.0
