Need help in setting up dkron cluster on K8S #1191

Closed
nikunjbadjatya opened this issue Sep 25, 2022 · 11 comments

Comments

@nikunjbadjatya

nikunjbadjatya commented Sep 25, 2022

Hello.
I am running dkron 3.2.1 on GKE as a StatefulSet with 3 replicas. I also created a headless service so that each pod can discover the other pods via DNS, plus a load balancer to access the service and UI from outside.

Q1) Not able to set up a cluster via cloud auto join:
We are currently evaluating dkron and trying to set up the cluster via cloud auto join.
We are on K8s (a StatefulSet with 3 replicas).

We have this in each pod's dkron.yaml:

server: true
data-dir: dkron.data
bootstrap-expect: 3
retry-join: ["provider=k8s label_selector=\"app=dkron,component=server\""]
log-level: debug

We are running dkron in debug mode and see errors like the following in the logs of each pod:

level=warning msg="agent: Join LAN failed: no servers to join, retrying in 30s"
level=info msg="2022/09/23 xx:xx:xx discover.go:178: [DEBUG] discover: Using provider \"k8s\""
level=info msg="agent: Discovered LAN servers: " node=dkron-xxx-xxx
  • All 3 pods carry the same labels.
  • Removing the retry-join line from the config yaml lets dkron start fine, but, as expected, not in cluster mode.
  • Removing retry-join, adding a "join" block to the config yaml, and providing static pod IPs sets up the cluster properly.
  • The pods have list permissions for other pods.

Any idea what could be the issue here?
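
For readers checking the "list permissions" point above, such access is usually granted with RBAC objects along these lines. This is a minimal sketch under stated assumptions, not a confirmed fix for the discovery error: the Role name `dkron-pod-reader` and the service account `dkron` are hypothetical, and `namespace1` is borrowed from the DNS example in Q2. A namespaced Role only covers pods in its own namespace; whether the `k8s` provider needs cluster-wide access (a ClusterRole) depends on how discovery is configured.

```yaml
# Hypothetical names throughout; adjust to your deployment.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dkron-pod-reader
  namespace: namespace1
rules:
  - apiGroups: [""]            # core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dkron-pod-reader
  namespace: namespace1
subjects:
  - kind: ServiceAccount
    name: dkron                # assumed service account used by the dkron pods
    namespace: namespace1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dkron-pod-reader
```

One way to verify the permission from outside the pod, assuming the same hypothetical names, is `kubectl auth can-i list pods --as=system:serviceaccount:namespace1:dkron -n namespace1`.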

Q2) Would adding DNS entries under 'join' in dkron.yaml work the same way as adding static IP addresses for cluster join? We tested this in our environment, but want to be sure because there is not enough documentation on it.
Example:

join:
        - dkron-new-0.dkron-new.namespace1.svc.cluster.local
        - dkron-new-1.dkron-new.namespace1.svc.cluster.local
        - dkron-new-2.dkron-new.namespace1.svc.cluster.local
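
For context, per-pod DNS names of this form come from a headless Service that the StatefulSet references through its `serviceName`. The following is a sketch only: the Service name and namespace are read off the hostnames above, the selector reuses the labels from the retry-join line in Q1, and the port numbers are assumed to be dkron's defaults, so verify them against your own configuration.

```yaml
# Headless Service sketch; the StatefulSet must set serviceName: dkron-new
apiVersion: v1
kind: Service
metadata:
  name: dkron-new
  namespace: namespace1
spec:
  clusterIP: None                  # headless: DNS returns pod IPs directly
  publishNotReadyAddresses: true   # publish records before pods pass readiness
  selector:
    app: dkron
    component: server
  ports:
    - name: http
      port: 8080                   # assumed default HTTP/UI port
    - name: serf
      port: 8946                   # assumed default gossip port
    - name: grpc
      port: 6868                   # assumed default gRPC port
```

With `publishNotReadyAddresses: true`, the per-pod records exist while the pods are still starting, which matters if the servers need to find each other before they can report ready.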

Q3) Is there an official upper limit on the number of schedules we can set?

Q4) Are a persistent data-dir plus valid DNS entries in dkron.yaml all that is needed for HA?

Thanks.

@nikunjbadjatya
Author

nikunjbadjatya commented Sep 28, 2022

@vcastellm can you help with the above queries? Thanks for your time on this.

@vcastellm
Member

Hey @nikunjbadjatya

Q1) I will need to run a few tests, as I do not actively test dkron in k8s.

Hopefully other users with experience running it in k8s can respond to this question.

Q2) Yes, this resolves to the node IP, but consider that the IP shouldn't change during the lifecycle of the cluster.

Q3) There's no set limit on the number of jobs; it depends solely on the resources of the node. Some users are running Dkron with hundreds of thousands of jobs.

Q4) Yes, that's all.

Additional note: depending on your use case, you may only need one pod. As long as k8s keeps it running, you won't get much benefit from a clustered setup.

@smullins3000

@nikunjbadjatya did you ever resolve your issues running a dkron server cluster in k8s? About to embark on this as well.

@nikunj-badjatya

@smullins3000
For some reason we are not able to make cloud auto join work on K8S, whether set in the config yaml or on the command line.
I tried on GCE as well, but there too it is not able to discover the other nodes. We need a concrete setup guide for this from the dkron community.

Adding a list of static IPs in dkron.yml works.

@nikunj-badjatya

nikunj-badjatya commented Oct 12, 2022

Hi @vcastellm , @yvanoers

Some more follow-up questions to help us complete our dkron evaluation. I looked for related information in the docs but could not find any.

Quoting your reply to Q2 and Q4.

Q2) Yes, this resolves to the node IP, but consider that the IP shouldn't change during the lifecycle of the cluster.

Q4) Yes, that's all.

So would the DNS entries work for cluster setup and recovery in K8S if a pod goes down and comes back up with a new IP?

Can we get some examples (K8S manifests yaml, dkron config yaml etc.) on how to make cloud auto join work in K8S?
Please see the error we are facing in my first query above.
Update: This is resolved now.

In case of a partial or full disaster (a dkron server node goes down), how much time does it take for a new node to fully sync with the existing nodes in the cluster?

As I understand it, if I have a 3-node setup with all nodes running in server mode, I can send a request to any node to set up a job and the other 2 nodes will sync internally. Is that right? Are these 3 nodes active-active?

In a dkron cluster, say my schedule targets only one node by matching a tag (https://dkron.io/docs/usage/target-nodes-spec/) and that node is down. What would happen to that schedule trigger, given that retries also go to the same node?

Thanks.

@nikunj-badjatya

@vcastellm , @yvanoers

@yxxhero

yxxhero commented Oct 20, 2022

@nikunj-badjatya I am hitting the same issue. Did you solve it? Thanks very much.

@nikunj-badjatya

@yxxhero
Not yet. I am waiting for one of the core contributors to help here.
I came across PR #704, but that PR is more than 1.5 years old.

@nikunj-badjatya

@vcastellm , @yvanoers
We would really appreciate responses to the above questions. Thanks for your time.

@yxxhero

yxxhero commented Nov 4, 2022

me too.

@vcastellm
Member

Check if it works with the latest version of https://github.com/distribworks/dkron-helm
