
[question] about the HA-cluster #2329

Closed
sunhubs opened this issue Jun 10, 2021 · 7 comments

Comments

@sunhubs

sunhubs commented Jun 10, 2021

I want to learn more details about the HA-cluster.
As I understand it, MicroK8s syncs the cluster information into the folder /var/snap/microk8s/current/var/kubenetes/backend, so I would like to know:
1, the content of the data: I want to check whether frequent syncing is needed.
2, the default sync frequency and how to control it.
3, how long each sync takes.
It would be appreciated if you could provide some documentation or a code snippet for checking.

@ktsakalozos
Member

The kubernetes API server needs a datastore to store its state. In MicroK8s, by default, this datastore is dqlite [1][2]. Three nodes in a dqlite HA cluster act as voters. The voters maintain a copy of the database and elect a leader. The leader ensures that everything written to the database is first replicated to a majority of the voters. This replication is part of the Raft [3] consensus protocol. To your questions:

1, the content of the data: I want to check whether frequent syncing is needed.

You do not need to sync anything. Data are replicated across the nodes as part of the consensus protocol.

2, the default sync frequency and how to control it.

Every time something is written to the database it is first replicated across the voters. There is no periodic syncing, and thus there is no frequency to control.

3, how long each sync takes.

The cost of replicating the database depends on the workload the cluster serves. Some Kubernetes workloads store lots of data on the API server in the form of CRDs/resources. Every time a resource is created/updated/deleted, the corresponding operation needs to be replicated to a majority of the voters.

May I ask, how do you plan to use MicroK8s? Are you under hardware constraints, or do you need to comply with certain specifications?

[1] https://github.com/canonical/dqlite
[2] https://dqlite.io/
[3] https://raft.github.io/
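The leader-driven, majority-quorum write path described above can be sketched in a few lines. This is a toy illustration only: real dqlite/Raft involves terms, log indices, and leader elections, and all names here (Voter, Leader, write) are invented for the example.

```python
# Toy sketch of a Raft-style write: the leader appends the entry to its
# own log, replicates it to the other voters, and considers it committed
# only once a majority of all voters (leader included) has acknowledged.

class Voter:
    def __init__(self, name):
        self.name = name
        self.log = []          # each voter keeps its own copy of the data

    def append(self, entry):
        self.log.append(entry)
        return True            # acknowledge the replication

class Leader(Voter):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, entry):
        self.append(entry)     # the leader writes locally first
        acks = 1 + sum(f.append(entry) for f in self.followers)
        majority = (1 + len(self.followers)) // 2 + 1
        return acks >= majority   # committed only with a majority of acks

followers = [Voter("v2"), Voter("v3")]
leader = Leader("v1", followers)
committed = leader.write({"kind": "ConfigMap", "name": "demo"})
```

With three voters, a write commits once two of them (the leader plus one follower) have acknowledged it, which is why an HA cluster tolerates the loss of a single voter.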

@sunhubs
Author

sunhubs commented Jun 10, 2021

@ktsakalozos thanks for your response. Actually I just use MicroK8s as a single node, but I encountered corruption of the dqlite-stored data, which led to a failed start that could not be recovered. I found a recovery method for HA clusters mentioned in https://discuss.kubernetes.io/t/recovery-of-ha-microk8s-clusters/12931, so I want to back up the data and use it to restore MicroK8s; I have verified that this works. In this situation I just want to confirm what is stored and how often it changes, so I can decide when and how often to run the backup.
From your response, does "something is written on the database" cause a sync to the hard disk? And may I know what kind of data is written to the database: the k8s resources, or system hardware or software information?

@ktsakalozos
Member

From your response, does "something is written on the database" cause a sync to the hard disk?

Yes, every time something is written to the database the data is persisted to disk and synced across the nodes.

may I know what kind of data is written to the database: the k8s resources, or system hardware or software information?

The data written to disk is the k8s resources. Other cluster configuration you may want to back up is /var/snap/microk8s/current/args and /var/snap/microk8s/current/certs, which hold the service arguments and certificates respectively.
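The three directories named above can be archived together for backup. Below is a minimal sketch, not an official procedure; the backend path follows the spelling used in this thread, and stopping MicroK8s first (microk8s stop) is advisable so the dqlite segment files are quiescent while being copied.

```python
# Minimal backup sketch: archive the dqlite data directory plus the
# service arguments and certificates into one tarball. Paths are the
# ones mentioned in this thread; adjust them to your installation.
import tarfile
from pathlib import Path

BACKUP_DIRS = [
    "/var/snap/microk8s/current/var/kubenetes/backend",  # dqlite data
    "/var/snap/microk8s/current/args",                   # service arguments
    "/var/snap/microk8s/current/certs",                  # certificates
]

def backup(dirs, archive_path):
    """Archive the given directories into a single gzipped tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for d in dirs:
            p = Path(d)
            if p.exists():
                # Store each directory under its own name inside the archive.
                tar.add(p, arcname=p.name)
    return archive_path
```

A cron job calling backup(BACKUP_DIRS, "/some/safe/place/microk8s-backup.tar.gz") would give periodic restore points; how often to run it depends on how often your k8s resources change, as discussed above.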

@MathieuBordere

Hi @sunhubs, could you give us some more details on the data corruption? Do you get a useful error message? Can you list the contents of the database folder (ls -alh /var/snap/microk8s/current/var/kubenetes/backend)? You can send me the contents of the backend folder (if it doesn't contain sensitive info) at mathieu.bordere@canonical.com, so I can take a look.

Thanks

@sunhubs
Author

sunhubs commented Jun 10, 2021

@ktsakalozos from my experience, /var/snap/microk8s/current/args and /var/snap/microk8s/current/certs are not the problem; I can recover MicroK8s just by replacing /var/snap/microk8s/current/var/kubenetes/backend.
From my monitoring, files like 0000000002847710-0000000002848259 or snapshot-* are updated every 1-2 minutes, even over long stretches (a few hours, for example) in which I make no changes to any k8s resource. May I know the reason?
1, are there some system resources that need updating? 2, is there some other data I do not know about that needs updating?

@sunhubs
Author

sunhubs commented Jun 10, 2021

@MathieuBordere sorry, I did not keep the clean corrupted data, but I can replay my recovery steps and describe the errors:
1, the cluster.yaml was wrong: instead of containing information about the cluster, it held a non-ASCII string, and MicroK8s failed to start because the YAML file could not be parsed.
2, I replaced cluster.yaml with one from a freshly installed server, and MicroK8s failed to start again; the reason seemed to be that dqlite read a snapshot that did not match (the exact error was not recorded; I will update here if it happens again).
3, so I finally replaced the whole folder and it worked.
As for why my data became corrupted: I installed MicroK8s on CentOS in a VMware machine on my Windows desktop. Unfortunately I rebooted the virtual machine (the first time) and the Windows desktop hung (the second time); after restarting the host and VM, MicroK8s entered an error state.
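The replace-the-whole-folder recovery in step 3 can be sketched as below. This is an assumption-laden illustration, not an official MicroK8s recovery procedure: the function name is invented, the paths follow this thread's spelling, and MicroK8s should be stopped (microk8s stop) before swapping the directory and started again (microk8s start) afterwards.

```python
# Sketch of the manual recovery: move the corrupted dqlite backend
# directory aside and restore a known-good backup copy in its place.
import shutil
from pathlib import Path

def restore_backend(backend_dir, backup_dir):
    """Swap in a backup copy of the dqlite backend directory."""
    backend = Path(backend_dir)
    if backend.exists():
        # Keep the corrupted state around for post-mortem inspection.
        shutil.move(str(backend), str(backend) + ".corrupt")
    # Copy the backup into place as the new backend directory.
    shutil.copytree(backup_dir, backend)
    return backend
```

Keeping the corrupted folder under a .corrupt suffix means it can still be sent to the dqlite developers for analysis, as requested above.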

@sunhubs
Author

sunhubs commented Jun 21, 2021

Thanks @MathieuBordere. I hope it will be fixed in the future.

@sunhubs sunhubs closed this as completed Sep 6, 2021