
[question] about the HA-cluster #2329

Closed
sunhubs opened this issue Jun 10, 2021 · 7 comments

Comments

@sunhubs

sunhubs commented Jun 10, 2021

I want to learn more details about the HA-cluster.
As I understand it, MicroK8s syncs the cluster information into the folder /var/snap/microk8s/current/var/kubenetes/backend, so I would like to know:
1, the content of the data: I want to check whether frequent syncing is needed.
2, the default sync frequency and how to control it.
3, how long each sync takes.
It would be appreciated if you could provide some documentation or a code snippet for checking.

@ktsakalozos
Member

The kubernetes API server needs a datastore to store its state. In MicroK8s, by default, this datastore is dqlite [1][2]. Three nodes in a dqlite HA cluster act as voters. The voters maintain a copy of the database and elect a leader. The leader ensures that everything written to the database is first replicated to a majority of the voters. This replication is part of the Raft [3] consensus protocol. To your questions:

1, the content of the data: I want to check whether frequent syncing is needed.

You do not need to sync anything. Data are replicated across the nodes as part of the consensus protocol.

2, the default sync frequency and how to control it.

Every time something is written to the database it is first replicated across the voters. There is no periodic syncing, and thus there is no frequency to control.

3, how long each sync takes.

The cost of replicating the database depends on the workload the cluster serves. Some Kubernetes workloads store lots of data on the API server in the form of CRDs/resources. Every time a resource is created/updated/deleted, the corresponding operation needs to be replicated to a majority of the voters.

May I ask, how do you plan to use MicroK8s? Are you under hardware constraints, or do you need to comply with certain specifications?

[1] https://github.com/canonical/dqlite
[2] https://dqlite.io/
[3] https://raft.github.io/
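The leader-driven, majority-quorum write path described above can be sketched in a few lines. This is a toy illustration only: real dqlite/Raft involves terms, log indices, and leader elections, and all names here (Voter, Leader, write) are invented for the example.

```python
# Toy sketch of a Raft-style write: the leader appends the entry to its
# own log, replicates it to the other voters, and considers it committed
# only once a majority of all voters (leader included) has acknowledged.

class Voter:
    def __init__(self, name):
        self.name = name
        self.log = []          # each voter keeps its own copy of the data

    def append(self, entry):
        self.log.append(entry)
        return True            # acknowledge the replication

class Leader(Voter):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers

    def write(self, entry):
        self.append(entry)     # the leader writes locally first
        acks = 1 + sum(f.append(entry) for f in self.followers)
        majority = (1 + len(self.followers)) // 2 + 1
        return acks >= majority   # committed only with a majority of acks

followers = [Voter("v2"), Voter("v3")]
leader = Leader("v1", followers)
committed = leader.write({"kind": "ConfigMap", "name": "demo"})
```

With three voters, a write commits once two of them (the leader plus one follower) have acknowledged it, which is why an HA cluster tolerates the loss of a single voter.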

@sunhubs
Author

sunhubs commented Jun 10, 2021

@ktsakalozos thanks for your response. Actually I just use MicroK8s as a single node, but I encountered corruption of the dqlite-stored data, which led to a failed start that could not be recovered. I found a recovery method for HA clusters mentioned in https://discuss.kubernetes.io/t/recovery-of-ha-microk8s-clusters/12931, so I want to back up the data and use it to restore MicroK8s; I have verified that this works. In this situation I just want to confirm what is stored and how often it changes, so I can decide when and how often to run the backup.
From your response, does "something is written on the database" cause a sync to the hard disk? And may I know what kind of data is written to the database: the k8s resources, or system hardware or software information?

@ktsakalozos
Member

From your response, does "something is written on the database" cause a sync to the hard disk?

Yes, every time something is written to the database the data is persisted to disk and synced across the nodes.

may I know what kind of data is written to the database: the k8s resources, or system hardware or software information?

The data written to disk is the k8s resources. Other cluster configuration you may want to back up is /var/snap/microk8s/current/args and /var/snap/microk8s/current/certs, which hold the service arguments and certificates respectively.
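The three directories named above can be archived together for backup. Below is a minimal sketch, not an official procedure; the backend path follows the spelling used in this thread, and stopping MicroK8s first (microk8s stop) is advisable so the dqlite segment files are quiescent while being copied.

```python
# Minimal backup sketch: archive the dqlite data directory plus the
# service arguments and certificates into one tarball. Paths are the
# ones mentioned in this thread; adjust them to your installation.
import tarfile
from pathlib import Path

BACKUP_DIRS = [
    "/var/snap/microk8s/current/var/kubenetes/backend",  # dqlite data
    "/var/snap/microk8s/current/args",                   # service arguments
    "/var/snap/microk8s/current/certs",                  # certificates
]

def backup(dirs, archive_path):
    """Archive the given directories into a single gzipped tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for d in dirs:
            p = Path(d)
            if p.exists():
                # Store each directory under its own name inside the archive.
                tar.add(p, arcname=p.name)
    return archive_path
```

A cron job calling backup(BACKUP_DIRS, "/some/safe/place/microk8s-backup.tar.gz") would give periodic restore points; how often to run it depends on how often your k8s resources change, as discussed above.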

@MathieuBordere

Hi @sunhubs, could you give us some more details on the data corruption? Do you get a useful error message? Can you list the contents of the database folder (ls -alh /var/snap/microk8s/current/var/kubenetes/backend)? You can send me the contents of the backend folder (if it doesn't contain sensitive info) at mathieu.bordere@canonical.com, so I can take a look.

Thanks

@sunhubs
Author

sunhubs commented Jun 10, 2021

@ktsakalozos from my experience, /var/snap/microk8s/current/args and /var/snap/microk8s/current/certs are not the problem; I can recover MicroK8s just by replacing /var/snap/microk8s/current/var/kubenetes/backend.
From my monitoring, files like 0000000002847710-0000000002848259 or snapshot-* are updated every 1-2 minutes, even over long stretches (a few hours, for example) in which I make no changes to any k8s resource. May I know the reason?
1, are there some system resources that need updating? 2, is there some other data I do not know about that needs updating?

@sunhubs
Author

sunhubs commented Jun 10, 2021

@MathieuBordere sorry, I did not keep the clean corrupted data, but I can replay my recovery steps and describe the errors:
1, the cluster.yaml was wrong: instead of containing information about the cluster, it held a non-ASCII string, and MicroK8s failed to start because the YAML file could not be parsed.
2, I replaced cluster.yaml with one from a freshly installed server, and MicroK8s failed to start again; the reason seemed to be that dqlite read a snapshot that did not match (the exact error was not recorded; I will update here if it happens again).
3, so I finally replaced the whole folder and it worked.
As for why my data became corrupted: I installed MicroK8s on CentOS in a VMware machine on my Windows desktop. Unfortunately I rebooted the virtual machine (the first time) and the Windows desktop hung (the second time); after restarting the host and VM, MicroK8s entered an error state.
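The replace-the-whole-folder recovery in step 3 can be sketched as below. This is an assumption-laden illustration, not an official MicroK8s recovery procedure: the function name is invented, the paths follow this thread's spelling, and MicroK8s should be stopped (microk8s stop) before swapping the directory and started again (microk8s start) afterwards.

```python
# Sketch of the manual recovery: move the corrupted dqlite backend
# directory aside and restore a known-good backup copy in its place.
import shutil
from pathlib import Path

def restore_backend(backend_dir, backup_dir):
    """Swap in a backup copy of the dqlite backend directory."""
    backend = Path(backend_dir)
    if backend.exists():
        # Keep the corrupted state around for post-mortem inspection.
        shutil.move(str(backend), str(backend) + ".corrupt")
    # Copy the backup into place as the new backend directory.
    shutil.copytree(backup_dir, backend)
    return backend
```

Keeping the corrupted folder under a .corrupt suffix means it can still be sent to the dqlite developers for analysis, as requested above.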

@sunhubs
Author

sunhubs commented Jun 21, 2021

Thanks @MathieuBordere. I hope it will be fixed in the future.

@sunhubs sunhubs closed this as completed Sep 6, 2021