Summary

The charm does not gracefully handle the "database is locked" RPC error when creating a new namespace via Juju. In the syslog for the main machine:
Dec 14 09:31:11 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: I1214 09:31:11.024783 9310 job_controller.go:562] "enqueueing job" key="rook-ceph/rook-ceph-csi-detect-version"
Dec 14 09:31:11 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: I1214 09:31:11.106641 9310 replica_set.go:676] "Finished syncing" kind="ReplicaSet" key="rook-ceph/csi-rbdplugin-provisioner-5885496bf5" duration="83.646528ms"
Dec 14 09:31:11 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: I1214 09:31:11.116299 9310 replica_set.go:676] "Finished syncing" kind="ReplicaSet" key="rook-ceph/csi-rbdplugin-provisioner-5885496bf5" duration="651.454µs"
Dec 14 09:31:11 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: I1214 09:31:11.709716 9310 job_controller.go:562] "enqueueing job" key="rook-ceph/rook-ceph-csi-detect-version"
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-k8s-dqlite[11911]: time="2023-12-14T09:31:17Z" level=error msg="error in txn: exec (try: 500): database is locked"
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: E1214 09:31:17.734550 9310 status.go:71] apiserver received an error that is not an metav1.Status: &status.Error{s:(*status.Status)(0xc00950fd48)}: rpc error: code = Unknown desc = exec (try: 500): database is locked
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: I1214 09:31:17.734937 9310 trace.go:236] Trace[1854421033]: "Create" accept:application/json, */*,audit-id:c1a0ffa6-961c-4f85-92c5-da79207e583e,client:10.246.167.34,protocol:HTTP/2.0,resource:namespaces,scope:resource,url:/api/v1/namespaces,user-agent:Go-http-client/2.0,verb:POST (14-Dec-2023 09:31:15.682) (total time: 2052ms):
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: Trace[1854421033]: ["Create etcd3" audit-id:c1a0ffa6-961c-4f85-92c5-da79207e583e,key:/namespaces/juju-system-microk8s,type:*core.Namespace,resource:namespaces 2030ms (09:31:15.704)
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: Trace[1854421033]: ---"Txn call failed" err:rpc error: code = Unknown desc = exec (try: 500): database is locked 2029ms (09:31:17.734)]
Dec 14 09:31:17 microk8s-55-1-3 microk8s.daemon-kubelite[9310]: Trace[1854421033]: [2.052580676s] [2.052580676s] END
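The datastore here, k8s-dqlite, is built on SQLite, where a writer holds an exclusive write lock; a second writer that cannot acquire that lock before its busy timeout expires fails with "database is locked", the same error surfaced in the trace above. A minimal, hypothetical illustration of that failure mode using plain sqlite3 (the table and key names are made up):

```python
import os
import sqlite3
import tempfile

# Hypothetical illustration only: reproduce SQLite's "database is locked"
# error by holding a write lock on one connection while another tries to write.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, timeout=0, isolation_level=None)
writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("BEGIN IMMEDIATE")  # take and hold the write lock

# timeout=0: the second writer gives up immediately instead of waiting.
blocked = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    blocked.execute("INSERT INTO kv VALUES ('ns', 'juju-system-microk8s')")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)
```

In the real system the retry loop inside k8s-dqlite plays the role of the busy timeout, which is why the log line reads "exec (try: 500): database is locked".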
What Should Happen Instead?
A "database is locked" error is most likely a transient condition while creating a namespace, and should be handled more gracefully, for example by retrying after a short delay. If the retry has to be performed by Juju instead, the error should at least be reported to Juju more clearly. See the feedback from the Juju team in the related issue filed with them, LP#2046471.
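The retry behavior requested above could look like the following minimal sketch; the helper name `with_retries`, the attempt count, and the delays are all hypothetical and not part of any Juju or MicroK8s API:

```python
import time

def with_retries(op, attempts=5, base_delay=0.1, transient=("database is locked",)):
    """Run op(), retrying with exponential backoff on transient errors.

    Hypothetical helper: names and defaults are illustrative only.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            # Give up on the last attempt, or on errors that are not transient.
            if attempt == attempts - 1 or not any(m in str(exc) for m in transient):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

# Example: an operation that fails twice with the transient error, then succeeds.
calls = {"n": 0}

def flaky_create_namespace():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("exec (try: 500): database is locked")
    return "created"

result = with_retries(flaky_create_namespace, base_delay=0.001)
```

The key design point is distinguishing transient errors (worth retrying) from permanent ones (re-raised immediately), so a genuine failure still surfaces to the caller.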
Reproduction Steps
Such issues are difficult to reproduce, but we have a lot of information and a crash dump available in our Solutions QA labs (accessible only to Canonical employees). In any case, a short summary would be as follows:

1. Using Juju, create a new microk8s_cloud with three nodes (provided by MAAS).
2. Attempt to create a namespace in this cloud via Juju:

$ juju add-model -c ... juju-system-microk8s microk8s_cloud
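Under the hood, this namespace creation is the POST to /api/v1/namespaces seen in the trace above. A hypothetical sketch of the kind of Namespace object that ends up in that request body (the function name is made up; the namespace name appears in the trace):

```python
import json

# Hypothetical helper: build the Namespace object that a client would POST
# to /api/v1/namespaces. Only the name varies here.
def namespace_manifest(name):
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}

body = json.dumps(namespace_manifest("juju-system-microk8s"))
```

It is this write, once it reaches the datastore, that fails with "database is locked" in the logs above.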
Hi @motjuste, thank you for reporting this. We often see this kind of behavior when the node is suffering from I/O starvation. Usually there is a background process (it does not need to be a pod/container) performing some I/O-heavy task, so the Kubernetes datastore does not get the chance to write to disk and distribute the write operation to its peers. The failure to create the namespace is just a symptom. In the reported case, the I/O operation failed because we could not reach the DB for two seconds; during that time we retried the request 500 times. What else was happening on the physical node at that time? Maybe Ceph was syncing?
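One way to check this I/O-starvation theory on the affected node is the kernel's pressure-stall information (PSI) in /proc/pressure/io, available on Linux 4.20+ with PSI enabled. A small, hypothetical parser for such a line, shown here with a made-up sample value rather than a real reading:

```python
# Hypothetical helper: parse one line of /proc/pressure/io, whose format is
# "some|full avg10=... avg60=... avg300=... total=...". High "full" averages
# mean all tasks were stalled waiting on I/O.
def parse_psi(line):
    kind, *fields = line.split()
    stats = dict(f.split("=") for f in fields)
    return kind, {k: float(v) for k, v in stats.items()}

# Made-up sample; on a real node you would read the file itself.
sample = "full avg10=48.51 avg60=31.22 avg300=12.09 total=123456789"
kind, stats = parse_psi(sample)
```

A sustained non-trivial "full" avg10 around the time of the failure would support the explanation that the datastore was starved of disk writes.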
Environment
MicroK8s charm track: 1.28/stable
Juju version: 3.1.6
Cloud: MAAS
Additional info, logs
Apart from the log excerpts quoted above, a lot of information can be found in the Solutions QA labs, accessible only to Canonical employees. There, the relevant crash-dump artefact can be found at generated/generated/microk8s/juju-crashdump-microk8s-2023-12-14-09.32.16.tar.gz.