1 slow ops, oldest one blocked for 1880444 sec, mon.e has slow ops #14126
Comments
Please find the attached logs. |
Attached the operator logs and mon.e logs. Please let me know how to resolve this; it is a production issue and high priority. |
Are you saying this issue is preventing you from taking backups?
It is just a health warning, so it is surprising if it is blocking the IO. If it is just mon.e with the problem, you could try scaling down that mon and fail it over to see if a new mon will help. |
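A minimal sketch of the suggested mon failover, assuming Rook's default deployment naming (`rook-ceph-operator`, `rook-ceph-mon-e`) and the `rook-ceph` namespace; adjust names to your cluster before running:

```shell
# Scale down the problematic mon so it falls out of quorum.
# NOTE: deployment names below are assumptions based on Rook's default naming.
kubectl -n rook-ceph scale deployment rook-ceph-mon-e --replicas=0

# After the mon has been out of quorum longer than the failover timeout
# (10 minutes by default), the operator should replace it with a new mon.
# Watch for a new mon pod (e.g. rook-ceph-mon-g) to appear:
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -w
```

Verify quorum afterwards with `ceph status` from the toolbox before touching anything else.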
@travisn Yes, due to this we are not able to take backups, because we are using persistent volumes for the backups. Is there any way we can modify the permissions to access the PVC? |
@travisn The operator monitors the mons every 45 seconds, right? So this should be resolved by the operator; why is it not doing that? |
@travisn Could you please advise on how we can enable self-healing with the operator? |
You can try the direct mount tools |
Are you referring to the mon failover? As long as mons are up, this is a different issue. |
@travisn Yes, if a mon fails, the operator pod should resolve the issue automatically and the mon should come back up, right? Regarding the mount issue: if I want to try the direct mount tools, how can I target the specific worker node? We have an issue with one specific worker node, and we are using a shared filesystem as the type. |
Yes, a mon will failover by default after 10 min if it is out of quorum. The doc linked previously explains more.
If you want to mount it from a specific worker node, you could create the direct tools pod with a node selector. |
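One possible way to pin the direct mount tools pod to the problematic node, assuming the `rook-direct-mount` deployment from the example manifest in the Rook repository and a hypothetical node name `worker-1`:

```shell
# Label the worker node you want to debug (node name is an assumption):
kubectl label node worker-1 rook-debug=true

# Add a nodeSelector to the direct mount deployment's pod template so the
# pod is scheduled on that node (deployment name is an assumption):
kubectl -n rook-ceph patch deployment rook-direct-mount --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"rook-debug":"true"}}}}}'
```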
@travisn We can see the logs below in the rbdplugin-provisioner. Is this why we cannot access the mount point and are getting permission denied? `sudo kubectl logs csi-rbdplugin-provisioner-84c7fb8d76-q68lv -n rook-ceph csi-provisioner` |
@travisn The rbdplugin-provisioner mounts an emptyDir; what does it write into the emptyDir? |
Can you provide the following?
|
Hi Team,
We have installed rook-ceph in a Kubernetes cluster using a host-based deployment. We have three worker nodes with disks attached, and recently we had an issue on one of the worker nodes, so we restarted it.
After the worker node restarted, we see the status below in rook-ceph. How can we resolve this issue?
```
  cluster:
    id:     cc9da5b7-2e86-4abc-971a-701ce9b3d532
    health: HEALTH_WARN
            1 slow ops, oldest one blocked for 1880444 sec, mon.e has slow ops

  services:
    mon: 3 daemons, quorum c,e,f (age 3w)
    mgr: a(active, since 3w)
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 3w), 3 in (since 3w)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 3.05k objects, 516 MiB
    usage:   3.2 GiB used, 1.8 TiB / 1.8 TiB avail
    pgs:     65 active+clean

  io:
    client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
```
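A sketch for digging into the slow ops warning, assuming the standard `rook-ceph-tools` toolbox deployment and the default `rook-ceph-mon-e` mon deployment name:

```shell
# Show which ops are flagged and for how long:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health detail

# Dump the ops tracked by mon.e via its admin socket. This must run inside
# the mon container itself, since the socket is local to that pod:
kubectl -n rook-ceph exec deploy/rook-ceph-mon-e -- ceph daemon mon.e ops
```

A blocked-for counter of 1880444 sec (about three weeks) usually means a single op got stuck around the node restart and the warning never cleared; restarting or failing over mon.e typically resets it.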
Logs to submit:
Operator's logs, if necessary
Crashing pod(s) logs, if necessary
To get logs, use `kubectl -n <namespace> logs <pod name>`.
When pasting logs, always surround them with backticks or use the insert code button from the GitHub UI. Read the GitHub documentation if you need help.
Cluster Status to submit:
Output of kubectl commands, if necessary
To get the health of the cluster, use `kubectl rook-ceph health`.
To get the status of the cluster, use `kubectl rook-ceph ceph status`.
For more details, see the Rook kubectl Plugin.
Environment:
OS (e.g. from /etc/os-release): Ubuntu
Kernel (e.g. `uname -a`): Linux master1 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cloud provider or hardware configuration: on premises
Rook version (use `rook version` inside of a Rook Pod): v8.1.1
Storage backend version (e.g. for ceph do `ceph -v`): v16.2.10
Kubernetes version (use `kubectl version`): v1.21.5
Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): on premises cluster
Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox): see the `ceph status` output above.