MDS behind on trimming every 4-5 weeks causing issue for ceph filesystem #14220

Open
akash123-eng opened this issue May 16, 2024 · 2 comments

@akash123-eng

Hi,

We are using rook-ceph with operator 1.10.8 and Ceph 17.2.5.
We are using the Ceph filesystem with 4 MDS daemons, i.e. 2 active and 2 standby.
Every 3-4 weeks the filesystem runs into an issue: in ceph status we see the warnings below:

2 MDS reports slow requests 
2 MDS Behind on Trimming
mds.myfs-a(mds.1) : behind on trimming (6378/128) max_segments:128, num_segments: 6378
mds.myfs-c(mds.1):  behind on trimming (6560/128) max_segments:128, num_segments: 6560
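
For reference, this is roughly how we check the state from the rook-ceph-tools pod (a sketch; the filesystem name myfs and daemon name mds.myfs-a come from our cluster):

    ceph health detail        # lists the MDS_TRIM and slow request warnings
    ceph fs status myfs       # shows active/standby MDS ranks
    ceph config show mds.myfs-a | grep -E 'mds_cache_memory_limit|mds_log_max_segments'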

To fix it, we have to restart all MDS pods one by one (roughly as sketched below).
This is happening every 4-5 weeks.
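
The restart procedure is roughly the following (a sketch; the deployment names rook-ceph-mds-myfs-a / -b are assumptions based on our filesystem name myfs and Rook's usual naming, and the namespace is rook-ceph):

    kubectl -n rook-ceph rollout restart deployment rook-ceph-mds-myfs-a
    # wait until ceph fs status shows the rank active again, then do the next one
    kubectl -n rook-ceph rollout restart deployment rook-ceph-mds-myfs-b

We repeat the same for the standby MDS deployments as well.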

We have seen many related issues on the Ceph tracker, and many people suggest increasing mds_cache_memory_limit.
Currently our cluster has mds_cache_memory_limit set to the default 4GB and mds_log_max_segments set to the default 128.
Should we increase mds_cache_memory_limit from the default 4GB to 8GB, or is there another way to fix this issue permanently?
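
If raising it is the right call, this is roughly how we would apply it from the rook-ceph-tools pod (a sketch; whether this value is appropriate is exactly what we are asking):

    ceph config set mds mds_cache_memory_limit 8589934592   # 8589934592 bytes = 8 GiB
    ceph config get mds mds_cache_memory_limit

We understand Rook also supports putting such settings in the rook-config-override ConfigMap, but we are not sure which approach is preferred here.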

Environment:
Kubernetes

  • OS (e.g. from /etc/os-release): CentOS 7.9
  • Rook version (use rook version inside of a Rook Pod): rook operator 1.10.8
  • Storage backend version (e.g. for ceph do ceph -v): Ceph 17.2.5
  • Kubernetes version (use kubectl version): 1.25.9
@akash123-eng
Author

@Rakshith-R @Madhu-1 can you please help with the above?

@Rakshith-R
Member

Neither Madhu nor I are familiar with such core Ceph problems.

  • Please use Rook issues for Rook-specific problems.

You should reach out on the Ceph Slack or mailing list for core Ceph issues:
https://ceph.io/en/community/connect/
