
mds: fix "ceph_assert(!capped)" during max_mds thrashing #24490

Merged

merged 3 commits into ceph:master from ukernel:wip-36350 on Nov 13, 2018

Conversation

@ukernel (Member) commented Oct 9, 2018

Fixes: http://tracker.ceph.com/issues/36350

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug
@batrick (Member) commented Oct 9, 2018 (comment minimized)

@ukernel ukernel force-pushed the ukernel:wip-36350 branch from 9c412d3 to 1ce54b5 Oct 9, 2018

batrick added a commit to batrick/ceph that referenced this pull request Oct 15, 2018
Merge PR ceph#24490 into wip-pdonnell-testing-20181011.152759
* refs/pull/24490/head:
	mds: use MDLog::trim_all() to trim log when deactivating mds
	mds: don't cap log when there are replicated objects
@batrick (Member) commented Oct 16, 2018

This may have caused this set of failures:

Failure: Test failure: test_all_down (tasks.cephfs.test_failover.TestClusterResize)
5 jobs: ['3127797', '3127688', '3128016', '3127578', '3127907']
suites intersection: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/bluestore.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'supported-random-distros$/{centos_latest.yaml}', 'supported-random-distros$/{rhel_latest.yaml}', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

See also: /ceph/teuthology-archive/pdonnell-2018-10-11_17:55:20-fs-wip-pdonnell-testing-20181011.152759-distro-basic-smithi/3127797/remote/smithi153/log/ceph-mds.a.log.gz

@batrick (Member) commented Oct 24, 2018

Still seeing the same class of failures:

Failure: Test failure: test_thrash (tasks.cephfs.test_failover.TestClusterResize)
5 jobs: ['3177770', '3177725', '3177905', '3177860', '3177815']
suites intersection: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']
suites union: ['clusters/1a3s-mds-2c-client.yaml', 'conf/{client.yaml', 'fs/multifs/{begin.yaml', 'mds.yaml', 'mon-debug.yaml', 'mon.yaml', 'mount/fuse.yaml', 'objectstore-ec/bluestore-comp-ec-root.yaml', 'objectstore-ec/bluestore-comp.yaml', 'objectstore-ec/bluestore-ec-root.yaml', 'objectstore-ec/bluestore.yaml', 'objectstore-ec/filestore-xfs.yaml', 'osd.yaml}', 'overrides/{frag_enable.yaml', 'supported-random-distros$/{centos_latest.yaml}', 'supported-random-distros$/{rhel_latest.yaml}', 'supported-random-distros$/{ubuntu_16.04.yaml}', 'supported-random-distros$/{ubuntu_latest.yaml}', 'tasks/failover.yaml}', 'whitelist_health.yaml', 'whitelist_wrongly_marked_down.yaml}']

e.g. /ceph/teuthology-archive/pdonnell-2018-10-24_02:35:37-fs-wip-pdonnell-testing-20181023.224346-distro-basic-smithi/3177770/remote/smithi009/log/ceph-mds.x-s.log.gz

ukernel added 3 commits on Oct 9, 2018
mds: don't cap log when there are replicated objects
replicas may dirty scatter locks

Fixes: http://tracker.ceph.com/issues/36350
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
mds: use MDLog::trim_all() to trim log when deactivating mds
The problem with MDLog::trim(0) is that it expires the current segment.
New log events (e.g. scatter nudges) may get added to the current
segment while MDLog::trim(0) is expiring it.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
mds: flush dirty dirfrags that weren't logged when deactivating mds
CDir::log_mark_dirty() may mark a dirfrag dirty without a journal event.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>

@ukernel ukernel force-pushed the ukernel:wip-36350 branch from da7389a to 68b6073 Oct 29, 2018

@ukernel (Member, Author) commented Nov 1, 2018

fixed

@batrick (Member) commented Nov 2, 2018

Probably caused: /ceph/teuthology-archive/pdonnell-2018-11-02_00:51:40-kcephfs-wip-pdonnell-testing-20181101.222544-distro-basic-smithi/3212716/teuthology.log

@ukernel (Member, Author) commented Nov 9, 2018

It's unrelated.

kernel_mount's kill did an IPMI power off; it seems that it cleanly unmounted cephfs.

@batrick batrick merged commit 68b6073 into ceph:master Nov 13, 2018

5 checks passed:

  • Docs: build check OK, docs built
  • Signed-off-by: all commits in this PR are signed
  • Unmodified Submodules: submodules for project are unmodified
  • make check: succeeded
  • make check (arm64): succeeded
batrick added a commit that referenced this pull request Nov 13, 2018
Merge PR #24490 into master
* refs/pull/24490/head:
	mds: flush dirty dirfrags that weren't logged when deactivating mds
	mds: use MDLog::trim_all() to trim log when deactivating mds
	mds: don't cap log when there are replicated objects

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
2 participants