resolve and document most common erasure coded pool pain points #3194

http://tracker.ceph.com/issues/10349 Fixes: #10349 Signed-off-by: Loic Dachary <ldachary@redhat.com>

It is common for people to try to map 9 OSDs in a Ceph cluster that has only 9 OSDs in total. The default number of tries (50) frequently leads to bad mappings in this use case. Raising it to 100 makes no significant difference in CPU usage, as tested manually by running crushtool on one million mappings. http://tracker.ceph.com/issues/10353 Fixes: #10353 Signed-off-by: Loic Dachary <ldachary@redhat.com>
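To give a rough feel for why the number of tries matters when every OSD in the cluster must be selected, here is a minimal sketch in Python. It is a deliberately simplified toy model and not the CRUSH algorithm: it assumes each try picks an OSD uniformly at random and that a slot fails only when all of its tries collide with OSDs already chosen. The function name and the model itself are assumptions made for illustration.

```python
def bad_mapping_probability(num_osds: int, size: int, tries: int) -> float:
    """Probability that at least one slot cannot be filled (toy model, NOT CRUSH).

    Assumption: each try draws one OSD uniformly at random, and a slot is
    left unmapped only if every one of its `tries` draws hits an OSD that
    was already chosen for an earlier slot.
    """
    all_slots_ok = 1.0
    for already_chosen in range(size):
        collision = already_chosen / num_osds   # chance a single draw collides
        slot_fails = collision ** tries         # all tries for this slot collide
        all_slots_ok *= 1.0 - slot_fails
    return 1.0 - all_slots_ok


# 9 chunks mapped onto a cluster of exactly 9 OSDs, as in the commit message.
p50 = bad_mapping_probability(num_osds=9, size=9, tries=50)
p100 = bad_mapping_probability(num_osds=9, size=9, tries=100)
print(f"tries=50:  {p50:.2e}")
print(f"tries=100: {p100:.2e}")
```

In this toy model, the last slot dominates the result because only one free OSD remains for it, so doubling the number of tries reduces the failure probability by several orders of magnitude. Real CRUSH behaviour depends on the hierarchy, weights, and rule steps, so the numbers are only indicative.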

The ruleset created for an erasure coded pool has max_size set to a fixed value of 20, which is too small when more than 20 chunks are needed and leads to obscure errors. Set it to the number of chunks instead, i.e. k+m most of the time. In a cluster with few OSDs (9 for instance), setting max_size to 20 also causes performance problems when injecting a new crushmap. The monitor calls CrushTester::test, which tries 1024 mappings for every size from min_size to max_size. Each attempt to map more OSDs than are available exhausts all retries (50 by default), which takes a significant amount of time. In a cluster with 9 OSDs, testing one such ruleset can take up to 5 seconds. Since the test blocks the monitor leader, a few erasure coded rulesets will block the monitor long enough to exceed the timeouts and trigger an election. http://tracker.ceph.com/issues/10363 Fixes: #10363 Signed-off-by: Loic Dachary <ldachary@redhat.com>
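The cost described above can be approximated with a little arithmetic. The Python sketch below counts only the retries spent on sizes that exceed the number of OSDs, using the figures from the commit message (1024 mappings per size, 50 retries). The function name and the min_size value of 3 are assumptions for illustration, and the count is a lower bound, since each oversized mapping is assumed to exhaust the retry budget only once.

```python
def wasted_retries(num_osds: int, min_size: int, max_size: int,
                   mappings_per_size: int = 1024, tries: int = 50) -> int:
    """Lower-bound estimate of retries spent on sizes that can never map.

    Assumption: every mapping attempted for a size larger than the number
    of OSDs exhausts the retry budget exactly once.
    """
    oversized_sizes = [size for size in range(min_size, max_size + 1)
                       if size > num_osds]
    return len(oversized_sizes) * mappings_per_size * tries


# 9 OSDs, hypothetical min_size of 3.
fixed_20 = wasted_retries(num_osds=9, min_size=3, max_size=20)  # old fixed max_size
k_plus_m = wasted_retries(num_osds=9, min_size=3, max_size=9)   # max_size = k+m = 9
print(fixed_20, k_plus_m)
```

With max_size fixed at 20, the sizes 10 through 20 can never be satisfied by 9 OSDs, so all retries spent on them are wasted. Capping max_size at k+m removes those sizes from the test entirely.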

Add a new section to the PG troubleshooting documentation that covers the most common problems reported when an erasure coded pool fails to map PGs to enough OSDs. http://tracker.ceph.com/issues/10350 Fixes: #10350 Signed-off-by: Loic Dachary <ldachary@redhat.com>

Use different erasure coded pool names and profiles to avoid deletion / creation races. The more expensive alternative is to run a different cluster for each test. Signed-off-by: Loic Dachary <ldachary@redhat.com>


Commits on Jan 15, 2015