resolve and document most common erasure coded pool pain points #3194

Merged
merged 5 commits into from Jan 18, 2015

Commits on Jan 15, 2015

  1. crush: update tries statistics for indep rules

    http://tracker.ceph.com/issues/10349 Fixes: #10349
    
    Signed-off-by: Loic Dachary <ldachary@redhat.com>
    ldachary committed Jan 15, 2015
    Commit 4d07a32
  2. crush: set_choose_tries = 100 for erasure code rulesets

    It is common for people to try to map 9 OSDs in a Ceph cluster that has
    only 9 OSDs in total. The default number of tries (50) frequently leads
    to bad mappings in this case. Raising it to 100 makes no significant
    difference in CPU usage, as tested manually by running crushtool on one
    million mappings.
    
    http://tracker.ceph.com/issues/10353 Fixes: #10353
    
    Signed-off-by: Loic Dachary <ldachary@redhat.com>
    ldachary committed Jan 15, 2015
    Commit 2f87ac8
  3. erasure-code: set max_size to chunk_count() instead of 20

    The ruleset created for an erasure coded pool has max_size set to a
    fixed value of 20, which is incorrect when more than 20 chunks are
    needed and can lead to obscure errors. Set it to the number of chunks
    instead, i.e. k+m most of the time.
    
    In a cluster with few OSDs (9 for instance), setting max_size to 20
    causes performance problems when injecting a new crushmap. The monitor
    calls CrushTester::test, which tries 1024 mappings for every size
    ranging from min_size to max_size. Each attempt to map more OSDs than
    are available exhausts all retries (50 by default), which takes a
    significant amount of time. In a cluster with 9 OSDs, testing one such
    ruleset can take up to 5 seconds.
    
    Since the test blocks the monitor leader, a few erasure coded rulesets
    will block the monitor long enough to exceed the timeouts and trigger an
    election.
    
    http://tracker.ceph.com/issues/10363 Fixes: #10363
    
    Signed-off-by: Loic Dachary <ldachary@redhat.com>
    ldachary committed Jan 15, 2015
    Commit 8b64fe9
  4. documentation: add troubleshooting erasure coded PGs section

    Add a new section to the PG troubleshooting section that covers the most
    common problems reported when an erasure coded pool fails to properly
    map PGs to enough OSDs.
    
    http://tracker.ceph.com/issues/10350 Fixes: #10350
    
    Signed-off-by: Loic Dachary <ldachary@redhat.com>
    ldachary committed Jan 15, 2015
    Commit 02cab93
  5. erasure-code: tests use different pool/profile names

    Use different erasure coded pool names and profiles to avoid deletion /
    creation races. The more expensive alternative is to run a different
    cluster for each test.
    
    Signed-off-by: Loic Dachary <ldachary@redhat.com>
    ldachary committed Jan 15, 2015
    Commit dac666f
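
The cost described in commit 3 (max_size fixed at 20 on a 9 OSD cluster) can be illustrated with a rough back-of-the-envelope count. The sketch below is a toy model, not the actual CrushTester or CRUSH code: it assumes that every chunk slot beyond the number of available OSDs burns the full retry budget on each of the 1024 test mappings. The function name wasted_attempts and the min_size value of 3 are assumptions made for illustration; only the figures 1024, 50, 20 and 9 come from the commit message.

```python
def wasted_attempts(num_osds, min_size, max_size, tries=50, mappings=1024):
    """Toy count of retries spent on slots that can never be filled.

    Assumption: for a requested size larger than num_osds, each of the
    (size - num_osds) surplus slots exhausts all `tries` retries, once
    per test mapping. Real CRUSH behaviour is more involved.
    """
    total = 0
    for size in range(min_size, max_size + 1):
        unfillable = max(0, size - num_osds)  # slots with no OSD left to pick
        total += mappings * unfillable * tries
    return total


# max_size fixed at 20 (old behaviour) versus max_size = k+m = 9 (new behaviour),
# on a cluster with 9 OSDs. min_size=3 is a hypothetical value.
old = wasted_attempts(num_osds=9, min_size=3, max_size=20)
new = wasted_attempts(num_osds=9, min_size=3, max_size=9)
print(old, new)
```

Under these assumptions the old setting spends millions of retries on sizes that cannot be satisfied, while capping max_size at the chunk count spends none, which is consistent with the multi-second monitor stalls the commit message reports.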