Merge pull request #1362 from dachary/wip-7548
doc: erasure coded pool developer and operations documentation

Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil committed Mar 12, 2014
2 parents 01a93a2 + 724ad02 commit c55da14
Showing 2 changed files with 165 additions and 11 deletions.
74 changes: 74 additions & 0 deletions doc/dev/erasure-coded-pool.rst
@@ -0,0 +1,74 @@
Erasure Coded pool
==================

Purpose
-------

Erasure coded pools require less storage space than replicated
pools. They have higher computation requirements and only support a
subset of the operations allowed on an object (no partial writes, for
instance).

Use cases
---------

Cold storage
~~~~~~~~~~~~

An erasure coded pool is created to store a large number of 1GB
objects (imaging, genomics, etc.) and 10% of them are read per
month. New objects are added every day and the objects are not
modified after being written. On average there is one write for 10,000
reads.

A replicated pool is created and set as a cache tier for the
erasure coded pool. An agent demotes objects (i.e. moves them from the
replicated pool to the erasure coded pool) if they have not been
accessed in a week.

The erasure coded pool crush ruleset targets hardware designed for
cold storage with high latency and slow access time. The replicated
pool crush ruleset targets faster hardware to provide better response
times.
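
A sketch of the tiering setup, assuming the cache tiering commands
(``ceph osd tier ...``) are available in this release and using the
hypothetical pool names ``ecpool`` (erasure coded) and ``hotpool``
(replicated)::

ceph osd tier add ecpool hotpool
ceph osd tier cache-mode hotpool writeback
ceph osd tier set-overlay ecpool hotpool

With the overlay set, clients address ``ecpool`` and reads and writes
are transparently served from ``hotpool`` when the object is there.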

Cheap multidatacenter storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ten datacenters are connected with dedicated network links. Each
datacenter contains the same amount of storage and has no power supply
backup and no air cooling system.

An erasure coded pool is created with a crush map ruleset that will
ensure no data loss if at most three datacenters fail
simultaneously. The overhead is 50% with the erasure code configured to
split data into six (k=6) and create three coding chunks (m=3). With
replication the raw storage used would be 400% of the data size (four
replicas).
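
As a back-of-the-envelope check (the 50% figure counts storage beyond
the data itself, while the 400% figure counts total raw storage)::

erasure coded, k=6, m=3: (6+3)/6 = 150% raw storage, i.e. 50% overhead
replication, four copies: 4/1 = 400% raw storage, i.e. 300% overhead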

Interface
---------

Set up an erasure coded pool::

ceph osd pool create ecpool 12 12 erasure

Set up an erasure coded pool and the associated crush ruleset::

ceph osd crush rule create-erasure ecruleset
ceph osd pool create ecpool 12 12 erasure \
crush_ruleset=ecruleset

Set the ruleset failure domain to osd instead of host, which is the default::

ceph osd pool create ecpool 12 12 erasure \
erasure-code-ruleset-failure-domain=osd

Control the parameters of the erasure code plugin::

ceph osd pool create ecpool 12 12 erasure \
erasure-code-k=2 erasure-code-m=1

Choose an alternate erasure code plugin::

ceph osd pool create ecpool 12 12 erasure \
erasure-code-plugin=example
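
However the pool was created, a simple way to verify it is to write an
object and read it back. This is a minimal smoke test; the object and
file names are arbitrary::

echo ABC > testfile
rados -p ecpool put testobject testfile
rados -p ecpool get testobject testfile.out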

102 changes: 91 additions & 11 deletions doc/rados/operations/pools.rst
@@ -3,13 +3,14 @@
=======

When you first deploy a cluster without creating a pool, Ceph uses the default
pools for storing data. A pool provides you with:

- **Resilience**: You can set how many OSDs are allowed to fail without losing data.
For replicated pools, it is the desired number of copies/replicas of an object.
A typical configuration stores an object and one additional copy
(i.e., ``size = 2``), but you can determine the number of copies/replicas.
For erasure coded pools, it is the number of coding chunks
(i.e. ``erasure-code-m=2``).

- **Placement Groups**: You can set the number of placement groups for the pool.
A typical configuration uses approximately 100 placement groups per OSD to
@@ -18,9 +19,9 @@ with some additional functionality, including:
placement groups for both the pool and the cluster as a whole.

- **CRUSH Rules**: When you store data in a pool, a CRUSH ruleset mapped to the
pool enables CRUSH to identify a rule for the placement of the object
and its replicas (or chunks for erasure coded pools) in your cluster.
You can create a custom CRUSH rule for your pool.

- **Snapshots**: When you create snapshots with ``ceph osd pool mksnap``,
you effectively take a snapshot of a particular pool.
@@ -60,7 +61,14 @@ For example::

To create a pool, execute::

ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated]
ceph osd pool create {pool-name} {pg-num} {pgp-num} erasure \
[{crush_ruleset=ruleset}] \
[{erasure-code-directory=directory}] \
[{erasure-code-plugin=plugin}] \
[{erasure-code-k=data-chunks}] \
[{erasure-code-m=coding-chunks}] \
[{key=value} ...]
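
For example, a sketch of each form, with arbitrary pool names and
placement group counts::

ceph osd pool create rpool 128 128 replicated
ceph osd pool create ecpool 12 12 erasure erasure-code-k=2 erasure-code-m=1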

Where:

@@ -90,6 +98,78 @@ Where:
:Required: Yes. Picks up default or Ceph configuration value if not specified.
:Default: 8

``{replicated|erasure}``

:Description: The pool type which may either be **replicated** to
recover from lost OSDs by keeping multiple copies of the
objects or **erasure** to get a kind of generalized
RAID5 capability. The **replicated** pools require more
raw storage but implement all Ceph operations. The
**erasure** pools require less raw storage but only
implement a subset of the available operations.

:Type: String
:Required: No.
:Default: replicated

``{crush_ruleset=ruleset}``

:Description: For **erasure** pools only. Set the name of the CRUSH
**ruleset**. It must be an existing ruleset matching
the requirements of the underlying erasure code plugin.

:Type: String
:Required: No.

``{erasure-code-directory=directory}``

:Description: For **erasure** pools only. Set the **directory** name
from which the erasure code plugin is loaded.

:Type: String
:Required: No.
:Default: /usr/lib/ceph/erasure-code

``{erasure-code-plugin=plugin}``

:Description: For **erasure** pools only. Use the erasure code **plugin**
to compute coding chunks and recover missing chunks.

:Type: String
:Required: No.
:Default: jerasure

``{erasure-code-k=data-chunks}``

:Description: For **erasure** pools using the **jerasure** plugin
only. Each object is split into **data-chunks** parts,
each stored on a different OSD; a worked example follows
the parameter list.

:Type: Integer
:Required: No.
:Default: 4

``{erasure-code-m=coding-chunks}``

:Description: For **erasure** pools using the **jerasure** plugin
only. Compute **coding chunks** for each object and
store them on different OSDs. The number of coding
chunks is also the number of OSDs that can be down
without losing data.

:Type: Integer
:Required: No.
:Default: 2

``{key=value}``

:Description: For **erasure** pools, the semantics of the remaining
key/value pairs are defined by the erasure code plugin.
For **replicated** pools, the key/value pairs are
ignored.

:Type: String
:Required: No.
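
With the defaults above (``erasure-code-k=4`` and ``erasure-code-m=2``),
each object is spread over 4 + 2 = 6 OSDs, any two of which can fail
without losing data, for a raw storage use of (4+2)/4 = 150% of the
data size.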

When you create a pool, set the number of placement groups to a reasonable value
(e.g., ``100``). Consider the total number of placement groups per OSD too.
@@ -171,12 +251,12 @@ You may set values for the following keys:

``size``

:Description: Sets the number of replicas for objects in the pool. See `Set the Number of Object Replicas`_ for further details. Replicated pools only.
:Type: Integer

``min_size``

:Description: Sets the minimum number of replicas required for I/O. See `Set the Number of Object Replicas`_ for further details. Replicated pools only.
:Type: Integer

.. note:: Version ``0.54`` and above
@@ -233,7 +313,7 @@ To set a value to a pool, execute the following::
Set the Number of Object Replicas
=================================

To set the number of object replicas on a replicated pool, execute the following::

ceph osd pool set {poolname} size {num-replicas}
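
For example, assuming a pool named ``data``, set three replicas and
read the value back::

ceph osd pool set data size 3
ceph osd pool get data size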

