DNM: erasure code XOR plugin #1164

apeters1971 · 2014-01-30T08:50:23Z

This is a performance-optimized EC plug-in computing simple parity, similar to RAID-4 algorithms.
It is particularly useful in combination with the EC pyramid code to compute local parities.

The implementation uses SSE2 assembler and region XOR'ing of 512-bit blocks. If SSE2 is not available, it falls back to vector operations with 128-bit or 64-bit arithmetic.
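To make the parity idea concrete, here is a minimal Python sketch of region XOR over 512-bit (64-byte) blocks. It is not the plugin's code: the names `BLOCK_BYTES`, `xor_block`, `xor_parity` and `recover` are made up for illustration, and a wide integer XOR stands in for the SSE2 or vector instructions.

```python
BLOCK_BYTES = 64  # 512 bits, the region size mentioned in the description


def xor_block(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings with a single wide integer operation."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    value = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return value.to_bytes(len(a), "little")


def xor_parity(chunks: list[bytes]) -> bytes:
    """Return the XOR of all chunks, processed one block at a time."""
    size = len(chunks[0])
    if any(len(chunk) != size for chunk in chunks):
        raise ValueError("all chunks must have the same length")
    parity = bytearray(size)  # starts as all zeroes, the XOR identity
    for offset in range(0, size, BLOCK_BYTES):
        block = bytes(parity[offset:offset + BLOCK_BYTES])
        for chunk in chunks:
            block = xor_block(block, chunk[offset:offset + BLOCK_BYTES])
        parity[offset:offset + BLOCK_BYTES] = block
    return bytes(parity)


def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk from the survivors and the parity."""
    return xor_parity(surviving + [parity])
```

Because XOR is its own inverse, the same routine serves for encoding and for rebuilding one lost chunk, which is what makes a RAID-4 style code cheap for local parities.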

The encode and decode interfaces are expected to allocate properly aligned chunks when needed and convert bufferlists into chunks. The encode_chunks and decode_chunks methods are lower-level interfaces that assume all chunks are allocated and that all chunks are to be decoded or encoded. They are meant to be used in contexts where these constraints are enforced by default, such as when a plugin is used within the pyramid erasure code implementation. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

The ErasureCode class is derived from ErasureCodeInterface and implements stubs for some of the methods. The encode() method stub relies on encode_chunks(). The decode() method stub relies on decode_chunks(). Both are otherwise copied from ErasureCodeJerasure. The minimum_to_decode() and minimum_to_decode_with_cost() implementations are copied verbatim from ErasureCodeJerasure. The corresponding ErasureCodeJerasure methods are removed. The existing decode_concat() helper is moved from ErasureCodeInterface to ErasureCode so that ErasureCodeInterface only contains pure virtual methods, as is expected from an interface definition. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>
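The layering between the caller-facing helpers and the lower-level chunk methods can be sketched as follows. This is a Python illustration under stated assumptions, not the Ceph C++ API: the class names `ErasureCodeSketch` and `XorSketch`, the `ALIGNMENT` value, and the simplified signatures are all hypothetical. The point is only that encode() and decode() do the allocation and padding, then delegate to encode_chunks() and decode_chunks(), which assume every chunk already exists.

```python
from abc import ABC, abstractmethod


class ErasureCodeSketch(ABC):
    """Hypothetical stand-in for a base class with encode()/decode() stubs."""

    ALIGNMENT = 16  # assumed alignment in bytes, for illustration only

    def __init__(self, k: int, m: int):
        self.k, self.m = k, m

    @abstractmethod
    def encode_chunks(self, chunks: list[bytearray]) -> None:
        """Fill chunks[k:] in place from chunks[:k]; all chunks are allocated."""

    @abstractmethod
    def decode_chunks(self, erasures: list[int], chunks: list[bytearray]) -> None:
        """Rebuild in place the chunks whose indexes are listed in erasures."""

    def chunk_size(self, object_size: int) -> int:
        per_chunk = -(-object_size // self.k)  # ceiling division
        return -(-per_chunk // self.ALIGNMENT) * self.ALIGNMENT

    def encode(self, data: bytes) -> dict[int, bytes]:
        # Pad, split into k aligned data chunks, allocate m coding chunks.
        size = self.chunk_size(len(data))
        padded = data.ljust(size * self.k, b"\0")
        chunks = [bytearray(padded[i * size:(i + 1) * size]) for i in range(self.k)]
        chunks += [bytearray(size) for _ in range(self.m)]
        self.encode_chunks(chunks)
        return {i: bytes(chunk) for i, chunk in enumerate(chunks)}

    def decode(self, want: set[int], available: dict[int, bytes]) -> dict[int, bytes]:
        size = len(next(iter(available.values())))
        total = self.k + self.m
        erasures = [i for i in range(total) if i not in available]
        chunks = [bytearray(available.get(i, bytes(size))) for i in range(total)]
        if erasures:
            self.decode_chunks(erasures, chunks)
        # Only the chunks the caller asked for are returned.
        return {i: bytes(chunks[i]) for i in want}


class XorSketch(ErasureCodeSketch):
    """Single-parity codec that only implements the low-level methods."""

    def __init__(self, k: int):
        super().__init__(k, 1)

    def encode_chunks(self, chunks):
        parity = chunks[self.k]
        parity[:] = bytes(len(parity))
        for chunk in chunks[:self.k]:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte

    def decode_chunks(self, erasures, chunks):
        if len(erasures) > 1:
            raise ValueError("single parity can rebuild only one chunk")
        missing = erasures[0]
        rebuilt = bytearray(len(chunks[missing]))
        for index, chunk in enumerate(chunks):
            if index != missing:
                for i, byte in enumerate(chunk):
                    rebuilt[i] ^= byte
        chunks[missing][:] = rebuilt
```

With this split, a plugin author writes two small methods, while callers keep the convenient encode() and decode() entry points.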

Implementation of the corresponding ErasureCodeInterface methods which convert bufferlists into char * and map almost exactly to the jerasure method prototype ( modulo the function name which is wrapped in a virtual method ). Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

The Mutex scope is restricted to only protect the load() method and not the factory() method. This allows a plugin to load another plugin from within the factory() method. This is convenient for the pyramid plugin where each layer can specify a different plugin. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>
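The locking change can be illustrated with a small Python sketch. Everything here is hypothetical (`PluginRegistrySketch`, `add_loader`, and the plugin callables are invented for the example, and they do not mirror the actual registry signatures); `threading.Lock` is used because, like a plain mutex, it cannot be re-acquired by the thread that holds it.

```python
import threading


class PluginRegistrySketch:
    """Hypothetical plugin registry; names and signatures are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like a plain mutex
        self._loaders = {}
        self._plugins = {}

    def add_loader(self, name, loader):
        self._loaders[name] = loader

    def load(self, name):
        # The lock protects only the bookkeeping done while loading.
        with self._lock:
            if name not in self._plugins:
                self._plugins[name] = self._loaders[name]()
            return self._plugins[name]

    def factory(self, name, parameters):
        plugin = self.load(name)  # the lock is released once load() returns
        # The plugin runs unlocked, so it may call self.factory() again.
        return plugin(self, parameters)


def xor_plugin(registry, parameters):
    return ("xor", parameters["k"])


def pyramid_plugin(registry, parameters):
    # Each layer names its own plugin, instantiated through the registry.
    return [registry.factory(layer["plugin"], layer) for layer in parameters["layers"]]
```

If factory() held the lock for its whole duration, the nested call made by `pyramid_plugin` would block forever on the lock its own thread already holds.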

The erasure code example plugin is re-implemented using the encode_chunks() and decode_chunks() methods to show how they work. The bet is that a plugin implementor is likely to find this API more straightforward to adapt than the encode() and decode() helpers, which are more convenient for the caller but not for the plugin implementor. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

The decode() stub method from ErasureCode implements the intended side effect of only returning the chunks required by want_to_decode. Modify the tests to reflect this change. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

An erasure code plugin providing an implementation of ErasureCodeInterface. The caller can specify how to recursively apply erasure coding to the chunks to control the placement of the erasure coded chunks. For instance, with a crush ruleset containing the following steps:

    take root
    set choose datacenter 2
    set choose devices 5

an erasure coded pool is given 10 OSDs ( 0123456789 ): the first five ( 01234 ) are in one datacenter and the last five ( 56789 ) are in another datacenter. Creating a pyramidal layout by which recovering from the loss of a single OSD does not require getting data from an OSD located in another datacenter can be done with:

    [
      {
        "plugin": "jerasure",
        "technique": "cauchy_good",
        "k": "6",
        "m": "2",
        "mapping": "^-ABCDEF-^",
      },
      {
        "plugin": "xor",
        "k": "3",
        "m": "1",
        "type": "datacenter",
        "size": 2,
        "mapping": "-^ABCDEF^-",
      },
    ]

The object is first encoded in six data chunks ( "k": "6" ) and two coding chunks ( "m": "2" ) by the first layer of the pyramid, using the jerasure plugin ( "plugin": "jerasure" ). The jerasure plugin creates a total of eight chunks ( k=6 + m=2 == 8 ) and ensures that the first six contain the original data. If the data chunks were designated by letters and the coding chunks by ^, it could be something like:

    ABCDEF^^

If used outside of the context of the pyramid plugin, the jerasure plugin would spread data and coding chunks as follows ( the dash - designates a chunk that is not being used ):

    01234 56789
    ABCDE F^^--

i.e. with the first five data chunks in one datacenter ( the crush ruleset above provides OSDs 01234 in one datacenter and 56789 in another ) and the remaining chunks in the other datacenter. The pyramid plugin remaps it ( "mapping": "^-ABCDEF-^" ) and the chunk placement becomes:

    01234 56789
    ^-ABC DEF-^

which is more evenly distributed, with three data chunks and a coding chunk in one datacenter plus three data chunks and a coding chunk in the other datacenter.

The next level of the pyramid is expected to create coding chunks that allow recovery without crossing datacenter boundaries, using an XOR coding ( "plugin": "xor" ) supporting the loss of a single chunk ( "m": "1" ) out of the three ( "k": "3" ) found in a given datacenter. It starts by splitting the chunks in two ( "size": 2 ), starting with:

    01234
    ^-ABC

The XOR plugin is given ABC as an input and creates ABC^, which is remapped into:

    01234
    -^ABC

as specified in the first half of the mapping ( "mapping": "-^ABCDEF^-" ). The coding chunk of the previous level of the pyramid is left undisturbed because the dash ( - ) in the mapping requires that it is not used.

The same logic can be applied to three levels with:

    take root
    set choose datacenter 2
    set choose rack 2
    set choose devices 5

    ^-ABC-DEF--GHI-JKL-^
    -^ABC-DEF--GHI-JKL^- datacenter
    --ABC^DEF^^GHI^JKL-- rack

http://tracker.ceph.com/issues/7238 Fixes: ceph#7238 Signed-off-by: Loic Dachary <loic@dachary.org>
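The two-level example above can be traced with a short Python sketch. It reflects one reading of the mapping strings, not the plugin's implementation: letters receive data chunks in order, `^` receives a coding chunk, and `-` leaves the position untouched. The helper names `layout`, `split_mapping` and `merge`, and the chunk labels `P1`, `P2`, `X1`, `X2`, are hypothetical, and the mapping length is assumed to be divisible by `size`.

```python
def layout(mapping: str, data: list, coding: list) -> list:
    """Place data chunks on letters and coding chunks on '^'; '-' stays None."""
    data_iter, coding_iter = iter(data), iter(coding)
    placed = []
    for symbol in mapping:
        if symbol == "-":
            placed.append(None)
        elif symbol == "^":
            placed.append(next(coding_iter))
        else:
            placed.append(next(data_iter))
    return placed


def split_mapping(mapping: str, size: int) -> list[str]:
    """Cut a layer mapping into `size` equal parts, one per datacenter."""
    step = len(mapping) // size
    return [mapping[i:i + step] for i in range(0, len(mapping), step)]


def merge(lower: list, upper: list) -> list:
    """Overlay a layer on the previous one; None leaves the position as is."""
    return [new if new is not None else old for old, new in zip(lower, upper)]


# First layer: jerasure with k=6, m=2 produces ABCDEF plus two coding chunks.
first = layout("^-ABCDEF-^", list("ABCDEF"), ["P1", "P2"])

# Second layer: one XOR codec per datacenter, each with k=3, m=1.
halves = split_mapping("-^ABCDEF^-", 2)
second = (layout(halves[0], list("ABC"), ["X1"])
          + layout(halves[1], list("DEF"), ["X2"]))

placement = merge(first, second)
```

Under these assumptions, `placement` lists what each of the ten OSDs holds: the jerasure coding chunks stay on OSDs 0 and 9, and the local XOR parities land on OSDs 1 and 8.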

ghost · 2014-02-19T10:26:25Z

Reopen when work starts again: a link is added to http://tracker.ceph.com/issues/6478#note-6 so as not to lose track. It would be convenient to have a "DRAFT" category of pull requests, but that will do. Having a long-running DNM in the list of open pull requests is not a good practice because it creates noise.

Loic Dachary and others added 15 commits January 28, 2014 19:24

common: s/stringstream/ostream/ in str_map

e1e7f7e

There is no need to specialize more than ostream : it only makes it impossible to use cerr or cout as a parameter to str_map. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: remove useless assert

1aae3bb

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: make decode_concat virtual

08ccdcc

So that a plugin can provide a more efficient implementation. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: update copyright notices

bf6f37e

* With year 2014 * Use "Ceph distributed storage system" instead of "Ceph - scalable distributed file system" Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: test ErasureCodeJerasure zero copy

759f3e1

Add tests demonstrating that decode() and encode() methods avoid copying the buffers when they can. Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: move ErasureCodeJerasure tests to a directory

0fe0ece

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com> Signed-off-by: Loic Dachary <loic@dachary.org>

erasure-code: tests for the pyramid plugin

7326b9f

Signed-off-by: Loic Dachary <loic@dachary.org>

EC-XOR: adding implementation of a RAID4-like XOR based local parity …

6d2c39a

…systematic CODEC

apeters1971 closed this Jan 30, 2014

apeters1971 deleted the wip-xor branch January 30, 2014 09:50

apeters1971 reopened this Jan 30, 2014

ghost closed this Feb 19, 2014

liewegas pushed a commit to liewegas/ceph that referenced this pull request Nov 18, 2016

Merge pull request ceph#1164 from SUSE/wip-fix-make-rpm-jewel

a995ddd

buildpackages: backport make-rpm.sh improvements Reviewed-by: Loic Dachary <ldachary@redhat.com>

This pull request was closed.
