DNM: erasure code XOR plugin #1164
Closed
There is no need for a parameter type more specialized than ostream: a more specific type only makes it impossible to pass cerr or cout to str_map.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
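The reasoning is ordinary C++ reference binding: a parameter of type std::ostream& accepts std::cerr, std::cout, or a std::ostringstream, whereas a more derived type such as std::stringstream& rejects std::cerr and std::cout. The sketch below uses a hypothetical parse_kv() function, not the actual str_map signature, to illustrate the point.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser in the spirit of str_map: splits "a=1 b=2" into a map
// and reports malformed tokens on whatever stream the caller passes.
// Taking std::ostream& (rather than std::stringstream&) lets callers pass
// std::cerr, std::cout, or a string stream.
int parse_kv(const std::string &input, std::ostream &errors,
             std::map<std::string, std::string> *out) {
  std::istringstream in(input);
  std::string token;
  int bad = 0;
  while (in >> token) {
    std::string::size_type eq = token.find('=');
    if (eq == std::string::npos) {
      errors << "missing '=' in " << token << "\n";
      ++bad;
      continue;
    }
    (*out)[token.substr(0, eq)] = token.substr(eq + 1);
  }
  return bad == 0 ? 0 : -1;
}
```

A test can capture errors in a std::ostringstream, while a command line tool can hand the same function std::cerr.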
Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
So that a plugin can provide a more efficient implementation.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
* With year 2014
* Use "Ceph distributed storage system" instead of "Ceph - scalable distributed file system"

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
The encode() and decode() interfaces are expected to allocate properly aligned chunks when needed and to convert bufferlists into chunks. encode_chunks() and decode_chunks() are lower-level interfaces that assume all chunks are already allocated and that all chunks are to be encoded or decoded. They are meant to be used in contexts where these constraints are enforced by default, such as when a plugin is used within the pyramid erasure code implementation.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
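A minimal C++ sketch of the split between the two levels. It assumes std::string as a stand-in for bufferlist and uses hypothetical class names (SketchCode, XorSketch), not the Ceph classes: encode() pads the input and allocates all k + m chunks, and only then calls encode_chunks(), which may assume every chunk exists and has the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for bufferlist: a plain byte string.
using Chunk = std::string;

// encode(): caller-friendly; pads the input and allocates every chunk.
// encode_chunks(): low level; assumes all k + m chunks already exist with
// identical sizes and only fills in the m coding chunks.
class SketchCode {
 public:
  SketchCode(int k, int m, std::size_t alignment)
      : k_(k), m_(m), alignment_(alignment) {}
  virtual ~SketchCode() = default;

  std::map<int, Chunk> encode(const std::string &in) {
    // Chunk size: enough for k chunks to hold the input, rounded up to a
    // multiple of the alignment.
    std::size_t per_chunk = (in.size() + k_ - 1) / k_;
    per_chunk = (per_chunk + alignment_ - 1) / alignment_ * alignment_;
    if (per_chunk == 0) per_chunk = alignment_;
    std::string padded = in;
    padded.resize(per_chunk * k_, '\0');  // zero padding
    std::map<int, Chunk> chunks;
    for (int i = 0; i < k_; ++i)
      chunks[i] = padded.substr(i * per_chunk, per_chunk);
    for (int i = k_; i < k_ + m_; ++i)
      chunks[i] = Chunk(per_chunk, '\0');  // allocated, filled below
    encode_chunks(chunks);
    return chunks;
  }

 protected:
  virtual void encode_chunks(std::map<int, Chunk> &chunks) = 0;
  int k_, m_;
  std::size_t alignment_;
};

// Single-parity (m = 1) implementation of the low-level interface.
class XorSketch : public SketchCode {
 public:
  XorSketch(int k, std::size_t alignment) : SketchCode(k, 1, alignment) {}

 protected:
  void encode_chunks(std::map<int, Chunk> &chunks) override {
    Chunk &parity = chunks[k_];
    for (int i = 0; i < k_; ++i)
      for (std::size_t b = 0; b < parity.size(); ++b)
        parity[b] ^= chunks[i][b];
  }
};
```

The subclass never deals with padding or allocation, which is what makes the low-level interface convenient to implement.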
The ErasureCode class is derived from ErasureCodeInterface and implements stubs for some of the methods. The encode() method stub relies on encode_chunk(). The decode() method stub relies on decode_chunk(). Both are otherwise copied from ErasureCodeJerasure. The minimum_to_decode() and minimum_to_decode_with_cost() implementations are copied verbatim from ErasureCodeJerasure, and the corresponding ErasureCodeJerasure methods are removed. The existing decode_concat() helper is moved from ErasureCodeInterface to ErasureCode so that ErasureCodeInterface contains only pure virtual methods, as expected from an interface definition.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
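The kind of policy minimum_to_decode() implements can be sketched as follows. This is an illustrative reimplementation under the assumption of a systematic code where any k distinct chunks suffice, not a copy of the ErasureCodeJerasure source; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <set>

// If every wanted chunk is available, read exactly those. Otherwise read
// the first k available chunks, or fail with -EIO when fewer than k remain.
int minimum_to_decode_sketch(int k, const std::set<int> &want_to_read,
                             const std::set<int> &available,
                             std::set<int> *minimum) {
  if (std::includes(available.begin(), available.end(),
                    want_to_read.begin(), want_to_read.end())) {
    *minimum = want_to_read;
    return 0;
  }
  if (available.size() < static_cast<std::size_t>(k)) return -EIO;
  auto it = available.begin();
  for (int j = 0; j < k; ++j, ++it) minimum->insert(*it);
  return 0;
}
```

Because the logic does not depend on how chunks are computed, it is a natural candidate for a shared base class rather than a per-plugin method.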
Implementation of the corresponding ErasureCodeInterface methods, which convert bufferlists into char * and map almost exactly onto the jerasure prototypes (modulo the function name, which is wrapped in a virtual method).

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
The Mutex scope is restricted so that it protects only the load() method and not the factory() method. This allows a plugin to load another plugin from within its factory() method, which is convenient for the pyramid plugin, where each layer can specify a different plugin.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
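The effect of narrowing the lock can be sketched with a hypothetical registry. Registry, Codec, and the lambda factories below are illustrative, not the Ceph plugin registry API. Because std::mutex is not recursive, a factory that calls back into the registry would deadlock if the lock were held across factory().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

struct Codec {
  std::string name;
};

class Registry {
 public:
  using Factory = std::function<std::unique_ptr<Codec>(Registry &)>;

  void add(const std::string &name, Factory f) {
    std::lock_guard<std::mutex> l(lock_);
    available_[name] = std::move(f);
  }

  std::unique_ptr<Codec> factory(const std::string &name) {
    Factory f = load(name);  // the lock is held only inside load()
    if (!f) return nullptr;
    return f(*this);  // the lock is NOT held here, so f may call factory()
  }

 private:
  Factory load(const std::string &name) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = available_.find(name);
    return it == available_.end() ? Factory() : it->second;
  }

  std::mutex lock_;
  std::map<std::string, Factory> available_;
};
```

A "pyramid" factory can then call factory("xor") for one of its layers without self-deadlock.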
The erasure code example plugin is re-implemented using the encode_chunks() and decode_chunks() methods to show how they work. The bet is that a plugin implementor is likely to find this API more straightforward to adapt to than the encode() and decode() helpers, which are more convenient from the point of view of the caller but not from the point of view of the plugin implementor.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
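A sketch of what a decode_chunks() implementation can look like for a k = 2, m = 1 XOR code, chosen here for illustration. The function name and the std::map<int, std::string> chunk representation are assumptions. As with the low-level interface, every chunk, including the erased one, is assumed to be allocated with the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Chunks 0 and 1 hold data, chunk 2 holds their XOR. The erased chunk is
// rebuilt in place as the XOR of the two surviving chunks.
int decode_chunks_sketch(const std::set<int> &erased,
                         std::map<int, std::string> &chunks) {
  if (erased.size() > 1) return -1;  // a single parity tolerates one loss
  if (erased.empty()) return 0;
  int lost = *erased.begin();
  std::string &out = chunks.at(lost);
  for (std::size_t b = 0; b < out.size(); ++b) {
    char x = 0;
    for (int i = 0; i < 3; ++i)
      if (i != lost) x ^= chunks.at(i)[b];
    out[b] = x;
  }
  return 0;
}
```

The implementor only writes the arithmetic; allocation and the choice of which chunks to return stay in the caller.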
The decode() stub in ErasureCode implements the intended behavior of returning only the chunks listed in want_to_decode. Modify the tests to reflect this change.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
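The filtering step can be illustrated in isolation. keep_wanted() is a hypothetical helper, not part of ErasureCode: whatever chunks the lower-level decoding had to rebuild, only those whose ids appear in want_to_decode are handed back.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Returns the subset of `decoded` whose chunk ids are in want_to_decode.
std::map<int, std::string> keep_wanted(
    const std::set<int> &want_to_decode,
    const std::map<int, std::string> &decoded) {
  std::map<int, std::string> result;
  for (int id : want_to_decode) {
    auto it = decoded.find(id);
    if (it != decoded.end()) result.insert(*it);
  }
  return result;
}
```

Tests that previously expected every decoded chunk back would now see only the requested ones.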
Add tests demonstrating that the decode() and encode() methods avoid copying the buffers when they can.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
An erasure code plugin providing an implementation of ErasureCodeInterface. The caller can specify how to recursively apply erasure coding to the chunks in order to control the placement of the erasure coded chunks. For instance, with a crush ruleset containing the following steps:

    take root
    set choose datacenter 2
    set choose devices 5

an erasure coded pool is given 10 OSDs (0123456789): the first five (01234) are in one datacenter and the last five (56789) are in another. A pyramidal layout in which recovering from the loss of a single OSD does not require fetching data from an OSD located in another datacenter can be created with:

    [
      {
        "plugin": "jerasure",
        "technique": "cauchy_good",
        "k": "6",
        "m": "2",
        "mapping": "^-ABCDEF-^"
      },
      {
        "plugin": "xor",
        "k": "3",
        "m": "1",
        "type": "datacenter",
        "size": 2,
        "mapping": "-^ABCDEF^-"
      }
    ]

The object is first encoded into six data chunks ("k": "6") and two coding chunks ("m": "2") by the first layer of the pyramid, using the jerasure plugin ("plugin": "jerasure"). The jerasure plugin creates a total of eight chunks (k=6 + m=2 == 8) and ensures that the first six contain the original data. If the data chunks were designated by letters and the coding chunks by ^, it could look like:

    ABCDEF^^

If used outside of the context of the pyramid plugin, the jerasure plugin would spread data and coding chunks as follows (the dash - designates a chunk that is not being used):

    01234 56789
    ABCDE F^^--

i.e. with the first five data chunks in one datacenter (the crush ruleset above provides OSDs 01234 in one datacenter and 56789 in another) and the remaining chunks in the other datacenter. The pyramid plugin remaps it ("mapping": "^-ABCDEF-^") and the chunk placement becomes:

    01234 56789
    ^-ABC DEF-^

which is more evenly distributed, with three data chunks and one coding chunk in each datacenter.
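One way to read a layer's "mapping" string, sketched in C++ with a hypothetical apply_mapping() function: letters receive the layer's data chunks in order, '^' receives its coding chunks in order, and '-' positions are left unused. The in-order assignment of coding chunks is an assumption consistent with the example above, not a statement about the plugin's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For each position (OSD), returns the index of the layer's chunk placed
// there, or -1 for '-'. Data chunks are numbered 0..k-1 and coding chunks
// k..k+m-1, as in "ABCDEF^^".
std::vector<int> apply_mapping(const std::string &mapping) {
  int k = 0;
  for (char c : mapping)
    if (c != '-' && c != '^') ++k;
  std::vector<int> placement;
  int next_data = 0, next_coding = k;
  for (char c : mapping) {
    if (c == '-') placement.push_back(-1);
    else if (c == '^') placement.push_back(next_coding++);
    else placement.push_back(next_data++);
  }
  return placement;
}
```

For "^-ABCDEF-^" this yields 6, -1, 0, 1, 2, 3, 4, 5, -1, 7: coding chunks 6 and 7 land on OSDs 0 and 9, and data chunks 0 to 5 on OSDs 2 to 7, matching the "^-ABC DEF-^" placement.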
The next level of the pyramid is expected to create coding chunks that allow recovery without crossing datacenter boundaries, using XOR coding ("plugin": "xor") that supports the loss of a single chunk ("m": "1") out of the three ("k": "3") found in a given datacenter. It starts by splitting the chunks in two ("size": 2), starting with:

    01234
    ^-ABC

The XOR plugin is given ABC as input and creates ABC^, which is remapped into

    01234
    -^ABC

as specified in the first half of the mapping ("mapping": "-^ABCDEF^-"). The coding chunk from the previous level of the pyramid is left undisturbed because the dash (-) in the mapping requires that it is not used. The same logic can be applied to three levels with:

    take root
    set choose datacenter 2
    set choose rack 2
    set choose devices 5

    ^-ABC-DEF--GHI-JKL-^
    -^ABC-DEF--GHI-JKL^- datacenter
    --ABC^DEF^^GHI^JKL-- rack

http://tracker.ceph.com/issues/7238

Fixes: ceph#7238
Signed-off-by: Loic Dachary <loic@dachary.org>
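The second layer can be sketched as follows, assuming each chunk is reduced to a single byte for brevity. local_xor_layer() and recover_local() are hypothetical names. The positions are split into `size` equal groups (one per datacenter); within each group the letter positions are XORed into the '^' position, and '-' positions keep what the previous layer stored there. Recovery then reads only from the lost chunk's own group.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

void local_xor_layer(const std::string &mapping, std::size_t size,
                     std::vector<unsigned char> &osd) {
  const std::size_t group = mapping.size() / size;
  for (std::size_t g = 0; g < size; ++g) {
    unsigned char parity = 0;
    std::size_t parity_pos = std::string::npos;
    for (std::size_t i = g * group; i < (g + 1) * group; ++i) {
      if (mapping[i] == '^') parity_pos = i;
      else if (mapping[i] != '-') parity ^= osd[i];
    }
    if (parity_pos != std::string::npos) osd[parity_pos] = parity;
  }
}

// Rebuilds the byte at `lost` from the other used positions of its group,
// i.e. without reading anything from the other datacenter.
unsigned char recover_local(const std::string &mapping, std::size_t size,
                            const std::vector<unsigned char> &osd,
                            std::size_t lost) {
  const std::size_t group = mapping.size() / size;
  const std::size_t start = lost / group * group;
  unsigned char x = 0;
  for (std::size_t i = start; i < start + group; ++i)
    if (i != lost && mapping[i] != '-') x ^= osd[i];
  return x;
}
```

Starting from the first layer's output (jerasure coding bytes at positions 0 and 9, A to F at positions 2 to 7), the layer fills positions 1 and 8 and leaves 0 and 9 untouched.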
Signed-off-by: Loic Dachary <loic@dachary.org>
Reopen when work starts again: a link is added to http://tracker.ceph.com/issues/6478#note-6 so as not to lose track. It would be convenient to have a "DRAFT" category of pull requests, but this will do. Having a long-running DNM in the list of open pull requests is not good practice because it creates noise.
ghost closed this on Feb 19, 2014
liewegas pushed a commit to liewegas/ceph that referenced this pull request on Nov 18, 2016:

buildpackages: backport make-rpm.sh improvements

Reviewed-by: Loic Dachary <ldachary@redhat.com>
This is a performance-optimized EC plug-in computing simple parity, similar to RAID-4 algorithms.
It is particularly useful in combination with the EC pyramid code to compute local parities.
The implementation uses SSE2 assembler and region XOR'ing of 512-bit blocks. If SSE2 is not available, it falls back to vector operations with 128-bit or 64-bit arithmetic.
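A portable sketch of the 64-bit fallback path, assuming plain byte buffers; region_xor() and compute_parity() are hypothetical names and the SSE2 path is not reproduced. A vectorized variant would follow the same loop structure with wider loads (for example four 128-bit SSE2 registers to cover a 512-bit block).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// parity ^= src over len bytes: 64-bit words first, then the tail byte by
// byte. memcpy avoids unaligned or type-punned loads and stores.
void region_xor(unsigned char *parity, const unsigned char *src,
                std::size_t len) {
  std::size_t i = 0;
  for (; i + sizeof(std::uint64_t) <= len; i += sizeof(std::uint64_t)) {
    std::uint64_t a, b;
    std::memcpy(&a, parity + i, sizeof a);
    std::memcpy(&b, src + i, sizeof b);
    a ^= b;
    std::memcpy(parity + i, &a, sizeof a);
  }
  for (; i < len; ++i) parity[i] ^= src[i];
}

// RAID-4 style: XOR k equally sized data regions into one parity region.
std::vector<unsigned char> compute_parity(
    const std::vector<std::vector<unsigned char>> &data) {
  std::vector<unsigned char> parity(data.empty() ? 0 : data[0].size(), 0);
  for (const auto &d : data) region_xor(parity.data(), d.data(), parity.size());
  return parity;
}
```

Any single lost region can be rebuilt by XORing the parity with the surviving regions, which is the local-parity property the pyramid layout relies on.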