BlueStore compression - recompression #56975
Open: aclamk wants to merge 48 commits into ceph:main from aclamk:wip-aclamk-bs-compression-recompression
Introduce a printer class that allows selecting which parts of a Blob are printed. It severely reduces the amount of clutter in the output. Usage: using P = Bluestore::Blob::printer; dout << blob->printer(P::ptr + P::sdisk + P::schk); Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
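The flag-based selection described in this commit can be illustrated with a minimal sketch. `DemoBlob`, `DemoPrinter`, `format_blob`, the field names and the flag values are all hypothetical stand-ins chosen for illustration; they are not the types or output format added by this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for a blob; fields are illustrative only.
struct DemoBlob {
  uint64_t disk_bytes;
  uint32_t csum_count;
};

// Hypothetical flag set; each bit selects one part of the output.
struct DemoPrinter {
  enum : uint16_t {
    ptr = 1,    // print the object address
    sdisk = 2,  // print a short summary of disk usage
    schk = 4    // print a short summary of checksums
  };
};

// Print only the parts of the blob selected by `mode`.
std::string format_blob(const DemoBlob& b, uint16_t mode) {
  std::ostringstream out;
  const char* sep = "";
  out << "Blob(";
  if (mode & DemoPrinter::ptr) {
    out << sep << static_cast<const void*>(&b);
    sep = " ";
  }
  if (mode & DemoPrinter::sdisk) {
    out << sep << "disk=0x" << std::hex << b.disk_bytes << std::dec;
    sep = " ";
  }
  if (mode & DemoPrinter::schk) {
    out << sep << "csum=" << b.csum_count;
    sep = " ";
  }
  out << ")";
  return out.str();
}
```

Because the flags are distinct bits, adding them (as in the usage line of the commit message) is equivalent to OR-ing them, as long as no flag is repeated.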
Modify Extent similarly to Blob, so that the improved Blob printing can be used when printing extents. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Now printing Blob can include buffers. There are 2 variants: - 'buf' same as original in dump_onode - 'sbuf' only fundamental params, no ptr etc. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added a nice replacement for the dump_onode function. Introduce a printer class that allows selecting which parts of an Onode are printed. It severely reduces the amount of clutter in the output. Usage: using P = Bluestore::printer; dout << blob->print(P::ptr + P::sdisk + P::schk + P::buf + P::attrs); Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
- moved operator<< to BlueStore_debug file - upcased Printer {} flags - more reliable heap begin detection - fixup after rebase Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Created new variant of bluestore_blob_t::release_extents function. Now the function takes range [offset~length] as an argument, a simplification that allows it to have much better performance. Created comprehensive unit test, tests 40k random blobs. The unit test does not test for a potential case of having bluestore_blob_t.extents that are not allocation unit aligned. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
p2remain gives the remaining data in a block. It is similar to p2nphase, but for offset 0 it returns the full block size. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
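The difference between the two helpers can be shown with a small stand-alone sketch. The `_sketch` names are hypothetical and written only from the description above; they assume `align` is a power of two and are not copied from the Ceph sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone versions; `align` must be a power of two.

// Bytes from x up to the next `align` boundary; 0 when x is already aligned.
constexpr uint64_t p2nphase_sketch(uint64_t x, uint64_t align) {
  return (0 - x) & (align - 1);
}

// Bytes remaining in the block that contains x; a full block when x is aligned.
constexpr uint64_t p2remain_sketch(uint64_t x, uint64_t align) {
  return align - (x & (align - 1));
}

// Unaligned offsets: both helpers agree.
static_assert(p2nphase_sketch(0x1234, 0x100) == 0xcc, "unaligned nphase");
static_assert(p2remain_sketch(0x1234, 0x100) == 0xcc, "unaligned remain");
// Aligned offsets: this is the only case where they differ.
static_assert(p2nphase_sketch(0x1200, 0x100) == 0, "aligned nphase");
static_assert(p2remain_sketch(0x1200, 0x100) == 0x100, "aligned remain");
```

The aligned case matters when walking a range block by block: a helper that returns 0 at a block start would make no progress, while one that returns the full block size advances by one block.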
New version of put(). It is simpler and faster, but does not allow for overprovisioning of used AUs. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Created dedicated mutator of ExtentMap that is useful when a logical extent must be split. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Introducing new logic for Onode processing during write. The new punch_hole_2 function empties a range, but keeps track of elements: - allocations that are no longer used - blobs that are now empty - shared blobs that got modified - statfs changes to apply later This change allows allocations to be reused freely for deferred writes, which means that in deferred mode we can use allocations in a blob other than the one they came from. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Comprehensive tests for punch_hole_2. New formulation of punch_hole_2 makes it very easy to create patterns and inspect results. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
It is more organized this way. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Fixed memleak (Extents) from punch_hole_2. Also fixed num_blobs in test_bluestore_types. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Set of various simple functions to simplify code. No special logic here. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
BlueStore::Writer is a toolkit to give more options to control write. It gives more control over compression process, letting user of the class manually split incoming data to blobs. Now for large writes all but last blob can be fully filled with data. There is now a single place that decides on deferred/direct. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Extensive tests for BlueStore::Writer. Some extra debug hooks for BlueStore. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
New "write_lat" tracks wallclock time spent in execution of BlueStore::_write(). Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added _do_write_v2 function to BlueStore. Function is not integrated yet. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add new conf variable. bluestore_write_v2 = true : use new _do_write_v2() function bluestore_write_v2 = false : use legacy _do_write() function This variable is read only at start time. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add missing real read in io_schedule and io_schedule_masked. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Added missing implementation of object reading itself. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
- changed "o" to "onode" - moved away from using "offset" when illogical - added a lot of comments - fixed interface / fixed public,private - expand blob scan in _try_reuse_allocated_l/r Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Synchronized test_bluestore_types with fixes to BlueStore::Write class. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Is it actually useful for anything? Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Split object data into segments of conf.bluestore_segment_data_size bytes. This means that no blob will be in two segments at the same time. Modified reshard function to prefer segment separation lines. As a result no spanning blobs are created. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
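The rule that no blob may sit in two segments implies that a write range has to be cut at segment boundaries before blobs are formed. A minimal sketch of that cut, assuming a hypothetical helper `split_by_segment` that is not a BlueStore function:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not a BlueStore function: cut [offset, offset+length)
// at multiples of segment_size so that no piece crosses a segment boundary.
// Each returned pair is (offset, length).
std::vector<std::pair<uint64_t, uint64_t>>
split_by_segment(uint64_t offset, uint64_t length, uint64_t segment_size) {
  assert(segment_size > 0);
  std::vector<std::pair<uint64_t, uint64_t>> pieces;
  const uint64_t end = offset + length;
  while (offset < end) {
    // First segment boundary strictly after the current offset.
    const uint64_t segment_end = (offset / segment_size + 1) * segment_size;
    const uint64_t piece_end = std::min(end, segment_end);
    pieces.emplace_back(offset, piece_end - offset);
    offset = piece_end;
  }
  return pieces;
}
```

For example, with a 0x10000 segment size, a write of 0x12000 bytes at offset 0x3000 is cut into two pieces: one ending at 0x10000 and one starting there.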
New segment_size pool option creates segmentation in BlueStore Onodes. This simplifies metadata encoding. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
+ adapt to new "segment_size" pool option Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add a recompression scanner that looks around the write region to estimate how much would be gained if we read some of the surrounding data and rewrote it together with the write. Added Compression.h / Compression.cc. Added debug_bluestore_compression dout. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
1) Created _deferred_decision function. This is a centralized place to decide whether IO in a transaction should be all deferred or all direct. 2) Reorganized do_write() into several functions. Now the function progresses in steps and has a _deferred_decision step. This is a foundation for letting multiple disjoint regions share a centralized decision. 3) Added blob_vec type. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Created Estimator class. It is used by Scanner to decide if specific extent is to be recompressed. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
No longer keep references to Extents. Added mark_main() to signal original write region. Fixed get_regions() function. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Scan::on_write_start no longer has the extra_rewrites output. This is now retrieved through Estimator::get_regions(). Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Move calculation of pp_mode (pretty print mode) initialization to constructor. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Split do_write into do_write and do_write_with_blobs. The original is used when only uncompressed data is written. The new one accepts stream of data formatted into blobs; the blobs can be compressed or uncompressed. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modify do_write_v2() to also handle compressed data. Segmented and regular cases are recognized and handled properly. New do_write_v2_compressed() oversees compression / recompression. Added split_and_compress() function. It is dumb for now and always assumes compression makes sense. To be improved. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Now printing blob_vec properly handles compression cases. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add blob_create_full_compressed. Fix do_put_new_blobs to handle compressed. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Move most logic from Scanner to Estimator. Prepare for future machine learning / adaptive algorithm for estimation. Renamed functions, added interface comments. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Make one estimator per collection. This makes it possible for the estimator to learn collection-specific compressibility. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
An error caused the recompression lookup to go into an infinite loop. Now we properly skip over shared blobs in the compressed expansion step. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Give Estimator proper logic. It now learns expected recompression values, and uses them in next iterations to predict. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
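The commit message does not say how the Estimator learns, so the following is only one possible shape of such logic, under stated assumptions: a hypothetical `DemoEstimator` that accumulates the observed sizes of past recompressions and predicts by simple proportion. It is a swapped-in illustration, not the algorithm implemented in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimator, not the class from this PR: it accumulates totals
// from finished recompressions and predicts by simple proportion.
// Overflow of the intermediate product is ignored for brevity.
class DemoEstimator {
  uint64_t seen_uncompressed = 0;
  uint64_t seen_compressed = 0;

public:
  // Record the outcome of one finished recompression.
  void learn(uint64_t uncompressed_bytes, uint64_t compressed_bytes) {
    seen_uncompressed += uncompressed_bytes;
    seen_compressed += compressed_bytes;
  }

  // Predict the compressed size of a candidate region.
  // With no history, assume the data does not compress at all.
  uint64_t predict(uint64_t uncompressed_bytes) const {
    if (seen_uncompressed == 0) {
      return uncompressed_bytes;
    }
    return uncompressed_bytes * seen_compressed / seen_uncompressed;
  }
};
```

Keeping one such object per collection, as an earlier commit in this series does for the real Estimator, would let the prediction follow the compressibility of the data stored in that collection.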
@aclamk Very nice! Digging those 1.7 results especially. They are nearly as good as ref in write IOPS and significantly better in all other metrics.
aclamk force-pushed the wip-aclamk-bs-compression-recompression branch from 7990f2c to d35088d on April 19, 2024 16:09
Added missing files to alienstore CMake list. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
aclamk force-pushed the wip-aclamk-bs-compression-recompression branch from d35088d to 2bfe098 on April 19, 2024 16:32
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
jenkins test make check
This work is a partial solution to the problem of BlueStore compression quality on RBD.
Original (v1) compression has the following characteristics:
New (v2) compression does the following:
Initial engineering tests of the feature:
old: https://docs.google.com/spreadsheets/d/1ZRfTYFlc8gdDoU76qrgTFswzViho4ameabIk9TIx6Uw/edit#gid=560728694
moved, no restrictions: https://docs.google.com/spreadsheets/d/1M_k2huVDlqZOPC5lz5JoKM1jrPSP9K43ErBoo5cMHDo/edit#gid=1978643008
Below extracted summary:
ref - reference
1.1 - most aggressive recompression; a predicted gain of 1.1x is enough to trigger recompression
..
1.7 - relaxed recompression; a predicted gain of 1.7x is required to trigger recompression
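The thresholds in this legend can be read as a simple decision rule. The sketch below assumes that "gain" means the ratio of currently allocated space to the space expected after recompression; that definition, and the name `should_recompress`, are assumptions made for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decision rule. Assumption: gain is defined as
//   allocated_now / expected_after
// and recompression happens when the predicted gain reaches min_gain.
bool should_recompress(uint64_t allocated_now,
                       uint64_t expected_after,
                       double min_gain) {
  // Written as a multiplication to avoid dividing by zero.
  return static_cast<double>(allocated_now) >=
         min_gain * static_cast<double>(expected_after);
}
```

Under this reading, a region that occupies 0x18000 bytes and is predicted to shrink to 0x10000 bytes has a gain of 1.5x: the 1.1 setting would recompress it and the 1.7 setting would leave it alone.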
Allocated space:
Write IOPs:
CPU used for writes:
Read IOPs: (Ref degradation is not a fluke. It is real and repeatable.):
CPU used for reads: (Ref degradation is not a fluke. It is real and repeatable.):