os/bluestore: introduce hybrid_btree2 allocator #52489

Open: wants to merge 10 commits into base: main

@ifed01 (Contributor) commented Jul 17, 2023

This patch introduces a new bluestore allocator, hybrid_btree2, intended primarily to reduce the disk fragmentation that builds up on aged OSD volumes.
The key features of the new allocator include:

  1. Use btree containers instead of avl ones to reduce memory footprint and improve allocate/release latencies
  2. Use a "mostly" lockless extents cache to reduce lock contention in the allocator. Specifically, a shared lock plus lockless access is used for that cache, which permits multiple users to access it without contention. A unique lock is taken on the cache only for global access, e.g. a foreach call.
  3. Implement a different allocation strategy, which first attempts to allocate the most suitable (in terms of size) free extent(s). If that fails, it searches for suitable extent(s) while trying to balance free-space fragmentation against data-at-rest fragmentation; these two are actually antagonistic to each other.
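The "most suitable size first" step in item 3 can be illustrated with a size-ordered index. The sketch below is a simplified model, not the PR's code: std::multimap (a red-black tree) stands in for the btree, and the names FreeSpace, add_free and allocate_best_fit are hypothetical. It picks the smallest free extent that can hold the whole request and returns the tail to the index; the fallback search that balances free-space vs. data-at-rest fragmentation is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical model: free extents indexed by length.
// std::multimap stands in for the btree container used by the real allocator.
class FreeSpace {
 public:
  void add_free(uint64_t offset, uint64_t length) {
    assert(length > 0);
    by_size_.emplace(length, offset);
  }

  // Best fit: take the smallest free extent whose length is >= want.
  // Returns the allocated offset, or std::nullopt when no single extent is
  // large enough (the real allocator would then try a multi-extent search).
  std::optional<uint64_t> allocate_best_fit(uint64_t want) {
    auto it = by_size_.lower_bound(want);  // first extent with length >= want
    if (it == by_size_.end()) return std::nullopt;
    uint64_t length = it->first;
    uint64_t offset = it->second;
    by_size_.erase(it);
    if (length > want) {
      // Keep the unused tail of the chosen extent as a smaller free extent.
      by_size_.emplace(length - want, offset + want);
    }
    return offset;
  }

  std::size_t extent_count() const { return by_size_.size(); }

 private:
  std::multimap<uint64_t, uint64_t> by_size_;  // length -> offset
};
```

For example, with free extents 0x0~0x100000 and 0x200000~0x10000, a 0x8000 request is served from the smaller extent at 0x200000, leaving the large extent untouched for future large requests.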

As a result, the new allocator generally shows much better performance, a smaller RAM footprint, lower free-space fragmentation, and slightly higher data-at-rest fragmentation.

Sample numbers for the hybrid and hybrid_btree2 allocators from the allocator benchmark (unittest_alloc_bench --gtest_filter="*test_alloc_bench2_90_500_x2/*"):

hybrid:
Executed in 209.750353
Avail 13106 MB Fragmentation:0.91834
latest iteration data-at-rest fragmentation ratio (allocated extents/allocation requests): 1.1089

<=4096 -> 172250/172250 a_bytes 705536000 0.513345%
<=16384 -> 371807/371807 a_bytes 4381003776 3.1876%
<=65536 -> 318488/318488 a_bytes 8655015936 6.29735%
<=262144 -> 0/0 a_bytes 0 0%
<=1048576 -> 0/0 a_bytes 0 0%
<=4194304 -> 0/0 a_bytes 0 0%
<=16777216 -> 0/0 a_bytes 0 0%
<=18446744073709551615 -> 0/0 a_bytes 0 0%
 "bluestore_alloc": {
                "items": 1365198,
                "bytes": 71319360
            },

hybrid_btree2:
Executed in 62.844530
Avail 13107 MB Fragmentation:0.639356
latest iteration data-at-rest fragmentation ratio (allocated extents/allocation requests): 1.6903

<=4096 -> 354993/354993 a_bytes 1454051328 1.05796%
<=16384 -> 703942/703942 a_bytes 7709278208 5.60924%
<=65536 -> 32/32 a_bytes 720896 0.000524521%
<=262144 -> 0/0 a_bytes 0 0%
<=1048576 -> 0/0 a_bytes 0 0%
<=4194304 -> 0/0 a_bytes 0 0%
<=16777216 -> 0/0 a_bytes 0 0%
<=18446744073709551615 -> 1/1 a_bytes 4580282368 3.33259%
"bluestore_alloc": {
                "items": 5951210,
                "bytes": 47609680
  },
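The two bluestore_alloc mempool dumps above can be compared directly. The short program below only does arithmetic on the printed items and bytes values; the PR does not say exactly what an "item" counts, so the per-item figure is just the ratio of the two numbers.

```cpp
#include <cassert>
#include <cstdint>

struct MempoolStats {
  uint64_t items;
  uint64_t bytes;
};

// Values copied from the "bluestore_alloc" dumps above.
constexpr MempoolStats kHybrid{1365198, 71319360};
constexpr MempoolStats kHybridBtree2{5951210, 47609680};

// Average mempool bytes per reported item.
constexpr double bytes_per_item(const MempoolStats& s) {
  return static_cast<double>(s.bytes) / static_cast<double>(s.items);
}

// Fractional reduction of total mempool bytes going from `before` to `after`.
constexpr double bytes_reduction(const MempoolStats& before,
                                 const MempoolStats& after) {
  return 1.0 - static_cast<double>(after.bytes) / static_cast<double>(before.bytes);
}
```

With these numbers, hybrid_btree2 uses about 33% fewer bytes in this pool while reporting roughly 4.4x more items, i.e. exactly 8 bytes per item versus about 52 for hybrid.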

Some presentation slides can be found at https://docs.google.com/presentation/d/1TBLRvY7AF-K-DGijJifMnyUx_Oil2Mb-vIqp2TpMGQY/edit?usp=sharing

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>


@YiteGu (Contributor) commented Aug 28, 2023

Hi @ifed01, I found an error; reporting it here first:

2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x867000:  op_file_update  file(ino 1365 size 0x0 mtime 2023-08-25T18:05:13.829404+0800 allocated 0 alloc_commit 0 extents [])
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x867000:  op_dir_link  db/MANIFEST-001326 to 1365
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x867000:  op_file_update_inc  delta(ino 1365 size 0x67a mtime 2023-08-25T18:05:13.832592+0800 offset 0 extents [1:0xb00000~100000])
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x867000:  op_file_update_inc produced  file(ino 1365 size 0x67a mtime 2023-08-25T18:05:13.832592+0800 allocated 100000 alloc_commit 0 extents [1:0xb00000~100000])
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bluefs _read h 0x56037d402180 0x868000~1000 from file(ino 1 size 0x868000 mtime 2023-08-16T15:34:08.361207+0800 allocated d00000 alloc_commit 500000 extents [1:0x3600000~100000,0:0x1300000~400000,0:0x200000~800000])
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _read left 0x98000 len 0x1000
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _read got 4096
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bluefs _replay 0x868000: txn(seq 29040 len 0x6d crc 0x357529fa)
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x868000:  op_file_update  file(ino 1366 size 0x0 mtime 2023-08-25T18:05:13.834788+0800 allocated 0 alloc_commit 0 extents [])
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x868000:  op_dir_link  db/001326.dbtmp to 1366
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x868000:  op_file_update_inc  delta(ino 1366 size 0x10 mtime 2023-08-25T18:05:13.838958+0800 offset 0 extents [1:0x2000~fe000,1:0xd00000~2000])
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _replay 0x868000:  op_file_update_inc produced  file(ino 1366 size 0x10 mtime 2023-08-25T18:05:13.838958+0800 allocated 100000 alloc_commit 0 extents [1:0x2000~fe000,1:0xd00000~2000])
2023-08-25T20:23:59.596+0800 7f83272d15c0 -1 bluefs _verify_alloc_granularity OP_FILE_UPDATE_INC of 1:0x2000~fe000 does not align to alloc_size 0x100000
2023-08-25T20:23:59.596+0800 7f83272d15c0 -1 bluefs work-around by setting bluefs_alloc_size = 8192 for this OSD
2023-08-25T20:23:59.596+0800 7f83272d15c0 -1 bluefs mount failed to replay log: (14) Bad address
2023-08-25T20:23:59.596+0800 7f83272d15c0 20 bluefs _stop_alloc
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bdev(0x5603823d6500 /home/ceph/build/dev/osd1/block.wal) discard_drain
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bdev(0x56037d3fb700 /home/ceph/build/dev/osd1/block.db) discard_drain
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bdev(0x5603823d6000 /home/ceph/build/dev/osd1/block) discard_drain
2023-08-25T20:23:59.596+0800 7f83272d15c0 -1 bluestore(/home/ceph/build/dev/osd1) _open_bluefs failed bluefs mount: (14) Bad address
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bluefs maybe_verify_layout no memorized_layout in bluefs superblock
2023-08-25T20:23:59.596+0800 7f83272d15c0 -1 bluestore(/home/ceph/build/dev/osd1) _open_db failed to prepare db environment:
2023-08-25T20:23:59.596+0800 7f83272d15c0  1 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) close
2023-08-25T20:23:59.596+0800 7f83272d15c0 10 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _aio_stop
2023-08-25T20:23:59.745+0800 7f8314238700 10 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _aio_thread end
2023-08-25T20:23:59.769+0800 7f83272d15c0 10 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _discard_stop
2023-08-25T20:23:59.769+0800 7f8313a37700 20 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _discard_thread wake
2023-08-25T20:23:59.769+0800 7f8313a37700 10 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _discard_thread finish
2023-08-25T20:23:59.769+0800 7f83272d15c0 10 bdev(0x56037d3fb200 /home/ceph/build/dev/osd1/block) _discard_stop stopped
2023-08-25T20:23:59.769+0800 7f83272d15c0 -1 osd.1 0 OSD:init: unable to mount object store
2023-08-25T20:23:59.769+0800 7f83272d15c0 -1  ** ERROR: osd init failed: (5) Input/output error
Steps to reproduce:
  1. Apply this PR and test with hybrid_btree2.
  2. Start the OSD; it starts successfully.
  3. Restart the OSD after 10 minutes.
  4. The error above occurs.

@YiteGu (Contributor) commented Aug 29, 2023

I added some debug logging; the output is below:

2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 bluefs _flush_range_F 0x55685c543b00 pos 0x0 0x0~10 to file(ino 39 size 0x0 mtime 2023-08-26T16:42:17.684869+0800 allocated 0 alloc_commit 0 extents [])
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 bluefs _allocate len 0x10 from 1
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::allocate want 0x100000 unit 0x100000 max_alloc_size 0x100000 hint 0x0
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::maybe_get_from_cache res_offset 0 want 1048576 ret 0
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::allocate want 0x100000 unit 0x100000 max_alloc_size 0x100000 hint 0x0
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::_allocate want 1048576 unit 1048576 max_alloc_size 1048576 allocated 0 want_now 1048576
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::__allocate bucket0 8 size 1048576 uniti 1048576
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::__allocate insert to extents  offset 8192 rs_p length 1040384 len 1040384
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::_allocate want 1048576 unit 1048576 max_alloc_size 1048576 allocated 1040384 want_now 8192
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::__allocate bucket0 1 size 8192 uniti 1048576
2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::__allocate insert to extents  offset 2097152 rs_p length 2097152 len 8192
2023-08-26T16:42:17.682+0800 7f0de304a5c0 20 bluefs _flush_range_F file now, unflushed file(ino 39 size 0x10 mtime 2023-08-26T16:42:17.684869+0800 allocated 100000 alloc_commit 0 extents [1:0x2000~fe000,1:0x200000~2000])

For file ino 39, the request is for 1048576 bytes with a unit of 1048576, but btree2 allocates twice, and in the first allocation neither the offset nor the length is aligned to 1048576. This causes the extent verification error after the OSD restarts:

2023-08-26T16:42:17.682+0800 7f0de304a5c0 10 hybrid_btree2::__allocate insert to extents  offset 8192 rs_p length 1040384 len 1040384
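The failure comes down to a power-of-two alignment test. The sketch below re-checks the two extents of the unflushed file (1:0x2000~fe000 and 1:0x200000~2000) against the 0x100000 unit and against a 4K block. The helper names are hypothetical: p2aligned here is a local re-implementation of the usual mask test (x & (align - 1)) == 0, valid only for power-of-two align, and extent_aligned only approximates the kind of check BlueFS::_verify_alloc_granularity() performs.

```cpp
#include <cassert>
#include <cstdint>

// True if x is a multiple of align; align must be a power of two.
constexpr bool p2aligned(uint64_t x, uint64_t align) {
  return (x & (align - 1)) == 0;
}

struct Extent {
  uint64_t offset;
  uint64_t length;
};

// Hypothetical check: both start and length must be multiples of granularity.
constexpr bool extent_aligned(const Extent& e, uint64_t granularity) {
  return p2aligned(e.offset, granularity) && p2aligned(e.length, granularity);
}

// Extents reported for ino 39 in the log above.
constexpr Extent kFirst{0x2000, 0xfe000};
constexpr Extent kSecond{0x200000, 0x2000};
```

Neither extent passes at 0x100000 granularity, both pass at 0x1000, and their lengths add up to exactly 0x100000, so the total allocation still matches the requested length.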

@YiteGu (Contributor) commented Aug 31, 2023

I traced the origin of this error to hybrid_btree2::init_rm_free; the log is below:

2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::init_rm_free offset 0x100000 length 0x100000
2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::_remove_from_tree start 1048576 rt_p->first 8192 end 2097152 rt_p->second 4194304
2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::__try_insert_range rs.satrt 2097152 rs.end 4194304
2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::_range_size_tree_add rs.length 2097152 bucket 9 num_free 1063256064 rsum 1062207488 lsum 1048576 tree size 1
2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::__try_insert_range rs.satrt 8192 rs.end 1048576
2023-08-28T17:23:30.594+0800 7f176b4205c0 10 hybrid_btree2::_range_size_tree_add rs.length 1040384 bucket 8 num_free 1064296448 rsum 1062207488 lsum 2088960 tree size 2

This produces a btree node smaller than 1M in bucket 8, and the node's rs->start is then not aligned to the unit size when that node is taken from the tree for allocation.

@ifed01 (Contributor Author) commented Aug 31, 2023

@YiteGu - thanks a lot for the investigation. Indeed, the new allocator no longer follows the requirement to allocate unit-size-aligned extents. From now on, the unit size parameter determines only the new extent's length, while the extent's location only needs to be aligned to the block device unit (4K). Hence I'll have to get rid of the relevant verification in the BlueFS::_check_allocations() method. Will do that shortly.
Meanwhile, one can work around the issue by setting the bluefs_log_replay_check_allocations config parameter to false.

@YiteGu (Contributor) commented Aug 31, 2023

I got it. :)

@ifed01 (Contributor Author) commented Aug 31, 2023

@YiteGu - I've just realized that all releases back to Quincy have already switched to a 4K alignment requirement in the BlueFS::_verify_alloc_granularity() method, so they shouldn't report the above error. I presume you're trying the new allocator on top of a Pacific release prior to 16.2.14, aren't you? Please confirm.
If so, please use the config-parameter workaround I shared earlier. The upcoming 16.2.14 release will use the new checking approach and hence fix the problem.

@YiteGu (Contributor) commented Aug 31, 2023

Yes, your presumption is right.

github-actions bot commented Sep 9, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

github-actions bot commented Oct 4, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ifed01 (Contributor Author) commented Mar 12, 2024

jenkins test make check

@ifed01 (Contributor Author) commented Mar 12, 2024

jenkins test api

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

Commit messages (all 10 commits are Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>; three carry a description):
  • Intended for the forthcoming major update with new allocator implementation.
  • Refactor hybrid allocator in a way to permit alternative hybrid allocator implementations using the same code base.
  • Test case depends strongly on avl allocator results. Hence sticking to this allocator explicitly.
#define dout_context (T::get_context())
#define dout_subsys ceph_subsys_bluestore
#undef dout_prefix
#define dout_prefix *_dout << (std::string(this->get_type()) + "::").c_str()
Review comment (Contributor):

Why do we need to degrade std::string to c_str()?

want, unit, max_alloc_size, hint, extents);
ceph_assert(res >= 0); // all the errors to be handled inside
// __allocate_or_rollback()
if ((uint64_t)res < want) {
Review comment (Contributor):

It's either res == 0 or res == want.
So, how about
ceph_assert(res == 0 || res == want); if (res == 0) {
Then there is no need for the confusing 'want - res' below, nor 'res += res2'.

if (primary) {
res = T::_allocate(want, unit, max_alloc_size, hint, extents);
} else if (bmap_alloc) {
res = bmap_alloc->allocate(want, unit, max_alloc_size, hint, extents);
Review comment (Contributor):

Wouldn't ceph_assert(bmap_alloc) be better?
__allocate_or_rollback is not an interface at any point, and I see it's integrated into the fallback logic.

} else if (bmap_alloc) {
res = bmap_alloc->allocate(want, unit, max_alloc_size, hint, extents);
}
if (res < 0) {
Review comment (Contributor):

When res < 0, it would be res == -ENOSPC.
I think you meant res < want.

}
if (res < 0) {
// got a failure, release already allocated
PExtentVector local_extents;
Review comment (Contributor):

+1 for optimizing for the successful case.

}

template <typename T>
int64_t HybridAllocatorBase<T>::__allocate_or_rollback(
Review comment (Contributor):

I am not sure what special meaning lies behind the double underscore (__) prefix that appears here.

* [0, base]
* (base, base*factor*2]
* (base*factor*2, base*factor*4]
* (base*factor*4, base*factor*8]
Review comment (Contributor):

nice!

Review comment (Contributor):

Correction:

   * [0, base]
   * (base, base*2^factor]
   * (base*2^factor, base*2^(factor*2)]
   * (base*2^(factor*2), base*2^(factor*3)]

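Under the corrected boundaries, bucket 0 holds lengths up to base and bucket k (k >= 1) holds lengths in (base*2^(factor*(k-1)), base*2^(factor*k)]. The helper below computes that index; it is an illustration, not the PR's function, and any cap on the number of buckets is not modeled. The values base = 4096 and factor = 1 are an assumption inferred from the debug logs earlier in this thread, where they reproduce the reported buckets (length 8192 in bucket 1, 1040384 and 1048576 in bucket 8, 2097152 in bucket 9).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: bucket 0 is [0, base]; bucket k >= 1 is
// (base * 2^(factor*(k-1)), base * 2^(factor*k)].
std::size_t bucket_index(uint64_t len, uint64_t base, unsigned factor) {
  assert(base > 0 && factor >= 1 && factor < 64);
  if (len <= base) return 0;
  std::size_t k = 1;
  uint64_t upper = base << factor;  // upper bound of bucket 1
  while (len > upper) {
    // If the next bound would overflow 64 bits, len belongs to the next bucket.
    if (upper > (std::numeric_limits<uint64_t>::max() >> factor)) return k + 1;
    upper <<= factor;
    ++k;
  }
  return k;
}
```

With base = 4096 and factor = 1, a 1040384-byte remainder lands in the same bucket (8) that the 1048576-byte request in the log searches first, which is how that request could be handed a shorter, unaligned extent.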
@aclamk (Contributor) left a comment

I think the work is ready to go.
Only minor, non-critical comments.
Note: I did not review the logic of the btree allocator. I assume it is a copy of the avl one.

@markhpc (Member) commented May 2, 2024

@aclamk Is it worth doing some preliminary performance sanity checks at this point?

/*
* return whether x is aligned with (align)
* eg, p2aligned(1200, 0x400) ==> false
* eg, p2aligned(1024, 0x400) ==> true
Review comment (Contributor):

Mixing decimal and hex here is very confusing.

#include "BitmapAllocator.h"

class HybridAllocator : public AvlAllocator {
template <typename T>
@pereman2 (Contributor) commented May 9, 2024

Suggested change:
- template <typename T>
+ template <typename PrimaryAllocator>


template <typename T>
void HybridAllocatorBase<T>::init_rm_free(uint64_t offset, uint64_t length)
{
Review comment (Contributor):

In a non-corrupted hybrid allocator state, there is always a gap between a chunk free in the primary (avl/btree/btree2) and a chunk free in the secondary (bitmap).

                  1                 2                 3                 4                 5
0        8       f0        8       f0        8       f0        8       f0        8       f0
rrrrrrrrr bbbbb bb bb b     b b rrrrrrrrrrrrrrrrrr       rrrrrrrrrrrrrrrrrrrr b rrrrrrrrrr

r - ranged, b - bitmap

It is not possible to have rrrrrrbbbbbb, because the bitmap part would be moved to the range,
and together they would form a continuous rrrrrrrrrrrr.

So, there are only 2 kinds of valid init_rm_free(off, len) ranges:
a) [off, off+len) lies completely inside some range rrrrrrrrr
b) the bitmap has all bits set in [off, off+len) bbbbbbb

I claim that the logic of _try_remove_from_tree can be much simpler.
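The argument above can be made concrete with a toy model. Everything in the sketch is hypothetical: HybridModel, remove_from_ranges and remove_from_bitmap are illustrative names, the primary is a std::map from start to end, and the secondary is a std::vector<bool> with one bit per 4K block. Given the invariant, init_rm_free only needs to try case (a) and then case (b); anything else indicates corruption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of hybrid allocator state, not Ceph's classes.
struct HybridModel {
  std::map<uint64_t, uint64_t> ranges;  // primary: free start -> end
  std::vector<bool> bitmap;             // secondary: one bit per block, true = free
  uint64_t block = 4096;

  // Case (a): [off, off+len) lies entirely inside one primary range.
  bool remove_from_ranges(uint64_t off, uint64_t len) {
    uint64_t end = off + len;
    auto it = ranges.upper_bound(off);
    if (it == ranges.begin()) return false;
    --it;  // last range whose start is <= off
    uint64_t rs = it->first, re = it->second;
    if (end > re) return false;
    ranges.erase(it);
    if (rs < off) ranges.emplace(rs, off);  // keep the left remainder
    if (end < re) ranges.emplace(end, re);  // keep the right remainder
    return true;
  }

  // Case (b): every block of [off, off+len) is free in the bitmap.
  // Assumes off and len are multiples of block.
  bool remove_from_bitmap(uint64_t off, uint64_t len) {
    uint64_t first = off / block, last = (off + len) / block;
    if (last > bitmap.size()) return false;
    for (uint64_t i = first; i < last; ++i)
      if (!bitmap[i]) return false;
    for (uint64_t i = first; i < last; ++i) bitmap[i] = false;
    return true;
  }

  void init_rm_free(uint64_t off, uint64_t len) {
    // By the invariant, exactly one of the two cases applies.
    bool ok = remove_from_ranges(off, len) || remove_from_bitmap(off, len);
    assert(ok && "range neither inside a primary extent nor fully free in bitmap");
    (void)ok;
  }
};
```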

<< std::dec
<< dendl;
ceph_assert(size);
if (!bmap_alloc) {
Review comment (Contributor):

The main, if not the only, goal of hybrid was to save memory.

It is suboptimal that, when the ranged allocator is already consuming too much memory, we create a bitmap and consume even more.
We should have a strategy where, once we pass some threshold, we create the bitmap but then move ranges into it until we are under the memory limit again.
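One way to read this proposal: once the range tree's estimated footprint exceeds a budget, keep moving extents into the bitmap until it fits again. The sketch is hypothetical throughout: the fixed per-entry cost, the choice of spilling the smallest extents first, and all names are assumptions for illustration, not behavior of any existing Ceph allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical hybrid model that spills ranges into a bitmap
// until the range tree fits a memory budget.
struct SpillingHybrid {
  uint64_t block = 4096;
  std::size_t bytes_per_range = 64;           // assumed per-entry tree cost
  std::size_t range_mem_limit = 0;            // budget for the range tree, bytes
  std::multimap<uint64_t, uint64_t> by_size;  // length -> offset
  std::vector<bool> bitmap;                   // one bit per block, true = free

  std::size_t range_mem() const { return by_size.size() * bytes_per_range; }

  void add_free(uint64_t offset, uint64_t length) {
    assert(offset % block == 0 && length % block == 0);
    by_size.emplace(length, offset);
    spill_if_needed();
  }

  // Move the smallest ranges (least free space per tree entry) into the
  // bitmap until the range tree is back within its budget.
  void spill_if_needed() {
    while (range_mem() > range_mem_limit && !by_size.empty()) {
      auto it = by_size.begin();  // smallest length
      uint64_t length = it->first, offset = it->second;
      by_size.erase(it);
      uint64_t first = offset / block, last = (offset + length) / block;
      if (bitmap.size() < last) bitmap.resize(last, false);  // bitmap grows lazily
      for (uint64_t i = first; i < last; ++i) bitmap[i] = true;
    }
  }
};
```

Spilling smallest-first keeps large extents in the fast tree, but it is only one possible policy; the reverse direction (moving bitmap space back into ranges when memory allows) is not modeled.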

}

void Btree2Allocator::_remove_from_tree(uint64_t start, uint64_t size)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we create the bitmap side of hybrid, is there no path where we decide that it is no longer needed?

5 participants