
os/bluestore: fix the allocate in bluefs #19030

Merged
1 commit merged into ceph:master on Nov 23, 2017

Conversation

@tangwenjun3 (Contributor) commented Nov 20, 2017:

When BlueFS succeeds in reserving space but then fails to allocate it on the DB device, it hits an assert caused purely by space fragmentation. In this situation it cannot fall back to the slow device's space. This can happen with the stupid or bitmap allocator.

Signed-off-by: tangwenjun <tang.wenjun3@zte.com.cn>
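
To make the failure mode concrete, here is a standalone sketch (not the actual BlueFS code; MockAllocator and its fields are invented for illustration): reserve() only checks total free space, while allocate() needs contiguous chunks, so fragmentation can leave allocate() short even after a successful reserve().

    // Hypothetical stand-in for Ceph's Allocator: reserve() tracks raw free
    // space, while allocate() is limited by the largest contiguous chunk.
    #include <algorithm>
    #include <cassert>
    #include <cstdint>
    #include <vector>

    struct Extent { uint64_t offset, length; };

    struct MockAllocator {
      uint64_t free_bytes;      // total free space (all that reserve() checks)
      uint64_t max_contiguous;  // largest free chunk (what allocate() can find)

      int reserve(uint64_t want) {
        if (want > free_bytes) return -1;  // -ENOSPC in the real code
        free_bytes -= want;
        return 0;
      }
      void unreserve(uint64_t unused) { free_bytes += unused; }

      // May return less than `want` when the free space is fragmented.
      int64_t allocate(uint64_t want, std::vector<Extent>* out) {
        uint64_t got = std::min(want, max_contiguous);
        if (got > 0) out->push_back(Extent{0, got});
        return static_cast<int64_t>(got);
      }
    };

    int main() {
      // 1 MiB free in total, but fragmented into chunks of at most 4 KiB.
      MockAllocator db{1 << 20, 4096};
      std::vector<Extent> extents;
      const uint64_t left = 64 * 1024;

      int r = db.reserve(left);                        // ok: enough in total
      int64_t alloc_len = db.allocate(left, &extents); // short: fragmentation

      // Before this fix, BlueFS asserted here instead of unreserving,
      // releasing the partial extents, and falling back to the slow device.
      assert(r == 0 && alloc_len < static_cast<int64_t>(left));
      return 0;
    }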

@tchaikov (Contributor) commented:

src/os/bluestore/BlueFS.cc 100644 → 100755

@tangwenjun3 please do not add the "x" permission bit to the source file unless it's supposed to be executable.

@tangwenjun3 (Contributor, author) commented:

@tchaikov fixed.

      extents.reserve(4); // 4 should be (more than) enough for most allocations
      alloc_len = alloc[id]->allocate(left, min_alloc_size, hint, &extents);
    }
    if (r < 0 || (alloc_len < (int64_t)left)) {
@xiexingguo (Member) commented on the diff:

@tangwenjun3 If reserve() is okay but we fail to allocate(), we should call unreserve() first before we switch to try other devices (e.g., slow), right? Otherwise there is a potential space leak.

@tangwenjun3 force-pushed the wip-fix-bluefs-allocate branch 7 times, most recently from 4550cfd to c8899f2 on November 20, 2017.
@xiexingguo (Member) reviewed:

Good catch

    }
    if (r < 0 || (alloc_len < (int64_t)left)) {
      if (r == 0)
        alloc[id]->unreserve(left);
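        // (reserve() succeeded but allocate() came up short; the reservation
        //  is handed back here before falling through to the next device)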
@ifed01 (Contributor) commented Nov 20, 2017:
You should probably unreserve only the difference between left and alloc_len, and release the already allocated extents.
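
Concretely, the cleanup described here could look roughly like the sketch below (a sketch only; the exact release() signature has varied across Ceph versions):

    if (r < 0 || (alloc_len < (int64_t)left)) {
      if (r == 0) {
        // give back only the portion that reserve() still holds...
        alloc[id]->unreserve(left - alloc_len);
        // ...and return any partially allocated extents to the free pool
        for (auto& p : extents)
          alloc[id]->release(p.offset, p.length);
        extents.clear();
      }
      // then fall through and retry the allocation on the next (slower) device
    }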

@xiexingguo (Member) replied:
Ahh, right.

> release already allocated extents

But instead of doing this, I think we can just push the allocated extents into ev, as we technically support mixed usage of space from different devices:

    if (r == 0) {
      alloc[id]->unreserve(left - alloc_len);
      for (auto& p : extents) {
        bluefs_extent_t e = bluefs_extent_t(id, p.offset, p.length);
        if (!ev->empty() &&
            ev->back().bdev == e.bdev &&
            ev->back().end() == (uint64_t)e.offset) {
          ev->back().length += e.length;
        } else {
          ev->push_back(e);
        }
      }
    }

@xiexingguo (Member) commented:
Well, on second thought, "release already allocated extents" is perhaps the better choice, as supporting a single request that allocates space from multiple devices is a little too complicated...

@tangwenjun3 force-pushed the wip-fix-bluefs-allocate branch 5 times, most recently from 0c8f27d to 22188dd on November 21, 2017.
When BlueFS succeeds in reserving space but then fails to allocate it on the DB device, it hits an assert caused purely by space fragmentation. In this situation it cannot fall back to the slow device's space. This can happen with the stupid or avl allocator.

Signed-off-by: tangwenjun <tang.wenjun3@zte.com.cn>
@tangwenjun3 (Contributor, author) commented:

@ifed01 @xiexingguo After thinking it over again, I went with "release already allocated extents".

@xiexingguo (Member) commented:

lgtm

@tchaikov tchaikov merged commit f21ef22 into ceph:master Nov 23, 2017
@tchaikov (Contributor) commented:

@tangwenjun3 could you please help prepare the luminous backport? The conflict resolution is non-trivial.

@tangwenjun3 (Contributor, author) commented:

@tchaikov ok

tangwenjun3 added a commit to tangwenjun3/ceph that referenced this pull request Nov 23, 2017
backport from pr ceph#19030

Signed-off-by: tangwenjun <tang.wenjun3@zte.com.cn>
@Aran85 (Contributor) commented Apr 20, 2018:

@xiexingguo should we add a perf counter to BlueFS to track how often allocations fall back to the slow device because of fragmentation? If it happens frequently, I think we should do something else to avoid it.

@xiexingguo (Member) replied:

Yeah, makes sense to me.
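
For reference, such a counter could be wired up through Ceph's PerfCountersBuilder, which BlueFS already uses in its _init_logger(); the id range, counter name, and helper below are hypothetical, not part of this PR:

    // Hypothetical sketch against Ceph's perf-counter API
    // (common/perf_counters.h); the ids and names below are made up.
    #include "common/perf_counters.h"

    enum {
      l_bluefs_fallback_first = 973000,  // arbitrary unused id range
      l_bluefs_slow_fallbacks,           // allocs pushed to the slow device
      l_bluefs_fallback_last,
    };

    PerfCounters* init_fallback_counter(CephContext* cct) {
      PerfCountersBuilder b(cct, "bluefs-fallback",
                            l_bluefs_fallback_first, l_bluefs_fallback_last);
      b.add_u64_counter(l_bluefs_slow_fallbacks, "slow_fallbacks",
                        "allocations that fell back to the slow device "
                        "due to fragmentation");
      PerfCounters* logger = b.create_perf_counters();
      cct->get_perfcounters_collection()->add(logger);
      return logger;
    }

    // In _allocate(), at the point where the fast device comes up short:
    //   logger->inc(l_bluefs_slow_fallbacks);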
