Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: when Ceph cluster is full, return -ENOSPC for creating image command. #14167

Closed
wants to merge 1 commit into from
Closed

rbd: when Ceph cluster is full, return -ENOSPC for creating image command. #14167

wants to merge 1 commit into from

Conversation

liupan1111
Copy link
Contributor

@liupan1111 liupan1111 commented Mar 27, 2017

As we talked in #14116, I improved "create" path, and return -ENOSPC when cluster is full.

@liupan1111
Copy link
Contributor Author

@dillaman Please help take a look, thanks!

@yangdongsheng
Copy link
Contributor

Just curious, how can we decide a action should set_osdmap_full_try() or not? So far, I think remove and snap remove looks reasonable to me. But why create? If create should not be blocked, what about other actions? resize? list? What's the policy for this?

@liupan1111
Copy link
Contributor Author

list, resize, won't be blocked at this moment, because they are not in "write" path.

@yangdongsheng
Copy link
Contributor

wait a minute, resize will not be blocked? how can I request a larger size when the cluster is full?

@liupan1111
Copy link
Contributor Author

rbd image is thin-provisioned, "resize" will only change the value of metadata, won't really change the size.

@yangdongsheng
Copy link
Contributor

yangdongsheng commented Mar 28, 2017

But it's a "write" on header object, refer to this:
https://github.com/ceph/ceph/blob/master/src/librbd/operation/ResizeRequest.cc#L383

and I got a block in my testing.

$ rbd resize test -s 10G
2017-03-28 16:44:52.547655 7f98217fa700  0 client.4152.objecter  FULL, paused modify 0x7f9800008a70 tid 7
^C
$ ^C

@liupan1111
Copy link
Contributor Author

cool, good catch. I want to fix create at this moment, and fix other write related commands together later.

@yangdongsheng
Copy link
Contributor

I am just wondering, what the kind of the actions should be blocked, just data writing?
So, should we set_osdmap_full_try() on ImageCtx.md_ctx?

I don't think it's a good idea to set_osdmap_full_try() on actions one by one.

@dillaman
Copy link

@yangdongsheng While I also agree that this continued effort has the potential to get beyond ridiculous real fast when the simple solution is don't run with clusters at 100% capacity, we cannot just set the flag on ImageCtx::md_ctx unconditionally since some IO things also hit it (like object map updates).

@dillaman
Copy link

@liupan1111 That first commit against the osdc should be a separate PR for the core team to review. What exactly is the bug you are fixing?

@liupan1111
Copy link
Contributor Author

liupan1111 commented Mar 28, 2017

@dillaman, the first commit is about incomplete fix in #12627, which is also about full try. I added op->target.flags |= CEPH_OSD_FLAG_FULL_TRY in wrong place.

I added this commit in this PR, because it is a prerequisite of next fix for create.

@liupan1111
Copy link
Contributor Author

@yangdongsheng While I also agree that this continued effort has the potential to get beyond ridiculous real fast when the simple solution is don't run with clusters at 100% capacity, we cannot just set the flag on ImageCtx::md_ctx unconditionally since some IO things also hit it (like object map updates).

One thing just want to clarify is: not 100% capacity, just full, which is controlled by mon_osd_full_ratio(default 95%).

@dillaman
Copy link

@liupan1111 Understood -- but it's one thing to support freeing up space from your cluster when you are full and it's another to try to work around every possible way to touch images when full.

@liupan1111
Copy link
Contributor Author

@dillaman Agree, so how about I just leave the second commit in this pr, and move the first one into a new PR?

@dillaman
Copy link

@liupan1111 Perfect, thanks

…mand.

Signed-off-by: Pan Liu <liupan1111@gmail.com>
@liupan1111
Copy link
Contributor Author

@dillaman done, thanks.

@liupan1111
Copy link
Contributor Author

@dillaman ping

@liupan1111
Copy link
Contributor Author

#14193

@dillaman
Copy link

dillaman commented Apr 3, 2017

@liupan1111 I agree w/ @yangdongsheng in that we don't want to keep adding these things one-by-one for each CLI action. It was one thing when it was just to help remove RBD images when the cluster is full, but it's going to be another thing to support any arbitrary RBD action. I think the only way to solve this would be to have a new librados::IoCtx within ImageCtx that is used for all operations throughout librbd w/o the need to only address this fix in the rbd CLI.

@liupan1111
Copy link
Contributor Author

@dillaman , I agree. I also talked with dongsheng offline, and I will think about it.

@dillaman dillaman closed this May 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants