Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-nbd: update size only when NBD_SET_SIZE successful #14005

Merged
merged 1 commit into from
Mar 18, 2017
Merged

rbd-nbd: update size only when NBD_SET_SIZE successful #14005

merged 1 commit into from
Mar 18, 2017

Conversation

liupan1111
Copy link
Contributor

Signed-off-by: Pan Liu liupan1111@gmail.com

Signed-off-by: Pan Liu <liupan1111@gmail.com>
@liupan1111
Copy link
Contributor Author

@trociny Please help take a look.

@dmick
Copy link
Member

dmick commented Mar 17, 2017

My fault the submodule test failed, ignore

@trociny
Copy link
Contributor

trociny commented Mar 17, 2017

@liupan1111 Actually I am not sure this makes things better (or different). For this I would need to know real cases when ioctl may fail. I suppose the only difference after your change, that it will try ioctl again when the next notification (that does not change size) comes. But I suppose ioctl will fail again.

Not sure what would be the best in this case. May be nbd process shut down, or switch to RO mode, or all subsequent nbd requests returning error? @dillaman do you have any opinion?

@dillaman
Copy link

@trociny IMHO, I don't see anything wrong with the change but also probably provides little value. If the nbd resize failed and the client writes to an extent outside the new image size, the IO will fail -- and in the case where the image size was increased, the client just won't have access to the new space. @liupan1111 did you actually see this ioctl fail before? [1]

[1] https://github.com/torvalds/linux/blob/master/drivers/block/nbd.c#L779

@liupan1111
Copy link
Contributor Author

@dillaman and @trociny I haven't see this IOCtl failed actually. But the logic of change size was not reasonable to me, so I change it. I want to confirm very logic correctly before there is really an possible error happen after it is online.

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liupan1111
Copy link
Contributor Author

liupan1111 commented Mar 17, 2017

@dillaman @trociny talking about NBD and librbd, I met an issue: if we are writing one nbd device, and manually killed a rbd-nbd process, there will be linux panic sometimes ... Do you have any idea to resolve this?

@trociny trociny changed the title rbd-nbd: only set size to new_size when NBD_SET_SIZE successfully. rbd-nbd: update size only when NBD_SET_SIZE successful Mar 17, 2017
@trociny
Copy link
Contributor

trociny commented Mar 17, 2017

if we are writing one nbd device, and manually killed a rbd-nbd process, there will be linux panic sometimes ... Do you have any idea to resolve this?

I think this is the kernel/nbd driver problem. They should properly handle this.

@trociny
Copy link
Contributor

trociny commented Mar 17, 2017

I think this is the kernel/nbd driver problem. They should properly handle this.

I mean, in general, it is certainly the kernel problem: any misbehaviour of a user-space process should be handled.

But I think we can improve rbd-nbd for case when it is killed with SIGINT or SIGTERM (have you observed the panic when killing with the default signal?). We can register a signal handler. See rbd-mirror as an example (start from main.cc, register_async_signal_handler).

@liupan1111
Copy link
Contributor Author

@trociny, yes, for SIGINT or SIGTERM, we could register handler, but it is a ideal case. If rbd-nbd crashed for some exceptional reason: out of memory, program bugs, ... how does user space handle this?

@trociny
Copy link
Contributor

trociny commented Mar 18, 2017

If a process crashes due to out of memory or program bug its state is not consistent. E.g. you can't be sure that a memory region that contain an object you are trying to access is valid. In this case the best thing the program can do is to die. The ceph programs have a handler that tries to log the backtrace, usually it succeeds when the inconsistency was detected internally (assert failed).

@trociny trociny merged commit 949915c into ceph:master Mar 18, 2017
@liupan1111 liupan1111 deleted the wip-fix-resize-issue branch March 18, 2017 12:06
@liupan1111
Copy link
Contributor Author

@trociny , i agree, i would like to implement it and let u review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants