New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-nbd: update size only when NBD_SET_SIZE successful #14005

Merged
merged 1 commit into from Mar 18, 2017

Conversation

Projects
None yet
4 participants
@liupan1111
Contributor

liupan1111 commented Mar 17, 2017

Signed-off-by: Pan Liu liupan1111@gmail.com

rbd-nbd: only set size to new_size when NBD_SET_SIZE successfully.
Signed-off-by: Pan Liu <liupan1111@gmail.com>
@liupan1111

This comment has been minimized.

Contributor

liupan1111 commented Mar 17, 2017

@trociny Please help take a look.

@liupan1111 liupan1111 requested a review from trociny Mar 17, 2017

@dmick

This comment has been minimized.

Member

dmick commented Mar 17, 2017

My fault the submodule test failed, ignore

@trociny

This comment has been minimized.

Contributor

trociny commented Mar 17, 2017

@liupan1111 Actually I am not sure this makes things better (or different). For this I would need to know real cases when ioctl may fail. I suppose the only difference after your change, that it will try ioctl again when the next notification (that does not change size) comes. But I suppose ioctl will fail again.

Not sure what would be the best in this case. May be nbd process shut down, or switch to RO mode, or all subsequent nbd requests returning error? @dillaman do you have any opinion?

@dillaman

This comment has been minimized.

Contributor

dillaman commented Mar 17, 2017

@trociny IMHO, I don't see anything wrong with the change but also probably provides little value. If the nbd resize failed and the client writes to an extent outside the new image size, the IO will fail -- and in the case where the image size was increased, the client just won't have access to the new space. @liupan1111 did you actually see this ioctl fail before? [1]

[1] https://github.com/torvalds/linux/blob/master/drivers/block/nbd.c#L779

@liupan1111

This comment has been minimized.

Contributor

liupan1111 commented Mar 17, 2017

@dillaman and @trociny I haven't see this IOCtl failed actually. But the logic of change size was not reasonable to me, so I change it. I want to confirm very logic correctly before there is really an possible error happen after it is online.

@trociny

LGTM

@liupan1111

This comment has been minimized.

Contributor

liupan1111 commented Mar 17, 2017

@dillaman @trociny talking about NBD and librbd, I met an issue: if we are writing one nbd device, and manually killed a rbd-nbd process, there will be linux panic sometimes ... Do you have any idea to resolve this?

@trociny trociny changed the title from rbd-nbd: only set size to new_size when NBD_SET_SIZE successfully. to rbd-nbd: update size only when NBD_SET_SIZE successful Mar 17, 2017

@trociny

This comment has been minimized.

Contributor

trociny commented Mar 17, 2017

if we are writing one nbd device, and manually killed a rbd-nbd process, there will be linux panic sometimes ... Do you have any idea to resolve this?

I think this is the kernel/nbd driver problem. They should properly handle this.

@trociny

This comment has been minimized.

Contributor

trociny commented Mar 17, 2017

I think this is the kernel/nbd driver problem. They should properly handle this.

I mean, in general, it is certainly the kernel problem: any misbehaviour of a user-space process should be handled.

But I think we can improve rbd-nbd for case when it is killed with SIGINT or SIGTERM (have you observed the panic when killing with the default signal?). We can register a signal handler. See rbd-mirror as an example (start from main.cc, register_async_signal_handler).

@liupan1111

This comment has been minimized.

Contributor

liupan1111 commented Mar 18, 2017

@trociny, yes, for SIGINT or SIGTERM, we could register handler, but it is a ideal case. If rbd-nbd crashed for some exceptional reason: out of memory, program bugs, ... how does user space handle this?

@trociny

This comment has been minimized.

Contributor

trociny commented Mar 18, 2017

If a process crashes due to out of memory or program bug its state is not consistent. E.g. you can't be sure that a memory region that contain an object you are trying to access is valid. In this case the best thing the program can do is to die. The ceph programs have a handler that tries to log the backtrace, usually it succeeds when the inconsistency was detected internally (assert failed).

@trociny trociny merged commit 949915c into ceph:master Mar 18, 2017

2 of 3 checks passed

Unmodifed Submodules Approval needed: modified submodules found
Details
Signed-off-by all commits in this PR are signed
Details
default Build finished.
Details

@liupan1111 liupan1111 deleted the liupan1111:wip-fix-resize-issue branch Mar 18, 2017

@liupan1111

This comment has been minimized.

Contributor

liupan1111 commented Mar 18, 2017

@trociny , i agree, i would like to implement it and let u review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment