New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Buffer I/O error" regression between kernels 4.11 and 4.12 #57

Closed
alkisg opened this Issue Sep 29, 2017 · 12 comments

Comments

Projects
None yet
4 participants
@alkisg

alkisg commented Sep 29, 2017

Using Ubuntu 16.04, nbd-server and client=1:3.13-1.

a) With mainline (vanilla) kernel 4.11.12 on the client, the commands below run fine without errors.
b) Then I upgrade to mainline kernel 4.12.0, and I run again:

modprobe nbd
nbd-client server-ip -N /opt/ltsp/i386 /dev/nbd5
dmesg

And I see the following errors:

[ 73.824873] nbd: registered device at major 43
[ 84.791001] nbd5: detected capacity change from 0 to 20936916992
[ 84.791071] block nbd5: Attempted send on invalid socket
[ 84.791077] blk_update_request: I/O error, dev nbd5, sector 0
[ 84.791080] Buffer I/O error on dev nbd5, logical block 0, async page read
<the 3 lines above repeated 10 times>
[ 84.791132] block nbd5: Attempted send on invalid socket
[ 84.791133] blk_update_request: I/O error, dev nbd5, sector 2
[ 84.791134] Buffer I/O error on dev nbd5, logical block 1, async page read
[ 84.791140] ldm_validate_partition_table(): Disk read failed.
[ 84.791175] Dev nbd5: unable to read RDB block 0
[ 84.791228] nbd5: unable to read partition table

I can reproduce this in many installations, real or VMs.
Thanks!

@alkisg alkisg changed the title from "Buffer I/O error" regression between kernels 4.8 and 4.10 to "Buffer I/O error" regression between kernels 4.11 and 4.12 Sep 30, 2017

@alkisg

This comment has been minimized.

alkisg commented Sep 30, 2017

Ubuntu uses patched kernels with various backports, so the numbers were misleading.
I updated the issue description to reflect the mainline (vanilla) numbers instead.
Just for completeness, the issue happens:
In Ubuntu's kernels: after 4.8.0-58 and before 4.10.0-14
In mainline kernels: after 4.11.12 and before 4.12.0

I also saw that to make things 100% reproducible, I needed to call udevadm settle, so my test case now is:

modprobe nbd
udevadm settle
nbd-client server-ip -N /opt/ltsp/i386 /dev/nbd5
udevadm settle
dmesg | grep nbd

@alkisg

This comment has been minimized.

alkisg commented Sep 30, 2017

I also tested on Debian Stretch:

  • 4.9.0-3-686-pae: OK
  • 4.11.0-0.bpo.1-686-pae: OK
  • 4.12.0-0.bpo.2-686-pae: Has the problem
@yoe

This comment has been minimized.

Member

yoe commented Oct 1, 2017

Are you saying that the problem does not exist in the most recent kernels you could find? Or do I misunderstand you there?

@alkisg

This comment has been minimized.

alkisg commented Oct 2, 2017

Hi Wouter, let me phrase it better,

the problem started with kernel 4.12 and is still happening in the most recent kernel I could find, which was 4.14-rc2.

@yoe

This comment has been minimized.

Member

yoe commented Oct 3, 2017

Oh, okay then.

@josefbacik, any idea?

@josefbacik

This comment has been minimized.

josefbacik commented Oct 5, 2017

Oops, I'll take a look in the morning.

@josefbacik

This comment has been minimized.

josefbacik commented Oct 5, 2017

Oh actually I think this is my timeout patch that I fixed later, can you try Linus master?

@alkisg

This comment has been minimized.

alkisg commented Oct 6, 2017

Hi Josef, I tried the latest vanilla kernel from Ubuntu's daily builds and the problem still happens there:
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/
cod/tip/daily/2017-10-05
42b76d0e6b1fe0fcb90e0ff6b4d053d50597b031

I.e. Linux master torvalds/linux@42b76d0

@josefbacik

This comment has been minimized.

josefbacik commented Oct 9, 2017

Alright got it nailed down, sorry about that, apparently all of my regression tests only do the netlink interface, save the one that checks that the ioctl and netlink interfaces behave with eachother. I've submitted the patch

[PATCH] nbd: don't set the device size until we're connected

to fix it and cc'ed stable so it'll make its way back to distro kernels. @yoe sorry, I thought I had updated the mailinglist but accidentally sent it to the old sf list. I've fixed my stuff so that won't happen again.

@alkisg

This comment has been minimized.

alkisg commented Oct 10, 2017

Thanks a lot! I'll try to detect when it lands on daily builds, so that I can test it.

@alkisg

This comment has been minimized.

alkisg commented Jan 17, 2018

I tested on Ubuntu 18.04 with the upcoming 4.15 kernel and it works fine there.
Closing, thank you. :)

@alkisg alkisg closed this Jan 17, 2018

@chenhaiq

This comment has been minimized.

chenhaiq commented Oct 15, 2018

I still can reproduce this problem in ubuntu 1804, kernel 4.15.0-36-generic:
cmd to reproduce:

qemu-img create -f qcow2 sample.img 10G
modprobe nbd

udevadm settle
qemu-nbd -c /dev/nbd0 sample.img
udevadm settle
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.003028] block nbd0: NBD_DISCONNECT
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.003147] block nbd0: shutting down sockets
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.005229] nbd0: detected capacity change from 0 to 10737418240
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.005296] print_req_error: 2 callbacks suppressed
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.005297] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.006937] buffer_io_error: 2 callbacks suppressed
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.006939] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.009225] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.010670] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.012777] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.014259] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.015965] ldm_validate_partition_table(): Disk read failed.
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.015973] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.018006] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.020183] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.021638] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.023379] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.024763] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.026803] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.028193] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.030244] Dev nbd0: unable to read RDB block 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.031491] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.032871] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.034815] print_req_error: I/O error, dev nbd0, sector 0
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.036188] Buffer I/O error on dev nbd0, logical block 0, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.038083] print_req_error: I/O error, dev nbd0, sector 24
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.039485] Buffer I/O error on dev nbd0, logical block 3, async page read
Oct 15 11:24:21 i-uvwxiqu9 kernel: [   44.041432]  nbd0: unable to read partition table
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment