
9.1.8 can't see metadata from earlier versions #45

Closed
richarson opened this issue Aug 9, 2022 · 6 comments

Comments

@richarson

Hi,

I have a couple of CentOS 7 servers that were running an older drbd version from elrepo.org (9.0.30). After an update to 9.1.8 and a reboot, the resources stayed in Diskless status.

I see this in dmesg:

[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: drbd_md_sync_page_io(,1875385000s,READ) failed with error -5
[Wed Aug 3 14:22:56 2022] drbd disk1/0 drbd1: Error while reading metadata.

After downgrading to drbd 9.1.7 everything went back to normal.

A few days later, the same happened with a pair of AlmaLinux 8 servers: 9.1.8 stays in Diskless mode, 9.1.7 and older work fine.

Those systems are otherwise fully up to date.
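
For anyone needing the downgrade workaround on the elrepo.org packages, a rough sketch (the kmod-drbd9x / drbd9x-utils package names are assumptions, so check what is actually installed first):

rpm -qa | grep -i drbd                  # find the exact kmod/utils package names installed
yum downgrade kmod-drbd9x               # assumed package name: roll the kernel module back to 9.1.7
yum install yum-plugin-versionlock      # optional: allows pinning the working version
yum versionlock kmod-drbd9x             # keep "yum update" from pulling 9.1.8 in again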

@chrboe
Contributor

chrboe commented Aug 19, 2022

Hi,

Thanks for the report. We have implemented a fix for this (d7d76aa), which will be released soon, likely within the next week.

If you can do so easily (and if this is a non-production system), it would be great if you could build DRBD from that commit and verify that the fix resolves the issue for you.

If not, the obvious workaround is to stay on 9.1.7 for now (or downgrade).
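
For reference, a minimal sketch of building from that commit, assuming the upstream git tree and kernel headers for the running kernel (building from git may also need coccinelle/spatch to generate the kernel compat patches):

git clone --recursive https://github.com/LINBIT/drbd.git
cd drbd
git checkout d7d76aa
make                            # build the out-of-tree module against the running kernel
make install
drbdadm down all                # all resources must be down before reloading the module
rmmod drbd_transport_tcp drbd
modprobe drbd
drbdadm up all
cat /proc/drbd                  # verify the new version string is reported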

@richarson
Author

Hi, the ElRepo.org maintainer kindly provided testing packages with this patch applied:

https://elrepo.org/bugs/view.php?id=1250

I'll try them today on a couple of test servers.

@chrboe
Contributor

chrboe commented Aug 23, 2022

Great, please let us know if the issue is fixed.

By the way, we have just released 9.1.9-rc.1, which also includes this patch: https://lists.linbit.com/pipermail/drbd-user/2022-August/026291.html

@richarson
Author

richarson commented Aug 23, 2022

No luck with the elrepo.org-provided packages:

[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)
[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.8 (api:2/proto:86-121)
[root@lab-a ~] # drbdadm status
home1 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:Diskless
  lab-b role:Primary
    peer-disk:UpToDate

dmesg output:

[mar ago 23 18:15:48 2022] drbd home1: Starting worker thread (from drbdsetup [19656])
[mar ago 23 18:15:48 2022] drbd home2: Starting worker thread (from drbdsetup [19658])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting sender thread (from drbdsetup [19686])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting sender thread (from drbdsetup [19691])
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home1/0 drbd2: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: meta-data IO uses: blk-bio
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: drbd_md_sync_page_io(,1953525160s,READ) failed with error -5
[mar ago 23 18:15:48 2022] drbd home2/0 drbd3: Error while reading metadata.
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home1 lab-b: Starting receiver thread (from drbd_w_home1 [19657])
[mar ago 23 18:15:48 2022] drbd home1 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( StandAlone -> Unconnected )
[mar ago 23 18:15:48 2022] drbd home2 lab-b: Starting receiver thread (from drbd_w_home2 [19659])
[mar ago 23 18:15:48 2022] drbd home2 lab-b: conn( Unconnected -> Connecting )
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home1 lab-b: Starting ack_recv thread (from drbd_r_home1 [19733])
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Handshake to peer 1 successful: Agreed network protocol version 121
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[mar ago 23 18:15:49 2022] drbd home2 lab-b: Starting ack_recv thread (from drbd_r_home2 [19735])
[mar ago 23 18:15:49 2022] drbd home1: Preparing cluster-wide state change 3641581203 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2: Preparing cluster-wide state change 3975294569 (0->1 499/146)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: disabling discards due to peer capabilities
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2: size = 927 GB (971649028 KB)
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: peer 3B29AC44594D67A0:0000000000000000:69CC366B62245032:E3DD42074C878166 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: my exposed UUID: 0000000000000000
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: peer 0C1A198B548AD30C:0000000000000000:4E94AFC73414BC9A:2A8E3B7E31F66CD8 bits:0 flags:120
[mar ago 23 18:15:49 2022] drbd home1: State change 3641581203: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home1: Committing cluster-wide state change 3641581203 (343ms)
[mar ago 23 18:15:49 2022] drbd home1 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home1/0 drbd2 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[mar ago 23 18:15:49 2022] drbd home2: State change 3975294569: primary_nodes=2, weak_nodes=FFFFFFFFFFFFFFFC
[mar ago 23 18:15:49 2022] drbd home2: Committing cluster-wide state change 3975294569 (362ms)
[mar ago 23 18:15:49 2022] drbd home2 lab-b: conn( Connecting -> Connected ) peer( Unknown -> Primary )
[mar ago 23 18:15:49 2022] drbd home2/0 drbd3 lab-b: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

@chrboe
Contributor

chrboe commented Sep 1, 2022

Seems like something went wrong with the rebuild there. I suspect the kernel compatibility patch cache was not updated correctly.

We have verified internally that this issue is fixed in releases newer than 9.1.9. Please try 9.1.10 and see if that works for you.
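
For anyone rebuilding packages from the DRBD source tree, forcing a fully clean build should avoid a stale compat cache; a rough sketch, assuming the cocci_cache location used by recent source trees:

make clean                                    # drop previous build artifacts
rm -rf drbd/drbd-kernel-compat/cocci_cache    # assumed location of the cached compat patches
make                                          # regenerate the compat patches for the target kernel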

@richarson
Author

Sorry for not replying earlier, I didn't have time to test it until now.

Version 9.1.10 seems to work fine with 9.1.7:

[root@lab-a ~] # grep -s version /proc/drbd
version: 9.1.7 (api:2/proto:110-121)

[root@lab-a ~] # drbdadm status
home1 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate

home2 role:Primary
  disk:UpToDate
  lab-b.dattaweb.com role:Secondary
    peer-disk:UpToDate
[root@lab-b ~] # grep -s version /proc/drbd
version: 9.1.10 (api:2/proto:86-121)

[root@lab-b ~] # drbdadm status
home1 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate

home2 role:Secondary
  disk:UpToDate
  lab-a.dattaweb.com role:Primary
    peer-disk:UpToDate

Thanks!

@chrboe chrboe closed this as completed Sep 6, 2022