Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-nbd: fix stuck with disable request #50593

Merged
merged 2 commits into from Oct 30, 2023

Conversation

pkalever
Copy link

@pkalever pkalever commented Mar 20, 2023

Problem:

Trying to disable any feature on an rbd image mapped with nbd leads to stuck
in rbd-nbd.

The rbd-nbd registers a watcher callback to detect image resize in
NBDWatchCtx::handle_notify(). The handle_notify calls image info method, which
calls refresh_if_required and it got stuck there.

It is getting stuck in ImageState::refresh_if_required() because
DisableFeaturesRequest issues update notifications while still holding onto
the exclusive lock with everything that has to do with it blocked.

Solution:

Set only notify flag as part of NBDWatchCtx::handle_notify() and handle
the resize detection part as part of a different thread.

Fixes: https://tracker.ceph.com/issues/58740
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@pkalever pkalever requested a review from a team as a code owner March 20, 2023 09:24
@github-actions github-actions bot added the rbd label Mar 20, 2023
@pkalever
Copy link
Author

pkalever commented Mar 20, 2023

Local client logs:

2023-03-20T10:39:38.314+0530 7f6dad46c640 10 librbd::ImageWatcher: 0x7f6d78008b20 async request: [4316,140108007228656,1]                                                                                                                     
2023-03-20T10:39:38.314+0530 7f6dad46c640 20 librbd::ImageWatcher: scheduling async request time out: [4316,140108007228656,1]                                                                                                                
2023-03-20T10:39:38.314+0530 7f6dad46c640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6d940020b0 send_notify                                                                                                                               
2023-03-20T10:39:38.314+0530 7f6dad46c640 20 librbd::watcher::Notifier: 0x7f6d78008be0 notify: pending=1                                                                                                                                      
2023-03-20T10:39:38.315+0530 7f6dadc6d640  5 librbd::Watcher: 0x7f6d78008b20 notifications_blocked: blocked=0                                                                                                                                 
2023-03-20T10:39:38.315+0530 7f6dadc6d640 10 librbd::Watcher::C_NotifyAck 0x558bf3a2f850 C_NotifyAck: id=240518168583, handle=140108007228656                                                                                                 
2023-03-20T10:39:38.315+0530 7f6dadc6d640 10 librbd::ImageWatcher: 0x7f6d78008b20 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                        
2023-03-20T10:39:38.315+0530 7f6dadc6d640 20 librbd::ExclusiveLock: 0x7f6d70001d30 accept_request=0 (request_type=0)                                                                                                                          
2023-03-20T10:39:38.315+0530 7f6dadc6d640 10 librbd::Watcher::C_NotifyAck 0x558bf3a2f850 finish: r=0                                                                                                                                          
2023-03-20T10:39:38.316+0530 7f6dadc6d640 20 librbd::watcher::Notifier: 0x7f6d78008be0 handle_notify: r=0                                                                                                                                     
2023-03-20T10:39:38.316+0530 7f6dadc6d640 20 librbd::watcher::Notifier: 0x7f6d78008be0 handle_notify: pending=0                                                                                                                               
2023-03-20T10:39:38.316+0530 7f6dadc6d640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6d940020b0 handle_notify: r=0                                                                                                                        
2023-03-20T10:39:38.316+0530 7f6dadc6d640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6d940020b0 handle_notify client responded with r=0                                                                                                   
2023-03-20T10:39:38.345+0530 7f6dad46c640  5 librbd::Watcher: 0x7f6d78008b20 notifications_blocked: blocked=0                                                                                                                                 
2023-03-20T10:39:38.345+0530 7f6dad46c640 10 librbd::Watcher::C_NotifyAck 0x7f6d70010980 C_NotifyAck: id=240518168584, handle=140108007228656                                                                                                 
2023-03-20T10:39:38.345+0530 7f6dad46c640 10 librbd::ImageWatcher: 0x7f6d78008b20 image header updated                                                                                                                                        
2023-03-20T10:39:38.345+0530 7f6dad46c640 20 librbd::ImageState: 0x7f6d9800ae00 handle_update_notification: refresh_seq = 1, last_refresh = 0                                                                                                 
2023-03-20T10:39:38.345+0530 7f6dad46c640 20 librbd::ImageState: 0x558bf3a2e4d0 ImageUpdateWatchers::notify                                                                                                                                   
2023-03-20T10:39:38.345+0530 7f6dad46c640 20 librbd::ImageState: 0x7f6d9800ae00 flush_update_watchers                                                                                                                                         
2023-03-20T10:39:38.345+0530 7f6dad46c640 20 librbd::ImageState: 0x558bf3a2e4d0 ImageUpdateWatchers::flush                                                                                                                                    
2023-03-20T10:39:38.345+0530 7f6dad46c640 20 librbd::ImageState: 0x558bf3a2e4d0 ImageUpdateWatchers::flush: completing flush                                                                                                                  
2023-03-20T10:39:38.345+0530 7f6dad46c640 10 librbd::ImageWatcher: 0x7f6d70003580 C_ResponseMessage: r=0                                                                                                                                      
2023-03-20T10:39:38.345+0530 7f6dad46c640 10 librbd::Watcher::C_NotifyAck 0x7f6d70010980 finish: r=0                                                                                                                                          
2023-03-20T10:39:43.348+0530 7f6dadc6d640  5 librbd::Watcher: 0x7f6d78008b20 notifications_blocked: blocked=0                                                                                                                                 
2023-03-20T10:39:43.348+0530 7f6dadc6d640 10 librbd::Watcher::C_NotifyAck 0x7f6d70003aa0 C_NotifyAck: id=257698037769, handle=140108007228656                                                                                                 
2023-03-20T10:39:43.348+0530 7f6dadc6d640 20 librbd::ImageWatcher: remove_async_request: [4316,140108007228656,1]                                                                                                                             
2023-03-20T10:39:43.348+0530 7f6dadc6d640 10 librbd::ImageWatcher: 0x7f6d78008b20 request finished: [4316,140108007228656,1]=-110                                                                                                             
2023-03-20T10:39:43.348+0530 7f6dadc6d640 10 librbd::Watcher::C_NotifyAck 0x7f6d70003aa0 finish: r=0                                                                                                                                          
2023-03-20T10:39:43.348+0530 7f6dadc6d640 20 librbd::Operations: handle_remote_request: r=-110                                                                                                                                                
2023-03-20T10:39:43.348+0530 7f6dadc6d640  5 librbd::Operations: update features timed out notifying lock owner                                                                                                                               
[...]                                                                                                                                                                                                                                         
2023-03-20T10:39:43.370+0530 7f6dadc6d640 10 librbd::ImageWatcher: 0x7f6d78008b20 async request: [4316,140108007228656,1]                                                                                                                     
2023-03-20T10:39:43.370+0530 7f6dadc6d640 20 librbd::ImageWatcher: scheduling async request time out: [4316,140108007228656,1]                                                                                                                
2023-03-20T10:39:43.370+0530 7f6dadc6d640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6da003b000 send_notify                                                                                                                               
2023-03-20T10:39:43.370+0530 7f6dadc6d640 20 librbd::watcher::Notifier: 0x7f6d78008be0 notify: pending=1                                                                                                                                      
2023-03-20T10:39:43.371+0530 7f6dad46c640  5 librbd::Watcher: 0x7f6d78008b20 notifications_blocked: blocked=0                                                                                                                                 
2023-03-20T10:39:43.371+0530 7f6dad46c640 10 librbd::Watcher::C_NotifyAck 0x7f6d780153d0 C_NotifyAck: id=257698037770, handle=140108007228656                                                                                                 
2023-03-20T10:39:43.371+0530 7f6dad46c640 10 librbd::ImageWatcher: 0x7f6d78008b20 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                        
2023-03-20T10:39:43.371+0530 7f6dad46c640 20 librbd::ExclusiveLock: 0x7f6d70001d30 accept_request=0 (request_type=0)                                                                                                                          
2023-03-20T10:39:43.371+0530 7f6dad46c640 10 librbd::Watcher::C_NotifyAck 0x7f6d780153d0 finish: r=0                                                                                                                                          
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::watcher::Notifier: 0x7f6d78008be0 handle_notify: r=0                                                                                                                                     
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::watcher::Notifier: 0x7f6d78008be0 handle_notify: pending=0                                                                                                                               
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6da003b000 handle_notify: r=0                                                                                                                        
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::image_watcher::NotifyLockOwner: 0x7f6da003b000 handle_notify client responded with r=-110                                                                                                
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::ImageWatcher: remove_async_request: [4316,140108007228656,1]                                                                                                                             
2023-03-20T10:39:43.372+0530 7f6dadc6d640 20 librbd::Operations: handle_remote_request: r=-110                                                                                                                                                
2023-03-20T10:39:43.372+0530 7f6dadc6d640  5 librbd::Operations: update features timed out notifying lock owner                                                                                                                               

After 5 seconds (notifier timeout) you can see request finished: with timeout error

Remote client logs:

2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::mirror::GetInfoRequest: 0x7f1fc800eac0 handle_get_mirror_image: mirroring is disabled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::mirror::GetInfoRequest: 0x7f1fc800eac0 finish: r=-2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.317+0530 7f1fe3fff640 10 librbd::mirror::DisableRequest: 0x7f1fc800e090 handle_get_mirror_info: r=-2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::mirror::DisableRequest: 0x7f1fc800e090 handle_get_mirror_info: mirroring is not enabled for this image                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_disable_mirror_image: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_close_journal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::Journal: 0x7f1fa4001170 close                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::Journal: 0x7f1fa4001170 transition_state: new state=7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.317+0530 7f1fe3fff640 20 librbd::Journal: 0x7f1fa4001170 wait_for_steady_state: on_state=0x7f1fc000ffb0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:38.317+0530 7f1ff88ac640 20 librbd::Journal: 0x7f1fa4001170 handle_recording_stopped: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.317+0530 7f1ff88ac640 20 librbd::Journal: 0x7f1fa4001170 destroy_journaler: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
2023-03-20T10:39:38.318+0530 7f1ff88ac640 20 librbd::Journal: 0x7f1fa4001170 transition_state: new state=8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::Journal: 0x7f1fa4001170 handle_journal_destroyed: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::Journal: 0x7f1fa4001170 transition_state: new state=9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.322+0530 7f1fe3fff640  5 librbd::io::Dispatcher: 0x556f9e658910 shut_down_dispatch: dispatch_layer=3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_close_journal: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_remove_journal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::Journal::RemoveRequest: 0x556f9e7ec9a0 send                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:38.322+0530 7f1fe3fff640 20 librbd::Journal::RemoveRequest: 0x556f9e7ec9a0 stat_journal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.323+0530 7f1ff88ac640 20 librbd::Journal::RemoveRequest: handle_stat_journal: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
2023-03-20T10:39:38.323+0530 7f1ff88ac640 20 librbd::Journal::RemoveRequest: 0x556f9e7ec9a0 init_journaler                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.328+0530 7f1fc4ff9640 20 librbd::Journal::RemoveRequest: handle_init_journaler: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:38.328+0530 7f1fc4ff9640 20 librbd::Journal::RemoveRequest: 0x556f9e7ec9a0 remove_journal                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:38.341+0530 7f1ff88ac640 20 librbd::Journal::RemoveRequest: handle_remove_journal: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:38.341+0530 7f1ff88ac640 20 librbd::Journal::RemoveRequest: 0x556f9e7ec9a0 shut_down_journaler                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::Journal::RemoveRequest: handle_journaler_shutdown: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::Journal::RemoveRequest: done.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_remove_journal: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_remove_object_map                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_set_features: new_features=61, features_mask=64                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:38.341+0530 7f1fc4ff9640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_append_op_event                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_set_features: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 send_notify_update                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::ImageState: 0x556f9e6599a0 handle_update_notification: refresh_seq = 2, last_refresh = 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::notify                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffc2f4f0060                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:38.344+0530 7f1fe3fff640 10 librbd::ImageWatcher: 0x7f1fc8007800: notify_header_update                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:38.344+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 notify: pending=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.344+0530 7f1fc57fa640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffc2f4f0060                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:38.344+0530 7f1fc57fa640 20 librbd: info 0x556f9e808780                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:38.345+0530 7f1ff88ac640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:38.345+0530 7f1ff88ac640 10 librbd::Watcher::C_NotifyAck 0x7f1fb8002400 C_NotifyAck: id=240518168584, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:38.345+0530 7f1ff88ac640 10 librbd::ImageWatcher: 0x7f1fc8007800 image header updated                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e6599a0 handle_update_notification: refresh_seq = 3, last_refresh = 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::notify                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffc2f4f0060                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e6599a0 flush_update_watchers                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::flush                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 handle_notify: r=-110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 handle_notify: pending=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_notify_update: r=-110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::DisableFeaturesRequest: 0x7f1fc0008a90 handle_finish: r=-110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::ExclusiveLock: 0x7f1fc0000b60 unblock_requests                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
2023-03-20T10:39:43.346+0530 7f1fe3fff640  5 librbd::io::WriteBlockImageDispatch: 0x556f9e6b3750 unblock_writes: 0x556f9e808780, num=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.346+0530 7f1fe3fff640 10 librbd::ImageState: 0x556f9e6599a0 handle_prepare_lock_complete                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2023-03-20T10:39:43.346+0530 7f1fe3fff640 10 librbd::ImageState: 0x556f9e6599a0 0x556f9e6599a0 send_refresh_unlock                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:43.346+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_get_mutable_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:43.346+0530 7f1fe3fff640 10 librbd::Request: 0x7f1fc0008a90 create_context_finisher                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.346+0530 7f1fe3fff640 10 librbd::Request: 0x7f1fc0008a90 finish: r=-110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::Operations: finish_op: update features r=-110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:43.346+0530 7f1fc77fe640 20 librbd::ImageWatcher: 0x7f1fc8007800 remote async request finished: [4316,140108007228656,1] = -110                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:43.346+0530 7f1fc77fe640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 notify: pending=1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.348+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_get_mutable_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
2023-03-20T10:39:43.348+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_get_parent: legacy=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:43.348+0530 7f1fe3fff640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.348+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fb80039c0 C_NotifyAck: id=257698037769, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.348+0530 7f1fe3fff640 20 librbd::ImageWatcher: remove_async_request: [4316,140108007228656,1]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
2023-03-20T10:39:43.348+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fb80039c0 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.349+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_get_parent: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.349+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_get_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:43.349+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x7f1fe801ef40 metadata_list: start_key=conf_                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.349+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 handle_notify: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
2023-03-20T10:39:43.349+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 handle_notify: pending=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:43.349+0530 7f1fe3fff640 20 librbd::ImageWatcher: 0x7f1fc8007800 handle_async_complete: request=[4316,140108007228656,1], r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
2023-03-20T10:39:43.349+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x7f1fe801ef40 handle_metadata_list: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.349+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x7f1fe801ef40 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.349+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_get_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.349+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_get_pool_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.349+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 metadata_list: start_key=conf_                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.350+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 handle_metadata_list: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.350+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.350+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_get_pool_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:43.350+0530 7f1fe3fff640 20 librbd::ImageCtx: apply_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.350+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_get_group                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2023-03-20T10:39:43.351+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_get_group: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
2023-03-20T10:39:43.351+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 send_v2_apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.351+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa4001170 handle_v2_apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.351+0530 7f1ff88ac640 20 librbd::image::RefreshRequest: 0x7f1fa4001170 apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:43.351+0530 7f1ff88ac640 10 librbd::ImageState: 0x556f9e6599a0 0x556f9e6599a0 handle_refresh: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:43.351+0530 7f1fc57fa640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffc2f4f0060                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.351+0530 7f1fc57fa640 20 librbd: info 0x556f9e808780                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.351+0530 7f1fc57fa640 10 librbd::ImageState: 0x556f9e6599a0 0x556f9e6599a0 send_refresh_unlock                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:43.351+0530 7f1fc57fa640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_get_mutable_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:43.352+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_get_mutable_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
2023-03-20T10:39:43.352+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_get_parent: legacy=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:43.352+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_get_parent: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.352+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_get_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
2023-03-20T10:39:43.352+0530 7f1ff88ac640 15 librbd::image::GetMetadataRequest: 0x7f1ff0046040 metadata_list: start_key=conf_                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.353+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x7f1ff0046040 handle_metadata_list: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.353+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x7f1ff0046040 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.353+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_get_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.353+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_get_pool_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.353+0530 7f1fe3fff640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 metadata_list: start_key=conf_                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.353+0530 7f1ff88ac640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 handle_metadata_list: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.353+0530 7f1ff88ac640 15 librbd::image::GetMetadataRequest: 0x556f9e8660c0 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.353+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_get_pool_metadata: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
2023-03-20T10:39:43.353+0530 7f1ff88ac640 20 librbd::ImageCtx: apply_metadata                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.354+0530 7f1ff88ac640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_get_group                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
2023-03-20T10:39:43.354+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_get_group: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
023-03-20T10:39:43.354+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 send_v2_apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
2023-03-20T10:39:43.354+0530 7f1fe3fff640 10 librbd::image::RefreshRequest: 0x7f1fa40092a0 handle_v2_apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
2023-03-20T10:39:43.354+0530 7f1fe3fff640 20 librbd::image::RefreshRequest: 0x7f1fa40092a0 apply                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
2023-03-20T10:39:43.354+0530 7f1fe3fff640 10 librbd::ImageState: 0x556f9e6599a0 0x556f9e6599a0 handle_refresh: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
2023-03-20T10:39:43.354+0530 7f1fc57fa640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::operator(): completing flush                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
2023-03-20T10:39:43.354+0530 7f1fc57fa640 10 librbd::ImageWatcher: 0x7f1fc8018430 C_ResponseMessage: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
2023-03-20T10:39:43.354+0530 7f1fc57fa640 10 librbd::Watcher::C_NotifyAck 0x7f1fb8002400 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.371+0530 7f1ff88ac640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.371+0530 7f1ff88ac640 10 librbd::Watcher::C_NotifyAck 0x7f1ff0000e80 C_NotifyAck: id=257698037770, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.371+0530 7f1ff88ac640 10 librbd::ImageWatcher: 0x7f1fc8007800 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.371+0530 7f1ff88ac640 20 librbd::ExclusiveLock: 0x7f1fc0000b60 accept_request=1 (request_type=0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.371+0530 7f1ff88ac640 10 librbd::Watcher::C_NotifyAck 0x7f1ff0000e80 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.380+0530 7f1fe3fff640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.380+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fc00023f0 C_NotifyAck: id=257698037771, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.380+0530 7f1fe3fff640 10 librbd::ImageWatcher: 0x7f1fc8007800 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.380+0530 7f1fe3fff640 20 librbd::ExclusiveLock: 0x7f1fc0000b60 accept_request=1 (request_type=0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.380+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fc00023f0 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.390+0530 7f1ff88ac640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.390+0530 7f1ff88ac640 10 librbd::Watcher::C_NotifyAck 0x7f1ff0050040 C_NotifyAck: id=261993005068, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.390+0530 7f1ff88ac640 10 librbd::ImageWatcher: 0x7f1fc8007800 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.390+0530 7f1ff88ac640 20 librbd::ExclusiveLock: 0x7f1fc0000b60 accept_request=1 (request_type=0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.390+0530 7f1ff88ac640 10 librbd::Watcher::C_NotifyAck 0x7f1ff0050040 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.395+0530 7f1fe3fff640  5 librbd::Watcher: 0x7f1fc8007800 notifications_blocked: blocked=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.395+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fa400c1c0 C_NotifyAck: id=261993005069, handle=139774341970368                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
2023-03-20T10:39:43.395+0530 7f1fe3fff640 10 librbd::ImageWatcher: 0x7f1fc8007800 remote update_features request: [4316,140108007228656,1] 64 disabled                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
2023-03-20T10:39:43.395+0530 7f1fe3fff640 20 librbd::ExclusiveLock: 0x7f1fc0000b60 accept_request=1 (request_type=0)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
2023-03-20T10:39:43.395+0530 7f1fe3fff640 10 librbd::Watcher::C_NotifyAck 0x7f1fa400c1c0 finish: r=0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
~                                                                                                                                                                                                                                                                                                                                                                                 

Note:

  • 2023-03-20T10:39:38.345+0530 7f1ff88ac640 20 librbd::ImageState: 0x556f9e856a40 ImageUpdateWatchers::flush
  • 2023-03-20T10:39:43.346+0530 7f1fe3fff640 20 librbd::watcher::Notifier: 0x7f1fc80078c0 handle_notify: r=-110
    [...]
  • 2023-03-20T10:39:43.346+0530 7f1fc77fe640 20 librbd::ImageWatcher: 0x7f1fc8007800 remote async request finished: [4316,140108007228656,1] = -110

ImageUpdateWatchers::flush() hangs as its added to the worker queue, while it is waiting on the features update, this will simply block until the notifier timeout to trigger which then fails the entire async request with a timeout.

But since remote async request finished: on the remote side, the request will be permanently marked as complete and any new request sent with the same id will be simply ignored by the remote.

@pkalever
Copy link
Author

@idryomov I had run qa/workunits/rbd/rbd-nbd.sh with these changes, it also contains resize test and many more nbd related, it returned OK.

Here are the relevant nbd_test.log

Thanks!

m_work_queue->queue(ctx, 0);
return;
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trociny do you remember why the above check is added?
Please see the tracker its 100% reproducible . Always we hit "update features timed out notifying lock owner".
Thanks!

Copy link
Contributor

@trociny trociny Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we just remove this block the flush will not make sense.

The flush is called when we want to make sure all in-flight notifications are complete (see ImageWatcher::handle_payload(HeaderUpdatePayload) where it is called). So if there are in-flight notification we queue the completion callback so it will be called after all previously queued notifications are processed.

So the question should actually be why it gets stuck there, and you should look at the rbd-nbd side, where it actually gets stuck. The rbd-nbd registers a watcher callback to detect image resize, see NBDWatchCtx::handle_notify. I added some debug to this callback (when it enters and returns) and running rbd-nbd with debug_rbd=30 I see:

2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::DisableFeaturesRequest: 0x7f5824010bc0 handle_set_features: r=0
2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::DisableFeaturesRequest: 0x7f5824010bc0 send_notify_update
2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::ImageState: 0x5561f94f4580 handle_update_notification: refresh_seq = 2, last_refresh = 1
2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::notify
2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffd9a7c4900
2023-04-17T18:46:03.056+0100 7f58537fe6c0 10 librbd::ImageWatcher: 0x7f582c008b60: notify_header_update
2023-04-17T18:46:03.056+0100 7f58537fe6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 notify: pending=1
2023-04-17T18:46:03.056+0100 7f580ffff6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffd9a7c4900
2023-04-17T18:46:03.056+0100 7f580ffff6c0  5 rbd-nbd: XXXMG: handle_notify: enter
2023-04-17T18:46:03.056+0100 7f580ffff6c0 20 librbd: info 0x5561f96b28f0
2023-04-17T18:46:03.056+0100 7f5852ffd6c0  5 librbd::Watcher: 0x7f582c008b60 notifications_blocked: blocked=0
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 10 librbd::Watcher::C_NotifyAck 0x7f582c019470 C_NotifyAck: id=1151051236169, handle=140016537847264
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 10 librbd::ImageWatcher: 0x7f582c008b60 image header updated
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 20 librbd::ImageState: 0x5561f94f4580 handle_update_notification: refresh_seq = 3, last_refresh = 1
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::notify
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffd9a7c4900
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 20 librbd::ImageState: 0x5561f94f4580 flush_update_watchers
2023-04-17T18:46:03.056+0100 7f5852ffd6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::flush
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 handle_notify: r=-110
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 handle_notify: pending=0
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::DisableFeaturesRequest: 0x7f5824010bc0 handle_notify_update: r=-110
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::DisableFeaturesRequest: 0x7f5824010bc0 handle_finish: r=-110
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::ExclusiveLock: 0x7f5824001d40 unblock_requests
2023-04-17T18:46:08.056+0100 7f5852ffd6c0  5 librbd::io::WriteBlockImageDispatch: 0x5561f954f740 unblock_writes: 0x5561f96b28f0, num=0
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::ImageState: 0x5561f94f4580 handle_prepare_lock_complete
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::ImageState: 0x5561f94f4580 0x5561f94f4580 send_refresh_unlock
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_get_mutable_metadata
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::Request: 0x7f5824010bc0 create_context_finisher
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::Request: 0x7f5824010bc0 finish: r=-110
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::Operations: finish_op: update features r=-110
2023-04-17T18:46:08.056+0100 7f58327fc6c0 20 librbd::ImageWatcher: 0x7f582c008b60 remote async request finished: [19699,139671329929968,1] = -110
2023-04-17T18:46:08.056+0100 7f58327fc6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 notify: pending=1
2023-04-17T18:46:08.060+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_get_mutable_metadata: r=0
2023-04-17T18:46:08.060+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_get_parent: legacy=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0  5 librbd::Watcher: 0x7f582c008b60 notifications_blocked: blocked=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 10 librbd::Watcher::C_NotifyAck 0x7f581c0066b0 C_NotifyAck: id=1151051236170, handle=140016537847264
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 20 librbd::ImageWatcher: remove_async_request: [19699,139671329929968,1]
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 10 librbd::Watcher::C_NotifyAck 0x7f581c0066b0 finish: r=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_get_parent: r=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_get_metadata
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00a950 metadata_list: start_key=conf_
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 handle_notify: r=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 20 librbd::watcher::Notifier: 0x7f582c008c20 handle_notify: pending=0
2023-04-17T18:46:08.060+0100 7f5852ffd6c0 20 librbd::ImageWatcher: 0x7f582c008b60 handle_async_complete: request=[19699,139671329929968,1], r=0
2023-04-17T18:46:08.060+0100 7f58537fe6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00a950 handle_metadata_list: r=0
2023-04-17T18:46:08.060+0100 7f58537fe6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00a950 finish: r=0
2023-04-17T18:46:08.060+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_get_metadata: r=0
2023-04-17T18:46:08.060+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_get_pool_metadata
2023-04-17T18:46:08.060+0100 7f58537fe6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00af40 metadata_list: start_key=conf_
2023-04-17T18:46:08.064+0100 7f5852ffd6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00af40 handle_metadata_list: r=0
2023-04-17T18:46:08.064+0100 7f5852ffd6c0 15 librbd::image::GetMetadataRequest: 0x7f584c00af40 finish: r=0
2023-04-17T18:46:08.064+0100 7f5852ffd6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_get_pool_metadata: r=0
2023-04-17T18:46:08.064+0100 7f5852ffd6c0 20 librbd::ImageCtx: apply_metadata
2023-04-17T18:46:08.064+0100 7f5852ffd6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_get_group
2023-04-17T18:46:08.064+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_get_group: r=0
2023-04-17T18:46:08.064+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 send_v2_apply
2023-04-17T18:46:08.064+0100 7f58537fe6c0 10 librbd::image::RefreshRequest: 0x7f5804003520 handle_v2_apply
2023-04-17T18:46:08.064+0100 7f58537fe6c0 20 librbd::image::RefreshRequest: 0x7f5804003520 apply
2023-04-17T18:46:08.064+0100 7f58537fe6c0 10 librbd::ImageState: 0x5561f94f4580 0x5561f94f4580 handle_refresh: r=0
2023-04-17T18:46:08.064+0100 7f580ffff6c0  5 rbd-nbd: XXXMG: handle_notify: finish
2023-04-17T18:46:08.064+0100 7f580ffff6c0 20 librbd::ImageState: 0x5561f97028a0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffd9a7c4900
2023-04-17T18:46:08.064+0100 7f580ffff6c0  5 rbd-nbd: XXXMG: handle_notify: enter

Note, where rbd-nbd: XXXMG: handle_notify: enter is printed and where rbd-nbd: XXXMG: handle_notify: finish is printed. The handle_notify calls image info method, which calls refresh_if_required and it got stuck there.

The problem seems that we fire ImageUpdateWatchers notifications twice. Note, DisableFeaturesRequest calls send_notify_update, which actually calls ImageCtx::notify_update, which is currently:

  void ImageCtx::notify_update() {
    state->handle_update_notification();
    ImageWatcher<>::notify_header_update(md_ctx, header_oid);
  }

The state->handle_update_notification() will fire ImageUpdateWatchers notifications (the rbd-nbd handle_notify is called) but does not wait it for complete and then it calls ImageWatcher<>::notify_header_update, which will send the "header update" notification for all image watchers, including itself, so it will receive it itself and will fire ImageUpdateWatchers the second time interfering with the first call.

The question is do we really need state->handle_update_notification() in ImageCtx::notify_update()? If we are going to receive our own notification anyway, may be we may just drop it.

If we really need it by some reason (not evident me right now) then another approach would be to modify it not to just fire the ImageUpdateWatchers notification but actually wait for it to complete before proceeding with ImageWatcher<>::notify_header_update. This will complicate the code a little, as instead of void ImageCtx::notify_update() we will have to use ImageCtx::notify_update(Context *on_finish).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the detailed answer and logs @trociny :-) That really helps.

The question is do we really need state->handle_update_notification() in ImageCtx::notify_update()? If we are going to receive our own notification anyway, may be we may just drop it.

This sounds really reasonable to drop it as we seem to be getting our own notification anyway.

@idryomov any comments ? can we go ahead and make the changes to remove it from both void ImageCtx::notify_update() and ImageCtx::notify_update(Context *on_finish) or do you like to keep it as it is and think we should wait for handle_update_notification() to complete (atleast this was not the case in other places where notify_header_update() is called).

Copy link
Author

@pkalever pkalever Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trociny I tried both the solutions you suggested, and it didn't solve the infinite loop issue:

2023-04-18T21:18:45.693+0530 7f0a450da640 20 librbd::ImageState: 0x55a75257c6b0 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffc3a442d90
2023-04-18T21:18:45.693+0530 7f0a450da640 20 librbd::ImageState: 0x55a7523839a0 flush_update_watchers
2023-04-18T21:18:45.693+0530 7f0a450da640 20 librbd::ImageState: 0x55a75257c6b0 ImageUpdateWatchers::flush
2023-04-18T21:18:45.694+0530 7f0a1dffb640 20 librbd::ImageState: 0x55a75257c6b0 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffc3a442d90
2023-04-18T21:18:45.694+0530 7f0a1dffb640  5 rbd-nbd: handle_notify Prasanna: enter
2023-04-18T21:18:45.694+0530 7f0a1dffb640 20 librbd: info 0x55a75252e2f0
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::watcher::Notifier: 0x7f0a10007900 handle_notify: r=-110
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::watcher::Notifier: 0x7f0a10007900 handle_notify: pending=0
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::DisableFeaturesRequest: 0x7f0a340339d0 handle_notify_update: r=-110
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::DisableFeaturesRequest: 0x7f0a340339d0 handle_finish: r=-110
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 unblock_requests
2023-04-18T21:18:50.695+0530 7f0a448d9640  5 librbd::io::WriteBlockImageDispatch: 0x55a7523dd750 unblock_writes: 0x55a75252e2f0, num=0
2023-04-18T21:18:50.695+0530 7f0a448d9640 10 librbd::ImageState: 0x55a7523839a0 handle_prepare_lock_complete
2023-04-18T21:18:50.695+0530 7f0a448d9640 10 librbd::ImageState: 0x55a7523839a0 0x55a7523839a0 send_refresh_unlock
2023-04-18T21:18:50.695+0530 7f0a448d9640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_get_mutable_metadata
2023-04-18T21:18:50.695+0530 7f0a448d9640 10 librbd::Request: 0x7f0a340339d0 create_context_finisher
2023-04-18T21:18:50.695+0530 7f0a448d9640 10 librbd::Request: 0x7f0a340339d0 finish: r=-110
2023-04-18T21:18:50.695+0530 7f0a448d9640 20 librbd::Operations: finish_op: update features r=-110
2023-04-18T21:18:50.695+0530 7f0a1ffff640 20 librbd::ImageWatcher: 0x7f0a10007840 remote async request finished: [4305,140018014309120,1] = -110
2023-04-18T21:18:50.695+0530 7f0a1ffff640 20 librbd::watcher::Notifier: 0x7f0a10007900 notify: pending=1
2023-04-18T21:18:50.696+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_get_mutable_metadata: r=0
2023-04-18T21:18:50.696+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_get_parent: legacy=0
2023-04-18T21:18:50.696+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.696+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 C_NotifyAck: id=244813135878, handle=139681060649408
2023-04-18T21:18:50.696+0530 7f0a448d9640 20 librbd::ImageWatcher: remove_async_request: [4305,140018014309120,1]
2023-04-18T21:18:50.696+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 finish: r=0
2023-04-18T21:18:50.697+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_get_parent: r=0
2023-04-18T21:18:50.697+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_get_metadata
2023-04-18T21:18:50.697+0530 7f0a450da640 15 librbd::image::GetMetadataRequest: 0x7f0a3801cf40 metadata_list: start_key=conf_
2023-04-18T21:18:50.697+0530 7f0a448d9640 20 librbd::watcher::Notifier: 0x7f0a10007900 handle_notify: r=0
2023-04-18T21:18:50.697+0530 7f0a448d9640 20 librbd::watcher::Notifier: 0x7f0a10007900 handle_notify: pending=0
2023-04-18T21:18:50.697+0530 7f0a448d9640 20 librbd::ImageWatcher: 0x7f0a10007840 handle_async_complete: request=[4305,140018014309120,1], r=0
2023-04-18T21:18:50.698+0530 7f0a450da640 15 librbd::image::GetMetadataRequest: 0x7f0a3801cf40 handle_metadata_list: r=0
2023-04-18T21:18:50.698+0530 7f0a450da640 15 librbd::image::GetMetadataRequest: 0x7f0a3801cf40 finish: r=0
2023-04-18T21:18:50.698+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_get_metadata: r=0
2023-04-18T21:18:50.698+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_get_pool_metadata
2023-04-18T21:18:50.698+0530 7f0a450da640 15 librbd::image::GetMetadataRequest: 0x55a75258c050 metadata_list: start_key=conf_
2023-04-18T21:18:50.699+0530 7f0a448d9640 15 librbd::image::GetMetadataRequest: 0x55a75258c050 handle_metadata_list: r=0
2023-04-18T21:18:50.699+0530 7f0a448d9640 15 librbd::image::GetMetadataRequest: 0x55a75258c050 finish: r=0
2023-04-18T21:18:50.699+0530 7f0a448d9640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_get_pool_metadata: r=0
2023-04-18T21:18:50.699+0530 7f0a448d9640 20 librbd::ImageCtx: apply_metadata
2023-04-18T21:18:50.699+0530 7f0a448d9640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_get_group
2023-04-18T21:18:50.700+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_get_group: r=0
2023-04-18T21:18:50.700+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 send_v2_apply
2023-04-18T21:18:50.700+0530 7f0a450da640 10 librbd::image::RefreshRequest: 0x7f0a1000dd60 handle_v2_apply
2023-04-18T21:18:50.700+0530 7f0a450da640 20 librbd::image::RefreshRequest: 0x7f0a1000dd60 apply
2023-04-18T21:18:50.700+0530 7f0a450da640 10 librbd::ImageState: 0x55a7523839a0 0x55a7523839a0 handle_refresh: r=0
2023-04-18T21:18:50.700+0530 7f0a1dffb640  5 rbd-nbd: handle_notify Prasanna: finish
2023-04-18T21:18:50.700+0530 7f0a1dffb640 20 librbd::ImageState: 0x55a75257c6b0 ImageUpdateWatchers::operator(): completing flush
2023-04-18T21:18:50.700+0530 7f0a1dffb640 10 librbd::ImageWatcher: 0x7f0a1000fae0 C_ResponseMessage: r=0
2023-04-18T21:18:50.700+0530 7f0a1dffb640 10 librbd::Watcher::C_NotifyAck 0x7f0a34001fc0 finish: r=0
2023-04-18T21:18:50.708+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.708+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 C_NotifyAck: id=244813135879, handle=139681060649408
2023-04-18T21:18:50.708+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.708+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.708+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 finish: r=0
2023-04-18T21:18:50.715+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.715+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 C_NotifyAck: id=244813135880, handle=139681060649408
2023-04-18T21:18:50.715+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.715+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.715+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 finish: r=0
2023-04-18T21:18:50.767+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.767+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 C_NotifyAck: id=244813135881, handle=139681060649408
2023-04-18T21:18:50.767+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.767+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.767+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 finish: r=0
2023-04-18T21:18:50.773+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.773+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 C_NotifyAck: id=244813135882, handle=139681060649408
2023-04-18T21:18:50.773+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.773+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.773+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 finish: r=0
2023-04-18T21:18:50.779+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.779+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 C_NotifyAck: id=244813135883, handle=139681060649408
2023-04-18T21:18:50.779+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.779+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.779+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 finish: r=0
2023-04-18T21:18:50.784+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.784+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 C_NotifyAck: id=244813135884, handle=139681060649408
2023-04-18T21:18:50.784+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.784+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.784+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 finish: r=0
2023-04-18T21:18:50.790+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.791+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 C_NotifyAck: id=244813135885, handle=139681060649408
2023-04-18T21:18:50.791+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.791+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.791+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 finish: r=0
2023-04-18T21:18:50.797+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.797+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 C_NotifyAck: id=244813135886, handle=139681060649408
2023-04-18T21:18:50.797+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.797+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.797+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 finish: r=0
2023-04-18T21:18:50.804+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.804+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 C_NotifyAck: id=244813135887, handle=139681060649408
2023-04-18T21:18:50.804+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.804+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.804+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 finish: r=0
2023-04-18T21:18:50.810+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.810+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 C_NotifyAck: id=244813135888, handle=139681060649408
2023-04-18T21:18:50.810+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.810+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.810+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 finish: r=0
2023-04-18T21:18:50.817+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.817+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 C_NotifyAck: id=244813135889, handle=139681060649408
2023-04-18T21:18:50.817+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.817+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.817+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 finish: r=0
2023-04-18T21:18:50.824+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.824+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 C_NotifyAck: id=244813135890, handle=139681060649408
2023-04-18T21:18:50.824+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.824+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.824+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 finish: r=0
2023-04-18T21:18:50.831+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.831+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 C_NotifyAck: id=244813135891, handle=139681060649408
2023-04-18T21:18:50.831+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.831+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.831+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 finish: r=0
2023-04-18T21:18:50.837+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.837+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 C_NotifyAck: id=244813135892, handle=139681060649408
2023-04-18T21:18:50.837+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.837+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.837+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 finish: r=0
2023-04-18T21:18:50.845+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.845+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 C_NotifyAck: id=244813135893, handle=139681060649408
2023-04-18T21:18:50.845+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.845+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.845+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 finish: r=0
2023-04-18T21:18:50.851+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.851+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 C_NotifyAck: id=244813135894, handle=139681060649408
2023-04-18T21:18:50.851+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.851+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.851+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 finish: r=0
2023-04-18T21:18:50.858+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.858+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 C_NotifyAck: id=244813135895, handle=139681060649408
2023-04-18T21:18:50.858+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.858+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.858+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 finish: r=0
2023-04-18T21:18:50.864+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.864+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 C_NotifyAck: id=244813135896, handle=139681060649408
2023-04-18T21:18:50.864+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.864+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.864+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f0a34002010 finish: r=0
2023-04-18T21:18:50.870+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.870+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 C_NotifyAck: id=244813135897, handle=139681060649408
2023-04-18T21:18:50.870+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.870+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.870+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a04001c70 finish: r=0
2023-04-18T21:18:50.876+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.876+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 C_NotifyAck: id=244813135898, handle=139681060649408
2023-04-18T21:18:50.876+0530 7f0a450da640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.876+0530 7f0a450da640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.876+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f000d390 finish: r=0
2023-04-18T21:18:50.904+0530 7f0a448d9640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.904+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 C_NotifyAck: id=244813135899, handle=139681060649408
2023-04-18T21:18:50.904+0530 7f0a448d9640 10 librbd::ImageWatcher: 0x7f0a10007840 remote update_features request: [4305,140018014309120,1] 64 disabled
2023-04-18T21:18:50.904+0530 7f0a448d9640 20 librbd::ExclusiveLock: 0x7f0a08000b60 accept_request=1 (request_type=0)
2023-04-18T21:18:50.904+0530 7f0a448d9640 10 librbd::Watcher::C_NotifyAck 0x7f0a34009b90 finish: r=0
2023-04-18T21:18:50.910+0530 7f0a450da640  5 librbd::Watcher: 0x7f0a10007840 notifications_blocked: blocked=0
2023-04-18T21:18:50.910+0530 7f0a450da640 10 librbd::Watcher::C_NotifyAck 0x7f09f4004270 C_NotifyAck: id=244813135900, handle=139681060649408

The rbd-nbd notify_handle() is not done in 5 seconds notify timeout.

[pkalever@localhost build]$ git diff ../src/librbd/ImageCtx.cc
diff --git a/src/librbd/ImageCtx.cc b/src/librbd/ImageCtx.cc
index b990d8adc76..c4d4fe45385 100644
--- a/src/librbd/ImageCtx.cc
+++ b/src/librbd/ImageCtx.cc
@@ -897,12 +897,10 @@ librados::IoCtx duplicate_io_ctx(librados::IoCtx& io_ctx) {
   }
 
   void ImageCtx::notify_update() {
-    state->handle_update_notification();
     ImageWatcher<>::notify_header_update(md_ctx, header_oid);
   }
 
   void ImageCtx::notify_update(Context *on_finish) {
-    state->handle_update_notification();
     image_watcher->notify_header_update(on_finish);
   }
 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idryomov any comments ? can we go ahead and make the changes to remove it from both void ImageCtx::notify_update() and ImageCtx::notify_update(Context *on_finish)

I don't think that would help. The issue that @trociny shed some light on is not that there are two update notifications but that the update callback in rbd-nbd gets stuck. It appears to be getting stuck in ImageState::refresh_if_required() and that is likely because DisableFeaturesRequest issues update notifications while still holding onto the exclusive lock with everything that has to do with it blocked. Notice how refresh only starts executing after DisableFeaturesRequest gets through handle_prepare_lock_complete() in handle_finish:

2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::DisableFeaturesRequest: 0x7f5824010bc0 handle_finish: r=-110
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 20 librbd::ExclusiveLock: 0x7f5824001d40 unblock_requests
2023-04-17T18:46:08.056+0100 7f5852ffd6c0  5 librbd::io::WriteBlockImageDispatch: 0x5561f954f740 unblock_writes: 0x5561f96b28f0, num=0
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::ImageState: 0x5561f94f4580 handle_prepare_lock_complete
2023-04-17T18:46:08.056+0100 7f5852ffd6c0 10 librbd::ImageState: 0x5561f94f4580 0x5561f94f4580 send_refresh_unlock

I think the same hang would happen even with a single "header update" notification because, as part of processing it, ImageWatcher would still want to flush update callbacks:

template <typename I>
bool ImageWatcher<I>::handle_payload(const HeaderUpdatePayload &payload,
			             C_NotifyAck *ack_ctx) {
  ldout(m_image_ctx.cct, 10) << this << " image header updated" << dendl;

  m_image_ctx.state->handle_update_notification();
  m_image_ctx.perfcounter->inc(l_librbd_notify);
  if (ack_ctx != nullptr) {
    m_image_ctx.state->flush_update_watchers(new C_ResponseMessage(ack_ctx));
    return false;
  }
  return true;
}

So the fix for this "where ETIMEDOUT error is coming from" issue would probably be to see if it's possible to change DisableFeaturesRequest to lift all blocks that it puts in place before it issues any update notifications.

But there is a another, let's call it "where the busy loop on rbd feature disable side is coming from", issue though. ETIMEDOUT error is not that big of deal, it could occur naturally if the update callback just did something that took a while. We can't really control what the user executes there, and, in general, have to be resilient to various things timing out. In my view this second issue is much more serious.

I suspect that it has to do with the fact async request processing code "remembers" the error code and then instead of figuring out that the request is in fact done (feature is disabled) on a retry, rbd feature disable gets ETIMEDOUT again.

Copy link
Contributor

@idryomov idryomov Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. RPC timeout erros
  2. Actual operation errors (timeout or anything else)
  3. Notifiers timeout (say the actual operation is in the notifier pending queue and is timeout because the previous ones didn't finish on time)

How should we differentiate them?

I don't think (1) and (3) need to be distinguished. As for how, this is for you to propose after examining the on-wire protocol and the code on both sides. I suspect that the issue boils down to only one 32-bit integer getting communicated which causes the different errors to collide.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisableFeaturesRequest calls ImageCtx::notify_update, which first calls state->handle_update_notification (where the nbd callback is fired the first time and gets stuck) and not waiting for the result it calls image_watcher->notify_header_update (we receive our own notification, the the nbd callback is fired the second time and gets stuck), and the notification machinery returns ETIMEDOUT here. I don't see how an additional (distinguishable) error code will be useful here (and where it should be set). I suppose what DisableFeaturesReques could do is to check the return value of ImageCtx::notify_update (in DisableFeaturesRequest<I>::handle_notify_update) and if it is ETIMEDOUT then reset it to 0, which would mean "the notification might have got stuck but I don't care much".

And I still think it would be useful to get rid off that dup callback, i.e. remove state->handle_update_notification() form ImageCtx::notify_update, so only one callback is fired when HeaderUpdatePayload is received.

And then, if want to go further, we may add async_request_id to HeaderUpdatePayload (i.e. make header update notifications "async"), then it would not return ETIMEDOUT after 5 sec watcher timeout but would wait for the notification to complete as other "async" requests, like "snap create".

I am not sure if we want to go so far though, i.e. should we care so much for notification to be answered by all potential watchers, which might be misbehaving. May be we just want to send the header update notification and proceed. In this case we may want to return to @pkalever's initial approach, i.e. don't call notification flush (but instead of modifying the flush operation making it NOP, we could just remove m_image_ctx.state->flush_update_watchers in ImageWatcher<I>::handle_payload(const HeaderUpdatePayload).

Copy link
Contributor

@idryomov idryomov Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisableFeaturesRequest calls ImageCtx::notify_update, which first calls state->handle_update_notification (where the nbd callback is fired the first time and gets stuck) and not waiting for the result it calls image_watcher->notify_header_update (we receive our own notification, the the nbd callback is fired the second time and gets stuck), and the notification machinery returns ETIMEDOUT here.

As I highlighted earlier and @pkalever confirmed by running a test, the issue is not that the callback gets fired twice. Yes, it's most likely redundant, but the whole thing would get stuck even if state->handle_update_notification-initiated callback was suppressed. The issue is between a) the body of the callback, b) DisableFeaturesRequest code and c) the fact that when we receive our own notification, we attempt to "flush" callbacks.

I don't see how an additional (distinguishable) error code will be useful here (and where it should be set).

There is a more general issue here. Consider an abstract operation handler: it's allowed to return an arbitrary error code. If it happens to return ETIMEDOUT (not necessarily because of a notification-related issue but even on it's own) while being executed on behalf of some other client (i.e. a "remote" request), the client gets into a busy loop. We should either make it so that "genuine" ETIMEDOUT errors aren't retried (i.e. distinguish them) or make sure that operation handlers never return ETIMEDOUT.

I'm arguing that a "remote" request should behave exactly the same as a "local" request as far as error propagation goes. Today, if DisableFeaturesRequest is executed locally and returns ETIMEDOUT, the error is propagated to a user immediately. If the same DisableFeaturesRequest is executed remotely and returns ETIMEDOUT, we get into a busy loop. (Note that I have used DisableFeaturesRequest just as an example, this applies to all operations that can be proxied.)

I suppose what DisableFeaturesReques could do is to check the return value of ImageCtx::notify_update (in DisableFeaturesRequest<I>::handle_notify_update) and if it is ETIMEDOUT then reset it to 0, which would mean "the notification might have got stuck but I don't care much".

I'd rather change it to call ImageCtx::notify_update after unblocking the exclusive lock. But this needs to be investigated -- is DisableFeaturesRequest unique in what it does, what other operation handlers that block the exclusive lock and/or issue notifications do, etc. We definitely need consistency here.

And I still think it would be useful to get rid off that dup callback, i.e. remove state->handle_update_notification() form ImageCtx::notify_update, so only one callback is fired when HeaderUpdatePayload is received.

No argument here but this is a separate, third, issue. When considered against 1) DisableFeaturesRequest generating ETIMEDOUT error and 2) ETIMEDOUT error getting mishandled in the "remote" request case, I'd say it's of lowest priority.

Copy link
Contributor

@idryomov idryomov Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if it is ETIMEDOUT then reset it to 0, which would mean "the notification might have got stuck but I don't care much".

I'd rather change it to call ImageCtx::notify_update after unblocking the exclusive lock.

To elaborate on this a bit: I really dislike how librbd does this kind of error code munging in various (often non-obvious!) places. One recent example that we ran into while working on client-side encryption that took a while to debug was rbd_flatten just swallowing EINVAL errors. I'm guessing that it's done on the assumption that EINVAL is generated only if there is no parent but this is not the case for encrypted flatten. There are many other examples, some seemingly for the sake of idempotency, some for no good reason at all.

This is super fragile because it's very easy to "introduce" an error that is being swallowed as part of changing some low-level API and not notice. I think we should avoid adding to that pile at all costs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were discussing only ETIMEDOUT issue here, leaving apart the issue with the nbd callback got stuck. So my last comment was about that issue. And sure removing the redundant notification will not fix it but midht simplify our consideration about the fix. And as a possible fix I actually proposed either of two:

  1. Ignore ETIMEDOUT error returned by ImageCtx::notify_update in DisableFeaturesReques.

  2. Make HeaderUpdate notification not return ETIMEDOUT error but resend the request until either success or some other error is returned. I.e. add async_request_id to the HeaderUpdatePayload and deal with it the same way we deal with other async requests (like snap create) that may execute long.

@pkalever pkalever changed the title rbd: do not queue the image header update flush task librbd: fix stuck with disable request Apr 26, 2023
@pkalever
Copy link
Author

@idryomov @trociny Thanks for the great discussion :-) It was really helpful.

In the PR I had a chance to fix couple of the issues discussed above which are w.r.t to https://tracker.ceph.com/issues/58740 , PTAL

@idryomov it will be nice if you could please help with a tracker that has details of the left out issues which were part of our discussion and you think are good to have fixes.

Here are the logs with this PR:

2023-04-26T19:43:18.105+0530 7f51fd7fa640 20 librbd::DisableFeaturesRequest: 0x7f522401a040 send_append_op_event                                                                                                                                                                
2023-04-26T19:43:18.108+0530 7f5234dcc640 20 librbd::DisableFeaturesRequest: 0x7f522401a040 handle_set_features: r=0                                                                                                                                                            
2023-04-26T19:43:18.108+0530 7f5234dcc640 20 librbd::DisableFeaturesRequest: 0x7f522401a040 send_notify_update                                                                                                                                                                  
2023-04-26T19:43:18.108+0530 7f5234dcc640 20 librbd::ExclusiveLock: 0x7f5204000b60 unblock_requests                                                                                                                                                                             
2023-04-26T19:43:18.108+0530 7f5234dcc640  5 librbd::io::WriteBlockImageDispatch: 0x55c90bf34750 unblock_writes: 0x55c90c085400, num=0                                                                                                                                          
2023-04-26T19:43:18.108+0530 7f5234dcc640 10 librbd::ImageState: 0x55c90beda9a0 handle_prepare_lock_complete                                                                                                                                                                    
2023-04-26T19:43:18.108+0530 7f5234dcc640 10 librbd::ImageWatcher: 0x7f5208007860: notify_header_update                                                                                                                                                                         
2023-04-26T19:43:18.108+0530 7f5234dcc640 20 librbd::watcher::Notifier: 0x7f5208007920 notify: pending=1                                                                                                                                                                        
2023-04-26T19:43:18.109+0530 7f52355cd640  5 librbd::Watcher: 0x7f5208007860 notifications_blocked: blocked=0                                                                                                                                                                   
2023-04-26T19:43:18.109+0530 7f52355cd640 10 librbd::Watcher::C_NotifyAck 0x7f51ec0025d0 C_NotifyAck: id=111669149701, handle=139990231185904                                                                                                                                   
2023-04-26T19:43:18.109+0530 7f52355cd640 10 librbd::ImageWatcher: 0x7f5208007860 image header updated                                                                                                                                                                          
2023-04-26T19:43:18.109+0530 7f52355cd640 20 librbd::ImageState: 0x55c90beda9a0 handle_update_notification: refresh_seq = 2, last_refresh = 1                                                                                                                                   
2023-04-26T19:43:18.109+0530 7f52355cd640 20 librbd::ImageState: 0x55c90c0d3860 ImageUpdateWatchers::notify                                                                                                                                                                     
2023-04-26T19:43:18.109+0530 7f52355cd640 20 librbd::ImageState: 0x55c90c0d3860 ImageUpdateWatchers::send_notify: handle=0, watcher=0x7ffca9fbba30                                                                                                                              
2023-04-26T19:43:18.109+0530 7f52355cd640 20 librbd::ImageState: 0x55c90beda9a0 flush_update_watchers                                                                                                                                                                           
2023-04-26T19:43:18.109+0530 7f52355cd640 20 librbd::ImageState: 0x55c90c0d3860 ImageUpdateWatchers::flush                                                                                                                                                                      
2023-04-26T19:43:18.109+0530 7f51fdffb640 20 librbd::ImageState: 0x55c90c0d3860 ImageUpdateWatchers::handle_notify: handle=0, watcher=0x7ffca9fbba30                                                                                                                            
2023-04-26T19:43:18.109+0530 7f51fdffb640  5 rbd-nbd: handle_notify PrasannaXXX: enter                                                                                                                                                                                            
2023-04-26T19:43:18.109+0530 7f51fdffb640 20 librbd: info 0x55c90c085400                                                                                                                                                                                                        
2023-04-26T19:43:18.109+0530 7f51fdffb640 10 librbd::ImageState: 0x55c90beda9a0 0x55c90beda9a0 send_refresh_unlock                                                                                                                                                              
2023-04-26T19:43:18.109+0530 7f51fdffb640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_get_mutable_metadata                                                                                                                                                         
2023-04-26T19:43:18.110+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_get_mutable_metadata: r=0                                                                                                                                                  
2023-04-26T19:43:18.110+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_get_parent: legacy=0                                                                                                                                                         
2023-04-26T19:43:18.111+0530 7f52355cd640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_get_parent: r=0                                                                                                                                                            
2023-04-26T19:43:18.111+0530 7f52355cd640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_get_metadata                                                                                                                                                                 
2023-04-26T19:43:18.111+0530 7f52355cd640 15 librbd::image::GetMetadataRequest: 0x7f522c021440 metadata_list: start_key=conf_                                                                                                                                                   
2023-04-26T19:43:18.111+0530 7f5234dcc640 15 librbd::image::GetMetadataRequest: 0x7f522c021440 handle_metadata_list: r=0                                                                                                                                                        
2023-04-26T19:43:18.111+0530 7f5234dcc640 15 librbd::image::GetMetadataRequest: 0x7f522c021440 finish: r=0                                                                                                                                                                      
2023-04-26T19:43:18.111+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_get_metadata: r=0                                                                                                                                                          
2023-04-26T19:43:18.111+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_get_pool_metadata                                                                                                                                                            
2023-04-26T19:43:18.111+0530 7f5234dcc640 15 librbd::image::GetMetadataRequest: 0x7f522401a650 metadata_list: start_key=conf_                                                                                                                                                   
2023-04-26T19:43:18.112+0530 7f52355cd640 15 librbd::image::GetMetadataRequest: 0x7f522401a650 handle_metadata_list: r=0                                                                                                                                                        
2023-04-26T19:43:18.112+0530 7f52355cd640 15 librbd::image::GetMetadataRequest: 0x7f522401a650 finish: r=0                                                                                                                                                                      
2023-04-26T19:43:18.112+0530 7f52355cd640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_get_pool_metadata: r=0                                                                                                                                                     
2023-04-26T19:43:18.112+0530 7f52355cd640 20 librbd::ImageCtx: apply_metadata                                                                                                                                                                                                   
2023-04-26T19:43:18.112+0530 7f52355cd640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_get_group                                                                                                                                                                    
2023-04-26T19:43:18.113+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_get_group: r=0                                                                                                                                                             
2023-04-26T19:43:18.113+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 send_v2_apply                                                                                                                                                                        
2023-04-26T19:43:18.113+0530 7f5234dcc640 10 librbd::image::RefreshRequest: 0x7f51e800a470 handle_v2_apply                                                                                                                                                                      
2023-04-26T19:43:18.113+0530 7f5234dcc640 20 librbd::image::RefreshRequest: 0x7f51e800a470 apply                                                                                                                                                                                
2023-04-26T19:43:18.113+0530 7f5234dcc640 10 librbd::ImageState: 0x55c90beda9a0 0x55c90beda9a0 handle_refresh: r=0                                                                                                                                                              
2023-04-26T19:43:18.113+0530 7f51fdffb640  5 rbd-nbd: handle_notify PrasannaXXX: finish                                                                                                                                                                                            
2023-04-26T19:43:18.113+0530 7f51fdffb640 20 librbd::ImageState: 0x55c90c0d3860 ImageUpdateWatchers::operator(): completing flush                                                                                                                                               
2023-04-26T19:43:18.113+0530 7f51fdffb640 10 librbd::ImageWatcher: 0x7f520801d440 C_ResponseMessage: r=0                                                                                                                                                                        
2023-04-26T19:43:18.113+0530 7f51fdffb640 10 librbd::Watcher::C_NotifyAck 0x7f51ec0025d0 finish: r=0                                                                                                                                                                            
2023-04-26T19:43:18.114+0530 7f5234dcc640 20 librbd::watcher::Notifier: 0x7f5208007920 handle_notify: r=0                                                                                                                                                                       
2023-04-26T19:43:18.114+0530 7f5234dcc640 20 librbd::watcher::Notifier: 0x7f5208007920 handle_notify: pending=0                                                                                                                                                                 
2023-04-26T19:43:18.114+0530 7f5234dcc640 20 librbd::DisableFeaturesRequest: 0x7f522401a040 handle_notify_update: r=0                                                                                                                                                           
2023-04-26T19:43:18.114+0530 7f5234dcc640 20 librbd::DisableFeaturesRequest: 0x7f522401a040 handle_finish: r=0                                                                                                                                                                  
2023-04-26T19:43:18.114+0530 7f5234dcc640 10 librbd::Request: 0x7f522401a040 create_context_finisher                                                                                                                                                                            
2023-04-26T19:43:18.114+0530 7f5234dcc640 10 librbd::Request: 0x7f522401a040 finish: r=0                                                                                                                                                                                        
2023-04-26T19:43:18.114+0530 7f5234dcc640 20 librbd::Operations: finish_op: update features r=0                                                                                                         

src/librbd/ImageCtx.cc Outdated Show resolved Hide resolved
@idryomov
Copy link
Contributor

@idryomov it will be nice if you could please help with a tracker that has details of the left out issues which were part of our discussion and you think are good to have fixes.

https://tracker.ceph.com/issues/59563
https://tracker.ceph.com/issues/58185

@pkalever
Copy link
Author

@idryomov it will be nice if you could please help with a tracker that has details of the left out issues which were part of our discussion and you think are good to have fixes.

https://tracker.ceph.com/issues/59563 https://tracker.ceph.com/issues/58185

Thank you @idryomov !

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@pkalever pkalever force-pushed the fix-feature-disable branch 2 times, most recently from 889d124 to 597bf2e Compare April 27, 2023 11:01
@github-actions github-actions bot added the tests label Apr 27, 2023
@pkalever
Copy link
Author

re-basing as there is a merge conflict.

@idryomov
Copy link
Contributor

idryomov commented Oct 7, 2023

I suppose an alternative (although not very nice) solution could be to open an additional image context and use it when checking the image size (i.e. image2.size(&new_size)). I think I would even prefer if image2 was created on every notification. Although it is not clear what to do with image.invalidate_cache(), which potentially might get stuck too.

Honestly, I'm not sure what that invalidate_cache() call is for. Taken with the rescan of the partition table, I initially assumed an attempt to pick up the pieces after an arbitrary write to the image, not just a straight resize. But I see that while the invalidate_cache() call has been there all along, the rescan was added as workaround for a kernel bug in https://tracker.ceph.com/issues/22131.

But, disregarding that, having to open an additional image context isn't nice indeed.

@idryomov
Copy link
Contributor

idryomov commented Oct 7, 2023

It has been a long-standing behavior of rbd_resize() API that it blocks on the notify. For both krbd and rbd-nbd (prior to this change) the "bubble up" step has been included, so this can be considered a regression.

I poked around and I see that rbd_resize() API doesn't return an error if the notify times out (e.g. because the "bubble up" step takes too long) and that is also a long-standing behavior:

void complete(int r) override {
CephContext *cct = image_ctx.cct;
if (notified) {
if (r == -ETIMEDOUT) {
// don't fail the op if a peer fails to get the update notification
lderr(cct) << "update notification timed-out" << dendl;
r = 0;

This goes back to Jewel (2016) and means that waiting for rbd_resize() to return, checking for success and proceeding with something like check_size() on the client isn't reliable even without this change.

I'm leaning towards patching the resize handler in fsx to make it wait for the device to actually get resized.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to patch

the resize handler in fsx to make it wait for the device to actually get resized.

@pkalever
Copy link
Author

@idryomov I thought you asked me to do something like:

diff --git a/src/test/librbd/fsx.cc b/src/test/librbd/fsx.cc
index acea8ede270..351b7a6c603 100644
--- a/src/test/librbd/fsx.cc
+++ b/src/test/librbd/fsx.cc
@@ -778,13 +778,27 @@ int
 __librbd_resize(struct rbd_ctx *ctx, uint64_t size)
 {
        int ret;
+       uint64_t effective_size;
+       int count = 0;
 
        ret = rbd_resize(ctx->image, size);
        if (ret < 0) {
-               prt("rbd_resize(%llu) failed\n", size);
+               prt("rbd_resize(0x%llx) failed\n", (unsigned long long)size);
                return ret;
        }
 
+       do {
+               sleep(count++);
+               ret = rbd_get_size(ctx->image, &effective_size);
+               if (ret < 0) {
+                       prt("rbd_get_size failed\n");
+                       return ret;
+               }
+               prt("Sizes: expected (0x%llx), effective (0x%llx)\n",
+                   (unsigned long long)size,
+                   (unsigned long long)effective_size);
+       } while (size != effective_size);
+
        return librbd_verify_object_map(ctx);
 }

But not sure why still the check_size() is failing...

# ceph_test_librbd_fsx -d -W -R -p 100 -r 512 -w 512 -t 512 -h 512 -l 250000000 -S 0 -N 6000 -M rbd test_image1 2>&1 | tee test_fsx_resize1.log
[...]
562 trunc       from 0xd19ee00 to 0x9119400
Sizes: expected (0x9119400), effective (0x9119400)
564 read        0x1c8b000 thru  0x1c94303       (0x9304 bytes)
566 write       0x25ce800 thru  0x25d9fa7       (0xb7a8 bytes)
567 read        0x6b3400 thru   0x6c29a4        (0xf5a5 bytes)
570 trunc       from 0x9119400 to 0xb2000
Sizes: expected (0xb2000), effective (0xb2000)
572 punch       from 0x41000 to 0x4fa00, (0xea00 bytes)
574 read        0x7ec00 thru    0x8dfc1 (0xf3c2 bytes)
Sizes: expected (0xe794e00), effective (0xe794e00)
575 write       0xe794800 thru  0xe794c2f       (0x430 bytes)
578 trunc       from 0xe794e00 to 0x3997600
Sizes: expected (0x3997600), effective (0x3997600)
581 punch       from 0x3662200 to 0x366aa00, (0x8800 bytes)
Sizes: expected (0xaa4e600), effective (0xaa4e600)
582 write       0xaa48c00 thru  0xaa4e4e0       (0x58e1 bytes)
583 punch       from 0x1e86800 to 0x1e93a00, (0xd200 bytes)
585 read        0x1f47600 thru  0x1f4bb20       (0x4521 bytes)
586 write       0xa6f4e00 thru  0xa6fc88a       (0x7a8b bytes)
587 punch       from 0x256c400 to 0x256dc00, (0x1800 bytes)
588 trunc       from 0xaa4e600 to 0xdff8800
Sizes: expected (0xdff8800), effective (0xdff8800)
Size error: expected 0xdff8800 stat 0xaa4e600

Note the Sizes: expected (0xdff8800), effective (0xdff8800) but the check size fails with Size error: expected 0xdff8800 stat 0xaa4e600 mismatched sizes?

@idryomov
Copy link
Contributor

@idryomov I thought you asked me to do something like:

This is not what we discussed. rbd_get_size() gets the size of the RBD image, not the size of the block device. The retries should be based on the size returned by BLKGETSIZE64 ioctl.

@pkalever
Copy link
Author

@idryomov I thought you asked me to do something like:

This is not what we discussed. rbd_get_size() gets the size of the RBD image, not the size of the block device. The retries should be based on the size returned by BLKGETSIZE64 ioctl.

@idryomov
Opps! Missed the actual point. Thanks fixed now.

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the title of the new commit to "test/librbd/fsx: wait for resize to propagate in krbd_resize()".

src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
src/test/librbd/fsx.cc Show resolved Hide resolved
src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
src/test/librbd/fsx.cc Outdated Show resolved Hide resolved
With this changes resize request will not be blocked until the resize is
completed. Because of this the fsx test fails as it assumes that the
request to resize immediately implies changes on the device size.

Hence we have to add a wait in resize handler of fsx for the device to
actually get resized.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
@idryomov
Copy link
Contributor

jenkins test api

@idryomov
Copy link
Contributor

@idryomov idryomov merged commit bf82a7b into ceph:main Oct 30, 2023
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants