os/bluestore: More support for cleaning zones. #41919

Merged: 7 commits, Jun 30, 2021

@agayev agayev commented Jun 17, 2021

The protocol for cleaning zones is as follows:

1. The ZonedAllocator wakes up the cleaner thread.
2. The cleaner thread acquires the list of zones to clean.
3. Cleaning multiple zones is not atomic; therefore, to support resuming the
   cleaning after a crash, the cleaner thread first persists the list of zones
   to clean as the value of the key "cleaning_in_progress_zones", by calling
   ZonedFreelistManager's mark_zones_to_clean_in_progress.
4. The cleaner thread then iterates over the zones and cleans each one by
   calling _zoned_clean_zone on it. The latter calls _do_move on each live
   object in the zone, which atomically moves the object from the zone being
   cleaned to a new zone. (_do_move is to be implemented.)
5. Once all of the zones are cleaned, the cleaner thread calls reset_zones,
   which resets the write pointers of those zones on the physical zoned block
   device.
6. Finally, it calls ZonedFreelistManager's mark_zones_to_clean_free method,
   which in one atomic operation resets the write pointers of the cleaned zones
   in the db and deletes the key "cleaning_in_progress_zones", that is, the
   list of zones to be cleaned recorded in step 3.
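
The six steps above can be sketched as a small in-memory simulation. This is
only an illustration under stated assumptions, not the BlueStore code: FakeDb,
FakeDevice, clean_zones and all function signatures are hypothetical, and only
the names mark_zones_to_clean_in_progress, mark_zones_to_clean_free,
reset_zones and the key "cleaning_in_progress_zones" come from the description
above. The per-object move (_do_move) is left as a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Key name taken from the protocol description.
static const std::string kCleaningKey = "cleaning_in_progress_zones";

// Hypothetical stand-in for the key/value database (not a Ceph class).
struct FakeDb {
  std::map<std::string, std::set<uint64_t>> kv;  // persisted keys
  std::map<uint64_t, uint64_t> write_pointer;    // zone -> write pointer in the db
};

// Hypothetical stand-in for the zoned block device (not a Ceph class).
struct FakeDevice {
  std::map<uint64_t, uint64_t> write_pointer;    // zone -> physical write pointer
  void reset_zones(const std::set<uint64_t>& zones) {
    for (uint64_t z : zones) write_pointer[z] = 0;
  }
};

// Step 3: persist the list of zones about to be cleaned.
void mark_zones_to_clean_in_progress(FakeDb& db, const std::set<uint64_t>& zones) {
  db.kv[kCleaningKey] = zones;
}

// Step 4: placeholder for _zoned_clean_zone; a real version would move each
// live object out of the zone. Here we only record that the zone was visited.
void zoned_clean_zone(uint64_t zone, std::vector<uint64_t>& cleaned) {
  cleaned.push_back(zone);
}

// Step 6: reset the db write pointers and delete the key. In a real store both
// updates would go into a single transaction; this sketch just runs them in order.
void mark_zones_to_clean_free(FakeDb& db) {
  assert(db.kv.count(kCleaningKey) == 1);
  for (uint64_t z : db.kv[kCleaningKey]) db.write_pointer[z] = 0;
  db.kv.erase(kCleaningKey);
}

// Steps 2 to 6 as run by the cleaner thread (thread wake-up is omitted).
std::vector<uint64_t> clean_zones(FakeDb& db, FakeDevice& dev,
                                  const std::set<uint64_t>& zones) {
  std::vector<uint64_t> cleaned;
  mark_zones_to_clean_in_progress(db, zones);             // step 3
  for (uint64_t z : zones) zoned_clean_zone(z, cleaned);  // step 4
  dev.reset_zones(zones);                                 // step 5
  mark_zones_to_clean_free(db);                           // step 6
  return cleaned;
}
```

Calling clean_zones(db, dev, {3, 7}) in this sketch leaves both write pointers
of zones 3 and 7 at zero and removes the key from the fake database.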
                                                                                   
A crash between or within any of these steps leaves the system in a consistent
state. Specifically, each zone will be completely cleaned, partially cleaned,
or not cleaned at all. Recovery code will need to check for the existence of
the "cleaning_in_progress_zones" key and, if it is found, resume cleaning the
zones where it left off. If we crash between steps 5 and 6, or within step 5,
we may end up resetting the write pointer on the physical zoned block device
multiple times, but that is okay because the reset is an idempotent operation.
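
The recovery argument can be illustrated with a minimal sketch, assuming an
in-memory model of the database and the device. SimState, finish_cleaning,
recover and the crash_before_step6 flag are hypothetical names introduced only
for this example; only the key name comes from the text above. For simplicity
the sketch redoes all recorded zones on recovery instead of tracking exactly
where cleaning stopped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

static const std::string kInProgressKey = "cleaning_in_progress_zones";

// Hypothetical model of persisted state (not a Ceph class).
struct SimState {
  std::map<std::string, std::set<uint64_t>> kv;  // persisted keys
  std::map<uint64_t, uint64_t> db_wp;            // write pointers recorded in the db
  std::map<uint64_t, uint64_t> dev_wp;           // write pointers on the device
  int device_resets = 0;                         // number of reset passes, for illustration
};

// Runs steps 4 to 6 for the zones recorded under the key. If
// crash_before_step6 is true, the function returns right after the device
// reset, simulating a crash between steps 5 and 6. Returns true on completion.
bool finish_cleaning(SimState& s, bool crash_before_step6) {
  assert(s.kv.count(kInProgressKey) == 1);
  const std::set<uint64_t> zones = s.kv.at(kInProgressKey);
  // Step 4 (moving live objects) is omitted in this sketch.
  for (uint64_t z : zones) s.dev_wp[z] = 0;  // step 5: idempotent device reset
  ++s.device_resets;
  if (crash_before_step6) return false;      // simulated crash
  for (uint64_t z : zones) s.db_wp[z] = 0;   // step 6: reset db write pointers
  s.kv.erase(kInProgressKey);                //         and delete the key
  return true;
}

// Recovery: resume only if the key exists; otherwise there is nothing to do.
bool recover(SimState& s) {
  if (s.kv.count(kInProgressKey) == 0) return false;
  return finish_cleaning(s, false);
}
```

In this model, a crash after the device reset leaves the key in place, so
recover() resets the device a second time and then completes step 6. The second
reset changes nothing, which is the idempotence the paragraph relies on.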
                                                                                   
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>                                   



agayev commented Jun 17, 2021

@ifed01 please take a look and let me know what you think. I tried to address your comment from #38641 about making zone cleaning and database update atomic.

@tchaikov it also contains some code from #41076 for which you wanted to see the use case. This PR makes use of this. Also, this PR builds on top of #41845, which @ifed01 has already approved (so it can be QA'ed and then merged). Thanks!


agayev commented Jun 21, 2021

@ifed01 just letting you know that the crimson reviewer request was added by mistake and they will not review it; please review when you get a chance. Thanks!


ifed01 commented Jun 21, 2021

> @ifed01 just letting you know that the crimson reviewer request was added by mistake and they will not review it; please review when you get a chance. Thanks!

@agayev - ok, will do.
we have a call on ZBD atm, could you join by any chance?


ifed01 commented Jun 21, 2021

@ifed01 ifed01 left a comment


Overall LGTM. Suggest to add recovery code for the sake of completeness though...

Review thread on src/blk/zoned/HMSMRDevice.h (outdated, resolved)

agayev commented Jun 21, 2021

> > @ifed01 just letting you know that the crimson reviewer request was added by mistake and they will not review it; please review when you get a chance. Thanks!
>
> @agayev - ok, will do.
> we have a call on ZBD atm, could you join by any chance?

Sorry, missed this. Replied to Sage to join for another meeting.

… pointer.

Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
…ne by one.

Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
@agayev agayev force-pushed the zoned-more-cleaning-support branch from 90355e0 to 9802a25 Compare June 22, 2021 13:35
…f zones

recorded as being cleaned in the database.

Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>
Signed-off-by: Abutalib Aghayev <agayev@psu.edu>

agayev commented Jun 22, 2021

> Overall LGTM. Suggest to add recovery code for the sake of completeness though...

Added the recovery code, although I am not confident that it is in the ideal place. Please let me know what you think. Thanks!

@tchaikov tchaikov merged commit c42712d into ceph:master Jun 30, 2021

aclamk commented Oct 25, 2023

@agayev
I am preparing to refactor the BlueStore write path.
The problem is that I have no way to test the Zoned Namespace related code.
Is work on it still ongoing?
