osd: Avoid pwlc spanning intervals #67244
No unit test is included in this PR, but I'm working on a unit test harness for PeeringState (still cleaning it up for its own PR) and have a test for this issue there; see bill-scales@d1f9d5b.
Prevent the first write to FastEC in each interval from being a partial write, so that the span of partial writes tracked by pwlc never spans intervals. This stops bugs such as 73891, where a divergent write was not removed from the log because pwlc recorded that the shard had not participated in writes before and after the divergent write.

Fixes: https://tracker.ceph.com/issues/73891
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Force-pushed from 6611dda to dc0a195.
Force push to rebase and pick up the fix in main for the API test failure.
aainscow left a comment:
There is another mechanism in ECCommon which does something very similar: next_write_all_shards.
This causes an empty transaction to be sent to shards which would otherwise be sent nothing.
I wonder if this new mechanism should be used for this too?
I did look at next_write_all_shards, and while it is fairly similar it is not sufficient. That mechanism is used when PrimaryLogPG generates a log entry without doing a write (for example, logging a failed write) and requires the log entry to be sent to and stored by each shard. The first_write_in_interval flag ensures that the first write transaction in the interval is sent to all shards AND that, even though generate_transactions is called, written_shards is left empty, indicating that all shards have been updated even if the write transaction only modifies a subset of the shards. This ensures that PWLC gets cleared by PGBackend::partial_write and that Peering will roll back the write unless all shards active in the epoch have committed the update.
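The behaviour described in this reply can be sketched with a small standalone model. This is only an illustration under stated assumptions, not the Ceph code: `ShardSet`, `WritePlan`, `plan_write`, and the `sent_to` field are hypothetical names, and the partial-write branch (send only to the touched shards) is a simplification. Only `first_write_in_interval`, `written_shards`, `size_change`, and `clear_whiteout` come from the PR.

```cpp
#include <cassert>
#include <set>

// Hypothetical, simplified model of the decision discussed above.
// It is NOT the ECTransaction / ECCommon implementation.
using ShardSet = std::set<int>;

struct WritePlan {
  ShardSet sent_to;         // shards that receive a transaction
  ShardSet written_shards;  // empty means "all shards updated" in this model
};

// touched: shards the write actually modifies.
// all:     every shard that is part of the write's acting set.
WritePlan plan_write(const ShardSet& touched, const ShardSet& all,
                     bool size_change, bool clear_whiteout,
                     bool& first_write_in_interval) {
  WritePlan plan;
  if (size_change || clear_whiteout || first_write_in_interval) {
    // Treat the write as a full write: every shard gets the transaction
    // and written_shards stays empty, so no partial-write span is recorded.
    plan.sent_to = all;
    plan.written_shards.clear();
    first_write_in_interval = false;  // only the first write is forced
  } else {
    // Assumed partial-write path: only the touched shards take part.
    plan.sent_to = touched;
    plan.written_shards = touched;
  }
  return plan;
}
```

In this model, calling `plan_write` twice with the flag initially `true` gives a full write the first time and a partial write the second time, which is the property that keeps a partial-write span from starting in one interval and ending in another.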
-  if (size_change || clear_whiteout) {
+  if (size_change || clear_whiteout || first_write_in_interval) {
     all_shards_written();
+    first_write_in_interval = false;
first_write_in_interval is a non-const reference, ACK.
Though a pointer for an in-out parameter would make the mutation more visually explicit at the point of use (*first_write_in_interval = false).
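To make the stylistic point concrete, here is a minimal sketch comparing the two ways of passing the flag. The helper names `consume_flag_ref` and `consume_flag_ptr` are hypothetical and exist only for this comparison; the PR itself uses the reference form.

```cpp
#include <cassert>

// Reference form (what the PR uses): the callee can clear the flag,
// but the call site looks like an ordinary by-value argument.
bool consume_flag_ref(bool& first_write_in_interval) {
  bool was_set = first_write_in_interval;
  first_write_in_interval = false;
  return was_set;
}

// Pointer form (the reviewer's suggestion): the dereference inside the
// callee and the '&' at the call site both signal that the flag is mutated.
bool consume_flag_ptr(bool* first_write_in_interval) {
  assert(first_write_in_interval != nullptr);
  bool was_set = *first_write_in_interval;
  *first_write_in_interval = false;
  return was_set;
}
```

A caller writes `consume_flag_ref(flag)` in the first case and `consume_flag_ptr(&flag)` in the second. The behaviour is the same; the trade-off is explicitness at the call site against the need to handle a possible null pointer.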