Replication and recovery depend on many pieces working together. The unit tests cover those pieces individually, but they don't show that the system works once all of the units are combined. They are also an inconvenient place to put tests for interactions between different versions of the software (and the state each version leaves behind).
Add a different kind of test, one that shows replicas are actually usable for recovery and that a replica created by one version of the software can be recovered using another version (usually, presumably, a newer one).
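As a rough illustration of the shape such a test could take, here is a minimal sketch of a writer-version by reader-version matrix. Everything in it is hypothetical: the version numbers, the JSON replica format, and the `_write_*` / `_recover_*` functions are in-memory stand-ins for real releases, not this project's API. A real suite would presumably run each released version in its own environment instead.

```python
import itertools
import json


# Hypothetical stand-ins for two releases of the software. Each "release"
# is a pair of functions: one writes a replica, one recovers state from it.
def _write_v1(state):
    return json.dumps({"format": 1, "items": sorted(state)})


def _write_v2(state):
    return json.dumps({"format": 2, "items": sorted(state), "count": len(state)})


def _recover_v1(blob):
    doc = json.loads(blob)
    if doc["format"] != 1:
        raise ValueError("v1 only understands format 1")
    return set(doc["items"])


def _recover_v2(blob):
    doc = json.loads(blob)
    if doc["format"] not in (1, 2):
        raise ValueError("v2 understands formats 1 and 2")
    return set(doc["items"])


# Assumed version registry: version number -> (write, recover).
VERSIONS = {
    1: (_write_v1, _recover_v1),
    2: (_write_v2, _recover_v2),
}


def compatible_pairs(versions):
    """Return (writer, reader) pairs where the reader is the same or newer."""
    ordered = sorted(versions)
    return [(w, r) for w, r in itertools.product(ordered, repeat=2) if r >= w]


def check_round_trip(writer_version, reader_version, state):
    """Write a replica with one version and recover it with another."""
    write, _ = VERSIONS[writer_version]
    _, recover = VERSIONS[reader_version]
    return recover(write(state)) == state


def run_matrix(state):
    """Run the round trip for every supported (writer, reader) pair."""
    return {
        pair: check_round_trip(pair[0], pair[1], state)
        for pair in compatible_pairs(VERSIONS)
    }


if __name__ == "__main__":
    for pair, ok in run_matrix({"token-a", "token-b"}).items():
        print(pair, "ok" if ok else "FAILED")
```

Restricting the matrix to readers that are the same version or newer reflects the expectation above that recovery usually happens with a newer version; whether older readers also need to be covered is a separate decision.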