This repository was archived by the owner on May 25, 2026. It is now read-only.
rev27
* Refactor scale down logic to not force remove unreachable instances
* Add unit tests
* Address lint warnings
* Add sleep and idle timeout in test code in an attempt to stabilize tests
* Address lint warnings
* Add sleep after killing pod to give test environment time to stabilize
* Wait for unit to be removed from cluster after force removing it
* Wait for forcefully removed unit to actually be removed from the cluster
* Add integration test for recovery after a network cut
* Remove duplicate code introduced after merging main
* Install helm3 in CI runner for self healing integration tests
* Fix failing unit test
* Pin flake8 to v5.0.4 due to incompatibilities with v6
* Install helm on CI runners using sudo
* Port full cluster crash recovery logic from vm charm + add full cluster crash integration test
* Replace extending pebble restart delay with using on-failure and on-success
* Remove unused manifest file
* Fix call to datetime.datetime.now()
* Address PR feedback; recover cluster from full crash if one node
* Avoid running update status when the cluster is in waiting state (e.g. for tls or initialization)
* Attempt at bugfix for the full cluster crash recovery
* Fix broken lint and unit tests
* Fix lint warnings
* Fix typo used to test locally
* Fix minor bugs in full cluster crash recovery logic + make integration test pass
* Fix lint warnings
* Address PR feedback