DPE-3547 mitigations for container restart #377
* set floor for max_connections at 100
* function retries
* flush logs in a single call
* added test coverage
dpe-3547-mitigations-for-container-kills # Conflicts: # tests/unit/test_mysql_k8s_helpers.py
see vm review comments canonical/mysql-operator#398 (review)
content = self.container.list_files(MYSQL_DATA_DIR)
content_set = {item.name for item in content}
logger.debug("Resetting MySQL data directory.")
for item in content_set:
    self.container.remove_path(f"{MYSQL_DATA_DIR}/{item}", recursive=True)
Can you just remove the parent directory?
https://github.com/canonical/mysql-router-k8s-operator/blob/f2cbb11ba9c333563acbb9f9b1e159adbded15b6/src/rock.py#L64-L65
(or remove parent dir & mkdir)
I've observed occasional failures when removing the parent directory because some process was accessing it, but I could not determine which process. Removing the directory's contents instead did not trigger the issue.
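For illustration only, the same idea (empty a directory while leaving the directory itself in place) can be sketched against a local filesystem with the standard library. This is not the charm's code, which goes through the container API shown in the diff above; the helper name `clear_directory_contents` is hypothetical.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory: Path) -> None:
    """Delete every entry inside `directory` but keep the directory itself.

    Hypothetical local-filesystem stand-in for the per-item removal in the diff.
    """
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything below it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just that entry.
            entry.unlink()
```

Because the parent directory is never unlinked, a process that merely holds it open (for example as its working directory) does not get in the way of the reset.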
@@ -93,7 +93,7 @@ def test_on_leader_elected_secrets(self):
secret_data = self.harness.model.get_secret(label="mysql-k8s.app").get_content()

# Test passwords in content and length
required_passwords = ["root-password", "server-config-password", "cluster-admin-password"]
Nothing about this in the PR description. Q: Why?
Forgot to pop it from my local stash; fixed in ea1e96d
Issue
Workload containers get restarted due to timeouts on the Pebble endpoint used by the livenessProbe.
Discussion in the Juju Launchpad bug: https://bugs.launchpad.net/bugs/2052517
Solution
Here are some mitigations and optimizations; this is not a final solution, though.
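As a rough sketch of two of the mitigations named in the commit message (a floor of 100 for max_connections, and function retries), the plain-Python snippet below shows the general shape. It assumes the connection limit is computed elsewhere and only clamped here. The names `MIN_MAX_CONNECTIONS`, `max_connections_with_floor` and `retry` are hypothetical and not taken from the charm, which may well use a retry library instead of a hand-written decorator.

```python
import functools
import time

MIN_MAX_CONNECTIONS = 100  # floor value mentioned in the commit message


def max_connections_with_floor(computed: int, floor: int = MIN_MAX_CONNECTIONS) -> int:
    """Never return fewer connections than the floor, whatever was computed."""
    return max(computed, floor)


def retry(attempts: int = 3, delay_seconds: float = 0.0, exceptions=(Exception,)):
    """Re-run the wrapped function on failure, up to `attempts` tries in total."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        # Out of tries: let the last error propagate.
                        raise
                    time.sleep(delay_seconds)

        return wrapper

    return decorator
```

A call that can fail transiently would then be wrapped as `@retry(attempts=3, delay_seconds=1.0)`, while a computed limit of, say, 20 would be raised to 100 by `max_connections_with_floor(20)`.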