Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry if MySQL Server is unreachable #190

Merged
merged 13 commits into from Jan 22, 2024

Conversation

carlcsaposs-canonical
Copy link
Contributor

Fixes #86

Context

These are the situations the charm should handle without raising an uncaught exception:

  • mysqlrouter --bootstrap fails because server does not have quorum
  • mysqlrouter --bootstrap fails because server is unreachable (error code 2003)
  • mysqlsh shell.connect() fails because server is unreachable (error code 2003)

(mysqlsh commands seem to not fail even if server does not have quorum)

Solution

Catch exceptions in the aforementioned situations & set waiting status. Retry connections on the next Juju event

Alternatives considered

We decided not to retry within the Juju event so that

  • (primary reason) the user is not blocked from removing the unit(s)
  • if the endpoint from mysql server changes, we need to get a new event to get the updated endpoint

A retry within the Juju event with a short timeout was considered. However, during initial deployment ~15 events are currently fired (so a 1 min timeout could block the charm for 15 minutes). It would be possible to mitigate that by storing information about the last retry—but the complexity was not worth it.

Discussion about Juju limitation that we may not get another event until update status: https://matrix.to/#/!xdClnUGkurzjxqiQcN:ubuntu.com/$_SXnGJWUQCeFGu9vpBxD5GJ5hgMMAHchOmWBPiqjDFY?via=ubuntu.com&via=matrix.org&via=cutefunny.art

Implementation/refactoring

  • Catch mysql shell DBError exceptions, serialize them to JSON, and raise corresponding exception in charm code. This was done to move complexity (i.e. exception handling) from mysqlsh python code to charm code
  • Move mysqlsh python code from strings to Jinja templates (for better syntax highlighting/analysis & code review/diffs)
  • Use JSON files to transfer data from mysqlsh python code to charm code instead of printing & parsing stdout

Copy link
Collaborator

@paulomach paulomach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be my guest to refactor shell code on the server ;)

@@ -23,6 +25,15 @@
logger = logging.getLogger(__name__)


class _NoQuorum(server_exceptions.Error):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

besides using only here, why not concentrate this in src/server_exceptions.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I put it in workload.py—since it's only used there

my preference is to avoid splitting code across multiple files since imo it's harder to read—I think it's worth splitting if multiple other files use it

Copy link
Contributor

@taurus-forever taurus-forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done. Thank you!

@carlcsaposs-canonical carlcsaposs-canonical merged commit 014365a into main Jan 22, 2024
13 checks passed
@carlcsaposs-canonical carlcsaposs-canonical deleted the retry-server-connection branch January 22, 2024 08:28
carlcsaposs-canonical added a commit to canonical/mysql-router-operator that referenced this pull request Mar 5, 2024
carlcsaposs-canonical added a commit to canonical/mysql-router-operator that referenced this pull request Mar 5, 2024
carlcsaposs-canonical added a commit to canonical/mysql-router-operator that referenced this pull request Mar 5, 2024
carlcsaposs-canonical added a commit to canonical/mysql-router-operator that referenced this pull request Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add retry logic if MySQL cluster unreachable
3 participants