
Blocked status failed to recover cluster. #329

Open
carlcsaposs-canonical opened this issue Oct 23, 2023 · 6 comments
Labels: bug (Something isn't working)

Comments

carlcsaposs-canonical (Contributor) commented Oct 23, 2023

Steps to reproduce

  1. Steps 3-6 from https://microstack.run/#get-started
  2. juju refresh mysql --channel 8.0/edge
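The reproduction above can be sketched as the following shell session (a sketch only: it assumes the microstack get-started steps have already deployed a `mysql` application into the current model, and that app/model names match the report):

```shell
# Prerequisite: steps 3-6 from https://microstack.run/#get-started
# have deployed the mysql application (rev 99, 8.0/stable).

# Confirm the app is healthy before refreshing.
juju status mysql

# Refresh the charm from 8.0/stable to 8.0/edge (rev 109 in this report).
juju refresh mysql --channel 8.0/edge

# Watch the unit; in this report it enters blocked status with
# message "failed to recover cluster."
juju status mysql --watch 5s
```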

Expected behavior

mysql app upgrades successfully and goes into active state

Actual behavior

mysql app enters blocked status with message "failed to recover cluster."

Versions

Operating system: Ubuntu 22.04.3 LTS

Juju CLI: 3.2.3-genericlinux-amd64

Juju agent: 3.2.0

Charm revision: 99 before refresh (current 8.0/stable), 109 after refresh (current 8.0/edge)

microk8s: MicroK8s v1.26.9 revision 6059

Log output

Juju debug log:
sunbeam-debug-log.txt
sunbeam-debug-log-filtered.txt

unit-mysql-0: 09:22:26 INFO juju.cmd running containerAgent [3.2.0 c7107ada8c471aa3ba105e5433e61861227e2ed4 gc go1.20.4]
unit-mysql-0: 09:22:26 INFO juju.worker.upgradesteps upgrade steps for 3.2.0 have already been run.
unit-mysql-0: 09:22:26 INFO juju.api connection established to "wss://10.150.15.206:17070/model/9b07ebf5-8cf1-4858-8a94-3086f8416535/api"
unit-mysql-0: 09:22:26 INFO juju.worker.migrationminion migration phase is now: NONE
unit-mysql-0: 09:22:26 INFO juju.worker.caasupgrader abort check blocked until version event received
unit-mysql-0: 09:22:26 WARNING juju.worker.proxyupdater unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
unit-mysql-0: 09:22:26 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
unit-mysql-0: 09:22:27 INFO juju.worker.uniter hooks are retried true
unit-mysql-0: 09:22:27 INFO juju.downloader downloading from ch:amd64/jammy/mysql-k8s-109
unit-mysql-0: 09:22:27 INFO juju.downloader download verified ("ch:amd64/jammy/mysql-k8s-109")
unit-mysql-0: 09:22:37 INFO juju.worker.uniter found queued "upgrade-charm" hook
unit-mysql-0: 09:22:39 ERROR unit.mysql/0.juju-log Cluster upgrade failed, ensure pre-upgrade checks are ran first.
unit-mysql-0: 09:22:39 INFO juju.worker.uniter found queued "config-changed" hook
unit-mysql-0: 09:22:40 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-mysql-0: 09:22:40 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm
unit-mysql-0: 09:22:41 INFO unit.mysql/0.juju-log Running legacy hooks/start.
unit-mysql-0: 09:22:44 INFO unit.mysql/0.juju-log Setting up the logrotate configurations
unit-mysql-0: 09:22:51 INFO unit.mysql/0.juju-log Unit workload member-state is offline with member-role unknown
unit-mysql-0: 09:22:52 ERROR unit.mysql/0.juju-log Failed to reboot cluster
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_k8s_helpers.py", line 684, in _run_mysqlsh_script
    stdout, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-mysql-0/charm/venv/ops/pebble.py", line 1359, in wait_output
    raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr="Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2023-10-23T09:22:52Z: Loading startup files...\nverbose: 2023-10-23T09:22:52Z: Loading plugins...\nverbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: clusteradmin@mysql-0.mysql-endpoints\nverbose: 2023-10-23T09:22:52Z: Shell.connect: tid=33: CONNECTED: mysql-0.mysql-endpoints\nverbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000\nverbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=34: CONNECTED: mysql-0.mysql-endpoints:3306\nverbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000\nverbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=35: CONNECTED: mysql-0.mysql-endpoints:3306\nverbose: 2023-10-23T09:22:52Z: Group Replication 'group_name' value: 072799b1-7180-11ee-bc9f-76d5c7fb0362\nverbose: 2023-10-23T09:22:52Z: Metadata 'group_name' value: 072799b1-718" [truncated]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1989, in reboot_from_complete_outage
    self._run_mysqlsh_script("\n".join(reboot_from_outage_command))
  File "/var/lib/juju/agents/unit-mysql-0/charm/src/mysql_k8s_helpers.py", line 687, in _run_mysqlsh_script
    raise MySQLClientError(e.stderr)
charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
verbose: 2023-10-23T09:22:52Z: Loading startup files...
verbose: 2023-10-23T09:22:52Z: Loading plugins...
verbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: clusteradmin@mysql-0.mysql-endpoints
verbose: 2023-10-23T09:22:52Z: Shell.connect: tid=33: CONNECTED: mysql-0.mysql-endpoints
verbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000
verbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=34: CONNECTED: mysql-0.mysql-endpoints:3306
verbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints:3306?connect-timeout=5000
verbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=35: CONNECTED: mysql-0.mysql-endpoints:3306
verbose: 2023-10-23T09:22:52Z: Group Replication 'group_name' value: 072799b1-7180-11ee-bc9f-76d5c7fb0362
verbose: 2023-10-23T09:22:52Z: Metadata 'group_name' value: 072799b1-7180-11ee-bc9f-76d5c7fb0362
verbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints.openstack.svc.cluster.local:3306?connect-timeout=5000
verbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=36: CONNECTED: mysql-0.mysql-endpoints.openstack.svc.cluster.local:3306
verbose: 2023-10-23T09:22:52Z: Connecting to MySQL at: mysql://clusteradmin@mysql-0.mysql-endpoints.openstack.svc.cluster.local:3306?connect-timeout=5000
verbose: 2023-10-23T09:22:52Z: Dba.reboot_cluster_from_complete_outage: tid=37: CONNECTED: mysql-0.mysql-endpoints.openstack.svc.cluster.local:3306
No PRIMARY member found for cluster 'cluster-b56bbe7bd4a6cc012b44ba93360df3b5'
verbose: 2023-10-23T09:22:52Z: ClusterSet info: member, primary, not primary_invalidated, not removed from set, primary status: UNKNOWN
Restoring the Cluster 'cluster-b56bbe7bd4a6cc012b44ba93360df3b5' from complete outage...

ERROR: RuntimeError: The current session instance does not belong to the Cluster: 'cluster-b56bbe7bd4a6cc012b44ba93360df3b5'.
Traceback (most recent call last):
  File "<string>", line 2, in <module>
RuntimeError: Dba.reboot_cluster_from_complete_outage: The current session instance does not belong to the Cluster: 'cluster-b56bbe7bd4a6cc012b44ba93360df3b5'.


unit-mysql-0: 09:22:53 INFO juju.worker.uniter.operation ran "mysql-pebble-ready" hook (via hook dispatching script: dispatch)

Additional context

Attempted to reproduce an issue encountered by @javacruft

carlcsaposs-canonical added the bug label Oct 23, 2023
carlcsaposs-canonical transferred this issue from canonical/mysql-operator Oct 23, 2023
canonical deleted a comment from github-actions bot Oct 23, 2023
carlcsaposs-canonical (Contributor, Author) commented:

Potential cause: ERROR unit.mysql/0.juju-log Cluster upgrade failed, ensure pre-upgrade checks are ran first.

carlcsaposs-canonical (Contributor, Author) commented Oct 23, 2023

Tried running pre-upgrade-check before juju refresh.

Result: blocked status with message "upgrade failed. Check logs for rollback instruction"

pre-upgrade-debug-log.txt
pre-upgrade-debug-log-filtered.txt
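The pre-upgrade flow tried above can be sketched as follows (a sketch under assumptions: it uses the charm's `pre-upgrade-check` action with Juju 3.x `juju run` syntax, and the rollback command is inferred from the revision numbers in the report, not quoted from it):

```shell
# Run the pre-upgrade check on the leader unit before refreshing.
juju run mysql/leader pre-upgrade-check

# Then refresh; in this report the app still ended up blocked with
# "upgrade failed. Check logs for rollback instruction".
juju refresh mysql --channel 8.0/edge

# Possible rollback to the pre-refresh revision (rev 99 on 8.0/stable
# per the report) -- syntax assumed, verify against your Juju version.
juju refresh mysql --channel 8.0/stable --revision 99
```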

gboutry commented Oct 31, 2023

Encountered the same issue in a deployment with 7 mysql servers. 5 out of the 7 failed to recover after a machine reboot with the same error.

Complete debug log:
debug-log.log
Each failing mysql server logs:
cinder-mysql.log
heat-mysql.log
keystone-mysql.log
nova-mysql.log
placement-mysql.log

paulomach (Contributor) commented:

> Encountered the same issue in a deployment with 7 mysql servers. 5 out of the 7 failed to recover after a machine reboot with the same error.

@gboutry there's a fix in PR #324, released in the edge channel. We are working to promote it to stable.

paulomach (Contributor) commented:

@gboutry have you had the chance to validate the fix?
