Info
Version: the latest operator and cluster chart from this repository.
Description
Our test environment consists of three nodes:
- $release-mysqlcluster-db-0 (master)
- $release-mysqlcluster-db-1
- $release-mysqlcluster-db-2
We simulate a master failure by killing the mysqld process in db-0.
When the DeadMaster event fires, orchestrator automatically promotes db-1 to master, but the new master then stays stuck with a "not replicating" error.
What would be the correct recovery process?
Operator log
{"severity":"INFO","timestamp":"2020-10-15T21:21:41.371071513Z","logger":"orchestrator-reconciler","message":"cluster not ready for acknowledge","key":"$namespace/$release-mysqlcluster-db","threshold":600}
{"severity":"ERROR","timestamp":"2020-10-15T21:26:14.340158975Z","logger":"kubebuilder.controller","message":"Reconciler error","controller":"mysqlbackup-controller","request":"$namespace/$release-mysql-cluster-db-auto-2020-10-14t19-24-00","error":"MysqlCluster.mysql.presslabs.org \"$release-mysql-cluster-db\" not found","stacktrace":"github.com/presslabs/mysql-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/presslabs/mysql-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/presslabs/mysql-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/presslabs/mysql-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\ngithub.com/presslabs/mysql-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/presslabs/mysql-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/presslabs/mysql-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
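The operator writes one JSON object per line, so its errors can be separated from routine INFO messages before triaging. Below is a minimal Go sketch that uses only the standard library. It assumes the field names visible in the two lines above (`severity`, `logger`, `message`, `key`, `threshold`, `controller`, `request`, `error`). The type and function names are hypothetical helpers, not part of mysql-operator.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// operatorLogEntry mirrors the fields seen in the operator log lines above
// (hypothetical type). Fields missing from a line keep their zero value.
type operatorLogEntry struct {
	Severity   string `json:"severity"`
	Timestamp  string `json:"timestamp"`
	Logger     string `json:"logger"`
	Message    string `json:"message"`
	Key        string `json:"key,omitempty"`
	Threshold  int    `json:"threshold,omitempty"`
	Controller string `json:"controller,omitempty"`
	Request    string `json:"request,omitempty"`
	Error      string `json:"error,omitempty"`
}

// sampleLogs is a shortened copy of the two log lines above (stack trace omitted).
const sampleLogs = `{"severity":"INFO","timestamp":"2020-10-15T21:21:41.371071513Z","logger":"orchestrator-reconciler","message":"cluster not ready for acknowledge","key":"$namespace/$release-mysqlcluster-db","threshold":600}
{"severity":"ERROR","timestamp":"2020-10-15T21:26:14.340158975Z","logger":"kubebuilder.controller","message":"Reconciler error","controller":"mysqlbackup-controller","request":"$namespace/$release-mysql-cluster-db-auto-2020-10-14t19-24-00","error":"MysqlCluster.mysql.presslabs.org \"$release-mysql-cluster-db\" not found"}`

// filterBySeverity decodes one JSON object per line and keeps the entries
// whose severity equals want. Lines that are not valid JSON are skipped.
func filterBySeverity(logs, want string) ([]operatorLogEntry, error) {
	var out []operatorLogEntry
	sc := bufio.NewScanner(strings.NewReader(logs))
	// Stack traces make lines long, so raise the scanner's line limit to 1 MiB.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		var e operatorLogEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Severity == want {
			out = append(out, e)
		}
	}
	return out, sc.Err()
}

func main() {
	errs, err := filterBySeverity(sampleLogs, "ERROR")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, e := range errs {
		fmt.Printf("%s [%s/%s] %s: %s\n", e.Timestamp, e.Logger, e.Controller, e.Message, e.Error)
	}
}
```

In this excerpt, the only ERROR comes from the `mysqlbackup-controller` and refers to `$release-mysql-cluster-db`, a different name from the `$release-mysqlcluster-db` cluster under test.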
Dead master pod (db-0) log after restart
2020-10-15T21:11:23.994309Z 0 [Note] mysqld: ready for connections.
Version: '5.7.26-29-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona Server (GPL), Release 29, Revision 11ad961
2020-10-15T21:11:23.995177Z 3 [Note] Got an error reading communication packets
2020-10-15T21:11:23.995742Z 4 [Note] Got an error reading communication packets
2020-10-15T21:11:24.173001Z 6 [Note] Start binlog_dump to master_thread_id(6) slave_server(101), pos(, 4)
2020-10-15T21:11:28.133393Z 18 [Warning] Neither --relay-log nor --relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use '--relay-log=$release-mysqlcluster-db-mysql-0-relay-bin' to avoid this problem.
2020-10-15T21:11:28.150677Z 18 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='$release-mysqlcluster-db-mysql-1.mysql.$release', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2020-10-15T21:11:28.175401Z 20 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2020-10-15T21:11:28.176719Z 21 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log './$release-mysqlcluster-db-mysql-0-relay-bin.000001' position: 4
2020-10-15T21:11:28.184056Z 20 [Note] Slave I/O thread for channel '': connected to master 'sys_replication@$release-mysqlcluster-db-mysql-1.mysql.$release:3306',replication started in log 'FIRST' at position 4
2020-10-15T21:11:31.551274Z 6 [Note] Aborted connection 6 to db: 'unconnected' user: 'sys_replication' host: '172.30.254.227' (failed on flush_net())
New master pod (db-1) log
2020-10-15T21:40:51.873465Z 576 [ERROR] Slave I/O for channel '': error connecting to master 'sys_replication@//$release-mysqlcluster-db-mysql-0.mysql.$namespace:3306' - retry-time: 1 retries: 1755, Error_code: 2005