
MySQL doesn't start completely and readiness fails #416

Open

cpanato opened this issue Oct 24, 2019 · 24 comments

@cpanato
Contributor

cpanato commented Oct 24, 2019

I noticed that sometimes when the pod restarts, for any reason, it never comes up again. For example:

NAME                       READY   STATUS     RESTARTS   AGE
db-mysql-0                 3/4     Running    0          5m29s

events:

Events:
  Type     Reason     Age                From                                    Message
  ----     ------     ----               ----                                    -------
  Normal   Scheduled  2m2s               default-scheduler                       Successfully assigned 6c7951pozpb68x3w3es6pbodqe/db-mysql-0 to ipxxxxxxxx
  Normal   Pulled     103s               kubelet, ipxxxxxxxx  Container image "quay.io/presslabs/mysql-operator-sidecar:0.3.2" already present on machine
  Normal   Created    103s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    103s               kubelet, ipxxxxxxxx  Started container
  Normal   Pulled     102s               kubelet, ipxxxxxxxx  Container image "percona@sha256:713c1817615b333b17d0fbd252b0ccc53c48a665d4cfcb42178167435a957322" already present on machine
  Normal   Created    102s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    102s               kubelet, ipxxxxxxxx  Started container
  Normal   Pulled     101s               kubelet, ipxxxxxxxx  Container image "prom/mysqld-exporter:v0.11.0" already present on machine
  Normal   Created    101s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    101s               kubelet, ipxxxxxxxx  Started container
  Normal   Pulled     101s               kubelet, ipxxxxxxxx  Container image "percona@sha256:713c1817615b333b17d0fbd252b0ccc53c48a665d4cfcb42178167435a957322" already present on machine
  Normal   Created    101s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    101s               kubelet, ipxxxxxxxx  Started container
  Normal   Pulled     101s               kubelet, ipxxxxxxxx  Container image "quay.io/presslabs/mysql-operator-sidecar:0.3.2" already present on machine
  Normal   Created    100s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    100s               kubelet, ipxxxxxxxx  Started container
  Normal   Pulled     100s               kubelet, ipxxxxxxxx  Container image "quay.io/presslabs/mysql-operator-sidecar:0.3.2" already present on machine
  Normal   Created    100s               kubelet, ipxxxxxxxx  Created container
  Normal   Started    100s               kubelet, ipxxxxxxxx  Started container
  Warning  Unhealthy  84s (x7 over 96s)  kubelet, ipxxxxxxxx  Readiness probe failed:

Logs from the pods:

Create rclone.conf file.
2019-10-24T15:47:32.823Z	INFO	sidecar	environment is not set	{"key": "INIT_BUCKET_URI"}
2019-10-24T15:47:32.823Z	INFO	sidecar	cloning command	{"host": "db-mysql-0"}
2019-10-24T15:47:32.823Z	INFO	sidecar	data already exists! Remove manually PVC to cleanup or to reinitialize.
2019-10-24T15:47:32.824Z	INFO	sidecar	configuring server	{"host": "db-mysql-0"}
2019-10-24T15:47:32.824Z	INFO	sidecar	error while reading PURGE GTID from xtrabackup info file	{"error": "open /var/lib/mysql/xtrabackup_binlog_info: no such file or directory"}
Initialization complete, now exiting!
2019-10-24T15:47:34.526691Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2019-10-24T15:47:34.528701Z 0 [Note] mysqld (mysqld 5.7.26-29-log) starting as process 1 ...
2019-10-24T15:47:34.538056Z 0 [Note] InnoDB: PUNCH HOLE support available
2019-10-24T15:47:34.538155Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-10-24T15:47:34.538178Z 0 [Note] InnoDB: Uses event mutexes
2019-10-24T15:47:34.538215Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2019-10-24T15:47:34.538233Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2019-10-24T15:47:34.538251Z 0 [Note] InnoDB: Using Linux native AIO
2019-10-24T15:47:34.538754Z 0 [Note] InnoDB: Number of pools: 1
2019-10-24T15:47:34.538969Z 0 [Note] InnoDB: Using CPU crc32 instructions
2019-10-24T15:47:34.545153Z 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
2019-10-24T15:47:34.569645Z 0 [Note] InnoDB: Completed initialization of buffer pool
2019-10-24T15:47:34.578085Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-10-24T15:47:34.595271Z 0 [Note] InnoDB: Crash recovery did not find the parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite
2019-10-24T15:47:34.597393Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2019-10-24T15:47:34.633116Z 0 [Note] InnoDB: Created parallel doublewrite buffer at /var/lib/mysql/xb_doublewrite, size 3932160 bytes
2019-10-24T15:47:34.867432Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-10-24T15:47:34.867512Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-10-24T15:47:35.011558Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-10-24T15:47:35.012721Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2019-10-24T15:47:35.012737Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2019-10-24T15:47:35.013453Z 0 [Note] InnoDB: Waiting for purge to start
2019-10-24T15:47:35.064136Z 0 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.7.26-29 started; log sequence number 203893653
2019-10-24T15:47:35.064470Z 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2019-10-24T15:47:35.064657Z 0 [Note] Plugin 'FEDERATED' is disabled.
2019-10-24T15:47:35.095022Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2019-10-24T15:47:35.095166Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2019-10-24T15:47:35.099564Z 0 [Warning] CA certificate ca.pem is self signed.
2019-10-24T15:47:35.099626Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2019-10-24T15:47:35.100921Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2019-10-24T15:47:35.100962Z 0 [Note] IPv6 is available.
2019-10-24T15:47:35.100976Z 0 [Note]   - '::' resolves to '::';
2019-10-24T15:47:35.100998Z 0 [Note] Server socket created on IP: '::'.
2019-10-24T15:47:35.174622Z 0 [Note] InnoDB: Buffer pool(s) load completed at 191024 15:47:35
2019-10-24T15:47:35.214181Z 0 [Note] Event Scheduler: Loaded 0 events
2019-10-24T15:47:35.214376Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' started.
2019-10-24T15:47:35.224302Z 0 [Note] Execution of init_file '/etc/mysql/conf.d/operator-init.sql' ended.
2019-10-24T15:47:35.224498Z 0 [Note] mysqld: ready for connections.
Version: '5.7.26-29-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona Server (GPL), Release 29, Revision 11ad961
Create rclone.conf file.
2019-10-24T15:47:34.787Z	INFO	sidecar	environment is not set	{"key": "INIT_BUCKET_URI"}
2019-10-24T15:47:34.788Z	INFO	sidecar	start http server for backups
time="2019-10-24T15:47:35Z" level=info msg="Starting mysqld_exporter (version=0.11.0, branch=HEAD, revision=5d7179615695a61ecc3b5bf90a2a7c76a9592cdd)" source="mysqld_exporter.go:206"
time="2019-10-24T15:47:35Z" level=info msg="Build context (go=go1.10.3, user=root@3d3ff666b0e4, date=20180629-15:00:35)" source="mysqld_exporter.go:207"
time="2019-10-24T15:47:35Z" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:218"
time="2019-10-24T15:47:35Z" level=info msg=" --collect.global_status" source="mysqld_exporter.go:222"
time="2019-10-24T15:47:35Z" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:222"
time="2019-10-24T15:47:35Z" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:222"
time="2019-10-24T15:47:35Z" level=info msg=" --collect.info_schema.tables" source="mysqld_exporter.go:222"
time="2019-10-24T15:47:35Z" level=info msg=" --collect.heartbeat" source="mysqld_exporter.go:222"
time="2019-10-24T15:47:35Z" level=info msg="Listening on 0.0.0.0:9125" source="mysqld_exporter.go:232"
Create rclone.conf file.
Usage: /usr/local/bin/sidecar-entrypoint.sh {files-config|clone|config-and-serve}
Now runs your command.
pt-heartbeat --update --replace --check-read-only --create-table --database sys_operator --table heartbeat --defaults-file /etc/mysql/heartbeat.conf --fail-successive-errors=20

Any advice on how to recover from that?

/cc @AMecea @jwilander

@AMecea
Contributor

AMecea commented Oct 29, 2019

Hi @cpanato, can you please describe the pod that is not ready (by running kubectl describe <pod>) so I can see which container is not ready? If it's the mysql container, then I also need some logs from the operator, or at least check the operator logs for reconciliation failures.

The operator marks the pod ready by updating the configured key in the OperatorStatusTableName table; if that update fails, the pod will not become ready. See the code here.
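
Checking that flag by hand mirrors what the readiness probe does. A minimal sketch, assuming the pod name from the output above, the sys_operator.status table mentioned later in this thread, and the client credentials at /etc/mysql/client.conf:

# The probe passes only once the operator has set the "configured" row
# of sys_operator.status to "1"; this prints the current value.
kubectl exec db-mysql-0 -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf -NB -e \
  'SELECT value FROM sys_operator.status WHERE name="configured"'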

Hope this helps,

@imriss
Contributor

imriss commented Apr 8, 2020

I have a similar issue with the readiness probe. The workaround has been to delete the mysql-operator pod. Any advice? The mysql operator is on 0.3.8. Thanks

@onelittleant

onelittleant commented Aug 19, 2020

Similar issue here. If a mysql pod is killed and crash recovery takes too long, the readiness probe fails and the pod is killed again before it can finish crash recovery. The result is a pod that just keeps restarting and will never finish recovery and rejoin the cluster.

Warning  Unhealthy               5m43s (x5 over 9m43s)  kubelet, gke-btc2-dev-btc2-dev-db-pods-ef5dc68d-9jzf  Readiness probe failed: ERROR 2003 (HY000): Can't connect to MySQL server on '127.0.0.1' (111)
/bin/sh: line 0: test: -eq: unary operator expected
  Normal   Killing    5m18s                  kubelet, gke-btc2-dev-btc2-dev-db-pods-ef5dc68d-9jzf  Container mysql failed liveness probe, will be restarted
helm ls -n db
NAME          	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART               	APP VERSION
btc2dev       	db       	31      	2020-08-19 12:36:02.215646 -0600 MDT	deployed	mysql-cluster-0.2.0 	1.0        
mysql-operator	db       	10      	2020-08-19 13:08:29.740631 -0600 MDT	deployed	mysql-operator-0.4.0	v0.4.0     
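
To confirm it is this crash-recovery loop, watching the restart counter and the mysql container log is enough. A minimal sketch with standard kubectl; the pod name is an assumption based on the release name above:

# Watch the restart counter climb while recovery keeps being interrupted
kubectl -n db get pod btc2dev-mysql-0 -w

# Follow InnoDB crash recovery in the mysql container until the
# liveness probe kills it again
kubectl -n db logs btc2dev-mysql-0 -c mysql -f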

@Erically

resolved?

@enoliveira

still happening.

@onelittleant

Because we are unable to configure the readinessProbe, we cannot keep the cluster stable when a mysql pod requires crash recovery. As a result, we are unable to use this solution in a production environment. We really like the operator model and have been eager to use it in production. If we could supply readinessProbe configuration in the mysql podSpec it would get us there, but as-is we can only use this for dev/test environments.

@HendrikRoehm

This is a serious bug, which is not resolved yet. In my case, the mysql container was not ready.

After ssh-ing into the mysql container and manually executing UPDATE sys_operator.status SET value="1" WHERE name="configured", the pod turned ready for me...
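
In kubectl terms, that workaround looks roughly like this. A minimal sketch, assuming the pod and container names used earlier in this thread and the client credentials at /etc/mysql/client.conf:

# Mark the instance as configured so the readiness probe starts passing.
kubectl exec db-mysql-0 -c mysql -- \
  mysql --defaults-file=/etc/mysql/client.conf -e \
  'UPDATE sys_operator.status SET value="1" WHERE name="configured"'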

@HendrikRoehm

One side note: we had two instances with more than 10 MB of data and one with approx. 1.5 MB of data. Only the one with 1.5 MB of data completed the readiness check successfully.

@hmilkovi

Same here and it's easy to reproduce:
kubectl delete pod ha-mysql-db-mysql-0 --grace-period=0 --force

@haslersn

After ssh-ing into the mysql container and manually executing UPDATE sys_operator.status SET value="1" WHERE name="configured", the pod turned ready for me...

@HendrikRoehm Can configured have different values in different Pods of the same MySQL cluster? Are these from different (local) databases? So do I need to do this workaround in multiple Pods?

@HendrikRoehm

@haslersn I don't really know as I am no expert on this. In my case this was a small test mysql instance with no replication and thus only 1 pod. I would guess that you should set it on the master instance of the mysql cluster if there are multiple instances.

@worldofgeese

We ran into this issue today when running a helm upgrade.

@frjaraur

Is there any workaround for this issue? I managed to make my cluster work by updating sys_operator.status manually from inside the mysql pod, and everything came up, but this is not the right way (if you have to move the pod to another node, the issue comes back). Does the operator manage this value?

@worldofgeese

Can @calind take a look at this? This bug wipes out any serious use of this operator.

@codebard

codebard commented Dec 24, 2021

Just encountered this with a cluster whose node was recycled. The mysql pod reports itself as unready despite mysqld appearing healthy, and MySQL is not accessible from outside. This killed my plans to use this operator in production; I had to move on.

@tebaly

tebaly commented Dec 24, 2021

#777 (comment)

Are you sure it has nothing to do with Calico? In my case, the problem seems related to it.

@mazizang

mazizang commented Jan 5, 2022

@tebaly
Our production environment is using Flannel; same problem.

milero added the bug label on Jan 5, 2022
@imriss
Contributor

imriss commented Jan 6, 2022

This could be a bug in this DROP TABLE logic: it assumes the file does not exist while the file actually exists:

_, err := os.Stat(path.Join(dataDir, constants.OperatorDbName, constants.OperatorStatusTableName+".ibd"))
if os.IsNotExist(err) {
	queries = append(queries, fmt.Sprintf("DROP TABLE IF EXISTS %s.%s",
		constants.OperatorDbName, constants.OperatorStatusTableName))
}

I was able to make the mysql cluster become ready by deleting this file and then deleting the pod:
/var/lib/mysql/sys_operator/status.ibd
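
As shell commands, that workaround is roughly the following. A minimal sketch, assuming the pod and container names from the output earlier in this thread:

# Remove the stale status tablespace file, then recreate the pod so the
# operator rebuilds sys_operator.status and marks the instance configured.
kubectl exec db-mysql-0 -c mysql -- rm /var/lib/mysql/sys_operator/status.ibd
kubectl delete pod db-mysql-0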

@codebard

Looking forward to the fix for this bug to be able to reconsider this operator for use in production.

@proligde

proligde commented Feb 17, 2022

It seems like performing a manual failover, i.e. promoting one of the read replicas to be the new master, fixes this issue as well. At least that did the trick in my case.

This issue occurred in a freshly set up test cluster where I had shut down (gracefully, using ACPI powerdown) all three nodes at the same time on purpose. After starting all three up again, I ended up in exactly that state: basically everything worked fine, except Kubernetes didn't activate the MySQL service because the master node was stuck with the mysql container in the NotReady state, and manual intervention was necessary. (Also using Flannel here, by the way.)

@randywatson1979

My only workaround is (rough kubectl equivalent below):

  • scale the mysql-operator deployment to 0 replicas
  • delete the mysqlcluster StatefulSet
  • scale the mysql-operator deployment back to 1 replica
  • re-apply the mysql-cluster YAML

Not pretty though.
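
A minimal sketch of those steps with kubectl; the deployment, StatefulSet, and manifest names are assumptions for illustration:

# Scale the operator down, delete the cluster StatefulSet,
# scale the operator back up, and re-apply the cluster manifest.
kubectl scale deployment mysql-operator --replicas=0
kubectl delete statefulset db-mysql
kubectl scale deployment mysql-operator --replicas=1
kubectl apply -f mysql-cluster.yaml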

@olivernadj

It is happening to me on kind.sigs.k8s.io v1.23.5 with mysql-operator 0.6.2.

Looks like the node_controller somehow "forgets" to reconcile. Any edit on the database pod triggers a reconcile and eventually sets status=1.

I still don't understand why the reconcile was not happening. The only way I can reproduce the issue is to shut down my PC and start it again, and even then it only triggers the issue from time to time, randomly.
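
Based on that observation, a throwaway edit such as an annotation can be used to nudge a reconcile. A minimal sketch; the annotation key is arbitrary and the pod name is the one used earlier in this thread:

# Touch the pod with an arbitrary annotation so the operator reconciles it
# again and, per the observation above, eventually sets status=1.
kubectl annotate pod db-mysql-0 debug.example.com/touched="$(date +%s)" --overwrite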

@voarsh2

voarsh2 commented Mar 7, 2023

readinessProbe/livenessProbe configs would be most helpful... this is basic.....

@mdshi5

mdshi5 commented Jun 9, 2024

I still have the issue of the readiness probe failing. I described the pod and found:

exec [/bin/sh -c test $(mysql --defaults-file=/etc/mysql/client.conf -NB -e 'SELECT COUNT(*) FROM sys_operator.status WHERE name="configured" AND value="1"') -eq 1] delay=5s timeout=5s period=2s #success=1 #failure=3

I had to change the value in sys_operator.status to fix this, but if I change it in one pod, the value in another pod gets changed, and I had to repeat the process to fix it temporarily. Is there a workaround where I can configure this in the operator values or in the mysqlcluster CR, so that I can fix the readiness issue without this hassle?
