mysql 5.7.5 breaks if user mounts a volume from a long running file system #3

Closed
jonathan-kosgei opened this issue Mar 24, 2017 · 5 comments

Comments

@jonathan-kosgei

jonathan-kosgei commented Mar 24, 2017

You are affected by this bug: docker-library/mysql#69, which has been driving me nuts all day. Is it possible to switch your base mysql to a higher or lower stable version?

@vadimtk
Contributor

vadimtk commented Mar 24, 2017

Our official images are here: https://hub.docker.com/r/percona/percona-xtradb-cluster/tags/
and the version we use at the moment is 5.7.17.
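
For example, to pull that version (the tag name here is assumed from the version above; the exact tag names are on the tags page):

docker pull percona/percona-xtradb-cluster:5.7.17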

@jonathan-kosgei
Author

I'm using those, and I'm trying to change the datadir mysql uses, so I've made the following change:

	if [ -z "$DATADIR" ]; then
		DATADIR="$("mysqld" --verbose --wsrep_provider= --help 2>/dev/null | awk '$1 == "datadir" { print $2; exit }')"
	else
		echo "DATADIR is set to $DATADIR"
	fi
and a bit further down, instead of
mysqld --initialize-insecure

I have

mysqld --initialize-insecure --user=mysql --datadir="$DATADIR"

https://gist.githubusercontent.com/jonathan-kosgei/fa7dd80b0259191404177b3842d30b2c/raw/bb1b1aa1a9422fe50b56b9ee57c1b30892af1384/percona-entrypoint.sh

But I only get so far before I run into the following error, after "mysqld init process done, ready for startup":

Registering in the discovery service

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
{"action":"create","node":{"key":"/pxc-cluster/queue/wordpress-2/00000000000000003088","value":"10.52.0.108","expiration":"2017-03-24T20:03:33.547457047Z","ttl":60,"modifiedIndex":3088,"createdIndex":3088}}
100   231  100   207  100    24  17857   2070 --:--:-- --:--:-- --:--:-- 18818
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   487  100   487    0     0  68121      0 --:--:-- --:--:-- --:--:-- 69571
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
{"action":"set","node":{"key":"/pxc-cluster/wordpress-2/10.52.0.108/ipaddr","value":"10.52.0.108","expiration":"2017-03-24T20:03:03.587150337Z","ttl":30,"modifiedIndex":3089,"createdIndex":3089}}
100   220  100   196  100    24  12826   1570 --:--:-- --:--:-- --:--:-- 13066
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
{"action":"set","node":{"key":"/pxc-cluster/wordpress-2/10.52.0.108/hostname","value":"percona-0","expiration":"2017-03-24T20:03:03.610346106Z","ttl":30,"modifiedIndex":3090,"createdIndex":3090}}
100   218  100   196  100    22  16577   1860 --:--:-- --:--:-- --:--:-- 17818
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   321  100   291  100    30  24079   2482 --:--:-- --:--:-- --:--:-- 24250
{"action":"update","node":{"key":"/pxc-cluster/wordpress-2/10.52.0.108","dir":true,"expiration":"2017-03-24T20:03:03.629957174Z","ttl":30,"modifiedIndex":3091,"createdIndex":3089},"prevNode":{"key":"/pxc-cluster/wordpress-2/10.52.0.108","dir":true,"modifiedIndex":3089,"createdIndex":3089}}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   275  100   275    0     0  23948      0 --:--:-- --:--:-- --:--:-- 25000
Joining cluster 10.52.0.108,10.52.0.108,10.52.0.108

@vadimtk
Contributor

vadimtk commented Mar 25, 2017

I do not see error messages here.
Can you show the docker logs from the container?

@jonathan-kosgei
Author

Hi, I'm hesitant to open another issue, so I'll post some more info here:
When I start the cluster from scratch, with blank data directories and a fresh etcd cluster, everything seems to come up. However, when I look at grastate.dat I find that the seqno for each pod is -1:

root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a91f70f2-11f8-11e7-8f3d-86c2e58790ac
seqno:   -1
safe_to_bootstrap: 0
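
A quick way to check all three at once (assuming the same Gluster mount path as in the listings above):

grep -H seqno /mnt/gfs/gluster_vol-1/mysql/percona-*/grastate.dat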

At this point I can do mysql -h percona -u wordpress -p and wordpress works.

Scenario 1:
I have 3 percona pods

jonathan@ubuntu:~/Projects/k8wp$ kubectl get pods
NAME                         READY     STATUS    RESTARTS   AGE
etcd-0                       1/1       Running   1          12h
etcd-1                       1/1       Running   0          12h
etcd-2                       1/1       Running   3          12h
etcd-3                       1/1       Running   1          12h
percona-0                    1/1       Running   0          8m
percona-1                    1/1       Running   0          57m
percona-2                    1/1       Running   0          57m

When I restart percona-0, it gets kicked out of the cluster.
percona-0's gvwstate.dat file shows

root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-0/gvwstate.dat
my_uuid: b7571ff8-11f8-11e7-bd2d-8b50487e1523
#vwbeg
view_id: 3 b7571ff8-11f8-11e7-bd2d-8b50487e1523 3
bootstrap: 0
member: b7571ff8-11f8-11e7-bd2d-8b50487e1523 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend

The other 2 pods in the cluster show:

root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-1/gvwstate.dat
my_uuid: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
root@gluster-3:/mnt/gfs/gluster_vol-1/mysql# cat percona-2/gvwstate.dat
my_uuid: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a
#vwbeg
view_id: 3 bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 4
bootstrap: 0
member: bd05a643-11f8-11e7-9dab-1b4fc20eaf6a 0
member: c33d6a73-11f8-11e7-9e86-fe1cf3d3367a 0
#vwend
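
The view_id difference is easier to see side by side with something like (same mount path as above):

grep -H view_id /mnt/gfs/gluster_vol-1/mysql/percona-*/gvwstate.dat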

Here are what I think are the relevant errors from percona-0's startup:

2017-03-26T08:37:58.370605Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'
2017-03-26T08:38:01.373345Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S
2017-03-26T08:38:01.373682Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-03-26T08:38:01.373750Z 0 [Note] WSREP: view(view_id(NON_PRIM,b7571ff8,5) memb {
	b7571ff8,0
} joined {
} left {
} partitioned {
})
2017-03-26T08:38:01.373838Z 0 [Note] WSREP: gcomm: connected
2017-03-26T08:38:01.373872Z 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2017-03-26T08:38:01.373987Z 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2017-03-26T08:38:01.374012Z 0 [Note] WSREP: Opened channel 'wordpress-001'
2017-03-26T08:38:01.374108Z 0 [Note] WSREP: Waiting for SST to complete.
2017-03-26T08:38:01.374417Z 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2017-03-26T08:38:01.374469Z 0 [Note] WSREP: Flow-control interval: [16, 16]
2017-03-26T08:38:01.374491Z 0 [Note] WSREP: Received NON-PRIMARY.
2017-03-26T08:38:01.374560Z 1 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1

The IP it's trying to connect to, 10.52.0.26, in "2017-03-26T08:37:58.372537Z 0 [Note] WSREP: gcomm: connecting to group 'wordpress-001', peer '10.52.0.26:'" is actually that pod's previous IP. Here's the listing of keys in etcd I did before deleting percona-0:

/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/wordpress
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress-001/10.52.0.26
/pxc-cluster/wordpress-001/10.52.0.26/hostname
/pxc-cluster/wordpress-001/10.52.0.26/ipaddr

After kubectl delete pods/percona-0:

/ # etcdctl ls --recursive
/pxc-cluster
/pxc-cluster/queue
/pxc-cluster/queue/wordpress
/pxc-cluster/queue/wordpress-001
/pxc-cluster/wordpress-001
/pxc-cluster/wordpress-001/10.52.1.46
/pxc-cluster/wordpress-001/10.52.1.46/ipaddr
/pxc-cluster/wordpress-001/10.52.1.46/hostname
/pxc-cluster/wordpress-001/10.52.2.33
/pxc-cluster/wordpress-001/10.52.2.33/ipaddr
/pxc-cluster/wordpress-001/10.52.2.33/hostname
/pxc-cluster/wordpress

Also, during the restart percona-0 tried to register in etcd with:

{"action":"create","node":{"key":"/pxc-cluster/queue/wordpress-001/00000000000000009886","value":"10.52.0.27","expiration":"2017-03-26T08:38:57.980325718Z","ttl":60,"modifiedIndex":9886,"createdIndex":9886}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/ipaddr","value":"10.52.0.27","expiration":"2017-03-26T08:38:28.01814818Z","ttl":30,"modifiedIndex":9887,"createdIndex":9887}}
{"action":"set","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27/hostname","value":"percona-0","expiration":"2017-03-26T08:38:28.037188157Z","ttl":30,"modifiedIndex":9888,"createdIndex":9888}}
{"action":"update","node":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"expiration":"2017-03-26T08:38:28.054726795Z","ttl":30,"modifiedIndex":9889,"createdIndex":9887},"prevNode":{"key":"/pxc-cluster/wordpress-001/10.52.0.27","dir":true,"modifiedIndex":9887,"createdIndex":9887}}

which doesn't work.

So in summary: when starting a fresh percona cluster of 3 kubernetes pods, the grastate.dat seqno gets stuck at -1. On deleting one pod and watching it restart, expecting it to rejoin the cluster, it sets its initial position to 00000000-0000-0000-0000-000000000000:-1 and tries to connect to itself (its former IP), maybe because it had been the first pod in the cluster? It then times out on its erroneous connection to itself:

2017-03-26T08:38:05.374058Z 0 [Note] WSREP: (b7571ff8, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.52.0.26:4567 timed out, no messages seen in PT3S

The cluster doesn't come back up properly and I'm unable to restart it.
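
A possible workaround (just a sketch, using the same etcd v2 CLI as in the listings above, and assuming the stale key is still within its TTL when the pod comes back) would be to drop the old IP's entry before the restarted pod registers, so it can't pick up its own former address as a peer:

etcdctl rm --recursive /pxc-cluster/wordpress-001/10.52.0.26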

@jonathan-kosgei
Author

jonathan-kosgei commented Mar 26, 2017

Fixed it by changing the entrypoint in the container to the following script:

#!/bin/bash
sed -i \"s|safe_to_bootstrap.*:.*|safe_to_bootstrap:1|1\" /var/lib/mysql/`hostname`/grastate.dat; 
/entrypoint.sh --wsrep-new-cluster;
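
That marks the node as safe to bootstrap a new cluster; the change can be verified with something like:

grep safe_to_bootstrap /var/lib/mysql/$(hostname)/grastate.dat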

Thanks. Ref -> https://www.claudiokuenzler.com/blog/494/galera-cluster-mysql-not-starting-failed-to-open-channel-reach-primary#.WNesDiF97Qo
