Random and small clusterdown #2180

Closed
atlantis3001 opened this issue Dec 2, 2014 · 4 comments

Comments

@atlantis3001

Hello,
I am running Redis Cluster and I am seeing some errors:

# /usr/local/bin/redis-server -v
Redis server v=2.9.101 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=36dadd96225ec7c3
4107:M 02 Dec 07:32:57.062 * 10 changes in 300 seconds. Saving...
4107:M 02 Dec 07:32:57.076 * Background saving started by pid 138527
138527:C 02 Dec 07:32:59.105 * DB saved on disk
138527:C 02 Dec 07:32:59.113 * RDB: 0 MB of memory used by copy-on-write
4107:M 02 Dec 07:32:59.181 * Background saving terminated with success
4107:M 02 Dec 07:36:39.716 # Cluster state changed: fail
4107:M 02 Dec 07:36:40.216 # Cluster state changed: ok
4107:M 02 Dec 07:38:00.033 * 10 changes in 300 seconds. Saving...
4107:M 02 Dec 07:38:00.047 * Background saving started by pid 138583
138583:C 02 Dec 07:38:01.941 * DB saved on disk
138583:C 02 Dec 07:38:01.950 * RDB: 0 MB of memory used by copy-on-write
4107:M 02 Dec 07:38:02.051 * Background saving terminated with success
4107:M 02 Dec 07:43:03.015 * 10 changes in 300 seconds. Saving...
4107:M 02 Dec 07:43:03.028 * Background saving started by pid 138640
138640:C 02 Dec 07:43:04.919 * DB saved on disk
138640:C 02 Dec 07:43:04.928 * RDB: 0 MB of memory used by copy-on-write
4107:M 02 Dec 07:43:05.033 * Background saving terminated with success
4107:M 02 Dec 07:46:14.373 # Cluster state changed: fail
4107:M 02 Dec 07:46:14.873 # Cluster state changed: ok
4107:M 02 Dec 07:47:06.487 # Cluster state changed: fail
4107:M 02 Dec 07:47:07.087 # Cluster state changed: ok
4107:M 02 Dec 07:48:06.100 * 10 changes in 300 seconds. Saving...
4107:M 02 Dec 07:48:06.115 * Background saving started by pid 138698
138698:C 02 Dec 07:48:08.165 * DB saved on disk
138698:C 02 Dec 07:48:08.173 * RDB: 0 MB of memory used by copy-on-write
4107:M 02 Dec 07:48:08.219 * Background saving terminated with success

This is my topology

# /usr/local/bin/redis-cli -c -p 7001 cluster nodes
2fd06a10e539be049759e65dbe8cfa13b62bb11a 10.1.0.141:7103 slave c357db3020d8b8f753a043f6346578629f4975e3 0 1417517259801 31 connected
c13dbfb0f0e5541cf40ae23419c56d0113465c91 10.1.0.141:7101 slave 8fbf69a69c9b3644c7ef08252eb96fbeb3d41e0f 0 1417517259801 29 connected
0fcd1a7d909496e58114ba065d2fabf5fa814a37 10.1.0.142:7102 slave 96a4f973c9c173f1f9305ea33e14b3deed80c0d3 0 1417517259802 37 connected
9f8aeaad2f99e942208e65fb6159f568910d18b0 10.1.0.143:7102 slave caa475624dcc60b385021ae642c5a84914ad9b75 0 1417517259802 42 connected
40f158b0d545521356e06fde379b90a1d71bc260 10.1.0.143:7002 master - 0 1417517259802 30 connected 10923-12742
968043a75545109e5cd4afe0c3fc0712bddecf34 10.1.0.141:7003 master - 0 1417517259801 9 connected 14563-16383
e21946bee1839fefdec6111c511a131c9e1760b8 10.1.0.143:7103 slave c993961f8acbdcbd1adf478299143c84533ea331 0 1417517259802 43 connected
a32a645aa6fc253ba8be2c78cddc70abc4cc7a05 10.1.0.142:7001 master - 0 1417517259802 41 connected 0-1819
c993961f8acbdcbd1adf478299143c84533ea331 10.1.0.142:7003 master - 0 1417517259802 43 connected 5461-7280
caa475624dcc60b385021ae642c5a84914ad9b75 10.1.0.142:7002 master - 0 1417517259802 42 connected 1820-3639
5ce7d47fd8114f90d88e7644211896e88530cdc4 10.1.0.141:7102 slave 40f158b0d545521356e06fde379b90a1d71bc260 0 1417517259802 30 connected
96a4f973c9c173f1f9305ea33e14b3deed80c0d3 10.1.0.141:7002 master - 0 1417517259802 37 connected 9101-10922
8fbf69a69c9b3644c7ef08252eb96fbeb3d41e0f 10.1.0.143:7001 master - 0 1417517259802 29 connected 7281-9100
c357db3020d8b8f753a043f6346578629f4975e3 10.1.0.143:7003 master - 0 1417517259802 31 connected 12743-14562
c3b9a2c34f48a2d56bb0e29ce415a2b085d1a310 10.1.0.142:7101 slave 8e5bcd18054d1211c7b71221506892daccd22055 0 1417517259802 25 connected
89dabed32190ab456cb669e7cede900a1bcc74ac 10.1.0.143:7101 slave a32a645aa6fc253ba8be2c78cddc70abc4cc7a05 0 1417517259802 41 connected
8e5bcd18054d1211c7b71221506892daccd22055 10.1.0.141:7001 myself,master - 0 0 25 connected 3640-5460
ba98065c4a7854ae078ca194fd44ee494a363c35 10.1.0.142:7103 slave 968043a75545109e5cd4afe0c3fc0712bddecf34 0 1417517259802 18 connected
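
A quick sanity check on the output above, as a rough sketch rather than anything official: the snippet below takes the nine master lines copied verbatim from the listing and checks that their slot ranges cover all 16384 hash slots. The helper names (`slot_ranges`, `uncovered_slots`) are made up for illustration, and the field positions assume the `cluster nodes` line layout shown in this post.

```python
# Master lines copied verbatim from the `cluster nodes` output above.
CLUSTER_NODES = """\
40f158b0d545521356e06fde379b90a1d71bc260 10.1.0.143:7002 master - 0 1417517259802 30 connected 10923-12742
968043a75545109e5cd4afe0c3fc0712bddecf34 10.1.0.141:7003 master - 0 1417517259801 9 connected 14563-16383
a32a645aa6fc253ba8be2c78cddc70abc4cc7a05 10.1.0.142:7001 master - 0 1417517259802 41 connected 0-1819
c993961f8acbdcbd1adf478299143c84533ea331 10.1.0.142:7003 master - 0 1417517259802 43 connected 5461-7280
caa475624dcc60b385021ae642c5a84914ad9b75 10.1.0.142:7002 master - 0 1417517259802 42 connected 1820-3639
96a4f973c9c173f1f9305ea33e14b3deed80c0d3 10.1.0.141:7002 master - 0 1417517259802 37 connected 9101-10922
8fbf69a69c9b3644c7ef08252eb96fbeb3d41e0f 10.1.0.143:7001 master - 0 1417517259802 29 connected 7281-9100
c357db3020d8b8f753a043f6346578629f4975e3 10.1.0.143:7003 master - 0 1417517259802 31 connected 12743-14562
8e5bcd18054d1211c7b71221506892daccd22055 10.1.0.141:7001 myself,master - 0 0 25 connected 3640-5460
"""

TOTAL_SLOTS = 16384  # Redis Cluster always has 16384 hash slots


def slot_ranges(nodes_text):
    """Return the sorted (start, end) slot ranges served by master nodes."""
    ranges = []
    for line in nodes_text.splitlines():
        fields = line.split()
        # Assumed layout: id, addr, flags, master, ping, pong, epoch, link, slots...
        if len(fields) < 9 or "master" not in fields[2].split(","):
            continue
        for token in fields[8:]:
            if token.startswith("["):  # skip migrating/importing markers
                continue
            start, _, end = token.partition("-")
            ranges.append((int(start), int(end or start)))
    return sorted(ranges)


def uncovered_slots(ranges):
    """Count the hash slots that no listed range covers."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return TOTAL_SLOTS - len(covered)


ranges = slot_ranges(CLUSTER_NODES)
print(len(ranges), "masters,", uncovered_slots(ranges), "uncovered slots")
# prints: 9 masters, 0 uncovered slots
```

So at the moment of this snapshot every slot has an owner and every node is reported as connected, which suggests the short fail states are not caused by missing slot coverage in the configuration itself.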

Do you have any idea where this error comes from?

Regards

@mattsta
Contributor

mattsta commented Dec 3, 2014

Given your log with only this:

4107:M 02 Dec 07:36:39.716 # Cluster state changed: fail
4107:M 02 Dec 07:36:40.216 # Cluster state changed: ok

There's no way to tell what actually happened. Connections between servers going down? Each of your transitions between fail and ok lasts less than 1 second.

For more details, you'd need to increase the log level.
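
The sub-second durations can be checked directly from the posted log. The following is a minimal sketch, assuming the log line layout shown in this thread (the clock is the fourth whitespace-separated field); the function names are illustrative, and only the time of day is parsed, so a fail window spanning more than 24 hours would be misreported.

```python
# The six state-change lines copied verbatim from the log in this issue.
LOG = """\
4107:M 02 Dec 07:36:39.716 # Cluster state changed: fail
4107:M 02 Dec 07:36:40.216 # Cluster state changed: ok
4107:M 02 Dec 07:46:14.373 # Cluster state changed: fail
4107:M 02 Dec 07:46:14.873 # Cluster state changed: ok
4107:M 02 Dec 07:47:06.487 # Cluster state changed: fail
4107:M 02 Dec 07:47:07.087 # Cluster state changed: ok
"""

MS_PER_DAY = 24 * 60 * 60 * 1000


def clock_to_ms(clock):
    """Convert 'HH:MM:SS.mmm' to integer milliseconds since midnight."""
    hours, minutes, rest = clock.split(":")
    seconds, _, millis = rest.partition(".")
    millis = (millis + "000")[:3]  # pad or truncate to exactly 3 digits
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def fail_windows_ms(log_text):
    """Return the length in ms of each fail -> ok window, in log order."""
    windows = []
    fail_at = None
    for line in log_text.splitlines():
        if "Cluster state changed:" not in line:
            continue
        fields = line.split()
        now = clock_to_ms(fields[3])
        state = fields[-1]
        if state == "fail":
            fail_at = now
        elif state == "ok" and fail_at is not None:
            # The modulo keeps the result positive if a window crosses midnight.
            windows.append((now - fail_at) % MS_PER_DAY)
            fail_at = None
    return windows


print(fail_windows_ms(LOG))  # prints [500, 500, 600]
```

All three windows are 500 to 600 ms long, which matches the observation that the cluster recovers almost immediately each time.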

@antirez
Contributor

antirez commented Dec 19, 2014

Hello, this looks like a node-timeout configuration which is too short for the latency of the instances/network. Please could you provide us with CONFIG GET cluster* output? Thanks.
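
For anyone reading along, here is a small sketch of how the requested output could be inspected. CONFIG GET replies with a flat list of alternating names and values; the snippet pairs them up and compares `cluster-node-timeout` with 15000 ms, which is the value in the stock redis.conf as far as I know. The reply used here is HYPOTHETICAL (the real output was never posted in this thread), and the helper names are illustrative only.

```python
# Value of cluster-node-timeout in the stock redis.conf, in milliseconds
# (stated from memory; check the redis.conf of your own version).
STOCK_NODE_TIMEOUT_MS = 15000


def pairs_to_dict(flat_reply):
    """Turn the flat [name, value, name, value, ...] CONFIG GET reply into a dict."""
    return dict(zip(flat_reply[::2], flat_reply[1::2]))


def node_timeout_ms(flat_reply):
    """Extract cluster-node-timeout (milliseconds) from a CONFIG GET reply."""
    return int(pairs_to_dict(flat_reply)["cluster-node-timeout"])


# HYPOTHETICAL reply, for illustration only; not taken from the reporter's cluster.
example_reply = ["cluster-node-timeout", "500"]

timeout = node_timeout_ms(example_reply)
relation = "below" if timeout < STOCK_NODE_TIMEOUT_MS else "at or above"
print(f"cluster-node-timeout = {timeout} ms ({relation} the stock {STOCK_NODE_TIMEOUT_MS} ms)")
```

A value far below the stock setting is not wrong in itself, but it leaves little headroom for network or instance latency before nodes start flagging each other as failing.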

@atlantis3001
Author

Hello antirez,
I'm really sorry, I forgot to close this topic.
I found the problem, and you are right: I first built and tested the cluster on virtual servers with a very small timeout, and when I moved it to real servers I forgot to increase the timeout.
Thank you for your help and your work.

@antirez
Contributor

antirez commented Dec 19, 2014

Thanks for replying! Have a nice day.
