Completion of error handling #2

elfring · 2019-02-27T19:35:24Z

Would you like to add more error handling for return values from functions like the following?

pthread_cond_init ⇒ bioInit
pthread_mutex_init ⇒ initServerConfig
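
As an illustration of the kind of check being requested, here is a minimal sketch in C. The helper names `init_sync_primitives` and `init_sync_primitives_or_die` are hypothetical and are not taken from the KeyDB source; the sketch only relies on the POSIX rule that `pthread_mutex_init` and `pthread_cond_init` return 0 on success and an error number on failure (they do not set `errno`).

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the KeyDB source): initialise a mutex and a
 * condition variable, returning 0 on success or the error number of the
 * first pthread call that failed. */
static int init_sync_primitives(pthread_mutex_t *mutex, pthread_cond_t *cond) {
    int err = pthread_mutex_init(mutex, NULL);
    if (err != 0) return err;

    err = pthread_cond_init(cond, NULL);
    if (err != 0) {
        /* Undo the half-finished initialisation before reporting. */
        pthread_mutex_destroy(mutex);
        return err;
    }
    return 0;
}

/* Hypothetical wrapper that treats any failure as fatal: it logs the error
 * and aborts as soon as the failure is detected. */
static void init_sync_primitives_or_die(pthread_mutex_t *mutex,
                                        pthread_cond_t *cond) {
    int err = init_sync_primitives(mutex, cond);
    if (err != 0) {
        fprintf(stderr, "Fatal: cannot initialise sync primitives: %s\n",
                strerror(err));
        abort();
    }
}
```

A caller in a function such as `bioInit` or `initServerConfig` could, under these assumptions, call `init_sync_primitives_or_die(&mutex, &cond)` once at startup instead of ignoring the return values.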


JohnSully · 2019-03-03T01:43:43Z

This issue has been fixed. All failures are fatal and treated as such.

Thank you!

elfring · 2019-03-03T08:09:25Z

I suggest ignoring fewer return values.
Would you like to detect every error situation as early as possible?

Now both master and replicas keep track of the last replication offset that contains meaningful data (ignoring the trailing pings), and both trim that tail from the replication backlog and from the offset they use when attempting a psync. The implication is that if someone missed some pings, or even has excessive pings that the promoted replica has, it will still be able to psync (avoiding a full sync).

The downside (which was already committed) is that replicas running old code may fail to psync, since the promoted replica trims pings from its backlog.

This commit adds a test that reproduces several cases of promotions and demotions with stale and non-stale pings.

Background: The meaningful offset on the master was added recently to solve a problem where the master is left all alone, injecting PINGs into its backlog when no one is listening, and then gets demoted and tries to replicate from a replica that didn't have any of the PINGs (or at least not the last ones).

However, consider this case: master A has two replicas (B and C) replicating directly from it. There is no traffic at all, and also no network issues, just many pings in the tail of the backlog. Now B gets promoted, A becomes a replica of B, and C remains a replica of A. When A gets demoted, it trims the pings from its backlog and successfully replicates from B. However, C is still aware of these PINGs. When it disconnects and reconnects to A, it will ask for something that is no longer in the backlog (since A trimmed the tail of its backlog) and be forced to do a full sync (something it didn't have to do before the meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1

Now we see that when #1 is demoted it prints:

17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.

And when #3 connects to the demoted #1, #1 says:

17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the demoted master (since it needs to sync from a replica that didn't get the last ping), but it didn't help one of the other replicas which did get the last ping.
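
The offset arithmetic in the log lines above can be sketched as follows. This is a simplified model written for illustration, not the Redis or KeyDB implementation: the `backlog_view` type and both function names are hypothetical, and the backlog is reduced to the first and last byte offsets it still holds.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a replication backlog. */
typedef struct {
    long long first_offset; /* offset of the oldest byte still in the backlog */
    long long repl_offset;  /* offset of the newest byte in the backlog */
} backlog_view;

/* Roll the backlog back to the last offset that carried meaningful data,
 * dropping any trailing PING bytes. */
static void trim_to_meaningful_offset(backlog_view *b,
                                      long long meaningful_offset) {
    if (meaningful_offset < b->repl_offset) {
        b->repl_offset = meaningful_offset;
    }
}

/* A replica requests the first byte it does not have yet. The request can be
 * served only if that byte is inside the backlog or is the very next byte
 * after its end. */
static bool can_partial_resync(const backlog_view *b, long long requested) {
    return requested >= b->first_offset && requested <= b->repl_offset + 1;
}
```

With the numbers from the log, trimming from 3929977 to the meaningful offset 3929963 means the node can reply up to 3929964. A replica that also stopped at 3929963 requests 3929964 and is accepted, while a replica that received the final PINGs requests 3929978 and is rejected, which forces the full sync described above.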

JohnSully pushed a commit that referenced this issue Mar 2, 2019

Merge pull request #2 from antirez/unstable

4981694

merge from redis

JohnSully added a commit that referenced this issue Mar 3, 2019

Fix issue #2, check posix return values

01a529c

JohnSully closed this as completed Mar 3, 2019

dufrygrant mentioned this issue Aug 10, 2019

Segmentation fault in current version from 10th Aug #74

Closed

jendis mentioned this issue Dec 17, 2019

KeyDB crashes when multi-master enabled #124

Closed

ronen-cedato mentioned this issue Jan 12, 2020

Keydb crash suddenly #132

Closed

jamzed mentioned this issue Feb 1, 2020

Keydb crash right after REPLICA sync is completed (5.3.1) #137

Closed

mcrivar mentioned this issue Mar 5, 2020

Deadlock in replicationFeedMonitors #150

Closed

botzill mentioned this issue Mar 19, 2020

KeyDB 5.3.2 crashed by signal: 11 #155

Closed

smartattack mentioned this issue May 4, 2020

KeyDB dying via SIGABORT #170

Closed

botzill mentioned this issue May 8, 2020

crash after setting maxclients via cmd line #180

Closed

JohnSully mentioned this issue Jul 11, 2020

Issue in multi-master feature. Keydb is dropping local data. #210

Closed

mattaylor mentioned this issue Jan 4, 2021

RedisGears crashing KeyDB after 1-2hrs #276

Closed

jendis mentioned this issue May 24, 2021

Crash after replication #317

Closed

sathish-a mentioned this issue May 13, 2022

KeyDB crashes when active-replication enabled #419

Closed

paulmchen mentioned this issue Aug 29, 2022

[CRASH] command Keys * crashed the server when there are more than 10 GB of cached data #486

Closed

0xgeert mentioned this issue Jan 9, 2023

[CRASH] - 10 minutes in on clean DB with parallel inserts using Watch + Multi/exec #539

Open

msotheeswaran-sc mentioned this issue Feb 9, 2023

couple AOF fixes #560

Merged

zboralski mentioned this issue Mar 8, 2023

[CRASH] Using redisgraph.so #595

Closed

CrazyTennisFan mentioned this issue May 17, 2023

[NEW] Add memory leak checking feature #656

Open

paulmchen mentioned this issue May 29, 2023

[CRASH] server.cpp: '!g_pserver->propagate_in_transaction' is not true assert crash #663

Closed

zboralski mentioned this issue Jun 16, 2023

[BUG] crash on json.get after upgrading to debian 12 stable #676

Open

CrazyTennisFan mentioned this issue Nov 27, 2023

[CRASH] Crashes when enable-async-commands was enabled in the cache use case (NOT flash). #751

Open
