Crash in actx_get_by_node() #113
Comments
Interesting, do you have any idea what events led up to this?
Unfortunately no.
Hi @bjosv, I have some updates from the core analysis, posting them here. (All data is picked from either frame 0 or 1.) It looks like both the table and slots pointers are holding corrupted data at the time of the crash. In the hiredis code, the node calculation is done as table[slot_number], and the slot_number seen in the core file is 8535. table[8535] is the same corrupted cluster_node pointer seen at the time of the crash, so the corrupted node is pulled from the corrupted table data. However, the nodes dictionary object (of type struct dictEntry) seems to be holding valid data.
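For readers following along, here is a minimal sketch of the table lookup being described, assuming the table is a 16384-entry array of cluster_node pointers indexed by slot number; the names CLUSTER_SLOTS and node_for_slot are illustrative, not the actual hircluster internals:

```c
/* Minimal sketch of the slot-to-node lookup described above, assuming a
 * 16384-entry table of cluster_node pointers indexed by slot number.
 * Names are illustrative; the real lookup lives in hircluster.c. */
#define CLUSTER_SLOTS 16384

struct cluster_node;    /* opaque for this sketch */

static struct cluster_node *
node_for_slot(struct cluster_node **table, unsigned int slot)
{
    if (table == NULL || slot >= CLUSTER_SLOTS)
        return NULL;
    /* In the core file, slot was 8535 and table[8535] already held a
     * corrupted pointer, so whatever is returned here is garbage. */
    return table[slot];
}
```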
So, there are a total of 7 nodes available at the time of the crash, as indicated by the nodes dictionary. All 7 (valid) cluster_node pointers are decoded below.
Each cluster_node holds information about the slots it serves. The data below captures the slot information for each of the above 7 nodes.
Slot-to-cluster_node mapping from the above data:
So, this indicates 6 valid slot ranges, which means there is 1 extra node ($125 from gdb) that has no slots distributed to it. One more interesting observation is that contiguous slots of the corrupted table object seem to hold the same cluster_node address (even though that address is invalid), which is how it should look when a range of slots is assigned to a single cluster_node.
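To illustrate why contiguous entries sharing one address is expected, here is a hedged sketch of how a slot range could be written into the table when it is assigned to a node; assign_slot_range is a made-up helper for illustration, not the actual hircluster code:

```c
/* Hedged sketch: when a node owns the slot range [start, end], every table
 * entry in that range is set to the same cluster_node pointer, which is why
 * contiguous entries are expected to hold identical addresses. */
struct cluster_node;

static void
assign_slot_range(struct cluster_node **table, unsigned int start,
                  unsigned int end, struct cluster_node *node)
{
    for (unsigned int s = start; s <= end && s < 16384; s++)
        table[s] = node;
}
```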
6 unique cluster_node addresses came up (listed below) across the entire 16384 slots, indicating that there were 6 nodes available at some point in time (versus 7 at the time of the crash).
I'm not really sure what path was taken for the corruption to happen. Please share any thoughts you have on this.
Just as a reference, what
And also, what called
Was it from within a response callback, and in that case, was it a NULL response?
@bjosv No, that data looks all corrupted.
No, this was called directly from the application trying to send out a command to the Redis server.
You state that you run multi-threaded, and the question is: in what way? When a response is sent from a Redis node, the event system in use will notice that data has been received on the socket. If the event system is able to process replies in multiple threads at the same time (likely from different nodes), there is a chance that the replies can run the callback that updates the dict and table concurrently.
Hi @bjosv, all access into hiredis is restricted using one common mutex. Any thread trying to call into hiredis first acquires the lock and then makes the hiredis command call. Once the command execution is completed, the lock is released, and another thread can acquire it and invoke its own command call. The event system can only process one reply at a time (it always runs in a single-threaded context). It takes the lock before calling into hiredis for any read/write event, which then executes redisProcessCallbacks().
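For clarity, here is a minimal sketch of the locking pattern just described, assuming one pthread mutex guards every call into the async context; the names cluster_lock, send_command_locked, and reply_cb are illustrative, only redisClusterAsyncCommand and its callback signature come from the hiredis-cluster API:

```c
#include <pthread.h>
#include <hircluster.h>

/* One common mutex guards all calls into hiredis-cluster, as described
 * above. Names other than the hircluster API calls are illustrative. */
static pthread_mutex_t cluster_lock = PTHREAD_MUTEX_INITIALIZER;
static redisClusterAsyncContext *acc;  /* created and attached elsewhere */

static void reply_cb(redisClusterAsyncContext *cc, void *r, void *privdata)
{
    (void)cc; (void)r; (void)privdata;
    /* Runs from the (single-threaded) event loop, which has already
     * taken cluster_lock before driving the read/write handlers. */
}

/* Called from application threads. */
static int send_command_locked(const char *key)
{
    int rc;
    pthread_mutex_lock(&cluster_lock);
    rc = redisClusterAsyncCommand(acc, reply_cb, NULL, "GET %s", key);
    pthread_mutex_unlock(&cluster_lock);
    return rc;
}
```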
We were thinking about how the above stack trace could be related to the one in ticket 124. One theory that came up in our internal discussion is that:
If the above scenario is a possibility, do you agree that we could end up in a case like this, and that the two are linked?
I believe your scenario was a possibility and was likely fixed by #125.
Hello,
We are using hiredis-cluster release 0.8.1, and this issue concerns a crash we see in our performance testing.
We use the async hiredis APIs and invoke them both from the application and from hiredis callbacks.
Our application is multi-threaded, and the hiredis context object and hiredis API calls are mutex protected.
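As an illustration of this usage pattern (not our actual code), here is a minimal sketch of issuing a follow-up command from inside a hiredis-cluster reply callback; the callback names and the key are made up, only redisClusterAsyncCommand and the callback signature are from the library API:

```c
#include <hircluster.h>

/* Illustrative sketch of sending a command from within a reply callback,
 * as our application does. Callback names and the key are made up. */
static void follow_up_cb(redisClusterAsyncContext *acc, void *r, void *privdata)
{
    (void)acc; (void)privdata;
    if (r == NULL)       /* NULL reply indicates an error or disconnect */
        return;
    /* ... handle the follow-up reply ... */
}

static void first_cb(redisClusterAsyncContext *acc, void *r, void *privdata)
{
    (void)privdata;
    if (r == NULL)
        return;
    /* A new command issued from inside a hiredis callback. */
    redisClusterAsyncCommand(acc, follow_up_cb, NULL, "GET %s", "somekey");
}
```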
We noticed the following crash a few hours into our test:
Crash reason: SIGSEGV /0x00000080
Crash address: 0x0
Process uptime: not available
Thread 13 (crashed)
0 libhiredis_cluster.so.0.8!actx_get_by_node [hircluster.c : 3630 + 0x0]
1 libhiredis_cluster.so.0.8!redisClusterAsyncFormattedCommand [hircluster.c : 4117 + 0x8]
2 libhiredis_cluster.so.0.8!redisClustervAsyncCommand [hircluster.c : 4244 + 0x16]
3 libhiredis_cluster.so.0.8!redisClusterAsyncCommand [hircluster.c : 4258 + 0x5]
Kindly help resolve this crash.
Thank you.