Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isisd: fix crash when deactivating ISIS adjacency on the interface. #15993

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zhou-run
Copy link
Contributor

  1. When the command "no <ip|ipv6> router isis WORD" is executed on the interface, it invokes list_delete_all_node to iterate and release the memory of all nodes in the cirtcuit->u.bc.adjdb[1] linked list. However, the nodes are not unlinked during this traversal process, leading to the call of *list->del to delete the data of the linked list nodes.
  2. For ISIS, deleting the data of the linked list nodes is done by calling isis_delete_adj. Subsequently, isis_level2_adj_up will be called to iterate and query the cirtcuit->u.bc.adjdb[1] linked list. If there are many neighbors on this interface, accessing the memory of the released linked list nodes may occur.
  3. Not limited to ISIS, if the linked list is not unlinked during the deletion of all nodes in process 1, *list->del should not be allowed to iterate through the list again.

Signed-off-by: zhou-run zhou.run@h3c.com

1. When the command "no <ip|ipv6> router isis WORD" is executed on the interface, it invokes list_delete_all_node to iterate and release the memory of all nodes in the cirtcuit->u.bc.adjdb[1] linked list. However, the nodes are not unlinked during this traversal process, leading to the call of *list->del to delete the data of the linked list nodes.

2. For ISIS, deleting the data of the linked list nodes is done by calling isis_delete_adj. Subsequently, isis_level2_adj_up will be called to iterate and query the cirtcuit->u.bc.adjdb[1] linked list. If there are many neighbors on this interface, accessing the memory of the released linked list nodes may occur.

3. Not limited to ISIS, if the linked list is not unlinked during the deletion of all nodes in process 1, *list->del should not be allowed to iterate through the list again.

Signed-off-by: zhou-run <166502045+zhou-run@users.noreply.github.com>
Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show the backtrace?

In my opinion the problem is in isis_circuit_up() when we create a list we don't specify circuit->u.bc.adjdb[0]->del = isis_delete_adj;.

@zhou-run
Copy link
Contributor Author

Can you show the backtrace?

In my opinion the problem is in isis_circuit_up() when we create a list we don't specify circuit->u.bc.adjdb[0]->del = isis_delete_adj;.

The issue lies in isis_circuit_down(), not isis_circuit_up(). The backtrace is as follows:

(gdb) bt
#0  0x00007f7d6e541fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f7d6e63188c in core_handler (signo=11, siginfo=0x7ffc0ced2630, context=<optimized out>) at lib/sigevent.c:262
#2  <signal handler called>
#3  0x00005647f5b11568 in isis_level2_adj_up (area=area@entry=0x5647f7c89830) at isisd/isis_lsp.c:423
#4  0x00005647f5b14073 in isis_reset_attach_bit (adj=0x5647f7cad690) at isisd/isis_lsp.c:474
#5  lsp_handle_adj_state_change (adj=0x5647f7cad690) at isisd/isis_lsp.c:2162
#6  0x00005647f5b53675 in hook_call_isis_adj_state_change_hook (adj=adj@entry=0x5647f7cad690) at isisd/isis_adjacency.c:152
#7  0x00005647f5b536f3 in isis_delete_adj (arg=0x5647f7cad690) at isisd/isis_adjacency.c:167
#8  0x00007f7d6e5fe003 in list_delete_all_node (list=0x5647f7c88060) at lib/linklist.c:316
#9  0x00007f7d6e5fe069 in list_delete (list=list@entry=0x5647f7c84708) at lib/linklist.c:326
#10 0x00005647f5b0872e in isis_circuit_down (circuit=0x5647f7c84620) at isisd/isis_circuit.c:835
#11 0x00005647f5b09f81 in isis_csm_state_change (event=event@entry=IF_DOWN_FROM_Z, circuit=circuit@entry=0x5647f7c84620, arg=arg@entry=0x5647f7c7f7a0)
    at isisd/isis_csm.c:196
#12 0x00005647f5b083b0 in isis_circuit_disable (circuit=0x5647f7c84620) at isisd/isis_circuit.c:100
#13 isis_circuit_del (circuit=0x5647f7c84620) at isisd/isis_circuit.c:200
#14 0x00005647f5b434f5 in lib_interface_isis_destroy (args=<optimized out>) at isisd/isis_nb_config.c:2612
#15 0x00007f7d6e61347a in nb_callback_destroy (errmsg_len=2, errmsg=0x7ffc0ced38d0 "", dnode=0x5647f7c948f0, event=NB_EV_APPLY, nb_node=<optimized out>,
    context=<optimized out>) at lib/northbound.c:1131
#16 nb_callback_configuration (context=<optimized out>, event=event@entry=NB_EV_APPLY, change=change@entry=0x5647f7cb6680, errmsg=errmsg@entry=0x7ffc0ced38d0 "",
    errmsg_len=errmsg_len@entry=8192) at lib/northbound.c:1356
#17 0x00007f7d6e6138b7 in nb_transaction_process (errmsg_len=8192, errmsg=0x7ffc0ced38d0 "", transaction=0x5647f7c94080, event=NB_EV_APPLY)
    at lib/northbound.c:1473
#18 nb_candidate_commit_apply (transaction=0x5647f7c94080, save_transaction=save_transaction@entry=true, transaction_id=transaction_id@entry=0x0,
    errmsg=errmsg@entry=0x7ffc0ced38d0 "", errmsg_len=errmsg_len@entry=8192) at lib/northbound.c:906
#19 0x00007f7d6e61403d in nb_candidate_commit (context=context@entry=0x7ffc0ced38c0, candidate=<optimized out>, save_transaction=save_transaction@entry=true,
    comment=comment@entry=0x0, transaction_id=transaction_id@entry=0x0, errmsg=errmsg@entry=0x7ffc0ced38d0 "", errmsg_len=8192) at lib/northbound.c:938
#20 0x00007f7d6e616ec9 in nb_cli_classic_commit (vty=0x5647f7cae160) at lib/northbound_cli.c:64
#21 0x00007f7d6e6176a8 in nb_cli_apply_changes (vty=0x5647f7cae160, xpath_base_fmt=<optimized out>) at lib/northbound_cli.c:268
#22 0x00007f7d6e5d918e in cmd_execute_command_real (vline=vline@entry=0x5647f7cae140, vty=vty@entry=0x5647f7cae160, cmd=cmd@entry=0x0, up_level=up_level@entry=0,
    filter=FILTER_RELAXED) at lib/command.c:971
#23 0x00007f7d6e5d951d in cmd_execute_command (vline=vline@entry=0x5647f7cae140, vty=vty@entry=0x5647f7cae160, cmd=cmd@entry=0x0, vtysh=vtysh@entry=0)
    at lib/command.c:1030
#24 0x00007f7d6e5d9770 in cmd_execute (vty=vty@entry=0x5647f7cae160, cmd=cmd@entry=0x5647f7cb48a0 "no ip router isis 10", matched=matched@entry=0x0,
    vtysh=vtysh@entry=0) at lib/command.c:1198
#25 0x00007f7d6e6485e6 in vty_command (vty=vty@entry=0x5647f7cae160, buf=0x5647f7cb48a0 "no ip router isis 10") at lib/vty.c:483
#26 0x00007f7d6e648d01 in vty_execute (vty=vty@entry=0x5647f7cae160) at lib/vty.c:1246
#27 0x00007f7d6e64ba40 in vtysh_read (thread=<optimized out>) at lib/vty.c:2090
#28 0x00007f7d6e64348d in thread_call (thread=thread@entry=0x7ffc0ced8310) at lib/thread.c:1958
#29 0x00007f7d6e5fd4a8 in frr_run (master=0x5647f79a43d0) at lib/libfrr.c:1184
#30 0x00005647f5b050f3 in main (argc=5, argv=<optimized out>, envp=<optimized out>) at isisd/isis_main.c:273
(gdb) f 3
#3  0x00005647f5b11568 in isis_level2_adj_up (area=area@entry=0x5647f7c89830) at isisd/isis_lsp.c:423
423     isisd/isis_lsp.c: No such file or directory.
(gdb) p node
$1 = (struct listnode *) 0x110
(gdb) f 8
#8  0x00007f7d6e5fe003 in list_delete_all_node (list=0x5647f7c88060) at lib/linklist.c:316
316     lib/linklist.c: No such file or directory.
(gdb) p list->head->data
$2 = (void *) 0x5647f7cabf20
(gdb) p list->head->next->data
$3 = (void *) 0x5647f7c9bb60
(gdb) p list->head->next->next->data
Cannot access memory at address 0x120
(gdb) p list->head->next->next
$4 = (struct listnode *) 0x110

The backtrace provided above pertains to version 8.2.2, but it seems that the same issue exists in the code of the master branch as well.

@ton31337
Copy link
Member

The issue lies in isis_circuit_down(), not isis_circuit_up().

Yes, but the list needs to have a deletion hook when it's created. So it might be an issue in timing (or not).

@zhou-run
Copy link
Contributor Author

The issue lies in isis_circuit_down(), not isis_circuit_up().

Yes, but the list needs to have a deletion hook when it's created. So it might be an issue in timing (or not).

In addition to the list deletion operation list_delete, if only a specific node in the linked list is to be deleted (as in the implementation of isis_adj_state_change), the function that should be called is listnode_delete or list_delete_node. This does not call *list->del, and the caller is responsible for deleting the data associated with the node. Therefore, it should be fine to create the linked list without adding a deletion hook.

@ton31337
Copy link
Member

Someone clearly understanding the IS-IS code should take a look here. /cc @odd22 @louis-6wind @pguibert6WIND maybe?

@louis-6wind
Copy link
Contributor

I'll take a look at it

Copy link
Contributor

@louis-6wind louis-6wind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void isis_circuit_down(struct isis_circuit *circuit)
{
[...]
		} else if (circuit->circ_type == CIRCUIT_T_BROADCAST) {
			struct list *adj_list;
			struct listnode *node;
			if (circuit->u.bc.adjdb[0]) {
				adj_list = list_new();
				isis_adj_build_up_list(circuit->u.bc.adjdb[0],
						       adj_list);
				for (ALL_LIST_ELEMENTS_RO(adj_list, node, adj))
					isis_log_adj_change(
						adj, adj->adj_state,
						ISIS_ADJ_DOWN,
						"circuit is being brought down");
				list_delete(&adj_list);
			}
			if (circuit->u.bc.adjdb[1]) {
				adj_list = list_new();
				isis_adj_build_up_list(circuit->u.bc.adjdb[1],
						       adj_list);
				for (ALL_LIST_ELEMENTS_RO(adj_list, node, adj))
					isis_log_adj_change(
						adj, adj->adj_state,
						ISIS_ADJ_DOWN,
						"circuit is being brought down");
				list_delete(&adj_list);
			}

The problem is that we are building a temporary link-list adj_list containing the up adjacencies. The temporary list is deleted and their nodes are freed but still referenced in u.bc.adjdb[X]. This is a heap-after-free issue.

There's no need to use a temporary link-list. The problem should be fixed in all places where isis_adj_build_up_list() is used to build a temporary list.

@@ -331,7 +331,7 @@ void list_delete_all_node(struct list *list)
next = node->next;
if (*list->del)
(*list->del)(node->data);
listnode_free(list, node);
list_delete_node(list, node);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void list_delete_node(struct list *list, struct listnode *node)
{
	frrtrace(2, frr_libfrr, list_delete_node, list, node);

	if (node->prev)
		node->prev->next = node->next;
	else
		list->head = node->next;
	if (node->next)
		node->next->prev = node->prev;
	else
		list->tail = node->prev;
	list->count--;
	listnode_free(list, node);
}

list_delete_node() updates the link-list and free the node with listnode_free(). This is correct but, since all the nodes are deleted, we do not need to do that. listnode_free() is sufficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_delete_node() updates the link-list and free the node with listnode_free(). This is correct but, since all the nodes are deleted, we do not need to do that. listnode_free() is sufficient

I don't think so. Refer to the backtrace above, (*list->del)()=>...=>isis_level2_adj_up(). If the memory of each node in the list is not unlinked before being released, the operation to delete node data (*list->del)() should not be allowed to continue traversing this linked list, as it would otherwise access released node memory. However, the restriction on allowing (*list->del)() to traverse the list is inherently unreasonable.

bool isis_level2_adj_up(struct isis_area *area)
{
	struct listnode *node, *cnode;
	struct isis_circuit *circuit;
	struct list *adjdb;
	struct isis_adjacency *adj;

	if (area->is_type == IS_LEVEL_1)
		return false;

	for (ALL_LIST_ELEMENTS_RO(area->circuit_list, cnode, circuit)) {
		if (circuit->circ_type == CIRCUIT_T_BROADCAST) {
			adjdb = circuit->u.bc.adjdb[1];
			if (!adjdb || !adjdb->count)
				continue;

			for (ALL_LIST_ELEMENTS_RO(adjdb, node, adj)) {    //here heap-after-free
				if (adj->level != ISIS_ADJ_LEVEL1
				    && adj->adj_state == ISIS_ADJ_UP)
					return true;
			}
		} else if (circuit->circ_type == CIRCUIT_T_P2P
			   && circuit->u.p2p.neighbor) {
			adj = circuit->u.p2p.neighbor;
			if (adj->level != ISIS_ADJ_LEVEL1
			    && adj->adj_state == ISIS_ADJ_UP)
				return true;
		}
	}

	return false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fix avoids this crash and could avoid some other potential crashes with link-lists by cleaning up the memory before freeing it. But it masks the actual issue which is the heap-after-free.

You have noticed there is heap-after-free in isis_level2_adj_up(). I agree with you. However, it is not supposed to happen. You are not supposed to access memory zones that could have been allocated to something else. It can lead to unexpected behaviors.

If you avoid using temporary list, you won't have the crash anymore and you will have fixed the root cause.

@zhou-run
Copy link
Contributor Author

The problem is that we are building a temporary link-list adj_list containing the up adjacencies. The temporary list is deleted and their nodes are freed but still referenced in u.bc.adjdb[X]. This is a heap-after-free issue.

There's no need to use a temporary link-list. The problem should be fixed in all places where isis_adj_build_up_list() is used to build a temporary list.

Deleting the temporary link-list adj_list only removes the temporary nodes of the temporary link-list, which are separate from the actual nodes of the real link-list u.bc.adjdb[X] . Even though the data stored in the nodes node->data is the same memory, the temporary link-list adj_list does not have an operation to delete the node->data. Why would deleting the temporary link-list adj_list affect the real link-list u.bc.adjdb[X] ?
On the contrary, the actual operation of deleting node->data is performed when deleting the real link-list u.bc.adjdb[X] because it points to the delete operation isis_delete_adj() . I believe this is where the heap-after-free issue lies.

@louis-6wind
Copy link
Contributor

louis-6wind commented May 17, 2024

Deleting the temporary link-list adj_list only removes the temporary nodes of the temporary link-list, which are separate from the actual nodes of the real link-list u.bc.adjdb[X] . Even though the data stored in the nodes node->data is the same memory, the temporary link-list adj_list does not have an operation to delete the node->data. Why would deleting the temporary link-list adj_list affect the real link-list u.bc.adjdb[X] ?

you are right. I have misinterpreted the backtrace.

I have just loaded frr-8.2.2 tag to interpret the backtrace. I see that the issue is because in list_delete_all_node(), the (*list->del)hook (hook_call_isis_adj_state_change_hook here ) is parsing the link-list that is being deleted via isis_level2_adj_up() -> for (ALL_LIST_ELEMENTS_RO(adjdb, node, adj))

void list_delete_all_node(struct list *list)
{
[...]
	for (node = list->head; node; node = next) {
		next = node->next;
		if (*list->del)
			(*list->del)(node->data);
		listnode_free(list, node);
	}
bool isis_level2_adj_up(struct isis_area *area)
{
[...]
	for (ALL_LIST_ELEMENTS_RO(area->circuit_list, cnode, circuit)) {
		if (circuit->circ_type == CIRCUIT_T_BROADCAST) {
			adjdb = circuit->u.bc.adjdb[1];
[...]
			for (ALL_LIST_ELEMENTS_RO(adjdb, node, adj)) {
				if (adj->level != ISIS_ADJ_LEVEL1
				    && adj->adj_state == ISIS_ADJ_UP)
					return true;

This is not a case of heap-use-after but I am not sure we should allow accessing a list being deleted.

@louis-6wind
Copy link
Contributor

The issue is in lsp_handle_adj_state_change()

static int lsp_handle_adj_state_change(struct isis_adjacency *adj)
{
	lsp_regenerate_schedule(adj->circuit->area, IS_LEVEL_1 | IS_LEVEL_2, 0);

	/* when an adjacency state changes determine if we need to
	 * change attach_bits in other area's LSPs
	 */
	isis_reset_attach_bit(adj);

	return 0;
}

lsp_regenerate_schedule() delays the regeneration of the LSP so that it is done once the u.bc.adjdb[X] adjacency list is fully destroyed. isis_reset_attach_bit() should be called during the LSP generation process, not in lsp_handle_adj_state_change() directly. This call was introduced by #7998. @lynne-volta could you help @zhou-run to rework this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants