Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bgp filter fun #15466

Merged
merged 3 commits into from
Mar 14, 2024
Merged

Bgp filter fun #15466

merged 3 commits into from
Mar 14, 2024

Conversation

donaldsharp
Copy link
Member

see individual commits

a) Allows unsuppress-maps to be part of the filtering system that would allow outgoing data to a peer.
b) Fix memory leak associated with late decision not to send data to peer in bgp

@donaldsharp
Copy link
Member Author

@Mergifyio backport dev/10.0 stable/9.1 stable/9.0 stable/8.5 stable/8.4

Copy link

mergify bot commented Mar 2, 2024

backport dev/10.0 stable/9.1 stable/9.0 stable/8.5 stable/8.4

✅ Backports have been created

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

bgpd/bgp_updgrp_adv.c Outdated Show resolved Hide resolved
Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this part of the code be executed when withdraw previously announced default route to the neighbor (I think no matter if it comes from other peer or from default-originate. Maybe the comment in the "if(withdraw)" scope are not correct) ?
I dont clearly understand the logic in subgroup_default_originate() , but maybe is wrong to work with "attr" structure here. "attr" is a something new and blank init and populated above every time when you enter into the function, also it do not carry previously sets attr (maybe ?)

814 	struct attr attr;
815 	struct attr *new_attr = &attr;

Also seems "attr" will be change from default-originate route map if there are such, which will affect and re-announced default route learned from other peer (if there are such).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attribute is initially set to a default value then the route-map( if any ) will modify it, thus it is needed. The attribute is used as a way of knowing what to send to a peer so it is needed even in the withdraw case.. That is the purpose of the attr. The changes I made did uncover an additional bug where the aspath in subgroup_default_originate was attempting to free the wrong aspath, in some cases. This was masked by the excessive intern of the attr data structure in this call path. In any event, with the addition of change 1872695559d438. I am happier with the results now.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your explanation Donald ! In all cases the patch you are offer is good, should be applied and seems will fix #15459 (most likely they had 0.0.0.0/0 ::/0 among their 6400 testing routes). When backport it to 9.1 and 8.5 I will redo all my debugs and tests against #14828 , but im nearly sure this patch will not solve it. Maybe I am wrong but imagine this situation:

Router1 -- Announce 0.0.0.0/0 --> Router2 -- Announce 0.0.0.0/0 (any from 2) --> Router3
                                  Also generate 0.0.0.0/0 with route-map

If R1 withdraw 0.0.0.0.0/0 I believe subgroup_default_originate() will be called with withdraw flag. Because Router2 have default-originate route-map XXX , attr will be changed (will enter in the if() scope take a look above your change in the code), and withdraw will flush/unintern completely other attributes (pointers in the attr structure) not these was previously set (maybe for self generated default route? not for re-announced, default route). subgroup_default_originate() is called on a fib-out ? (changes on the attr are made after the changes from neighbor output route-map ? ). Since a week testing fix (not sure it is proper, but no massive leak as before) for #14828, (the very first version of your patch you post is applied also), and it change also code in subgroup_default_originate() (before your change) maybe is best to take a look, could need to change something else too, or at least will tell me If im wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the chain of events that would cause a problem. Can you recreate this in a topotest? or more explicitly lay out the steps of your scenario?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how or where to write o topotest (I am not familiar with all of that). Will try to explain more exactly (sorry it is so long) with code...
Here is the all subgroup_default_originate() function taken from stable/9.1/pr-15466 (with your changes), I will put comments in the code to can better explain what I mean...

void subgroup_default_originate(struct update_subgroup *subgrp, int withdraw)
{
	struct bgp *bgp;
	struct attr attr = { 0 };
	struct attr *new_attr = &attr;    //Ivaylo  new_attr = attr,  all changes to new_attr  will affect to attr and vice versa
	struct aspath *aspath;
	struct prefix p;
	struct peer *from;
	struct bgp_dest *dest;
	struct bgp_path_info *pi;
	struct peer *peer;
	struct bgp_adj_out *adj;
	route_map_result_t ret = RMAP_DENYMATCH;
	route_map_result_t new_ret = RMAP_DENYMATCH;
	afi_t afi;
	safi_t safi;
	safi_t safi_rib;
	int pref = 65536;
	int new_pref = 0;

	if (!subgrp)
		return;

	peer = SUBGRP_PEER(subgrp);
	afi = SUBGRP_AFI(subgrp);
	safi = SUBGRP_SAFI(subgrp);

	if (!(afi == AFI_IP || afi == AFI_IP6))
		return;

	if (safi == SAFI_LABELED_UNICAST)
		safi_rib = SAFI_UNICAST;
	else
		safi_rib = safi;

	bgp = peer->bgp;
	from = bgp->peer_self;

	bgp_attr_default_set(&attr, bgp, BGP_ORIGIN_IGP);

	/* make coverity happy */
	assert(attr.aspath);

	aspath = attr.aspath;
	attr.med = 0;
	attr.flag |= ATTR_FLAG_BIT(BGP_ATTR_MULTI_EXIT_DISC);

	if ((afi == AFI_IP6) || peer_cap_enhe(peer, afi, safi)) {
		/* IPv6 global nexthop must be included. */
		attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL;

		/* If the peer is on shared nextwork and we have link-local
		   nexthop set it. */
		if (peer->shared_network
		    && !IN6_IS_ADDR_UNSPECIFIED(&peer->nexthop.v6_local))
			attr.mp_nexthop_len = BGP_ATTR_NHLEN_IPV6_GLOBAL_AND_LL;
	}

// Ivaylo: In my debugs to catch #14828  (testing with 9.1 current stable) I put a simple:
//zlog_err("We are here now");
// Ivaylo: And I see in the logs this message on every announce/withdraw/update message we receive 
// and/or announce no matter if it is default route (self generated and/or received) 
// or some other prefix. (It is another question, but do we really need to do this?) Obviously
// if the .name ( char[] ) is not 0x00  (the default-originate route-map string name)
// we will enter in the if scope.

	if (peer->default_rmap[afi][safi].name) {
		struct bgp_path_info tmp_pi = {0};

		tmp_pi.peer = bgp->peer_self;

		SET_FLAG(bgp->peer_self->rmap_type, PEER_RMAP_TYPE_DEFAULT);

		/* Iterate over the RIB to see if we can announce
		 * the default route. We announce the default
		 * route only if route-map has a match.
		 */
		for (dest = bgp_table_top(bgp->rib[afi][safi_rib]); dest;
		     dest = bgp_route_next(dest)) {
			if (!bgp_dest_has_bgp_path_info_data(dest))
				continue;

			for (pi = bgp_dest_get_bgp_path_info(dest); pi;
			     pi = pi->next) {
				struct attr tmp_attr = attr;

				tmp_pi.attr = &tmp_attr;

// Ivaylo: route_map_apply_ext(), allocate new memory (on the heap) in tmp_pi.attr community/lcom/ecom  +
// sets their values if they are set in the route-map associated with default-originate clause in the 
// config. Also if the comm/lcomm/ecomm strings are changed in the RM, we will get a memory leak, because
// we will lost their pointers to call free(), nor we can free them here in this stage yet.
				new_ret = route_map_apply_ext(
					peer->default_rmap[afi][safi].map,
					bgp_dest_get_prefix(dest), pi, &tmp_pi,
					&new_pref);

				if (new_ret == RMAP_PERMITMATCH) {
// Ivaylo: I have no idea what for are new_pref and pref maybe some kind of preference or ? But if we don not 
// enter in this if() we never will free tmp_pi.attr comm/lcomm/ecomm and this cause #14828 
					if (new_pref < pref) {
						pref = new_pref;

// IVAYLO:  Here we flush new_attr ( the &attr !) So all what have been set before to it gone (eventually the old attr
// data is still here if the route_map_apply_ext() (not changed)/copied it from the tmp_pi.attr  ), and intern new
// data in the attr structure based on tmp_pi.attr data. From this point to the end you will work with this &attr,
// Donald.

						bgp_attr_flush(new_attr);
						new_attr = bgp_attr_intern(
							tmp_pi.attr);
						bgp_attr_flush(tmp_pi.attr);
					}
					subgroup_announce_reset_nhop(
						(peer_cap_enhe(peer, afi, safi)
							 ? AF_INET6
							 : AF_INET),
						new_attr);
					ret = new_ret;
				} else
					bgp_attr_flush(&tmp_attr);
//Ivaylo: We need to call flush on &tmp_attr = tmp_pi.attr no matter by any condition in the loop, because
// on the next iteration we will eventually lost new assigned memory (because of route_map_apply_ext() )
// , and will have huge leak. I am not sure if bgp_attr_intern() allocate new heap memory and copy data in it,
// or just copy the pointers (in attr). If second we, have not free() (call flush), because will do a mess.
			}
		}
		bgp->peer_self->rmap_type = 0;

		if (ret == RMAP_DENYMATCH) {
			/*
			 * If its a implicit withdraw due to routemap
			 * deny operation need to set the flag back.
			 * This is a convertion of update flow to
			 * withdraw flow.
			 */
			if (!withdraw &&
			    (!CHECK_FLAG(subgrp->sflags,
					 SUBGRP_STATUS_DEFAULT_ORIGINATE)))
				SET_FLAG(subgrp->sflags,
					 SUBGRP_STATUS_DEFAULT_ORIGINATE);
			withdraw = 1;
		}
	}

	/* Check if the default route is in local BGP RIB which is
	 * installed through redistribute or network command
	 */
	memset(&p, 0, sizeof(p));
	p.family = afi2family(afi);
	p.prefixlen = 0;
	dest = bgp_safi_node_lookup(bgp->rib[afi][safi_rib], safi_rib, &p,
				    NULL);

// Ivaylo: from this point down is your changes, Donald. Because the declarations on the top (attr = new_attr),
// and my comments up, I hope explained better why there could be cases when you may work with wrong 
// attr data down. (I may be wrong of course, because I am not familiar with all the picture)
// By the way, before your changes I think there was a leak, because attr never been flush in if(withdraw) 
// scope, and maybe can explains the leak on withdraw in #15459 (if among their testing routes was default
// route)

	if (withdraw) {
		/* Withdraw the default route advertised using default
		 * originate
		 */
		if (CHECK_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE))
			subgroup_default_withdraw_packet(subgrp);
		UNSET_FLAG(subgrp->sflags, SUBGRP_STATUS_DEFAULT_ORIGINATE);

		/* If default route is present in the local RIB, advertise the
		 * route
		 */
		if (dest) {
			for (pi = bgp_dest_get_bgp_path_info(dest); pi;
			     pi = pi->next) {
				if (!CHECK_FLAG(pi->flags, BGP_PATH_SELECTED))
					continue;

				if (subgroup_announce_check(dest, pi, subgrp,
							    bgp_dest_get_prefix(
								    dest),
							    &attr, NULL)) {
					if (!bgp_adj_out_set_subgroup(dest,
								      subgrp,
								      &attr, pi))
						bgp_attr_flush(&attr);
				} else
					bgp_attr_flush(&attr);
			}
			bgp_dest_unlock_node(dest);
		}
	} else {
		if (!CHECK_FLAG(subgrp->sflags,
				SUBGRP_STATUS_DEFAULT_ORIGINATE)) {

			/* The 'neighbor x.x.x.x default-originate' default will
			 * act as an
			 * implicit withdraw for any previous UPDATEs sent for
			 * 0.0.0.0/0 so
			 * clear adj_out for the 0.0.0.0/0 prefix in the BGP
			 * table.
			 */
			if (dest) {
				/* Remove the adjacency for the previously
				 * advertised default route
				 */
				adj = adj_lookup(
				       dest, subgrp,
				       BGP_ADDPATH_TX_ID_FOR_DEFAULT_ORIGINATE);
				if (adj != NULL) {
					/* Clean up previous advertisement.  */
					if (adj->adv)
						bgp_advertise_clean_subgroup(
							subgrp, adj);

					/* Free allocated information.  */
					adj_free(adj);
				}
				bgp_dest_unlock_node(dest);
			}

			/* Advertise the default route */
			if (bgp_in_graceful_shutdown(bgp))
				bgp_attr_add_gshut_community(new_attr);

			SET_FLAG(subgrp->sflags,
				 SUBGRP_STATUS_DEFAULT_ORIGINATE);
			subgroup_default_update_packet(subgrp, new_attr, from);
		}
	}

	aspath_unintern(&aspath);
}

At all I think this function is mess and its usage is mess, but more I wrote in the #14828 . Unfortunately I dont have the time and I need to learn and know much more for the all work flow to can rewrite it as I think it should be. Maybe during you write your patch, you have much better idea than me which/how work, and will greatly appreciate if you can help me fix #14828 , without screw up something else.

@ton31337
Copy link
Member

ton31337 commented Mar 8, 2024

Can we squash "fixup" commit?

StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Mar 8, 2024
…some-cases (#18246)

Why I did it
FRR issue
FRRouting/frr#15459
Add patch for FRR PR for 202305
FRRouting/frr#15466

Work item tracking
Microsoft ADO (number only):
26876083
How I did it
Add patch for FRR PR
FRRouting/frr#15466

How to verify it
Run the original case locally
@ton31337
Copy link
Member

Seems valid failures yet.

Currently in subgroup_default_originate the attr.aspath
is set in bgp_attr_default_set, which hashs the aspath
and creates a refcount for it.  If this is a withdraw
the subgroup_announce_check and bgp_adj_out_set_subgroup
is called which will intern the attribute.  This will
cause the the attr.aspath to be set to a new value
finally at the bottom of the function it intentionally
uninterns the aspath which is not the one that was
created for this function.  This reduces the other
aspath's refcount by 1 and if a clear bgp * is issued
fast enough the aspath for that will be removed
and the system will crash.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If unsuppress-map is setup for outgoing peers, consider that
policy is being applied as for RFC 8212.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Customer has this valgrind trace:

Direct leak of 2829120 byte(s) in 70728 object(s) allocated from:
  0 in community_new ../bgpd/bgp_community.c:39
  1 in community_uniq_sort ../bgpd/bgp_community.c:170
  2 in route_set_community ../bgpd/bgp_routemap.c:2342
  3 in route_map_apply_ext ../lib/routemap.c:2673
  4 in subgroup_announce_check ../bgpd/bgp_route.c:2367
  5 in subgroup_process_announce_selected ../bgpd/bgp_route.c:2914
  6 in group_announce_route_walkcb ../bgpd/bgp_updgrp_adv.c:199
  7 in hash_walk ../lib/hash.c:285
  8 in update_group_af_walk ../bgpd/bgp_updgrp.c:2061
  9 in group_announce_route ../bgpd/bgp_updgrp_adv.c:1059
 10 in bgp_process_main_one ../bgpd/bgp_route.c:3221
 11 in bgp_process_wq ../bgpd/bgp_route.c:3221
 12 in work_queue_run ../lib/workqueue.c:282

The above leak detected by valgrind was from a screenshot so I copied it
by hand.  Any mistakes in line numbers are purely from my transcription.
Additionally this is against a slightly modified 8.5.1 version of FRR.
Code inspection of 8.5.1 -vs- latest master shows the same problem
exists.  Code should be able to be followed from there to here.

What is happening:

There is a route-map being applied that modifes the outgoing community
to a peer.  This is saved in the attr copy created in
subgroup_process_announce_selected.  This community pointer is not
interned.  So the community->refcount is still 0.  Normally when
a prefix is announced, the attr and the prefix are placed on a
adjency out structure where the attribute is interned.  This will
cause the community to be saved in the community hash list as well.
In a non-normal operation when the decision to send is aborted after
the route-map application, the attribute is just dropped and the
pointer to the community is just dropped too, leading to situations
where the memory is leaked.  The usage of bgp suppress-fib would
would be a case where the community is caused to be leaked.
Additionally the previous commit where an unsuppress-map is used
to modify the outgoing attribute but since unsuppress-map was
not considered part of outgoing policy the attribute would be dropped as
well.  This pointer drop also extends to any dynamically allocated
memory saved by the attribute pointer that was not interned yet as well.

So let's modify the return case where the decision is made to
not send the prefix to the peer to always just flush the attribute
to ensure memory is not leaked.

Fixes: FRRouting#15459
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
@ton31337 ton31337 merged commit d1d79f4 into FRRouting:master Mar 14, 2024
9 checks passed
ton31337 added a commit that referenced this pull request Mar 14, 2024
ton31337 added a commit that referenced this pull request Mar 15, 2024
StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Apr 15, 2024
Why I did it
Porting fix for FRRouting/frr#15459

Adding patch for PR FRRouting/frr#15466

Work item tracking
Microsoft ADO (number only):
How I did it
Ported the fix FRRouting/frr#15466 to 8.5.1

How to verify it
Running the test_stress_route and ensure no memory leak
StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Apr 18, 2024
Porting PR #18272 to 202311.

Why I did it
Porting fix for FRRouting/frr#15459

Adding patch for PR FRRouting/frr#15466

Work item tracking
Microsoft ADO (number only):
How I did it
Ported the fix FRRouting/frr#15466 to 8.5.1

How to verify it
Running the test_stress_route and ensure no memory leak
StormLiangMS pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Apr 29, 2024
Why I did it
Upgrading FRR 8.5.4 to include latest fixes.

Work item tracking
Microsoft ADO (number only):
How I did it
New patches that were added:

Patch	FRR Pull request	Issue fixed
0024-lib-use-snmp-s-large-fd-sets-for-agentx.patch	FRRouting/frr#13396	FRRouting/frr#14143
0025-bgp-community-memory-leak-fix.patch	FRRouting/frr#15466	FRRouting/frr#15459
0026-bgp-fib-suppress-announce-fix.patch	FRRouting/frr#15634	FRRouting/frr#15626
0027-lib-Do-not-convert-EVPN-prefixes-into-IPv4-IPv6-if-n.patch	FRRouting/frr#15418	FRRouting/frr#14419
Removed patches:

Patch	Upstream FRR commit that is present in 8.5.4
0019-zebra-Abstract-dplane_ctx_route_init-to-init-route-w.patch	FRRouting/frr@3f01977
0020-zebra-Fix-crash-when-dplane_fpm_nl-fails-to-process-.patch	FRRouting/frr@fe5f624
0022-bgpd-Don-t-read-the-first-byte-of-ORF-header-if-we-a.patch	FRRouting/frr@3515178
0023-bgpd-Make-sure-we-have-enough-data-to-read-two-bytes.patch	FRRouting/frr@460ee93
0024-bgpd-Do-not-process-NLRIs-if-the-attribute-length-is.patch	FRRouting/frr@f291f1e
0025-bgpd-Use-treat-as-withdraw-for-tunnel-encapsulation-.patch	FRRouting/frr@8a4a88c
0026-zebra-Add-encap-type-when-building-packet-for-FPM.patch	FRRouting/frr@f0f7b28
0028-bgpd-Check-mandatory-attributes-more-carefully-for-U.patch	FRRouting/frr@21418d6
0029-bgpd-Handle-MP_REACH_NLRI-malformed-packets-with-ses.patch	FRRouting/frr@30b5c2a
0030-bgpd-Treat-EOR-as-withdrawn-to-avoid-unwanted-handli.patch	FRRouting/frr@01f232c
0031-bgpd-Ignore-handling-NLRIs-if-we-received-MP_UNREACH.patch	FRRouting/frr@a0c4ec2
0032-zebra-Fix-fpm-multipath-encap-addition.patch	FRRouting/frr@10a9a5f
Realigned patches:

Old Patch	New patch
0005-Add-support-of-bgp-l3vni-evpn.patch	0005-Add-support-of-bgp-l3vni-evpn.patch
0021-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch	0019-zebra-remove-duplicated-nexthops-when-sending-fpm-msg.patch
0027-zebra-Fix-non-notification-of-better-admin-won.patch	0020-zebra-Fix-non-notification-of-better-admin-won.patch
Disable-ipv6-src-address-test-in-pceplib.patch	0021-Disable-ipv6-src-address-test-in-pceplib.patch
cross-compile-changes.patch	0022-cross-compile-changes.patch
0033-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch	0023-zebra-The-dplane_fpm_nl-return-path-leaks-memory.patch
How to verify it
Running sonic-mgmt test suite.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants