Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FPM lack support of MPLS #11189

Closed
odd22 opened this issue May 12, 2022 · 10 comments
Closed

FPM lack support of MPLS #11189

odd22 opened this issue May 12, 2022 · 10 comments
Labels
fpm triage Needs further investigation

Comments

@odd22
Copy link
Member

odd22 commented May 12, 2022

MPLS labels are not propagated through FPM

If you activate MPLS on FRR with LDP or Segment Routing (OSPF or IS-IS), MPLS labels that are enforce on Linux Kernel through netlink interface are not propagate through the FPM interface. Within SONIC, the fpmsyncd client (see https://github.com/Azure/sonic-swss/tree/master/fpmsyncd and more particular onLabelRouteMsg function https://github.com/Azure/sonic-swss/blob/master/fpmsyncd/routesync.cpp#L786) is expecting to received message with family AF_MPLS to process MPLS labels, but no one are received. This issue concerns any FRR version including master branch.

  • [x ] Did you check if this is a duplicate issue?
  • [x ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce

Activate MPLS and IS-IS segment Routing. Connect any FPM client and look to what's message are sent by Zebra to the client through FPM interface e.g. with tcpdump.

Expected behavior

You should received message with family AF_MPLS.

Versions

Sonic barefoot image (version master.95332-dirty-20220502.215123), libopenSAI 1.9.1 + P4 SDE 9.9 and FRR 8.2.2.

Additional context

PR sonic-net/sonic-swss#1765 for fpmsyncd propose an alternative to solve this issue. Fpmsyncd is listen to the netlink socket instead of the FPM one. With this patch, it is possible to receive MPLS information, but, as mention in the PR, fpmsyncd could not received VRF_ID information.

Thus, FPM should support MPLS.

@odd22 odd22 added triage Needs further investigation fpm labels May 12, 2022
@mjstapp
Copy link
Contributor

mjstapp commented May 12, 2022

Just to be clear: FPM doesn't include LSPs currently, and it doesn't include nexthops' labels at all.

@odd22
Copy link
Member Author

odd22 commented May 12, 2022

@mjstapp Yes of course! It is what I saw when browsing the current zebra code.

The issue is more to discuss how to add such support and add one more item in out TODO list.

@rzalamena
Copy link
Member

@odd22 here is some information to help you figure out what is missing.

The FPM module dplane_fpm_nl supports MPLS if it is handled in zebra netlink encode/decode routines (are you talking about this FPM module or the other one?).

Browsing the code I see the following:

  1. If zebra learns and configures MPLS via netlink, then it should work because the data plane framework encapsulates these information in a southbound agnostic way. The FPM code that is supposed to handle route install/delete/update is here: https://github.com/FRRouting/frr/blob/master/zebra/dplane_fpm_nl.c#L721

    	case DPLANE_OP_ROUTE_INSTALL:
     	rv = netlink_route_multipath_msg_encode(
     		RTM_NEWROUTE, ctx, &nl_buf[nl_buf_len],
     		sizeof(nl_buf) - nl_buf_len, true, fnc->use_nhg);

    So if the netlink_route_multipath_msg_encode handles MPLS it should work as is. If you need a protobuf version you need to extend the module and write the appropriate protobuf spec / encoder / decoder.

    If the information is not available in the data plane context structure ( https://github.com/FRRouting/frr/blob/master/zebra/zebra_dplane.c#L351 ) then it needs to be extended to support it.

  2. LSP is also supported: https://github.com/FRRouting/frr/blob/master/zebra/dplane_fpm_nl.c#L773

     case DPLANE_OP_LSP_INSTALL:
     case DPLANE_OP_LSP_UPDATE:
     case DPLANE_OP_LSP_DELETE:
     	rv = netlink_lsp_msg_encoder(ctx, nl_buf, sizeof(nl_buf));
  3. It seems that at the moment zebra doesn't learn about MPLS routes installed in Linux: https://github.com/FRRouting/frr/blob/master/zebra/rt_netlink.c#L762

     /* We don't care about change notifications for the MPLS table. */
     /* TODO: Revisit this. */
     if (rtm->rtm_family == AF_MPLS)
     	return 0;

If we are talking about the other FPM module zebra_fpm_netlink.c then the appropriate route/LSP installation hook needs to be in place and you might need to write a netlink enconding function there.

@odd22
Copy link
Member Author

odd22 commented May 16, 2022

@rzalamena I'm referring to FPM module and not dplane_fpm_nl.

Indeed, within SONiC, zebra is launched with -M fpm option. I try to change for -M dplane_fpm_nl option, and configure zebra with fpm address 127.0.0.1 to let dplane_fpm_nl connect to the fpmsyncd process. Here, as you mention, MPLS information are properly sent to fpmsyncd process. However, this configuration is not working because the VRF ID is not transmitted, or wrongly transmitted. Thus, fpmsyncd reject all messages received from dplane_fpm_nl module and no routes are enforced within the ASIC.

So, in both case, FPM or DPLANE_FPM_NL modules need enhancement. FPM, to propagate MPLS labels, DPLANE_FPM_NL to propagate VRF ID. I don't which is the better nor which is the simple to implement.

@odd22
Copy link
Member Author

odd22 commented May 16, 2022

After dig around both FRR and SONiC code I finally understand how it exactly work.

First, SONiC needs to collect the VRF ID to properly enforce route within the ASIC. As such VRF IDs is not supported by netlink, they substitute the TABLE ID information by the VRF ID. For that purpose, they produced and apply a patch to FRR when building FRR for SONIC: see https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-frr/patch/0003-Use-vrf_id-for-vrf-not-tabled_id.patch

This patch is only pertinent for FPM not for the DPLANE_FPM_NL module. I produced a similar patch:

diff --git a/zebra/rt_netlink.c b/zebra/rt_netlink.c
index 2ff083dec..d80955a8f 100644
--- a/zebra/rt_netlink.c
+++ b/zebra/rt_netlink.c
@@ -1984,8 +1984,8 @@ ssize_t netlink_route_multipath_msg_encode(int cmd,
                }
        }
 #endif
-       /* Table corresponding to this route. */
-       table_id = dplane_ctx_get_table(ctx);
+       /* VRF corresponding to this route. */
+       table_id = dplane_ctx_get_vrf(ctx);
        if (table_id < 256)
                req->r.rtm_table = table_id;
        else {

and I'll make a try

@donaldsharp
Copy link
Member

yeah frankly this patch is wrong imo. Sonic has taken a shortcut and cut themselves into a corner. If they ever want to push a route into a table that is not a vrf they can not do so. If they need a mapping of table -to vrf they need to keep that in their code. Additionally this patch will never be acceptable in FRR land and is something that they need to carry for themselves

@odd22
Copy link
Member Author

odd22 commented May 16, 2022

@donaldsharp I'm completely agree with you. The way SONiC is handle VRF ID is not good. I just try to achieve a fast (yes dirty ;-)) patch to let fpmsyncd learnt MPLS table. Do you know if there is another solution to properly convey VRF ID through FPM/netlink ? As, if SONiC should maintain the mapping table_id / vrf_id, at a certain moment, they need to populate it, thus, to learn it. VRF ID could be retrieved from the configuration DB, but how to build this association table ?

And, of course, this patch is for SONiC build image. Not FRR. BTW, the patch is not complete as I also need to change rtm_table in netlink_mpls_multipath_msg_encode() function.

My first tests are not very concluant. I got Adjacency SID, but not prefix SID.

@donaldsharp
Copy link
Member

Frankly this is an data plane issue that needs to be addressed. Currently FRR only gets interface information via netlink commands but in reality this needs to be properly abstracted such that FRR provides interface up/down(etc) data plane interface and then people developing their own dataplane can program against it( and the different kernels we work with should be switched to these interfaces )

Currently we live in a bastardized world where this is a bit messy in this regards. I do know @mjstapp recently started doing some work here to finalize this. All in all this is a work in progress.

@github-actions
Copy link

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot
Copy link

frrbot bot commented Nov 13, 2022

This issue will be automatically closed in the specified period unless there is further activity.

@odd22 odd22 removed the autoclose label Nov 15, 2022
@frrbot frrbot bot closed this as completed Nov 20, 2022
@frrbot frrbot bot closed this as completed Nov 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fpm triage Needs further investigation
Projects
None yet
Development

No branches or pull requests

4 participants