(S,G) Prune not received on FHR #7347

vijaykug · 2020-10-20T09:32:46Z

Issue -
CI/CD for pull req #7126 is failing after rebase
On further debugging found that , Prune is not sent from RP (R2)to FHR (R3) node , because of that FHR (s,g) mroute is having OIL towards R2 (RP) and R1 (LHR) , this is causing duplicate multicast traffic on LHR ( Traffic received via RPT and SPT path)

It looks to be regression , pull request #7126 was submitted on sep15, that time all the test were passed .
I have re-run this test locally with 15 sept build ( frr_7.6-dev-20200915-07-g171b364c7-0~ubuntu16.04.1_amd64.deb) this testcase is passing .

Topology used:

R2 is RP
R1 LHR
R3 FHR

CLI output

#################################################################
R2 (RP)node output
#################################################################
frr# show ip pim join
Interface Address Source Group State Uptime Expire Prune
r2-r1-eth0 10.0.1.2 * 225.1.1.1 JOIN 00:29:56 02:57 --:--
r2-r1-eth0 10.0.1.2 10.0.6.2 225.1.1.1 SGRpt(P) --:--:-- 02:57 --:-- <<<< (s,g,rpt) prune received on R2 from R1

frr# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
lo * 225.1.1.1 J 00:33:55 00:00:05 --:--:-- --:--:-- 1
r2-r3-eth1 10.0.6.2 225.1.1.1 J 00:33:55 00:00:27 --:--:-- 00:03:05 2 <<<<<< upstream still in join state

frr# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - RP-bit set, F - Register flag, T - SPT-bit set

Source Group Flags Proto Input Output TTL Uptime

              225.1.1.1          S        PIM     lo                    r2-r1-eth0       1    00:30:20

10.0.6.2 225.1.1.1 SRT PIM r2-r3-eth1 r2-r1-eth0 1 00:31:18 >>>>> RP node still keeping (s,g) mroute

###########################################################
R3 node output :-
###########################################################

frr# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - RP-bit set, F - Register flag, T - SPT-bit set

Source Group Flags Proto Input Output TTL Uptime
10.0.6.2 225.1.1.1 SFT PIM r3-r5-eth3 r3-r1-eth0 1 00:41:48
PIM r3-r2-eth1 1

[x ] Did you check if this is a duplicate issue?
[x ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Create testbed as per given topology
Send join and traffic from iperf
Wait for mroute convergence
Shut and no shut R1-R0 port from R1
or
Run testcase "test_multiple_groups_same_RP_address_p2" from pull req #7126 to repro this issue

Expected behavior

RP should sent prune toward FHR , and FHR should remove OIL towards RP after spt switchover

Versions

OS Kernel: Ubuntu 16.04
FRR Version: Master

will attach debug log for same for all the nodes

The text was updated successfully, but these errors were encountered:

vishaldhingra · 2020-10-20T11:02:22Z

Please assign this issue to me. (vishaldhingra)

vishaldhingra · 2020-11-02T08:58:31Z

RCA for mcast route (10.0.6.2,225.1.1.10)

The problem is at RP, RP is not sending (S,G) prune to R3 on the reception of (SGRpt) prune from R1.

r1: Shut the interface r1-r0-eth0 from R1 to R2
r2 receives (*,G) prune.
r2 removes the (*,G) Ifchannel state and marks the (SGRpt) to NOINFO.
r1: no Shut the interface r1-r0-eth0 from R1 to R2
r2 receives the (*,G) join
r2 creates the (*,G) Ifchannel state and due to presence of (SGRpt) Ifchannel it sets the channel_oil->oif_flags to PIM_OIF_FLAG_PROTO_PIM | PIM_OIF_FLAG_PROTO_STAR. r2 also sends the (S,G) join towards R3.
r2 receives the (SGRpt) prune from R1
As the flag was PIM_OIF_FLAG_PROTO_STAR also, It does not remove the OIF r2-r1-eth0 from mroute.
At this point R2 did not send the (S,G)prune towards R3.

Hence R1 is receiving duplicate traffic.

Here is the detailed explanation, with logs.
Test case is failing after performing below steps

Test step creating problem.

2020-11-01 08:30:41,737 INFO: STEP 39: 'r1: Shut the interface r1-r0-eth0 from R1 to R2'
2020-11-01 08:30:41,737 INFO: Shutting down interface : r1-r0-eth0

R1 is LHR and R2 having SGRpt ifchannel state.

R1 sends (*,G) prune towards R2.

R2 Logs for handling (*,G) prune

2020/11/01 22:45:56 PIM: pim_joinprune_recv: join/prune upstream=10.0.1.2 group=225.1.1.10/32 join_src=0 prune_src=1 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:45:56 PIM: recv_prune: prune (S,G)=(1.0.2.17,225.1.1.10) rpt=1 wc=2 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:45:56 PIM: recv_prune: Prune received with RP(1.0.2.17) for (1.0.2.17,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from JOIN to PRUNEP
2020/11/01 22:45:56 PIM: on_ifjoin_prune_pending_timer: IFCHANNEL(*,225.1.1.10) PRUNEP Prune Pending Timer Popped
2020/11/01 22:45:56 PIM: pim_forward_stop: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 install_it: 0 installed: 1
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from PRUNEP to NOINFO
2020/11/01 22:45:56 PIM: pimd/pim_ifchannel.c pim_ifchannel_ifjoin_switch: Prune(S,G)=(10.0.6.2,225.1.1.10) from (*,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL_DOWN: (S,G)=(*,225.1.1.10) on interface r2-r1-eth0
2020/11/01 22:45:56 PIM: pim_upstream_switch: PIM_UPSTREAM_(*,225.1.1.10): (S,G) old: Joined new: NotJoined
2020/11/01 22:45:56 PIM: pim_ifchannel_update_assert_tracking_desired: AssertTrackingDesired(*,225.1.1.10,r2-r1-eth0) changed from 1 to 0
2020/11/01 22:45:56 PIM: pim_forward_stop: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 install_it: 0 installed: 1
2020/11/01 22:45:56 PIM: pim_mroute_add(pim_channel_del_oif), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:45:56 PIM: pim_channel_del_oif(pim_forward_stop): (S,G)=(*,225.1.1.10): proto_mask=2 IIF:1 OIF=r2-r1-eth0 vif_index=2
2020/11/01 22:45:56 PIM: pim_ifchannel_delete: ifchannel entry (*,225.1.1.10)(r2-r1-eth0) del start
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_del_oif: no existing protocol mask 2(0) for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=0) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_del_oif: no existing protocol mask 4(0) for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=0) for channel (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:45:56 PIM: pim_upstream_del(pim_ifchannel_delete): Delete (*,225.1.1.10)[default] ref count: 1 , flags: 16 c_oil ref count 1 (Pre decrement)
2020/11/01 22:45:56 PIM: pim_upstream free vrf:default (*,225.1.1.10) flags 0x10
2020/11/01 22:45:56 PIM: pim_mroute_del(pim_upstream_del), vrf default Deleted Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:45:56 PIM: pim_channel_oil_del(pim_channel_oil_upstream_deref): Del oil for (225.1.1.10,*), Ref Count: 1 (Predecrement)
2020/11/01 22:45:56 PIM: pim_upstream_del: Deregister upstream (*,225.1.1.10) addr 1.0.2.17/32 with Zebra NHT
2020/11/01 22:45:56 PIM: pim_ifchannel_delete: ifchannel entry (*,225.1.1.10)(r2-r1-eth0) is deleted

At this point (*,G) Ifchannel state is deleted, but SGRpt ifchannel is still there with state NOINFO.

no shutdown

2020-11-01 08:30:41,894 INFO: STEP 40: 'r1: No Shut the interface r1-r0-eth0 from R1 to R2'
2020-11-01 08:30:41,894 INFO: Bringing up interface : r1-r0-eth0
2020-11-01 08:30:42,080 INFO: Waiting for SPT switchover..

R2 logs on reception of (*,G) join

2020/11/01 22:46:04 PIM: recv_join: join (S,G)=(1.0.2.17,225.1.1.10) rpt=1 wc=1 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_rpf_set_refresh_time: vrf(default) New last route change time: 1604299564412821
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_channel_oil_add(pim_upstream_new): c_oil (*,225.1.1.10) add
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_if_find_neighbor: neighbor not found for address 1.0.2.17 on interface lo
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): RPF'(*,225.1.1.10) not found: won't send join upstream
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): (S,G)=(*,225.1.1.10) source nexthop now is: interface=lo address=1.0.2.17 pref=0 metric=0
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): (S,G)=(*,225.1.1.10) RPF_interface(S) changed from <oldif?> to lo
2020/11/01 22:46:04 PIM: pim_upstream_mroute_iif_update(pim_upstream_new) (*,225.1.1.10) IIF: lo, OIFS:  mroute iif update 1
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_upstream_new), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:46:04 PIM: pim_upstream_new: Created Upstream (*,225.1.1.10) upstream_addr 1.0.2.17 ref count 1 increment
2020/11/01 22:46:04 PIM: pim_upstream_add(pim_ifchannel_add): (*,225.1.1.10), iif 0.0.0.0/0 (lo) found: 0: ref_count: 1
2020/11/01 22:46:04 PIM: pim_ifchannel_add: ifchannel (*,225.1.1.10)(r2-r1-eth0) is created
2020/11/01 22:46:04 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from NOINFO to JOIN
2020/11/01 22:46:04 PIM: pimd/pim_ifchannel.c pim_ifchannel_ifjoin_switch: Join(S,G)=(10.0.6.2,225.1.1.10) from (*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_channel_add_oif), vrf default Added Route: (10.0.6.2,225.1.1.10) IIF: r2-r3-eth1, OIFS: r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_ifchannel_ifjoin_switch): (S,G)=(10.0.6.2,225.1.1.10): proto_mask=4 OIF=r2-r1-eth0 vif_index=2: DONE
2020/11/01 22:46:04 PIM: pim_upstream_switch: PIM_UPSTREAM_(10.0.6.2,225.1.1.10): (S,G) old: NotJoined new: Joined
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_upstream_inherited_olist_decide): (S,G)=(10.0.6.2,225.1.1.10): proto_mask=2 OIF=r2-r1-eth0 vif_index=2 added to 0x6
2020/11/01 22:46:04 PIM: pim_upstream_send_join: RPF'(10.0.6.2,225.1.1.10)=10.0.4.2(Joined) for Interface r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_joinprune_send: from 10.0.4.2 on r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_joinprune_send: sending (G)=225.1.1.10 to upstream=10.0.4.2 on interface r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_msg_build_jp_groups : Join : 10.0.6.2
2020/11/01 22:46:04 PIM: pim_joinprune_send: interface r2-r3-eth1 num_joins 1 num_prunes 0
2020/11/01 22:46:04 PIM: pim_msg_send: to 224.0.0.13 on r2-r3-eth1: msg_size=34 checksum=dbd4


2020/11/01 22:46:04 PIM: join_timer_start: starting 60 sec timer for upstream (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: up (10.0.6.2,225.1.1.10) add to nbr r2-r3-eth1/10.0.4.2 jp-agg-list
2020/11/01 22:46:04 PIM: PIM_IFCHANNEL_UP: (S,G)=(*,225.1.1.10) on interface r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_upstream_switch: PIM_UPSTREAM_(*,225.1.1.10): (S,G) old: NotJoined new: Joined
2020/11/01 22:46:04 PIM: pim_ifchannel_update_assert_tracking_desired: AssertTrackingDesired(*,225.1.1.10,r2-r1-eth0) changed from 0 to 1
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_channel_add_oif), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS: r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_upstream_inherited_olist_decide): (S,G)=(*,225.1.1.10): proto_mask=2 OIF=r2-r1-eth0 vif_index=2: DONE
2020/11/01 22:46:04 PIM: pim_upstream_send_join: RPF'(*,225.1.1.10)=0.0.0.0(Joined) for Interface lo
2020/11/01 22:46:04 PIM: pim_upstream_send_join: can't send join upstream: RPF'(*,225.1.1.10)=0.0.0.0
2020/11/01 22:46:04 PIM: join_timer_start: starting 60 sec timer for upstream (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_forward_start: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 (1.0.2.17)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_ifchannel_set_star_g_join_state: JOIN (*,225.1.1.10) eom: 0 join 1

R2 logs on reception of (SGRpt) prune.

2020/11/01 22:46:04 PIM: recv_prune: prune (S,G)=(10.0.6.2,225.1.1.10) rpt=1 wc=0 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_joinprune_recv: SGRpt flag is set, del inherit oif from up (10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_del_oif: other protocol masks remain for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_ifchannel_set_star_g_join_state: JOIN (*,225.1.1.10) eom: 1 join 0
2020/11/01 22:46:04 PIM: Scheduling READ event on PIM socket fd=16
2020/11/01 22:46:04 PIM: on_ifjoin_prune_pending_timer: IFCHANNEL(10.0.6.2,225.1.1.10) SGRpt(PP) Prune Pending Timer Popped
2020/11/01 22:46:06 PIM: Sending Request for New Channel Oil Information(10.0.6.2,225.1.1.3) VIIF 3(default)
2020/11/01 22:46:06 PIM: Sending Request for New Channel Oil Information(10.0.6.2,225.1.1.7) VIIF 3(default)

Here R2 has not modified the route and have not sent prune towards R3.

vishaldhingra · 2020-11-02T09:05:20Z

Fix

r1: Shut the interface r1-r0-eth0 from R1 to R2
r2 receives (,G) prune.
r2 During the processing of (,G) prune, we are not removing the ifchannel on
PruneTmp or PrunePendingTmp state. In this scenario, stop joinExpiry timer and delete the ifchannel.

When (*,G) join and (SGRpt) prune comes again after no shut. PIMd will configure the mroute correctly and send the prune correctly.

problem : ========= When (*,G) prune received where we have SGRpt state, ifchannel goes to NO_INFO state and doesn't get removed. Root cause : ============ During the processing of (*,G) prune, we are not removing the ifchannel on PruneTmp or PrunePendingTmp state. Fix : ===== In that scenario, stop joinExpiry timer and delete the ifchannel. issue FRRouting#7347 Co-authored-by: Saravanan K <saravanank@vmware.com> Signed-off-by: vishaldhingra <vdhingra@vmware.com>

problem: table-id gets overwritten for a given route. RCA: table-id was getting overwritten from the NB layer, So route was getting installed with the latest table-id. Fix: make the table-id as the key in the NB layer. This will program the route in zebra correctly. - Removed the table-id modify callbacks. - Moved the validate and apply table-id changes to path-list creation issue FRRouting#7347 Signed-off-by: vishaldhingra <vdhingra@vmware.com>

vijaykug · 2021-01-28T17:29:09Z

working fine on latest master , closing this issue

issue FRRouting#7347 Signed-off-by: vishaldhingra <vdhingra@vmware.com>

problem: table-id gets overwritten for a given route. RCA: table-id was getting overwritten from the NB layer, So route was getting installed with the latest table-id. Fix: make the table-id as the key in the NB layer. This will program the route in zebra correctly. - Removed the table-id modify callbacks. - Moved the validate and apply table-id changes to path-list creation issue FRRouting#7347 Signed-off-by: vishaldhingra <vdhingra@vmware.com>

problem : ========= When (*,G) prune received where we have SGRpt state, ifchannel goes to NO_INFO state and doesn't get removed. Root cause : ============ During the processing of (*,G) prune, we are not removing the ifchannel on PruneTmp or PrunePendingTmp state. Fix : ===== In that scenario, stop joinExpiry timer and delete the ifchannel. issue FRRouting#7347 Co-authored-by: Saravanan K <saravanank@vmware.com> Signed-off-by: vishaldhingra <vdhingra@vmware.com>

vijaykug added the triage Needs further investigation label Oct 20, 2020

donaldsharp assigned vishaldhingra Oct 20, 2020

qlyoung added the pim label Oct 26, 2020

vishaldhingra mentioned this issue Nov 2, 2020

pimd: (*,G) Prune processing doesn't remove SGRpt ifchannel #7404

Merged

vijaykug closed this as completed Jan 28, 2021

vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Mar 4, 2021

(WIP)bgpd : peer-group configuration regression fixes

2faa8dc

issue FRRouting#7347 Signed-off-by: vishaldhingra <vdhingra@vmware.com>

vishaldhingra mentioned this issue Mar 4, 2021

bgpd: peer-group configuration regression fixes #8202

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(S,G) Prune not received on FHR #7347

(S,G) Prune not received on FHR #7347

vijaykug commented Oct 20, 2020 •

edited

vishaldhingra commented Oct 20, 2020 •

edited

vishaldhingra commented Nov 2, 2020

vishaldhingra commented Nov 2, 2020

vijaykug commented Jan 28, 2021

(S,G) Prune not received on FHR #7347

(S,G) Prune not received on FHR #7347

Comments

vijaykug commented Oct 20, 2020 • edited

vishaldhingra commented Oct 20, 2020 • edited

vishaldhingra commented Nov 2, 2020

vishaldhingra commented Nov 2, 2020

vijaykug commented Jan 28, 2021

vijaykug commented Oct 20, 2020 •

edited

vishaldhingra commented Oct 20, 2020 •

edited