Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(S,G) Prune not received on FHR #7347

Closed
vijaykug opened this issue Oct 20, 2020 · 4 comments
Closed

(S,G) Prune not received on FHR #7347

vijaykug opened this issue Oct 20, 2020 · 4 comments
Assignees
Labels
pim triage Needs further investigation

Comments

@vijaykug
Copy link
Contributor

vijaykug commented Oct 20, 2020

Issue -
CI/CD for pull req #7126 is failing after rebase
On further debugging found that , Prune is not sent from RP (R2)to FHR (R3) node , because of that FHR (s,g) mroute is having OIL towards R2 (RP) and R1 (LHR) , this is causing duplicate multicast traffic on LHR ( Traffic received via RPT and SPT path)

It looks to be regression , pull request #7126 was submitted on sep15, that time all the test were passed .
I have re-run this test locally with 15 sept build ( frr_7.6-dev-20200915-07-g171b364c7-0~ubuntu16.04.1_amd64.deb) this testcase is passing .

Topology used:

image

R2 is RP
R1 LHR
R3 FHR

CLI output

#################################################################
R2 (RP)node output
#################################################################
frr# show ip pim join
Interface Address Source Group State Uptime Expire Prune
r2-r1-eth0 10.0.1.2 * 225.1.1.1 JOIN 00:29:56 02:57 --:--
r2-r1-eth0 10.0.1.2 10.0.6.2 225.1.1.1 SGRpt(P) --:--:-- 02:57 --:-- <<<< (s,g,rpt) prune received on R2 from R1

frr# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
lo * 225.1.1.1 J 00:33:55 00:00:05 --:--:-- --:--:-- 1
r2-r3-eth1 10.0.6.2 225.1.1.1 J 00:33:55 00:00:27 --:--:-- 00:03:05 2 <<<<<< upstream still in join state

frr# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - RP-bit set, F - Register flag, T - SPT-bit set

Source Group Flags Proto Input Output TTL Uptime

  •               225.1.1.1          S        PIM     lo                    r2-r1-eth0       1    00:30:20
    

10.0.6.2 225.1.1.1 SRT PIM r2-r3-eth1 r2-r1-eth0 1 00:31:18 >>>>> RP node still keeping (s,g) mroute

###########################################################
R3 node output :-
###########################################################

frr# show ip mroute
IP Multicast Routing Table
Flags: S - Sparse, C - Connected, P - Pruned
R - RP-bit set, F - Register flag, T - SPT-bit set

Source Group Flags Proto Input Output TTL Uptime
10.0.6.2 225.1.1.1 SFT PIM r3-r5-eth3 r3-r1-eth0 1 00:41:48
PIM r3-r2-eth1 1

[x ] Did you check if this is a duplicate issue?
[x ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce
Create testbed as per given topology
Send join and traffic from iperf
Wait for mroute convergence
Shut and no shut R1-R0 port from R1
or
Run testcase "test_multiple_groups_same_RP_address_p2" from pull req #7126 to repro this issue

Expected behavior

RP should sent prune toward FHR , and FHR should remove OIL towards RP after spt switchover

Versions

  • OS Kernel: Ubuntu 16.04
  • FRR Version: Master

will attach debug log for same for all the nodes

@vijaykug vijaykug added the triage Needs further investigation label Oct 20, 2020
@vishaldhingra
Copy link
Contributor

vishaldhingra commented Oct 20, 2020

Please assign this issue to me. (vishaldhingra)

@vishaldhingra
Copy link
Contributor

RCA for mcast route (10.0.6.2,225.1.1.10)

The problem is at RP, RP is not sending (S,G) prune to R3 on the reception of (SGRpt) prune from R1.

  1. r1: Shut the interface r1-r0-eth0 from R1 to R2
  2. r2 receives (*,G) prune.
  3. r2 removes the (*,G) Ifchannel state and marks the (SGRpt) to NOINFO.
  4. r1: no Shut the interface r1-r0-eth0 from R1 to R2
  5. r2 receives the (*,G) join
  6. r2 creates the (*,G) Ifchannel state and due to presence of (SGRpt) Ifchannel it sets the channel_oil->oif_flags to PIM_OIF_FLAG_PROTO_PIM | PIM_OIF_FLAG_PROTO_STAR. r2 also sends the (S,G) join towards R3.
  7. r2 receives the (SGRpt) prune from R1
    As the flag was PIM_OIF_FLAG_PROTO_STAR also, It does not remove the OIF r2-r1-eth0 from mroute.
    At this point R2 did not send the (S,G)prune towards R3.

Hence R1 is receiving duplicate traffic.

Here is the detailed explanation, with logs.
Test case is failing after performing below steps

  1. Test step creating problem.
2020-11-01 08:30:41,737 INFO: STEP 39: 'r1: Shut the interface r1-r0-eth0 from R1 to R2'
2020-11-01 08:30:41,737 INFO: Shutting down interface : r1-r0-eth0

R1 is LHR and R2 having SGRpt ifchannel state.

  1. R1 sends (*,G) prune towards R2.

    R2 Logs for handling (*,G) prune

2020/11/01 22:45:56 PIM: pim_joinprune_recv: join/prune upstream=10.0.1.2 group=225.1.1.10/32 join_src=0 prune_src=1 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:45:56 PIM: recv_prune: prune (S,G)=(1.0.2.17,225.1.1.10) rpt=1 wc=2 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:45:56 PIM: recv_prune: Prune received with RP(1.0.2.17) for (1.0.2.17,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from JOIN to PRUNEP
2020/11/01 22:45:56 PIM: on_ifjoin_prune_pending_timer: IFCHANNEL(*,225.1.1.10) PRUNEP Prune Pending Timer Popped
2020/11/01 22:45:56 PIM: pim_forward_stop: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 install_it: 0 installed: 1
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from PRUNEP to NOINFO
2020/11/01 22:45:56 PIM: pimd/pim_ifchannel.c pim_ifchannel_ifjoin_switch: Prune(S,G)=(10.0.6.2,225.1.1.10) from (*,225.1.1.10)
2020/11/01 22:45:56 PIM: PIM_IFCHANNEL_DOWN: (S,G)=(*,225.1.1.10) on interface r2-r1-eth0
2020/11/01 22:45:56 PIM: pim_upstream_switch: PIM_UPSTREAM_(*,225.1.1.10): (S,G) old: Joined new: NotJoined
2020/11/01 22:45:56 PIM: pim_ifchannel_update_assert_tracking_desired: AssertTrackingDesired(*,225.1.1.10,r2-r1-eth0) changed from 1 to 0
2020/11/01 22:45:56 PIM: pim_forward_stop: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 install_it: 0 installed: 1
2020/11/01 22:45:56 PIM: pim_mroute_add(pim_channel_del_oif), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:45:56 PIM: pim_channel_del_oif(pim_forward_stop): (S,G)=(*,225.1.1.10): proto_mask=2 IIF:1 OIF=r2-r1-eth0 vif_index=2
2020/11/01 22:45:56 PIM: pim_ifchannel_delete: ifchannel entry (*,225.1.1.10)(r2-r1-eth0) del start
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_del_oif: no existing protocol mask 2(0) for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=0) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:45:56 PIM: pimd/pim_oil.c pim_channel_del_oif: no existing protocol mask 4(0) for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=0) for channel (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:45:56 PIM: pim_upstream_del(pim_ifchannel_delete): Delete (*,225.1.1.10)[default] ref count: 1 , flags: 16 c_oil ref count 1 (Pre decrement)
2020/11/01 22:45:56 PIM: pim_upstream free vrf:default (*,225.1.1.10) flags 0x10
2020/11/01 22:45:56 PIM: pim_mroute_del(pim_upstream_del), vrf default Deleted Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:45:56 PIM: pim_channel_oil_del(pim_channel_oil_upstream_deref): Del oil for (225.1.1.10,*), Ref Count: 1 (Predecrement)
2020/11/01 22:45:56 PIM: pim_upstream_del: Deregister upstream (*,225.1.1.10) addr 1.0.2.17/32 with Zebra NHT
2020/11/01 22:45:56 PIM: pim_ifchannel_delete: ifchannel entry (*,225.1.1.10)(r2-r1-eth0) is deleted

At this point (*,G) Ifchannel state is deleted, but SGRpt ifchannel is still there with state NOINFO.

  1. no shutdown
2020-11-01 08:30:41,894 INFO: STEP 40: 'r1: No Shut the interface r1-r0-eth0 from R1 to R2'
2020-11-01 08:30:41,894 INFO: Bringing up interface : r1-r0-eth0
2020-11-01 08:30:42,080 INFO: Waiting for SPT switchover..
  1. R2 logs on reception of (*,G) join
2020/11/01 22:46:04 PIM: recv_join: join (S,G)=(1.0.2.17,225.1.1.10) rpt=1 wc=1 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_rpf_set_refresh_time: vrf(default) New last route change time: 1604299564412821
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_channel_oil_add(pim_upstream_new): c_oil (*,225.1.1.10) add
2020/11/01 22:46:04 PIM: Lookedup: 0x1e69b30 for rp_info: 0x1e927d0(224.0.0.0/4) Lock: 2
2020/11/01 22:46:04 PIM: pim_if_find_neighbor: neighbor not found for address 1.0.2.17 on interface lo
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): RPF'(*,225.1.1.10) not found: won't send join upstream
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): (S,G)=(*,225.1.1.10) source nexthop now is: interface=lo address=1.0.2.17 pref=0 metric=0
2020/11/01 22:46:04 PIM: pim_rpf_update(pim_upstream_new): (S,G)=(*,225.1.1.10) RPF_interface(S) changed from <oldif?> to lo
2020/11/01 22:46:04 PIM: pim_upstream_mroute_iif_update(pim_upstream_new) (*,225.1.1.10) IIF: lo, OIFS:  mroute iif update 1
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_upstream_new), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS:
2020/11/01 22:46:04 PIM: pim_upstream_new: Created Upstream (*,225.1.1.10) upstream_addr 1.0.2.17 ref count 1 increment
2020/11/01 22:46:04 PIM: pim_upstream_add(pim_ifchannel_add): (*,225.1.1.10), iif 0.0.0.0/0 (lo) found: 0: ref_count: 1
2020/11/01 22:46:04 PIM: pim_ifchannel_add: ifchannel (*,225.1.1.10)(r2-r1-eth0) is created
2020/11/01 22:46:04 PIM: PIM_IFCHANNEL(r2-r1-eth0): (*,225.1.1.10) is switching from NOINFO to JOIN
2020/11/01 22:46:04 PIM: pimd/pim_ifchannel.c pim_ifchannel_ifjoin_switch: Join(S,G)=(10.0.6.2,225.1.1.10) from (*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_channel_add_oif), vrf default Added Route: (10.0.6.2,225.1.1.10) IIF: r2-r3-eth1, OIFS: r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_ifchannel_ifjoin_switch): (S,G)=(10.0.6.2,225.1.1.10): proto_mask=4 OIF=r2-r1-eth0 vif_index=2: DONE
2020/11/01 22:46:04 PIM: pim_upstream_switch: PIM_UPSTREAM_(10.0.6.2,225.1.1.10): (S,G) old: NotJoined new: Joined
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_upstream_inherited_olist_decide): (S,G)=(10.0.6.2,225.1.1.10): proto_mask=2 OIF=r2-r1-eth0 vif_index=2 added to 0x6
2020/11/01 22:46:04 PIM: pim_upstream_send_join: RPF'(10.0.6.2,225.1.1.10)=10.0.4.2(Joined) for Interface r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_joinprune_send: from 10.0.4.2 on r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_joinprune_send: sending (G)=225.1.1.10 to upstream=10.0.4.2 on interface r2-r3-eth1
2020/11/01 22:46:04 PIM: pim_msg_build_jp_groups : Join : 10.0.6.2
2020/11/01 22:46:04 PIM: pim_joinprune_send: interface r2-r3-eth1 num_joins 1 num_prunes 0
2020/11/01 22:46:04 PIM: pim_msg_send: to 224.0.0.13 on r2-r3-eth1: msg_size=34 checksum=dbd4


2020/11/01 22:46:04 PIM: join_timer_start: starting 60 sec timer for upstream (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: up (10.0.6.2,225.1.1.10) add to nbr r2-r3-eth1/10.0.4.2 jp-agg-list
2020/11/01 22:46:04 PIM: PIM_IFCHANNEL_UP: (S,G)=(*,225.1.1.10) on interface r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_upstream_switch: PIM_UPSTREAM_(*,225.1.1.10): (S,G) old: NotJoined new: Joined
2020/11/01 22:46:04 PIM: pim_ifchannel_update_assert_tracking_desired: AssertTrackingDesired(*,225.1.1.10,r2-r1-eth0) changed from 0 to 1
2020/11/01 22:46:04 PIM: pim_mroute_add(pim_channel_add_oif), vrf default Added Route: (*,225.1.1.10) IIF: lo, OIFS: r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_channel_add_oif(pim_upstream_inherited_olist_decide): (S,G)=(*,225.1.1.10): proto_mask=2 OIF=r2-r1-eth0 vif_index=2: DONE
2020/11/01 22:46:04 PIM: pim_upstream_send_join: RPF'(*,225.1.1.10)=0.0.0.0(Joined) for Interface lo
2020/11/01 22:46:04 PIM: pim_upstream_send_join: can't send join upstream: RPF'(*,225.1.1.10)=0.0.0.0
2020/11/01 22:46:04 PIM: join_timer_start: starting 60 sec timer for upstream (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_forward_start: (S,G)=(*,225.1.1.10) oif=r2-r1-eth0 (1.0.2.17)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_add_oif: existing protocol mask 2 requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(*,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_ifchannel_set_star_g_join_state: JOIN (*,225.1.1.10) eom: 0 join 1
  1. R2 logs on reception of (SGRpt) prune.
2020/11/01 22:46:04 PIM: recv_prune: prune (S,G)=(10.0.6.2,225.1.1.10) rpt=1 wc=0 upstream=10.0.1.2 holdtime=210 from 10.0.1.1 on r2-r1-eth0
2020/11/01 22:46:04 PIM: pim_joinprune_recv: SGRpt flag is set, del inherit oif from up (10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: pimd/pim_oil.c pim_channel_del_oif: other protocol masks remain for requested OIF r2-r1-eth0 (vif_index=2, min_ttl=1) for channel (S,G)=(10.0.6.2,225.1.1.10)
2020/11/01 22:46:04 PIM: pim_ifchannel_set_star_g_join_state: JOIN (*,225.1.1.10) eom: 1 join 0
2020/11/01 22:46:04 PIM: Scheduling READ event on PIM socket fd=16
2020/11/01 22:46:04 PIM: on_ifjoin_prune_pending_timer: IFCHANNEL(10.0.6.2,225.1.1.10) SGRpt(PP) Prune Pending Timer Popped
2020/11/01 22:46:06 PIM: Sending Request for New Channel Oil Information(10.0.6.2,225.1.1.3) VIIF 3(default)
2020/11/01 22:46:06 PIM: Sending Request for New Channel Oil Information(10.0.6.2,225.1.1.7) VIIF 3(default)

Here R2 has not modified the route and have not sent prune towards R3.

@vishaldhingra
Copy link
Contributor

Fix

r1: Shut the interface r1-r0-eth0 from R1 to R2
r2 receives (,G) prune.
r2 During the processing of (
,G) prune, we are not removing the ifchannel on
PruneTmp or PrunePendingTmp state. In this scenario, stop joinExpiry timer and delete the ifchannel.

When (*,G) join and (SGRpt) prune comes again after no shut. PIMd will configure the mroute correctly and send the prune correctly.

vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Nov 2, 2020
problem :
=========
When (*,G) prune received where we have SGRpt state,
ifchannel goes to NO_INFO state and doesn't get removed.

Root cause :
============
During the processing of (*,G) prune, we are not removing the
ifchannel on PruneTmp or PrunePendingTmp state.

Fix :
=====
In that scenario, stop joinExpiry timer and delete the ifchannel.

issue FRRouting#7347

Co-authored-by: Saravanan K <saravanank@vmware.com>
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Jan 10, 2021
problem: table-id gets overwritten for a given route.

RCA: table-id was getting overwritten from the NB layer,
     So route was getting installed with the latest table-id.

Fix: make the table-id as the key in the NB layer.
     This will program the route in zebra correctly.

- Removed the table-id modify callbacks.
- Moved the validate and apply table-id changes to path-list creation

issue FRRouting#7347

Signed-off-by: vishaldhingra <vdhingra@vmware.com>
vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Jan 12, 2021
problem: table-id gets overwritten for a given route.

RCA: table-id was getting overwritten from the NB layer,
     So route was getting installed with the latest table-id.

Fix: make the table-id as the key in the NB layer.
     This will program the route in zebra correctly.

- Removed the table-id modify callbacks.
- Moved the validate and apply table-id changes to path-list creation

issue FRRouting#7347

Signed-off-by: vishaldhingra <vdhingra@vmware.com>
vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Jan 15, 2021
problem: table-id gets overwritten for a given route.

RCA: table-id was getting overwritten from the NB layer,
     So route was getting installed with the latest table-id.

Fix: make the table-id as the key in the NB layer.
     This will program the route in zebra correctly.

- Removed the table-id modify callbacks.
- Moved the validate and apply table-id changes to path-list creation

issue FRRouting#7347

Signed-off-by: vishaldhingra <vdhingra@vmware.com>
@vijaykug
Copy link
Contributor Author

working fine on latest master , closing this issue

vishaldhingra added a commit to vishaldhingra/frr that referenced this issue Mar 4, 2021
issue FRRouting#7347

Signed-off-by: vishaldhingra <vdhingra@vmware.com>
taspelund pushed a commit to taspelund/frr that referenced this issue Oct 11, 2022
problem: table-id gets overwritten for a given route.

RCA: table-id was getting overwritten from the NB layer,
     So route was getting installed with the latest table-id.

Fix: make the table-id as the key in the NB layer.
     This will program the route in zebra correctly.

- Removed the table-id modify callbacks.
- Moved the validate and apply table-id changes to path-list creation

issue FRRouting#7347

Signed-off-by: vishaldhingra <vdhingra@vmware.com>
taspelund pushed a commit to taspelund/frr that referenced this issue Oct 11, 2022
problem :
=========
When (*,G) prune received where we have SGRpt state,
ifchannel goes to NO_INFO state and doesn't get removed.

Root cause :
============
During the processing of (*,G) prune, we are not removing the
ifchannel on PruneTmp or PrunePendingTmp state.

Fix :
=====
In that scenario, stop joinExpiry timer and delete the ifchannel.

issue FRRouting#7347

Co-authored-by: Saravanan K <saravanank@vmware.com>
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pim triage Needs further investigation
Projects
None yet
Development

No branches or pull requests

3 participants