zebra: When registering a nexthop, we do not always need to re-eval #2897

Merged: rwestphal merged 2 commits into FRRouting:master from zebra_rnh_fixup on Aug 25, 2018

donaldsharp
Member

Prior to this change, clients could register for nexthop
tracking, and zebra would look up the rnh and send that
particular client any known data. Additionally, zebra was
blindly re-evaluating the rnh on every registration.

As a result, every client registered for that nexthop
received a callback even if nothing had changed.

Modify the code to track whether the rnh has already been
evaluated and, if so, limit re-evaluation to when it is
absolutely necessary.

This matters because nht callbacks cause protocols to do a
not-insignificant amount of work, and as more protocols
register for nht callbacks we would cause more work than
is necessary.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
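
The idea described above can be illustrated with a minimal, self-contained sketch. It is not zebra's code: `rnh_sketch`, `add_rnh_sketch`, `evaluate_rnh_sketch`, and `register_client_sketch` are hypothetical names, and the fixed-size table and per-client counters are simplifying assumptions. The only point is the shape of the logic: a lookup-or-create call reports whether the entry already existed, and the full re-evaluation (which notifies every registered client) runs only for a newly created entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_RNH 16
#define MAX_CLIENTS 8

/* Hypothetical stand-in for a tracked nexthop; not zebra's struct rnh. */
struct rnh_sketch {
	char prefix[32];
	bool in_use;
	bool registered[MAX_CLIENTS];
	int notify_count[MAX_CLIENTS]; /* callbacks sent to each client */
};

static struct rnh_sketch rnh_table[MAX_RNH];

/* Find or create an entry; *exists reports which of the two happened. */
static struct rnh_sketch *add_rnh_sketch(const char *prefix, bool *exists)
{
	struct rnh_sketch *free_slot = NULL;

	*exists = false;
	for (size_t i = 0; i < MAX_RNH; i++) {
		if (rnh_table[i].in_use
		    && strcmp(rnh_table[i].prefix, prefix) == 0) {
			*exists = true;
			return &rnh_table[i];
		}
		if (!rnh_table[i].in_use && !free_slot)
			free_slot = &rnh_table[i];
	}
	if (!free_slot)
		return NULL; /* table full */

	memset(free_slot, 0, sizeof(*free_slot));
	strncpy(free_slot->prefix, prefix, sizeof(free_slot->prefix) - 1);
	free_slot->in_use = true;
	return free_slot;
}

/* Full re-evaluation: every registered client gets a callback. */
static void evaluate_rnh_sketch(struct rnh_sketch *rnh)
{
	for (int c = 0; c < MAX_CLIENTS; c++)
		if (rnh->registered[c])
			rnh->notify_count[c]++;
}

/* Returns 0 on success, -1 if no entry could be created. */
static int register_client_sketch(const char *prefix, int client)
{
	bool exists;
	struct rnh_sketch *rnh;

	assert(client >= 0 && client < MAX_CLIENTS);
	rnh = add_rnh_sketch(prefix, &exists);
	if (!rnh)
		return -1;

	rnh->registered[client] = true;
	if (exists)
		rnh->notify_count[client]++; /* known data, new client only */
	else
		evaluate_rnh_sketch(rnh);    /* first registration: evaluate */
	return 0;
}
```

In this sketch, a second client registering for the same prefix receives the known state, while the first client's callback count stays unchanged.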

@donaldsharp
Member Author

@mwinter-osr @rwestphal any help here on what has gone wrong?

@rwestphal
Member

@donaldsharp ldpd doesn't do nexthop tracking, so this failure doesn't make sense to me. I'll rerun the failed tests to rule out the possibility of this being a false warning. If it turns out to be a real problem, I'll take a closer look.

@@ -1064,7 +1065,7 @@ static void zread_rnh_register(ZAPI_HANDLER_ARGS)
 p.family);
 return;
 }
-rnh = zebra_add_rnh(&p, zvrf_id(zvrf), type);
+rnh = zebra_add_rnh(&p, zvrf_id(zvrf), type, &exist);
Member

nit: zebra_add_rnh() might return NULL in the case of an error, and we're not checking for this possibility here (this is a pre-existing problem). CS might complain that the exist variable might be used uninitialized in this case. Checking whether rnh is NULL would solve both problems. Other than that, the changes look good to me.

Member Author

Yeah, it is a bit of a pre-existing problem (but it is only going to happen when someone moves initialization around), and the code would quickly crash anyway due to the rnh deref. I'll make it behave a bit better though.
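
A small sketch of the pattern discussed in this thread, under stated assumptions: `entry_sketch`, `add_entry_sketch`, and `register_sketch` are hypothetical stand-ins and not the code that was eventually pushed. The helper can fail and, when it does, leaves the out-parameter untouched. The caller therefore initializes the flag and checks the returned pointer before using either value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracked entry; counts how often it was evaluated. */
struct entry_sketch {
	int evaluated;
};

/*
 * Hypothetical lookup-or-create helper. On failure it returns NULL
 * and does not write *exists, which is the case the review comment
 * is concerned about.
 */
static struct entry_sketch *add_entry_sketch(struct entry_sketch *slot,
					     bool already_present,
					     bool *exists)
{
	assert(exists != NULL);
	if (!slot)
		return NULL; /* simulated failure; *exists untouched */
	*exists = already_present;
	return slot;
}

/* Returns -1 on failure, 1 if an evaluation ran, 0 if it was skipped. */
static int register_sketch(struct entry_sketch *slot, bool already_present)
{
	bool exists = false; /* defined value even if the helper fails */
	struct entry_sketch *e;

	e = add_entry_sketch(slot, already_present, &exists);
	if (!e)
		return -1; /* never dereference e or trust exists here */

	if (!exists) {
		e->evaluated++;
		return 1;
	}
	return 0;
}
```

The NULL check covers both concerns at once: the pointer is never dereferenced on failure, and the flag is never read on a path where the helper did not set it.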

@NetDEF-CI
Collaborator

Continuous Integration Result: FAILED

See below for issues.
CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-4997/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.

Get source and apply patch from patchwork: Successful

Building Stage: Successful

Basic Tests: Failed

Addresssanitizer topotest: Successful
Debian 8 deb pkg check: Successful
Topology tests on Ubuntu 16.04 amd64: Successful
CentOS 6 rpm pkg check: Successful
Debian 9 deb pkg check: Successful
IPv4 protocols on Ubuntu 14.04: Successful
Ubuntu 14.04 deb pkg check: Successful
Static analyzer (clang): Successful
IPv6 protocols on Ubuntu 14.04: Successful
CentOS 7 rpm pkg check: Successful
Ubuntu 16.04 deb pkg check: Successful
Topotest tests on Ubuntu 16.04 i386: Successful
Ubuntu 12.04 deb pkg check: Successful
Fedora 24 rpm pkg check: Successful

IPv4 ldp protocol on Ubuntu 16.04: Failed

RFC Compliance Test ANVL-LDP-1.24 failing:
Test Summary
Send DUT labelled data which DUT should forward
Test Reference
Setup Verification
Test Classification
MUST
Test ANVL-LDP-1.24: !FAILED!
Did not receive Label Mapping Message from 192.168.0.101 for FEC 30.0.10.0/24

RFC Compliance Test ANVL-LDP-26.8 failing:
Test Summary
An LSR configured for Independent Control and Downstream Unsolicited
mode sends a mapping message when the LSR recognizes a new FEC via the
forwarding table.
Test Reference
RFC 3036, s3.5.7.1.1 p67 Independent Control Mapping
Test Classification
MUST
Test ANVL-LDP-26.8: !FAILED!
Did not receive Label Mapping for FEC 172.16.12.0/24


CLANG Static Analyzer Summary

  • Github Pull Request 2897, comparing to Git base SHA 3391232

No Changes in Static Analysis warnings compared to base

4 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-4997/artifact/shared/static_analysis/index.html

@donaldsharp
Member Author

@rwestphal I already reran! Hence me attempting to poke multiple people yesterday/today

@donaldsharp force-pushed the zebra_rnh_fixup branch 3 times, most recently from 332591e to b7ee96c, on August 25, 2018 11:21
@LabN-CI
Collaborator

LabN-CI commented Aug 25, 2018

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
Result: SUCCESS (git merge/2897 b7ee96c)
Date: 08/25/2018
Start: 07:30:23
Finish: 07:53:30
Run-Time: 23:07
Total: 1816
Pass: 1816
Fail: 0
Valgrind-Errors: 0
Valgrind-Loss: 0
Details: vncregress-2018-08-25-07:30:23.txt
Log: autoscript-2018-08-25-07:31:08.log.bz2

For details, please contact louberger

When we add or remove a nexthop that we need to track,
keep a count of how many times we have done so
for each nexthop. Consequently, keep track of the
number of available nexthops, so that we can
simply install a new route when we get one
that uses a pre-existing nexthop. A nexthop is
deleted when its refcount goes to 0.
Removal of routes is handled elsewhere.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
@FRRouting deleted a comment from LabN-CI on Aug 25, 2018
@FRRouting deleted a comment from NetDEF-CI on Aug 25, 2018
@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5039/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


CLANG Static Analyzer Summary

  • Github Pull Request 2897, comparing to Git base SHA 18d93bb

No Changes in Static Analysis warnings compared to base

4 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5039/artifact/shared/static_analysis/index.html

@FRRouting deleted a comment from LabN-CI on Aug 25, 2018
@FRRouting deleted a comment from NetDEF-CI on Aug 25, 2018
@FRRouting deleted a comment from NetDEF-CI on Aug 25, 2018
@LabN-CI
Collaborator

LabN-CI commented Aug 25, 2018

💚 Basic BGPD CI results: SUCCESS, 0 tests failed

Results table
Result: SUCCESS (git merge/2897 74f0a94)
Date: 08/25/2018
Start: 08:20:23
Finish: 08:43:34
Run-Time: 23:11
Total: 1816
Pass: 1816
Fail: 0
Valgrind-Errors: 0
Valgrind-Loss: 0
Details: vncregress-2018-08-25-08:20:23.txt
Log: autoscript-2018-08-25-08:21:09.log.bz2

For details, please contact louberger

@NetDEF-CI
Collaborator

Continuous Integration Result: SUCCESSFUL

Congratulations, this patch passed basic tests

Tested-by: NetDEF / OpenSourceRouting.org CI System

CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5040/

This is a comment from an EXPERIMENTAL automated CI system.
For questions and feedback in regards to this CI system, please feel free to email
Martin Winter - mwinter (at) opensourcerouting.org.


CLANG Static Analyzer Summary

  • Github Pull Request 2897, comparing to Git base SHA 18d93bb

No Changes in Static Analysis warnings compared to base

4 Static Analyzer issues remaining.

See details at
https://ci1.netdef.org/browse/FRR-FRRPULLREQ-5040/artifact/shared/static_analysis/index.html

@rwestphal merged commit 955cb66 into FRRouting:master on Aug 25, 2018
nhtd->refcount++;

if (nhtd->refcount > 1) {
static_nht_update(nhtd->nh, nhtd->nh_num,
Contributor

Just curious about this added clause: is there a condition where registering interest in a tracked nexthop would change the status of the nexthop?

Member Author

No, they should be independent. Registering a nexthop just tells zebra to let you know when something changes; that is it.
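
For readers following the refcount discussion, here is a minimal sketch of reference-counted nexthop tracking on the client side. It is an illustration under assumptions, not the code from this pull request: `nht_sketch`, `nht_add_sketch`, and `nht_del_sketch` are hypothetical names, and the two counters stand in for the register and unregister messages a real daemon would send to zebra. The first user of a nexthop triggers a registration, later users only bump the count, and the entry is released when the count returns to 0.

```c
#include <assert.h>
#include <string.h>

#define MAX_TRACKED 8

/* Hypothetical tracked-nexthop record. */
struct nht_sketch {
	char nh[46];       /* nexthop address as text */
	unsigned refcount; /* 0 means the slot is free */
};

static struct nht_sketch tracked[MAX_TRACKED];
static int register_calls;   /* stand-in for "register with zebra" */
static int unregister_calls; /* stand-in for "unregister with zebra" */

static struct nht_sketch *find_sketch(const char *nh)
{
	for (int i = 0; i < MAX_TRACKED; i++)
		if (tracked[i].refcount > 0
		    && strcmp(tracked[i].nh, nh) == 0)
			return &tracked[i];
	return NULL;
}

/* Returns the new refcount, or -1 if the table is full. */
static int nht_add_sketch(const char *nh)
{
	struct nht_sketch *e;

	assert(strlen(nh) < sizeof(tracked[0].nh));
	e = find_sketch(nh);
	if (e)
		return (int)++e->refcount; /* already tracked: reuse state */

	for (int i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i].refcount == 0) {
			strcpy(tracked[i].nh, nh);
			tracked[i].refcount = 1;
			register_calls++; /* first user: register once */
			return 1;
		}
	}
	return -1;
}

/* Returns the remaining refcount, or -1 if nh was not tracked. */
static int nht_del_sketch(const char *nh)
{
	struct nht_sketch *e = find_sketch(nh);

	if (!e)
		return -1;
	if (--e->refcount == 0)
		unregister_calls++; /* last user gone: unregister */
	return (int)e->refcount;
}
```

With this shape, adding the same nexthop twice produces a single registration, and only the second removal produces an unregistration.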

@donaldsharp deleted the zebra_rnh_fixup branch on September 13, 2018 14:39