Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SONiC] FRR did not completely read/load frr.conf and missed generating bgpd.log #13773

Closed
1 of 2 tasks
vaibhavhd opened this issue Jun 13, 2023 · 5 comments
Closed
1 of 2 tasks
Labels
triage Needs further investigation

Comments

@vaibhavhd
Copy link


Describe the bug

  • Did you check if this is a duplicate issue?
  • Did you test it on the latest FRRouting/frr master branch?

FRR version: 7.5.1-sonic
Operating system: SONiC 202012
Kernel: 4.19.0-12-2-amd64

FRR missed parts of config that was present in frr.conf file

Additionally, bgpd.log was not generated due to loading partial config.

Below is an example of config. When the issue happens FRR missed to load config contents after router bgp 4200065100.
That means, router-id config, graceful-restart config and announcing Loopback0 and Vlan routes was missed.

This caused production issue for SONiC.

Similar issues that I could find, but unsure if this is the same issue:
#12343
#12049

Common symptom from above issue: Config was missed.
Uncommon symptom: bgpd.log was also missed due to partial read of config.

root@str-7260cx3-acs-1:/etc/sonic/frr# cat frr.conf 
    !
! =========== Managed by sonic-cfggen DO NOT edit manually! ====================
! generated by templates/frr.conf.j2 with config DB data
! file: frr.conf
!
!
! template: common/daemons.common.conf.j2
!
hostname str-7260cx3-acs-1
password zebra
enable password zebra
!
log syslog informational
log facility local4
!
! end of template: common/daemons.common.conf.j2!
!
!
!
! Enable nht through default route
ip nht resolve-via-default
! Enable link-detect (default disabled)
interface PortChannel101
link-detect
!
interface PortChannel102
link-detect
!
interface PortChannel103
link-detect
!
interface PortChannel104
link-detect
!
!!
!
!
! add static ipv6 /64 loopback route to allow bgpd to advertise the loopback route prefix
ipv6 route fc00:1::/64 Loopback0
!!
!
! template: bgpd/bgpd.main.conf.j2
!
! bgp multiple-instance
!
! BGP configuration
!
! TSA configuration
!
ip prefix-list PL_LoopbackV4 permit 10.1.0.32/32
!
ipv6 prefix-list PL_LoopbackV6 permit fc00:1::/64
!
ip prefix-list LOCAL_VLAN_IPV4_PREFIX seq 5 permit 192.168.0.0/21
!
ipv6 prefix-list LOCAL_VLAN_IPV6_PREFIX seq 10 permit fc02:1000::/64
!
!
!
router bgp 4200065100
!
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
  no bgp ebgp-requires-policy
!
  bgp bestpath as-path multipath-relax
!
  bgp graceful-restart restart-time 240
  bgp graceful-restart
  bgp graceful-restart preserve-fw-state
  bgp graceful-restart select-defer-time 45
!
  bgp router-id 10.1.0.32
!
  network 10.1.0.32/32
!
  address-family ipv6
    network fc00:1::32/64
  exit-address-family
!
  network 192.168.0.1/21
  address-family ipv6
   network fc02:1000::1/64
  exit-address-family
!
!
!
  address-family ipv4
    maximum-paths 64
  exit-address-family
  address-family ipv6
    maximum-paths 64
  exit-address-family
!
! end of template: bgpd/bgpd.main.conf.j2
!!

To Reproduce

It is difficult to repro the issue.
However, when we laoded partial config we could repro the issue of missing bgpd.log.

Expected behavior

FRR should load / read all of config fully.
FRR should always generate bgpd.log.

Screenshots

Versions

  • OS Version:
  • Kernel:
  • FRR Version:

Additional context

@vaibhavhd vaibhavhd added the triage Needs further investigation label Jun 13, 2023
@StormLiangMS
Copy link

I suspect this issue is related this code, it doesn't check the errno when fgets done. It could hit some error during fgets. It should check errno, then it could try reopen the config file or drop an error msg.

/* Configuration make from file. */
int vtysh_config_from_file(struct vty *vty, FILE *fp)
{
int ret;
const struct cmd_element cmd;
int lineno = 0;
/
once we have an error, we remember & return that */
int retcode = CMD_SUCCESS;
char *vty_buf_copy = XCALLOC(MTYPE_VTYSH_CMD, VTY_BUFSIZ);
char *vty_buf_trimmed = NULL;

while (fgets(vty->buf, VTY_BUFSIZ, fp)) {
	lineno++;

@donaldsharp
Copy link
Member

Please check to see if this issue is seen in latest master. 7.5.1 is 2+ years old at this point in time and no-one is going to look at this

@StormLiangMS
Copy link

StormLiangMS commented Jun 14, 2023

hi @donaldsharp, I checked the code in master branch, it doesn't change, when read from frr.conf by vtysh -b, it assumes NULL is eof by default, but actually it could be an error for either hardware issue or corrupted content of frr.conf, when fgets return NULL. In current code, it could be silent failure and leave user no chance to capture and save it in first place. I'm working on a PR to enhance this.

while (fgets(vty->buf, VTY_BUFSIZ, fp)) {
lineno++;

Copy link

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot
Copy link

frrbot bot commented Dec 12, 2023

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Dec 19, 2023
@frrbot frrbot bot closed this as completed Dec 19, 2023
@frrbot frrbot bot removed the autoclose label Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage Needs further investigation
Projects
None yet
Development

No branches or pull requests

3 participants