Add support for Juniper CHASSIS and SYSTEM alerts #2358

Closed
1 task done
lunkwill42 opened this issue Mar 11, 2022 · 12 comments · Fixed by #2388

Labels
CNaaS Related to the CNaaS activity juniper

Comments

@lunkwill42
Member

lunkwill42 commented Mar 11, 2022

Juniper devices have a concept of alerts, classified into Chassis alerts and system alerts.

Their SNMP MIBs support fetching the number of current alerts, but no details of what the alerts actually are.

The CNaaS team wants NAV to be able to report that alerts have been flagged, but some design is needed to figure out how this should work in NAV.

Also needed:

@lunkwill42 lunkwill42 created this issue from a note in CNaaS deliverables (To do) Mar 11, 2022
@hmpf
Contributor

hmpf commented Mar 15, 2022

@lunkwill42 I cannot find the word "alert" in the juniper mibs, do you mean notification, trap or alarm? If not, where is this described/documented?

@lunkwill42
Member Author

lunkwill42 commented Mar 16, 2022

@hmpf I believe you want the JUNIPER-ALARM-MIB. It simply enumerates the number of present "red" alarms and "yellow" alarms in a device. There's a copy of it here: https://github.com/pgmillon/observium/blob/master/mibs/juniper/JUNIPER-ALARM-MIB

You may want to talk to Håvard E for some guidance, as Zino actually employs this MIB (and potentially others)...

@hmpf
Contributor

hmpf commented Mar 16, 2022

Seems like step one is adding that mib to NAV's own library of mibs then :) Is there a howto for that?

@hmpf
Contributor

hmpf commented Mar 16, 2022

According to that MIB there are yellow alarms and red alarms, a count for each, whether the status is on, off or other, and a timestamp for when the status last changed. It might be relevant to mark whether these alarms are enabled or not as well (via jnxAlarmRelayMode: other, passOn, cutOff).

@lunkwill42
Member Author

> Seems like step one is adding that mib to NAV's own library of mibs then :) Is there a howto for that?

Closest thing we have is this: https://nav.readthedocs.io/en/latest/hacking/adding-environment-probe-support.html?highlight=smidump#dumping-the-mib

@lunkwill42 lunkwill42 added CNaaS Related to the CNaaS activity juniper labels Mar 17, 2022
@hmpf
Contributor

hmpf commented Mar 17, 2022

I've asked Håvard E. about what zino does about this.

The JUNIPER-ALARM-MIB only exists on equipment that has a physical craft interface installed, supplying an actual LED to show the colors, and buttons to turn monitoring on or off for various subsystems. Older equipment generally has this interface, but at least the MX204 and MX10003 do not. There is still a counter for these alarms though; they're just not officially accessible via SNMP.

A workaround is to have a script on the equipment that periodically reads the values (show system alarms) and makes them available via a different MIB, the Utility MIB (mib-jnx-util).

gw> show system alarms | display xml 
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/19.4R0/junos">
    <alarm-information xmlns="http://xml.juniper.net/junos/19.4R0/junos-alarm">
        <alarm-summary>
            <no-active-alarms/>
        </alarm-summary>
    </alarm-information>
    <cli>
        <banner>{master}</banner>
    </cli>
</rpc-reply>

{master}
gw>
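
A script that reads this output needs to tell whether any alarms are active. A minimal sketch, assuming only the XML document itself (without the surrounding CLI prompt lines) is passed in; the function name `has_no_active_alarms` is hypothetical and the only element it relies on is the `<no-active-alarms/>` marker shown above:

```python
import xml.etree.ElementTree as ET


def has_no_active_alarms(xml_text):
    """Return True if the reply contains a <no-active-alarms/> element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree prefixes tags with "{namespace}"; keep the local name only
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "no-active-alarms":
            return True
    return False


# The reply from the output above, minus the CLI prompt lines
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/19.4R0/junos">
    <alarm-information xmlns="http://xml.juniper.net/junos/19.4R0/junos-alarm">
        <alarm-summary>
            <no-active-alarms/>
        </alarm-summary>
    </alarm-information>
    <cli>
        <banner>{master}</banner>
    </cli>
</rpc-reply>"""

print(has_no_active_alarms(SAMPLE))
```

Matching on the local tag name avoids hard-coding the namespace, which embeds the Junos version (19.4R0 here) and would presumably differ between releases.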

@hmpf
Contributor

hmpf commented Mar 17, 2022

When using the mib-jnx-util, zino uses the OIDs jnxUtilUintValue.82.101.100.65.108.97.114.109 for the red alarm counter and jnxUtilUintValue.89.101.108.108.111.119.65.108.97.114.109 for the yellow alarm counter.
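
The numeric suffixes on those OIDs are the ASCII codes of the instance names, one sub-identifier per character. A small sketch to convert in both directions (the helper names are made up for illustration):

```python
def oid_suffix_to_name(suffix):
    """Decode a dotted suffix such as '82.101.100' into its ASCII string."""
    return "".join(chr(int(part)) for part in suffix.strip(".").split("."))


def name_to_oid_suffix(name):
    """Encode an ASCII string as a dotted suffix, one sub-identifier per character."""
    return ".".join(str(ord(char)) for char in name)


RED_SUFFIX = "82.101.100.65.108.97.114.109"
YELLOW_SUFFIX = "89.101.108.108.111.119.65.108.97.114.109"

print(oid_suffix_to_name(RED_SUFFIX))     # RedAlarm
print(oid_suffix_to_name(YELLOW_SUFFIX))  # YellowAlarm
```

So the two OIDs above address the utility MIB instances named `RedAlarm` and `YellowAlarm`.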

@hmpf
Contributor

hmpf commented Mar 17, 2022

The steps so far seem to be:

  • Figure out how NAV should deal with this type of value/alarm
  • Add the JUNIPER-ALARM-MIB Docs
  • Use the JUNIPER-ALARM-MIB if it is available on the equipment polled
  • Figure out some way to get the alarms from equipment that doesn't support that MIB, whether via mib-jnx-util or something else.

@lunkwill42
Member Author

#2368 adds the necessary documentation for some of the command line utilities that are useful for testing SNMP OID compatibility with NAV...

@hmpf
Contributor

hmpf commented Apr 4, 2022

I've decided on making a new ipdevpoll-plugin just for these weirdos. It does not store the values, just dumps them into eventengine if not zero.

@hmpf
Contributor

hmpf commented Apr 6, 2022

If converting a MIB to Python with smidump using the -k flag, also raise the error level above 3 with the -l flag.

smidump -k -l 5 -f python  ./A.mib > A.py

If it complains "failed to locate MIB module foo", get the missing MIBs and preload them with the -p flag:

smidump -k -l 5 -f python  -p ./foo.mib ./A.mib > A.py
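
When several MIBs need converting, the same invocation can be scripted. This is only a sketch: the function names `smidump_command` and `dump_mib` are hypothetical, it uses only the flags shown above, and it assumes that `-p` can be repeated once per preloaded module:

```python
import subprocess


def smidump_command(mib_path, preload=(), error_level=5):
    """Build the smidump argument list used in the examples above."""
    cmd = ["smidump", "-k", "-l", str(error_level), "-f", "python"]
    for path in preload:
        # Assumption: one -p flag per MIB that must be preloaded
        cmd += ["-p", path]
    cmd.append(mib_path)
    return cmd


def dump_mib(mib_path, out_path, preload=()):
    """Run smidump and write the generated Python module to out_path."""
    with open(out_path, "w") as out_file:
        subprocess.run(smidump_command(mib_path, preload),
                       stdout=out_file, check=True)


print(" ".join(smidump_command("./A.mib", preload=["./foo.mib"])))
```

Calling `dump_mib("./A.mib", "A.py", preload=["./foo.mib"])` would then correspond to the second command above.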

@lunkwill42
Member Author

lunkwill42 commented Jun 17, 2022

After a design discussion with @knutvi, we have sort of concluded on a way forward.

For the CNaaS team, it's important to know at any one given time what the current number of yellow or red alerts in a Juniper chassis is.

I offered up some interpretations of this, and after some discussion, we decided that the cleanest implementation in a NAV context would be this design:

  1. When ipdevpoll detects that Juniper device D has >0 yellow alerts, it should post a start-state event to this effect, and include the alert count as a variable in the event varmap (the list of arbitrary attributes that can be attached to an event). Likewise, when it sees that Juniper device D has =0 yellow alerts, it should post a corresponding end-state event. This is close to what #2388 (Add support for Juniper CHASSIS and SYSTEM alerts) is currently homing in on.
  2. How to actually respond to these events is up to the eventengine, and a plugin P to handle these events needs to be written.
  3. When P receives a start event that is not deemed to be a duplicate, it should post a corresponding alert, and copy the alert count variable into the generated AlertHistory record.
  4. When P receives a start event for D that is deemed a duplicate, it should do some further verification: if the seemingly duplicate event has a different alert count than the existing AlertHistory entry, it should:
    1. Close the existing AlertHistory entry, but suppress a regular end-alert from being sent (or the end-user may end up receiving two notifications about the transition, rather than just one)
    2. Post a new pair of Alert and AlertHistory records with the new alert count.
  5. When P receives an end-event that is deemed to match an open AlertHistory record, that record should be closed.

This means that every change in the alert count that does not transition to a count of 0 should cause an entirely new AlertHistory state for D to be created, while any old ones are resolved. Any change in the alert count to 0 should resolve any existing corresponding AlertHistory states.
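
To make the intended state transitions concrete, here is a rough in-memory sketch of the handling described for plugin P. It is not NAV's eventengine API: `AlertState`, `AlarmCountHandler` and the `notifications` list are stand-ins invented for illustration, with `history` playing the role of AlertHistory records and `notifications` the role of alerts sent to users.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AlertState:
    """Stand-in for an AlertHistory record (hypothetical, not NAV's model)."""
    device: str
    color: str
    count: int
    is_open: bool = True


@dataclass
class AlarmCountHandler:
    """Stand-in for plugin P: tracks one open state per (device, color)."""
    open_states: Dict[Tuple[str, str], AlertState] = field(default_factory=dict)
    history: List[AlertState] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)

    def handle_start(self, device, color, count):
        key = (device, color)
        existing = self.open_states.get(key)
        if existing is not None:
            if existing.count == count:
                return  # genuine duplicate: nothing changes
            # Count changed: close the old state without sending an end-alert
            existing.is_open = False
        state = AlertState(device, color, count)
        self.open_states[key] = state
        self.history.append(state)
        self.notifications.append(f"start {device} {color} count={count}")

    def handle_end(self, device, color):
        state = self.open_states.pop((device, color), None)
        if state is None:
            return  # no matching open state to close
        state.is_open = False
        self.notifications.append(f"end {device} {color}")


handler = AlarmCountHandler()
handler.handle_start("D", "yellow", 2)  # new state, one notification
handler.handle_start("D", "yellow", 2)  # duplicate, ignored
handler.handle_start("D", "yellow", 3)  # old state closed silently, new one opened
handler.handle_end("D", "yellow")       # count back to 0, state resolved
print(handler.notifications)
```

In this sketch a count change from 2 to 3 produces exactly one new notification rather than an end/start pair, which is the behaviour step 4 asks for.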

CNaaS deliverables automation moved this from To do to Done Jun 1, 2023

2 participants