# NetworkConfigParser

A small module for parsing structured documents such as Cisco or Juniper network device configurations. It maintains
the parent/child relationships among lines for ease of further parsing and analysis, and it parses IP addresses with
the `ipaddress` library for easier matching.

## Quick Start and Examples

---

### Example 1: Find references to an interface name

We will read in the example configuration shown below, stored as a string in the `config` variable. The `parse_from_str()` function parses this string and returns a list of `DocumentLine` objects.

In [54]:
from search_helpers import *
from parser import *
import ipaddress as ipa
import re  # provides re.IGNORECASE, used in later examples

config = """interface TenGigE0/1/0/1
 description Backbone Circuit to North Pudsey from Metric Networks
 cdp
 mtu 2060
 ipv4 address 192.0.2.101 255.255.255.252
 load-interval 30
!
interface Loopback10
 description Router ID
 ipv4 address 192.0.2.1 255.255.255.255
!
router isis IGP
 net 49.0000.1920.0000.2001.00
 log adjacency changes
 address-family ipv4 unicast
  metric-style wide
  mpls traffic-eng level-1-2
  mpls traffic-eng router-id Loopback10
 !
 interface Loopback10
  passive
  circuit-type level-2-only
  address-family ipv4 unicast
  !
 !
 interface TenGigE0/1/0/1
  circuit-type level-2-only
  point-to-point
  address-family ipv4 unicast
   metric 1000
   mpls ldp sync
!
rsvp
 interface TenGigE0/1/0/1
 !
 interface TenGigE0/0/0/0
 !
!
mpls traffic-eng
 interface TenGigE0/0/0/0
 !
 interface TenGigE0/1/0/1
 !
!
mpls ldp
 !
 igp sync delay on-session-up 10
 router-id 192.0.2.1
 !
 session protection
 !
 interface TenGigE0/0/0/0
 !
 interface TenGigE0/1/0/1
 !
router static
 0.0.0.0/0 192.0.2.102
 203.0.113.100/30 192.0.2.102
"""

doc_lines = parse_from_str(config)

doc_lines[0:6]

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">,
 <DocumentLine gen=2 num_children=0 line_num=3: " cdp">,
 <DocumentLine gen=2 num_children=0 line_num=4: " mtu 2060">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=2 num_children=0 line_num=6: " load-interval 30">]
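
The `gen` and `num_children` values in this output follow the indentation of each line. As a rough illustration only, and not the module's actual implementation, that kind of nesting can be rebuilt with a stack of open sections. The `Node` class and `build_tree()` function are hypothetical names, and the sketch assumes that leading spaces alone define the hierarchy.

```python
from typing import List, Optional, Tuple


class Node:
    """Hypothetical stand-in for a parsed line; not the module's DocumentLine."""

    def __init__(self, line_num: int, text: str, parent: Optional["Node"] = None):
        self.line_num = line_num
        self.text = text
        self.parent = parent
        self.children: List["Node"] = []
        # A top-level line is generation 1; each nesting level adds one.
        self.gen = parent.gen + 1 if parent else 1


def build_tree(text: str) -> List[Node]:
    """Link each line to the nearest preceding line with less indentation."""
    nodes: List[Node] = []
    stack: List[Tuple[int, Node]] = []  # (indent width, node) for open ancestors
    for line_num, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        # Close every open section indented at least as deeply as this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        node = Node(line_num, raw, parent)
        if parent:
            parent.children.append(node)
        nodes.append(node)
        stack.append((indent, node))
    return nodes


sample = """interface Loopback10
 description Router ID
 ipv4 address 192.0.2.1 255.255.255.255
!
router static
 0.0.0.0/0 192.0.2.102
"""

nodes = build_tree(sample)
for n in nodes:
    print(n.gen, len(n.children), n.line_num, repr(n.text))
```

Each printed row shows the generation, child count, line number, and text, mirroring the fields in the `DocumentLine` repr above.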

Next, let's identify lines pertaining to TenGigE0/1/0/1. The `find_lines()` function helps us do this.

We supply the list created in the step above, plus a search term identifying the lines we want matched. A string search term is treated as a regular expression and applied to each line with `re.search`.

In [22]:
find_lines(doc_lines, r'TenGigE0/1/0/1')

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=34: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=42: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=54: " interface TenGigE0/1/0/1">]
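
Because string terms are applied with `re.search`, a pattern can match anywhere in a line and is not anchored. The standalone sketch below uses only the standard `re` module on plain strings; the `TenGigE0/1/0/10` line is a made-up example that does not appear in the sample configuration. It shows why a longer interface name would also match, and how an end-of-line anchor narrows the search.

```python
import re

lines = [
    "interface TenGigE0/1/0/1",
    " interface TenGigE0/1/0/10",  # hypothetical longer name, not in the sample config
    " interface TenGigE0/0/0/0",
]

# Unanchored: any line containing the pattern as a substring matches.
loose = [l for l in lines if re.search(r"TenGigE0/1/0/1", l)]

# Anchored at end of line: the name must be the last thing on the line.
strict = [l for l in lines if re.search(r"TenGigE0/1/0/1$", l)]

print(loose)   # first two lines
print(strict)  # first line only
```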


By default, only the matched lines are returned, without their parent lines. In this case we want to see the section in which each interface reference appears, so we add `include_ancestors=True` to include the section header lines.

In [23]:
find_lines(doc_lines, r'TenGigE0/1/0/1', include_ancestors=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=1 num_children=7 line_num=12: "router isis IGP">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=1 num_children=4 line_num=33: "rsvp">,
 <DocumentLine gen=2 num_children=0 line_num=34: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=1 num_children=4 line_num=39: "mpls traffic-eng">,
 <DocumentLine gen=2 num_children=0 line_num=42: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=1 num_children=10 line_num=45: "mpls ldp">,
 <DocumentLine gen=2 num_children=0 line_num=54: " interface TenGigE0/1/0/1">]

The filtered list contains the matched interface lines with all section header lines ("router isis", "mpls ldp", etc.) leading to those matches.

We can also include immediate children of the matches:

In [24]:
find_lines(doc_lines, r'TenGigE0/1/0/1', include_children=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">,
 <DocumentLine gen=2 num_children=0 line_num=3: " cdp">,
 <DocumentLine gen=2 num_children=0 line_num=4: " mtu 2060">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=2 num_children=0 line_num=6: " load-interval 30">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=3 num_children=0 line_num=27: "  circuit-type level-2-only">,
 <DocumentLine gen=3 num_children=0 line_num=28: "  point-to-point">,
 <DocumentLine gen=3 num_children=2 line_num=29: "  address-family ipv4 unicast">,
 <DocumentLine gen=2 num_children=0 line_num=34: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=42: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=54: " interface TenGigE0/1/0/1">]

And we can include all descendants of the matched lines as well:

In [26]:
find_lines(doc_lines, r'TenGigE0/1/0/1', include_all_descendants=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">,
 <DocumentLine gen=2 num_children=0 line_num=3: " cdp">,
 <DocumentLine gen=2 num_children=0 line_num=4: " mtu 2060">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=2 num_children=0 line_num=6: " load-interval 30">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=3 num_children=0 line_num=27: "  circuit-type level-2-only">,
 <DocumentLine gen=3 num_children=0 line_num=28: "  point-to-point">,
 <DocumentLine gen=3 num_children=2 line_num=29: "  address-family ipv4 unicast">,
 <DocumentLine gen=4 num_children=0 line_num=30: "   metric 1000">,
 <DocumentLine gen=4 num_children=0 line_num=31: "   mpls ldp sync">,
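 <DocumentLine gen=2 num_children=0 line_num=34: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=42: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=54: " interface TenGigE0/1/0/1">]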

---

### Example: Controlling regex options

`find_lines()` permits setting the flags passed to `re.search`. Set `regex_flags=re.IGNORECASE` to perform case-insensitive matches:

In [29]:
find_lines(doc_lines, r'INTERFACE', regex_flags=re.IGNORECASE)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=1 num_children=2 line_num=8: "interface Loopback10">,
 <DocumentLine gen=2 num_children=4 line_num=20: " interface Loopback10">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=34: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=36: " interface TenGigE0/0/0/0">,
 <DocumentLine gen=2 num_children=0 line_num=40: " interface TenGigE0/0/0/0">,
 <DocumentLine gen=2 num_children=0 line_num=42: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=52: " interface TenGigE0/0/0/0">,
 <DocumentLine gen=2 num_children=0 line_num=54: " interface TenGigE0/1/0/1">]
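
The flags themselves come from the standard `re` module, where they are bit flags that can be combined with `|`. The sketch below demonstrates this with `re.search` directly. Whether `regex_flags` accepts a combined value in the same way is an assumption here, based only on the statement that the flags are handed to `re.search`.

```python
import re

line = " Interface TenGigE0/1/0/1"

# A case-sensitive search misses the capitalised keyword.
print(bool(re.search(r"interface", line)))                 # False
# IGNORECASE makes the same pattern match.
print(bool(re.search(r"interface", line, re.IGNORECASE)))  # True

# Several flags can be OR-ed together; VERBOSE allows whitespace and comments.
pattern = r"""
    ^\s*interface \s+   # keyword, possibly indented
    (\S+)               # interface name
"""
m = re.search(pattern, line, re.IGNORECASE | re.VERBOSE)
print(m.group(1))  # TenGigE0/1/0/1
```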

---

### Example: Matching successive children

Let's extract interface metric settings from "router isis" above. We can do this easily by giving `find_lines()` multiple search terms in the form of a list or a tuple.

We'll keep ancestors in the result for readability.

In [31]:
find_lines(doc_lines, ['interface', 'metric'], regex_flags=re.IGNORECASE, include_ancestors=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">,
 <DocumentLine gen=1 num_children=7 line_num=12: "router isis IGP">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=3 num_children=2 line_num=29: "  address-family ipv4 unicast">,
 <DocumentLine gen=4 num_children=0 line_num=30: "   metric 1000">]

With multiple search terms, each match may sit at any depth below the previous one. The only requirement
is that the second term is found in some descendant of a line matching the first term, the third term in
some descendant of a line matching the second term, and so on.

To perform stricter matching, where each term must match an immediate child of the previous match,
add `recurse_search=False`:

In [33]:
find_lines(doc_lines, ['interface', 'metric'], regex_flags=re.IGNORECASE, include_ancestors=True, recurse_search=False)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">]

Now our IS-IS metric line is missing; only the interface description remains, because its text contains the word "Metric". This makes sense, because `interface` and `metric` are separated in the IS-IS config by the "address-family ipv4 unicast" line. Let's add that term in:

In [38]:
find_lines(doc_lines, ['interface', 'address-family', 'metric'], regex_flags=re.IGNORECASE, include_ancestors=True, recurse_search=False)

[<DocumentLine gen=1 num_children=7 line_num=12: "router isis IGP">,
 <DocumentLine gen=2 num_children=3 line_num=26: " interface TenGigE0/1/0/1">,
 <DocumentLine gen=3 num_children=2 line_num=29: "  address-family ipv4 unicast">,
 <DocumentLine gen=4 num_children=0 line_num=30: "   metric 1000">]

Now we see our result as expected.
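
The difference between descendant and immediate-child matching can be sketched on a toy tree. This is only an illustration of the idea described in this example, not the module's code: `Node` and `find_chain()` are hypothetical names, and the function returns just the lines matching the last term.

```python
import re
from typing import Iterable, List, Sequence


class Node:
    """Hypothetical tree node used only for this illustration."""

    def __init__(self, text: str, children: Sequence["Node"] = ()):
        self.text = text
        self.children = list(children)

    def descendants(self) -> Iterable["Node"]:
        """Yield children, grandchildren, and so on, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


def find_chain(roots: Sequence[Node], terms: Sequence[str], recurse: bool = True) -> List[Node]:
    """Return nodes matching the last term, reached via matches of each earlier term."""
    # The first term may match any line in the document.
    everything = [n for r in roots for n in (r, *r.descendants())]
    matched = [n for n in everything if re.search(terms[0], n.text)]
    for term in terms[1:]:
        next_matched = []
        for node in matched:
            # recurse=True searches all descendants; False only immediate children.
            pool = node.descendants() if recurse else node.children
            next_matched.extend(n for n in pool if re.search(term, n.text))
        matched = next_matched
    return matched


isis = Node("router isis IGP", [
    Node(" interface TenGigE0/1/0/1", [
        Node("  address-family ipv4 unicast", [Node("   metric 1000")]),
    ]),
])

loose = find_chain([isis], ["interface", "metric"])
strict = find_chain([isis], ["interface", "metric"], recurse=False)
stepwise = find_chain([isis], ["interface", "address-family", "metric"], recurse=False)

print([n.text for n in loose])     # ['   metric 1000']
print([n.text for n in strict])    # []
print([n.text for n in stepwise])  # ['   metric 1000']
```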

---

### Example: Callback functions as search terms

Let's find interfaces matching the IP 192.0.2.1.

We start with searching interfaces:

In [43]:
intf_search_result = find_lines(doc_lines, ['interface', '192.0.2.1'], regex_flags=re.IGNORECASE, include_ancestors=True)
intf_search_result

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=1 num_children=2 line_num=8: "interface Loopback10">,
 <DocumentLine gen=2 num_children=0 line_num=10: " ipv4 address 192.0.2.1 255.255.255.255">]

Our result includes TenGigE0/1/0/1, whose address 192.0.2.101 contains the text matched by the regex but is not the address we want.
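
The over-match is ordinary regex behaviour and can be reproduced with the standard `re` module alone. The sketch below shows the naive pattern hitting both addresses, then a tighter pattern with escaped dots and digit lookarounds. This is one possible regex-only workaround, shown for comparison; it is not something the module requires.

```python
import re

lines = [
    " ipv4 address 192.0.2.101 255.255.255.252",
    " ipv4 address 192.0.2.1 255.255.255.255",
]

# Naive pattern: "192.0.2.1" is a prefix of "192.0.2.101", so both lines match.
naive = [l for l in lines if re.search(r"192.0.2.1", l)]

# Escape the dots and forbid a digit directly before or after the address.
exact = [l for l in lines if re.search(r"(?<!\d)192\.0\.2\.1(?!\d)", l)]

print(len(naive))  # 2
print(exact)       # only the 192.0.2.1 line
```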

`DocumentLine` objects parse anything that looks like an IP address or network and store it as an object from the excellent `ipaddress` module. These are kept in the `ip_addrs` and `ip_nets` attributes, respectively:

In [46]:
intf_search_result[1].ip_addrs, intf_search_result[3].ip_addrs

(frozenset({IPv4Address('192.0.2.101')}),
 frozenset({IPv4Address('192.0.2.1')}))

In [47]:
intf_search_result[1].ip_nets, intf_search_result[3].ip_nets

(frozenset({IPv4Network('192.0.2.100/30')}),
 frozenset({IPv4Network('192.0.2.1/32')}))
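
The networks shown here agree with what the standard `ipaddress` module derives from an address and a dotted netmask. How the parser pairs the two internally is not described in this document, so the sketch below only demonstrates the standard-library behaviour.

```python
import ipaddress as ipa

# An address and a dotted netmask combine into an interface object.
intf = ipa.ip_interface("192.0.2.101/255.255.255.252")
print(intf.ip)       # 192.0.2.101
print(intf.network)  # 192.0.2.100/30

# A host mask yields a /32 network holding only that address.
loopback = ipa.ip_interface("192.0.2.1/255.255.255.255")
print(loopback.network)  # 192.0.2.1/32
```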

`DocumentLine` objects even have a method, `has_ip()`, for matching against IPv4 or IPv6 address and network objects:

In [49]:
intf_search_result[1].has_ip(ipa.ip_address('192.0.2.1')), intf_search_result[3].has_ip(ipa.ip_address('192.0.2.1'))

(False, True)

A regular expression can't make use of the `has_ip()` method, because it only searches the text of the line.

`find_lines()` can also take a function as a search term. The function must take a `DocumentLine` as its first and only argument, and return a bool to indicate a successful match.

Lambda expressions, therefore, work very well here.

In [50]:
find_lines(doc_lines, ['interface', lambda x: x.has_ip(ipa.ip_address('192.0.2.1'))], regex_flags=re.IGNORECASE, include_ancestors=True)

[<DocumentLine gen=1 num_children=2 line_num=8: "interface Loopback10">,
 <DocumentLine gen=2 num_children=0 line_num=10: " ipv4 address 192.0.2.1 255.255.255.255">]

Behold the power of functions as search terms. Any attribute of `DocumentLine` can be used to decide a match.
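
One way to picture how string and function terms can coexist is a small dispatcher that checks whether a term is callable. This is a guess at the general technique, not the module's implementation: `term_matches()` is a hypothetical helper, and it works on plain strings, whereas `find_lines()` passes a `DocumentLine` to the function.

```python
import re
from typing import Callable, List, Union

# A term is either a regex string or a predicate; this alias is hypothetical.
Term = Union[str, Callable[[str], bool]]


def term_matches(term: Term, line: str, flags: int = 0) -> bool:
    """Apply one search term to one line of text."""
    if callable(term):
        # Function terms decide for themselves; coerce the result to bool.
        return bool(term(line))
    # String terms are treated as regular expressions.
    return re.search(term, line, flags) is not None


lines: List[str] = [
    "interface Loopback10",
    " description Router ID",
    " ipv4 address 192.0.2.1 255.255.255.255",
]

by_regex = [l for l in lines if term_matches(r"LOOPBACK", l, re.IGNORECASE)]
by_func = [l for l in lines if term_matches(lambda s: s.startswith(" "), l)]

print(by_regex)  # ['interface Loopback10']
print(by_func)   # the two indented lines
```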

To find all /30 networks defined in the configuration:

In [56]:
find_lines(doc_lines, lambda x: any(i.prefixlen == 30 for i in x.ip_nets), include_ancestors=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=1 num_children=2 line_num=56: "router static">,
 <DocumentLine gen=2 num_children=0 line_num=58: " 203.0.113.100/30 192.0.2.102">]

Find addresses that lie within a particular /30:

In [60]:
find_lines(doc_lines, lambda x: any(i in ipa.ip_network('192.0.2.100/30') for i in x.ip_addrs), include_ancestors=True)

[<DocumentLine gen=1 num_children=5 line_num=1: "interface TenGigE0/1/0/1">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=1 num_children=2 line_num=56: "router static">,
 <DocumentLine gen=2 num_children=0 line_num=57: " 0.0.0.0/0 192.0.2.102">,
 <DocumentLine gen=2 num_children=0 line_num=58: " 203.0.113.100/30 192.0.2.102">]
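
When the same kind of test is needed for several networks, a named predicate factory can be easier to read than repeated lambdas. The sketch below is an assumption-laden illustration: `addr_within()` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that mimic only the `ip_addrs` attribute described above, so the code runs without the parser.

```python
import ipaddress as ipa
from types import SimpleNamespace


def addr_within(network: str):
    """Build a predicate that is true for lines holding an address inside `network`."""
    net = ipa.ip_network(network)

    def predicate(line) -> bool:
        return any(addr in net for addr in line.ip_addrs)

    return predicate


# Stand-ins for parsed lines; real code would use the objects from parse_from_str().
sample_lines = [
    SimpleNamespace(text=" ipv4 address 192.0.2.101 255.255.255.252",
                    ip_addrs=frozenset({ipa.ip_address("192.0.2.101")})),
    SimpleNamespace(text=" ipv4 address 192.0.2.1 255.255.255.255",
                    ip_addrs=frozenset({ipa.ip_address("192.0.2.1")})),
    SimpleNamespace(text=" 0.0.0.0/0 192.0.2.102",
                    ip_addrs=frozenset({ipa.ip_address("0.0.0.0"),
                                        ipa.ip_address("192.0.2.102")})),
    SimpleNamespace(text=" mtu 2060", ip_addrs=frozenset()),
]

in_p2p = addr_within("192.0.2.100/30")
hits = [l.text for l in sample_lines if in_p2p(l)]
print(hits)
```

A function built this way takes a single line argument and returns a bool, which is the shape required of a function search term.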

Find all second-generation lines, those with an immediate parent but no grandparent:

In [61]:
find_lines(doc_lines, lambda x: x.gen == 2)

[<DocumentLine gen=2 num_children=0 line_num=2: " description Backbone Circuit to North Pudsey from Metric Networks">,
 <DocumentLine gen=2 num_children=0 line_num=3: " cdp">,
 <DocumentLine gen=2 num_children=0 line_num=4: " mtu 2060">,
 <DocumentLine gen=2 num_children=0 line_num=5: " ipv4 address 192.0.2.101 255.255.255.252">,
 <DocumentLine gen=2 num_children=0 line_num=6: " load-interval 30">,
 <DocumentLine gen=2 num_children=0 line_num=9: " description Router ID">,
 <DocumentLine gen=2 num_children=0 line_num=10: " ipv4 address 192.0.2.1 255.255.255.255">,
 <DocumentLine gen=2 num_children=0 line_num=13: " net 49.0000.1920.0000.2001.00">,
 <DocumentLine gen=2 num_children=0 line_num=14: " log adjacency changes">,
 <DocumentLine gen=2 num_children=3 line_num=15: " address-family ipv4 unicast">,
 <DocumentLine gen=2 num_children=0 line_num=19: " !">,
 <DocumentLine gen=2 num_children=4 line_num=20: " interface Loopback10">,
 <DocumentLine gen=2 num_children=0 line_num=25: " !">,
 ...]

---

### Example 3: Extracting all IP addresses and networks referenced in a configuration

IPs are easy to gather with a set comprehension (shown below) or a list comprehension.

In [62]:
all_ip_addrs = {j for i in doc_lines for j in i.ip_addrs if not i.is_comment}
all_ip_addrs

{IPv4Address('0.0.0.0'),
 IPv4Address('192.0.2.1'),
 IPv4Address('192.0.2.101'),
 IPv4Address('192.0.2.102'),
 IPv4Address('203.0.113.100')}

In [63]:
all_ip_networks = {j for i in doc_lines for j in i.ip_nets if not i.is_comment}
all_ip_networks

{IPv4Network('0.0.0.0/0'),
 IPv4Network('192.0.2.1/32'),
 IPv4Network('192.0.2.100/30'),
 IPv4Network('203.0.113.100/30')}
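
Sets are unordered, so for reporting it can help to sort the results. Address and network objects from `ipaddress` compare numerically, which the sketch below relies on. It rebuilds the two sets by hand from the values shown above so that it runs on its own; the variable names are illustrative.

```python
import ipaddress as ipa

addrs = {ipa.ip_address(a) for a in
         ("192.0.2.101", "0.0.0.0", "192.0.2.1", "203.0.113.100", "192.0.2.102")}
nets = {ipa.ip_network(n) for n in
        ("0.0.0.0/0", "192.0.2.1/32", "192.0.2.100/30", "203.0.113.100/30")}

# Numeric ordering, not string ordering.
ordered_addrs = sorted(addrs)

# Drop the default route, then sort the remaining networks.
specific_nets = sorted(n for n in nets if n.prefixlen > 0)

# Map each address to the non-default networks that contain it.
covered = {str(a): [str(n) for n in specific_nets if a in n] for a in ordered_addrs}

for addr, containing in covered.items():
    print(addr, containing)
```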

## TODO parent_child_cb

Let's find interfaces with metric configured. But we want only the interface name returned, not the metric value or anything else.

We can start with our earlier example, where we introduced list-based search terms:

## TODO convert_match and convert_family

## TODO flatten_family
