## Analyzing Configuration Drift

When debugging network issues, it is important to understand how the network configuration has changed relative to the recent past or to the desired golden state, and what impact those changes have. A text diff of the configs is one way to do this, but it tends to be too noisy. It reports many differences that you may not care about (e.g., changes in whitespace or timestamps), and it is hard to control what is reported. Text diffs also do not tell you about the impact of a change, such as which new flows will be permitted or whether some BGP edges go down.

Batfish parses device configs and builds a vendor-neutral model of them. This model lets you learn how two snapshots of network configuration differ along exactly the aspects you care about. Batfish's behavior modeling also lets you understand the full impact of these changes. This notebook illustrates this capability.

We focus on the following differences:
 1. Some node- and interface-level configuration settings
 1. Some settings of BGP processes and neighbors
 1. Structures defined in device configs 
 1. Undefined references
 1. BGP adjacencies
 1. ACL lines that treat flows differently

Simple text diffs won't be useful for anything but the first two items. The next two items require understanding the structure of the config, and the final two require understanding the network behavior that the configs induce.

We picked these as examples of different types of changes that you can analyze using Batfish. You may be interested in different aspects of your network, and you should be able to adapt the code below to suit your needs.
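
To make the noise problem concrete, here is a minimal sketch using Python's standard `difflib`. The two config snippets and the `parse_settings` helper are hypothetical illustrations, not Batfish code: the helper extracts two settings into a dictionary, so whitespace and a timestamp comment no longer register as differences.

```python
import difflib

# Two hypothetical versions of a config: only a timestamp comment and whitespace differ.
REFERENCE = """! Last configuration change at 20:32:35 Fri Mar 19 2021
ip domain name lab.local
ntp server 18.18.18.18
"""
SNAPSHOT = """! Last configuration change at 14:32:04 Tue Mar 30 2021
ip domain name  lab.local
ntp server 18.18.18.18
"""


def text_diff(old, new):
    """Return only the changed lines ('-' or '+' prefixed) of a unified text diff."""
    lines = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    # Drop the '---'/'+++' file headers; '@@' hunk markers and context lines
    # do not start with '+' or '-', so they are dropped too.
    return [line for line in lines
            if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))]


def parse_settings(config):
    """Hypothetical mini-parser: extract the domain name and the set of NTP servers."""
    settings = {"domain_name": None, "ntp_servers": set()}
    for line in config.splitlines():
        words = line.split()  # split() also collapses repeated whitespace
        if words[:3] == ["ip", "domain", "name"] and len(words) == 4:
            settings["domain_name"] = words[3]
        elif words[:2] == ["ntp", "server"] and len(words) == 3:
            settings["ntp_servers"].add(words[2])
    return settings


# The text diff reports changed lines that we do not care about...
print(text_diff(REFERENCE, SNAPSHOT))
# ...while the parsed settings compare equal.
print(parse_settings(REFERENCE) == parse_settings(SNAPSHOT))
```

Comparing parsed settings rather than raw text is the idea that Batfish's vendor-neutral model generalizes to many settings and vendors.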

In [1]:
```python
!diff -ur networks/drift/reference networks/drift/snapshot
```

```diff
diff -ur networks/drift/reference/configs/as1border1.cfg networks/drift/snapshot/configs/as1border1.cfg
--- networks/drift/reference/configs/as1border1.cfg	2021-03-19 20:32:35.000000000 -0700
+++ networks/drift/snapshot/configs/as1border1.cfg	2021-03-30 14:32:04.000000000 -0700
@@ -21,7 +21,7 @@
 !
 !
 no ip domain lookup
-ip domain name lab.local
+ip domain name lab.localp
 no ipv6 cef
 !
 !
diff -ur networks/drift/reference/configs/as1border2.cfg networks/drift/snapshot/configs/as1border2.cfg
--- networks/drift/reference/configs/as1border2.cfg	2021-03-19 20:32:35.000000000 -0700
+++ networks/drift/snapshot/configs/as1border2.cfg	2021-03-30 14:56:28.000000000 -0700
@@ -11,7 +11,7 @@
 !
 !
 ntp server 18.18.18.18
-ntp server 23.23.23.23
+ntp server 18.18.18.19
 !
 !
 no aaa new-model
diff -ur networks/drift/reference/configs/as2border2.cfg networks/drift/snapshot/configs/as2border2.cfg
--- networks/drift/reference/configs/as2border2.cfg	2021-03-19 20:32:35.0000
```

In [2]:
```python
# Import packages and load questions
%run startup.py
load_questions()


# Initialize both the snapshot and the reference that we want to use
NETWORK_NAME = "my_network"
SNAPSHOT_PATH = "networks/drift/snapshot"
REFERENCE_PATH = "networks/drift/reference"

bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name="snapshot", overwrite=True)
bf_init_snapshot(REFERENCE_PATH, name="reference", overwrite=True)
```

```
'reference'
```

In [3]:
```python
# Helper functions to print drift information in a readable manner
import pandas as pd


def friendly_name(entity_type, row, key_columns):
    """
    Returns a readable string of key=value pairs identifying an entity in a Pandas row.
    (entity_type is currently unused.)
    """
    return ",".join([f"{key}={row[key]}" for key in key_columns])


def diff_properties(diff_frame, entity_type, key_columns, property_columns):
    """
    Prints a differential properties answer: entities present in only one snapshot,
    and the changed property values of entities present in both.
    """
    if len(diff_frame) == 0:
        print(f"{entity_type} properties are identical across the two snapshots")
        return
    snapshot_only = diff_frame[diff_frame["KeyPresence"] == "Only in Snapshot"]
    reference_only = diff_frame[diff_frame["KeyPresence"] == "Only in Reference"]
    both = diff_frame[diff_frame["KeyPresence"] == "In both"]
    if len(snapshot_only) > 0:
        print(f"{entity_type}s only in snapshot")
        for _, row in snapshot_only.iterrows():
            print(f"    {friendly_name(entity_type, row, key_columns)}")
    if len(reference_only) > 0:
        print(f"{entity_type}s only in reference")
        for _, row in reference_only.iterrows():
            print(f"    {friendly_name(entity_type, row, key_columns)}")
    for _, row in both.iterrows():
        print(f"Differences for {friendly_name(entity_type, row, key_columns)}")
        for prop in property_columns:
            snapshot_setting = row[f"Snapshot_{prop}"]
            reference_setting = row[f"Reference_{prop}"]
            if snapshot_setting != reference_setting:
                print(f"    {prop}: {reference_setting} -> {snapshot_setting}")


def diff_frames(snapshot_frame, reference_frame, entity_type):
    """
    Prints rows that appear in only one of two DataFrames with the same columns.
    The outer merge on all common columns with indicator=True labels each row
    "left_only" (snapshot), "right_only" (reference), or "both".
    """
    combined = pd.merge(snapshot_frame, reference_frame, how="outer", indicator=True)
    snapshot_only = combined[combined["_merge"] == "left_only"]
    reference_only = combined[combined["_merge"] == "right_only"]
    # A set does not preserve column order, so the printed key order may vary
    columns = set(combined.columns) - {"_merge"}
    if len(snapshot_only) > 0:
        print(f"{entity_type}s only in snapshot")
        for _, row in snapshot_only.iterrows():
            print("    ", friendly_name(entity_type, row, columns))
    if len(reference_only) > 0:
        print(f"{entity_type}s only in reference")
        for _, row in reference_only.iterrows():
            print("    ", friendly_name(entity_type, row, columns))
    if len(snapshot_only) == 0 and len(reference_only) == 0:
        print(f"{entity_type}s are identical across the two snapshots")
```
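
`diff_frames` compares whole rows: the outer merge with `indicator=True` marks each row as present in the snapshot only, the reference only, or both. The same idea can be sketched without pandas by treating each row as a tuple and taking set differences. The function `diff_rows` and the sample rows are illustrative only (loosely modeled on the defined-structures output later in this notebook, with a shared `interface` row added), and unlike the merge, the set-based version collapses duplicate rows.

```python
def diff_rows(snapshot_rows, reference_rows):
    """Return (rows only in snapshot, rows only in reference), comparing whole rows.

    Each row is a tuple of column values; this plays the role of
    pd.merge(..., how="outer", indicator=True) in diff_frames.
    """
    snapshot_set, reference_set = set(snapshot_rows), set(reference_rows)
    # Sort so that the printed order is deterministic.
    return sorted(snapshot_set - reference_set), sorted(reference_set - snapshot_set)


# Hypothetical rows of (File_Name, Structure_Name, Structure_Type).
snapshot = [
    ("configs/as2dist1.cfg", "dept2", "bgp peer-group"),
    ("configs/as3border1.cfg", "bogons", "ipv4 prefix-list"),
    ("configs/as1core1.cfg", "GigabitEthernet1/0", "interface"),
]
reference = [
    ("configs/as2dist1.cfg", "dept", "bgp peer-group"),
    ("configs/as1core1.cfg", "GigabitEthernet1/0", "interface"),
]

only_snapshot, only_reference = diff_rows(snapshot, reference)
print("Only in snapshot:", only_snapshot)
print("Only in reference:", only_reference)
```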

### Node-level properties

We first check whether any node-level configuration settings have changed. We focus on three example settings: 1) NTP servers, 2) Domain name, and 3) VRFs configured on the device. The complete list of node settings extracted by Batfish is [here](https://batfish.readthedocs.io/en/latest/notebooks/configProperties.html#Node-Properties).

We compute the differences in settings between the snapshots using differential questions. Batfish makes its models available via a [set of questions](https://batfish.readthedocs.io/en/latest/questions.html). When a question is run in differential mode, it outputs how the answer differs across the two snapshots.
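
To build intuition for the shape of a differential answer (a `KeyPresence` column plus paired `Snapshot_*` and `Reference_*` columns), here is a rough stdlib emulation over per-node property dictionaries. It is a sketch of the output layout only, not of how Batfish computes differential answers; the function `differential_rows` is hypothetical, and `as1core1` is an unchanged node added for illustration. It emits rows only for keys that changed or exist in just one snapshot.

```python
def differential_rows(snapshot, reference, properties, key_column="Node"):
    """Emulate the layout of a differential properties answer (illustrative only).

    `snapshot` and `reference` map a key (e.g., a node name) to a dict of property
    values. A row is produced for each key present in only one snapshot, or whose
    listed properties differ between the two.
    """
    rows = []
    for key in sorted(set(snapshot) | set(reference)):
        if key not in reference:
            rows.append({key_column: key, "KeyPresence": "Only in Snapshot"})
        elif key not in snapshot:
            rows.append({key_column: key, "KeyPresence": "Only in Reference"})
        elif any(snapshot[key].get(p) != reference[key].get(p) for p in properties):
            row = {key_column: key, "KeyPresence": "In both"}
            for prop in properties:
                row[f"Snapshot_{prop}"] = snapshot[key].get(prop)
                row[f"Reference_{prop}"] = reference[key].get(prop)
            rows.append(row)
    return rows


snapshot = {
    "as1border1": {"Domain_Name": "lab.localp", "NTP_Servers": []},
    "as1border2": {"Domain_Name": "lab.local", "NTP_Servers": ["18.18.18.19", "18.18.18.18"]},
    "as1core1": {"Domain_Name": "lab.local", "NTP_Servers": []},  # hypothetical, unchanged
}
reference = {
    "as1border1": {"Domain_Name": "lab.local", "NTP_Servers": []},
    "as1border2": {"Domain_Name": "lab.local", "NTP_Servers": ["23.23.23.23", "18.18.18.18"]},
    "as1core1": {"Domain_Name": "lab.local", "NTP_Servers": []},
}
for row in differential_rows(snapshot, reference, ["Domain_Name", "NTP_Servers"]):
    print(row)
```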

In [4]:
```python
# Properties of interest
NODE_PROPERTIES = ["NTP_Servers", "Domain_Name", "VRFs"]

# Compute the difference across two snapshots and return a Pandas DataFrame
node_diff = bfq.nodeProperties(properties=",".join(NODE_PROPERTIES)).answer(
    snapshot="snapshot", reference_snapshot="reference").frame()

# Print the first few rows so we can see the DataFrame schema
show(node_diff.head())
```

|   | Node | KeyPresence | Snapshot_Domain_Name | Reference_Domain_Name | Snapshot_NTP_Servers | Reference_NTP_Servers | Snapshot_VRFs | Reference_VRFs |
|---|---|---|---|---|---|---|---|---|
| 0 | as1border1 | In both | lab.localp | lab.local | | | default | default |
| 1 | as1border2 | In both | lab.local | lab.local | 18.18.18.19 18.18.18.18 | 23.23.23.23 18.18.18.18 | default | default |


In [5]:
```python
# Print readable messages on the differences
diff_properties(node_diff, "Node", ["Node"], NODE_PROPERTIES)
```

```
Differences for Node=as1border1
    Domain_Name: lab.local -> lab.localp
Differences for Node=as1border2
    NTP_Servers: ['23.23.23.23', '18.18.18.18'] -> ['18.18.18.19', '18.18.18.18']
```
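
For list-valued properties such as `NTP_Servers`, printing the old and new lists leaves the reader to spot what changed. A small refinement, sketched here with a hypothetical `list_delta` helper, reports the added and removed entries instead. It treats the lists as sets, so it assumes that order and duplicates do not matter for the property, and it treats a missing value (`None`) as an empty list.

```python
def list_delta(reference_values, snapshot_values):
    """Return (added, removed) entries between two list-valued settings, ignoring order."""
    reference_set = set(reference_values or [])
    snapshot_set = set(snapshot_values or [])
    added = sorted(snapshot_set - reference_set)
    removed = sorted(reference_set - snapshot_set)
    return added, removed


# Values from the as1border2 NTP_Servers difference above.
added, removed = list_delta(["23.23.23.23", "18.18.18.18"], ["18.18.18.19", "18.18.18.18"])
print(f"NTP_Servers: added {added}, removed {removed}")
# prints: NTP_Servers: added ['18.18.18.19'], removed ['23.23.23.23']
```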


### Interface-level properties

We next check whether any interface-level settings have changed. We again focus on three example settings: 1) Whether the interface is active, 2) Description, and 3) Primary IP address. The complete list of interface settings extracted by Batfish is [here](https://batfish.readthedocs.io/en/latest/notebooks/configProperties.html#Interface-Properties).


In [6]:
```python
# Properties of interest
INTERFACE_PROPERTIES = ['Active', 'Description', 'Primary_Address']

# Compute the difference across two snapshots and return a Pandas DataFrame
interface_diff = bfq.interfaceProperties(properties=",".join(INTERFACE_PROPERTIES)).answer(
    snapshot="snapshot", reference_snapshot="reference").frame()

# Print readable messages on the differences
diff_properties(interface_diff, "Interface", ["Interface"], INTERFACE_PROPERTIES)
```

```
Differences for Interface=as2border2[GigabitEthernet0/0]
    Active: True -> False
    Primary_Address: 10.23.21.2/24 -> None
Differences for Interface=as2core1[GigabitEthernet0/0]
    Description: None -> "To as2border1 GigabitEthernet1/0"
Differences for Interface=as2core1[GigabitEthernet1/0]
    Description: None -> "To as2border2 GigabitEthernet2/0"
```


### BGP process and peer properties

We check BGP processes and peers next. For BGP processes, we focus on whether multipath EBGP is enabled, and for BGP peers we focus on five example properties: 1) Remote AS, 2) Description, 3) Peer group, 4) Import policies applied to the peer, and 5) Export policies applied to the peer. The complete list of BGP process properties is [here](https://batfish.readthedocs.io/en/latest/notebooks/configProperties.html#BGP-Process-Configuration) and that of BGP peer properties is [here](https://batfish.readthedocs.io/en/latest/notebooks/configProperties.html#BGP-Peer-Configuration).


In [7]:
```python
# Properties of interest
BGP_PROCESS_PROPERTIES = ['Multipath_EBGP']
BGP_PEER_PROPERTIES = ['Remote_AS', 'Description', 'Peer_Group', 'Import_Policy', 'Export_Policy']

# Compute the difference across two snapshots and return a Pandas DataFrame
bgp_process_diff = bfq.bgpProcessConfiguration(properties=",".join(BGP_PROCESS_PROPERTIES)).answer(
    snapshot="snapshot", reference_snapshot="reference").frame()

bgp_peer_diff = bfq.bgpPeerConfiguration(properties=",".join(BGP_PEER_PROPERTIES)).answer(
    snapshot="snapshot", reference_snapshot="reference").frame()

# Print readable messages on the differences
diff_properties(bgp_process_diff, "BgpProcess", ["Node", "VRF", "Router_ID"], BGP_PROCESS_PROPERTIES)
print()
diff_properties(bgp_peer_diff, "BgpPeer", ["Node", "VRF", "Local_Interface", "Remote_IP"], BGP_PEER_PROPERTIES)
```

```
Differences for Node=as2dept1,VRF=default,Router_ID=2.1.4.1
    Multipath_EBGP: True -> False

BgpPeers only in snapshot
    Node=as2dept1,VRF=default,Local_Interface=None,Remote_IP=2.34.209.3
Differences for Node=as2dist1,VRF=default,Local_Interface=None,Remote_IP=2.34.101.4
    Peer_Group: dept -> dept2
    Import_Policy: ['dept_to_as2dist'] -> []
    Export_Policy: ['as2dist_to_dept'] -> []
```


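### Structures defined in configs

We next compare the named structures (such as ACLs, route maps, prefix lists, and BGP peer groups) defined in the two snapshots, using the `definedStructures` question. Unlike the property questions above, we do not run this question in differential mode; instead, we answer it once per snapshot and compute the difference ourselves with `diff_frames`.
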

In [8]:
```python
# Extract defined structures from both snapshots as a Pandas DataFrame
snapshot_structures = bfq.definedStructures().answer(snapshot="snapshot").frame()
reference_structures = bfq.definedStructures().answer(snapshot="reference").frame()

# Show me what the schema looks like
show(snapshot_structures.head())
```

|   | Structure_Type | Structure_Name | Source_Lines |
|---|---|---|---|
| 0 | bgp peer-group | as2 | FileLines(filename='configs/as1border1.cfg', lines=[81]) |
| 1 | interface | GigabitEthernet1/0 | FileLines(filename='configs/as1core1.cfg', lines=[69, 70, 71]) |
| 2 | extended ipv4 access-list | OUTSIDE_TO_INSIDE | FileLines(filename='configs/as2border2.cfg', lines=[132, 133, 134]) |
| 3 | route-map | as2dist_to_dept | FileLines(filename='configs/as2dist1.cfg', lines=[123, 124, 125, 126]) |
| 4 | extended ipv4 access-list | 103 | FileLines(filename='configs/as1border2.cfg', lines=[140]) |


In [9]:
```python
# Remove the line number information since we don't care about where exactly the structure was defined
snapshot_structures_without_lines = snapshot_structures[['Structure_Type', 'Structure_Name']].assign(
    File_Name=snapshot_structures["Source_Lines"].map(lambda x: x.filename))
reference_structures_without_lines = reference_structures[['Structure_Type', 'Structure_Name']].assign(
    File_Name=reference_structures["Source_Lines"].map(lambda x: x.filename))

# Print a readable message on the differences
diff_frames(snapshot_structures_without_lines,
            reference_structures_without_lines,
            "DefinedStructure")
```

```
DefinedStructures only in snapshot
     File_Name=configs/as2dist1.cfg,Structure_Name=dept2,Structure_Type=bgp peer-group
     File_Name=configs/as3border1.cfg,Structure_Name=bogons,Structure_Type=ipv4 prefix-list
DefinedStructures only in reference
     File_Name=configs/as2dist1.cfg,Structure_Name=dept,Structure_Type=bgp peer-group
```


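### Undefined references

An undefined reference is a use of a named structure, such as a route map or community list, that the config does not define. New undefined references often point to typos or incomplete changes, so we compare them across the two snapshots in the same way as defined structures.
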
In [10]:
```python
# Extract undefined references from both snapshots as a Pandas DataFrame
snapshot_undefined_references = bfq.undefinedReferences().answer(snapshot="snapshot").frame()
reference_undefined_references = bfq.undefinedReferences().answer(snapshot="reference").frame()

# Show me what the schema looks like
show(snapshot_undefined_references.head())
```

|   | File_Name | Struct_Type | Ref_Name | Context | Lines |
|---|---|---|---|---|---|
| 0 | configs/as2core2.cfg | route-map | filter-bogons | bgp inbound route-map | FileLines(filename='configs/as2core2.cfg', lines=[110]) |
| 1 | configs/as2dist1.cfg | community-list | dept_community_new | route-map match community-list | FileLines(filename='configs/as2dist1.cfg', lines=[133]) |
| 2 | configs/as2dist1.cfg | undeclared bgp peer-group | dept | bgp peer-group referenced before defined | FileLines(filename='configs/as2dist1.cfg', lines=[99, 100, 101]) |


In [11]:
```python
# Remove the line number information since we don't care about where it was referenced
snapshot_undefined_references_without_lines = snapshot_undefined_references.drop(columns=['Lines'])
reference_undefined_references_without_lines = reference_undefined_references.drop(columns=['Lines'])

# Print a readable message on the differences
diff_frames(snapshot_undefined_references_without_lines,
            reference_undefined_references_without_lines,
            "UndefinedReference")
```

```
UndefinedReferences only in snapshot
     Ref_Name=dept_community_new,File_Name=configs/as2dist1.cfg,Struct_Type=community-list,Context=route-map match community-list
     Ref_Name=dept,File_Name=configs/as2dist1.cfg,Struct_Type=undeclared bgp peer-group,Context=bgp peer-group referenced before defined
```
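
Conceptually, an undefined reference is a (structure type, name) pair that a config uses but never defines. The sketch below illustrates that idea on hand-written data; the data model and the `undefined_references` helper are hypothetical simplifications (Batfish derives definitions and references by parsing each vendor's syntax), and the `dept_community` definition is invented to mimic a likely typo. The sample mirrors the `dept_community_new` reference reported above.

```python
def undefined_references(defined, referenced):
    """Return the referenced (type, name) pairs that have no matching definition, sorted."""
    return sorted(set(referenced) - set(defined))


# Hypothetical, simplified view of as2dist1: structures it defines and references.
defined = [
    ("route-map", "as2dist_to_dept"),
    ("community-list", "dept_community"),
]
referenced = [
    ("route-map", "as2dist_to_dept"),          # used by a BGP neighbor; defined
    ("community-list", "dept_community_new"),  # used in a route-map match; not defined
]
print(undefined_references(defined, referenced))
# prints: [('community-list', 'dept_community_new')]
```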


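### BGP adjacencies

Next, we compare the BGP adjacencies (edges) that Batfish infers from the two snapshots. Unlike the checks above, this reflects the network behavior the configs induce: an edge can disappear even when neither endpoint's BGP configuration changed, for example because the interface used to reach the peer was shut down.
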
In [12]:
```python
# Get the edges from both snapshots as Pandas DataFrames
snapshot_bgp_edges = bfq.bgpEdges().answer(snapshot="snapshot").frame()
reference_bgp_edges = bfq.bgpEdges().answer(snapshot="reference").frame()

# Show me the schema
show(snapshot_bgp_edges.head())
```

|   | Node | IP | Interface | AS_Number | Remote_Node | Remote_IP | Remote_Interface | Remote_AS_Number |
|---|---|---|---|---|---|---|---|---|
| 0 | as1border2 | 1.2.2.2 | | 1 | as1core1 | 1.10.1.1 | | 1 |
| 1 | as1core1 | 1.10.1.1 | | 1 | as1border1 | 1.1.1.1 | | 1 |
| 2 | as2dist2 | 2.1.3.2 | | 2 | as2core2 | 2.1.2.2 | | 2 |
| 3 | as3border2 | 3.2.2.2 | | 3 | as3core1 | 3.10.1.1 | | 3 |
| 4 | as2dist2 | 2.34.201.3 | | 2 | as2dept1 | 2.34.201.4 | | 65001 |


In [13]:
```python
# Only retain the Node and Remote_Node columns, and keep only one row across both directions
snapshot_bgp_edges_nodes = snapshot_bgp_edges[
    snapshot_bgp_edges['Node'] < snapshot_bgp_edges['Remote_Node']][['Node', 'Remote_Node']]
reference_bgp_edges_nodes = reference_bgp_edges[
    reference_bgp_edges['Node'] < reference_bgp_edges['Remote_Node']][['Node', 'Remote_Node']]

# Print a readable message on the differences
diff_frames(snapshot_bgp_edges_nodes,
            reference_bgp_edges_nodes,
            "BgpEdge")
```

```
BgpEdges only in reference
     Node=as2border2,Remote_Node=as3border1
```
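
The `Node < Remote_Node` filter above keeps one row per adjacency only if `bgpEdges` lists each adjacency in both directions. A slightly more defensive variant, sketched below with plain tuples, canonicalizes each edge by sorting its two endpoints, so an adjacency survives even if only one direction appears. The `undirected_edges` helper and the sample edges are illustrative, loosely based on the outputs above; the reverse of `as1core1 -> as1border1` is omitted on purpose.

```python
def undirected_edges(directed_edges):
    """Collapse (node, remote_node) pairs into a set of sorted, undirected node pairs."""
    return {tuple(sorted(edge)) for edge in directed_edges}


snapshot_edges = [
    ("as1border2", "as1core1"), ("as1core1", "as1border2"),
    ("as1core1", "as1border1"),  # reverse direction missing on purpose
]
reference_edges = snapshot_edges + [("as2border2", "as3border1"), ("as3border1", "as2border2")]

snapshot_pairs = undirected_edges(snapshot_edges)
reference_pairs = undirected_edges(reference_edges)
print("Only in snapshot:", sorted(snapshot_pairs - reference_pairs))
print("Only in reference:", sorted(reference_pairs - snapshot_pairs))
# prints:
# Only in snapshot: []
# Only in reference: [('as2border2', 'as3border1')]
```

With the `Node < Remote_Node` filter, the one-directional `as1core1 -> as1border1` row would be dropped entirely, since `"as1core1" > "as1border1"`; sorting the endpoints keeps it as `('as1border1', 'as1core1')`.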


### Compare packet filters

Batfish offers two natively differential questions whose answers explain the behavioral differences for packet filters (ACLs, firewall rules) and end-to-end reachability. We will use both types of differential queries in the analysis below.

In [14]:
```python
compare_filters = bfq.compareFilters().answer(snapshot='snapshot', reference_snapshot='reference').frame()
show(compare_filters)
```

|   | Node | Filter_Name | Line_Index | Line_Content | Line_Action | Reference_Line_Index | Reference_Line_Content |
|---|---|---|---|---|---|---|---|
| 0 | as2dist2 | 105 | 4 | permit ip host 3.0.3.0 host 255.255.255.0 | PERMIT | End of ACL | |
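
The row above says that line 4 of filter `105` in the snapshot permits some flows that, in the reference, fall through to the end of the ACL (the implicit deny). The toy sketch below mimics that kind of output by evaluating two first-match ACLs against a few hand-picked flows and reporting the matching line in each version when the actions differ. It is an illustration under simplifying assumptions: rules match only on source and destination prefixes, the rule lists are hypothetical and only loosely modeled on ACL 105, and, unlike this sketch, Batfish does not rely on hand-picked sample flows.

```python
import ipaddress

# Each rule: (line text, action, source prefix, destination prefix). First match wins.
REFERENCE_ACL = [
    ("permit ip 2.0.0.0 0.255.255.255 any", "PERMIT", "2.0.0.0/8", "0.0.0.0/0"),
]
SNAPSHOT_ACL = REFERENCE_ACL + [
    ("permit ip host 3.0.3.0 host 255.255.255.0", "PERMIT", "3.0.3.0/32", "255.255.255.0/32"),
]


def first_match(acl, src, dst):
    """Return (line index, line text, action) of the first matching rule, or the implicit deny."""
    for index, (text, action, src_net, dst_net) in enumerate(acl):
        if (ipaddress.ip_address(src) in ipaddress.ip_network(src_net)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(dst_net)):
            return index, text, action
    return "End of ACL", "", "DENY"


def compare_acls(snapshot_acl, reference_acl, flows):
    """Yield (flow, snapshot match, reference match) for sample flows whose action differs."""
    for src, dst in flows:
        snapshot_match = first_match(snapshot_acl, src, dst)
        reference_match = first_match(reference_acl, src, dst)
        if snapshot_match[2] != reference_match[2]:  # compare actions only
            yield (src, dst), snapshot_match, reference_match


sample_flows = [("2.1.1.1", "8.8.8.8"), ("3.0.3.0", "255.255.255.0"), ("3.0.3.1", "255.255.255.0")]
for flow, snapshot_match, reference_match in compare_acls(SNAPSHOT_ACL, REFERENCE_ACL, sample_flows):
    print(flow, "snapshot:", snapshot_match, "reference:", reference_match)
```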


### Summary
