### Testing and Validation of a Network with Batfish

Network engineers often need to verify their networks by making assertions about device configurations. Done manually, this becomes a daunting task on large networks. Performing the same assertions programmatically on the vendor-independent model produced by Batfish is quick and easy.

In this notebook, we will look at different types of validations that can be done with Batfish. More specifically, we will validate the configured NTP servers and see how to express these checks with [Pandas APIs](https://pandas.pydata.org/pandas-docs/stable/).

In [1]:
# Importing packages and loading questions
%run startup.py

### Initializing our Network and Snapshot

In [2]:
NETWORK_NAME = "example_network"
SNAPSHOT_NAME = "example_snapshot"
SNAPSHOT_PATH = "../test_rigs/example"

bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME)

'{\n  "answerElements" : [\n    {\n      "class" : "org.batfish.datamodel.answers.InitInfoAnswerElement",\n      "parseStatus" : {\n        "as1border1" : "PASSED",\n        "as1border2" : "PASSED",\n        "as1core1" : "PASSED",\n        "as2border1" : "PASSED",\n        "as2border2" : "PASSED",\n        "as2core1" : "PASSED",\n        "as2core2" : "PASSED",\n        "as2dept1" : "PASSED",\n        "as2dist1" : "PASSED",\n        "as2dist2" : "PASSED",\n        "as3border1" : "PASSED",\n        "as3border2" : "PASSED",\n        "as3core1" : "PASSED",\n        "host1" : "PASSED",\n        "host2" : "PASSED",\n        "iptables/host1.iptables" : "PASSED",\n        "iptables/host2.iptables" : "PASSED"\n      }\n    }\n  ],\n  "status" : "SUCCESS",\n  "summary" : {\n    "numFailed" : 0,\n    "numPassed" : 0,\n    "numResults" : 0\n  }\n}\n'

### Validating NTP Servers Configuration
There are many potential assertions we might apply to NTP server configuration, but in this notebook we will focus on the common scenarios we have observed in real-world networks.
In this exercise, we will validate the following scenarios with respect to a set of reference servers **[18.18.18.18, 23.23.23.23]**:
* Every node has some NTP server configured (not necessarily from the reference servers).
* NTP servers on all nodes should be the same as the reference servers.
* NTP servers on all nodes should contain at least one NTP server from the reference servers.
* NTP server configuration matches the definition from a database.

In [3]:
ref_ntp_servers = {'18.18.18.18', '23.23.23.23'}

To start with, let's look at the NTP servers configured on the nodes whose names contain ***border***.

In [78]:
node_props = bfq.nodeProperties(nodeRegex=".*border.*", propertySpec="ntp-servers").answer().frame()
node_props



|   | node | ntp-servers |
|---|------|-------------|
| 0 | as1border1 | [] |
| 1 | as1border2 | [18.18.18.18, 23.23.23.23] |
| 2 | as2border1 | [18.18.18.18, 23.23.23.23] |
| 3 | as2border2 | [18.18.18.18] |
| 4 | as3border1 | [18.18.18.18, 23.23.23.23] |
| 5 | as3border2 | [18.18.18.18, 23.23.23.23] |


#### Every node has some NTP server configured
A node passes this assertion if it has at least one NTP server configured (the server may be arbitrary and need not come from our reference set).

Any node with an empty set of NTP servers violates this assertion. The following command gives us a table of violators.

In [5]:
# Violators
ns_violators = node_props[node_props["ntp-servers"].apply(lambda x: not bool(x))]
ns_violators

|   | node | ntp-servers |
|---|------|-------------|
| 0 | as1border1 | [] |
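
The violators table can also be turned into a hard pass/fail check, for example inside a test suite. The following is a minimal sketch under stated assumptions: `node_props` is rebuilt by hand to mirror the Batfish answer shown earlier (so the snippet runs without a Batfish service), and the helper names `nodes_without_ntp` and `check_every_node_has_ntp` are hypothetical, not part of pybatfish.

```python
import pandas as pd

# Hand-built stand-in for the Batfish answer frame (same rows as the table above).
node_props = pd.DataFrame({
    "node": ["as1border1", "as1border2", "as2border1",
             "as2border2", "as3border1", "as3border2"],
    "ntp-servers": [[],
                    ["18.18.18.18", "23.23.23.23"],
                    ["18.18.18.18", "23.23.23.23"],
                    ["18.18.18.18"],
                    ["18.18.18.18", "23.23.23.23"],
                    ["18.18.18.18", "23.23.23.23"]],
})


def nodes_without_ntp(frame):
    """Return the names of nodes whose ntp-servers list is empty."""
    is_empty = frame["ntp-servers"].apply(lambda servers: len(servers) == 0)
    return frame.loc[is_empty, "node"].tolist()


def check_every_node_has_ntp(frame):
    """Raise AssertionError naming the offending nodes, if there are any."""
    offenders = nodes_without_ntp(frame)
    if offenders:
        raise AssertionError("nodes without NTP servers: {}".format(offenders))


# Capture the failure message instead of aborting, so the result can be inspected.
try:
    check_every_node_has_ntp(node_props)
    failure = None
except AssertionError as err:
    failure = str(err)
print(failure)
```

Raising an exception that lists the offending nodes makes the check usable in automation while still telling the operator where to look.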


#### NTP servers on all nodes should be the same as our reference
A common use case for validating NTP servers is checking that the set of NTP servers on all relevant nodes is equal to a given set. Doing this with pybatfish and pandas is straightforward.

The following command produces a table of violators:

In [6]:
# Violators (Nodes whose set of NTP servers is not equal to our reference set)
ns_violators = node_props[node_props["ntp-servers"].apply(lambda x: ref_ntp_servers != set(x))]
ns_violators

|   | node | ntp-servers |
|---|------|-------------|
| 0 | as1border1 | [] |
| 3 | as2border2 | [18.18.18.18] |


If you want to know more about the **lambda** keyword, see [lambda expressions](https://docs.python.org/3/reference/expressions.html#lambda).
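
If the lambda form is hard to read, the same filter can be written with a named function. This is only a sketch: the frame below is a hand-built stand-in for the Batfish answer, the row `example-node` is made up to show that comparing sets ignores the order of the servers, and `differs_from_reference` is a hypothetical helper name.

```python
import pandas as pd

ref_ntp_servers = {"18.18.18.18", "23.23.23.23"}

# Hand-built stand-in; "example-node" is a made-up row with the servers
# listed in reverse order.
node_props = pd.DataFrame({
    "node": ["as1border1", "as1border2", "as2border2", "example-node"],
    "ntp-servers": [[],
                    ["18.18.18.18", "23.23.23.23"],
                    ["18.18.18.18"],
                    ["23.23.23.23", "18.18.18.18"]],
})


def differs_from_reference(servers):
    """True when a node's servers are not exactly the reference set."""
    return set(servers) != ref_ntp_servers


# Equivalent to the lambda-based filter, with the predicate given a name.
ns_violators = node_props[node_props["ntp-servers"].apply(differs_from_reference)]
print(ns_violators)
```

A named predicate can carry a docstring and be unit-tested on its own, which helps once the number of checks grows.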

To see which reference servers each node is missing, we can take the set difference between our reference set and each entry of the **ntp-servers** column:

In [7]:
ns_difference = node_props["ntp-servers"].map(lambda x: ref_ntp_servers - set(x))
# Let's pair it up with the node columns for a better view
diff_df = pd.DataFrame({'node': node_props["node"], 'ntp-servers-missing': ns_difference})
# Getting only the rows with a non-empty ntp-servers-missing set
diff_df[diff_df["ntp-servers-missing"].apply(lambda x: bool(x))]

|   | node | ntp-servers-missing |
|---|------|---------------------|
| 0 | as1border1 | {23.23.23.23, 18.18.18.18} |
| 3 | as2border2 | {23.23.23.23} |
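
The difference above only reports reference servers that a node lacks. A node can also fail the equality check by having a server that is not in the reference set, and the reverse difference exposes that. The sketch below assumes a hand-built frame; the row `hypothetical-node` and its server `10.0.0.1` are invented for illustration and do not appear in the snapshot.

```python
import pandas as pd

ref_ntp_servers = {"18.18.18.18", "23.23.23.23"}

# Hand-built stand-in; "hypothetical-node" is invented to show an extra server.
node_props = pd.DataFrame({
    "node": ["as1border1", "as2border2", "hypothetical-node"],
    "ntp-servers": [[],
                    ["18.18.18.18"],
                    ["18.18.18.18", "23.23.23.23", "10.0.0.1"]],
})

# Convert each list to a set once, then take the difference in both directions.
servers = node_props["ntp-servers"].map(lambda x: set(x))
report = pd.DataFrame({
    "node": node_props["node"],
    "ntp-servers-missing": servers.map(lambda s: ref_ntp_servers - s),
    "ntp-servers-extra": servers.map(lambda s: s - ref_ntp_servers),
})

# Keep rows where either difference is non-empty.
has_missing = report["ntp-servers-missing"].map(len) > 0
has_extra = report["ntp-servers-extra"].map(len) > 0
mismatched = report[has_missing | has_extra]
print(mismatched)
```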


#### NTP servers on all nodes should contain at least one NTP server from our reference
This is a more lenient version of the previous check: a node passes our assertion as long as it has at least one NTP server from our reference set.

A node is a violator when the intersection of the reference NTP servers and the node's NTP servers is empty. The following command produces a table of violators:

In [8]:
# Violators (Nodes which do not contain even a single NTP server from our reference set)
ns_violators = node_props[node_props["ntp-servers"].apply(lambda x: not bool(ref_ntp_servers.intersection(set(x))))]
ns_violators

|   | node | ntp-servers |
|---|------|-------------|
| 0 | as1border1 | [] |


_as1border1_ has no NTP servers configured at all, so it clearly violates this assertion.
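
The "empty intersection" test can also be expressed with Python's built-in `set.isdisjoint`, which accepts any iterable and so does not need the list to be converted to a set first. This is an alternative sketch, not the notebook's original approach, and the frame is a hand-built stand-in for the Batfish answer.

```python
import pandas as pd

ref_ntp_servers = {"18.18.18.18", "23.23.23.23"}

# Hand-built stand-in for the Batfish answer frame.
node_props = pd.DataFrame({
    "node": ["as1border1", "as1border2", "as2border2"],
    "ntp-servers": [[],
                    ["18.18.18.18", "23.23.23.23"],
                    ["18.18.18.18"]],
})

# A node violates the check when it shares no server with the reference set.
shares_nothing = node_props["ntp-servers"].apply(
    lambda servers: ref_ntp_servers.isdisjoint(servers)
)
ns_violators = node_props[shares_nothing]
print(ns_violators)
```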

#### NTP server configuration matches database
Each node's NTP servers should match those defined in a database. This sort of check enables easy validation of configurations which may be non-uniform across nodes.

We will assume data from the database is fetched in the following format, where node names are dictionary keys and specific properties are defined in a property-keyed sub-dictionary:


In [193]:
# Mock reference-node-data, presumably taken from some database
database = {'as1border1': {'ntp-servers': ['23.23.23.23'], 'dns-servers': ['1.1.1.1']},
            'as1border2': {'ntp-servers': ['23.23.23.23'], 'dns-servers': ['1.1.1.1']},
            'as2border1': {'ntp-servers': ['18.18.18.18', '23.23.23.23'], 'dns-servers': ['2.2.2.2']},
            'as2border2': {'ntp-servers': ['18.18.18.18'], 'dns-servers': ['1.1.1.1']},
            'as3border1': {'ntp-servers': ['18.18.18.18', '23.23.23.23'], 'dns-servers': ['2.2.2.2']},
            'as3border2': {'ntp-servers': ['18.18.18.18', '23.23.23.23'], 'dns-servers': ['2.2.2.2']},
           }


Note that:
* There is an extra property in this dictionary that we don't care about comparing right now: `dns-servers` (we will filter it out below, before comparing the dataframe from Batfish to the one we generate from the database)
* **as1border1** has **23.23.23.23** listed as its `ntp-servers`, which does not match the empty list of servers in the `Batfish` dataframe
* **as1border2** has only **23.23.23.23** as its `ntp-servers`, which is missing **18.18.18.18** when compared to the two servers listed in the `Batfish` dataframe

After a little tweaking, the database and `Batfish` dataframes can be compared to generate two sets of servers: missing (defined in the database but not in the configurations) and extra (defined in the configurations but not in the database).

In [198]:
col_name = "ntp-servers"

# Transpose database data so each node has its own row
database_df = pd.DataFrame(data=database).transpose()

# Index on node for easier comparison
df_bf = node_props.set_index('node')

# Select only columns present in node_props (get rid of the extra dns-servers column);
# copy so the set conversion below does not write to a slice of database_df
df_db = database_df[list(df_bf)].copy()

# Convert server lists into sets to support arithmetic below
df_bf[col_name] = df_bf[col_name].apply(set)
df_db[col_name] = df_db[col_name].apply(set)

# Figure out what servers are in the configs but not the database and vice versa
missing_servers = (df_db - df_bf).rename(columns={col_name: 'missing-{}'.format(col_name)})
extra_servers = (df_bf - df_db).rename(columns={col_name: 'extra-{}'.format(col_name)})
result = pd.concat([missing_servers, extra_servers], axis=1, sort=False)
result



| node | missing-ntp-servers | extra-ntp-servers |
|------|---------------------|-------------------|
| as1border1 | {23.23.23.23} | {} |
| as1border2 | {} | {18.18.18.18} |
| as2border1 | {} | {} |
| as2border2 | {} | {} |
| as3border1 | {} | {} |
| as3border2 | {} | {} |
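
The same missing/extra comparison can be written as an explicit loop, which also makes it easy to handle a node that appears on only one side. The sketch below makes several assumptions: `compare_to_database` is a hypothetical helper name, the frame and database are small hand-built stand-ins, and the node `as9border9` is invented to show a database entry with no matching configuration.

```python
import pandas as pd


def compare_to_database(frame, database, prop):
    """Return per-node sets of servers missing from / extra in the configs."""
    configured = dict(zip(frame["node"], frame[prop]))
    missing_col = "missing-{}".format(prop)
    extra_col = "extra-{}".format(prop)
    rows = []
    # Visit every node known to either side; an absent side counts as empty.
    for node in sorted(set(configured) | set(database)):
        expected = set(database.get(node, {}).get(prop, []))
        actual = set(configured.get(node, []))
        rows.append({"node": node,
                     missing_col: expected - actual,
                     extra_col: actual - expected})
    columns = ["node", missing_col, extra_col]
    return pd.DataFrame(rows, columns=columns).set_index("node")


# Hand-built stand-ins; "as9border9" exists only in the mock database.
node_props = pd.DataFrame({
    "node": ["as1border1", "as1border2"],
    "ntp-servers": [[], ["18.18.18.18", "23.23.23.23"]],
})
database = {
    "as1border1": {"ntp-servers": ["23.23.23.23"], "dns-servers": ["1.1.1.1"]},
    "as1border2": {"ntp-servers": ["23.23.23.23"], "dns-servers": ["1.1.1.1"]},
    "as9border9": {"ntp-servers": ["18.18.18.18"], "dns-servers": ["2.2.2.2"]},
}

result = compare_to_database(node_props, database, "ntp-servers")
print(result)
```

Because the property name is a parameter, the same helper could compare `dns-servers` once that column is requested from Batfish.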


That's it for now!