## Getting Started with Batfish

This notebook uses pybatfish, a Python-based SDK for Batfish, to analyze a sample network. It shows how to submit your configurations and other network data for analysis, and how to extract information from the vendor-neutral data model that Batfish builds.

### Initializing a Network and Snapshot

A *network* is a logical group of routers and links. It can be your entire network or a subset of it. A *snapshot* is a collection of information (configuration files, routing data, etc.) that represents the network state at a point in time. Snapshots can contain the actual configurations of network devices or candidate configurations.

**NB: Make sure that the Batfish service is running on localhost before proceeding further**

```python
# Import pybatfish and other needed packages
%run startup.py
```

```python
# Assign a friendly name to your network and snapshot
NETWORK_NAME = "example_network"
SNAPSHOT_NAME = "example_snapshot"

# Update SNAPSHOT_PATH to point to the directory containing the snapshot you want to analyze.
# Example snapshots are available in the test_rigs folder of the Batfish repository.
# See https://github.com/batfish/batfish/wiki/Packaging-testrigs-for-analysis
# for instructions on packaging your own data for analysis.
SNAPSHOT_PATH = "../test_rigs/example"

# Now create the network and initialize the snapshot
bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
```

```
'example_snapshot'
```
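Before initializing a snapshot, you can check the directory layout from Python. The packaging guide linked in the cell above describes the expected layout. The sketch below assumes the convention of one device configuration per file inside a `configs/` subfolder of the snapshot directory. `list_snapshot_configs` is a hypothetical helper and is not part of pybatfish.

```python
from pathlib import Path


def list_snapshot_configs(snapshot_path):
    """Return the sorted file names in snapshot_path/configs (hypothetical helper).

    Assumes the packaging convention of one device configuration per file
    inside a `configs/` subfolder of the snapshot directory.
    """
    configs_dir = Path(snapshot_path) / "configs"
    if not configs_dir.is_dir():
        raise FileNotFoundError(f"expected a configs/ folder under {snapshot_path}")
    # Skip subdirectories and hidden files such as .DS_Store
    return sorted(
        p.name for p in configs_dir.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
```

For example, `list_snapshot_configs(SNAPSHOT_PATH)` should list one file per device. An empty list or a `FileNotFoundError` usually means the path or layout needs fixing before `bf_init_snapshot` is called.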

### Extracting Information
Batfish creates a comprehensive vendor-neutral device and network model. You can extract and validate information from it, such as the list of devices, interface state and VRFs. This notebook focuses on extraction; see the other notebooks for how to validate the extracted information.

*Questions* are the mechanism by which you extract information from the Batfish service about your network and snapshot(s).

```python
# Load questions from Batfish
load_questions()
```

```python
# To see available questions, you can use
list_questions()

# You can also use tab-completion on the Batfish question module: type `bfq.` and press TAB.
# Uncomment the following line and try it
# bfq.

# You can find out the usage of a question by calling .help() on it; note the `()` before .help()
bfq.nodeProperties().help()
```

```
Gets properties of nodes

:param nodeRegex:  Only include nodes that match this specification.
:type nodeRegex: nodeSpec

:param propertySpec:  Which properties to fetch; default is all of them.
:type propertySpec: nodePropertySpec
```

### Getting status of parsed files

Batfish may ignore certain lines in the configuration. To retrieve the parsing status of the snapshot files, use the `fileParseStatus` question:

```python
parse_status = bfq.fileParseStatus().answer().frame()
```

`answer()` runs the question and returns the answer in JSON format.

`frame()` converts the answer into a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

You can then post-process this data with pandas, for example by filtering on values in one or more columns or by dropping columns. This post-processing can serve as the basis for lightweight validation. Batfish also includes a range of questions for deeper validation, which are discussed in other notebooks.
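As an example of lightweight validation, the check below reduces the parse status frame to the list of files that did not fully parse. It uses a hypothetical DataFrame with the `Filename`, `Status` and `Hosts` columns of the `fileParseStatus` answer. The file names and the non-`PASSED` status value are made up for illustration. `files_not_passed` is a hypothetical helper, not part of pybatfish.

```python
import pandas as pd


def files_not_passed(parse_status):
    """Return the Filename values whose Status is anything other than 'PASSED'."""
    failed = parse_status[parse_status["Status"] != "PASSED"]
    return failed["Filename"].tolist()


# Hypothetical rows shaped like the fileParseStatus answer; the file names and
# the second status value are illustrative only.
sample = pd.DataFrame({
    "Filename": ["configs/r1.cfg", "configs/r2.cfg"],
    "Status": ["PASSED", "PARTIALLY_UNRECOGNIZED"],
    "Hosts": [["r1"], ["r2"]],
})
print(files_not_passed(sample))  # ['configs/r2.cfg']
```

In a script or CI job, `assert not files_not_passed(parse_status)` would turn the filter into a pass/fail check on the real answer.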

Information on pandas-based processing can be found in the [pandas tutorial on filtering](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%203%20-%20Which%20borough%20has%20the%20most%20noise%20complaints%20%28or%2C%20more%20selecting%20data%29.ipynb).

```python
# An example: use a filter on the returned dataframe to see which files failed to parse completely
parse_status[parse_status['Status'] != 'PASSED']  # change '!=' to '==' to get the files which passed
```

| | Filename | Status | Hosts |
|---|---|---|---|


```python
# View details if some of the files were not parsed completely
parse_warning = bfq.parseWarning().answer().frame()

parse_warning
```

| | Filename | Text | Line | Parser_Context | Comment |
|---|---|---|---|---|---|


### Inspecting referential integrity of configuration structures
Network configurations define and reference named structures like route maps, access control lists (ACLs) and prefix lists. Two common indicators of buggy configurations are references to structures that are not defined anywhere, and defined structures that are not referenced anywhere. Batfish makes it easy to flag such instances because it understands the underlying semantics of configurations.
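At its core, each of the two checks is a set difference between the names defined and the names referenced for one structure type. The sketch below shows only that idea. It does not parse configurations, and it is not how Batfish implements the check. The route-map names and `check_references` are hypothetical.

```python
def check_references(defined, referenced):
    """Compare defined structure names against referenced ones (conceptual sketch).

    Returns (undefined, unused): names referenced but never defined, and
    names defined but never referenced, each sorted alphabetically.
    """
    undefined = sorted(set(referenced) - set(defined))
    unused = sorted(set(defined) - set(referenced))
    return undefined, unused


# Hypothetical route-map names from a single device configuration
defined_route_maps = ["RM_IN", "RM_OUT", "RM_OLD"]
referenced_route_maps = ["RM_IN", "RM_OUT", "RM_TYPO"]
print(check_references(defined_route_maps, referenced_route_maps))
# (['RM_TYPO'], ['RM_OLD'])
```

The value Batfish adds is in producing the `defined` and `referenced` sets correctly across vendors and structure types. That requires parsing the configurations semantically rather than matching text.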

##### TODO: cells for undefined references and unused structures

### Extracting properties of network entities
Entities in the network are things like nodes, interfaces, routing processes and VRFs. Batfish makes it trivial to extract the configured properties of such entities in a vendor-neutral manner.

#### Node properties
The `nodeProperties` question extracts information about the nodes in the snapshot.

```python
# Extract the properties of all nodes whose names match the regular expression '.*border.*'
node_properties = bfq.nodeProperties(nodeRegex=".*border.*").answer().frame()
```
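You can preview which node names a pattern selects before sending it to Batfish. The sketch below uses Python's `re.fullmatch` as a stand-in for Batfish's own matching. The `.*` on both sides of the notebook's pattern suggests the match is anchored to the whole name, but that is an assumption here. Batfish's exact semantics, such as case sensitivity, may differ. The node names are the ones that appear in this notebook's outputs.

```python
import re

# Node names that appear in this notebook's outputs
nodes = ["as1border1", "as1border2", "as2border1", "as2border2",
         "as3border1", "as3border2", "as3core1"]

# re.fullmatch is a stand-in (assumption) for Batfish's anchored name matching
border_nodes = [n for n in nodes if re.fullmatch(".*border.*", n)]
print(border_nodes)
# ['as1border1', 'as1border2', 'as2border1', 'as2border2', 'as3border1', 'as3border2']

# Under anchored matching, a bare substring selects nothing
bare = [n for n in nodes if re.fullmatch("border", n)]
print(bare)  # []
```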

```python
# View what columns (properties) are present in the answer
node_properties.columns
```

```
Index(['node', 'domain-name', 'ip6-access-lists', 'tacacs-servers',
       'logging-servers', 'tacacs-source-interface', 'ipsec-vpns',
       'snmp-source-interface', 'hostname', 'ntp-source-interface',
       'configuration-format', 'routing-policies', 'dns-servers',
       'dns-source-interface', 'ike-policies', 'device-type',
       'route6-filter-lists', 'canonical-ip', 'route-filter-lists',
       'interfaces', 'ip-access-lists', 'logging-source-interface',
       'authentication-key-chains', 'ipsec-policies', 'zones',
       'community-lists', 'ip-spaces', 'default-cross-zone-action',
       'default-inbound-action', 'ipsec-proposals', 'snmp-trap-servers',
       'as-path-access-lists', 'ntp-servers', 'ike-gateways', 'vendor-family',
       'vrfs'],
      dtype='object')
```

```python
# To extract only a subset of properties, use the propertySpec parameter
node_properties_trunc = bfq.nodeProperties(nodeRegex=".*border.*", propertySpec="domain-name|ntp-servers|interfaces").answer().frame()

node_properties_trunc
```

| | node | domain-name | interfaces | ntp-servers |
|---|---|---|---|---|
| 0 | as2border1 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, GigabitEthernet2/0, Ethernet0/0, Loopback0] | [18.18.18.18, 23.23.23.23] |
| 1 | as3border1 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, Ethernet0/0, Loopback0] | [18.18.18.18, 23.23.23.23] |
| 2 | as1border1 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, Ethernet0/0, Loopback0] | [] |
| 3 | as1border2 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, GigabitEthernet2/0, Ethernet0/0, Loopback0] | [18.18.18.18, 23.23.23.23] |
| 4 | as2border2 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, GigabitEthernet2/0, Ethernet0/0, Loopback0] | [18.18.18.18] |
| 5 | as3border2 | lab.local | [GigabitEthernet0/0, GigabitEthernet1/0, Ethernet0/0, Loopback0] | [18.18.18.18, 23.23.23.23] |


An alternative (client-side) way to restrict the list of columns displayed is to use pandas-based column filtering ([pandas tutorial](http://nbviewer.jupyter.org/github/jvns/pandas-cookbook/blob/v0.2/cookbook/Chapter%202%20-%20Selecting%20data%20%26%20finding%20the%20most%20common%20complaint%20type.ipynb)).
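Besides selecting a column list explicitly, pandas offers two other client-side idioms. `DataFrame.filter(regex=...)` keeps the columns whose names match a pattern. `DataFrame.drop(columns=...)` removes the named columns. The sketch rebuilds two rows copied from the `nodeProperties` answer above to show both. Note that `filter(regex=...)` searches anywhere in the name, so the pattern is anchored with `^` and `$` to avoid partial matches.

```python
import pandas as pd

# Two rows copied from the nodeProperties answer above
node_properties_trunc = pd.DataFrame({
    "node": ["as2border1", "as2border2"],
    "domain-name": ["lab.local", "lab.local"],
    "interfaces": [
        ["GigabitEthernet0/0", "GigabitEthernet1/0", "GigabitEthernet2/0", "Ethernet0/0", "Loopback0"],
        ["GigabitEthernet0/0", "GigabitEthernet1/0", "GigabitEthernet2/0", "Ethernet0/0", "Loopback0"],
    ],
    "ntp-servers": [["18.18.18.18", "23.23.23.23"], ["18.18.18.18"]],
})

# Keep columns whose full name matches one of the alternatives (original order is preserved)
subset = node_properties_trunc.filter(regex="^(node|domain-name|ntp-servers)$")
print(list(subset.columns))  # ['node', 'domain-name', 'ntp-servers']

# Equivalent here: drop the one column you do not want
dropped = node_properties_trunc.drop(columns=["interfaces"])
print(list(dropped.columns))  # ['node', 'domain-name', 'ntp-servers']
```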

```python
# Let's remove the interfaces column from our result
node_properties_trunc = node_properties_trunc[["node", "domain-name", "ntp-servers"]]

node_properties_trunc
```

| | node | domain-name | ntp-servers |
|---|---|---|---|
| 0 | as2border1 | lab.local | [18.18.18.18, 23.23.23.23] |
| 1 | as3border1 | lab.local | [18.18.18.18, 23.23.23.23] |
| 2 | as1border1 | lab.local | [] |
| 3 | as1border2 | lab.local | [18.18.18.18, 23.23.23.23] |
| 4 | as2border2 | lab.local | [18.18.18.18] |
| 5 | as3border2 | lab.local | [18.18.18.18, 23.23.23.23] |


You can add additional filters to restrict entries based on values of columns.

```python
# View only nodes with 23.23.23.23 as one of the configured ntp-servers
node_properties_trunc[node_properties_trunc['ntp-servers'].apply(lambda x: '23.23.23.23' in x)]
```

| | node | domain-name | ntp-servers |
|---|---|---|---|
| 0 | as2border1 | lab.local | [18.18.18.18, 23.23.23.23] |
| 1 | as3border1 | lab.local | [18.18.18.18, 23.23.23.23] |
| 3 | as1border2 | lab.local | [18.18.18.18, 23.23.23.23] |
| 5 | as3border2 | lab.local | [18.18.18.18, 23.23.23.23] |
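Inverting the same mask with `~` turns the filter into a validation check: it lists the nodes that lack a given NTP server. The sketch rebuilds the `node` and `ntp-servers` values from the table above. Treating 23.23.23.23 as the server every node should have is an assumption made only for illustration.

```python
import pandas as pd

# node / ntp-servers values copied from the node properties table above
ntp = pd.DataFrame({
    "node": ["as2border1", "as3border1", "as1border1",
             "as1border2", "as2border2", "as3border2"],
    "ntp-servers": [["18.18.18.18", "23.23.23.23"], ["18.18.18.18", "23.23.23.23"], [],
                    ["18.18.18.18", "23.23.23.23"], ["18.18.18.18"],
                    ["18.18.18.18", "23.23.23.23"]],
})

# Assumed policy for this example: every node should use 23.23.23.23
has_server = ntp["ntp-servers"].apply(lambda servers: "23.23.23.23" in servers)
missing = ntp.loc[~has_server, "node"].tolist()
print(missing)  # ['as1border1', 'as2border2']
```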


#### Interface properties
To retrieve the interfaces present in the snapshot and their properties, use the **interfaceProperties** question:

```python
interface_properties = bfq.interfaceProperties(nodeRegex=".*border.*", propertySpec="interface-type|bandwidth|vrf|primary-address").answer().frame()
```

To find only the interfaces whose primary IP address is in **10.12.0.0/16**, filter the results as shown below.

**na=False** is required to ignore interfaces without a configured IP address, such as Ethernet switchports.


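```python
# str.match treats its argument as a regular expression anchored at the start of the string,
# so the dots are escaped and a trailing "." keeps addresses such as 10.120.x.x from matching
interface_properties[interface_properties['primary-address'].str.match(r"10\.12\.", na=False)]
```

| | interface | bandwidth | interface-type | vrf | primary-address |
|---|---|---|---|---|---|
| 4 | as2border1:GigabitEthernet0/0 | 1000000000.0 | PHYSICAL | default | 10.12.11.2/24 |
| 14 | as1border1:GigabitEthernet1/0 | 1000000000.0 | PHYSICAL | default | 10.12.11.1/24 |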
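A string prefix only approximates "inside 10.12.0.0/16". For other prefix lengths, such as a /20, it does not work at all. A more robust client-side test parses each address with Python's standard `ipaddress` module. The sketch assumes `primary-address` holds strings like `10.12.11.2/24`, as in the table above, and that missing values are `None` or `NaN`. The sample addresses and the `in_prefix` helper are hypothetical.

```python
import ipaddress

import pandas as pd


def in_prefix(address, prefix):
    """True if an interface address such as '10.12.11.2/24' lies inside prefix.

    Non-string values (None/NaN for interfaces without an address) yield False,
    which plays the same role as na=False in the str.match filter.
    """
    if not isinstance(address, str):
        return False
    return ipaddress.ip_interface(address).ip in ipaddress.ip_network(prefix)


# Hypothetical primary-address values; 10.120.0.1/30 starts with the characters
# "10.12" but lies outside 10.12.0.0/16
addresses = pd.Series(["10.12.11.2/24", None, "10.120.0.1/30", "192.168.1.1/24"])
mask = addresses.apply(lambda a: in_prefix(a, "10.12.0.0/16"))
print(mask.tolist())  # [True, False, False, False]
```

Applied to the real answer, this would read `interface_properties[interface_properties['primary-address'].apply(lambda a: in_prefix(a, "10.12.0.0/16"))]`.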


Similar questions extract properties of other entities (e.g., bgpProperties() extracts properties of BGP processes).

### Exploring Routing and Forwarding Tables (RIBs and FIBs)
Batfish can compute the routing and forwarding tables of the network (together, the *dataplane*) from the snapshot data alone. You can examine this dataplane to understand the routing and forwarding behavior of the network.

```python
# Fetch the routing table of all VRFs on all nodes in the snapshot
routes_df = bfq.routes().answer().frame()
```

(For a large network, the first time you run a question that needs the dataplane, fetching the answer can take a few minutes. Subsequent questions are quick as the generated dataplane is saved by Batfish.)

As used above, the `routes()` question can generate a lot of results. You can restrict the output by passing a filter parameter to the question. For example, to restrict the results to **border** routers, use `bfq.routes(nodeRegex=".*border.*")`.

As with properties, you can also filter the results on the client side using pandas. For example, to see all routes on all nodes and VRFs for the network **90.90.90.0/24** with an **administrative distance of 0**, combine multiple conditions using [pandas boolean indexing](http://pandas.pydata.org/pandas-docs/version/0.15/indexing.html#boolean-indexing).

```python
routes_df[(routes_df['Network'] == "90.90.90.0/24") & (routes_df["AdminDistance"] == 0)]
```

| | Node | VRF | Network | Protocol | Tag | NextHopIp | NextHop | AdminDistance | Metric |
|---|---|---|---|---|---|---|---|---|---|
| 21 | as3core1 | default | 90.90.90.0/24 | connected | -1 | AUTO/NONE(-1l) | | 0 | 0 |
| 22 | as3core1 | default | 90.90.90.0/24 | connected | -1 | AUTO/NONE(-1l) | | 0 | 0 |
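You can also write the same two-condition filter with `DataFrame.query`, which avoids repeating the frame name and the parentheses needed around each `&` operand. The sketch uses a small hypothetical routes frame with a subset of the `routes()` columns. The third row is made up to show a route that the filter excludes.

```python
import pandas as pd

# Hypothetical routes rows using a subset of the routes() answer columns
routes_df = pd.DataFrame({
    "Node": ["as3core1", "as3core1", "as2border1"],
    "VRF": ["default", "default", "default"],
    "Network": ["90.90.90.0/24", "90.90.90.0/24", "90.90.90.0/24"],
    "Protocol": ["connected", "connected", "bgp"],
    "AdminDistance": [0, 0, 20],
})

# Same conditions as the boolean-indexing cell above, expressed as a query string
result = routes_df.query('Network == "90.90.90.0/24" and AdminDistance == 0')
print(result["Node"].tolist())  # ['as3core1', 'as3core1']
```

Column names that are not valid Python identifiers, such as `primary-address`, need backtick quoting inside `query` strings, which recent pandas versions support.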


***
That's it for now! Feel free to explore further by adding cells and running other questions, or play with the other notebooks in the repository.