## Introduction to Path Analysis using Batfish

Path analysis is one of the most common tasks a network engineer undertakes, and also one of the most complicated. Traditionally, path analysis is performed with `traceroute`, which in many cases must be run from multiple locations in the network. This distributed debugging is highly complex even in a moderately sized network. Batfish simplifies the task by providing easy-to-use queries.

In this notebook, we will look at how you can perform path analysis with Batfish.


In [1]:
# Import packages and load questions
%run startup.py

  "Pybatfish public API is being updated, note that API names and parameters will soon change.")


### Initializing the Network and Snapshot

`SNAPSHOT_PATH` below can be updated to point to a custom snapshot directory; see the [Batfish instructions](https://github.com/batfish/batfish/wiki/Packaging-snapshots-for-analysis) for how to package data for analysis.<br>
More example networks are available in the [networks](https://github.com/batfish/batfish/tree/master/networks) folder of the Batfish repository.

In [2]:
# Initialize a network and snapshot
NETWORK_NAME = "example_network"
SNAPSHOT_NAME = "example_snapshot"

SNAPSHOT_PATH = "networks/example"

bf_set_network(NETWORK_NAME)
bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)

'example_snapshot'

The network snapshot that we initialized above is illustrated below. You can download/view devices' configuration files [here](https://github.com/batfish/pybatfish/tree/master/jupyter_notebooks/networks/example).

![example-network](https://raw.githubusercontent.com/batfish/pybatfish/master/jupyter_notebooks/networks/example/example-network.png)

All of the information we will show you in this notebook is dynamically computed by Batfish based on the configuration files for the network devices.

### Find the path taken by AS3 Core routers to reach host1, the DNS Server in AS2

To do this, we will use the `traceroute` question in Batfish. The question has a lot of parameters that you can specify, but we will start with the simple case.

When you specify `dst = "host1"`, Batfish automatically picks the IP address of `host1` as the destination IP address for the question. If `host1` had multiple interfaces and IP addresses, Batfish would pick one. Later in the notebook, we will cover how you can explicitly control which of the addresses it chooses.

We want the query to start from the `Loopback0` interface on `as3core1`, and we want to use the IP address of that interface as the source address. We can accomplish both of these goals by specifying the correct value as `traceStart`. Setting `traceStart` to `as3core1[Loopback0]` tells Batfish to start the traceroute at `as3core1` with the IP address of `Loopback0` as the source IP.
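
The `node[interface]` form used for `traceStart` can be assembled from variables when the same query needs to start from several places. The following is a minimal sketch; `trace_start` is a hypothetical helper, not part of pybatfish, and it only builds the string format described above.

```python
def trace_start(node, interface):
    """Build a traceStart value of the form node[interface].

    Hypothetical convenience helper; it only formats the string.
    """
    if not node or not interface:
        raise ValueError("both node and interface are required")
    return "{}[{}]".format(node, interface)


start = trace_start("as3core1", "Loopback0")
print(start)  # as3core1[Loopback0]
```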

In [16]:
# start the traceroute from the Loopback0 interface of as3core1 to host1
tracert = bfq.traceroute(traceStart = "as3core1[Loopback0]", dst = 'host1').answer()

Let's take a look at the results of the query. 

Compared to running traceroute from an actual node in your network, Batfish returns additional information:
1. All active parallel paths between the source and destination
2. The reason why each hop in a path is taken (the specific routing entry that was matched)
3. Disposition of the packet for each path

First, let's find out how many parallel active paths are present between the source and destination.

In [21]:
# number of paths between as3core1 and host1
path_count = len(tracert['answerElements'][0]['rows'][0]['Traces'])
print("Number of active paths between source and destination: {}".format(path_count))

Number of active paths between source and destination: 4
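
The nested lookup used above recurs throughout this notebook, so it can help to wrap it once. The sketch below assumes the answer keeps the nesting shown in the cell above (`answerElements` → `rows` → `Traces`); `get_traces` is a hypothetical helper, and `sample_answer` is hand-written stand-in data, not real Batfish output.

```python
from collections import Counter


def get_traces(answer, row=0):
    """Return the list of traces stored in one row of a traceroute answer."""
    return answer["answerElements"][0]["rows"][row]["Traces"]


# Hand-written stand-in; only the nesting mirrors the notebook's answer.
sample_answer = {
    "answerElements": [
        {
            "rows": [
                {
                    "Traces": [
                        {"disposition": "DENIED_IN", "hops": []},
                        {"disposition": "DENIED_IN", "hops": []},
                    ]
                }
            ]
        }
    ]
}

traces = get_traces(sample_answer)
# Tally how many paths end in each disposition.
by_disposition = Counter(trace["disposition"] for trace in traces)
print("paths:", len(traces), "dispositions:", dict(by_disposition))
```

With the real answer, `get_traces(tracert)` would stand in for the longer indexing expression.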


Now let's take a look at one of those paths in more detail, starting with the first path.

In [20]:
# path number we want to see
num_path = 0
# print the hops for the first trace
hop_df = pd.DataFrame(tracert['answerElements'][0]['rows'][0]['Traces'][num_path]['hops'])
hop_df

| | edge | filterIn | routes |
|---|---|---|---|
| 0 | `{'node1': 'as3core1', 'node1interface': 'GigabitEthernet1/0', 'node2': 'as3border1', 'node2interface': 'GigabitEthernet0/0'}` | | `[BgpRoute<2.128.0.0/16,nhip:10.23.21.2,nhint:dynamic>_fnhip:3.0.1.1]` |
| 1 | `{'node1': 'as3border1', 'node1interface': 'GigabitEthernet1/0', 'node2': 'as2border2', 'node2interface': 'GigabitEthernet0/0'}` | `{OUTSIDE_TO_INSIDE}{permit ip any any}` | `[BgpRoute<2.128.0.0/16,nhip:10.23.21.2,nhint:dynamic>_fnhip:10.23.21.2]` |
| 2 | `{'node1': 'as2border2', 'node1interface': 'GigabitEthernet1/0', 'node2': 'as2core2', 'node2interface': 'GigabitEthernet0/0'}` | | `[BgpRoute<2.128.0.0/24,nhip:2.34.101.4,nhint:dynamic>_fnhip:2.12.22.2, BgpRoute<2.128.0.0/24,nhip:2.34.201.4,nhint:dynamic>_fnhip:2.12.22.2]` |
| 3 | `{'node1': 'as2core2', 'node1interface': 'GigabitEthernet2/0', 'node2': 'as2dist2', 'node2interface': 'GigabitEthernet0/0'}` | | `[BgpRoute<2.128.0.0/24,nhip:2.34.201.4,nhint:dynamic>_fnhip:2.23.22.3]` |
| 4 | `{'node1': 'as2dist2', 'node1interface': 'GigabitEthernet2/0', 'node2': 'as2dept1', 'node2interface': 'GigabitEthernet1/0'}` | | `[BgpRoute<2.128.0.0/24,nhip:2.34.201.4,nhint:dynamic>_fnhip:2.34.201.4]` |
| 5 | `{'node1': 'as2dept1', 'node1interface': 'GigabitEthernet2/0', 'node2': 'host1', 'node2interface': 'eth0'}` | | `[ConnectedRoute<2.128.0.0/24,nhip:AUTO/NONE(-1l),nhint:GigabitEthernet2/0>_fnhip:null]` |


Let's look at this table in a bit more detail. It has the following columns:
1. `edge`: This is telling you the link that the packet traverses
2. `filterIn`: This tells you the name of any input filter that is applied on the input interface of the 2nd node in the `edge`
3. `routes`: This tells you the exact route on the first node in the `edge` that caused the packet to traverse that `edge`
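
Because each `edge` names both endpoints of a link, the hop list can be collapsed into a single readable path. The sketch below is illustrative only: `path_summary` is a hypothetical helper, and `sample_hops` copies just the `edge` values from the table above.

```python
def path_summary(hops):
    """Return the node sequence for hop dicts shaped like the table above."""
    if not hops:
        return ""
    # The first hop contributes its source node; every hop contributes its target.
    nodes = [hops[0]["edge"]["node1"]]
    nodes.extend(hop["edge"]["node2"] for hop in hops)
    return " -> ".join(nodes)


# Edge endpoints copied from the hop table; interfaces omitted for brevity.
sample_hops = [
    {"edge": {"node1": "as3core1", "node2": "as3border1"}},
    {"edge": {"node1": "as3border1", "node2": "as2border2"}},
    {"edge": {"node1": "as2border2", "node2": "as2core2"}},
    {"edge": {"node1": "as2core2", "node2": "as2dist2"}},
    {"edge": {"node1": "as2dist2", "node2": "as2dept1"}},
    {"edge": {"node1": "as2dept1", "node2": "host1"}},
]

summary = path_summary(sample_hops)
print(summary)
# as3core1 -> as3border1 -> as2border2 -> as2core2 -> as2dist2 -> as2dept1 -> host1
```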

What we see from the above output is that the packet was delivered to `host1` interface `eth0` by the node `as2dept1` from interface `GigabitEthernet2/0`. So now let's see what happened to the packet at `host1`.

In [23]:
disposition = tracert['answerElements'][0]['rows'][0]['Traces'][0]['disposition']
print("Traceroute disposition was: {}".format(disposition))

Traceroute disposition was: DENIED_IN


The packet was `DENIED_IN`, which means that the input filter on the host denied the packet. Let's determine the name of the filter and the specific rule that matched the packet.

In [24]:
detailed_disposition = tracert['answerElements'][0]['rows'][0]['Traces'][0]['notes']
print("Details about packet disposition: {}".format(detailed_disposition))

Details about packet disposition: DENIED_IN{filter::INPUT}{default}
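
If you want the filter name and the matching rule as separate values, the note string can be split with a regular expression. This is a sketch that assumes notes always follow the `DISPOSITION{filter::NAME}{RULE}` shape seen in the single example above; other dispositions may be formatted differently, in which case `parse_note` (a hypothetical helper) returns `None`.

```python
import re

# Pattern inferred from one observed note; treat it as an assumption.
NOTE_PATTERN = re.compile(
    r"^(?P<disposition>\w+)\{filter::(?P<filter>[^}]*)\}\{(?P<rule>[^}]*)\}$"
)


def parse_note(note):
    """Split a note into disposition, filter name, and matched rule."""
    match = NOTE_PATTERN.match(note)
    if match is None:
        return None
    return match.groupdict()


parsed = parse_note("DENIED_IN{filter::INPUT}{default}")
print(parsed)
# {'disposition': 'DENIED_IN', 'filter': 'INPUT', 'rule': 'default'}
```

Here the packet hit the `default` rule of the `INPUT` filter on `host1`.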


### Abstract path analysis (reachability)


Traceroute allows you to find the paths taken by a specified flow through the network, as computed by Batfish. This is very useful in analyzing a connectivity issue.

Batfish has an even more powerful query, `reachability`, that will allow you to explore the network in a more abstract fashion. This is very useful when you want to build a set of tests for the network to ensure security and reliability.

The `reachability` question allows you to find those needle-in-a-haystack issues. If you want to find potential flows that will be `DROPPED` or `ACCEPTED` in the network, without advance knowledge of the header space (`srcIp`, `dstIp`, `IpProtocol`, `srcPort`, `dstPort`), then `reachability` is the way to go.

Let's take the same start location, but instead of analyzing traffic to host1, let's analyze traffic for any host in the same subnet.
That subnet is connected to `as2dept1` interface `GigabitEthernet2/0`.

And instead of using the IP address of `as3core1` interface `Loopback0`, we will analyze the entire IPv4 address space.

In [33]:
reach = bfq.reachabilityExt(start = "enter(as3core1)", srcIp = "0.0.0.0/0", dstIp = "ofLocation(enter(as2dept1[GigabitEthernet2/0]))", actions = ['DROP']).answer()
reach = reach['answerElements'][0]
reach

{'class': 'org.batfish.question.jsonpath.JsonPathQuestionPlugin$JsonPathAnswerElement',
 'results': {'0': {'extractedValues': {"'traces'->'Flow<ingressNode:as3core1 iface:GigabitEthernet0/0 srcIp:2.128.4.0 dstIp:2.128.0.254 ipProtocol:TCP srcPort:0 dstPort:SSH(22) dscp: 0 ecn:0 fragmentOffset:0 packetLength:0 state:NEW tcpFlags:00000000 tag:BASE>'": {'flow': {'dscp': 0,
      'dstIp': '2.128.0.254',
      'dstPort': 22,
      'ecn': 0,
      'fragmentOffset': 0,
      'icmpCode': 255,
      'icmpVar': 255,
      'ingressInterface': 'GigabitEthernet0/0',
      'ingressNode': 'as3core1',
      'ipProtocol': 'TCP',
      'packetLength': 0,
      'srcIp': '2.128.4.0',
      'srcPort': 0,
      'state': 'NEW',
      'tag': 'BASE',
      'tcpFlagsAck': 0,
      'tcpFlagsCwr': 0,
      'tcpFlagsEce': 0,
      'tcpFlagsFin': 0,
      'tcpFlagsPsh': 0,
      'tcpFlagsRst': 0,
      'tcpFlagsSyn': 0,
      'tcpFlagsUrg': 0},
     'flowTraces': [{'disposition': 'DENIED_IN',
       'hops': [{'edge
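
As a quick sanity check, the example flow that Batfish returned can be compared against the constraints in the query using Python's standard `ipaddress` module. The sketch below copies `srcIp` and `dstIp` from the output above. It assumes that the subnet on `as2dept1[GigabitEthernet2/0]` is `2.128.0.0/24`, as suggested by the connected route in the traceroute hop table earlier.

```python
import ipaddress

# Values copied from the flow in the output above.
flow = {"srcIp": "2.128.4.0", "dstIp": "2.128.0.254"}

# Assumption: the destination subnet is 2.128.0.0/24 (see the connected route
# on as2dept1 in the traceroute hop table).
dst_subnet = ipaddress.ip_network("2.128.0.0/24")
# The srcIp constraint used in the query: the whole IPv4 address space.
src_space = ipaddress.ip_network("0.0.0.0/0")

dst_ok = ipaddress.ip_address(flow["dstIp"]) in dst_subnet
src_ok = ipaddress.ip_address(flow["srcIp"]) in src_space
print("destination in subnet:", dst_ok)
print("source in query space:", src_ok)
print("0.0.0.0/0 covers all of IPv4:", src_space.num_addresses == 2 ** 32)
```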

Now, what if we wanted to know which flows that do not originate inside our network would be able to reach `host1`? We would run the following `reachability` query.

In [36]:
reach = bfq.reachabilityExt(start = "as2border.*", srcIp = "0.0.0.0/0", dstIp = "ofLocation(host1)", actions = ['ACCEPT']).answer()
reach = reach['answerElements'][0]
reach

{'class': 'org.batfish.question.jsonpath.JsonPathQuestionPlugin$JsonPathAnswerElement',
 'results': {'0': {'extractedValues': {"'traces'->'Flow<ingressNode:as2border1 ingressVrf:default srcIp:0.0.32.0 dstIp:2.128.0.101 ipProtocol:UDP srcPort:0 dstPort:DOMAIN(53) dscp: 0 ecn:0 fragmentOffset:0 packetLength:0 state:NEW tag:BASE>'": {'flow': {'dscp': 0,
      'dstIp': '2.128.0.101',
      'dstPort': 53,
      'ecn': 0,
      'fragmentOffset': 0,
      'icmpCode': 255,
      'icmpVar': 255,
      'ingressNode': 'as2border1',
      'ingressVrf': 'default',
      'ipProtocol': 'UDP',
      'packetLength': 0,
      'srcIp': '0.0.32.0',
      'srcPort': 0,
      'state': 'NEW',
      'tag': 'BASE',
      'tcpFlagsAck': 0,
      'tcpFlagsCwr': 0,
      'tcpFlagsEce': 0,
      'tcpFlagsFin': 0,
      'tcpFlagsPsh': 0,
      'tcpFlagsRst': 0,
      'tcpFlagsSyn': 0,
      'tcpFlagsUrg': 0},
     'flowTraces': [{'disposition': 'ACCEPTED',
       'hops': [{'edge': {'node1': 'as2border1',
          

As you can see, this query identified flows entering the network at both `as2border1` and `as2border2` destined for `host1` that would be delivered.
And in all cases, it was DNS traffic that was permitted, as we would have expected based on the configuration of `host1`.
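
Rather than reading the raw output by eye, the "only DNS is permitted" observation can be turned into a check. The sketch below assumes the answer element keeps the nesting visible in the truncated output above (`results` → index → `extractedValues` → entry → `flow`). `iter_flows` and `is_dns` are hypothetical helpers, and `sample_reach` is hand-written stand-in data with values borrowed from that output.

```python
def iter_flows(answer_element):
    """Yield each flow dict from a reachability answer element."""
    for result in answer_element["results"].values():
        for entry in result["extractedValues"].values():
            yield entry["flow"]


def is_dns(flow):
    """True when the flow targets port 53 over UDP or TCP."""
    return flow["dstPort"] == 53 and flow["ipProtocol"] in ("UDP", "TCP")


# Stand-in data; the key name is arbitrary, the flow values come from the output.
sample_reach = {
    "results": {
        "0": {
            "extractedValues": {
                "flow-from-as2border1": {
                    "flow": {
                        "ingressNode": "as2border1",
                        "dstIp": "2.128.0.101",
                        "dstPort": 53,
                        "ipProtocol": "UDP",
                    }
                }
            }
        }
    }
}

non_dns = [flow for flow in iter_flows(sample_reach) if not is_dns(flow)]
print("Non-DNS flows accepted:", non_dns)
```

An empty list means every accepted flow in the answer was DNS traffic.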

### Planned queries (work in progress)

The following queries are not yet covered in this notebook:

1. Would ANY DNS flow originating inside AS2 and destined for `host1` be dropped? This needs additional protocol fields in the `reachability` query.
2. Would ANY DNS flow originating outside AS2 and destined for `host1` be dropped?
3. Which flows from outside AS2 would be accepted by `host1` that are not DNS or SSH?

These notes will be deleted once the notebook is finished.


### View IP addresses for ALL devices and ALL VRFs
Batfish tracks all of the configured IP addresses and makes them easily accessible. Let's take a look at how you can retrieve the specific information you want.

In [28]:
# Get configured IP addresses on all nodes
ip_all = bfq.ipOwners().answer().frame()

We are not going to print this table as it has a large number of entries. 

### View IP addresses for the hosts in the snapshot

In [29]:
# Get configured IP address for hosts
hosts_ip = ip_all[ip_all['Hostname'].str.contains('host')]
hosts_ip

| | Hostname | VRF | Interface | IP | Mask | Active |
|---|---|---|---|---|---|---|
| 12 | host1 | default | eth0 | 2.128.0.101 | 24 | True |
| 22 | host2 | default | eth0 | 2.128.1.101 | 24 | True |


### View IP addresses for all devices in AS3 in the snapshot

In [30]:
# Get configured IP addresses for all AS3 devices
as3_ip = ip_all[ip_all['Hostname'].str.contains('as3')]
as3_ip

| | Hostname | VRF | Interface | IP | Mask | Active |
|---|---|---|---|---|---|---|
| 2 | as3border2 | default | Loopback0 | 3.2.2.2 | 32 | True |
| 3 | as3border1 | default | GigabitEthernet1/0 | 10.23.21.3 | 24 | True |
| 5 | as3core1 | default | GigabitEthernet2/0 | 90.90.90.1 | 24 | True |
| 7 | as3border2 | default | GigabitEthernet0/0 | 10.13.22.3 | 24 | True |
| 20 | as3core1 | default | Loopback0 | 3.10.1.1 | 32 | True |
| 21 | as3core1 | default | GigabitEthernet1/0 | 3.0.1.2 | 24 | True |
| 27 | as3border1 | default | Loopback0 | 3.1.1.1 | 32 | True |
| 29 | as3core1 | default | GigabitEthernet0/0 | 3.0.2.2 | 24 | True |
| 31 | as3border1 | default | GigabitEthernet0/0 | 3.0.1.1 | 24 | True |
| 40 | as3core1 | default | GigabitEthernet3/0 | 90.90.90.2 | 24 | True |
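
The same records also support the reverse lookup: given an IP address seen in a trace, which device and interface own it? The sketch below uses plain Python on a few rows copied from the tables above. `index_by_ip` is a hypothetical helper; in the notebook, the records could presumably be obtained with pandas' `ip_all.to_dict("records")`.

```python
# A few rows copied from the tables above (columns trimmed for brevity).
rows = [
    {"Hostname": "host1", "Interface": "eth0", "IP": "2.128.0.101"},
    {"Hostname": "host2", "Interface": "eth0", "IP": "2.128.1.101"},
    {"Hostname": "as3core1", "Interface": "Loopback0", "IP": "3.10.1.1"},
    {"Hostname": "as3border1", "Interface": "GigabitEthernet0/0", "IP": "3.0.1.1"},
]


def index_by_ip(records):
    """Map each IP address to the (hostname, interface) pairs that own it."""
    owners = {}
    for record in records:
        owner = (record["Hostname"], record["Interface"])
        owners.setdefault(record["IP"], []).append(owner)
    return owners


owners = index_by_ip(rows)
print(owners["2.128.0.101"])  # [('host1', 'eth0')]
print(owners.get("192.0.2.1", []))  # [] for an address nobody owns
```

A list is used as the value because nothing in the data guarantees that an address is configured on only one interface.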


### Wrap-up

This concludes the notebook. To recap, in this notebook we covered the foundational tasks for path analysis:

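1. Retrieving the list of IP addresses configured on every interface on every device in the snapshot
2. Determining the paths taken, and disposition along each path, for a flow using `traceroute`
3. Finding flows that would be dropped or accepted, without specifying the header space in advance, using `reachability`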

We hope you found this notebook useful and informative. Future notebooks will dive into more advanced topics such as path analysis, debugging ACLs and firewall rules, and validating routing policy, so stay tuned!

### Want to know more? 

Reach out to us through [Slack](https://join.slack.com/t/batfish-org/shared_invite/enQtMzA0Nzg2OTAzNzQ1LTUxOTJlY2YyNTVlNGQ3MTJkOTIwZTU2YjY3YzRjZWFiYzE4ODE5ODZiNjA4NGI5NTJhZmU2ZTllOTMwZDhjMzA) or [GitHub](https://github.com/batfish/batfish) to learn more or send feedback.