# Introduction to Forwarding Change Validation


Network engineers frequently have to make changes to forwarding behavior: adding new routes, opening or closing flows, routing traffic through different devices, and so on. These changes are often hard to get right and hard to validate. This notebook shows how batfish can help validate changes to network forwarding _before_ you deploy them. We will do this using batfish's differential analyses to compare the forwarding behavior of two snapshots of the network. As we will see, these analyses are a powerful way to understand, test, and validate changes to the network.

In [1]:
# Import packages and load questions
%run startup.py



In this notebook we will be focusing on the autonomous system AS2 within the network shown in the following diagram.

You can view and download the device configuration files [here](https://github.com/batfish/pybatfish/tree/master/jupyter_notebooks/networks/differential-ex1-base).

![example-network](https://raw.githubusercontent.com/batfish/pybatfish/master/jupyter_notebooks/networks/example/example-network.png)

***

In these configurations, the network is overprovisioned with failover redundancy for the core routers. All traffic is routed through `as2core1`, but will automatically switch to use `as2core2` in case of a failure or during maintenance.

In this notebook, we want to shift traffic from `as2core1` to `as2core2` so we can service `as2core1`. We'll implement a change to cost out `as2core1`, and verify that it does not affect network behavior.



## Step 1: Test current behavior


Before beginning, let's check that the network is working as expected (routing through `as2core1`). First we load our snapshot into batfish.

In [2]:
EX1_NETWORK_NAME = "differential-example1"
EX1_BASE_NAME = "base"
EX1_BASE_PATH = "networks/differential-ex1-base"

bf_set_network(EX1_NETWORK_NAME)
bf_init_snapshot(EX1_BASE_PATH, name=EX1_BASE_NAME, overwrite=True)

'base'

Batfish will automatically compute the RIBs and FIBs from the device configuration files in the snapshot, allowing us to test the forwarding behavior offline. Let's do that now using the `reachability` question to search for flows from the border routers `as2border1` or `as2border2` to the hosts `host1` or `host2` that are delivered successfully.

In [3]:
answer = bfq.reachability(
    pathConstraints = PathConstraints(startLocation="as2border.*"), 
    headers = HeaderConstraints(dstIps="ofLocation(host.*)")
).answer(snapshot = EX1_BASE_NAME)
display_html(answer.frame())

| | Flow | Traces | TraceCount |
|---|---|---|---|
| 0 | Src IP: 2.1.1.1 Src Port: 49152 Dst IP: 2.128.0.101 Dst Port: 22 IP Protocol: TCP Start Location: as2border1 | ACCEPTED 1. node: as2border1  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet1/0) 2. node: as2core1  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet3/0) 3. node: as2dist2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: bgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  TRANSMITTED(GigabitEthernet2/0) 5. node: host1  RECEIVED(eth0: filter::INPUT)  ACCEPTED(InboundStep) | 1 |
| 1 | Src IP: 2.1.1.2 Src Port: 49152 Dst IP: 2.128.0.101 Dst Port: 22 IP Protocol: TCP Start Location: as2border2 | ACCEPTED 1. node: as2border2  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 2. node: as2core1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet3/0) 3. node: as2dist2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: bgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  TRANSMITTED(GigabitEthernet2/0) 5. node: host1  RECEIVED(eth0: filter::INPUT)  ACCEPTED(InboundStep) | 1 |


The `reachability` question returns an example flow from each of the border routers. As we can see in the `Traces` column, both flows are routed through `as2core1`. For more detail on the `reachability` question, see the notebook [Introduction to Forwarding Analysis](https://github.com/batfish/pybatfish/blob/master/jupyter_notebooks/Introduction%20to%20Forwarding%20Analysis.ipynb).
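Checking the `Traces` column by eye works for two rows. As a hedged illustration of doing it programmatically, the sketch below pulls the ordered list of node names out of one rendered trace string with a regular expression. The helper `nodes_in_trace` is hypothetical and not part of pybatfish; it relies only on the `n. node: <name>` layout printed above. In a live session, the answer frame's structured `Traces` column is likely a sturdier source than printed text.

```python
import re

# Each hop in the rendered trace text starts with "<n>. node: <name>".
NODE_PATTERN = re.compile(r"\d+\. node: (\S+)")


def nodes_in_trace(trace_text: str) -> list[str]:
    """Return the node names a rendered trace visits, in order (hypothetical helper)."""
    return NODE_PATTERN.findall(trace_text)


# Abbreviated copy of the first trace from the base snapshot output above
# (FORWARDED steps omitted for brevity).
base_trace = (
    "ACCEPTED 1. node: as2border1  ORIGINATED(default)  TRANSMITTED(GigabitEthernet1/0) "
    "2. node: as2core1  RECEIVED(GigabitEthernet0/0)  TRANSMITTED(GigabitEthernet3/0) "
    "3. node: as2dist2  RECEIVED(GigabitEthernet1/0)  TRANSMITTED(GigabitEthernet2/0) "
    "4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  TRANSMITTED(GigabitEthernet2/0) "
    "5. node: host1  RECEIVED(eth0: filter::INPUT)  ACCEPTED(InboundStep)"
)

path = nodes_in_trace(base_trace)
print(path)  # ['as2border1', 'as2core1', 'as2dist2', 'as2dept1', 'host1']
print("as2core1" in path)  # True
```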


Next, we'll cost out `as2core1`, causing traffic to route through `as2core2` instead. The configuration change, shown as a diff in Step 2, raises the OSPF cost on every interface of `as2core1` so that no traffic passes through it. We'll implement this change offline in a new snapshot and validate that it won't affect end-to-end reachability. Then we can push the change to the network with complete confidence.

We'll validate the change in two steps, verifying that it has the intended effect and that it causes no collateral damage. More specifically, the change must:
1. Ensure that no traffic is routed through `as2core1`.
1. Have no impact on the reachability matrix.

Each of these requirements is essential: if either one is violated, traffic could be disrupted.

## Step 2: Ensure that no traffic is routed through `as2core1`



Next, we initialize a new snapshot with the updated configurations. Below is the diff summarizing the change: we add the command `ip ospf cost 500` to each interface on `as2core1`, raising each interface's OSPF cost from its previous value of `1`. This makes the routes through `as2core2` lower cost, so they will be preferred.

```
$ diff -r networks/differential-ex1-base networks/differential-ex1-change
diff -r networks/differential-ex1-base/configs/as2core1.cfg networks/differential-ex1-change/configs/as2core1.cfg
67a68
>  ip ospf cost 500
71a73
>  ip ospf cost 500
76a79
>  ip ospf cost 500
81a85
>  ip ospf cost 500
```
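To see why raising the cost on `as2core1`'s interfaces shifts traffic, the sketch below runs a plain Dijkstra shortest-path search over a toy, directed model of part of AS2. OSPF sums the cost of each *outgoing* interface along a path, so setting every `as2core1` interface to 500 adds at least 500 to any path that transits it. The topology and all costs other than the 500 are invented for illustration, and the model ignores BGP: the traces show that AS2 forwards these flows on iBGP routes, and the sketch only models the IGP cost comparison.

```python
import heapq


def shortest_path(graph, src, dst):
    """Plain Dijkstra over directed edges.

    Each edge weight models the OSPF cost of the outgoing interface on the
    router the edge leaves. Returns (total_cost, path) or None if unreachable.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if dst not in dist:
        return None
    # Walk predecessors back from dst to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


# Hypothetical toy topology and costs; NOT read from the example configs.
# as2core2's uplink gets cost 2 only so that as2core1 starts out preferred.
base_costs = {
    "as2border1": {"as2core1": 1, "as2core2": 1},
    "as2core1": {"as2border1": 1, "as2dist2": 1},
    "as2core2": {"as2border1": 1, "as2dist2": 2},
}

# Mimic the diff: every interface on as2core1 gets "ip ospf cost 500".
changed_costs = {node: dict(edges) for node, edges in base_costs.items()}
changed_costs["as2core1"] = {nbr: 500 for nbr in changed_costs["as2core1"]}

print(shortest_path(base_costs, "as2border1", "as2dist2"))
# (2, ['as2border1', 'as2core1', 'as2dist2'])
print(shortest_path(changed_costs, "as2border1", "as2dist2"))
# (3, ['as2border1', 'as2core2', 'as2dist2'])
```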

In [4]:
EX1_CHANGE_NAME = "change"
EX1_CHANGE_PATH = "networks/differential-ex1-change"

bf_init_snapshot(EX1_CHANGE_PATH, name=EX1_CHANGE_NAME, overwrite=True)

'change'

Next, let's run our previous `reachability` query again, this time on the change snapshot.  

In [5]:
answer = bfq.reachability(
    pathConstraints = PathConstraints(startLocation="as2border.*"), 
    headers = HeaderConstraints(dstIps="ofLocation(host.*)")
).answer(snapshot = EX1_CHANGE_NAME)
display_html(answer.frame())

| | Flow | Traces | TraceCount |
|---|---|---|---|
| 0 | Src IP: 2.1.1.1 Src Port: 49152 Dst IP: 2.128.1.101 Dst Port: 22 IP Protocol: TCP Start Location: as2border1 | ACCEPTED 1. node: as2border1  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 2. node: as2core2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: ibgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 3. node: as2dist2  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: bgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.1.0/24, Next Hop IP:AUTO/NONE(-1l)])  TRANSMITTED(GigabitEthernet3/0) 5. node: host2  RECEIVED(eth0: filter::INPUT)  ACCEPTED(InboundStep) | 1 |
| 1 | Src IP: 2.1.1.2 Src Port: 49152 Dst IP: 2.128.1.101 Dst Port: 22 IP Protocol: TCP Start Location: as2border2 | ACCEPTED 1. node: as2border2  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet1/0) 2. node: as2core2  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: ibgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 3. node: as2dist2  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: bgp [Network: 2.128.1.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.1.0/24, Next Hop IP:AUTO/NONE(-1l)])  TRANSMITTED(GigabitEthernet3/0) 5. node: host2  RECEIVED(eth0: filter::INPUT)  ACCEPTED(InboundStep) | 1 |


Good! The traffic we previously saw being routed through `as2core1` is now routed through `as2core2`, so our change appears to have moved traffic off `as2core1`. However, this doesn't ensure that **no** traffic is routed through `as2core1`. For that, we need to search for counterexamples: traffic that *is* routed through `as2core1`. If none are found, we have proven that `as2core1` is never used. We do this by running `reachability` again, using the `transitLocations` parameter to search for flows that transit `as2core1`. We'll broaden the search to include *all flows* by removing the `startLocation` and header constraints and setting the `actions` parameter to `SUCCESS,FAILURE`, which includes dropped flows as well as those that are successfully delivered.
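The `actions` value `SUCCESS,FAILURE` stands for two groups of flow dispositions, the first word of each rendered trace (for example `ACCEPTED` or `NULL_ROUTED`). The grouping below is our understanding of Batfish's disposition categories and should be checked against the documentation for your Batfish version; the helper `disposition_class` is hypothetical and only reads the printed trace text.

```python
# Our understanding of how Batfish groups flow dispositions; verify against
# the documentation for your Batfish version before relying on it.
SUCCESS_DISPOSITIONS = {"ACCEPTED", "DELIVERED_TO_SUBNET", "EXITS_NETWORK"}
FAILURE_DISPOSITIONS = {
    "DENIED_IN", "DENIED_OUT", "INSUFFICIENT_INFO", "LOOP",
    "NEIGHBOR_UNREACHABLE", "NO_ROUTE", "NULL_ROUTED",
}


def disposition_class(trace_text: str) -> str:
    """Classify a rendered trace by its leading disposition word (hypothetical helper)."""
    disposition = trace_text.split(maxsplit=1)[0]
    if disposition in SUCCESS_DISPOSITIONS:
        return "SUCCESS"
    if disposition in FAILURE_DISPOSITIONS:
        return "FAILURE"
    raise ValueError(f"unknown disposition: {disposition}")


print(disposition_class("ACCEPTED 1. node: as2core1 ..."))  # SUCCESS
print(disposition_class("NULL_ROUTED 1. node: as2border1 ..."))  # FAILURE
print(disposition_class("EXITS_NETWORK 1. node: as2border2 ..."))  # SUCCESS
```

Including both groups is what lets the next query surface a flow through `as2core1` even if it would be dropped there.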

In [6]:
# Search for any traffic routed through as2core1
answer = bfq.reachability(
    pathConstraints = PathConstraints(transitLocations="as2core1"),
    actions = "SUCCESS,FAILURE"
).answer(snapshot=EX1_CHANGE_NAME)
display_html(answer.frame())

| | Flow | Traces | TraceCount |
|---|---|---|---|
| 0 | Src IP: 2.1.2.1 Src Port: 0 Dst IP: 2.1.1.1 Dst Port: 0 IP Protocol: ICMP Start Location: as2core1 | ACCEPTED 1. node: as2core1  ORIGINATED(default)  FORWARDED(Routes: ospf [Network: 2.1.1.1/32, Next Hop IP:2.12.11.1])  TRANSMITTED(GigabitEthernet0/0) 2. node: as2border1  RECEIVED(GigabitEthernet1/0)  ACCEPTED(InboundStep) | 1 |


Again, batfish returns an example flow from each source location satisfying the query. We get a single result, from `as2core1` itself. Since there are no other results, we are guaranteed that no traffic from *any other device* in the network will route through `as2core1`. This verifies the first requirement of the change. Next, let's check the second requirement: that end-to-end network behavior is completely unchanged.
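The reasoning above comes down to a filter over the answer: any example flow that visits `as2core1` but starts somewhere else would be a counterexample. The sketch below expresses that filter over a hypothetical, simplified form of the answer, a list of `(start_location, path)` pairs; the function name `transit_violations` is ours. The guarantee itself comes from Batfish searching all flows, not from this filter, which only inspects the example rows returned.

```python
def transit_violations(rows, node):
    """Return (start_location, path) rows that pass through `node` without starting there.

    `rows` is a hypothetical simplified form of the reachability answer: one
    (start_location, path) pair per example flow, where `path` lists node names.
    """
    return [
        (start, path)
        for start, path in rows
        if node in path and start != node
    ]


# The single example flow returned for the change snapshot above.
change_rows = [("as2core1", ["as2core1", "as2border1"])]
print(transit_violations(change_rows, "as2core1"))  # []

# A made-up counterexample, for contrast.
bad_rows = [("as2border1", ["as2border1", "as2core1", "as2dist2"])]
print(len(transit_violations(bad_rows, "as2core1")))  # 1
```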



## Step 3: End-to-end network behavior is unchanged.


In this step, we'll compare the forwarding behavior of the candidate change snapshot against the original using the `differentialReachability` question. In particular, we'll use the question to search for flows that are successfully delivered in either snapshot but not the other. If the change is correct, no such flows will be found, because costing out `as2core1` should have no effect on the end-to-end network behavior.

In [16]:
answer = bfq.differentialReachability().answer(
    snapshot=EX1_CHANGE_NAME, 
    reference_snapshot=EX1_BASE_NAME)
display_html(answer.frame())

| | Flow | Snapshot_Traces | Snapshot_TraceCount | Reference_Traces | Reference_TraceCount |
|---|---|---|---|---|---|
| 0 | Src IP: 2.1.1.1 Src Port: 0 Dst IP: 2.128.0.0 Dst Port: 0 IP Protocol: ICMP Start Location: as2border1 | NULL_ROUTED 1. node: as2border1  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 2. node: as2core2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: static [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  NULL_ROUTED(null_interface) | 1 | EXITS_NETWORK 1. node: as2border1  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet1/0) 2. node: as2core1  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet3/0) 3. node: as2dist2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: bgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  EXITS_NETWORK(GigabitEthernet2/0) | 1 |
| 1 | Src IP: 2.1.1.2 Src Port: 0 Dst IP: 2.128.0.0 Dst Port: 0 IP Protocol: ICMP Start Location: as2border2 | NULL_ROUTED 1. node: as2border2  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet1/0) 2. node: as2core2  RECEIVED(GigabitEthernet0/0)  FORWARDED(Routes: static [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  NULL_ROUTED(null_interface) | 1 | EXITS_NETWORK 1. node: as2border2  ORIGINATED(default)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 2. node: as2core1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet3/0) 3. node: as2dist2  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: bgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  TRANSMITTED(GigabitEthernet2/0) 4. node: as2dept1  RECEIVED(GigabitEthernet1/0)  FORWARDED(Routes: connected [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  EXITS_NETWORK(GigabitEthernet2/0) | 1 |


In our scenario, there has been some drift between our core routers, so moving traffic from `as2core1` to `as2core2` does affect end-to-end behavior: some traffic that was delivered before the change is *null routed* after it. If we deployed the change now, we would lose connectivity. Fortunately, the `differentialReachability` question identified this bug before we deployed the change.
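The comparison that `differentialReachability` performs can be pictured with the sketch below, which assumes each flow's outcome has already been reduced to SUCCESS or FAILURE. Batfish reasons about all flows symbolically rather than iterating over a list, so this is only a model of the idea, not how the question is computed; the function name and the dictionary format are hypothetical. The first two flows mirror the answer above, and the third is a made-up unchanged flow added for contrast.

```python
def differential_flows(snapshot, reference):
    """Return flows whose SUCCESS/FAILURE outcome differs between two snapshots.

    Each argument maps a flow key to "SUCCESS" or "FAILURE" (a hypothetical,
    simplified format). Only flows present in both maps are compared.
    """
    return sorted(
        flow
        for flow in snapshot.keys() & reference.keys()
        if snapshot[flow] != reference[flow]
    )


# Keys are (start location, destination IP).
reference = {
    ("as2border1", "2.128.0.0"): "SUCCESS",  # EXITS_NETWORK in the base snapshot
    ("as2border2", "2.128.0.0"): "SUCCESS",
    ("as2border1", "2.128.1.101"): "SUCCESS",  # made-up unchanged flow
}
change = {
    ("as2border1", "2.128.0.0"): "FAILURE",  # NULL_ROUTED after the change
    ("as2border2", "2.128.0.0"): "FAILURE",
    ("as2border1", "2.128.1.101"): "SUCCESS",
}

print(differential_flows(change, reference))
# [('as2border1', '2.128.0.0'), ('as2border2', '2.128.0.0')]
```

An empty result from this comparison corresponds to the outcome the change needs: no flow changes from delivered to dropped, or the reverse.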

The results include an example flow from each start location whose traffic is affected by the change. Each flow comes with detailed traces of all the paths it can take through the network in both snapshots, which helps us diagnose the problem: `as2core2` has a rogue static null route for `2.128.0.0/24` that should have been removed.
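For larger answers, the diagnosis can be partly automated. The sketch below pulls the node, route protocol, and prefix behind the final forwarding decision out of a rendered trace. It assumes the text layout printed above (`n. node: <name>` hops and `FORWARDED(Routes: <protocol> [Network: <prefix>, ...])` steps); the helper `last_forwarding_decision` is hypothetical and not part of pybatfish.

```python
import re

# Patterns for the rendered trace text printed above (assumed layout).
HOP_SPLIT = re.compile(r"\d+\. node: ")
ROUTE = re.compile(r"FORWARDED\(Routes: (\w+) \[Network: ([^,\]]+)")


def last_forwarding_decision(trace_text):
    """Return (node, protocol, prefix) for the last hop that FORWARDED the flow.

    Hypothetical helper: it parses printed text, not pybatfish objects.
    """
    hops = HOP_SPLIT.split(trace_text)[1:]  # element 0 is the disposition word
    for hop in reversed(hops):
        match = ROUTE.search(hop)
        if match:
            node = hop.split(maxsplit=1)[0]
            return node, match.group(1), match.group(2)
    return None


# Snapshot_Traces value for the first flow in the differentialReachability answer.
null_routed_trace = (
    "NULL_ROUTED 1. node: as2border1  ORIGINATED(default)  "
    "FORWARDED(Routes: ibgp [Network: 2.128.0.0/24, Next Hop IP:2.34.201.4])  "
    "TRANSMITTED(GigabitEthernet2/0) 2. node: as2core2  RECEIVED(GigabitEthernet1/0)  "
    "FORWARDED(Routes: static [Network: 2.128.0.0/24, Next Hop IP:AUTO/NONE(-1l)])  "
    "NULL_ROUTED(null_interface)"
)

print(last_forwarding_decision(null_routed_trace))
# ('as2core2', 'static', '2.128.0.0/24')
```

A `static` route with no next hop, followed by a `NULL_ROUTED` step on the same node, is the signature of a null route.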

We fix this bug by removing the static route and then loading fixed versions of both snapshots.

## Step 2 (again): Ensure that no traffic is routed through `as2core1`


Having removed the bad null route from both the base and the change snapshots, we load them into batfish again and perform the same validation steps:

In [8]:
EX1_BASE_FIXED_NAME = "base-fixed"
EX1_BASE_FIXED_PATH = "networks/differential-ex1-base-fixed"
bf_init_snapshot(EX1_BASE_FIXED_PATH, name=EX1_BASE_FIXED_NAME, overwrite=True)

EX1_CHANGE_FIXED_NAME = "change-fixed"
EX1_CHANGE_FIXED_PATH = "networks/differential-ex1-change-fixed"
bf_init_snapshot(EX1_CHANGE_FIXED_PATH, name=EX1_CHANGE_FIXED_NAME, overwrite=True)

'change-fixed'

In [9]:
# Requirement 1: No traffic is routed through as2core1.
answer = bfq.reachability(
    pathConstraints = PathConstraints(transitLocations="as2core1"),
    actions = "SUCCESS,FAILURE"
).answer(snapshot = EX1_CHANGE_FIXED_NAME)
display_html(answer.frame())


| | Flow | Traces | TraceCount |
|---|---|---|---|
| 0 | Src IP: 2.1.2.1 Src Port: 0 Dst IP: 2.1.1.1 Dst Port: 0 IP Protocol: ICMP Start Location: as2core1 | ACCEPTED 1. node: as2core1  ORIGINATED(default)  FORWARDED(Routes: ospf [Network: 2.1.1.1/32, Next Hop IP:2.12.11.1])  TRANSMITTED(GigabitEthernet0/0) 2. node: as2border1  RECEIVED(GigabitEthernet1/0)  ACCEPTED(InboundStep) | 1 |


Again, the only traffic that routes through `as2core1` is traffic from `as2core1` itself, so `as2core1` is still correctly costed out.



## Step 3 (again): End-to-end network behavior is unchanged.


We now move on to check that after removing the bad null route, costing out `as2core1` has no impact on the reachability matrix:

In [10]:
# Requirement 2: End-to-end network behavior is unchanged.
answer = bfq.differentialReachability().answer(
    snapshot = EX1_CHANGE_FIXED_NAME, 
    reference_snapshot = EX1_BASE_FIXED_NAME)
display_html(answer.frame())

| | Flow | Traces | TraceCount | Reference_Traces | Reference_TraceCount |
|---|---|---|---|---|---|


Success! We have now verified that our change will correctly cost out `as2core1` without affecting end-to-end network behavior. We are ready to deploy the change and do the maintenance work on `as2core1` with complete confidence.



# Summary


In this notebook, you saw how batfish can help you validate changes to forwarding behavior before you deploy them to the network.

Let's recap the steps we took to verify this change:
1. First, we verified that the primary intent of the change is achieved: traffic is moved from `as2core1` to `as2core2`. We used the `reachability` query to search *all flows* in the network and verified that, apart from traffic originating at `as2core1` itself, none will transit `as2core1` after the change.
1. Second, we verified that moving the traffic did not affect end-to-end reachability. For this, we used the `differentialReachability` query to compare the forwarding behavior of two snapshots. This verified that *no flow* will be affected by the change.