<a href="https://colab.research.google.com/github/Caiyunwei/AI_Hardware_Project_Template/blob/main/CS6501_HW2_BGP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CS6501/ECE6502 Spring 2025, Assignment 2**

In this assignment, we will analyze real-world BGP data from the 2008 YouTube hijacking event.

Note: this assignment is individual work. Please do not collaborate or share answers with classmates. It's OK to discuss the assignment, but please list the names of everyone you discussed it with in this text block. Please contact course staff with any questions.


> Names (if applicable):



## **Get Started**

First, download [hw2data.zip](https://www.cs.virginia.edu/~ys3kz/courses/spring25/cs6501/hw2data.zip).

You should have 7 files: two “rib.\*” files and five “updates.\*” files, where the name of each file indicates its timestamp.

If you run the notebook on Google Colab, make sure that you [upload the files to the notebook](https://www.cs.virginia.edu/~ys3kz/courses/spring25/cs6501/upload_colab.png).

If you run the notebook locally, make sure that the files are in the same directory as this notebook.

## **Understanding BGP data**

The data was collected at the [RouteViews BGP collector](http://www.routeviews.org/routeviews/index.php/map/) in Oregon. Thus, it shows the BGP announcements seen at this collector, while other collectors likely have different views. The data we will use in this assignment was collected from 18:22 to 21:11 on Feb 24, 2008, UTC. You can find the original data source [here](http://archive.routeviews.org/bgpdata/2008.02/) and the complete data archive on [RouteViews](http://archive.routeviews.org/).

There are two types of data: RIB (rib.\*) and UPDATE (updates.\*). RIB stands for *Routing Information Base*, which shows the full routing table. For example, *rib.20080224.1822* is a snapshot of the full routing table at 18:22 UTC on 02/24/2008. RIB files are usually large.

UPDATE files only show the BGP update messages received during a time interval. The collector saves BGP updates to a file every 15 minutes. For example, *updates.20080224.1839* contains all BGP updates (announcements and withdrawals) received from 18:39 to 18:54. UPDATE files are much smaller than RIB files.
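
As a small illustration of the naming scheme, the sketch below derives the nominal time window from an UPDATE file name. The helper name `nominal_window` is hypothetical, and the 15-minute default is an assumption taken from the description above; the actual gap between consecutive files in the archive can differ slightly.

```python
from datetime import datetime, timedelta

def nominal_window(fname, minutes=15):
    """Return (start, end) of the nominal window encoded in an updates file name."""
    # File names look like "updates.YYYYMMDD.HHMM", optionally followed by ".txt".
    _, date_part, time_part = fname.split(".")[:3]
    start = datetime.strptime(date_part + time_part, "%Y%m%d%H%M")
    # Assumption: the window is exactly `minutes` long.
    return start, start + timedelta(minutes=minutes)

start, end = nominal_window("updates.20080224.1839.txt")
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```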

*Note:* the original files are bzip2-compressed (.bz2) MRT dumps, the binary format commonly used to archive BGP data. We have already parsed the files into txt format for this assignment. If you want to explore additional data (e.g., for your final project), you will need to parse it yourself, for example with [mrtparse](https://github.com/t2mune/mrtparse/tree/master/examples) or [BGPReader](https://bgpstream.caida.org/docs/tools/bgpreader).

### **Data fields you may need to use in this assignment:**

Example line in RIB:

`TABLE_DUMP|02/24/08 18:24:13|B|144.228.241.81|1239|222.255.224.0/19|1239 6762 7473 7643|IGP`

*   02/24/08 18:24:13 --> timestamp
*   222.255.224.0/19 --> prefix
*   1239 --> neighboring AS (should be the same as the first ASN in the AS path)
*   1239 6762 7473 7643 --> AS path

Example line in UPDATES:

`BGP4MP|02/24/08 18:39:21|A|195.219.96.239|6453|89.4.128.0/24|6453 39386 24731|IGP`

Fields are similar to RIB. The main difference is that "A" stands for advertise/announce and "W" stands for withdraw.
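
The field layout above can be sketched as a small parser. This is only an illustration, not a required part of the assignment: the name `parse_line` and the dictionary keys are hypothetical, and it assumes that withdraw ("W") lines may omit the AS path field. AS path elements are kept as strings because paths can contain non-numeric tokens such as AS sets.

```python
def parse_line(line):
    """Split one pipe-delimited record into a dict of named fields."""
    parts = line.strip().split("|")
    record = {
        "type": parts[0],           # TABLE_DUMP (RIB) or BGP4MP (UPDATES)
        "timestamp": parts[1],
        "flag": parts[2],           # B in the RIB example; A or W in updates
        "peer_ip": parts[3],
        "peer_asn": int(parts[4]),  # neighboring AS
        "prefix": parts[5],
        "as_path": [],
    }
    # Assumption: withdrawals carry no AS path, so the field may be missing.
    if len(parts) > 6 and parts[6]:
        record["as_path"] = parts[6].split()
    return record

example = "BGP4MP|02/24/08 18:39:21|A|195.219.96.239|6453|89.4.128.0/24|6453 39386 24731|IGP"
rec = parse_line(example)
print(rec["prefix"], rec["peer_asn"], rec["as_path"])
```

In BGP, the first ASN in the path is the neighbor that sent the route to the collector, and the last ASN is the AS that originated the prefix.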

## **In this assignment, we will only focus on routes involving 208.65.152.0/22 AND all its sub-prefixes. All questions refer to 208.65.152.0/22 AND all its sub-prefixes.**
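
One possible way to test whether a prefix is 208.65.152.0/22 or one of its sub-prefixes is `subnet_of` from Python's `ipaddress` module (available since Python 3.7). The sketch below is a hedged illustration; the names `TARGET` and `is_target_or_subprefix` are hypothetical, and other approaches work equally well.

```python
import ipaddress

TARGET = ipaddress.ip_network("208.65.152.0/22")

def is_target_or_subprefix(prefix):
    """True if `prefix` equals TARGET or is contained in it."""
    # strict=False tolerates prefixes written with host bits set.
    net = ipaddress.ip_network(prefix, strict=False)
    # subnet_of raises TypeError across IP versions, so compare versions first.
    return net.version == TARGET.version and net.subnet_of(TARGET)

for p in ["208.65.152.0/22", "208.65.153.0/24", "208.65.0.0/16", "208.65.156.0/24"]:
    print(p, is_target_or_subprefix(p))
```

Note that a covering prefix such as 208.65.0.0/16 contains the target but is not a sub-prefix of it, so the check returns `False` for it.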

In [None]:
# TODO: implement a function to find which AS(es) are announcing 208.65.152.0/22 OR any of its subprefixes.
# The program should take a file as input (RIB or UPDATES).
# The program should return a list of distinct tuples, where each tuple is (prefix, ASN), indicating that the AS is announcing the prefix (208.65.152.0/22 OR any of its subprefixes).

# Example output: [('208.65.152.0/22', 12345), ('208.65.152.0/23', 54321)]
# Note: the example output uses arbitrary values.

# IMPORTANT NOTE: please include the link to any resource you use in implementing this

# You may find the ipaddress library helpful. But feel free to use other libraries too.
import ipaddress

def find_origins(fname):
  #TODO
  return []


### **Part 1: before the attack**

Let's look at rib.20080224.1822, which is a snapshot of the full routing table at 18:22.

In [None]:
find_origins("rib.20080224.1822.txt")

Answer the following questions.

*   What is the name of the organization corresponding to each AS in the output? Hint: you can look it up on CAIDA.
> Answer:

*   Can you query WHOIS to find out the organization that owns each prefix in the output?
> Answer:

### **Part 2: start of the attack**

Let's look at updates.20080224.1839, which contains all BGP updates from 18:39 to 18:54.

In [None]:
find_origins("updates.20080224.1839.txt")

Answer the following questions.

*   What is the name of the organization corresponding to each AS in the output? Hint: you can look it up on CAIDA.
> Answer:

*   Which AS will receive packets to IP address 208.65.153.0? And why? Please refer to the program output in your explanation.
> Answer:

*   Which AS will receive packets to IP address 208.65.152.0? And why? Please refer to the program output in your explanation.
> Answer:
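
When several announced prefixes cover the same destination address, routers forward on the most specific one (longest-prefix match). The sketch below illustrates that rule with arbitrary documentation prefixes and private-use ASNs, not values from the assignment data; the function name `longest_prefix_match` is hypothetical.

```python
import ipaddress

def longest_prefix_match(address, routes):
    """Return the (prefix, origin_asn) entry with the most specific matching prefix."""
    addr = ipaddress.ip_address(address)
    matches = [
        (ipaddress.ip_network(prefix), asn)
        for prefix, asn in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # The longest prefix length identifies the most specific route.
    net, asn = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), asn

# Arbitrary example values (documentation address space, private-use ASNs).
routes = [("198.51.100.0/24", 64500), ("198.51.100.128/25", 64501)]
print(longest_prefix_match("198.51.100.200", routes))
print(longest_prefix_match("198.51.100.10", routes))
```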

### **Part 3: reaction from the victim**

Let's look at updates.20080224.1954, which contains all BGP updates from 19:54 to 20:09.

In [None]:
find_origins("updates.20080224.1954.txt")

Answer the following questions.

*   Which AS will receive packets to IP address 208.65.153.0? And why? Please refer to the program output in your explanation.
> Answer:

*   Why did the AS make the (new) announcement you observe in this file? Please refer to the program output in your explanation.
> Answer:

### **Part 4: more reaction from the victim**

Let's look at updates.20080224.2009, which contains all BGP updates from 20:09 to 20:24.

In [None]:
find_origins("updates.20080224.2009.txt")

Answer the following questions.

*   Which AS will receive packets to IP address 208.65.153.0? And why? Please refer to the program output in your explanation.
> Answer:

*   Why did the AS make the (new) announcement you observe in this file? Specifically, what different effects would this new announcement have compared to the announcement seen in Part 3?
> Answer:

### **Part 5: another look at RIB**

After all the above activities, let's take another look at the full routing table! Look at rib.20080224.2024, which is a snapshot of the full routing table at 20:24.

In [None]:
find_origins("rib.20080224.2024.txt")

### **Part 6: reaction from upstream AS**

Let's look at updates.20080224.2041, which contains all BGP updates from 20:41 to 20:56.

In [None]:
# TODO: implement a function that outputs the full routing info received from neighbor AS 13237 for 208.65.152.0/22 AND all its subprefixes.
# The program should take a file as input (UPDATES).
# The program should print all lines from the file where neighbor AS is 13237 and prefix is 208.65.152.0/22 or any of its subprefixes.

# Example output:
# BGP4MP|02/24/08 20:41:18|A|209.161.175.4|13237|208.65.152.0/22|13237 100 200 300|IGP
# Note: the example output uses arbitrary values.

def find_routes(fname):
  #TODO
  return None

find_routes("updates.20080224.2041.txt")

Answer the following questions.

*   Did the AS path change in the announcements? If so, specify the change, i.e., changed from __ to __. Please refer to the program output in your explanation.
> Answer:

*   What are the effects of this change? Specifically, would the origin AS receive more or less traffic after the change, and why?
> Answer:

In [None]:
# TODO: implement a function that outputs the list of neighbor ASes whose AS paths show a change similar to the one seen from AS 13237 above.
# The program should take a file as input (UPDATES).
# The program should return a list of neighbor ASes, whose paths exhibit a similar pattern as the new path from AS 13237. Do not include 13237 in the list.

# Example output: [12345, 54321]
# Note: the example output uses arbitrary values.

def find_neighbors(fname):
  #TODO
  return []

find_neighbors("updates.20080224.2041.txt")

### **Part 7: more reaction from upstream AS**

Let's look at updates.20080224.2056, which contains all BGP updates from 20:56 to 21:11.

We will only look at updates received from the neighbor AS 13237.

In [None]:
find_routes("updates.20080224.2056.txt")

Answer the following questions.

*   Please refer to the program output and explain what happened in the updates. Go through the lines one by one, in chronological order, and explain what happened in each update and what its effect is.
> Answer:
