This software is Copyright © 2024 The Regents of the University of California. All Rights Reserved. Permission to copy, modify, and distribute this software and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies. Permission to make commercial use of this software may be obtained by contacting:

Office of Innovation and Commercialization

9500 Gilman Drive, Mail Code 0910

University of California

La Jolla, CA 92093-0910

(858) 534-5815

invent@ucsd.edu

This software program and documentation are copyrighted by The Regents of the University of California. The software program and documentation are supplied “as is”, without any accompanying services from The Regents. The Regents does not warrant that the operation of the program will be uninterrupted or error-free. The end-user understands that the program was developed for research purposes and is advised not to rely exclusively on the program for any reason.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

# This notebook calculates the overlap between prefix-origin pairs in the IRRs and in BGP (paper Section 5.1.3)

This notebook requires the CAIDA prefix2as dataset: https://www.caida.org/catalog/datasets/routeviews-prefix2as/

In [30]:
import pandas as pd
import numpy as np
from datetime import datetime
import time
import gzip
import json
import bz2

# Load the Data

## Helper functions and variables

In [2]:
def delta_date(x):
    # Days from x['start_date'] to x['end_date'] ('YYYY-MM-DD' strings), counting both endpoints
    return (str_to_date(x['end_date']) - str_to_date(x['start_date'])).days + 1

In [3]:
def str_to_date(d):
    return datetime.strptime(d, "%Y-%m-%d")
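`delta_date` counts days inclusively, so a record whose start and end dates are equal spans one day. A minimal, self-contained sketch of the two helpers on made-up records (the record values are hypothetical, for illustration only):

```python
from datetime import datetime

def str_to_date(d):
    # Parse a 'YYYY-MM-DD' string into a datetime
    return datetime.strptime(d, "%Y-%m-%d")

def delta_date(x):
    # Inclusive day count: both start_date and end_date are counted
    return (str_to_date(x['end_date']) - str_to_date(x['start_date'])).days + 1

# Hypothetical records
same_day = {'start_date': '2024-01-01', 'end_date': '2024-01-01'}
january = {'start_date': '2024-01-01', 'end_date': '2024-01-31'}
print(delta_date(same_day))  # 1
print(delta_date(january))   # 31
```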

In [253]:
def overlap_db(db):
    # Inner-join an IRR table with the global `bgp` table, matching (route, origin) to (prefix, asn)
    overlap = db.merge(bgp, left_on=['route', 'origin'], right_on=['prefix', 'asn'], suffixes=('_route', '_prefix'))
    return overlap
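Because `merge` defaults to an inner join on both key columns, a pair counts as overlapping only when the prefix and the origin AS both match a BGP row exactly; a prefix registered with a different origin than the one seen in BGP is dropped. A toy sketch using documentation prefixes and private ASNs (hypothetical data, not taken from either dataset):

```python
import pandas as pd

# Hypothetical stand-ins for one IRR's route table and the BGP table
irr = pd.DataFrame({'route': ['192.0.2.0/24', '198.51.100.0/24', '203.0.113.0/24'],
                    'origin': ['AS64500', 'AS64501', 'AS64502']})
bgp = pd.DataFrame({'prefix': ['192.0.2.0/24', '198.51.100.0/24'],
                    'asn': ['AS64500', 'AS64999']})

# Inner join: keep rows where BOTH prefix and origin match
overlap = irr.merge(bgp, left_on=['route', 'origin'], right_on=['prefix', 'asn'])
print(overlap[['route', 'origin']])
# Only 192.0.2.0/24 AS64500 survives; 198.51.100.0/24 has a different origin in BGP
```

The `suffixes` argument in `overlap_db` only affects non-key columns that appear in both frames; it does not change which rows match.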

## Load BGP

Set the path below to the downloaded prefix2as file.

In [8]:
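# Path to the downloaded prefix2as file (adjust as needed)
pfx2as_path = 'data/routeviews-rv2-20240101-1200.pfx2as.gz'

bgpsource = []
with gzip.open(pfx2as_path, 'rt') as file:
    for line in file:
        # Each line: <network> <prefix length> <origin AS field>
        data = line.strip().split()
        # Multi-origin prefixes list their ASes separated by '_'; emit one row per origin
        for asn in data[2].split('_'):
            bgpsource.append((data[0] + '/' + data[1], 'AS' + asn))

bgp = pd.DataFrame(bgpsource, columns=['prefix', 'asn'])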
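The loop above turns each prefix2as line into one (prefix, origin) row per origin AS. A standalone sketch of the same logic on two made-up lines; the three-field layout is what the notebook's indexing implies, and the remark that AS sets are comma-separated is an assumption based on CAIDA's format description that should be checked against the file. `parse_pfx2as_line` is a hypothetical helper name.

```python
# Hypothetical sample lines: network, prefix length, origin AS field (tab-separated)
sample = [
    "192.0.2.0\t24\t64500",
    "198.51.100.0\t24\t64501_64502",  # multi-origin (MOAS) prefix
]

def parse_pfx2as_line(line):
    """Hypothetical helper: split one pfx2as line into (prefix, 'ASn') pairs."""
    net, length, as_field = line.strip().split()
    prefix = f"{net}/{length}"
    # Same convention as the notebook: split MOAS origins on '_'.
    # Assumed comma-separated AS sets are NOT split and stay one string.
    return [(prefix, 'AS' + asn) for asn in as_field.split('_')]

pairs = [p for line in sample for p in parse_pfx2as_line(line)]
print(pairs)
```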

In [9]:
bgp

|  | prefix | asn |
|---|---|---|
| 0 | 1.0.0.0/24 | AS13335 |
| 1 | 1.0.4.0/22 | AS38803 |
| 2 | 1.0.5.0/24 | AS38803 |
| 3 | 1.0.16.0/24 | AS2519 |
| 4 | 1.0.32.0/24 | AS141748 |
| ... | ... | ... |
| 993518 | 223.255.250.0/24 | AS63199 |
| 993519 | 223.255.251.0/24 | AS63199 |
| 993520 | 223.255.252.0/24 | AS58519 |
| 993521 | 223.255.253.0/24 | AS58519 |


## Load the IRR route objects

In [10]:
irrs = ['afrinic', 'jpirr', 'canarie', 'apnic', 'arin-nonauth', 'level3', 'nestegg', 'bboi', 'idnic', 'wcgdb', 'rgnet', 'tc', 'lacnic', 'ripe-nonauth', 'openface', 'panix', 'arin', 'radb', 'altdb', 'ripe', 'nttcom', 'bell']

In [13]:
irr_dict = {}
for irr in irrs:
    try:
        # One gzipped JSON-lines file of route objects per IRR
        irr_dict[irr] = pd.read_json('data/{}.route.json.gz'.format(irr), lines=True)
    except FileNotFoundError:
        print(irr, 'no longer exists')

arin-nonauth no longer exists
rgnet no longer exists
openface no longer exists
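Each IRR file is read as newline-delimited JSON, with gzip decompression inferred by pandas from the `.gz` extension. A self-contained sketch that writes and reads a tiny file in the same shape; the records (one object per line with `route` and `origin` keys, matching the columns the notebook uses) are hypothetical.

```python
import gzip
import json
import os
import tempfile
import pandas as pd

# Hypothetical route objects, one JSON object per line
records = [
    {'route': '192.0.2.0/24', 'origin': 'AS64500'},
    {'route': '192.0.2.0/24', 'origin': 'AS64500'},  # duplicate object
    {'route': '198.51.100.0/24', 'origin': 'AS64501'},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.route.json.gz')
    with gzip.open(path, 'wt') as f:
        for rec in records:
            f.write(json.dumps(rec) + '\n')
    # lines=True parses one JSON object per line; compression is inferred from '.gz'
    df = pd.read_json(path, lines=True)

print(df)
```

Duplicate route objects are kept at load time, which is why the counting functions below deduplicate on `['route', 'origin']`.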


# Calculate the number of prefix-origin pairs in every IRR and the overlap of those pairs with BGP and with all IRRs

In [25]:
def numerator_overlap(df):
    # Number of unique (route, origin) pairs in df that also appear as (prefix, asn) in BGP
    overlap = len(df.merge(bgp, left_on=['route', 'origin'], right_on=['prefix', 'asn']).drop_duplicates(['route', 'origin']))
    return overlap

In [26]:
def denominator(df):
    # Number of unique (route, origin) pairs in df
    return len(df.drop_duplicates(['route', 'origin']))

In [27]:
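def percentage_frac(df):
    # Format as a LaTeX table cell, e.g. "52.08\% (211764/406607)"
    num = numerator_overlap(df)
    denom = denominator(df)
    # r"\%" avoids the invalid "\%" escape sequence in a normal string literal
    return str(round(num * 100 / denom, 2)) + r"\% (" + str(num) + "/" + str(denom) + ")"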
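The percentage is `100 * num / denom` rounded to two decimals, followed by the raw fraction. A sketch of the formatting step alone, checked against two rows of the output below (panix and nestegg); `format_overlap_cell` is a hypothetical helper, and its handling of an empty IRR (`denom == 0`) is an added assumption, since `percentage_frac` would raise `ZeroDivisionError` there.

```python
def format_overlap_cell(num, denom):
    """Hypothetical helper mirroring percentage_frac's output format.

    Returns 'n/a (0/0)' for an empty IRR instead of raising ZeroDivisionError
    (an assumption, not part of the notebook).
    """
    if denom == 0:
        return "n/a (0/0)"
    pct = round(num * 100 / denom, 2)
    # r"\%" keeps the backslash literal so LaTeX renders a percent sign
    return str(pct) + r"\% (" + str(num) + "/" + str(denom) + ")"

print(format_overlap_cell(6, 40))  # 15.0\% (6/40)
print(format_overlap_cell(0, 4))   # 0.0\% (0/4)
```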

In [32]:
row_end = "\\\\"  # LaTeX row terminator: two backslashes
print("IRR", "&", "total prefix origin pairs in IRR", "&", "percentage of overlapping prefix-origin pairs", row_end)
for i in irr_dict:
    print(i, "&", denominator(irr_dict[i]), "&", percentage_frac(irr_dict[i]), row_end)

IRR & total prefix origin pairs in IRR & percentage of overlapping prefix-origin pairs \\
afrinic & 110346 & 20.3\% (22395/110346) \\
jpirr & 12976 & 66.55\% (8636/12976) \\
canarie & 1424 & 54.99\% (783/1424) \\
apnic & 694344 & 15.53\% (107842/694344) \\
level3 & 74912 & 21.75\% (16291/74912) \\
nestegg & 4 & 0.0\% (0/4) \\
bboi & 840 & 42.26\% (355/840) \\
idnic & 6078 & 59.3\% (3604/6078) \\
wcgdb & 52035 & 6.03\% (3137/52035) \\
tc & 21324 & 51.7\% (11024/21324) \\
lacnic & 9739 & 66.91\% (6516/9739) \\
ripe-nonauth & 51993 & 23.39\% (12162/51993) \\
panix & 40 & 15.0\% (6/40) \\
arin & 79146 & 57.12\% (45211/79146) \\
radb & 1200188 & 27.91\% (334963/1200188) \\
altdb & 24523 & 51.34\% (12590/24523) \\
ripe & 406607 & 52.08\% (211764/406607) \\
nttcom & 379455 & 13.76\% (52203/379455) \\
bell & 29254 & 2.93\% (857/29254) \\
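As an independent sanity check on the merge-based counts, the same numerator and denominator can be computed with Python sets of `(prefix, origin)` tuples. A sketch on hypothetical toy tables, assuming the column names used above and no missing values in the key columns; on the real data it could be compared against a printed row, e.g. `overlap_by_sets(irr_dict['ripe'], bgp)`.

```python
import pandas as pd

def overlap_by_sets(irr_df, bgp_df):
    """Return (overlapping pairs, total pairs) for unique IRR (route, origin) pairs."""
    irr_pairs = set(zip(irr_df['route'], irr_df['origin']))
    bgp_pairs = set(zip(bgp_df['prefix'], bgp_df['asn']))
    return len(irr_pairs & bgp_pairs), len(irr_pairs)

def overlap_by_merge(irr_df, bgp_df):
    """Same counts via merge + drop_duplicates, as in numerator_overlap/denominator."""
    num = len(irr_df.merge(bgp_df, left_on=['route', 'origin'], right_on=['prefix', 'asn'])
              .drop_duplicates(['route', 'origin']))
    return num, len(irr_df.drop_duplicates(['route', 'origin']))

# Hypothetical toy tables, including duplicate rows on both sides
bgp_toy = pd.DataFrame({'prefix': ['192.0.2.0/24', '192.0.2.0/24', '198.51.100.0/24'],
                        'asn': ['AS64500', 'AS64500', 'AS64501']})
irr_toy = pd.DataFrame({'route': ['192.0.2.0/24', '198.51.100.0/24', '198.51.100.0/24'],
                        'origin': ['AS64500', 'AS64502', 'AS64502']})

print(overlap_by_sets(irr_toy, bgp_toy))   # (1, 2)
print(overlap_by_merge(irr_toy, bgp_toy))  # (1, 2)
```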
