This software is Copyright © 2024 The Regents of the University of California. All Rights Reserved. Permission to copy, modify, and distribute this software and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies. Permission to make commercial use of this software may be obtained by contacting:

Office of Innovation and Commercialization

9500 Gilman Drive, Mail Code 0910

University of California

La Jolla, CA 92093-0910

(858) 534-5815

invent@ucsd.edu

This software program and documentation are copyrighted by The Regents of the University of California. The software program and documentation are supplied “as is”, without any accompanying services from The Regents. The Regents does not warrant that the operation of the program will be uninterrupted or error-free. The end-user understands that the program was developed for research purposes and is advised not to rely exclusively on the program for any reason.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN “AS IS” BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

# This notebook parses IRR dumps and keeps the route objects in JSON format

You can retrieve the latest IRR dumps from https://www.irr.net/docs/list.html

In [1]:
import pandas as pd
import gzip
from collections import OrderedDict, defaultdict
import ipaddress
import math
import radix
import re
import numpy as np
import matplotlib.pyplot as plt
import urllib
import multiprocessing as mp
import os
from datetime import datetime, date
import ftplib
import bz2

In [7]:
split = re.compile(r'(.+?):\s*(.*)')

class Entry(OrderedDict):
    def __repr__(self):
        output = []
        for key, value in self.items():
            output.append('{}:\t{}'.format(key, value))
        return '\n'.join(output)
    
    def __hash__(self):
        return hash(str(self))
    
    @property
    def date(self):
        # self refers to the object, which in this case is a subclass of dict
        # the .get method for dict objects retrieves the value if the key exists,
        # otherwise it returns the default value
        changed = self.get('changed', None)
        if changed is not None:
            try:
                date = changed.split()[1]
            except IndexError:
                return '17000101'
        else:
            try:
                date = self['last-modified'].replace('-', '')
            except KeyError:
                return '16000101'
        return date
    
    
def parse_irr(filename):
    
    with gzip.open(filename, 'rt', encoding='latin-1') as f:
        items = []  # list to hold whois items
        item = Entry()  # an object to hold an individual entry

        # Iterate over the lines in the whois file
        for line in f:
            ol = line  # original version of the line
            line = line.strip()  # get rid of whitespace at beginning and end of the line

            # If the original line is not just a newline character,
            # and the line is only whitespace or starts with a comment character,
            # skip the line
            if ol != '\n' and (not line or line[0] == '#'):
                continue

            # If the line is not just whitespace
            if line:
                
                # If the original line starts with whitespace, it continues the previous value
                if re.match(r'\s', ol):
                    try:
                        item[k] += '\n' + line
                        continue
                    except (NameError, KeyError):
                        # No previous key to attach to (k unset or not in this item)
                        print(item)
                        continue
                
                # See if the line matches the regex
                m = split.match(line)
                # If it does
                if m:
                    # There are 2 possible matching groups, so assign them to k and v
                    k, v = m.groups()
                    # If the key is already in the item, concatenate the value to the existing value
                    if k in item and k != 'origin':
                        try:
                            item[k] += '\n' + v
                        except:
                            print(item)
                    # When the key does not yet exist in the item, add the key and value
                    else:
                        try:
                            item[k] = v
                        except:
                            print(item)
                # If it does not match
                else:
                    # Add the value to the previous key in the item
                    # This is a value with newline breaks
                    try:
                        item[k] += '\n' + line
                    except:
                        item[k] = line

            # If the line is just a newline break, finish the current item, and start a new one
            else:
                if item:
                    items.append(item)  # Add item to list of items
                    item = Entry()  # Start new item
    # Flush the last entry when the file does not end with a blank line
    if item:
        items.append(item)
    return items
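
To illustrate what the parser above does with a single object, here is a minimal, simplified sketch that applies the same `split` regex to one hand-written record. The sample record, the `record` dict, and the `key` variable are made up for illustration (the prefix and AS number come from the documentation ranges); this is not the notebook's `parse_irr`, only the key/value and continuation-line logic in isolation.

```python
import re

# Same pattern as in the notebook: attribute name, colon, optional spaces, value
split = re.compile(r'(.+?):\s*(.*)')

# Hypothetical route object (documentation prefix and AS number)
sample = """\
route:          192.0.2.0/24
descr:          Example network
                second line of the description
origin:         AS64496
source:         EXAMPLE
"""

record = {}
key = None
for raw in sample.splitlines():
    if raw[:1].isspace() and key is not None:
        # Continuation line: append it to the most recent attribute
        record[key] += '\n' + raw.strip()
        continue
    m = split.match(raw.strip())
    if m:
        key, value = m.groups()
        record[key] = value

print(record)
```

The lazy `(.+?)` stops at the first colon, so values that themselves contain colons stay intact in the second group.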

## `fname` in the function below should be the path to the IRR dump file you just downloaded

Example filename: `radb.db.gz`
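
As a rough sketch of how the file name maps to the output name, the hypothetical helper `provider_from_path` below mirrors the slicing done in `build_json`. The `dumps/` directory and the APNIC file name are assumptions used only for illustration.

```python
import os

def provider_from_path(fname):
    """Hypothetical helper mirroring the provider-name slicing in build_json."""
    base = os.path.basename(fname)
    provider = base[:base.find('.db')]
    # For APNIC, keep everything before '.gz' instead
    if provider == 'apnic':
        provider = base[:base.find('.gz')]
    return provider

for path in ['dumps/radb.db.gz', 'dumps/apnic.db.route.gz']:
    provider = provider_from_path(path)
    print(path, '->', 'data/{}.route.json.gz'.format(provider))
```

Note that a file name without `.db` in it would lose its last character, because `str.find` returns -1 when nothing is found.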

In [15]:
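def build_json(fname):
    # Derive the provider name from the file name, e.g. 'radb.db.gz' -> 'radb'
    base = os.path.basename(fname)
    provider = base[:base.find('.db')]

    # For APNIC, keep everything before '.gz' instead
    if provider == 'apnic':
        provider = base[:base.find('.gz')]

    routeobjs = parse_irr(fname)
    # Keep only the entries that have a 'route' attribute
    df = pd.DataFrame([i for i in routeobjs if 'route' in i])
    os.makedirs('data', exist_ok=True)  # make sure the output directory exists
    df.to_json('data/{}.route.json.gz'.format(provider), orient='records', lines=True)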

## Call this function to generate the JSON file

In [None]:
build_json('filename')  # replace 'filename' with the path to your IRR dump
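
The output is a gzip-compressed file with one JSON object per line. One possible way to read it back with only the standard library is sketched below; `load_routes` is a hypothetical helper, and the demo file and its single record are made up so the snippet runs on its own. `pd.read_json(path, lines=True)` should work as well if you prefer a DataFrame.

```python
import gzip
import json
import os
import tempfile

def load_routes(path):
    """Read a gzip-compressed JSON Lines file into a list of dicts."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo on a tiny hand-written file; the record is made up for illustration
demo_path = os.path.join(tempfile.mkdtemp(), 'example.route.json.gz')
with gzip.open(demo_path, 'wt', encoding='utf-8') as f:
    f.write(json.dumps({'route': '192.0.2.0/24', 'origin': 'AS64496'}) + '\n')

routes = load_routes(demo_path)
print(routes[0]['route'], routes[0]['origin'])
```

For a real run, point `load_routes` at the file written by `build_json`, for example `data/radb.route.json.gz`.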