GitHub - dave-andersen/bgp_predict: BGP routing table regression code used for AIP sigcomm 2008 paper

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
README		README
bgp-active.txt		bgp-active.txt
daily-filter.rb		daily-filter.rb
plotsize.gp		plotsize.gp
plotupdates.gp		plotupdates.gp
projections.rb		projections.rb
regress.r		regress.r
updates_per_week.rb		updates_per_week.rb

Repository files navigation

This is the code used to produce projections of BGP routing
table growth via a best-fit exponential regression, used
in our SIGCOMM 2008 paper, "Accountable Internet Protocol."
Requires ruby and R.

bgp-active.txt:  BGP table size #s from Geoff Huston
bgp-daily.txt:  The BGP table info stripped down to one point per day
regress.r:  An R script to produce a linear and exponential fit to bgp-daily.txt
bgpfit.{eps,pdf}:  The output of regress.r

To re-create the results from our 2008 SIGCOMM paper, "Accountable Internet Protocol",:

1)  Extract the last entry from each (Julian) day from Geoff Huston's #s:

      ./daily-filter.rb bgp-active.txt > bgp-daily.txt

   (Ignore the rest of the stuff in daily-filter - it used to try to do
    everything, but it was easier to just do the regression in R)

   Produces output in the form of:  AMJD  count
   (AMJD = Astronomical modified julian day)


2)  Run the regression, which puts an output graph in bgpfit.ps
    and prints to stdout the sum of squares error and fit:

     r -f regress.r

    The output graph will include both a best linear fit (not likely to work well)
    and a best exponential fit (hopefully better).  The exp fit will look like:

    The exponential fit is the *last* text output (expfit2), and should
    look, on the original data, like:

-----------------
Nonlinear regression model
  model:  entries ~ exp(alpha + beta * day) 
   data:  data 
    alpha      beta 
9.3735436 0.0004253 
 residual sum-of-squares: 2.632e+11
-----------------

3)  Convert the output of the regression to something more friendly, because
    the regression is a linear fit on the log data.

    predicted = exp(intercept + day*x)

4)  Insert those values into "plotsize.gp"
    gnuplot plotsize.gp

5)  Optionally, run "projections.rb" to get the output of the 
    then (2007-based) predicted costs of compute resources
    needed to handle the predicted table sizes and churn.

6)  Optionally, grab a copy of the daily-updates count
    (att-daily-updates.txt and sprint-daily-updates, for example), and run

     ./updates_per_week.rb <file1> > file1-weekly.txt
     ./updates_per_week.rb <file2> > file2-weekly.txt

     And then plot them using plotupdates.gp:

     gnuplot plotupdates.gp