# X-Y Treasury Bond Fund Simulator

This is based on [longinvest's post on bogleheads](1). He implemented it all in a spreadsheet (which is linked in the thread).

The goal is to calculate returns of a simulated bond fund given a bunch of interest rates.

First we need to import some libraries.

In [1]:
import numpy
from collections import deque
import pandas
import math

# Simulating the Bond Fund
Simulating the bond fund is conceptually straightforward.

You have a ladder of bonds, one for each year. Something like this:

    deque([Maturity: 1 | Yield: 5.00% | Face Value: $50.00,
           Maturity: 2 | Yield: 5.00% | Face Value: $52.50,
           Maturity: 3 | Yield: 5.00% | Face Value: $55.12,
           Maturity: 4 | Yield: 5.00% | Face Value: $57.88,
           Maturity: 5 | Yield: 5.00% | Face Value: $60.78,
           Maturity: 6 | Yield: 5.00% | Face Value: $63.81,
           Maturity: 7 | Yield: 5.00% | Face Value: $67.00,
           Maturity: 8 | Yield: 5.00% | Face Value: $70.36,
           Maturity: 9 | Yield: 5.00% | Face Value: $73.87])
           
Every year three things will happen:

1. All of the bonds will pay out their cash coupon. This is based on their yield and their face value.
1. When a bond gets "too young" (I'll come back to this) we sell it. The exact price will also be explained later. Every year you will sell one bond of the youngest maturity.
1. Now you've got a pile of cash and one fewer bond. Use the cash to buy a new bond of the longest maturity.

## Youngest maturity & oldest maturity

When you create the bond fund, you can select the youngest maturity and the oldest maturity. Say that you want a fund where the oldest bond has a 10-year maturity and the youngest bond has a 2-year maturity. As a shorthand, we'll call that a 10-2 fund. Every year a 2-year bond will be sold and replaced with a brand new 10-year bond.
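As a rough illustration of the X-Y shorthand, the helper below lists the maturities a freshly built fund would hold. `ladder_maturities` is a hypothetical name used only for this sketch, and it assumes one bond for every whole year between the youngest and oldest maturity.

```python
def ladder_maturities(oldest, youngest):
    """Maturities (in years) held by an oldest-youngest fund, one bond per year."""
    if youngest > oldest:
        raise ValueError("youngest maturity cannot exceed oldest maturity")
    return list(range(youngest, oldest + 1))

maturities = ladder_maturities(10, 2)
print(maturities)       # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(len(maturities))  # 9
```

So a 10-2 fund holds nine bonds, not ten, because it never holds a 1-year bond.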

In [2]:
def iterate_fund(ladder, yield_curve, steps=9):
    payments = get_payments(ladder)

    sold_bond = ladder.popleft()
    payments += sold_bond.value(yield_curve)

    new_bond = Bond(payments, yield_curve[-1], steps+1)
    ladder.append(new_bond)
    
    # This happens *after* we sell the shortest bond and buy a new long one
    # (at least, that's what longinvest does...)
    nav = get_nav(ladder, yield_curve)
    
    reduce_maturity(ladder)

    return (ladder, payments, nav)

def get_nav(ladder, rates):
    return sum((b.value(rates) for b in ladder))

def get_payments(ladder):
    return sum((b.gen_payment() for b in ladder))

def reduce_maturity(ladder):
    for b in ladder:
        b.maturity -= 1
    return ladder

# Bond Mechanics

A bond is just three things: a yield, a face value, and a maturity. If you called up your broker you would say, "I want to buy \$100 of the 10-year Treasury that is yielding 3.2%." The maturity is 10 years; the face value is \$100; and the yield is 3.2%.

There are only two things you can do with a bond.

### Receive your payment
Every year the bond will generate a payment -- a "coupon" in bond-speak. This is simply the **yield × face value**. Going back to the previous example, with a face value of \$100 and a yield of 3.2%, every year you would get a payment of \$3.20. (Not very impressive, admittedly.)

### Check the current value of the bond
Bonds are designed to be held until their maturity. At that point you'll receive a payment for the face value. In our example, that would mean after holding the bond for 10 years you would get your full \$100 back.

But what if you wanted to sell the bond **before** maturity? That's (usually) possible but the exact price will depend on current rates. Say we want to sell our bond after 9 years. In essence, we have a 1-year bond that yields 3.2%. What if the current going yield for 1-year bonds was 2.5%? Then our bond will be worth a little more. If the current going yield for 1-year bonds is 4.2% then our bond will be worth a little less.

To work out that price:

* take the current maturity remaining on the bond
* take the current yield on bonds of that maturity
* take the bond's coupon payment
* take the bond face value

Then mix all of them into a present value calculation: **pv(current yield, current maturity, coupon payment, face value)**. (The pv function is found in every spreadsheet and many calculators.)

If the current yield is 2.5% and you have one year left on a bond with a face value of \$100 and a \$3.20 coupon, then the present value is \$100.68. That is, someone would be willing to pay \$100.68 for the bond. This is the bond's current value and how much you would get if you were to sell it.
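Here is a minimal sketch of that present value calculation, written out by hand rather than with a spreadsheet function. `present_value` is a name made up for this example. It assumes one coupon per year, paid at the end of the year, with the face value returned alongside the last coupon.

```python
def present_value(rate, periods, payment, face_value):
    """Price of a bond paying `payment` each period plus `face_value` at the end."""
    discount = (1 + rate) ** -periods
    # Sum of the discounted coupons (an annuity); a zero rate means no discounting.
    annuity = periods if rate == 0 else (1 - discount) / rate
    return payment * annuity + face_value * discount

coupon = 100 * 0.032  # yield x face value = $3.20

# One year left on a $100 bond with a 3.2% coupon:
price_low = present_value(0.025, 1, coupon, 100)   # going 1-year yield of 2.5%
price_high = present_value(0.042, 1, coupon, 100)  # going 1-year yield of 4.2%
print(round(price_low, 2), round(price_high, 2))   # 100.68 99.04
```

The discounted face value on its own is 100 / 1.025 = 97.56; the discounted final coupon makes up the rest. When the going yield equals the bond's own 3.2% yield, the price comes out at exactly the face value.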

### From rates to returns
At the end of the day, checking the current value of the bonds we hold is what we're trying to achieve. By adding up the value of all the bonds we hold we can figure out the Net Asset Value (NAV) of our fund. And then we compare that NAV over time. This is what we wanted: to be able to calculate the returns of a (simulated) bond fund.
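To make the "compare that NAV over time" step concrete, here is a small sketch that turns a list of yearly NAVs into year-over-year returns. The function name and the NAV figures are invented for illustration; they are not output from the simulator.

```python
def nav_returns(navs):
    """Fractional change from each NAV to the next one."""
    return [(nxt - cur) / cur for cur, nxt in zip(navs, navs[1:])]

navs = [100.0, 105.0, 102.9]  # made-up NAVs for three consecutive years
print([round(r, 4) for r in nav_returns(navs)])  # [0.05, -0.02]
```

Three NAVs give two returns: the fund gained 5% in the first year and lost 2% in the second.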

In [3]:
class Bond:
    def __init__(self, face_value, yield_pct, maturity):
        self.face_value = face_value
        self.yield_pct = yield_pct
        self.maturity = maturity
        
    def __repr__(self):
        return ('Maturity: %d | Yield: %.2f%% | Face Value: $%.2f' % (self.maturity, self.yield_pct * 100, self.face_value))
        
    def gen_payment(self):
        return self.face_value * self.yield_pct
    
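    def value(self, rates):
        # numpy.pv was removed in NumPy 1.20 (it now lives in the separate
        # numpy-financial package), so compute the present value directly.
        rate = rates[self.maturity - 1]
        discount = (1 + rate) ** -self.maturity
        # Discounted coupons (an annuity) plus the discounted face value.
        annuity = self.maturity if rate == 0 else (1 - discount) / rate
        return self.gen_payment() * annuity + self.face_value * discount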

# Bootstrapping the Ladder

Our bond ladder is straightforward enough. Sell the youngest bond and buy a new bond of the oldest maturity, using whatever cash we currently have available.

But how do you get the ladder **started**? Where do those first bonds come from?

Here's where things get a little bit unavoidably hacky. In the real world, you could slowly build up a ladder over time. For instance, buy 1/10th of the ladder every year for a decade. That takes, well, a decade. Which means there's an entire decade in our simulation with no results. We can shortcut that at the cost of a slight loss of accuracy.

If we're building a 10-2 ladder then we have 9 bonds (we don't have a bond with 1-year maturity, hence only 9 bonds). We bootstrap the ladder by buying all nine instantly. That means they will all have the same yield -- whatever the current yield is.

In [4]:
def bootstrap(bond_yield, max_bonds, min_maturity):
    ladder = deque()
    starting_face_value = 50 # chosen arbitrarily (to match longinvest)
    for i, j in zip(range(max_bonds), range(min_maturity, max_bonds+1)):
        face_value = pow(1 + bond_yield, i) * starting_face_value
        b = Bond(face_value, bond_yield, j)
        ladder.append(b)
    return ladder
bootstrap(.0532, 10, 2)

deque([Maturity: 2 | Yield: 5.32% | Face Value: $50.00,
       Maturity: 3 | Yield: 5.32% | Face Value: $52.66,
       Maturity: 4 | Yield: 5.32% | Face Value: $55.46,
       Maturity: 5 | Yield: 5.32% | Face Value: $58.41,
       Maturity: 6 | Yield: 5.32% | Face Value: $61.52,
       Maturity: 7 | Yield: 5.32% | Face Value: $64.79,
       Maturity: 8 | Yield: 5.32% | Face Value: $68.24,
       Maturity: 9 | Yield: 5.32% | Face Value: $71.87,
       Maturity: 10 | Yield: 5.32% | Face Value: $75.69])

Why do we have a different face value for each one? Why not just \$50 for each? Good question. I don't know. But that's what longinvest did in the spreadsheet, so I'm assuming there was a good reason for it :)

# Rates

Now that we understand how the ladder works and how to bootstrap it, we need a source of rates in order to drive the engine.

We have a number of sources of rate data.

* Shiller provides 10 year yields on Treasuries, going back to 1871
* Shiller provides 1 year interest rates, going back to 1871
* [FRED provides 1-, 2-, 3-, 5-, and 7-year rates](https://fred.stlouisfed.org/series/GS1). Depending on the series, the data begins somewhere between 1954 and 1977. When available, we prefer the FRED data over Shiller data.

So we will start by importing those. (I've spliced them all into a single CSV file to make importing things simpler.)
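The "prefer FRED, fall back to Shiller" rule can be sketched with pandas' `combine_first`. This is only an illustration of the idea: the two series and their values below are made up, and it is not necessarily how the CSV was actually spliced together.

```python
import pandas

# Hypothetical 1-year rates from each source, indexed by year.
# The FRED series here simply has no entry for 1953.
shiller = pandas.Series({1953: 0.025, 1954: 0.010, 1955: 0.020})
fred = pandas.Series({1954: 0.011, 1955: 0.021})

# combine_first keeps FRED's value where it has one and falls back to Shiller.
spliced = fred.combine_first(shiller)
print(spliced)
```

In the result, 1953 takes the Shiller value while 1954 and 1955 take the FRED values.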

[1]: https://fred.stlouisfed.org/series/GS1

In [5]:
historical_rates = pandas.read_csv('bond_rates.csv')
historical_rates.head()

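|   | Year | 1 year | 2 year | 3 year | 5 year | 7 year | 10 year |
|---|------|--------|--------|--------|--------|--------|---------|
| 0 | 1871 | 0.0635 | NaN | NaN | NaN | NaN | 0.0532 |
| 1 | 1872 | 0.0781 | NaN | NaN | NaN | NaN | 0.0536 |
| 2 | 1873 | 0.0835 | NaN | NaN | NaN | NaN | 0.0558 |
| 3 | 1874 | 0.0686 | NaN | NaN | NaN | NaN | 0.0547 |
| 4 | 1875 | 0.0496 | NaN | NaN | NaN | NaN | 0.0507 |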


## Rate interpolation

For a given year, we will have **some** rate data. At the very least we will have the 1-year and 10-year rates; the data on those go back the furthest thanks to Shiller.

However, we may *also* have other rate data from FRED.

But we need to have rate data for every year on the yield curve. That is: 1-, 2-, 3-, 4-, 5-, 6-, 7-, 8-, 9-, and 10-year rates. When we don't have the data available we will perform linear interpolation from data we *do* have to fill in the gaps.

So if we only have the 1- and 10-year data then we need to do a linear interpolation for the other 8 years. If we have 1-, 3-, and 10-year data then we do linear interpolation between the 1- and 3-year data to fill in the 2-year data. And we'll do linear interpolation between the 3- and 10-year data for the rest.
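The same gap-filling can be sketched with `numpy.interp`, which interpolates piecewise-linearly between whichever points are known. `interpolate_curve` is a hypothetical helper for this example only, and it assumes the 1- and 10-year rates are always among the known points.

```python
import numpy

def interpolate_curve(known):
    """Fill in 1- to 10-year rates from a {maturity: rate} dict by linear interpolation."""
    maturities = sorted(known)  # numpy.interp needs increasing x-coordinates
    rates = [known[m] for m in maturities]
    return numpy.interp(numpy.arange(1, 11), maturities, rates).tolist()

# 1871: only the 1- and 10-year rates are known.
curve = interpolate_curve({1: 0.0635, 10: 0.0532})
print(round(curve[1], 6))  # 2-year rate: 0.062356
```

Passing an extra known point, such as a 3-year rate, makes the curve bend at that maturity: the 2-year rate is then interpolated between the 1- and 3-year rates, and the rest between the 3- and 10-year rates.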

### Potential problems with linear interpolation

This linear interpolation is not perfect: it assumes that the yield curve is linear, which may not be the case. In particular, [look at this post from Fryxell](https://www.bogleheads.org/forum/viewtopic.php?f=10&t=179425&start=100#p2973643) where he notes that before the 1920s the yield curve may have looked very different from what it does today.

Still, trying to handle that is beyond the scope of this simulation. The more historical data (like those extra FRED data points) that we have, the less of this linear interpolation we need to do. That makes our post-1954 numbers better than the earlier numbers.

[1]: https://www.bogleheads.org/forum/viewtopic.php?f=10&t=179425&start=100#p2973643

In [6]:
def interpolate_rates(raw_rates, steps=9):
    # First try to pre-load any FRED rates.
    # We use NaN to indicate "the data needs to be interpolated"
    s = pandas.Series(math.nan, index=numpy.arange(steps+1))
    s.iloc[0] = raw_rates['1 year']
    s.iloc[1] = raw_rates['2 year']
    s.iloc[2] = raw_rates['3 year']
    s.iloc[4] = raw_rates['5 year']
    s.iloc[6] = raw_rates['7 year']
    s.iloc[9] = raw_rates['10 year']
    
    def left_number(series, index):
        if not math.isnan(series.iloc[index]):
            return index
        else:
            return left_number(series, index-1)
        
    def right_number(series, index):
        if not math.isnan(series.iloc[index]):
            return index
        else:
            return right_number(series, index+1)
        
    # now fill in the gaps with linear interpolation.
    for i in (1, 2, 3, 4, 5, 6, 7, 8):
        if math.isnan(s.iloc[i]):
            #print('Interpolating year', i+1)
            left = left_number(s, i)
            right = right_number(s, i)
            # Use a separate name so we don't shadow the `steps` parameter.
            span = right - left
            # Once we've figured out where there is good data around the hole,
            # we can interpolate.
            rate = s.iloc[left] + ((s.iloc[right] - s.iloc[left]) * (i - left) / span)
            s.iloc[i] = rate

    return s.tolist()

In [7]:
interpolate_rates(historical_rates.iloc[0])

[0.0635,
 0.06235555555555555,
 0.061211111111111105,
 0.060066666666666664,
 0.05892222222222222,
 0.057777777777777775,
 0.05663333333333333,
 0.055488888888888886,
 0.054344444444444445,
 0.0532]

# Putting it all together

Now we have all the building blocks. We have a source of rates. We have a way to bootstrap our ladder. We have a way to see how the NAV changes over time.

We only have one decision left to make -- what are the youngest & oldest maturities that we care about? Do we want a 10-2 fund? Or 10-4 fund?

Well...due to the way I've implemented things, the oldest maturity always has to be 10 years. Sorry. That's just the way it is. But you get to choose the youngest maturity.

The choice is made when you create the bootstrap ladder. This is how you build a 10-2 ladder.

In [8]:
ladder = bootstrap(historical_rates.iloc[0]['10 year'], 10, 2)

In [9]:
def loop(ladder, rates):    
    df = pandas.DataFrame(columns=['NAV', 'Payments', 'Change'], index=numpy.arange(1871, 2017))

    for (i, current_rates) in rates:
        (ladder, payments, nav) = iterate_fund(ladder, interpolate_rates(current_rates))
        # Write both columns for row i in one explicit .loc assignment.
        df.loc[df.index[i], ['NAV', 'Payments']] = [nav, payments]

    return df

In [10]:
returns = loop(ladder, historical_rates.iterrows())
returns.head()

|      | NAV | Payments | Change |
|------|-----|----------|--------|
| 1871 | 579.462 | 78.8835 | NaN |
| 1872 | 594.259 | 81.8208 | NaN |
| 1873 | 613.245 | 85.6115 | NaN |
| 1874 | 662.586 | 91.6101 | NaN |
| 1875 | 728.869 | 98.4884 | NaN |


In [11]:
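def calculate_returns(df):
    # Calculate the year-over-year NAV changes
    change_col = df.columns.get_loc('Change')
    max_row = df.shape[0]
    for i in range(max_row - 1):
        next_nav = df.iloc[i+1]['NAV']
        nav = df.iloc[i]['NAV']
        # Write with a single .iloc call; chained assignment
        # (df.iloc[i]['Change'] = ...) may silently modify a temporary copy.
        df.iloc[i, change_col] = (next_nav - nav) / nav
    return df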

calculate_returns(returns)
returns.head()

|      | NAV | Payments | Change |
|------|-----|----------|--------|
| 1871 | 579.462 | 78.8835 | 0.0255342 |
| 1872 | 594.259 | 81.8208 | 0.0319495 |
| 1873 | 613.245 | 85.6115 | 0.0804593 |
| 1874 | 662.586 | 91.6101 | 0.100036 |
| 1875 | 728.869 | 98.4884 | 0.0578791 |


It matches the 10-2 spreadsheet perfectly up until 1954...at which point longinvest starts using FRED data where available (e.g. for 1-year, 3-year, and 5-year) instead of continuing with linear interpolation.

TODO

- First, generalise things. Make it so I can do 10-2, 10-4, 4-2, 3-2, 30-11, 20-2.
- Handle the 2-year fund. What's this?
- Handle "Bond Ladder". The "Bond Fund" sells prior to maturity. The "Bond Ladder" holds to maturity.
- Once everything works for annual data, try monthly.