# Treasury Preprocessing

Goal: Transform the Date column in the yields data into a common format that is used in every data frame. The data frames can only be joined correctly if their date columns share the same format.

The yields data was sourced from the U.S. Department of the Treasury's Daily Treasury Yield Curve Rates (https://www.treasury.gov/resource-center/data-chart-center/interest-rates/Pages/TextView.aspx?data=yield).

The yields will inform the risk-free rate, matched to the options data by date and by time to maturity. The risk-free rate will ultimately be an input to the multi-layer perceptron models.
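
This notebook does not show how an option's time to maturity is mapped to a yield column. One simple possibility, sketched here purely as an assumption, is to pick the tenor whose maturity is closest to the option's time to maturity. The `TENOR_YEARS` mapping and the `nearest_tenor` helper are hypothetical names introduced for illustration. Interpolating between neighbouring tenors would be an equally plausible choice.

```python
# Tenor labels as they appear in the Treasury table, with approximate
# maturities in years (an assumed mapping, for illustration only).
TENOR_YEARS = {
    "1 Mo": 1 / 12, "2 Mo": 2 / 12, "3 Mo": 3 / 12, "6 Mo": 6 / 12,
    "1 Yr": 1.0, "2 Yr": 2.0, "3 Yr": 3.0, "5 Yr": 5.0,
    "7 Yr": 7.0, "10 Yr": 10.0, "20 Yr": 20.0, "30 Yr": 30.0,
}


def nearest_tenor(years_to_maturity):
    """Return the tenor label whose maturity is closest to the input."""
    return min(
        TENOR_YEARS,
        key=lambda label: abs(TENOR_YEARS[label] - years_to_maturity),
    )


# An option expiring in 100 days sits closest to the 3-month tenor.
label_100d = nearest_tenor(100 / 365)
# An option expiring in 0.8 years sits closest to the 1-year tenor.
label_08y = nearest_tenor(0.8)
```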

In [1]:
import pandas as pd

In [2]:
%cd '/Users/benjochem/Desktop/Junior/Research'

/Users/benjochem/Desktop/Junior/Research


In [3]:
yields = pd.read_excel('Project/data/raw/yields.xlsx')
yields

Unnamed: 0,Date,1 Mo,2 Mo,3 Mo,6 Mo,1 Yr,2 Yr,3 Yr,5 Yr,7 Yr,10 Yr,20 Yr,30 Yr
0,2010-01-04,0.05,,0.08,0.18,0.45,1.09,1.66,2.65,3.36,3.85,4.60,4.65
1,2010-01-05,0.03,,0.07,0.17,0.41,1.01,1.57,2.56,3.28,3.77,4.54,4.59
2,2010-01-06,0.03,,0.06,0.15,0.40,1.01,1.60,2.60,3.33,3.85,4.63,4.70
3,2010-01-07,0.02,,0.05,0.16,0.40,1.03,1.62,2.62,3.33,3.85,4.62,4.69
4,2010-01-08,0.02,,0.05,0.15,0.37,0.96,1.56,2.57,3.31,3.83,4.61,4.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2497,2019-12-24,1.55,1.58,1.58,1.61,1.53,1.62,1.64,1.72,1.83,1.90,2.20,2.33
2498,2019-12-26,1.59,1.60,1.58,1.61,1.53,1.64,1.65,1.72,1.85,1.90,2.19,2.33
2499,2019-12-27,1.56,1.56,1.57,1.59,1.51,1.59,1.60,1.68,1.80,1.88,2.18,2.32
2500,2019-12-30,1.51,1.53,1.57,1.60,1.57,1.58,1.59,1.68,1.81,1.90,2.21,2.34


In [4]:
yields.isnull().sum()

Date        0
1 Mo        1
2 Mo     2201
3 Mo        1
6 Mo        1
1 Yr        1
2 Yr        1
3 Yr        1
5 Yr        1
7 Yr        1
10 Yr       1
20 Yr       1
30 Yr       1
dtype: int64
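
The counts above suggest two separate patterns: a single row that is missing across every tenor, and a `2 Mo` column that is empty for most of the sample, which is consistent with the Treasury only starting to publish a 2-month rate in late 2018. A minimal sketch of how these two cases might be told apart is shown below on a toy frame. The toy values and the names `toy`, `rate_cols`, `all_missing`, and `missing_share` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the yields frame (values are made up for illustration).
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-04", "2010-01-05", "2010-01-06"]),
    "1 Mo": [0.05, np.nan, 0.03],
    "2 Mo": [np.nan, np.nan, np.nan],
    "3 Mo": [0.08, np.nan, 0.06],
})

rate_cols = [c for c in toy.columns if c != "Date"]

# Rows where every rate column is missing, e.g. a day with no published curve.
all_missing = toy[toy[rate_cols].isnull().all(axis=1)]

# Share of missing values per column, to spot sparsely populated tenors.
missing_share = toy[rate_cols].isnull().mean()
```

On the real data, the same two expressions applied to `yields` would show which date is missing across the board and how sparse each tenor is.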

In [5]:
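# Convert the date column to the integer format used by the options data (yyyymmdd).
# Yields will need to be matched to options by date.
def date_to_numeric(dates):
    """Return a list of yyyymmdd integers, one per date-like value in `dates`."""
    converted = []
    for d in dates:
        # str() of a Timestamp looks like '2010-01-04 00:00:00';
        # the first 10 characters are the 'yyyy-mm-dd' part
        d = str(d)[:10]
        # drop the dashes and cast 'yyyymmdd' to an integer
        d = int(d.replace('-', ''))
        converted.append(d)
    return converted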
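
The same conversion can likely be done without an explicit loop. The sketch below is an alternative, not the method used in this notebook. It assumes the input can be parsed by `pd.to_datetime` and contains no missing dates (the null counts above show none in `Date`). The function name `date_to_numeric_vectorized` is hypothetical.

```python
import pandas as pd


def date_to_numeric_vectorized(dates):
    """Return a Series of yyyymmdd integers for date-like values."""
    # Parse to datetimes first so strings and Timestamps are treated alike.
    parsed = pd.to_datetime(pd.Series(dates))
    # Format each date as 'yyyymmdd' text, then cast the text to integers.
    return parsed.dt.strftime("%Y%m%d").astype(int)


sample = pd.Series(pd.to_datetime(["2010-01-04", "2019-12-30"]))
result = date_to_numeric_vectorized(sample)
```

Unlike the loop version, this returns a pandas Series rather than a list, so it can be assigned to a new column directly.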

In [6]:
# Apply the function to the Date column and store the result in a new,
# correctly formatted 'date' column
yields['date'] = date_to_numeric(yields['Date'])
yields.head()

Unnamed: 0,Date,1 Mo,2 Mo,3 Mo,6 Mo,1 Yr,2 Yr,3 Yr,5 Yr,7 Yr,10 Yr,20 Yr,30 Yr,date
0,2010-01-04,0.05,,0.08,0.18,0.45,1.09,1.66,2.65,3.36,3.85,4.6,4.65,20100104
1,2010-01-05,0.03,,0.07,0.17,0.41,1.01,1.57,2.56,3.28,3.77,4.54,4.59,20100105
2,2010-01-06,0.03,,0.06,0.15,0.4,1.01,1.6,2.6,3.33,3.85,4.63,4.7,20100106
3,2010-01-07,0.02,,0.05,0.16,0.4,1.03,1.62,2.62,3.33,3.85,4.62,4.69,20100107
4,2010-01-08,0.02,,0.05,0.15,0.37,0.96,1.56,2.57,3.31,3.83,4.61,4.7,20100108


In [7]:
# Save the correctly formatted yields data to the interim data directory
yields.drop('Date', axis=1).to_csv('Project/Data/interim/fixed_yields.csv', index = False)
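
With the integer `date` column in place, the join with the options data might look like the sketch below. The options frame and its `strike` column are assumptions made for illustration, since the options data is not shown in this notebook. The yield values are taken from the first two rows of the table above.

```python
import pandas as pd

# Toy yields frame in the shape saved above (first two rows, one tenor).
yields_fixed = pd.DataFrame({
    "date": [20100104, 20100105],
    "3 Mo": [0.08, 0.07],
})

# Hypothetical options frame; the column names are assumptions.
options = pd.DataFrame({
    "date": [20100104, 20100105, 20100105],
    "strike": [100.0, 100.0, 105.0],
})

# A left join keeps every option row and attaches that day's yields.
# validate raises an error if the yields frame has duplicate dates.
merged = options.merge(
    yields_fixed, on="date", how="left", validate="many_to_one"
)
```

A left join is used so that options on dates without a published yield curve stay in the result with missing yields instead of being dropped silently.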