\graphicspath{ {C:/MFE/MFE Sem 3/STAT 439/STAT_439/Final Project/figures/} }
\usepackage[font=scriptsize,labelfont=bf]{caption}

\tableofcontents

\pagebreak

# Introduction

Exchange-traded funds (ETFs) are becoming one of the largest asset classes in the public markets. In 2002, total ETF assets under management (AUM) were just \$102 billion; by 2021 they had grown to \$7,191 billion, an implied compound annual growth rate of roughly 25%, and the figure is still rising. ETFs range from funds that simply track an index to funds that are essentially multi-strategy hedge funds. In either case, AUM grows through the ***create and redeem*** process. This process is not the focus of this paper, but a brief description is useful. Imagine an actively managed fund that holds a portfolio of assets. It can define "*buckets*" for creating or redeeming a share of its ETF. On the create side, an investor approaches a broker-dealer with the "*create bucket*", a portfolio of assets that the ETF has designated as exchangeable for a share of the ETF. The broker-dealer then initiates the exchange: the ETF receives the bucket of assets and the investor receives a share of the ETF. With the "*redeem bucket*", the exchange simply runs in reverse. This access to indexes, money managers, sectors, leverage (through levered ETFs), and more, together with the added liquidity these products provide, is the main reason for their exponential growth.

Because the ETF space is growing rapidly and gives investors easier access to whole sectors, product types, strategies, and funds, and because these products trade on an exchange like any other listed security, a new opportunity arises. With a strong forecasting model for these products, an investor or trader can take advantage of movements in areas that were previously accessible only by taking on the risks of trading derivatives. Building such a model is the goal of this paper.

We investigate formulating an ARIMA model for investment-grade (IG) corporate bond ETFs. The bond market is not as liquid as the equity and derivatives markets, but ETFs dampen this liquidity issue. A model that forecasts price movements of an IG corporate bond ETF would therefore let a fixed-income portfolio manager (PM) avoid both the illiquidity of trading individual bonds and the risk of trading derivatives on the accompanying index or indices. For our purposes, we model the BlackRock ETF "*iShares iBoxx $ Investment Grade Corporate Bond ETF*", which has the ticker symbol **LQD**. LQD's investment objective, as stated on [LQD's homepage](https://www.ishares.com/us/products/239566/LQD?cid=ppc:ishares_us:google:fund-names-priorities&gclid=CjwKCAiAs8acBhA1EiwAgRFdw4oE1UeE7tNO8c2Wdk4wXc1Bws2tNVdtapvDEiNLBcCoDFIBXoNhHhoCNIcQAvD_BwE&gclsrc=aw.ds), is:

> The iShares iBoxx $ Investment Grade Corporate Bond ETF seeks to track the investment results of an index composed of U.S. dollar-denominated, investment grade corporate bonds.

LQD is also one of the most popular IG corporate bond ETFs, if not the most popular. This "*popularity*" can be measured by KPIs such as average volume and net assets. These statistics can be found by going to ***yahoofinance.com*** and searching "*LQD*" in the search bar (or by clicking [here](https://finance.yahoo.com/quote/LQD?p=LQD&.tsrc=fin-srch)).

Our data, adjusted daily closing prices, were collected using the Python package **yfinance** ([documentation here](https://pypi.org/project/yfinance/)). The package retrieves freely available historical data for any number of tickers from [***yahoofinance.com***](https://finance.yahoo.com/). We use adjusted closing prices rather than raw closing prices because of the dividend adjustment: LQD pays dividends, so forecasts built on the raw price would be distorted by the price drops that typically accompany dividend payments. Because LQD is a publicly traded asset and the market is closed on weekends and some holidays, a year in our data contains about 252 trading days on average rather than 365 calendar days.
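To make the dividend adjustment concrete, the sketch below back-adjusts a toy price series. It assumes one common convention (each price before an ex-dividend date is scaled by one minus the dividend divided by the previous close); whether this matches the exact procedure behind Yahoo Finance's adjusted close is an assumption, and the function name `back_adjust_for_dividends`, the prices, and the dates are all hypothetical.

```python
import pandas as pd


def back_adjust_for_dividends(close: pd.Series, dividends: pd.Series) -> pd.Series:
    """Back-adjust closing prices for cash dividends (assumed convention).

    For a dividend D with ex-date t, every price strictly before t is
    multiplied by (1 - D / close[t-1]).
    """
    # Align dividends to the price index; days without a dividend get 0.
    dividends = dividends.reindex(close.index).fillna(0.0)
    prev_close = close.shift(1)
    # Per-date factor: 1.0 on ordinary days, (1 - D / previous close) on ex-dates.
    factor = (1.0 - dividends / prev_close).fillna(1.0)
    # An ex-date factor applies to all EARLIER dates, so take a reverse
    # cumulative product and shift by one so the ex-date itself is untouched.
    cumulative = factor.iloc[::-1].cumprod().iloc[::-1].shift(-1).fillna(1.0)
    return close * cumulative


# Hypothetical toy data: the price falls by the 1.00 dividend on the ex-date.
dates = pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"])
close = pd.Series([100.0, 100.0, 99.0, 99.0], index=dates)
dividends = pd.Series([1.0], index=pd.to_datetime(["2021-03-03"]))

adjusted = back_adjust_for_dividends(close, dividends)
raw_returns = close.pct_change()
adjusted_returns = adjusted.pct_change()
print(pd.DataFrame({"raw": raw_returns, "adjusted": adjusted_returns}))
```

In the toy series the raw return on the ex-date is negative purely because of the payout, while the adjusted series removes that artificial jump. This is the distortion we want to keep out of the model.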

In [1]:
'''Importing Packages'''
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from matplotlib import dates
from IPython.display import Markdown as md
import statsmodels.tsa.stattools as ts
import statsmodels.api as sm
import datetime
from statsmodels.api import stats as sm_stats  # separate alias so `sm` (statsmodels.api) is not shadowed
from loess import loess_1d
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from openpyxl import Workbook, load_workbook
from sklearn import linear_model
from statsmodels.tsa.ar_model import AutoReg, ar_select_order
from scipy.linalg import toeplitz
import math
import scipy.stats as stats
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA
import yfinance as yf
from IPython.display import Latex as Latex
from Converter import *
%matplotlib inline


# Data

We apply the general "*10-year*" rule to the amount of data we bring in: if the holding period of a strategy is a day or less, a backtest should cover no less than 10 years. Our data start on 1/3/2012 and end on 1/3/2022, the first trading days of 2012 and 2022 respectively. This gives us a total of $2518$ data points.

In [34]:
'''importing the data'''
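start = datetime.date(2012,1,3)
end = datetime.date(2022,1,4)  # yfinance treats `end` as exclusive, so this includes 1/3/2022
# auto_adjust=False keeps the separate 'Adj Close' column in yfinance versions that adjust prices by default
data = yf.download(['LQD'], start=start, end=end, progress=False, auto_adjust=False)['Adj Close']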
data = pd.DataFrame(data)
data.columns = ['LQD']
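Before modeling, it is worth confirming that the downloaded series matches the description above (date range, number of observations, no gaps). The following is a minimal sketch of such a check. The helper `summarize_daily_series` is hypothetical, and so that the example runs offline it is applied to stand-in data built from weekdays only, which overcounts trading days because exchange holidays are not removed.

```python
import pandas as pd


def summarize_daily_series(prices: pd.Series) -> dict:
    """Collect basic integrity checks for a daily price series."""
    idx = prices.index
    return {
        "n_obs": int(len(prices)),
        "first": idx.min(),
        "last": idx.max(),
        "n_missing": int(prices.isna().sum()),
        "is_sorted": bool(idx.is_monotonic_increasing),
        "has_duplicates": bool(idx.has_duplicates),
        # Number of observations in each calendar year
        "obs_per_year": prices.groupby(idx.year).size(),
    }


# Stand-in data (assumption): constant price on every weekday in the sample
# window. Real exchange data has fewer rows because of market holidays.
stand_in_index = pd.bdate_range("2012-01-03", "2022-01-03")
stand_in = pd.Series(100.0, index=stand_in_index, name="LQD")

summary = summarize_daily_series(stand_in)
print(summary["n_obs"], summary["first"].date(), summary["last"].date())
print(summary["obs_per_year"])
```

Applied to the real data as `summarize_daily_series(data['LQD'])`, the check should report the observation count and date range stated above, with roughly 252 observations in each full calendar year.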

In [39]:
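'''Graphing the time-series'''
# matplotlib 3.6 renamed the 'seaborn' style to 'seaborn-v0_8'; use whichever is installed
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
with plt.style.context(style):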
    fig = plt.figure(figsize=(12,6))
    ax = plt.axes()
    plt.plot(data)
    #plt.scatter(data.index, data['LQD'].values)
    plt.title('LQD\n {} - {}'.format(start.strftime('%m/%d/%Y'), (end-datetime.timedelta(days = 1)).strftime('%m/%d/%Y')))
    plt.xticks([data.loc[i].index.tolist()[0] for i in [str(j) for j in range(2012, 2023)]], [str(j) for j in range(2012, 2023)])
    plt.ylabel('Price')
    plt.xlabel('Date')
    #plt.show()
    plt.savefig('./figures/LQD_Price.png')
    plt.close()



In [None]:
bShowInline = True  # Set = False for document generation

def makeplot( plt, figlabel, figcaption):
    figname = figlabel+'.png'

    plt.savefig(figname)

    if bShowInline:
        plt.show()
    else:
        plt.close()

    # Raw string: backslashes reach LaTeX unchanged, so no doubled escapes are needed
    strLatex = r"""
    \begin{figure}[b]
    \centering
        \includegraphics[width=\textwidth]{%s}
        \caption{%s}
        \label{fig:%s}
    \end{figure}""" % (figname, figcaption, figlabel)
    return display(Latex(strLatex))

In [54]:
figname = 'LQD_Price.png'
figcaption = '''Time-series plot of LQD's price from 01/03/2012 to 01/03/2022'''
figlabel = 'price1'
# Raw string so the LaTeX backslashes are not read as Python escape sequences
strLatex = r"""
    \begin{figure}[b]
    \centering
        \includegraphics[width=\textwidth]{%s}
        \caption{%s}
        \label{fig:%s}
    \end{figure}""" % (figname, figcaption, figlabel)
Latex(strLatex)

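The figure template appears in more than one cell, so one option is to build the LaTeX string in a single pure function and hand the result to `Latex(...)` wherever a figure is needed. The sketch below is one possible way to do this; the names `FIGURE_TEMPLATE` and `latex_figure` are hypothetical and not part of the notebook. A raw string is used so that sequences such as `\b` and `\t` are not interpreted by Python as backspace and tab.

```python
# Raw string: every backslash is passed through to LaTeX literally.
FIGURE_TEMPLATE = r"""
\begin{figure}[b]
    \centering
    \includegraphics[width=\textwidth]{%s}
    \caption{%s}
    \label{fig:%s}
\end{figure}"""


def latex_figure(figname: str, caption: str, label: str) -> str:
    """Return a LaTeX figure environment for an image file (hypothetical helper)."""
    return FIGURE_TEMPLATE % (figname, caption, label)


snippet = latex_figure(
    "LQD_Price.png",
    "Time-series plot of LQD's price from 01/03/2012 to 01/03/2022",
    "price1",
)
print(snippet)
```

Inside the notebook, `Latex(latex_figure(...))` would then stand in for the repeated template strings.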